
EE 5410 High Speed and Embedded Computer Networking
Interconnection Networks



Switch Fabric: Motivation
• Network on Chip
• Distributed processing capability


Setup
• Multiple processing elements (PE)/nodes (on-chip processors, memory units, line cards)/routers
• Communicate over an interconnection network (IN)
• Communicated data: packets, messages, cells, flits
• Cells, flits: fixed-size segments of messages
Classification of Interconnection Networks
• Time division (shared-medium) networks
• Space division networks
  – Direct networks
  – Indirect networks
    • Single path or multi path
    • Blocking or non-blocking


Shared-medium interconnection
• Single internal communication structure: a bus or a memory, shared by all packets traveling from input ports to output ports through the IN.
• Cells arriving at input ports are time-division multiplexed onto a high-speed stream at N × the line rate.
• Advantages:
  – Simple
  – Easy multicast/broadcast
• Disadvantage:
  – Does not scale
Shared-Medium Interconnection: Bus
• Operation:
  – A time slot is divided into N mini-slots.
  – During each mini-slot, a cell from one input is broadcast to all output ports.
• Throughput determined by:
  – The throughput of the bus
  – The memory bandwidth for buffers
Shared-Medium Interconnection: Bus
• Buffers:
  – Completely partitioned by port.
  – When an output port is temporarily congested due to high traffic load, its FIFO buffer fills up and starts to discard cells.
  – Other FIFO buffers may have plenty of space but cannot be used by the congested port.
• Solution: a shared-memory interconnection, which has better buffer utilization.


Shared-Medium Interconnection: Memory
• Operation: Time Slot Interchange (TSI)
  – Sequential write, reordered read.
  – The information in each slot of the input frame is rearranged into the appropriate slot of the output frame.
• Throughput determined by:
  – The memory bandwidth for buffers.
• Memory size:
  – Should be adjusted to keep the cell loss rate below a chosen value.
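To make the sequential-write/reordered-read operation concrete, here is a minimal Python sketch; the frame representation and the slot_map name are illustrative, not from the slides:

```python
# Minimal sketch of Time Slot Interchange (TSI).
def tsi(input_frame, slot_map):
    """Sequential write, reordered read.

    input_frame: list of cells, one per input time slot.
    slot_map: slot_map[j] = index of the input slot whose cell
              belongs in output slot j.
    """
    memory = list(input_frame)  # sequential write into the buffer memory
    # reordered read: output slot j takes the cell from input slot slot_map[j]
    return [memory[slot_map[j]] for j in range(len(slot_map))]

# Example: a 3-slot frame where output slots 0 and 2 are swapped.
print(tsi(["A", "B", "C"], [2, 1, 0]))  # -> ['C', 'B', 'A']
```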
Shared-Medium Interconnection: Memory
• Advantage:
  – Best memory utilization: all input/output ports share the same memory.
• Memory organization:
  – Complete partitioning: each output port gets a dedicated share.
  – Full sharing: the entire memory is shared by all output ports without any reservation.
  – Partial sharing: put an upper and a lower bound on the memory space.


Space division interconnection
• Multiple physical paths between the input and output
nodes.
• Operate concurrently so that multiple cells can be
transmitted across the interconnection simultaneously.



Space division interconnection
• Space division INs use packets to route data from the source to the destination Processing Element (PE) via a network fabric that consists of:
  – switching elements/routers
  – interconnection links (wires)
(Figures: distributed, direct organization; centralized, indirect organization.)


Space division interconnection
• The total capacity of the interconnect:
  – the product of the bandwidth of each path and the number of paths that can transmit cells concurrently.
• In theory: unlimited.
• In practice: restricted by physical implementation constraints:
  – device pin count, connection restrictions, synchronization considerations.
Space division interconnection:
Indirect Networks
• Connect terminal nodes via one or more intermediate
stages of switch nodes.
• Only terminal nodes are sources and destinations of traffic
• Intermediate nodes simply switch traffic to and from
terminal nodes.

Centralized, Indirect
Space division interconnection:
Direct Networks
• Each terminal node (e.g., a processor core or cache in a chip
multiprocessor) is associated with a router
• All routers act as both sources/sinks of traffic and as direct network
switches for traffic from other nodes.
• Most designs of on-chip networks have used direct networks since
co-locating routers with terminal nodes is often most suitable in
area-constrained environments on a chip.

Distributed, Direct



Indirect Interconnection
Networks (IN)
• NxN IN enables N nodes to communicate
• Network permutation: For an NxN IN, an
arbitrary set of N (1-1) I/O connections
• What is the number of permutations?
• P=N!
• For an arbitrary network: P≤N!



Indirect IN: Accessibility and
blocking
• full accessibility:
– each inlet can be connected to each outlet when no other
I/O connection is established in the network
– Full accessibility is a feature usually required today in all
interconnection networks
• blocking property:
– network connection capability between idle inlets and
outlets in a network with an arbitrary current permutation,
that is when the other inlets and outlets are either busy or
idle and arbitrarily connected to each other.



Indirect IN: Blocking
• Non-blocking:
– an I/O connection between an arbitrary idle inlet and
an arbitrary idle outlet can be always established by
the network independent of the network state at set-
up time.
• Blocking:
– at least one I/O connection between an arbitrary idle
inlet and an arbitrary idle outlet cannot be established
by the network
– Because of internal congestion due to the already
established I/O connections.



Indirect IN: Blocking
• Strict-sense non-blocking (SNB):
  – The network can always connect each idle inlet to an arbitrary idle outlet independent of the current network permutation,
  – independent of the already established set of I/O connections and of the policy of connection allocation.
  – Higher implementation cost.
• Rearrangeable non-blocking (RNB):
  – The network can always connect each idle inlet to an arbitrary idle outlet by applying, if necessary, a suitable internal rearrangement of the I/O connections already established.
  – Rearranging paths for connected pairs involves extra control complexity,
  – and may cause disruption of a connection.
Indirect IN: Paths
• single-path
– only one path exists for any input/output pair
– simpler routing control
– crossbar, fully interconnected, banyan
• multiple-path
– there is more than one path for any
input/output pair
– higher fault tolerance.
– Clos networks



Indirect IN: Crossbar switch
• Indirect, single path, non-blocking
• NxN crossbar switch
• Crosspoints:
  – NxN square array
  – Individually operated
  – One corresponding to each input/output pair
• Each crosspoint has two possible states:
  – Cross (default): Control = 0
  – Bar: Control = 1
• A connection between input port i and output port j:
  – Set the (i, j)th crosspoint switch to the bar state,
  – let the other crosspoints along the connection remain in the cross state.
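The crosspoint control described above can be sketched in a few lines of Python; the array representation and the function names are illustrative:

```python
# Minimal sketch of an N x N crossbar's crosspoint states.
N = 4
bar = [[False] * N for _ in range(N)]  # False = cross (default), True = bar

def connect(i, j):
    """Connect input i to output j: set the (i, j) crosspoint to bar."""
    bar[i][j] = True

def output_of(i):
    """Output reached by input i: the crosspoint on row i in bar state."""
    for j in range(N):
        if bar[i][j]:
            return j
    return None  # input i is not connected

connect(0, 2)
connect(3, 1)
print(output_of(0), output_of(3))  # -> 2 1
```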
Crossbar switch
• Self-routing property:
– bar state of a crosspoint:
• can be triggered
individually by each
incoming cell when its
destination matches
the output address
• No global information
about other cells and their
destinations is required.
– control function is distributed
among all crosspoints.
– control complexity is
significantly reduced in the
switching fabric



Crossbar switch
• Advantages:
– Internally nonblocking
– simple in architecture
– Modular
• Disadvantages:
– Complex in terms of the number of crosspoints, which grows as N².
– The arbitration can also become a system bottleneck as the
switch size increases.
• Possible locations for the buffers in a crossbar switch:
– at the crosspoints in the switch fabric
– at the inputs of the switch
– at the inputs and outputs of the switch.



Multistage Interconnection Networks (MIN)
• A common way of addressing the crossbar scaling problem
• Splitting the large crossbar switch into several stages of smaller
switches
• interconnected in such a way that a single pass through the switch
fabric allows any destination to be reached from any source.



Banyan-based Switches
• Indirect, single path, blocking
• constructed from 2x2 switching elements (crossbar switches)
• NxN banyan switch: cells pass through log₂N stages of switching elements before reaching their destinations ➔ similar to binary trees
• Self-routing switches
• Three topologies: delta, omega, and banyan networks; all of them offer equivalent performance



Banyan-based Switch: Advantages
• Complexity:
  – Banyan-based switch: complexity of paths and switching elements is O(N log₂N)
  – Crossbar complexity: O(N²)
  – Much less than a crossbar-based switch
• Parallel structure
– several cells on different paths can be processed
simultaneously
• modular and recursive structure
– large-scale switches can be built by using elementary
switching elements without modifying their structures.



Banyan-based Switch: Self-Routing
• No control mechanism is needed for routing cells from inputs to outputs.
• Routing information is contained within each cell and is used while the cell is routed along the path.
• Example (self-routing in a banyan network, 0 = up, 1 = down):
  – Destination 110: down(1) ➔ down(1) ➔ up(0)
  – Destination 001: up(0) ➔ up(0) ➔ down(1)
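The bit-steered routing can be sketched directly; the 0 = up / 1 = down convention follows the example above:

```python
# Minimal sketch of banyan self-routing: stage k examines destination bit k;
# 0 steers the cell to the upper output, 1 to the lower output.
def banyan_route(dest_bits):
    """Port choice ('up'/'down') made at each of the log2(N) stages."""
    return ["down" if b == "1" else "up" for b in dest_bits]

print(banyan_route("110"))  # -> ['down', 'down', 'up']
print(banyan_route("001"))  # -> ['up', 'up', 'down']
```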


Banyan-based Switch: Blocking problem
• While a cell is being routed in the IN, it can get stuck: internal link blocking
  – when multiple cells contend for a link at the same time inside the IN,
  – where an internal physical link is shared by multiple connections among input/output ports.
• A blocking switch is a switch suffering from internal blocking.
• A switch that does not suffer from internal blocking is called nonblocking.
(Figure: the connections 011 → 010 and 010 → 011 contend for an internal link.)


Three-stage Clos networks
• Indirect, multipath; non-blocking or blocking depending on parameters
• A three-stage network is defined by five parameters:
  – r1, r2, r3: the number of switches in each stage
  – m1: the number of inputs of a first-stage switch
  – n3: the number of outputs of a third-stage switch
• Switches in consecutive stages are connected by exactly one edge.
• The other parameters are then uniquely determined.
Connections in Clos networks
• Consider any switch a in stage 1 and any switch b in stage 3.
• An input of a may be connected to an output of b via a middle switch f.
• Other inputs of a may be connected to other outputs of b via the middle switches g, h, etc.


Paull’s Matrix
• Represents these paths.
• Enter the middle switches f, g, h, etc. into the (a, b) entry of an r1 × r3 matrix.
• Each entry of the matrix may hold none, one, or more than one middle switch.


Paull’s Matrix: Conditions for a legitimate point-to-point switching state
1. Each row can have at most m1 symbols.
2. Each column can have at most n3 symbols.
3. The symbols in each row must be distinct.
4. The symbols in each column must be distinct.
(m1: the number of inputs of a first-stage switch; n3: the number of outputs of a third-stage switch.)


What is the condition for a non-blocking Clos network?
• What is the number of switching elements in the second stage such that the network is:
  – strict-sense non-blocking?
  – rearrangeable non-blocking?
• Which one will have fewer switching elements?
  – Rearrangeable non-blocking.


Strict-Sense Non-Blocking Clos Networks
• Suppose we first make connections from a single first-stage switch a to third-stage switches other than third-stage switch b. We connect all inputs of a except one (to be connected to b later).
• In the process, we enter m1 − 1 symbols into row a of the matrix.


Strict-Sense Non-Blocking Clos Networks
• Then we make connections to third-stage switch b from all first-stage switches except a. We connect all outputs of b except one (to be connected to a later).
• In the process, we enter n3 − 1 symbols into column b.


Strict-Sense Non-Blocking Clos Networks
• Now we want to make the last connection, between the unused input of a and the unused output of b.
• Up to now we have entered (m1 − 1) + (n3 − 1) symbols. We need one more symbol to make the connection, and all these symbols have to be distinct.
• In total we need (m1 − 1) + (n3 − 1) + 1 = m1 + n3 − 1 symbols.


Strict-Sense Non-Blocking condition for Clos Networks
• The condition for a Clos network to be strict-sense non-blocking for point-to-point connections is given by Clos’s theorem:
  – A Clos network is strict-sense non-blocking if and only if the number of second-stage switches r2 ≥ m1 + n3 − 1.
  – In particular, a symmetric network with m1 = n3 = n is strict-sense non-blocking if and only if r2 ≥ 2n − 1.
Rearrangeable Non-blocking Clos Networks
• Slepian-Duguid Theorem:
  – A three-stage Clos network is rearrangeably non-blocking if and only if r2 ≥ max(m1, n3).
  – In particular, a symmetric network with m1 = n3 = n is rearrangeably non-blocking if and only if r2 ≥ n.
• Proof:
  – Necessity: a three-stage Clos network is rearrangeable → r2 ≥ max(m1, n3).
    • Since there are m1 inputs to a first-stage switch and n3 outputs from a third-stage switch, at least max(m1, n3) second-stage switches are necessary for the first- or third-stage switches to be fully utilized.
  – We leave the sufficiency proof out of scope.
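Both theorems reduce to simple inequalities; a small helper (a sketch with illustrative names) makes them easy to compare:

```python
# Minimal sketch: Clos theorem (strict-sense) and Slepian-Duguid theorem
# (rearrangeable) as predicates on the Clos parameters m1, n3, r2.
def clos_properties(m1, n3, r2):
    return {
        "strict_sense_nonblocking": r2 >= m1 + n3 - 1,
        "rearrangeable_nonblocking": r2 >= max(m1, n3),
    }

# Symmetric example with n = 4: SNB needs r2 >= 7, RNB only r2 >= 4.
print(clos_properties(4, 4, 7))  # both True
print(clos_properties(4, 4, 4))  # rearrangeable only
```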


An example
• m1 = 2, n3 = 3; assume r2 = 3 (= max(2, 3)).
(Figure: first-stage switches a, b, c; middle-stage switches x, y, z; third-stage switches d, e. One requested connection cannot be established without rearrangement.)


An example: Rearranged
• m1 = 2, n3 = 3; assume r2 = 3 (= max(2, 3)).
(Figure: the same network after rearranging an existing connection through a different middle-stage switch; the previously blocked connection can now be established.)


Recursive Construction
• Given a switching network:
  – Construct a three-stage Clos network: factor it into three stages of smaller switching elements.
  – Check the given conditions on the number of second-stage switches for:
    • strict-sense non-blocking networks
    • rearrangeably non-blocking networks
• Idea:
  – Can we build the smaller switching elements from even smaller switching elements using the three-stage construction,
  – reducing their crosspoint complexity?
Recursive Construction for Rearrangeable Networks
• Assume first-stage switches are p×p, second-stage switches q×q, third-stage switches p×p, with p·q = N.
• The crosspoint count for the rearrangeable construction is:
  – Cr = p²q + q²p + p²q = 2p²q + q²p
  – Substitute p = N/q and take the derivative with respect to q:
  – q = √(2N), p = √(N/2)
  – Cr,min = O(N√N)
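Spelling out the minimization step (a short derivation consistent with the slide’s parameters):

```latex
C_r(q) = 2p^2 q + q^2 p \Big|_{p=N/q} = \frac{2N^2}{q} + Nq, \qquad
\frac{dC_r}{dq} = -\frac{2N^2}{q^2} + N = 0
\;\Rightarrow\; q = \sqrt{2N},\ p = \sqrt{N/2},\quad
C_{r,\min} = 2\sqrt{2}\,N^{3/2} = O(N\sqrt{N})
```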
Recursive Construction
• Special case: N = 2^n, n a positive integer.
• Factor N into p = 2 and q = N/2.


Recursive Construction: Rearrangeable non-blocking networks
• Special case: N = 2^n, n a positive integer.
• Factor N into p = 2 and q = N/2.


Recursive Construction: Rearrangeable non-blocking networks
• Keep factoring recursively: apply the same construction to the (N/2)×(N/2) middle-stage switches.


Recursive Construction: Rearrangeable non-blocking networks
• The resulting network is called a Benes network:
  – 2log₂N − 1 stages
  – Each stage consists of N/2 switching elements
  – O(N log N) switching elements
  – Each switching element is of size 2×2.
  – Number of crosspoints: 4N log₂N.
  – This is roughly the theoretical minimum number required.
    • What is the theoretical minimum?
Combinatorial Bounds on the Crosspoint Complexity
• Idea:
  – 2×2 switching nodes with two legitimate states: the cross state and the pass state.
  – Put K of them together somehow to build an NxN switch that realizes all N! connection patterns.
  – K binary switches offer at most 2 × 2 × … × 2 = 2^K configurations, so 2^K ≥ N!.
  – Hence K ≥ log₂(N!) = O(N log N) (using Stirling’s approximation) ➔ lower bound.
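The last step can be expanded with Stirling’s approximation:

```latex
K \ge \log_2(N!) = \sum_{i=1}^{N} \log_2 i \approx N \log_2 N - N \log_2 e = \Theta(N \log N)
```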


Combinatorial Bounds on the Crosspoint Complexity: Summary
• Using the Benes network, we have 2log₂N − 1 stages, each of N/2 switches of size 2×2.
• We have a total of N log₂N − N/2 binary switches.
• This number is very close to the lower bound.


Control Algorithms for Rearrangeable
Networks
• The connection pattern across the whole
switching network can be dramatically changed,
even though we want to add just one
connection.
• Consequently, adding a connection in a Benes
network can be almost as complicated as
reconnecting all input-output pairs.
• Control algorithms can be applied recursively.



Alternative approach: Cantor Networks
• Each input and output of the Cantor network is connected to the corresponding input and output of m Benes networks through demultiplexers and multiplexers.
• m = log₂N Benes planes are sufficient to make the Cantor network strict-sense non-blocking.
• Since the Benes network has crosspoint count 4N log₂N, the Cantor network gives a strict-sense non-blocking network of complexity roughly 4N(log₂N)², if we ignore the crosspoint count of the multiplexers and demultiplexers.


Strict-sense non-blocking networks: cost
(Figure: crosspoint-cost comparison of strict-sense non-blocking networks.)


System on-chip (SoC)
• Integration of an entire system onto the same silicon die (System-on-Chip, SoC).
• General-purpose fully programmable processors, co-processors, DSPs, dedicated hardware accelerators, memory blocks, I/O blocks.
• Communication- rather than computation-dominated.
• Multi-processor SoCs (MPSoCs).
Bertozzi, Davide, and Luca Benini. "Xpipes: A network-on-chip architecture for gigascale systems-on-chip." IEEE Circuits and Systems Magazine 4.2 (2004): 18-31.
Network on-chip (NoC)
• A scalable communication infrastructure.
• An on-chip packet-switched micro-network of interconnects.
• Draws on both traditional large-scale multiprocessors and wide-area networks.
• On-chip router (or switch)-based networks with packetized communication:
  – Cores access the network by means of proper interfaces;
  – packets are forwarded to the destination through a multihop routing path.
Organization Styles for SoC
• Homogeneous:
  – Tile-based on-chip multi-processor
  – Identical cores/processing elements
  – Regular mesh
• Heterogeneous:
  – e.g., MPEG-4 MPSoC
Classification of Interconnection Networks
• Time division (shared-medium) networks
• Space division networks
  – Direct networks
  – Indirect networks
    • Single path or multi path
    • Blocking or non-blocking


Space division interconnection:
Direct Networks
• Each terminal node (e.g., a processor core or cache in a chip
multiprocessor) is associated with a router
• All routers act as both sources/sinks of traffic and as direct network
switches for traffic from other nodes.
• Most designs of on-chip networks have used direct networks since
co-locating routers with terminal nodes is often most suitable in
area-constrained environments on a chip.

Distributed, Direct



On-chip Networks
• On-chip network resources: channels and router nodes.
• Node: any component that connects to the on-chip
network: core, cache, memory controller, etc.
• Topology: the physical layout and connections between
nodes and channels
• Routing: determines the path of the packet through the
network
• Flow control: allocating (and deallocating) buffers and
channel bandwidth to waiting packets
• Router microarchitecture: input buffers, router state,
routing logic, allocators, and a crossbar (or switch).
Data transmission
• Flit:
  – Fixed-length message (packet) flow control unit.
  – Logical unit.
  – Includes a virtual-channel identifier (VCID) to record the assignment of packets to control state.
• Phit:
  – The number of bits that can be transferred in parallel in a single cycle.
  – Physical unit.
  – Typically, phit size = flit size.
  – The flow of phits across the channel is synchronized to prevent buffer overflow at the receiving end of the channel.
• The flows in wide-area networks are replaced by flows of flits: a packet.
• Each packet includes routing information (RI) and a sequence number (SN).
Data transmission
• A head flit:
  – is the first flit of a packet,
  – carries the packet’s routing information.
• As a packet traverses the network, the head flit allocates channel state for the packet and the tail flit deallocates it.
• Body and tail flits:
  – have no routing or sequencing information,
  – must follow the head flit along its route and remain in order.


Switching in interconnection
networks
• Performed at each switching element,
regardless of topology
• Establishes the connection of paths for packets
(How allocated?)
• Needed to increase utilization of shared
resources in the network
• Flow control: routing a packet through the network requires allocation of various resources: channels and buffers (if they exist)


Flow Control: Channel Allocation
(Figure: a packet traveling from the source end node through Routers 1-3 to the destination end node is allocated input and output channels as a sequence along the path.)


Bufferless Flow Control
• If a packet is dropped: NACKs are sent back, and the original sender has to retransmit the packet.


Bufferless Flow Control: Circuit
Switching
• A routing probe is first sent to reserve the channels
• Allocate input and output channels as a sequence
– Coupled allocation of the input and output channels
– All channels in the sequence have to be free at the time of
allocation
• The probe may be held at an intermediate router until the
channel is available (hence, not truly bufferless),
• Set-up: A “circuit” path is established a priori and torn
down after use
• Possible to pipeline the establishment of the circuit with
the transmission of multiple successive packets along
the circuit
Circuit Switching
• Routing, arbitration, switching performed once
for train of packets
– Routing bits not needed in each packet header
– Reduces latency and overhead
• Can be highly wasteful of scarce network
bandwidth
– Links and switches go underutilized
• during path establishment and tear-down
• if no train of packets follows circuit set-up



Circuit Switching
• Set-up: Realized by injecting the routing header (routing
probe) flit into the network.
• routing probe:
– contains the destination address and some additional control
information
– progresses toward the destination reserving physical links as it is
transmitted through intermediate routers.
– When it reaches the destination, a complete path has been set up and an acknowledgment is transmitted back to the source.
• The message contents may now be transmitted at the
full bandwidth of the hardware path.



Circuit Switching
(Figure: source and destination end nodes connected by routers whose buffers hold only routing probes/acks.)


Circuit Switching
(Figure: the request for circuit establishment travels from source to destination; routing and arbitration are performed during this step.)


Circuit Switching
(Figure: request for circuit establishment, then acknowledgment and circuit establishment; as the token travels back to the source, the connections are established.)


Circuit Switching
(Figure: packet transport over the established circuit: no routing, no arbitration, no packet buffering.)


Circuit Switching
(Figure: the complete sequence: request, acknowledgment, packet transport.)
• High contention, low utilization → low throughput.


Buffered Flow Control
• Storing a flit/a packet in a buffer
• Decouples allocation of the input channel
to a flit from the allocation of the output
channel to a flit.
• A flit can be transferred over the input
channel on cycle i and stored in a buffer
for a number of cycles j until the output
channel is successfully allocated
Buffered Flow Control
• Buffers and channel bandwidth can be
allocated to either flits or packets.
• Packet-level Flow Control
– store-and-forward
– cut-through
• Flit-level Flow Control
– wormhole



Buffered Flow Control and
Virtual Channels
• A virtual channel is a buffer organization per
packet
• holds the state needed to coordinate the
handling of the flits of the packet over a channel
• State:
– Output channel
– pointers to the flits of the packet that are buffered on
the current node
– The number of flit buffers available on the next node
(for control)
Virtual Channels
• Each virtual channel is associated with a buffer.
• Packets are:
  – received by input ports,
  – stored into virtual-channel buffers according to their virtual-channel ID (VCID).
Liu, Feiyang, Huaxi Gu, and Yintang Yang. "Performance study of virtual-channel router for Network-on-Chip." 2010 International Conference on Computer Design and Applications. Vol. 5. IEEE, 2010.


Virtual Channels
• Routing Control (RC) unit:
  – gets routing information (source node address, destination node address, etc.) from the buffered packets,
  – calculates an appropriate output port for each packet.
Liu, Feiyang, Huaxi Gu, and Yintang Yang. "Performance study of virtual-channel router for Network-on-Chip." 2010 International Conference on Computer Design and Applications. Vol. 5. IEEE, 2010.
Virtual Channels
• Virtual-channel allocator:
  – allocates a free output virtual channel to the packet.
Liu, Feiyang, Huaxi Gu, and Yintang Yang. "Performance study of virtual-channel router for Network-on-Chip." 2010 International Conference on Computer Design and Applications. Vol. 5. IEEE, 2010.


Virtual Channels
• Switch allocator:
  – controls the connection status of the crossbar,
  – allocates a time slot to each data flit to transmit through the crossbar.
• Finally, all packets are ejected from the output port to the next router or the local IP core.
Liu, Feiyang, Huaxi Gu, and Yintang Yang. "Performance study of virtual-channel router for Network-on-Chip." 2010 International Conference on Computer Design and Applications. Vol. 5. IEEE, 2010.
Packet Switching
• Store-and-forward switching
– Bits of a packet are forwarded only after entire
packet is first stored
– Packet transmission delay is multiplicative
with hop count, D



Store and Forward
(Figure: buffers for data packets; a packet is stored at a switch between the source and destination end nodes.)
• Packets are completely stored before any portion is forwarded.


Store and Forward
• Requirement: buffers must be sized to hold an entire packet (MTU).
(Figure: store at one switch, then forward to the next.)
• Packets are completely stored before any portion is forwarded.


Packet Switching
• Cut-through switching
– Bits of a packet are forwarded once the header
portion is received
– router can start forwarding the header and following
data bytes as soon as routing decisions have been
made and the output buffer is free.
– The message does not even have to be buffered at
the output and can cut through to the input of the next
router before the complete packet has been received
at the current router.



Packet Switching
• Cut-through switching
– Virtual cut-through: flow control is applied at the
packet level
– Wormhole: flow control is applied at the flow unit (flit)
level
– Buffered wormhole: flit-level flow control with
centralized buffering



Cut Through
(Figure: routing at the current switch while the packet streams through.)
• Portions of a packet may be forwarded ("cut through") to the next switch before the entire packet is stored at the current switch.


Virtual cut-through vs. wormhole
• Virtual cut-through: buffers for data packets; requirement: buffers must be sized to hold an entire packet (MTU).
• Wormhole: buffers for flits; packets can be larger than the buffers.
(Figure: both schemes between source and destination end nodes.)
Virtual cut-through vs. wormhole: busy link
• Virtual cut-through: when the packet encounters a busy link, it is completely stored at the switch (buffers sized to hold an entire packet, MTU).
• Wormhole: when the packet encounters a busy link, it is stored along the path, spread across the flit buffers of several switches (packets can be larger than the buffers).
• Maximizing the sharing of link bandwidth increases utilization.
(Figure: the blocked packet in each scheme between source and destination end nodes.)
Timing
• Computation of the base latency of an L-bit message in the absence of any traffic:
  – physical data channel width = W bits
  – phit size = flit size = W bits
  – the routing header (control information) = 1 flit
  – message size (control + data): L + W bits
  – The physical channel between two routers operates at B Hz; that is, the physical channel bandwidth is B·W bits per second.


Timing
• A router can make a routing decision in tr seconds.
• The channel wires are short enough to complete a transmission in one clock cycle; the bit transmission and propagation delay across the channel is tw = 1/B seconds.
• Once a path has been set up through the router, the intra-router delay or switching delay is denoted by ts.
• The router’s internal data paths are assumed to be matched to the channel width of W bits.
• Thus, in ts seconds a W-bit flit can be transferred from the input of the router to the output.
• The source and destination processors are assumed to be D links apart.
Duato, J. Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.


Baseline Timing: Circuit Switching
• The switching decision is made once (at circuit set-up).
(Figure: timing diagram for circuit switching.)
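The figure’s latency expression can be written out from the parameters defined above; a sketch following the derivation in Duato’s book, assuming the set-up probe and the acknowledgment each cross all D links:

```latex
t_{circuit} = t_{setup} + t_{data}, \qquad
t_{setup} = D\,[t_r + 2(t_s + t_w)], \qquad
t_{data} = \frac{1}{B} \left\lceil \frac{L}{W} \right\rceil
```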


Baseline Timing: Store and Forward
• Packet transmission delay is additive with hop count D: each hop adds a full packet transmission time.
(Figure: timing diagram for store-and-forward.)
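Correspondingly, a sketch of the store-and-forward base latency, assuming the whole (L + W)-bit packet is received at each of the D routers before being forwarded:

```latex
t_{SAF} = D \left( t_r + (t_s + t_w) \left\lceil \frac{L+W}{W} \right\rceil \right)
```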


Baseline Timing: Cut-Through
• tblocking = waiting time for a free output link.
• The packet payload sees the path as a single serial link.
(Figure: timing diagram for cut-through.)
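And a sketch of the cut-through base latency, assuming the header is pipelined across the D hops while the payload follows as over a single serial link:

```latex
t_{VCT} = D\,(t_r + t_s + t_w) + \max(t_s, t_w) \left\lceil \frac{L}{W} \right\rceil
```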


On-chip Networks: Topology
• A small number of nodes ➔ dedicated ad hoc wiring to interconnect
  – the N² problem
  – Crossbars scale poorly for a large number of cores
• Solution: direct network topologies


Topology Metrics
• Traffic Independent: Do not depend on
the source-destination pairs
– Degree, bisection bandwidth, diameter, path
diversity
• Traffic Dependent: Depend on the
source-destination pairs
– Hopcount, maximum channel load
• Also applicable to indirect networks!
Topology Metrics
• Degree: the number of links at each node.
• Cost: a higher degree requires more ports at routers, which increases implementation complexity and adds area/energy overhead at each router.


Topology Metrics
• Bisection bandwidth: the bandwidth across a cut that partitions the network into two equal parts.
  – ring: two links cross the bisection
  – torus: six links cross the bisection
  – mesh: three links cross the bisection


Topology Metrics
• Bisection bandwidth: the bandwidth across a cut that partitions the network into two equal parts.
• Useful in defining the worst-case performance of a particular network, since it limits the total data that can be moved from one side of the system to the other.
• Cost: the amount of global wiring needed to implement the network.
• Note: a less useful metric for on-chip networks than for off-chip networks ➔ global on-chip wiring is considered abundant relative to off-chip pin bandwidth.


Topology Metrics
• Diameter: the maximum distance between any two nodes in the topology.
• Distance: the number of links in the shortest route.
  – ring: diameter = 4
  – torus: diameter = 2
  – mesh: diameter = 4
• Latency: the diameter gives the maximum latency in the topology, in the absence of contention.


Topology Metrics
• Path diversity: the topology provides multiple shortest paths between source-destination pairs.
• Benefits: fault tolerance, load balancing.


Topology Metrics
• Hop count: the number of hops a message takes from source to destination, i.e., the number of links it traverses.
• Affects latency.
• Maximum hop count: the diameter.
• Average hop count: averaged over all possible source-destination pairs in the network.


Topology Metrics
• Maximum channel load:
  – Find the link in the topology that will be the most loaded under the given traffic pattern (worst-case or average-case patterns can be generated).
  – Maximum channel load: the load on this link when every input injects 1 cell/flit per unit time.
• Maximum injection bandwidth (throughput) = speed-up / maximum channel load.
• Example: maximum channel load = 2, speed-up = 1:
  – If we inject a flit every cycle at every node, two flits will wish to traverse this specific channel every cycle.
  – The maximum bandwidth of the network = half the link bandwidth, i.e., at most a flit can be injected every two cycles.
Example
• Uniform random: every node has equal probability of sending to every node.
• Half of the traffic from every node will cross the bottleneck channel (why half? under uniform random traffic, half of each node’s destinations lie on the other side of the cut) ➔ 8 × 1/2 = 4.
• The network saturates at 1/4 of the injection bandwidth.


Direct Networks: Popular
Topologies
• Most of the implemented networks have
an orthogonal topology.
• A network topology is orthogonal if and
only if
– nodes can be arranged in an orthogonal n-
dimensional space
– every link can be arranged in such a way that
it produces a displacement in a single
dimension.
Direct Networks: Popular
Topologies
• Strictly orthogonal topology: every node
has at least one link crossing each
dimension.
• Weakly orthogonal topology:
– Some nodes may not have any link in some
dimensions.
– Crossing a given dimension from a given
node may require moving in another
dimension first.
k-ary n-cubes
• n: the number of dimensions
• k: the number of nodes along each dimension
• N = k^n: the total number of nodes


Strictly Orthogonal Topologies
• The distance between two nodes = the sum of the dimension offsets.
• The displacement along a given link only modifies the offset in the corresponding dimension.


Strictly Orthogonal Topologies

• Routing can be easily implemented


– select a link that decrements the absolute value of the offset in
some dimension.
– The set of dimension offsets can be stored in the packet header
and updated (by adding or subtracting one unit) every time the
packet is successfully routed at some intermediate node.
Strictly Orthogonal Topologies: 2D mesh
• The most popular topology:
  – All links have the same length,
  – which eases physical design.
  – Area grows linearly with the number of nodes.
  – Must be designed so as to avoid traffic accumulating in the center of the mesh.
(Figure: 4-ary 2-cube mesh; diameter = 6, average hop count = 3.)


Routing and Switching
• Routing algorithm determines the path
selected by a packet to reach its
destination.
• Switching mechanism determines
– how and when the input channel is connected
to the output channel selected by the routing
algorithm.
– how network resources are allocated for
message transmission.
Routing
• Dimension-ordered routing (DOR)
  – Simple, most commonly used.
  – Deterministic: messages from node A to B will always traverse the same path.
  – Alternative paths are not used.
  – A message traverses the network dimension by dimension (strictly ordered), switching to the next dimension when it reaches the coordinate matching its destination (see the sketch below).
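On a 2D mesh, DOR reduces to XY routing; a minimal sketch (the coordinates and the dor_route name are illustrative):

```python
# Minimal sketch of dimension-ordered (XY) routing on a 2D mesh.
def dor_route(src, dst):
    """Return the sequence of hops: fully in X first, then in Y."""
    (x, y), (dx, dy) = src, dst
    hops = []
    while x != dx:                    # route in dimension X first
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:                    # then in dimension Y
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops

print(dor_route((0, 0), (2, 3)))  # X first: (1,0),(2,0); then Y: (2,1),(2,2),(2,3)
```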
Routing
• Making use of alternative paths: messages
traverse different routing paths from A to B (if
available)
– Oblivious: without regard to network congestion (can
be random)
• Can uniformly distribute the load over the paths
– Adaptive to network congestion state



Example of Oblivious Routing
• Valiant’s routing algorithm:
  – From s to d: choose a random intermediate node d'.
  – Route s → d', then d' → d.
• Problem: increases the hop count.
• Solution: constrain d' to lie in the minimum quadrant between s and d, so the route stays minimal (see the sketch below).
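A sketch of both variants on a k×k mesh, assuming the dor_route() helper from the DOR sketch above is in scope; the names and mesh size are illustrative:

```python
# Minimal sketch of Valiant's algorithm: route s -> d' -> d, each leg via DOR.
import random

def valiant_route(src, dst, k=4, constrained=False):
    if constrained:  # d' restricted to the rectangle spanned by src and dst
        dx = random.randint(min(src[0], dst[0]), max(src[0], dst[0]))
        dy = random.randint(min(src[1], dst[1]), max(src[1], dst[1]))
    else:            # unconstrained: d' anywhere in the k x k mesh
        dx, dy = random.randint(0, k - 1), random.randint(0, k - 1)
    mid = (dx, dy)
    return dor_route(src, mid) + dor_route(mid, dst)

print(valiant_route((0, 0), (2, 3), constrained=True))  # a minimal-length route
```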
Deadlock
• Deadlock freedom can be
ensured in the routing
algorithm, by preventing
cycles among the routes
generated by the algorithm



Resources
• Dally, William James, and Brian Patrick Towles. Principles and Practices of Interconnection Networks. Elsevier, 2004.
• Duato, J. Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.
• Hennessy, John L., and David A. Patterson. Computer Architecture: A Quantitative Approach. 6th Edition, Appendix F (by Timothy Mark Pinkston and José Duato).
• Peh, Li-Shiuan, and Natalie Enright Jerger. On-Chip Networks. 2nd Edition, Morgan & Claypool Publishers, 2017.


EE 5410 High Speed and Embedded Computer Networking
Interconnection Networks
