Ingo Sander
ingo@imit.kth.se
Network-on-Chips
- Today, buses are the dominant interconnect technology for systems-on-chip.
- However, buses have severe limitations that become evident when the number of components in a system is large:
  - The bus is a communication bottleneck; its bandwidth is limited.
  - Buses are scalable only to a certain extent.
- Networks-on-Chip are intended to overcome the limitations of buses, since they provide a much larger amount of communication resources and are scalable.
2B1448 SoC Architectures 2
A Network-on-Chip
[Figure: a grid of switches (S) and terminal nodes (T) connected by channels]
December 14, 2005
[Figure legend: S = switch, T = terminal node]
[Figure: bus-based system with several components, e.g. processor, memory and hardware components]
Network-on-Chip
- Information in the form of packets is routed via channels and switches from one terminal node to another.
[Figure: grid of switches (S) and terminal nodes (T)]
Network Interface
- Different terminals with different interfaces must be connected to the network.
- The network uses a specific protocol, and all traffic on the network has to comply with the format of this protocol.
[Figure: a terminal attached to a network switch through a network interface]
Network Interface
- In order to allow different resources to connect to the network, the network interface can be divided into
  - a resource-independent part (Network Interface)
  - a resource-dependent part (Resource Network Interface)

Network abstractions
- The International Organization for Standardization (ISO) developed the Open Systems Interconnection (OSI) model to describe networks:
  - 7-layer model
  - Provides a standard way to classify network components and operations
- Networks-on-Chip use a similar protocol stack, corresponding to the 4 lowest layers of the OSI model.
OSI model
Layer          Function
application    end-use interface
presentation   data format
session        application dialog control
transport      connections
network        end-to-end service
data link      reliable data transport
physical       mechanical, electrical

OSI layers
- Physical: connectors, bit formats, electrical properties
- Data link: error detection and control across a single link (single hop)
- Network: end-to-end multi-hop data communication
- Transport: connection-oriented services over multiple links, e.g. ordering of packets, error-free connection
- Session: services for end-user applications: data grouping, checkpointing, etc.
- Presentation: data formats, transformation services
- Application: interface between network and end-user programs
- A message is a contiguous group of bits that is delivered from a source terminal to a destination terminal. A message consists of packets.
- A packet is the basic unit for routing and sequencing. Packets may be divided into flits.
- A flit (flow control digit) is the basic unit of bandwidth and storage allocation. Flits do not carry any routing or sequence information and have to follow the route of the whole packet.
- A phit (physical transfer digit) is the unit that is transferred across a channel in a single clock cycle.
[Figure: units of communication. A packet consists of a head flit, body flits and a tail flit, and carries the fields RI and SN. A flit carries the fields Type and VC and is transferred as phits.]
Messages, Packets, Flits and Phits are handled in different layers of the network protocol
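The relation between messages, packets, flits and phits can be sketched in a few lines of Python. This is only an illustration under assumed sizes: the function names, the `Flit` class and the byte counts are hypothetical and do not come from the slides, and header fields are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str       # "head", "body" or "tail" (a one-flit packet has only a head)
    payload: bytes


def split_message(message: bytes, packet_bytes: int, flit_bytes: int) -> List[List[Flit]]:
    """Split a message into packets, and each packet into flits."""
    packets = []
    for p in range(0, len(message), packet_bytes):
        chunk = message[p:p + packet_bytes]
        pieces = [chunk[f:f + flit_bytes] for f in range(0, len(chunk), flit_bytes)]
        flits = []
        for i, piece in enumerate(pieces):
            if i == 0:
                kind = "head"   # only the head flit would carry routing information
            elif i == len(pieces) - 1:
                kind = "tail"
            else:
                kind = "body"
            flits.append(Flit(kind, piece))
        packets.append(flits)
    return packets


def phits_per_flit(flit_bytes: int, channel_width_bits: int) -> int:
    """Clock cycles (phits) needed to move one flit across a channel."""
    flit_bits = flit_bytes * 8
    return (flit_bits + channel_width_bits - 1) // channel_width_bits  # ceiling division
```

For example, a 64-byte message with assumed 32-byte packets and 8-byte flits yields two packets of four flits each, and every flit needs four phits on a 16-bit channel.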
Network-on-Chip Topologies
Ingo Sander ingo@imit.kth.se
Dally: Ch 3, (4), 5
- Topology (static arrangement of channels and nodes)
- Routing techniques (selection of a path through the network)
- Flow control (how network resources are allocated as packets traverse the network)
- Router architecture (buffers and switches)
- Traffic pattern
Network Topology
- The network topology refers to the static arrangement of channels and nodes in the network.
- A good topology fulfills the requirements of the traffic at reasonable cost.
- A network topology can be compared to a network of roads.

Topology Examples
[Figures: 4-node ring; 4x4 torus with nodes 00 to 33; a multistage network with terminals 0 to 7 and switch nodes 00 to 23]
Nomenclature
- The topology of an interconnection network is specified by a set of nodes $N^*$ connected by a set of channels $C$.
- Messages originate and terminate in a set of terminal nodes $N$, where $N \subseteq N^*$.
- Here: N = N* = 16 nodes and C = 16 × 4 = 64 channels.
[Figure: 4x4 network with nodes 00 to 33, shown in two equivalent drawings]
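The node and channel counts above can be reproduced with a small sketch. It assumes that the 16-node example is a 4x4 torus in which every link is modelled as two unidirectional channels; the function name `torus_channels` is made up for this illustration.

```python
from itertools import product


def torus_channels(k, n):
    """Return the nodes and unidirectional channels of a k-ary n-cube (torus)."""
    nodes = list(product(range(k), repeat=n))
    channels = set()
    for x in nodes:
        for dim in range(n):
            for step in (1, -1):            # one neighbor in each direction
                y = list(x)
                y[dim] = (y[dim] + step) % k  # wrap-around link of the torus
                channels.add((x, tuple(y)))   # channel c = (source, destination)
    return nodes, channels


nodes, channels = torus_channels(k=4, n=2)
print(len(nodes), len(channels))
```

Each of the 16 nodes is the source of 4 channels, which gives the 64 channels stated above.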
Nomenclature
- Each channel $c = (x, y) \in C$ connects a source node $x$ to a destination node $y$, where $x, y \in N^*$.
- A channel is characterized by its width $w_c$ (or $w_{xy}$), which is the number of parallel signals it contains.
- The source node of a channel is denoted $s_c$ and the destination node $d_c$.
- Its frequency $f_c$ (or $f_{xy}$) is the rate at which bits are transported on each signal.
- Its latency $t_c$ (or $t_{xy}$) is the time required for a bit to travel from $x$ to $y$.
- Usually the latency is directly related to the physical length of the channel, $l_c = v t_c$, by a propagation velocity $v$.
- The bandwidth of the channel is $b_c = w_c f_c$.
Nomenclature
- The set of channels attached to a node $x$ is $C_x = C_{Ix} \cup C_{Ox}$, where $C_{Ix} = \{c \in C \mid d_c = x\}$ is the set of input channels and $C_{Ox} = \{c \in C \mid s_c = x\}$ is the set of output channels.
- The degree of $x$ is $\delta_x = |C_x|$, which is the sum of the in degree $|C_{Ix}|$ and the out degree $|C_{Ox}|$.

Direct Network
[Figure: direct network]

Indirect Network
[Figure: indirect network]
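As a hedged illustration of the degree definition, the sketch below counts the input and output channels of a node. The channel list models a 4-node ring with two unidirectional channels per link; both the helper name `node_degree` and this ring model are assumptions made for the example.

```python
def node_degree(channels, x):
    """Return (in degree, out degree, degree) of node x.

    channels is a collection of (source, destination) pairs.
    """
    c_in = {c for c in channels if c[1] == x}    # channels with d_c = x
    c_out = {c for c in channels if c[0] == x}   # channels with s_c = x
    return len(c_in), len(c_out), len(c_in | c_out)


# 4-node ring, each link modelled as two unidirectional channels
ring = [(i, (i + 1) % 4) for i in range(4)] + [((i + 1) % 4, i) for i in range(4)]
print(node_degree(ring, 0))
```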
Bisection of a network
- A bisection of a network is a cut that partitions the entire network nearly in half.
- The channel bisection of a network is the minimum channel count over all bisections of the network:
  $B_C = \min_{\text{bisections}} |C(N_1, N_2)|$
- The bisection bandwidth of a network is the minimum bandwidth over all bisections of the network:
  $B_B = \min_{\text{bisections}} B(N_1, N_2)$
[Figure: bisection of a 4-node ring]
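For very small networks the channel bisection can be found by brute force. The sketch below tries every split of the nodes into two equal halves and counts the channels that cross the cut. It assumes an even number of nodes and models each ring link as two unidirectional channels; the function name is hypothetical and the approach does not scale to large networks.

```python
from itertools import combinations


def channel_bisection(nodes, channels):
    """Minimum number of channels crossing any split into two equal halves."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for part in combinations(nodes, half):
        n1 = set(part)
        # a channel crosses the cut if exactly one endpoint lies in n1
        cut = sum(1 for (x, y) in channels if (x in n1) != (y in n1))
        if best is None or cut < best:
            best = cut
    return best


ring = [(i, (i + 1) % 4) for i in range(4)] + [((i + 1) % 4, i) for i in range(4)]
print(channel_bisection(range(4), ring))
```

For the 4-node ring the best cut separates two neighboring nodes from the other two and crosses two links, that is, four unidirectional channels.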
Paths
- A path is an ordered set of channels $P = \{c_1, c_2, \ldots, c_n\}$, where $d_{c_i} = s_{c_{i+1}}$ for $i = 1 \ldots (n - 1)$.
- The length or hop count of a path is $|P|$.
- A minimal path from node $x$ to node $y$ is a path with the smallest hop count.
[Figure: non-minimal path with $|P| = 5$ in a 4x4 network]
Paths
- The diameter $H_{max}$ is the largest minimal hop count over all pairs of terminal nodes.
- The average minimum hop count $H_{min}$ is defined as the average hop count over all sources and destinations:
  $H_{min} = \frac{1}{N^2} \sum_{x, y \in N} H(x, y)$
- Here: $H_{max} = 4$ and $H_{min} = 2$.
[Figure: minimal hop counts from node 00 to every other node of the 4x4 network; the largest minimal hop count is 4]
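The two values can be checked with a breadth-first search. The sketch assumes that the example network is a 4x4 torus and that the average includes the pairs with x = y, as the factor 1/N² in the formula suggests; all function names are illustrative.

```python
from collections import deque
from itertools import product


def torus_neighbors(x, k):
    """Yield the neighbors of node x in a k-ary n-cube (torus)."""
    for dim in range(len(x)):
        for step in (1, -1):
            y = list(x)
            y[dim] = (y[dim] + step) % k
            yield tuple(y)


def hop_counts(src, k):
    """Minimal hop count from src to every node (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in torus_neighbors(x, k):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def diameter_and_average(k, n):
    """Return (H_max, H_min) for a k-ary n-cube."""
    nodes = list(product(range(k), repeat=n))
    total, h_max = 0, 0
    for src in nodes:
        dist = hop_counts(src, k)
        total += sum(dist.values())
        h_max = max(h_max, max(dist.values()))
    return h_max, total / len(nodes) ** 2


print(diameter_and_average(k=4, n=2))
```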
Paths
- A specific implementation may choose to incorporate some non-minimal paths.
- The actual average hop count $H_{avg}$ is then defined over the paths used by the network: $H_{avg} \ge H_{min}$.
[Figure: non-minimal path with $|P| = 5$ in a 4x4 network]
Traffic Patterns
- The traffic pattern is a very important factor for the performance of a network.
- In uniform random traffic, each source is equally likely to send to each destination.
- Uniform random traffic is the most commonly used traffic pattern. However, it implies a balancing of the load, so it often does not stress the network.

Throughput
- The throughput of a network is the data rate in bits per second that the network accepts per input port.
- Besides flow control and routing, the topology of a network has a significant impact on the throughput.
- The ideal throughput is defined as the throughput assuming perfect routing and flow control.
Throughput
- Maximum throughput occurs when some channel of the network becomes saturated.
- The channel load $\gamma_c$ of a channel is
  - the ratio of the bandwidth demanded from the channel to the bandwidth of the input ports, or in other words
  - the amount of traffic that must cross the channel if each input injects one unit of traffic according to the given traffic pattern.
- The channel that carries the largest fraction of the traffic determines the maximum channel load $\gamma_{max}$.

Throughput
- The ideal throughput $\Theta_{ideal}$ is the input bandwidth that saturates the bottleneck channel:
  $\Theta_{ideal} = b / \gamma_{max}$
- In general it is difficult to determine the maximum channel load $\gamma_{max}$, but for uniform traffic the task is much simpler.
- Assuming uniform traffic, 50% of the packets cross the bisection channels.
- The best throughput is achieved if these packets are evenly distributed over the bisection channels.
- The load on each of these channels is then $\gamma = N / (2 B_C)$.
- Thus $\gamma_{max} \ge N / (2 B_C)$.
- The ideal throughput is therefore $\Theta_{ideal} = b / \gamma_{max} \le 2 b B_C / N$, which is an upper bound.
- A packet needs on average $H_{min}$ hops to be delivered.
- There are $C$ channels in the network.
- $N$ nodes are sending packets.
- With equal load on all channels, we get a lower bound for the maximum channel load:
  $\gamma_{c,LB} = \gamma_{max,LB} = \frac{N H_{min}}{C}$
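Both bounds on the maximum channel load, and the resulting bound on the ideal throughput, are simple enough to evaluate directly. The sketch below applies them to a 4x4 torus, assuming N = 16, C = 64, $H_{min}$ = 2 and a channel bisection $B_C$ = 16 (taken from the torus formula $B_C = 4N/k$). The function names and the normalized bandwidth b = 1 are assumptions for illustration.

```python
def bisection_load_bound(n_nodes, channel_bisection):
    """Lower bound on the maximum channel load from the bisection: N / (2 B_C)."""
    return n_nodes / (2 * channel_bisection)


def hop_count_load_bound(n_nodes, h_min, n_channels):
    """Lower bound on the maximum channel load from the hop count: N H_min / C."""
    return n_nodes * h_min / n_channels


def ideal_throughput_bound(b, gamma_max):
    """Upper bound on the ideal throughput: b / gamma_max."""
    return b / gamma_max


gamma_bisection = bisection_load_bound(16, 16)
gamma_hops = hop_count_load_bound(16, 2, 64)
gamma = max(gamma_bisection, gamma_hops)
print(gamma_bisection, gamma_hops, ideal_throughput_bound(1.0, gamma))
```

In this example both bounds give a load of 0.5, so under uniform traffic each node could inject at up to twice the channel bandwidth.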
Latency
- The latency of the network is the time required for a message to traverse the network, from the time the head arrives at the input port to the time the tail of the message departs the output port.
- Latency depends not only on topology, but also on routing, flow control and the design of the router.
- Here the focus lies on topology.

Latency
- Head latency $T_h$: time required for the head of the message to traverse the network
- Serialization latency $T_s = L / b$: time required for the tail to catch up (the time for a message of length $L$ to cross a channel with bandwidth $b$)
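Head Latency
- The head latency consists of the router delay $T_r$ (time spent in the routers) and the time of flight $T_w$ (time spent on the wires):
  - $T_r = H_{min} t_r$
  - $T_w = D_{min} / v$ (average distance $D_{min}$, propagation velocity $v$)

Latency
- Without congestion, the latency is $T_0 = T_r + T_w + T_s = H_{min} t_r + D_{min} / v + L / b$.
- Clearly $H_{min}$, $D_{min}$ and $b$ are to a large extent determined by the topology.
- If there is congestion in the network, there is a fourth term $T_C$.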
Latency
[Figure: timing diagram of a message travelling from node x via node y to switch z. The head spends $t_r$ in each router and $t_{xy}$ on each channel; the tail follows after the serialization latency $L/b$.]

- The network has N = 64 nodes
- $H_{min}$ = 4
- Channel width $w_c$ = 16
- Channel frequency $f_c$ = 1 GHz
- Channel latency $t_c$ = 5 ns
- Router delay $t_r$ = 8 ns
- Packet length $L$ = 64 bytes
- The network has N = 64 nodes
- $H_{min} = 2 \times 8 / 3 = 5.33$
- Channel width $w_c$ = 32
- Channel frequency $f_c$ = 1 GHz
- Channel latency $t_c$ = 1 ns
- Router delay $t_r$ = 1 ns
- Packet length $L$ = 16 bytes

- $T_r = H_{min} t_r = 5.33 \times 1$ ns = 5.33 ns
- $T_w = H_{min} t_c$ = 5.33 ns
- $T_s = L / b = L / (f_c w_c) = 16 \times 8 / (1\,\text{GHz} \times 32)$ = 128 / 32 ns = 4 ns
- $T_0$ = 5.33 ns + 5.33 ns + 4 ns = 14.66 ns
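The calculation above can be written as a small function. It follows the worked example in taking the time of flight as $T_w = H_{min} t_c$ and ignores congestion; the function name and the choice of nanoseconds and GHz as units are assumptions for this sketch.

```python
def zero_load_latency_ns(h_min, t_r_ns, t_c_ns, length_bytes, width_bits, freq_ghz):
    """Return (T_r, T_w, T_s, T_0) in nanoseconds, without congestion."""
    t_router = h_min * t_r_ns                               # T_r = H_min * t_r
    t_wire = h_min * t_c_ns                                 # T_w = H_min * t_c
    t_serial = length_bytes * 8 / (width_bits * freq_ghz)   # T_s = L / (w_c * f_c)
    return t_router, t_wire, t_serial, t_router + t_wire + t_serial


# Second parameter set: H_min = 2 * 8 / 3, w_c = 32, t_c = t_r = 1 ns, L = 16 bytes
print(zero_load_latency_ns(2 * 8 / 3, 1, 1, 16, 32, 1))

# First parameter set: H_min = 4, w_c = 16, t_c = 5 ns, t_r = 8 ns, L = 64 bytes
print(zero_load_latency_ns(4, 8, 5, 64, 16, 1))
```

Applied to the first parameter set (H_min = 4, t_r = 8 ns, t_c = 5 ns, L = 64 bytes, w_c = 16), the same formula gives 32 ns + 20 ns + 32 ns = 84 ns; this figure is computed here and is not stated on the slides.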
Path Diversity
- A network with multiple minimal paths between most pairs of nodes is more robust than a network that has only a single route between each pair of nodes.

Path Diversity
Random Traffic
- Each node is equally likely to send a message to any other node.
- 50% of the packets pass the bisection.
- $\gamma_{max} = 1$
[Figure: multistage network with terminals 0 to 7 and switch nodes 00 to 23]
Traffic Patterns
- The performance of a network depends strongly on the traffic pattern.
- The table below shows a number of different traffic patterns that can be used to analyze the performance of the network.

Path Diversity
Bit Rotation Traffic
- The node with address {b2, b1, b0} sends to {b1, b0, b2}.
- Thus we get the permutation {0, 2, 4, 6, 1, 3, 5, 7}.
- Thus packets from nodes {0, 1, 4, 5} all have to pass switch node 10.
- $\gamma_{max} = 2$ (since, for instance, the channel (00, 10) is used by two connections)
[Figure: multistage network with terminals 0 to 7 and switch nodes 00 to 22]
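The bit rotation permutation can be generated with a few bit operations. The helper name `bit_rotate_left` is made up for this sketch; it rotates a 3-bit address one position to the left, so {b2, b1, b0} becomes {b1, b0, b2}.

```python
def bit_rotate_left(addr, bits):
    """Rotate the address left by one bit: {b2, b1, b0} -> {b1, b0, b2}."""
    msb = (addr >> (bits - 1)) & 1            # most significant bit b2
    mask = (1 << bits) - 1                    # keep only the lowest 'bits' bits
    return ((addr << 1) & mask) | msb         # shift left, b2 wraps around to bit 0


destinations = [bit_rotate_left(src, 3) for src in range(8)]
print(destinations)  # [0, 2, 4, 6, 1, 3, 5, 7]
```

The printed list gives the destination of each source node 0 to 7 and matches the permutation above.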
Path Diversity
- In the torus there are several minimal paths to choose from if the source and destination are not adjacent.
- A non-minimal route can also be taken; this is not possible in a butterfly network.
[Figure: 4x4 torus]

- Torus and mesh networks (k-ary n-cubes) pack $N = k^n$ nodes into a regular n-dimensional grid with k nodes in each dimension and channels between nearest neighbors.
- Advantages
  - Regular structure allows efficient packaging
  - Latency is low for local communication
  - Good path diversity
- Disadvantage
  - Comparatively large hop count
Torus
- Channel bisection: $B_{C,T} = 4N / k$
- Channel load under uniform traffic (50% of the traffic crosses the bisection): $\gamma_{T,U} = k / 8$
- Channel load under worst-case traffic (100% of the traffic crosses the bisection): $\gamma_{T,W} = k / 4$
- Average minimum hop count (k even): $H_{min,T} = nk / 4$
[Figure: 4x4 torus]

Mesh
- Channel bisection: $B_{C,M} = 2N / k$
- Channel load under uniform traffic (50% of the traffic crosses the bisection): $\gamma_{M,U} = k / 4$
- Channel load under worst-case traffic (100% of the traffic crosses the bisection): $\gamma_{M,W} = k / 2$
- Average minimum hop count (k even): $H_{min,M} \approx nk / 3$
[Figure: 4x4 mesh]
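The torus and mesh figures of merit can be compared side by side with a short sketch. It simply evaluates the formulas from the two slides for a k-ary n-cube with k even; the function names and dictionary keys are chosen for this illustration only.

```python
def torus_metrics(k, n):
    """Slide formulas for a k-ary n-cube torus (k even)."""
    nodes = k ** n
    return {
        "channel_bisection": 4 * nodes // k,
        "load_uniform": k / 8,
        "load_worst": k / 4,
        "h_min": n * k / 4,
    }


def mesh_metrics(k, n):
    """Slide formulas for a k-ary n-cube mesh (k even)."""
    nodes = k ** n
    return {
        "channel_bisection": 2 * nodes // k,
        "load_uniform": k / 4,
        "load_worst": k / 2,
        "h_min": n * k / 3,   # value used on the slides
    }


for name, metrics in (("torus", torus_metrics(8, 2)), ("mesh", mesh_metrics(8, 2))):
    print(name, metrics)
```

For the same k and n the mesh has half the channel bisection of the torus, and therefore twice the channel load and a larger average hop count.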
- In order to implement a network on a chip, the abstract nodes of the network must be mapped to real positions in physical space.
- One goal is to have the same latency for all channels.
Summary
- The topology is an important factor for the performance of the network.
- Meshes and tori offer a large amount of bandwidth and path diversity.
- Performance depends on the traffic pattern.
Definitions on Non-Blocking
- A network is non-blocking if it can handle all circuit requests that are a permutation of the inputs and outputs. Otherwise it is blocking.
- A network is strictly non-blocking if any permutation can be set up incrementally, without the need to rearrange existing connections.
- A network is rearrangeably non-blocking if it can route arbitrary permutations, but incremental construction may require a rearrangement of existing connections.

- Unicast traffic means that each input is connected to at most one output.
- Multicast traffic means that an input can be connected to several outputs.
- Networks may be non-blocking for unicast only, or for both unicast and multicast.
Non-Interfering Networks
- While non-blocking is very important for circuit-switching applications, it is far less important in packet-switching networks.
- A packet-switching network should be non-interfering, which means:
  1. There must be adequate channel bandwidth to support all of the traffic sharing the channel.
  2. No single flow is denied service because of other flows for more than a short, predetermined period of time.

Crossbar Networks
- An n x m crossbar or crosspoint switch directly connects n inputs to m outputs with no intermediate stages.
- It is strictly non-blocking for both unicast and multicast.
- Problem: the cost of an n x n crossbar increases as $n^2$.
- It can therefore only be used for small networks.
[Figure: crossbar symbol]
Clos Networks
- A Clos network is a three-stage network in which each stage is composed of a number of crossbar switches.
- A symmetric Clos network is characterized by a triple (m, n, r) denoting
  - the number of middle-stage switches (m)
  - the number of input and output ports on each input or output switch (n)
  - the number of input and output switches (r)
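To get a feeling for the triple (m, n, r), the sketch below counts ports and crosspoints of a symmetric Clos network. It assumes the usual construction, which is not spelled out on the slide: the r input switches are n x m crossbars, the m middle switches are r x r crossbars and the r output switches are m x n crossbars. The function names are hypothetical.

```python
def clos_ports(m, n, r):
    """Number of network inputs (and outputs): r input switches with n ports each."""
    return r * n


def clos_crosspoints(m, n, r):
    """Crosspoints of a symmetric (m, n, r) Clos network under the assumed construction."""
    input_stage = r * (n * m)      # r crossbars of size n x m
    middle_stage = m * (r * r)     # m crossbars of size r x r
    output_stage = r * (m * n)     # r crossbars of size m x n
    return input_stage + middle_stage + output_stage


print(clos_ports(5, 3, 4), clos_crosspoints(5, 3, 4))
```

For the assumed example (m, n, r) = (5, 3, 4) this gives 12 ports and 200 crosspoints.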
Benes Network
- A Clos network built from 2 x 2 switches is also called a Benes network.
- These networks require a minimum number of crosspoints to connect $N = 2^i$ ports.
- It has $2i - 1$ stages with $2^{i-1}$ switches each.
- Thus the total number of crosspoints is $(2i - 1) \cdot 2^{i+1}$.
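The crosspoint count follows from the stage and switch counts, given that each 2 x 2 switch has four crosspoints. The sketch below evaluates it and compares it with an N x N crossbar; the function names are illustrative.

```python
def benes_crosspoints(i):
    """Crosspoints of a Benes network with N = 2**i ports."""
    stages = 2 * i - 1
    switches_per_stage = 2 ** (i - 1)
    crosspoints_per_switch = 4          # each switch is a 2 x 2 crossbar
    return stages * switches_per_stage * crosspoints_per_switch


def crossbar_crosspoints(n_ports):
    """Crosspoints of an n x n crossbar."""
    return n_ports * n_ports


for i in (3, 10):
    n_ports = 2 ** i
    print(n_ports, benes_crosspoints(i), crossbar_crosspoints(n_ports))
```

With 8 ports the Benes network needs 80 crosspoints against 64 for the crossbar, but with 1024 ports it needs 38912 against 1048576, so the saving only appears for larger port counts.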
Concentrators
- Concentrators combine the traffic of several terminal nodes into a single network channel.
- This is useful when
  - the traffic from a single terminal is too small to justify its own network port
  - the traffic from several bursty channels can be combined

Concentrator
- M terminals are combined into a single network channel, which can have a lower bandwidth than the sum of the terminal channels.
- The concentration factor is $k_C = M b_T / b_N$.
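A minimal sketch of the concentration factor, assuming that $b_T$ is the bandwidth of one terminal channel and $b_N$ the bandwidth of the shared network channel. The function name and the numbers in the example are made up.

```python
def concentration_factor(m_terminals, b_terminal, b_network):
    """k_C = M * b_T / b_N: total terminal bandwidth relative to the network channel."""
    return m_terminals * b_terminal / b_network


# Four terminals with 1 Gbit/s each share one 2 Gbit/s network channel
print(concentration_factor(4, 1.0, 2.0))
```

A factor above 1 means that the terminals together could offer more traffic than the network channel can carry, which is acceptable only if they are rarely active at the same time.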
[Figure: Application of Concentrator]

Distributors
- A distributor is the opposite of a concentrator.
- A distributor can arbitrarily distribute packets to network channels.
Summary
- Concentrators can be used to combine several terminals into one network channel.
- Distributors can be used to connect a high-bandwidth terminal to a low-bandwidth network.