
Tutorial: Networks on Chip

The 1st ACM/IEEE International Symposium on Networks-on-Chip, Princeton, New Jersey, May 6, 2007

• Network Layer Communication Performance in Network-on-Chips, A. Jantsch (Royal Institute of Technology, Sweden), 10:00 - 11:45 am
• Power, Energy and Reliability Issues in NoC, R. Marculescu (Carnegie Mellon University, USA), 1:15 - 3:00 pm
• Tooling, OS Services and Middleware, L. Benini (University of Bologna, Italy), 3:15 - 5:00 pm


Network Layer Communication Performance in Network-on-Chips

• Introduction
• Communication Performance
• Organizational Structure
• Interconnection Topologies
• Trade-offs in Network Topology
• Routing
• Quality of Service

A. Jantsch, KTH


Introduction
Interconnection Network

• Topology: how switches and nodes are connected
• Routing algorithm: determines the route from source to destination
• Switching strategy: how a message traverses the route
• Flow control: schedules the traversal of the message over time

[Figure: two nodes, each with a processor (P), memory (Mem), communication assist, and network interface, attached to the interconnection network.]

Basic Definitions

Message is the basic communication entity.
Flit is the basic flow control unit. A message consists of one or many flits.
Phit is the basic unit of the physical layer.
Hop is the basic communication action from node to switch or from switch to switch.
Direct network is a network where each switch connects to a node.
Indirect network is a network with switches not connected to any node.
Routing distance between two nodes is the number of hops on a route.
Average distance is the average of the routing distance over all pairs of nodes.
Diameter is the length of the maximum shortest path between any two nodes, measured in hops.

Basic Switching Techniques

Circuit Switching: A real or virtual circuit establishes a direct connection between source and destination.
Packet Switching: Each packet of a message is routed independently. The destination address has to be provided with each packet.
Store and Forward Packet Switching: The entire packet is stored and then forwarded at each switch.
Cut Through Packet Switching: The flits of a packet are pipelined through the network. The packet is not completely buffered in each switch.
Virtual Cut Through Packet Switching: The entire packet is stored in a switch only when the header flit is blocked due to congestion.
Wormhole Switching: Cut-through switching in which all flits are blocked on the spot when the header flit is blocked.

Latency

[Figure: a message traversing a route of four hops through switches A, B, C, D.]

$\mathit{Time}(n) = \mathit{Admission} + \mathit{RoutingDelay} + \mathit{ContentionDelay}$

Admission is the time it takes to emit the message into the network.
RoutingDelay is the delay for the route.
ContentionDelay is the delay of a message due to contention.

Routing Delay

Store and Forward: $T_{sf}(n, h) = h\left(\frac{n}{b} + \Delta\right)$

Circuit Switching: $T_{cs}(n, h) = \frac{n}{b} + h\Delta$

Cut Through: $T_{ct}(n, h) = \frac{n}{b} + h\Delta$

Store and Forward with fragmented packets: $T_{sf}(n, h, n_p) = \frac{n - n_p}{b} + h\left(\frac{n_p}{b} + \Delta\right)$

n … message size in bits
n_p … size of message fragments in bits
h … number of hops
b … raw bandwidth of the channel
Δ … switching delay per hop
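The four delay models translate directly into code. A minimal sketch (the function names and the demo parameters are mine, not from the tutorial):

```python
# Routing-delay models; n in bits, b in bits per cycle, delta in cycles per hop.

def t_store_and_forward(n, h, b, delta):
    """The whole packet is received and then forwarded at each of h hops."""
    return h * (n / b + delta)

def t_circuit_switching(n, h, b, delta):
    """The circuit is set up once (h * delta), then the message streams at rate b."""
    return n / b + h * delta

def t_cut_through(n, h, b, delta):
    """Flits are pipelined; only the header pays the per-hop switching delay."""
    return n / b + h * delta

def t_sf_fragmented(n, h, b, delta, n_p):
    """Store-and-forward on fragments of n_p bits pipelines the fragments."""
    return (n - n_p) / b + h * (n_p / b + delta)

n, h, b, delta = 1024, 5, 32, 4              # illustrative values
print(t_store_and_forward(n, h, b, delta))   # 180.0
print(t_cut_through(n, h, b, delta))         # 52.0
print(t_sf_fragmented(n, h, b, delta, 128))  # 68.0
```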

Routing Delay: Store and Forward vs Cut Through

[Figure: four plots of average latency comparing store-and-forward and cut-through switching: latency vs packet size (up to ~1100 bytes) for k=10, d=2 with b=1 and b=32, and latency vs number of nodes (up to ~1200) for k=2 and for d=2, with m=8.]

Local and Global Bandwidth

Local bandwidth $= b \cdot \frac{n}{n + n_E + w\Delta}$

Total bandwidth $= Cb$ [bits/second] $= Cw$ [bits/cycle] $= C$ [phits/cycle]

Bisection bandwidth … minimum bandwidth to cut the net into two equal parts

For a k × k mesh with bidirectional channels:
Total bandwidth $= (4k^2 - 4k)b$
Bisection bandwidth $= 2kb$

n … message size
n_E … size of the message envelope
C … total number of channels
b … raw bandwidth of a link
w … link bandwidth per cycle
Δ … switching time for each switch in cycles
wΔ … bandwidth lost during switching
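The mesh figures above can be computed directly; a small sketch with b as the raw link bandwidth:

```python
# Total and bisection bandwidth of a k x k mesh with bidirectional channels.

def mesh_total_bandwidth(k, b):
    return (4 * k * k - 4 * k) * b   # 4k^2 - 4k channels in total

def mesh_bisection_bandwidth(k, b):
    return 2 * k * b                 # 2k channels cross the middle cut

print(mesh_total_bandwidth(4, 1))      # 48
print(mesh_bisection_bandwidth(4, 1))  # 8
```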

Link and Network Utilization

Total load on the network: $L = \frac{N h l}{M}$ [phits/cycle]

Load per channel: $\rho = \frac{L}{C} = \frac{N h l}{M C} \leq 1$

N … number of nodes
h … average routing distance
l = n/w … number of cycles a message occupies a channel
n … average message size
w … bitwidth per channel
M … each host issues a packet every M cycles
C … number of channels
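A sketch of the channel-load check for a k × k mesh; the average-distance estimate h ≈ 2k/3 for uniform traffic is my assumption, not a value from the tutorial:

```python
# Load per channel rho = N*h*l / (M*C); it must stay below 1,
# and in practice below the saturation region.

def channel_load(N, h, n, w, M, C):
    l = n / w                       # cycles a message occupies a channel
    return N * h * l / (M * C)

k, n, w, M = 8, 256, 32, 100        # illustrative values
N = k * k                           # nodes
C = 4 * k * k - 4 * k               # channels in a k x k mesh
h = 2 * k / 3                       # assumed average routing distance
print(round(channel_load(N, h, n, w, M, C), 3))  # 0.122
```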

Network Saturation

[Figure: latency vs offered bandwidth, and delivered vs offered bandwidth; both curves bend sharply at the network saturation point.]

Typical saturation points are between 40% and 70%. The saturation point depends on
• Traffic pattern
• Stochastic variations in traffic
• Routing algorithm

Organizational Structure

• Link
• Switch
• Network Interface

Link

Short link: At any time there is only one data word on the link.
Long link: Several data words can travel on the link simultaneously.
Synchronous clocking: Both source and destination operate on the same clock.
Asynchronous clocking: The clock is encoded in the transmitted data to allow the receiver to sample at the right time instance.
Narrow link: Data and control information is multiplexed on the same wires.
Wide link: Data and control information is transmitted in parallel and simultaneously.

Switch

[Figure: switch structure: receiver and input ports feed input buffers, a crossbar connects them to output buffers, transmitter and output ports; a control unit performs routing and scheduling.]

Switch Design Issues

Degree: number of inputs and outputs
Buffering
• Input buffers
• Output buffers
• Shared buffers
Routing
• Source routing
• Deterministic routing
• Adaptive routing
Output scheduling
Deadlock handling
Control flow

Network Interface

• Admission protocol
• Reception obligations
• Buffering
• Assembling and disassembling of messages
• Routing
• Higher level services and protocols

Interconnection Topologies

• Fully connected networks
• Linear arrays and rings
• Multidimensional meshes and tori
• Trees
• Butterflies

Fully Connected Networks

Bus:
switch degree = N
diameter = 1
distance = 1
network cost = O(N)
total bandwidth = b
bisection bandwidth = b

Crossbar:
switch degree = N
diameter = 1
distance = 1
network cost = $O(N^2)$
total bandwidth = Nb
bisection bandwidth = Nb

Linear Arrays and Rings

Linear array:
switch degree = 2
diameter = N - 1
distance ∼ $\frac{2}{3}N$
network cost = O(N)
total bandwidth = 2(N - 1)b
bisection bandwidth = 2b

Torus (ring):
switch degree = 2
diameter = N/2
distance ∼ $\frac{1}{3}N$
network cost = O(N)
total bandwidth = 2Nb
bisection bandwidth = 4b

[Figure: linear array, torus, and folded torus.]

Multidimensional Meshes and Tori

k-ary d-cubes are d-dimensional tori with unidirectional links and k nodes in each dimension:

number of nodes N = $k^d$
switch degree = d
diameter = d(k - 1)
distance ∼ $\frac{1}{2}d(k - 1)$
network cost = O(N)
total bandwidth = 2Nb
bisection bandwidth = $2k^{d-1}b$

[Figure: 2-d mesh, 3-d cube, and 2-d torus.]
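A small helper that tabulates these characteristics for a given k-ary d-cube (a sketch; b is the raw link bandwidth):

```python
# Characteristics of a k-ary d-cube, following the list above.

def kary_dcube(k, d, b=1):
    N = k ** d
    return {
        "nodes": N,
        "switch degree": d,
        "diameter": d * (k - 1),
        "avg distance": d * (k - 1) / 2,
        "total bandwidth": 2 * N * b,
        "bisection bandwidth": 2 * k ** (d - 1) * b,
    }

# The same 1024 nodes as a 32x32 torus and as a binary 10-cube:
print(kary_dcube(32, 2))   # diameter 62, bisection 64
print(kary_dcube(2, 10))   # diameter 10, bisection 1024
```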

Routing Distance in k-ary n-Cubes

[Figure: network scalability with respect to distance: average distance vs number of nodes (up to ~1200) for k = 2 and for d = 2, 3, 4.]

Projecting High Dimensional Cubes

[Figure: 2-d layouts of a 2-ary 2-cube, 2-ary 3-cube, 2-ary 4-cube, and 2-ary 5-cube.]

Binary Trees

[Figure: binary tree with switch levels 0-4.]

number of nodes N = $2^d$
number of switches = $2^d - 1$
switch degree = 3
diameter = 2d
distance ∼ d + 2
network cost = O(N)
total bandwidth = $2 \cdot 2(N - 1)b$
bisection bandwidth = 2b

k-ary Trees

[Figure: k-ary tree with switch levels 0-4.]

number of nodes N = $k^d$
number of switches ∼ $k^d$
switch degree = k + 1
diameter = 2d
distance ∼ d + 2
network cost = O(N)
total bandwidth = $2 \cdot 2(N - 1)b$
bisection bandwidth = kb

Binary Tree Projection

• Efficient and regular 2-d layout
• Longest wires in resource width: $l_W = 2^{\lfloor (d-1)/2 \rfloor - 1}$

d     2  3  4   5   6   7    8    9    10
N     4  8  16  32  64  128  256  512  1024
l_W   0  1  1   2   2   4    4    8    8

k-ary n-Cubes versus k-ary Trees

k-ary n-cubes:
number of nodes N = $k^d$
switch degree = d + 2
diameter = d(k - 1)
distance ∼ $\frac{1}{2}d(k - 1)$
network cost = O(N)
total bandwidth = 2Nb
bisection bandwidth = $2k^{d-1}b$

k-ary trees:
number of nodes N = $k^d$
number of switches ∼ $k^d$
switch degree = k + 1
diameter = 2d
distance ∼ d + 2
network cost = O(N)
total bandwidth = $2 \cdot 2(N - 1)b$
bisection bandwidth = kb

Butterflies

[Figure: 2×2 butterfly building block and a 16-node butterfly with levels 0-4.]

Butterfly characteristics:
number of nodes N = $2^d$
number of switches = $2^{d-1}d$
switch degree = 2
diameter = d + 1
distance = d + 1
network cost = O(Nd)
total bandwidth = $2^d d b$
bisection bandwidth = $\frac{N}{2}b$

k-ary n-Cubes versus k-ary Trees vs Butterflies

                                    k-ary n-cubes     binary tree   butterfly
cost                                O(N)              O(N)          O(N log N)
distance                            (d/2)·N^(1/d)     2 log N       log N
links per node                      d                 2             log N
bisection                           2N^((d-1)/d)      1             N/2
frequency limit of random traffic   2/N^(1/d)         1/N           1/2

Problems with Butterflies

• Cost of the network is O(N log N):
  - 2-d layout is more difficult than for binary trees
  - the number of long wires grows faster than for trees
• For each source-destination pair there is only one route.
• Each route blocks many other routes.

Benes Networks

• Many routes between each source-destination pair.
• Costly to compute non-blocking routes.
• High probability of a non-blocking route when randomly selecting an intermediate node [Leighton, 1992].

Fat Trees

[Figure: 16-node 2-ary fat tree; the nodes become fatter toward the root.]

k-ary n-dimensional fat tree characteristics:
number of nodes N = $k^d$
number of switches = $k^{d-1}d$
switch degree = 2k
diameter = 2d
distance ∼ d
network cost = O(Nd)
total bandwidth = $2k^d d b$
bisection bandwidth = $2k^{d-1}b$

k-ary n-Cubes versus k-ary d-dimensional Fat Trees

k-ary n-cubes:
number of nodes N = $k^d$
switch degree = d
diameter = d(k - 1)
distance ∼ $\frac{1}{2}d(k - 1)$
network cost = O(N)
total bandwidth = 2Nb
bisection bandwidth = $2k^{d-1}b$

k-ary n-dimensional fat trees:
number of nodes N = $k^d$
number of switches = $k^{d-1}d$
switch degree = 2k
diameter = 2d
distance ∼ d
network cost = O(Nd)
total bandwidth = $2k^d d b$
bisection bandwidth = $2k^{d-1}b$

Relation between Fat Tree and Hypercube

[Figures: a binary 2-dimensional fat tree and a binary 1-cube; a binary 3-dimensional fat tree and binary 2-cubes; a binary 4-dimensional fat tree and binary 3-cubes.]

Trade-offs in Topology Design for the k-ary n-Cube

• Unloaded latency
• Latency under load

Network Scaling for Unloaded Latency

$\mathit{Latency}(n) = \mathit{Admission} + \mathit{RoutingDelay} + \mathit{ContentionDelay}$

RoutingDelay: $T_{ct}(n, h) = \frac{n}{b} + h\Delta$

RoutingDistance: $h = \frac{1}{2}d(k-1) = \frac{1}{2}(k-1)\log_k N = \frac{1}{2}d(\sqrt[d]{N} - 1)$

[Figure: network scalability with respect to latency: average latency vs number of nodes (up to 10000) for m = 32 and m = 128, for k = 2 and d = 2 … 5.]
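A sketch of the unloaded-latency computation behind these plots (admission and contention set to zero; the demo values are illustrative):

```python
# Unloaded cut-through latency in a k-ary d-cube: T = n/b + h*delta,
# with h = d*(k-1)/2 the average routing distance.

def unloaded_latency(n, k, d, b=1, delta=1):
    h = d * (k - 1) / 2
    return n / b + h * delta

# 4096 nodes, message of 128 bits: 2-d (k=64) vs 4-d (k=8)
print(unloaded_latency(128, k=64, d=2))  # 191.0
print(unloaded_latency(128, k=8, d=4))   # 142.0
```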

Unloaded Latency for Small Networks and Local Traffic

[Figure: average latency vs number of nodes for m = 128, for k = 2 and d = 2 … 5: uniform traffic with up to 100 nodes, and local traffic (h = dk/5) with up to 100 and up to 10000 nodes.]

Unloaded Latency under a Free-Wire Cost Model

Free-wire cost model: wires are free and can be added without penalty.

[Figure: average latency vs dimension (2-7) for N = 64 … 16K, with m = 32 and m = 128, b = 32.]

Unloaded Latency under a Fixed-Wire Cost Model

Fixed-wire cost model: the number of wires per node is constant. With 128 wires per node: $w(d) = \frac{64}{d}$

d     2   3   4   5   6   7  8  9  10
w(d)  32  21  16  12  10  9  8  7  6

[Figure: average latency vs dimension (2-10) for N = 64 … 16K, with m = 32 and m = 128, b = 64/d.]

Unloaded Latency under a Fixed-Bisection Cost Model

Fixed-bisection cost model: the number of wires across the bisection is constant. With a bisection of 1024 wires: $w(d) = \frac{k}{2} = \frac{\sqrt[d]{N}}{2}$

Example, N = 1024:

d     1    2   3  4  5  6  7  8  9  10
w(d)  512  16  5  3  2  2  1  1  1  1

[Figure: average latency vs dimension (2-10) for N = 64 … 16K, with m = 32B and m = 128B, b = k/2.]

Unloaded Latency under a Logarithmic Wire Delay Cost Model

Fixed-bisection logarithmic wire delay cost model: the number of wires across the bisection is constant, and the delay on wires increases logarithmically with their length [Dally, 1990]:

Length of long wires: $l = k^{\frac{d}{2} - 1}$

$T_c \propto 1 + \log l = 1 + \left(\frac{d}{2} - 1\right)\log k$

[Figure: average latency vs dimension (2-10) for N = 64 … 16K, with m = 32B and m = 128B, b = k/2.]

Unloaded Latency under a Linear Wire Delay Cost Model

Fixed-bisection linear wire delay cost model: the number of wires across the bisection is constant, and the delay on wires increases linearly with their length [Dally, 1990]:

Length of long wires: $l = k^{\frac{d}{2} - 1}$

$T_c \propto l = k^{\frac{d}{2} - 1}$

[Figure: average latency vs dimension (2-10) for N = 64 … 16K, with m = 32B and m = 128B, b = k/2.]

Latency under Load

Assumptions [Agarwal, 1991]:
• k-ary n-cubes
• random traffic
• dimension-order cut-through routing
• unbounded internal buffers (to ignore flow control and deadlock issues)

Latency under Load - cont'd

$\mathit{Latency}(n) = \mathit{Admission} + \mathit{RoutingDelay} + \mathit{ContentionDelay}$

$T(m, k, d, w, \rho) = \mathit{RoutingDelay} + \mathit{ContentionDelay}$

$T(m, k, d, w, \rho) = \frac{m}{w} + d h_k \left(\Delta + W(m, k, d, w, \rho)\right)$

$W(m, k, d, w, \rho) = \frac{m}{w} \cdot \frac{\rho}{1 - \rho} \cdot \frac{h_k - 1}{h_k^2} \cdot \left(1 + \frac{1}{d}\right)$

$h_k = \frac{1}{2}(k - 1)$

m … message size
w … bitwidth of link
ρ … aggregate channel utilization
h_k … average distance in each dimension
Δ … switching time in cycles
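A sketch of this contention model in code (it follows the reconstruction of W above; latency diverges as ρ approaches 1):

```python
# Average latency of a k-ary d-cube under load (Agarwal-style model).

def latency_under_load(m, k, d, w, rho, delta=1):
    hk = (k - 1) / 2                       # average distance per dimension
    W = (m / w) * (rho / (1 - rho)) * ((hk - 1) / hk**2) * (1 + 1 / d)
    return m / w + d * hk * (delta + W)

for rho in (0.1, 0.5, 0.9):
    print(rho, round(latency_under_load(m=128, k=10, d=3, w=8, rho=rho), 1))
```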

Latency vs Channel Load

[Figure: average latency vs channel utilization (0 to 1) for w = 8, Δ = 1, and m ∈ {8, 32, 128} on (d=3, k=10) and (d=2, k=32) networks; latency diverges as utilization approaches 1.]

Routing

Deterministic routing: The route is determined solely by source and destination locations.
Arithmetic routing: The destination address of the incoming packet is compared with the address of the switch, and the packet is routed accordingly (relative or absolute addresses).
Source based routing: The source determines the route and builds a header with one directive for each switch. The switches strip off the top directive.
Table-driven routing: Switches have routing tables, which can be configured.
Adaptive routing: The route can be adapted by the switches to balance the load.

Minimal routing allows only shortest paths, while non-minimal routing also allows longer paths.
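As an illustration of deterministic routing, a minimal dimension-order (XY) sketch for a 2-d mesh (not from the slides; the coordinate and direction names are hypothetical):

```python
# Dimension-order routing: resolve the x offset completely, then the y offset.
# The route depends only on source and destination, hence it is deterministic.

def xy_route(src, dst):
    (sx, sy), (dx, dy) = src, dst
    hops = []
    hops += ["E" if dx > sx else "W"] * abs(dx - sx)   # x dimension first
    hops += ["N" if dy > sy else "S"] * abs(dy - sy)   # then y dimension
    return hops

print(xy_route((0, 0), (2, 3)))   # ['E', 'E', 'N', 'N', 'N']
```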

Quality of Service

• Best Effort (BE)
  - Optimization of the average case
  - Loose or non-existent worst-case bounds
  - Cost-effective use of resources
• Guaranteed Service (GS)
  - Maximum delay
  - Minimum bandwidth
  - Maximum jitter
  - Requires additional resources

Regulated Flows

A flow F is (σ, ρ)-regulated if

$F(b) - F(a) \leq \sigma + \rho(b - a)$

for all time intervals [a, b], 0 ≤ a ≤ b, where

F(t) … the cumulative amount of traffic between 0 and t ≥ 0
σ ≥ 0 … the burstiness constraint
ρ ≥ 0 … the maximum average rate

[Figure: a cumulative traffic curve F(t) staying below the bound σ + ρt.]
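A sketch that checks (σ, ρ)-regulation of a discrete cumulative trace directly against the definition:

```python
# F[t] is the cumulative traffic in [0, t]; check F[b] - F[a] <= sigma + rho*(b - a).

def is_regulated(F, sigma, rho):
    return all(
        F[b] - F[a] <= sigma + rho * (b - a)
        for a in range(len(F))
        for b in range(a, len(F))
    )

F = [0, 4, 5, 6, 7, 8]                  # a burst of 4, then 1 unit per cycle
print(is_regulated(F, sigma=3, rho=1))  # True
print(is_regulated(F, sigma=2, rho=1))  # False: the burst exceeds the bound
```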

Regulated Flows - Delay Element

[Figure: flow F1 passes through a delay element with maximum delay D and leaves as F2.]

$F_1 \sim (\sigma, \rho)$
$F_2 \sim (\sigma + \rho D, \rho)$

Regulated Flows - Work Conserving Multiplexer

[Figure: flows F1 and F2 arrive on links of bandwidth b and are multiplexed onto one output link of bandwidth b; the multiplexer has maximum backlog B and maximum delay D.]

$F_1 \sim (\sigma_1, \rho_1)$
$F_2 \sim (\sigma_2, \rho_2)$
link bandwidth $b > \rho_1 + \rho_2$

$F_3 \sim\ ?$, maximum delay $D =\ ?$, maximum backlog $B =\ ?$

Work Conserving Multiplexer - Phase 1

[Figure: backlog over time: it grows at rate b until t1, then at rate ρ1 until t_accu, then drains at rate b - ρ1 - ρ2 until t_drain.]

Assume: at t = 0 the queue is empty, and σ1 ≤ σ2.

Phase 1 (t1): F1 and F2 transmit at full speed.
Injection rate: 2b; drain rate: b.

$b t_1 = \sigma_1 + \rho_1 t_1 \quad\Rightarrow\quad t_1 = \frac{\sigma_1}{b - \rho_1}$

Work Conserving Multiplexer - Phase 2

Phase 2 (t2): F1 transmits at rate ρ1, F2 transmits at full speed.
Injection rate: b + ρ1; drain rate: b.

$b t_{accu} = \sigma_2 + \rho_2 t_{accu} \quad\Rightarrow\quad t_{accu} = \frac{\sigma_2}{b - \rho_2}$

Work Conserving Multiplexer - Phase 3

Phase 3 (t_drain): F1 transmits at rate ρ1, F2 transmits at rate ρ2.
Injection rate: ρ1 + ρ2; drain rate: b.

$t_{drain} = \frac{B_{max}}{b - \rho_1 - \rho_2}$

$B_{max} = b t_1 + \rho_1 t_2 = \sigma_1 + \frac{\rho_1 \sigma_2}{b - \rho_2}$

Work Conserving Multiplexer - Summary

$B_{max} = \sigma_1 + \frac{\rho_1 \sigma_2}{b - \rho_2}$

$D_{max} = t_{accu} + t_{drain} = \frac{\sigma_1 + \sigma_2}{b - \rho_1 - \rho_2}$

$F_3 \sim (\sigma_1 + \sigma_2,\ \rho_1 + \rho_2)$
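The multiplexer bounds compose into a small helper (a sketch; it assumes b > ρ1 + ρ2 and reorders the flows so that σ1 ≤ σ2, as in the derivation above):

```python
# Backlog bound, delay bound, and output flow of a work-conserving
# multiplexer of two (sigma, rho)-regulated flows on a link of bandwidth b.

def mux_bounds(sigma1, rho1, sigma2, rho2, b):
    assert b > rho1 + rho2, "the multiplexer must not be overloaded"
    if sigma1 > sigma2:                     # derivation assumes sigma1 <= sigma2
        sigma1, rho1, sigma2, rho2 = sigma2, rho2, sigma1, rho1
    b_max = sigma1 + rho1 * sigma2 / (b - rho2)
    d_max = (sigma1 + sigma2) / (b - rho1 - rho2)
    f3 = (sigma1 + sigma2, rho1 + rho2)     # output flow characterization
    return b_max, d_max, f3

print(mux_bounds(2, 0.2, 4, 0.3, b=1.0))    # (3.14..., 12.0, (6, 0.5))
```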

MPEG Encoding Case Study

[Figure: an MPEG encoding application with blocks E, S, and V mapped onto a processor, custom hardware blocks, and a memory, all communicating over the interconnect.]

MPEG Encoding Case Study - cont'd

[Figure: traffic source T sends flow F1 through the memory M toward S and V.]

$F_1 \sim (0, \rho_t)$

MPEG Encoding Case Study - cont'd

[Figure: flows F1 … F9 connect T, the channels C1 … C4, the memory M, and the blocks S and V.]

$F_1 \sim (0, \rho_t)$

The channels are characterized by (rate, maximum delay):

$C_1 : (\rho_t, D_1)$, $C_2 : (\rho_t, D_2)$, $C_3 : (\rho_t, D_3)$, $C_4 : (\rho_t, D_4)$

After channel C1:

$F_2 \sim (\rho_t D_1,\ \rho_t)$

MPEG Encoding Case Study - Memory

[Figure: flows F2 and F7 are multiplexed into the memory, modeled as M' with characteristics (2ρt, DM); internal flows FM1 … FM4.]

$M' : (2\rho_t, D_M)$

For a general multiplexer we have:

$D_{mux} = \frac{\sigma_1 + \sigma_2}{C_{out} - \rho_1 - \rho_2}, \qquad F_{muxout} \sim (\sigma_1 + \sigma_2,\ \rho_1 + \rho_2)$

$F_{M3} \sim (\rho_t(D_1 + D_{mux} + D_M),\ \rho_t)$

$F_{M4} \sim\ ?$

MPEG Encoding Case Study - cont'd

[Figure: the system extended with regulators RM1 and RM2 after the memory and RS after S.]

$R_{M1} \sim (S_{buffer}, \rho_t)$, $R_{M2} \sim (S_{buffer}, \rho_t)$, $R_S \sim (S_{buffer}, \rho_t)$, where $S_{buffer}$ is the size of the input buffer in S.

A (σ′, ρ)-regulator applied to a flow (σ, ρ) requires backlog and introduces delay:

$B_{(\sigma', \rho)\text{-regulator}} = \max(0,\ \sigma - \sigma')$
$D_{(\sigma', \rho)\text{-regulator}} = \max\left(0,\ \frac{\sigma - \sigma'}{\rho}\right)$

With $C_3 : (\rho_t, D_3)$:

$F_6 \sim (S_{buffer},\ \rho_t)$
$F_7 \sim (S_{buffer} + \rho_t D_3,\ \rho_t)$

MPEG Encoding Case Study - Memory

[Figure: F2 and F7 multiplexed into M'; internal flows FM1 … FM4.]

$M' : (2\rho_t, D_M)$

$D_{mux} = \frac{\sigma_1 + \sigma_2}{C_{out} - \rho_1 - \rho_2} = \frac{S_{buffer} + \rho_t(D_1 + D_3)}{C_{out} - 2\rho_t}$

$F_{M1} \sim (S_{buffer} + \rho_t(D_1 + D_3),\ 2\rho_t)$
$F_{M2} \sim (S_{buffer} + \rho_t(D_1 + D_3 + 2D_M),\ 2\rho_t)$
$F_{M3} \sim (\rho_t(D_1 + D_{mux} + D_M),\ \rho_t)$
$F_{M4} \sim (S_{buffer} + \rho_t(D_3 + D_{mux} + D_M),\ \rho_t)$

MPEG Encoding Case Study - cont'd

Backlog of the regulators:

$B_{RM1} = \max(0,\ \rho_t(D_1 + D_{mux} + D_M) - S_{buffer})$
$B_{RM2} = \max(0,\ \sigma_{F_{M4}} - S_{buffer}) = \rho_t(D_3 + D_{mux} + D_M)$

Delay of the regulators:

$D_{RM1} = \frac{B_{RM1}}{\rho_t}, \qquad D_{RM2} = \frac{B_{RM2}}{\rho_t}$

MPEG Encoding Case Study - cont'd

The flow from the memory to S:

$F_3 \sim (S_{buffer}, \rho_t)$
$C_2 : (\rho_t, D_2)$, hence $F_4 \sim (S_{buffer} + \rho_t D_2,\ \rho_t)$

A characterization of S and its output:

$S : (\rho_t,\ S_{buffer}/\rho_t)$
$F_5 \sim (2S_{buffer} + \rho_t D_2,\ \rho_t)$

The flows between memory and V:

$F_8 \sim (S_{buffer}, \rho_t)$
$C_4 : (\rho_t, D_4)$, hence $F_9 \sim (S_{buffer} + \rho_t D_4,\ \rho_t)$

MPEG Encoding Case Study - cont'd

End-to-end delay:

$D_{total} = D_1 + D_{mux} + D_M + D_{RM1} + D_2 + D_S + D_{RS} + D_3 + D_{mux} + D_M + D_{RM2} + D_4$

The flow at V:

$F_{T \to V} \sim (0 + \rho_t D_{total},\ \rho_t)$
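A sketch of how the element delays compose into the end-to-end bound; every numeric value below is illustrative, not from the case study:

```python
# Compose per-element delay bounds into D_total and the flow at V.

d = {"D1": 4, "D2": 4, "D3": 4, "D4": 4,    # channel delays
     "Dmux": 2, "DM": 3, "DS": 5,           # multiplexer, memory, S
     "DRM1": 1, "DRS": 1, "DRM2": 1}        # regulator delays

D_total = (d["D1"] + d["Dmux"] + d["DM"] + d["DRM1"] + d["D2"] + d["DS"]
           + d["DRS"] + d["D3"] + d["Dmux"] + d["DM"] + d["DRM2"] + d["D4"])

rho_t = 10.0                                # illustrative rate
print(D_total)                              # 34
print((rho_t * D_total, rho_t))             # F_{T->V} ~ (rho_t * D_total, rho_t)
```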

Modeling with Regulated Flows

• Interconnect: model each channel by available bandwidth and maximum delay variation; model each node in the interconnect as an arbiter.
• Model synchronization as separate flows.
• Model read request and write acknowledge as separate flows.
• A simple generalization of (σ, ρ) flows is $F \sim \min_i(\sigma_i, \rho_i)$, $i > 0$:
  $F(b) - F(a) \leq \min_i\left(\sigma_i + \rho_i(b - a)\right)$
• Good analysis depends on good element models.

Network Calculus - Arrival Curves

[Figure: a cumulative flow F and an arrival curve α bounding its growth over any interval of length b - a.]

Given a monotonically increasing function α, defined for t ≥ 0, α is an arrival curve for flow F if for all 0 ≤ a ≤ b:

$F(b) - F(a) \leq \alpha(b - a)$

Network Calculus - Min-Plus Convolution

Given two monotonically increasing functions f and g, the min-plus convolution of f and g is the function

$(f \otimes g)(t) = \inf_{0 \leq s \leq t}\left(f(t - s) + g(s)\right)$

If α is an arrival curve for F we have:

$F \leq F \otimes \alpha \quad\text{and}\quad F \leq \alpha \otimes \alpha$

with α ⊗ α being the best bound that we can find based on the information in α.
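Min-plus convolution is easy to evaluate on sampled curves. A sketch over t = 0, 1, …, T, used here to confirm that α ⊗ α reproduces an affine arrival curve:

```python
# Discrete min-plus convolution of two curves sampled at t = 0..T.

def min_plus_conv(f, g):
    return [min(f[t - s] + g[s] for s in range(t + 1)) for t in range(len(f))]

sigma, rho = 4, 1
alpha = [0] + [sigma + rho * t for t in range(1, 11)]   # affine arrival curve

print(min_plus_conv(alpha, alpha) == alpha)             # True: no better bound
```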

Network Calculus - Service Curves

[Figure: input flow F(t), output flow F*(t), and service curve β(t); F* lies above (F ⊗ β).]

Given a system S with an input flow F and an output flow F*, S offers the flow a service curve β if and only if β is a monotonically increasing function and

$F^* \geq F \otimes \beta$

which means that

$F^*(t) \geq \inf_{s \leq t}\left(F(s) + \beta(t - s)\right)$

Network Calculus - Backlog Bound

[Figure: the backlog is the vertical deviation between the arrival curve α and the service curve β.]

Given a flow F constrained by arrival curve α and a system offering a service curve β, the backlog F(t) - F*(t) for all t satisfies

$F(t) - F^*(t) \leq \sup_{s \geq 0}\left(\alpha(s) - \beta(s)\right)$

Network Calculus - Delay Bound

[Figure: the delay is the horizontal deviation between the arrival curve α and the service curve β.]

Given a flow F constrained by arrival curve α and a system offering a service curve β, the delay d(t) at time t is $d(t) = \inf(\tau \geq 0 : F(t) \leq F^*(t + \tau))$. It satisfies

$d(t) \leq h(\alpha, \beta) = \sup_{t \geq 0}\left(\inf(\tau \geq 0 : \alpha(t) \leq \beta(t + \tau))\right)$

Network Calculus - Output Arrival Curve

[Figure: arrival curve α, service curve β, and the resulting output arrival curve α*.]

Given a flow F constrained by arrival curve α and a system offering a service curve β, the output flow F* is constrained by the arrival curve

$\alpha^* = \alpha \oslash \beta, \qquad (\alpha \oslash \beta)(t) = \sup_{s \geq 0}\left(\alpha(t + s) - \beta(s)\right)$
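For the common pairing of an affine arrival curve γ_{σ,ρ} with a rate-latency service curve β_{R,T} (and ρ ≤ R), the suprema above have simple closed forms, standard results in network calculus [LeBoudec, 2001]; a sketch:

```python
# Closed-form bounds for alpha = gamma_{sigma,rho}, beta = beta_{R,T},
# assuming rho <= R.

def backlog_bound(sigma, rho, R, T):
    return sigma + rho * T         # vertical deviation sup(alpha - beta)

def delay_bound(sigma, rho, R, T):
    return sigma / R + T           # horizontal deviation h(alpha, beta)

def output_arrival(sigma, rho, R, T):
    return (sigma + rho * T, rho)  # alpha* = gamma_{sigma + rho*T, rho}

print(backlog_bound(4, 1, 2, 3))   # 7
print(delay_bound(4, 1, 2, 3))     # 5.0
print(output_arrival(4, 1, 2, 3))  # (7, 1)
```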

Network Calculus - Useful Functions

Peak rate function: $\lambda_R(t) = Rt$

Rate-latency function: $\beta_{R,T}(t) = R[t - T]^+$

Affine function: $\gamma_{\sigma,\rho}(t) = 0$ for $t = 0$; $\sigma + \rho t$ for $t > 0$

Burst-delay function: $\delta_T(t) = 0$ for $t \leq T$; $\infty$ for $t > T$

Staircase function: $v_{T,\tau}(t) = \left\lceil \frac{t + \tau}{T} \right\rceil$

Step function: $u_T(t) = 0$ for $t \leq T$; $1$ for $t > T$

Network Calculus - Concatenation of Nodes

[Figure: a flow with arrival curve α traversing S1 (service β1) and S2 (service β2); the concatenation S offers β1 ⊗ β2.]

Example:

$\beta_1 = \beta_{R_1,T_1}$, $\beta_2 = \beta_{R_2,T_2}$:

$\beta_{R_1,T_1} \otimes \beta_{R_2,T_2} = \beta_{\min(R_1,R_2),\, T_1+T_2}$

Useful properties:

$f \otimes g = g \otimes f$
$(f \otimes g) \otimes h = f \otimes (g \otimes h)$
$(f + c) \otimes g = (f \otimes g) + c$ for any constant $c \in \mathbb{R}$

Network Calculus - Pay Bursts Only Once

[Figure: rate-latency service curves with rates R1 and R2 = min(R1, R2) and latencies T1, T2, and T1 + T2.]

$\alpha = \gamma_{\sigma,\rho}$
$\beta_1 = \beta_{R_1,T_1} = R_1 \max(0, t - T_1)$
$\beta_2 = \beta_{R_2,T_2} = R_2 \max(0, t - T_2)$
$\beta_{R_1,T_1} \otimes \beta_{R_2,T_2} = \beta_{\min(R_1,R_2),\, T_1+T_2} = \min(R_1, R_2) \max(0, t - (T_1 + T_2))$

Bounding the delay node by node pays the burst σ at both nodes:

$D_1 + D_2 = \frac{\sigma}{R_1} + \frac{\rho T_1}{R_2} + \frac{\sigma}{R_2} + T_1 + T_2$

Bounding it with the concatenated service curve pays the burst only once:

$D_S = \frac{\sigma}{\min(R_1, R_2)} + T_1 + T_2$
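Numerically, the two bounds differ exactly in where the burst is paid; a sketch with illustrative values:

```python
# "Pay bursts only once": per-node bounds vs the concatenated bound
# for alpha = gamma_{sigma,rho} through two rate-latency servers.

def per_node_delay(sigma, rho, R1, T1, R2, T2):
    # node 2 sees the increased output burst sigma + rho*T1: burst paid twice
    return sigma / R1 + T1 + (sigma + rho * T1) / R2 + T2

def end_to_end_delay(sigma, R1, T1, R2, T2):
    # concatenated service curve beta_{min(R1,R2), T1+T2}: burst paid once
    return sigma / min(R1, R2) + T1 + T2

print(per_node_delay(8, 1, 4, 2, 4, 2))   # 8.5
print(end_to_end_delay(8, 4, 2, 4, 2))    # 6.0
```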

Summary

• Communication performance: bandwidth, unloaded latency, loaded latency
• Organizational structure: network interface, switch, link
• Topologies: wire space and delay domination favors low-dimension topologies
• Routing: deterministic vs source-based vs adaptive routing; deadlock
• Quality of Service and flow regulation

To Probe Further

Classic papers:

[Agarwal, 1991] Agarwal, A. (1991). Limits on interconnection network performance. IEEE Transactions on Parallel and Distributed Systems, 4(6):613–624.
[Dally, 1990] Dally, W. (1990). Performance analysis of k-ary n-cube interconnection networks. IEEE Transactions on Computers, 39(6):775–785.
[Leighton, 1992] Leighton, F. T. (1992). Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann.

Text books:

[Culler et al., 1999] Culler, D., Singh, J. P., and Gupta, A. (1999). Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers, San Francisco, California.
[Dally and Towles, 2004] Dally, W. and Towles, B. (2004). Principles and Practices of Interconnection Networks. Morgan Kaufmann.
[DeMicheli and Benini, 2006] DeMicheli, G. and Benini, L. (2006). Networks on Chip. Morgan Kaufmann.
[Duato et al., 1998] Duato, J., Yalamanchili, S., and Ni, L. (1998). Interconnection Networks: An Engineering Approach. IEEE Computer Society Press, Los Alamitos, California.
[LeBoudec, 2001] LeBoudec, J.-Y. (2001). Network Calculus. Springer Verlag, LNCS 2050.