1
High-Speed Router Design
Outline:
Introduction
Router Generations
Table Lookup
Switch Fabric Design
Buffer Placement
2
Where IP routers sit in the network
(Figure: core routers and edge routers in the network)
3
Basic Architectural Components
Control Plane: Routing Protocols, Routing Table
Datapath (per-packet processing): Forwarding Table, Switching
4
Per-packet processing in an IP Router
1. Accept packet arriving on an incoming link.
2. Look up the packet's destination address in the forwarding table to
identify the outgoing port(s).
3. Manipulate packet header: e.g., decrement TTL, update header
checksum.
4. Send packet to the outgoing port(s).
5. Buffer packet in the queue.
6. Transmit packet onto outgoing link.
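Step 3 above (decrement TTL, update header checksum) can be sketched as follows. This is a minimal illustration, assuming a plain 20-byte IPv4 header without options; the function names `ipv4_checksum` and `forward_header` are hypothetical, and a real router would do this in hardware, typically with an incremental checksum update rather than a full recomputation.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over 16-bit words (checksum field must be zero)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_header(header: bytes) -> bytes:
    """Decrement TTL (byte 8) and recompute the header checksum (bytes 10-11)."""
    ttl = header[8]
    if ttl <= 1:
        raise ValueError("TTL expired; packet must be dropped")
    hdr = bytearray(header)
    hdr[8] = ttl - 1
    hdr[10:12] = b"\x00\x00"                # zero the checksum before summing
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr)
```

Running `ipv4_checksum` over a header that already contains a correct checksum returns 0, which gives a simple validity check on arrival.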
5
Input Port Functions
Physical layer:
bit-level reception
Data link layer:
e.g., Ethernet
Decentralized switching:
given datagram dest., look up output port using routing table in input port memory
goal: complete input port processing at line speed
queueing: if datagrams arrive faster than forwarding rate into switch fabric
6
Output Ports
7
High-Speed Router Design
Outline:
Introduction
Router Generations
Table Lookup
Switch Fabric Design
Buffer Placement
8
First Generation Routers
System Bus
(Figure: Input 1/Output 1 and Input 2/Output 2 attached to a shared memory built from 5 ns SRAM)
Numerous work has proven and made possible:
Fairness
Delay Guarantees
Delay Variation Control
Loss Guarantees
Statistical Guarantees
12
Second Generation Routers
(Figure labels: CPU, Route Table, Buffer Memory, Slow Path)
14
Second Generation Routers
Queueing Structure: Combined Input and Output Queueing
1 write per packet time
Rate of writes/reads determined by bus speed
1 read per packet time
(Figure: line-card queues connected by a bus)
15
E.g. Cisco 7507 router
Front view
Rear view
Backplane
16
Third Generation Routers
Switched Backplane
MAC MAC
18
Third Generation Routers
Queueing Structure
1 write per pkt time
Rate of writes/reads determined by switch fabric speedup
1 read per pkt time
(Figure: line-card queues connected by a switch)
19
E.g. Cisco 12000 series routers
http://www.cisco.com/warp/public/cc/pd/rt/12000/12416/
The Cisco 12000 series offers industry leading scalability, high
performance, and guaranteed priority packet delivery through an
innovative distributed architecture design that enables service providers
to accelerate the evolution of the Internet through delivery of profitable,
next generation services.
The Cisco 12416 Internet router is a 10 Gigabit, 16-slot chassis member
of the Cisco 12000 series that provides a total switching capacity of 320
Gigabits per second (Gbps), with 20 Gbps (10 Gbps full duplex) capacity
per slot.
20
Routers vs. Gateways
21
High-Speed Router Design
Outline:
Introduction
Router Generations
Table Lookup
Switch Fabric Design
Buffer Placement
(Figure: Routing Protocols, Routing Table, Forwarding Table, Switching)
22
Forwarding Engine
Packet
payload header
Router
Forwarding Table
Dest-network Port
65.0.0.0/8 3
128.9.0.0/16 1
149.12.0.0/19 7
23
What makes table lookup difficult?
24
An example
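(Figure: the IPv4 address line from 0 to 2^32 - 1, with the address 128.9.16.14 marked; the prefix 65.0.0.0/8 covers the 2^24 addresses from 65.0.0.0 to 65.255.255.255)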
25
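Prefixes can overlap!
Dest. IP Prefix    Outgoing Port
65.0.0.0/8         3
128.9.0.0/16       1
142.12.0.0/19      7
128.9.16.0/21      2
128.9.172.0/21     4
Longest matching prefix
(Figure: the prefixes 65.0.0.0/8, 128.9.0.0/16, 142.12.0.0/19, 128.9.16.0/21, 128.9.172.0/21 and 128.9.176.0/24 drawn as ranges on the address line from 0 to 2^32 - 1 against prefix length (8, 24, 32), with the address 128.9.16.14 marked)
Impossible!
31.25 Mpps, 32 ns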
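Longest prefix matching over the table above can be illustrated with a naive linear scan. This is only a sketch using Python's standard `ipaddress` module; the name `longest_prefix_match` is hypothetical, only four of the table rows are used, and a real forwarding engine cannot afford a linear scan at line rate.

```python
import ipaddress

# Rows taken from the example forwarding table (prefix, outgoing port).
FORWARDING_TABLE = [
    ("65.0.0.0/8", 3),
    ("128.9.0.0/16", 1),
    ("142.12.0.0/19", 7),
    ("128.9.16.0/21", 2),
]

def longest_prefix_match(dest, table=FORWARDING_TABLE):
    """Return the port of the longest prefix containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    best_len, best_port = -1, None
    for prefix, port in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port
```

For 128.9.16.14 both 128.9.0.0/16 and 128.9.16.0/21 match, and the /21 entry wins because it is longer.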
28
Table growth of a typical backbone router
29
Prefix length distribution
Multicast
address
30
A standard solution: trie
31
Another example
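Dest. IP Prefix    Outgoing Port
65.0.0.0/8         3
128.9.0.0/16       1
142.12.0.0/19      7
128.9.16.0/21      2
(Figure: trie whose root branches on the first octet (65, 128, 142); the node under 65 holds (65.*); 128 branches to 9 and 142 branches to 12, which holds (142.12.*))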
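A trie lookup for the same table can be sketched as below. Note the assumption: the figure branches one octet at a time, whereas this sketch uses the simpler one-bit-per-level (binary) trie; `TrieNode` and `BinaryTrie` are hypothetical names.

```python
import ipaddress

class TrieNode:
    def __init__(self):
        self.children = [None, None]   # child for bit 0 and bit 1
        self.port = None               # set if a prefix ends at this node

class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, port):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.port = port

    def lookup(self, dest):
        """Walk down the trie, remembering the last prefix seen on the way."""
        bits = int(ipaddress.ip_address(dest))
        node, best = self.root, None
        for i in range(32):
            if node.port is not None:
                best = node.port
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                return best
        return node.port if node.port is not None else best
```

The lookup visits at most 32 nodes for IPv4, one per address bit, which is why multi-bit strides such as the octet-wise branching in the figure are attractive in practice.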
32
Need more than IPv4 unicast lookups
Multicast
Longest Prefix Matching on the source and group address
IPv6
128-bit destination address field
Exact address architecture not yet known
Packet classification
33
High-Speed Router Design
Outline:
Introduction
Router Generations
Table Lookup
Switch Fabric Design
Buffer Placement
(Figure: Routing Protocols, Routing Table, Forwarding Table, Switching)
34
Basic Architectural Components
Datapath: per-packet processing
35
Input Queueing vs Output Queueing
Choosing one usually depends on where the switch will exist in the network and the amount of traffic it will have to carry
Single path
Crossbar
Broadcast
Banyan
Batcher-banyan
Multiple path
Replicated Banyan
Dilated Banyan
Tandem Banyan
37
Three types of switching fabrics
Memory
Bus
Interconnection Networks, e.g. crossbar
38
Switching Via Memory
packet copied by system's (single) CPU
speed limited by memory bandwidth
39
Switching Via Bus
40
Switching Via Interconnection Networks
An interconnection network is usually constructed using 2 x 2
switching elements, e.g. crossbar switch
44
Crossbar Switch
(Figure: 4 x 4 crossbar with inputs 1 to 4 and outputs 1 to 4; 2N buses in parallel; each crosspoint has Data In, Data Out and a configuration input, and is in either cross state or bar state)
Complexity = N^2
45
Switching Via Interconnection Networks
An interconnection network is usually constructed using 2 x 2
switching elements, e.g. crossbar switch
(Figure: 4 x 4 banyan, 8 x 8 banyan and 16 x 16 banyan)
47
Interconnecting pattern
Label switching stages from 0 to m-1 where N=2^m
Divide switching elements in stage k into 2^k groups
Connecting pattern between stage k and stage k+1:
Outputs from group i of stage k will be connected to inputs
of group 2i and group 2i+1 of stage k+1
Divide the outputs into upper half and lower half
Upper half outputs connected to upper inputs; lower half
outputs connected to lower inputs
Upper outputs of switching elements connected to group 2i
inputs; lower outputs of switching elements connected to
group 2i+1 inputs
Connecting to network inputs: upper inputs first
Connecting network outputs: parallel
48
Routing in banyan network
Each switching element in the i-th stage examines the i-th bit
of the destination address (most-significant-bit first) to make
the decision
if the bit = 1, route to the lower output
if the bit = 0, route to the upper output
(Figure: 8 x 8 banyan network with outputs 000 to 111; packets addressed to 011 and 101 are self-routed to their outputs)
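The self-routing rule can be written down directly. A minimal sketch, assuming an N = 2^n banyan in which stage i looks at the i-th destination bit, most significant first; the name `banyan_route` is hypothetical.

```python
def banyan_route(dest: int, n_stages: int):
    """Return the per-stage decisions ('upper' or 'lower') for a destination."""
    decisions = []
    for stage in range(n_stages):
        # Stage 0 examines the most significant bit, stage 1 the next, and so on.
        bit = (dest >> (n_stages - 1 - stage)) & 1
        decisions.append("lower" if bit else "upper")
    return decisions
```

For example, `banyan_route(0b011, 3)` gives upper, lower, lower, regardless of which input the packet entered on.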
49
Blocking in banyan network
(Figure: 8 x 8 banyan network with outputs 000 to 111, showing internal blocking, where two packets need the same internal link, and output blocking, where two packets are destined for the same output)
50
Other banyan-typed networks
Banyan Network
51
Simple switches based on banyan network
(Figure: a 2 x 2 switching element with load Pm on its inputs and Pm+1 on its outputs; an 8 x 8 banyan with ports 1 to 8 and Stage 1, Stage 2, Stage 3)
55
State transition diagram
(Figure: state transition diagram over the states (0, e), (1, n), (2, n), ..., (B-1, n), (B, n) and (1, b), (2, b), ..., (B-1, b), (B, b))
56
Non-blocking Conditions for Banyan Networks
Theorem. The banyan network is nonblocking if the active inputs $x_1, \ldots, x_m$ ($x_j > x_i$ if $j > i$) and their corresponding output destinations $y_1, \ldots, y_m$ satisfy the following:
1. (Distinct & monotonic outputs): $y_1 < y_2 < \cdots < y_m$ or $y_1 > y_2 > \cdots > y_m$
2. (Concentrated inputs): Any input between two active inputs is also active. That is, $x_i \le w \le x_j$ implies input $w$ is active.
Example: x1 = 0000, x2 = 0001, x3 = 0010 with y1 = 0010, y2 = 1011, y3 = 1100
57
Labeling switching elements in a banyan
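Each node in stage k can be uniquely represented by two binary numbers $(a_{n-k} \cdots a_1,\; b_1 \cdots b_{k-1})$.
Example: the path from input 0001 to output 1001 (routing bits 1, 0, 0, 1) is
0001 -> (001, ) -> (01, 1) -> (1, 10) -> ( , 100) -> 1001
In general, with routing bits $b_1, \ldots, b_n$:
$a_n \cdots a_1 \to (a_{n-1} \cdots a_1,\ ) \to (a_{n-2} \cdots a_1,\ b_1) \to \cdots \to (\ ,\ b_1 \cdots b_{n-1}) \to b_1 \cdots b_n$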
58
Proof:
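Suppose two packets, one from input $x = a_n \cdots a_1$ to output $y = b_1 \cdots b_n$, the other from input $x' = a'_n \cdots a'_1$ to output $y' = b'_1 \cdots b'_n$, collide in stage k.
That is, the two paths
$a_n \cdots a_1 \to (a_{n-1} \cdots a_1,\ ) \to (a_{n-2} \cdots a_1,\ b_1) \to \cdots \to (\ ,\ b_1 \cdots b_{n-1}) \to b_1 \cdots b_n$
$a'_n \cdots a'_1 \to (a'_{n-1} \cdots a'_1,\ ) \to (a'_{n-2} \cdots a'_1,\ b'_1) \to \cdots \to (\ ,\ b'_1 \cdots b'_{n-1}) \to b'_1 \cdots b'_n$
merge at the same node in stage k and share the same outgoing link:
$(a_{n-k} \cdots a_1,\ b_1 \cdots b_{k-1}) = (a'_{n-k} \cdots a'_1,\ b'_1 \cdots b'_{k-1})$ and $b_k = b'_k$
Thus, we have
$a_{n-k} \cdots a_1 = a'_{n-k} \cdots a'_1$ and $b_1 \cdots b_k = b'_1 \cdots b'_k$ (A)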
59
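Since input packets are concentrated, the total # of packets
between x and x', inclusively, is |x' - x| + 1
Since all packets are destined for different outputs, there
must be |x' - x| + 1 distinct output addresses
Since outputs are monotonic, the largest and the smallest output
addresses must be y and y', or y' and y. Hence we must have
$|x' - x| + 1 \le |y' - y| + 1$, i.e. $|x' - x| \le |y' - y|$ (B)
From (A), we have
$|x' - x| \ge 2^{n-k}$, because x and x' are distinct but agree in their n-k least significant bits, and $|y' - y| \le 2^{n-k} - 1$, because y and y' agree in their k most significant bits. This contradicts (B).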
(Figure: 8 x 8 banyan network with outputs 000 to 111, with packets addressed to 011 and 101)
61
Theorem 2. Let the input-output pair of packet i be denoted by (xi,
yi). If the packets can be routed through the banyan network without
conflicts, so can the set of packets ((xi + z ) mod N, yi).
Proof: (try it yourself)
e.g. z = 5
Original inputs: x1 = 0000, x2 = 0001, x3 = 0010
Shifted inputs: x1 = 0101, x2 = 0110, x3 = 0111
Outputs (unchanged): y1 = 0010, y2 = 1011, y3 = 1100
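The node labeling used in the proof also gives a direct way to test a set of input-output pairs for conflicts. The sketch below is an illustration under that labeling: after stage k a packet from input x to output y occupies the link identified by the n-k least significant bits of x together with the k most significant bits of y, and two packets conflict when they occupy the same link. The names `link_labels` and `find_conflicts` are hypothetical.

```python
def link_labels(x, y, n):
    """Links used by a packet from input x to output y in an N = 2**n banyan."""
    labels = []
    for k in range(1, n + 1):
        low_x = x & ((1 << (n - k)) - 1)   # a_{n-k} ... a_1
        high_y = y >> (n - k)              # b_1 ... b_k
        labels.append((k, low_x, high_y))
    return labels

def find_conflicts(pairs, n):
    """Return (first packet, second packet, stage) for every shared link."""
    seen = {}
    conflicts = []
    for x, y in pairs:
        for label in link_labels(x, y, n):
            if label in seen:
                conflicts.append((seen[label], (x, y), label[0]))
            else:
                seen[label] = (x, y)
    return conflicts

# Example from the theorem: concentrated inputs, monotonic outputs.
pairs = [(0b0000, 0b0010), (0b0001, 0b1011), (0b0010, 0b1100)]
shifted = [((x + 5) % 16, y) for x, y in pairs]   # Theorem 2 with z = 5
```

Both `pairs` and `shifted` come out conflict-free, whereas a hypothetical non-concentrated pair such as inputs 000 and 100 sent to outputs 010 and 011 in an 8 x 8 network already shares a link after stage 1. A conflict reported at stage n is output blocking (same destination).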
62
Exercise:
(Figure: 8 x 8 banyan network with outputs 000 to 111)
63
Solution:
(Figure: 8 x 8 banyan network with input load P0, link loads P1U, P2U, P3U and P1L, P2L, P3L marked after stages 1 to 3, and outputs 000 to 111)
Let PiL and PiU be defined as shown in the figure. P0 is the input load and let P0 = 1.
At stage 1,
$P_{1U} = 1 - (1 - 0.75 P_0)^2 = 0.9375$
$P_{1L} = 1 - (1 - 0.25 P_0)^2 = 0.4375$
At stage 2,
$P_{2U} = 1 - (1 - 0.5 P_{1U})^2 = 0.7178$
$P_{2L} = 1 - (1 - 0.5 P_{1L})^2 = 0.3896$
At stage 3,
$P_{3U} = 1 - (1 - 0.5 P_{2U})^2 = 0.5890$
$P_{3L} = 1 - (1 - 0.5 P_{2L})^2 = 0.3516$
The overall loss prob. is
$P_{loss} = \frac{P_0 - P_3}{P_0} = 1 - \frac{0.3516 + 0.5890}{2} = 0.5297$
When a packet is equally likely to go to any of the 8 outputs,
$P_1 = P_0 - 0.25 P_0^2 = 0.75$
$P_2 = P_1 - 0.25 P_1^2 = 0.6094$
$P_3 = P_2 - 0.25 P_2^2 = 0.5166$
$P'_{loss} = 0.4834$
Therefore, the loss probability in this case is higher than in the case
that a packet is equally likely to be destined for any output.
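The arithmetic in this solution can be reproduced with a few lines. This is a sketch under the same model as the slide: a link with load p feeding a 2 x 2 element whose packets pick a given output with probability 1/2 yields output load 1 - (1 - p/2)^2 = p - p^2/4. The function names and the `upper_bias` parameter are hypothetical.

```python
def next_load(p, share=0.5):
    """Load on one output of a 2 x 2 element whose two inputs each carry load p
    and send a packet to that output with probability `share`."""
    return 1 - (1 - share * p) ** 2

def uniform_loss(p0=1.0, stages=3):
    """Loss probability when every output is equally likely."""
    p = p0
    for _ in range(stages):
        p = next_load(p)
    return (p0 - p) / p0

def skewed_loss(p0=1.0, upper_bias=0.75):
    """Loss probability when stage 1 sends packets up with probability upper_bias
    and stages 2 and 3 split traffic evenly."""
    p_u = next_load(p0, upper_bias)
    p_l = next_load(p0, 1 - upper_bias)
    for _ in range(2):
        p_u, p_l = next_load(p_u), next_load(p_l)
    return (p0 - (p_u + p_l) / 2) / p0
```

With the defaults, `skewed_loss()` is about 0.53 and `uniform_loss()` about 0.48, matching the slide's values up to rounding of intermediate results.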
64
Sorting Networks
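(Figure: 4 x 4 sorting network with inputs a1 to a4, outputs b1 to b4 and comparator stages 1, 2, 3. Stage 1 forms min{a1,a2}, max{a1,a2}, min{a3,a4}, max{a3,a4}. Stage 2 forms min{a1,a2,a3,a4}, max{min{a1,a2},min{a3,a4}}, min{max{a1,a2},max{a3,a4}} and max{a1,a2,a3,a4}. Stage 3 orders the two middle values. Example: the inputs 01, 11, 10, 00 are sorted into 00, 01, 10, 11.)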
Comparator: it takes two input numbers and places the
larger number on the output pointed by the arrow
and the smaller number on the other output
65
Order-preserving Property: Suppose a sorting network sorts the
input sequence a = a1,a2,...,aN into the output sequence b =
b1,b2,...,bN, then for any monotonically increasing function f, the
network sorts the input sequence f(a) = f(a1),f(a2),...,f(aN) into the
output sequence f(b) = f(b1),f(b2),...,f(bN).
E.g. the sorting network sorts the input 5, 4, 7, 1 into 1, 4, 5, 7.
With f(x) = x+2, the input 7 = f(5), 6 = f(4), 9 = f(7), 3 = f(1) is sorted into f(1) = 3, f(4) = 6, f(5) = 7, f(7) = 9.
What if f(x) = x+2, if x < 2; f(x) = x-3, if x >= 2?
The input is 2 = f(5), 1 = f(4), 4 = f(7), 3 = f(1), but the sequence f(1) = 3, f(4) = 1, f(5) = 2, f(7) = 4 is not sorted (X), so it cannot be the network's output: the property fails because f is not monotonically increasing.
66
Theorem 3 (Zero-One Principle) If a sorting network with N inputs
sorts all the 2^N possible sequences of 0s and 1s correctly, then it sorts all
sequences of arbitrary input numbers correctly.
Proof:
Consider a sorting network that can sort all sequences of 0s and 1s correctly. By
contradiction, suppose it does not sort input sequences of arbitrary numbers correctly.
That is, there is an input sequence a1,a2,...,aN containing two elements ai and aj such
that ai < aj, but the network places aj before/above ai.
Define a monotonically increasing function f(x) such that f(x) = 0, if x <= ai; f(x) = 1,
if x > ai.
According to the order-preserving property, since the network places aj before/above
ai when the input sequence is a1,a2,...,aN , it places f(aj)=1 before/above f(ai)=0
when the input sequence is f(a1),f(a2),...,f(aN). But this input sequence consists of
only 0s and 1s, and yet the network does not sort it correctly, leading to a
contradiction.
(Figure: for the input a1, a2, ..., aN the sorting network places aj above ai; for the input f(a1), f(a2), ..., f(aN) it therefore places f(aj) = 1 above f(ai) = 0)
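The zero-one principle makes exhaustive checking cheap: 2^N binary inputs instead of N! orderings. The sketch below applies it to a 4-input network with the same comparator layout as the three-stage figure earlier (an assumption about the figure's wiring); `NETWORK`, `apply_network` and `sorts_all_zero_one` are hypothetical names.

```python
from itertools import product

# Comparators as (upper line, lower line); the smaller value goes to the upper line.
# Stage 1: (0,1), (2,3). Stage 2: (0,2), (1,3). Stage 3: (1,2).
NETWORK = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def apply_network(values, network=NETWORK):
    """Run the values through the comparators in order and return the result."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def sorts_all_zero_one(network, n):
    """True if the network sorts every one of the 2**n sequences of 0s and 1s."""
    return all(apply_network(bits, network) == sorted(bits)
               for bits in product((0, 1), repeat=n))
```

Dropping the last comparator makes the check fail, for instance on the input 0, 1, 1, 0.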
67
Sorting networks based on bitonic sort
Merging is a divide-and-conquer technique for sorting.
A k-merger takes two sorted input sequences and merges them
into one sorted sequence of k elements.
Intuitively merging is simpler than sorting in general.
Suppose we have mergers of different sizes, they can be
interconnected (as shown below) to sort an arbitrary input
sequence.
One way to construct the mergers is to use bitonic sorting
algorithm invented by Batcher.
(Figure: a sorting network built from mergers: pairs of 2-mergers feed 4-mergers, and so on through N/2-mergers, which feed a single N-merger)
68
Some properties of bitonic sequence
A bitonic sequence is a sequence that either increases monotonically
and then decreases monotonically, or decreases monotonically and
then increases monotonically.
E.g. 1,3,5,7,6,4,2,0; 7,5,3,1,0,2,4,6; 1,2,3,3,2,1
A bitonic sorter is a merger that takes a bitonic sequence and sorts it
into a monotonic sequence.
(Figure: a bitonic sorter takes a bitonic sequence as input and produces a monotonic sequence as output)
69
Some properties of bitonic sequence
We focus on bitonic sequences with only 0s and 1s (Why?)
Two general forms: $1^i 0^j 1^k$ or $0^i 1^j 0^k$
A bitonic sequence a is said to be no less than another bitonic
sequence b if none of the elements in a is less than any of the elements in
b.
e.g. 00000 <= 01110, 11111 >= 01110
Two sequences do not necessarily have an ordering relationship.
e.g. 00010 and 01110
70
Some properties of bitonic sequence
Theorem 4. If a zero-one sequence of 2n elements a = a1,a2,...,a2n is
bitonic then the two n-element sequences
a' = min(a1,an+1), min(a2,an+2), ..., min(an,a2n) and
a'' = max(a1,an+1), max(a2,an+2), ..., max(an,a2n)
have two properties:
1. They are both bitonic.
2. a' <= a''.
Proof:
(Figure: inputs a1, a2, ..., an and an+1, an+2, ..., a2n are compared pairwise; the upper outputs min(a1,an+1), min(a2,an+2), ..., min(an,a2n) form a' and the lower outputs max(a1,an+1), max(a2,an+2), ..., max(an,a2n) form a'')
71
Recursive construction of a k-bitonic sorter
(Figure: a k-bitonic sorter consists of a k-half cleaner followed by two k/2-bitonic sorters, applied recursively through k/2-half cleaners down to 2-half cleaners. The k-half cleaner takes a1, ..., ak and outputs min(a1,ak/2+1), min(a2,ak/2+2), ..., min(ak/2,ak) on the upper half and max(a1,ak/2+1), max(a2,ak/2+2), ..., max(ak/2,ak) on the lower half. The final output is an ascending sequence.)
72
An 8 x 8 Sorting Network using bitonic sorters
(Figure: four 2-mergers feed two 4-mergers, which feed one 8-merger)
Note: a merger is implemented using a bitonic sorter
Example: values on the eight lines (rows), from the input on the left to the sorted output on the right:
3 2 2 2 2 2 2 2 2 2 2 1
2 3 8 8 3 3 6 6 4 4 1 2
8 8 3 3 8 7 3 3 3 1 4 3
7 7 7 7 7 8 5 5 1 3 3 4
1 1 1 5 5 6 7 4 6 6 6 5
6 6 5 1 6 5 4 7 7 7 5 6
5 5 6 6 1 4 8 1 5 5 7 7
4 4 4 4 4 1 1 8 8 8 8 8
Total # of stages
= 1 + 2 + 3 + ... + log N
= (1 + log N) log N / 2
The total # of comparators in an N x N Batcher Network is:
$\frac{N \log N (\log N + 1)}{4}$
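The recursive construction and the comparator count can be checked with a short software model. This is a sketch of bitonic sorting as an algorithm, not of the hardware wiring; it assumes N is a power of two, and `bitonic_merge`, `bitonic_sort` and the `counter` argument are hypothetical names.

```python
def bitonic_merge(seq, ascending, counter):
    """Sort a bitonic sequence: one half-cleaner, then recurse on both halves."""
    n = len(seq)
    if n == 1:
        return list(seq)
    half = n // 2
    first, second = [], []
    for i in range(half):
        counter[0] += 1                      # one comparator per pair
        small, large = sorted((seq[i], seq[i + half]))
        if ascending:
            first.append(small); second.append(large)
        else:
            first.append(large); second.append(small)
    return (bitonic_merge(first, ascending, counter)
            + bitonic_merge(second, ascending, counter))

def bitonic_sort(seq, ascending=True, counter=None):
    """Sort the two halves in opposite directions, then merge the bitonic result."""
    if counter is None:
        counter = [0]
    n = len(seq)
    if n == 1:
        return list(seq)
    half = n // 2
    up = bitonic_sort(seq[:half], True, counter)
    down = bitonic_sort(seq[half:], False, counter)
    return bitonic_merge(up + down, ascending, counter)
```

Passing `counter = [0]` and sorting the eight example inputs 3, 2, 8, 7, 1, 6, 5, 4 leaves `counter[0]` at 24, which agrees with N log N (log N + 1) / 4 for N = 8.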
73
Batcher-Banyan network
Batcher Banyan
Inputs Outputs
74
Switching in Batcher-banyan network
Only headers are compared
Order of arrival from right to left
If both header bits of the two packets are 0s or 1s, the comparator remains in its original state (in this case, bar state) and the bits are forwarded to the outputs.
For the first pair of bits that differ, the comparator state is set accordingly and remains unchanged for the rest of the packet
(Figure: two packets, each a payload followed by a header, with headers ...0100 and ...1000, arrive at a comparator assumed to be in bar state; the comparator remains in bar state after the 1st bit and after the 2nd bit)
75
Contention resolution in Batcher-banyan network
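(Figure: Batcher network followed by a banyan network. Active inputs carry headers formed by a 0 followed by the output address: 0001 for output 001, 0100 for 100, 0100 for 100, 0000 for 000 and 0101 for 101. Idle inputs (i) carry 1xxx. Two packets are destined for output 100.)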
76
Three-phase algorithm
How to solve the output contention problem?
Three phase algorithm for resolving output contention:
Probe phase: only the headers of packets enter the sorting
network. Packets with the same output address will be
adjacent to each other at the outputs. Output j+1 checks with
output j to see if their addresses are the same. If yes, let the
packet at output j+1 be the loser and the packet at output j be
the winner.
Acknowledgment phase: acknowledgements are back-propagated
along the same path as the forward path in the
probe phase.
Send phase: send the winning packets; inputs that have lost
contention can buffer their packets for a later attempt (=>
waiting system approach)
The first two phases are overheads.
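The probe phase can be modelled in software as a sort followed by a neighbour comparison. A minimal sketch, assuming ties between packets for the same output are broken by input number (the slides do not say how); `probe_phase` is a hypothetical name.

```python
def probe_phase(requests):
    """requests[i] is the output address requested by input i, or None if idle.
    Returns (winning inputs, losing inputs)."""
    # Sorting brings headers with the same output address next to each other.
    active = sorted((dest, src) for src, dest in enumerate(requests)
                    if dest is not None)
    winners, losers = [], []
    prev_dest = None
    for dest, src in active:
        if dest == prev_dest:        # same address as the neighbour above: loser
            losers.append(src)
        else:
            winners.append(src)
        prev_dest = dest
    return winners, losers

# Hypothetical request pattern: two inputs (1 and 3) both want output 0b100.
winners, losers = probe_phase([0b001, 0b100, None, 0b100, 0b000, None, None, 0b101])
```

Here input 3 loses to input 1 and would retry in a later slot, while the remaining active inputs proceed to the send phase.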
77
An example
(Figure: the three-phase algorithm on an 8 x 8 Batcher-banyan network whose inputs carry the headers 0001, 0100, 1xxx (idle), 0100, 0000, 1xxx (idle), 1xxx (idle), 0101. The Batcher network sorts them into 0000, 0001, 0100, 0100, 0101, 1xxx, 1xxx, 1xxx; one of the two packets for output 100 loses contention (X). In the send phase only the winning packets, for outputs 000, 001, 100 and 101, are sorted and routed through the banyan network to outputs 000 to 111.)
78
Multiple-Path Banyan Switch Designs
Complexity of Batcher-banyan
79
Multiple-Path Banyan Switch Designs
Dilated Banyan
The internal link bandwidth is expanded to reduce the likelihood of a packet being dropped
For a banyan with dilation degree of d, the switch elements are of size 2d x 2d.
Each outgoing address has d associated outgoing links
Replicated Banyan
(Figure: inputs 1 to N pass through a random router or broadcaster to K parallel banyans, from the 1st banyan to the K-th banyan, whose outputs are combined into outputs 1 to N)
Suppose each packet is randomly routed to one of the banyans for switching. The load to each banyan is reduced by a factor of K, thus
$P_{loss} = \frac{n P_0}{n P_0 + 4K}$
Instead of random routing, we can broadcast a packet to all K banyan planes.
Since a packet is lost only if all copies are lost,
$P_{loss} = \left(\frac{n P_0}{n P_0 + 4}\right)^K$
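The two loss formulas are easy to compare numerically. In the sketch below, n is assumed to be the number of banyan stages (the slide does not define it here) and P0 the input load; the function names are hypothetical.

```python
def loss_random_routing(n, p0, k):
    """Each packet goes to one of k banyans, so each plane sees load p0 / k."""
    return n * p0 / (n * p0 + 4 * k)

def loss_broadcast(n, p0, k):
    """A copy goes to all k banyans; the packet is lost only if every copy is lost."""
    return (n * p0 / (n * p0 + 4)) ** k
```

For n = 3, P0 = 1 and K = 2 this gives 3/11 for random routing and 9/49 for broadcasting; with K = 1 both reduce to the single-banyan value 3/7.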
80
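Tandem Banyan
(Figure: K banyans connected in tandem, from the 1st banyan through the 2nd banyan to the K-th banyan, with a packet filter for marked packets; for each of Output 1 to Output N, delay elements and a concentrator collect the packets taken from the banyans)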
82
A practical issue
83
Three-stage switching network
Clos Network
Switch modules are arranged in three stages and any module in the first (second) stage is interconnected with any module in the second (third) stage via a unique link
Each switch module is a nonblocking switch
For an N x N 3-stage switch: n1 x r1 = n3 x r3 = N
(Figure: r1 first-stage switch modules of size n1 x r2, r2 second-stage modules of size r1 x r3, and r3 third-stage modules of size r2 x n3)
84
An example
n1 = r2 = r1 = r3 = n3 = 3
1 1 1
2 2 2
3 3 3
85
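Theorem: A three-stage switching network is strictly
nonblocking if and only if
$r_2 \ge n_1 + n_3 - 1$
Proof:
(Figure: a first-stage module of size n1 x r2 in which n1 - 1 inputs are busy, i.e. only one is idle, connected through the r1 x r3 second-stage modules to a third-stage module of size r2 x n3)
In the worst case, a new connection is requested from the idle input of a first-stage module whose other n1 - 1 inputs are busy to the idle output of a third-stage module whose other n3 - 1 outputs are busy, and these existing connections use (n1 - 1) + (n3 - 1) distinct second-stage modules. One more second-stage module is needed for the new connection, so r2 >= (n1 - 1) + (n3 - 1) + 1 = n1 + n3 - 1.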
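The nonblocking condition and the crosspoint cost of a three-stage network can be evaluated directly. A minimal sketch, assuming every module is a crossbar so that its cost is inputs times outputs; the function names are hypothetical.

```python
def is_strictly_nonblocking(n1, n3, r2):
    """Condition from the theorem: r2 >= n1 + n3 - 1."""
    return r2 >= n1 + n3 - 1

def clos_crosspoints(n1, r1, r2, r3, n3):
    """Crosspoints when every module is a crossbar:
    r1 modules of n1 x r2, r2 modules of r1 x r3, r3 modules of r2 x n3."""
    return r1 * n1 * r2 + r2 * r1 * r3 + r3 * r2 * n3
```

The earlier example with n1 = r2 = r1 = r3 = n3 = 3 does not satisfy the condition, since it would need r2 >= 5 to be strictly nonblocking.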
Batcher-banyan networks
Design of a sorting network (Batcher) to satisfy the non-
blocking conditions
Multipath banyan networks
Keeping the performance of batcher-banyan and
advantages of banyan
Tackling the issue on interconnecting chips with
limited I/Os to form a larger switch fabric
87
High-Speed Router Design
Outline:
Introduction
Router Generations
Table Lookup
Switch Fabric Design
Buffer Placement
  Input Port Queueing
  Output Port Queueing
  Combined Input Output Queueing
88
Input Port Queuing
Note: When we say a switch is input-queued, we imply that in each time slot
the switch fabric allows at most one packet to be sent by each input port and
at most one packet to be received by each output port.
Fabric slower than input ports -> queueing may occur at input
queues
Head-of-the-Line (HOL) blocking: queued datagram at front of
queue prevents others in queue from moving forward
89
An Analogy
90
Input Port Queuing
Performance
(Figure: delay versus load, with the load axis marked at 58.6% and 100%)
91
92
Input Queueing -- Virtual output queues
93
Input Queues
Virtual Output Queues
Memory b/w = 2R
Scheduler
(Figure: delay versus load, with the load axis marked at 58.6% and 100%)
94
Input Queueing
Scheduling
(Figure: weighted request graph between inputs 1 to 4 and outputs 1 to 4, and the bipartite matching selected from it)
Question: Maximum weight or maximum size? (Weight = 18)
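The difference between the two objectives shows up even on a tiny example. The sketch below brute-forces all permutations, which is only feasible for very small switches; the weight matrix is hypothetical (not the one in the figure), with `weights[i][j]` standing for the backlog of the VOQ at input i for output j and 0 meaning no request.

```python
from itertools import permutations

def best_matching(weights, score):
    """Try every input-to-output permutation and keep the best-scoring matching."""
    n = len(weights)
    best_score, best_edges = None, []
    for perm in permutations(range(n)):
        # Keep only pairs that correspond to an actual request.
        edges = [(i, j) for i, j in enumerate(perm) if weights[i][j] > 0]
        s = score(edges, weights)
        if best_score is None or s > best_score:
            best_score, best_edges = s, edges
    return best_edges

def matching_size(edges, weights):
    return len(edges)

def matching_weight(edges, weights):
    return sum(weights[i][j] for i, j in edges)

# Hypothetical 2 x 2 request matrix.
weights = [[10, 1],
           [1, 0]]
max_size = best_matching(weights, matching_size)
max_weight = best_matching(weights, matching_weight)
```

Maximum size serves two packets from short queues, while maximum weight serves only the single most backlogged queue.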
95
Input Queueing
Scheduling
Maximum Size
Maximizes instantaneous throughput
Does it maximize long-term throughput? Not necessarily
Maximum Weight
Can clear most backlogged queues
But does it sacrifice long-term throughput? No.
96
Input Queueing
Why is serving long/old queues better than serving maximum
number of queues?
(Figure: average occupancy versus VOQ number, for uniform traffic and for non-uniform traffic)
97
Points to Ponder:
k queues
? N outputs
98
Odd-even switch
(Figure: 8 x 8 non-blocking switch between INPUTS and OUTPUTS; each input keeps two queues, Queue 1 (Odd) holding packets for odd-numbered outputs and Queue 2 (Even) holding packets for even-numbered outputs)
100
High-Speed Router Design
Outline:
Introduction
Router Generations
Table Lookup
Switch Fabric Design
Buffer Placement
  Input Port Queueing
  Output Port Queueing
  Combined Input Output Queueing
101
Output port queueing
N 1
103
Comments:
Neither an output-buffered switch nor an input-buffered VOQ switch
suffers from HOL blocking.
Does it mean they should have the same performance?
No.
Which one will be better?
Output-buffered switch.
Why?
Intuitively, consider the case that an output line is idle. For an
output-buffered switch, a packet (if any) in the output buffer can be sent
immediately; for an input-buffered VOQ switch, even if there are packets
waiting at the input buffers for this output port, they cannot always be
sent to the output port/line immediately because of the constraints
imposed by the VOQ scheduling algorithm.
e.g. using longest queue first, the queue destined for this output port may not
be the longest
104
Combined Input-output Queueing
Can we design an input-queued switch that functions
exactly as an output-queued switch?
105
Using Speedup
1
2
1
2
106
The Ideal Solution
Output Queued Switch
1
N N
=?
N
107
The findings
But
How to make such an algorithm fast and efficient for
real implementation?
108
High-Speed Router Design
Summary:
Introduction
Router Generations
Table Lookup
Switch Fabric Design
Buffer Placement
109