Professional Documents
Culture Documents
Anatomy
of Internet Routers
Josef Ungerman
Cisco, CCIE #6167
© 2013 Cisco and/or its affiliates. All rights reserved. Cisco Connect 1
Agenda
Packet Processors
• Lookup, Memories, ASIC, NP, TM, parallelism
• Examples, evolution trends
Switching Fabrics
• Interconnects and Crossbars
• Arbitration, Replication, QoS, Speedup, Resiliency
Router Anatomy
• Past, Present, Future – CRS, ASR9000
• 1Tbps per slot?
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public 2
Hardware Router
Control Plane vs. Data Plane
IC
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Hardware Router
Centralized Architecture
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Scaling the Forwarding Plane
NP Clustering
DRAM
RP
CPU
Centralized Router with ASIC Cluster
- ME3600X (fixed)
- ASR903 (modular with RSP)
- ASR1006 (modular with FP and RP)
Ctrl Mem Cluster Interconnect Ctrl Mem
NP TM NP DRAM
Pkt Mem IC, TM Pkt Mem
FP
Distributor
CPU
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Scaling the Forwarding Plane
Switching Fabric
DRAM
RP
CPU
Centralized Router with Switching Fabric
- Cisco 7600 without DFC (TM on LC)
- Cisco ASR9001
Switching
Ctrl Mem Fabric Ctrl Mem
NP TM NP DRAM
Pkt Mem TM Pkt Mem
FP
CPU
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Scaling the Forwarding Plane
Distributed Architecture
Distributed Router
DRAM
- Cisco 7600 with DFC (ES+)
RP
CPU
- Cisco 12000
- Cisco CRS
- Cisco ASR9000
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Packet Processors
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Packet Processing Trade-offs
Performance vs. Flexibility
CPU (Central Processing Unit)
• multi-purpose processors
CPU • high s/w flexibility [weeks], but low performance [1’s of Mpps]
• high power, low cost
• usage example: access routers (ISR’s)
Output Mux
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
“It is always something
RFC 1925
“The Twelve Networking Truths”
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public 10
Hardware Routing Terminology
BGP OSPF
LDP RSVP-TE
Static
ISIS EIGRP
LSD RIB RP
ARP
FIB Adjacency
SW FIB
AIB LC NPU
AIB: Adjacency Information Base
RIB: Routing Information Base
LC CPU FIB: Forwarding Information Base
LSD: Label Switch Database
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
FIB Memory & Forwarding Chain
TLU/PLU
memories storing Trie Data (today typically RLDRAM)
Typically multiple channels for parallel/pipelined lookup
PLU TLU
M-Trie or TBM
Root L3 Load L2 Load
L3 Table L2 Balancing
Balancing
Table
32-way 64-way
. . .
CAM (Content Addressable Memory) 0008.7c75.f431 01
“Associative Memory” 0008.7c75.c401 02
0008.7c75.f405 03
SRAM with a Comparator at each cell . . .
. . .
TCAM (Ternary CAM) 192.168.100.xxx 801
192.168.200.xxx 802
“CAM with a wildcard” (VMR) 192.168.300.xxx 803
CAM with a Selector at some cells . . .
IP Lookup Applications
L3 Switching (Dst Lookup) & RPF (Src Lookup) TCAM Evolution
CAM2 – 180nm, 80Msps, 4Mb, 72/144/288b wide
Netflow Implementation (flow lookup)
CAM3 – 130nm, 125 Msps, 18Mb, 72/144/288b wide
ACL Implementation (Filters, QoS, Policers…) CAM4 – 90nm, 250Msps, 40Mb, 80/160/320b wide
Parallelism Principle #1
Pipeline
• Systolic Array, 1-D Array
PLU/TLU Scale (Pipeline Depth):
DRAM SDRAM TCAM3 SRAMs • multiplies instruction cycle budget
• allows faster execution (MHz)
• non-linear gain with pipeline depth
ACL • typically not more than 8-10 stages
u-code L3 fwd CAM L2 fwd u-code QOS u-code
NF
DRAM
headers only
TM ASIC
DMA Logic
- 2K queues
- 2L shaping
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
16 Mpps, 10 Gbps
SMP Pipelining Programmable ASIC 130/90nm, u-programmable
2 per LC (Rx, Tx)
2004: Engine5 (SIP) – Cisco 12000 240W/10G = 24 W/Gbps
FIB
Parallelism Principle #2:
DRAM TCAM3 RLDRAM TCAM3 SRAMs SMP (Symmetric Multiprocessing)
• Multi-Core, Divide & Conquer
Scale (# of cores)
ACL
u-code u-code L3 fwd CAM L2 fwd u-code QOS u-code • Instruction/Thread/Process/App
NF
ACL granularity (CPU’s: 2-8 core)
u-code u-code L3 fwd CAM L2 fwd u-code QOS u-code
NF • IP: tiny repeating tasks (typically up
ACL
u-code u-code L3 fwd CAM L2 fwd u-code QOS u-code
to hundreds of cores)
NF
ACL
u-code u-code L3 fwd CAM L2 fwd u-code QOS u-code DRAM
NF
headers only
TM ASIC
DMA Logic
- 8K queues
- 2L shaping
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
SMP NPU 2004 QFA (SPP)
80 Mpps, 40 Gbps
QFA (Quantum Flow Array) – CRS 130nm, 188 cores
185M transistors
PLU/TLU Stats/Police ACL
2 per LC (Rx, Tx),~9 W/Gbps
DRAM RLDRAM
FCRAM SRAM TCAM4
TCAM3
256 Engines
188 Future
40nm version
400G
Pkt DRAM More cores, more MHz
Integrated TM
headers only Faster TCAM etc.
TM ASIC
64Kqueues
- 8K queues
Distribute & Gather Logic - 3L shaping
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
“If you were plowing a field, which would you rather use:
Two strong oxen or 1024 chickens?”
Seymour Cray
Clustering XC
2012 QFP
Processing Pool 32 Mpps, 60 Gbps
Fast Memory Access 45nm, C-programmable
160 Engines
256 Clustering capabilities
(64 PPEs x 4 threads)
(40 SOC, Integrated TM
on-chip RLDRAM2 7
sees full packet bodies
resources SRAM TCAM4 Central engine (ASR1K)
RLDRAM2 0
complete packets
Resources & Memory Interconnect
complete packets
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Pipelining SMP NPU
Cisco 7600 ES+, ASR9000
Pkt
Pkt DRAM
DRAM
DRAM TCAM SRAM
TM ASIC
Resources & Memory Interconnect - 32K queues
- 4L shaping
PARSE SEARCH1 RESOLVE SEARCH2 MODIFY
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Interconnects Technology Primer
Capacity vs. Complexity
arbiter
Bus
• half-duplex, shared media
• standard examples: PCI [800Mbps], PCIe [Nx 2.5Gbps], LDT/HT [25Gbs]
• simple and cheap
IC
Serial Interconnect
• full-duplex, point-to-point media
• standard examples: SPI [10Gbps], Interlaken [100Gbps]
• Ethernet interfaces are very common (SGMII, XAUI,…)
arbiter
Switching Fabric (cross-bar)
• full-duplex, any-to-any media
• proprietary systems [up to multiple Tbps]
• often uses double-counting (Rx+Tx) to express the capacity
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Q: What’s the capacity? 4 fabric ports @ 10Gbps
Switching Fabric A: (ENGINEERING) 4 * 10 = 40Gbps full-duplex
A: (MARKETING) 4 * 10 * 2 = 80Gbps
2
IP/MPLS
TX
ASIC
MULTICAST
to slots 2,3,4
IP/MPLS
3
TX
ASIC
UNICAST
to slot 3 Type B (1960)
Western Electric
IP/MPLS 4
TX 100-point 6-wire
ASIC
FIA (Fabric crossbar switch
Interface ASIC)
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Fabric Port engineering – examples
CRS-3/16 – 4.48Tbps (7+1)
8x 5Gbps links
5x 5Gbps links
• 16 linecards, 2 RP
• linecard up to 140G (next 400G)
• backwards compatible 14x 10G TX RX 100G
per-cell loadsharing
8 planes = 200Gbps
- 8/10 code = 160Gbps
- cell tax = ~141Gbps
Active RSP440
16x 7.5Gbps links
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
“Non-Blocking” voodoo
RFC1925: It is more complicated than you think.
Ingress Egress
Linecards Linecard
TX
10G 10G sRX Non-blocking!
• zero packet loss
TX
10G 10G RX • port-to-port traffic profile
• certain packet size
10G 10G RX
Blocking (same fabric)..?
TX
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Example: 16x Multicast Replication
Egress Replication
Ingress RX
Linecards
Good:
Egress Replication
TX RX
• Cisco CRS, 12000
RX
• Cisco ASR9K, 7600
TX
10Gbps of multicast
RX
eats 10Gbps fabric bw!
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
What if the fabric can’t replicate multicast?
Ingress Replication Flavors
Egress Linecards
Ingress RX Bad:
Linecards Ingress Replication
TX RX • central replication or
encapsulation engines
*) of course, this is used in centralized routers
TX RX
10Gbps of multicast
RX eats 160Gbps fabric bw!
63 64 (10G multicast impossible)
16 17 18 19 20 21 22 23 24 25 26 27 29 30 31 32
15
07
14
03
RX
Ingress Good-enough/Not-bad-enough:
13
06
12
RX
01
33 34 36 38 40 42 44 46 48
Linecards
11
10
02
TX RX
09
08
TX RX
• non-Cisco
TX RX
TX RX
10Gbps of multicast
RX
eats 80Gbps fabric bw!
RX
(10G multicast impossible)
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Cell dip explained
cell format cell cell payload
Fixed overhead [cell header, ~10%]
example:
hdr [48B] Relative overhead [fabric header]
[5B]
Variable overhead [padding]
Cell Tax effect on traffic: saw-tooth curve super-cell or super-frame (packet packing)
% of Linerate
cell buffer
hdr hdr
beginning of IP Packet 3
cell buffer
rest of IP Packet 3 hdr
IP Packet 4
hdr
40
100
200
300
400
500
600
700
800
900
100
0
110
0
120
0
130
0
140
0
150
0
BRKSPG-2772 L3 Packet Size
© 2012 Cisco [B]
and/or its affiliates. All rights reserved. Cisco Public
Q: Is this SF non-blocking?
Cell dip A: (MARKETING) Yes Yes Yes !
A: (ENGINEERING) Non-blocking for unicast
packet sizes above 53B.
100%
80%
60%
Cell dip – per vendor @ 41B, 46B, 53B, 103B,...
40%
20%
0%
1400
40
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1500
L3 Packet Size [B]
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Cell dip gets bad – too low speedup (non-Cisco)
MARKETING: this is non-blocking fabric*
*) because we can find at least one packet size that does not block
1N0%
Percentage of Linerate
100%
80%
60%
40%
20%
0%
1400
40
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1500
L3 Packet Size [B]
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Cell dip gets worse – multicast added (non-Cisco)
MARKETING: this is non-blocking fabric*
*) because we can still find at least one packet size that does not block, and your network does not have that much multicast anyway
1N0%
Percentage of Linerate
100%
80%
60%
40%
20%
0%
1400
40
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1500
L3 Packet Size [B]
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Router is blocking – 1 fabric cards fails (non-Cisco)
MARKETING: this is non-blocking fabric*
*) because nobody said the non-blocking capacity is “protected”
1N0%
Percentage of Linerate
100%
80%
60%
40%
20%
0%
1400
40
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1500
L3 Packet Size [B]
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
What is “Protected Non-Blocking”
8 Fabric Cards
CRS-1: 56G 49G 42G 2x egress speedup
CRS-3: 141G 123G 106G is always kept
CRS-1
40G non-blocking even with
1 or 2 failed fabric cards 40G
40G TX RX
100G 100G
CRS-3
100G eth. non-blocking with
1 or 2 failed fabric cards
X
X
failed RSP: 440G 220G Active RSP
ASR9000
TX
Active RSP
RX 200G non-blocking even with a
2x 100G 2x 100G failed RSP
X
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
HoLB (Head of Line Blocking) problem
Solution 1: Solution 2:
Traffic Lanes per direction Enough Room
wait
(= Virtual Output Queues) (= Speedup)
grrr -
blocked
go go
wait wait
go
Red Light, or
Traffic Lights “Traffic Jam
(= Arbiter) Ahead“ message Highway Radio
(= Backpressure) (= Flow Control)
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Good HoLB Solutions
Fabric Scheduling + Backpressure + QoS
Arbitrated: Implicit backpressure
Ingress Egress
arbiter + Virtual Output Queues (VOQ)
Linecards Linecard
10G 10G sRX
• Cisco 12000 (also ASR9000)
TX
• per-destination slot queues VOQ
IP ASIC fab ASIC
no Grant = implicit backpressure (Virtual Output Queues)
8q 10G 10G
8q
TX RX
• GSR: 16 slots + 2 RP’s * 8 CoS
+ 8 Multicast CoS =152 queues per LC
Virtual Output Queues (IP)
-8 per destination (IPP/EXP)
-Voice: strict scheduling
-Multicast: separate queues
Speedup Queues (packets)
explicit -u’cast: strict EF, AF, BE
backpressure -m’cast: strict Hi, Lo Output Buffered: Explicit backpressure
141G 226G RX
+ Speedup & Fabric Queues
TX
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
rnxm rmxn
Multi-stage Switching Fabrics crossbars
1
1
crossbars
1
n 1 2 1 n
…
Multi-stage Switching Fabric 2 2
N r … r N
m
50’s: Ch. Clos – general theory of multi-stage telephony switch Unidirectional Communication N=rxn
IP lookup,
ASR9000
IP lookup,
set VOQ Backup/Shadow
Egress EFP • Multi-stage Fabric
Primary Queuing
Arbiter Arbiter • Granular Central VOQ arbiter
VOQ • VOQ set per destination
FIA Q
NP Fab Q Fab Q NP Destination is the NP, not just slot
1NP 1NP TM
2
4 per 10G, 8 per 40G, 16 per 100G
2
NP
3NP
NP • 4 VOQ’s per set
3NP
4
NP 4 4 VOQ’s per destination, strict priority
NP
5NP 5NP Up to 4K VOQ’s per ingress FIA
6
NP
6
NP • Example (ASR9922):
7NP 7NP
8
20 LC’s * 8 10G NP’s * 4 VOQ’s
FIA 8
FIA = up to 640 VOQ’s per ingress FIA
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
Router Anatomy
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
2004: Cisco CRS – 40G+ per slot
midplane 40G
SPA TM TM
TM TM TM TM
NP NP
SPA CPU CPU
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
2010: Cisco CRS – 100G+ per slot
Next: 400G+ per slot (4x 100GE)
same backward-compatible architecture, same upgrade process
RP (active) RP (standby)
midplane 40G
SPA TM TM
TM TM TM TM
NP NP
SPA CPU CPU
100GE NP NP
midplane 140G
midplane 140G
TM TM
CMOS
DSP/ADC
TM TM TM TM 14x 10GE
NP NP
CPU CPU
141G
123G rx
4, 8 or 16 Linecard slots + 2 RP slots 226G
197G tx
QFA (Quantum Flow Array)
- 140 Gbps, 125 Mpps [programmable]
- one for Rx, one for Tx processing
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
CRS Multi-Chassis
RP (active) RP (standby)
midplane 40G
SPA TM S2 TM
S1 S3
TM TM TM TM
NP NP
SPA CPU CPU
100GE NP NP
midplane 140G
midplane 140G
TM TM
CMOS
DSP/ADC
TM TM TM TM 14x 10GE
NP NP
CPU CPU
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
CRS Multi-Chassis (Back-to-Back, 2+0)
RP (active) RP (standby)
midplane 40G
SPA TM
S13 S13
TM
TM TM TM TM
NP NP
SPA CPU CPU
100GE NP NP
midplane 140G
midplane 140G
TM TM
CMOS
DSP/ADC
TM TM TM TM 14x 10GE
NP NP
CPU CPU
Fabric Chassis
Shelf
RP (active) Controller RP (standby)
S2
midplane 40G
SPA TM
S1 S3
TM
TM TM TM TM
NP NP
SPA CPU CPU
100GE NP NP
midplane 140G
midplane 140G
TM TM
CMOS
DSP/ADC
TM TM TM TM 14x 10GE
NP NP
CPU CPU
CPU CPU
4 or 8 Linecard slots
(4 planes active)
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public
2011: Cisco ASR9000 – 200G+ per slot
Next: 800G+ per slot
new RSP, faster fabric, faster NPU’s, backwards compatible
RSP (Route/Switch Processor)
• CPU + Switch Fabric
RSP440 (active) • active/active fabric elements
Core or Edge LC – 24x 10GE RSP440 (fab. active) Core or Edge LC – 2x 100GE
TM NP’
NP NP’ TM
TM CPU CPU TM
TM NP’
NP NP’ TM
TM NP’ CPU CPU
NP NP’ TM
TM TM
TM NP’
NP NP’ TM
440G
220G
4 or 8 Linecard slots
Typhoon Network Processor
(4 planes active) - 120 Gbps, 90 Mpps [programmable]
BRKSPG-2772
- shared or dedicated
© 2012 Cisco and/or its affiliates. All rights reserved.
for Rx and Tx
Cisco Public
2012: Cisco ASR9922 – 500+G per slot
Next: 1.5T+ per slot
7 fabric cards, faster traces, faster NPU’s, backwards compatible
RP (active) RP (standby)
DWDM
TDM + Packet Optics
Switching &
Routing
DWDM
Commons
LASER CPAK
color 1
LASER Simple Laser Specifications
color 2
System
Signaling Optical
MUX
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public 49
NPU Model (40nm)
Terabit per-slot… Resources & Memory Interconnect
Packet Processors
• 45nm, 40nm, 28nm, 20nm process
• Integrated functions – TM, TCAM, CPU, RLDRAM, OTN,…
• ASIC slices – Firmware ISSU, Low-power mode, Partial Power-off
Router Anatomy
• Control plane enhancements – SDN, IP+Optical
• DWDM density – OTN/DWDM satellites
• 100GE density – CMOS optics, CPAK/CFP4
• 10GE density (TGE breakout cables) and GE density (GE satellites)
BRKSPG-2772 © 2012 Cisco and/or its affiliates. All rights reserved. Cisco Public 51
Thank You.
© 2013 Cisco and/or its affiliates. All rights reserved. Cisco Connect 52