
BRKARC-3222

Cisco Nexus 9000 Architecture

Tim Stevenson
Distinguished Engineer, Technical Marketing

CCIE 5561 Emeritus



© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public
Session Abstract
This session presents an in-depth study of the architecture of the latest generation
of Nexus 9000 modular and top-of-rack data center switches. Topics include
forwarding hardware, switching fabrics, and other physical design elements, as
well as a discussion of key hardware-enabled features and capabilities that
combine to provide high-performance data center network services.

What This Session Covers
• Latest generation of Nexus 9000 switches with Cloud Scale ASICs
• Nexus 9500 modular switches with Cloud Scale linecards
• Nexus 9300 Cloud Scale top-of-rack (TOR) switches
• System and hardware architecture, key forwarding functions, packet walks

Not covered:
• First generation Nexus 9000 ASIC/platform architectures
• Nexus 9500 merchant-silicon based architectures
• Other Nexus platforms
• Catalyst 9000 platform

Agenda

• Data Center and Silicon Strategy


• Cloud Scale Architecture
• Cloud Scale ASICs
• Forwarding and Features
• Cloud Scale Switching Platforms
• Packet Walks
• Key Takeaways
Nexus 9000 Switching Portfolio
Key Elements of the ASAP Data Center

Nexus 9500 X9400 / X9400-S: Merchant Broadcom XGS (Trident2+ / Tomahawk)
• Broadcom SOC solution
• Wide industry availability
• Published SDK

Nexus 9500 X9600-R / X9600-RX: Merchant Broadcom DNX (Jericho)
• Multi-chip architecture
• Large forwarding tables
• Deep packet buffer / VOQ
• Cell-based fabric

Nexus 9500 X9700-EX/FX and Nexus 9300-EX/FX/FX2: Cisco Cloud Scale (LSE / LS1800FX / S6400 / LS3600FX2) – focus of this session
• Cisco SOC solution
• Rich forwarding feature set
• Smart buffers
• Advanced telemetry
• Optimized scale, cost, power

Why Custom Silicon?
Cisco competitive advantage – vehicle for differentiating innovations
• Application Centric Infrastructure (ACI) policy model + congestion-aware flowlet switching
• Flexible forwarding tiles
• Single-pass tunnel encapsulations
• In-built encryption technologies
  • MACSEC, CloudSec
• Intelligent Buffers – DBP / AFD / DPP
• Streaming telemetry:
  • Flow Table for Tetration Analytics
  • Flow table event notifications
  • Streaming Statistics Export (SSX)

Tight integration between hardware / software / marketing / sales / support
• Closely aligns hardware designs with software innovations, strategic product direction, competitive differentiators, serviceability

Cisco Cloud Scale ASIC Family
• Ultra-high port densities → Reduces equipment footprint, enables device consolidation
• Multi-speed 100M/1/10/25/40/50/100G → Flexibility and future proofing
• Rich forwarding feature-set → ACI, Segment Routing, single-pass VXLAN routing
• Flexible forwarding scale → Single platform, multiple scaling alternatives
• Intelligent buffering → Shared egress buffer with dynamic, advanced traffic optimization
• In-built analytics and telemetry → Real-time network visibility for capacity planning, security, and debugging

Cloud Scale Family Members
LSE
• 1.8T chip – 2 slices of 9 x 100G each
• X9700-EX modular linecards; 9300-EX TORs
LS1800FX
• 1.8T chip – 1 slice of 18 x 100G with MACSEC
• X9700-FX modular linecards; 9300-FX TORs
S6400
• 6.4T chip – 4 slices of 16 x 100G each
• 9364C TOR; E2 fabric modules
LS3600FX2
• 3.6T chip – 2 slices of 18 x 100G with MACSEC + CloudSec
• 9300-FX2 TORs
[Figure: slice layout per ASIC. LSE – 18 x 100G: Slice 0 and Slice 1 at 900G each, joined by the slice interconnect. LS1800FX – 18 x 100G: single 1.8T Slice 0. S6400 – 64 x 100G: Slices 0-3 at 1.6T each, joined by the slice interconnect. LS3600FX2 – 36 x 100G: Slice 0 and Slice 1 at 1.8T each, joined by the slice interconnect.]
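The chip capacities listed for each family member follow from slices × ports per slice × 100G. A minimal Python sketch of that arithmetic, assuming a hypothetical `CloudScaleAsic` record (the class and field names are illustrative only; the numbers come from the slide):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudScaleAsic:
    """Hypothetical record; slice and port counts are taken from the slide."""
    name: str
    slices: int
    ports_per_slice: int  # 100G ports served by each slice

    @property
    def total_ports(self) -> int:
        return self.slices * self.ports_per_slice

    @property
    def capacity_tbps(self) -> float:
        # Every port is counted at 100 Gbps; 1000 Gbps = 1 Tbps.
        return self.total_ports * 100 / 1000


FAMILY = [
    CloudScaleAsic("LSE", slices=2, ports_per_slice=9),
    CloudScaleAsic("LS1800FX", slices=1, ports_per_slice=18),
    CloudScaleAsic("S6400", slices=4, ports_per_slice=16),
    CloudScaleAsic("LS3600FX2", slices=2, ports_per_slice=18),
]

for asic in FAMILY:
    print(f"{asic.name}: {asic.total_ports} x 100G = {asic.capacity_tbps}T")
```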
What Is a “Slice”?
• Self-contained forwarding complex controlling subset of ports on single ASIC
• Separated into Ingress and Egress functions
• Ingress of each slice connected to egress of all slices
• Slice interconnect provides non-blocking any-to-any interconnection between slices
[Figure: Ingress Slice 1 through Ingress Slice n connected via the slice interconnect to Egress Slice 1 through Egress Slice n]

Slice Forwarding Path
[Figure: forwarding path through one slice. Ingress direction (Ingress Forwarding Controller): Ingress Packets, Ingress MAC, Packet Parser, Lookup Pipeline (lookup key in, lookup result out), Replication, Slice Interconnect; the packet payload bypasses the lookup pipeline; an SSX block is present on S6400 / LS3600FX2 only. Egress direction (Egress Forwarding Controller): Slice Interconnect, Egress Buffering / Queuing / Scheduling, Egress Policy, Packet Rewrites, Egress MAC, Egress Packets.]

Ingress Lookup Pipeline
[Figure: lookup pipeline inside the Ingress Forwarding Controller. The Packet Parser (fed from the ingress MAC) produces the lookup key; pipeline blocks are Forwarding Lookup (flex tiles and TCAM), Ingress Classification (TCAM), Load Balancing / AFD / DPP, and the Flow Table with flush (LSE / LS1800FX / LS3600FX2 only); the lookup result is passed on toward the egress slice.]

Flexible Forwarding Tiles
• Provide fungible pool of table entries for lookups
• Number of tiles and number of entries in each tile varies between ASICs
• Variety of functions, including:
  • IPv4/IPv6 unicast longest-prefix match (LPM)
  • IPv4/IPv6 unicast host-route table (HRT)
  • IPv4/IPv6 multicast (*,G) and (S,G)
  • MAC address/adjacency tables
  • ECMP tables
  • ACI policy
[Figure: grid of flex tiles making up the forwarding lookup]

Flex Tile Routing Templates
• Configurable forwarding templates determine flex tile functions
  • “system routing template” syntax
• Templates as of NX-OS 7.0(3)I7(2):
  • Default
  • Dual-stack host scale*†
  • Internet peering*
  • LPM heavy
  • MPLS heavy*
  • Multicast heavy
  • Multicast NBM**
• Defined at system initialization – reboot required to change profile
* Template does not support IP multicast
† Template not supported on modular Nexus 9500
** Template not supported on TORs
[Figure: flex tile allocation under the Default, LPM Heavy, and Multicast Heavy templates]
IP Unicast Forwarding
• Router MAC match triggers L3 lookup
• Hardware performs exact-match on VRF and longest-match on IPDA
• Lookup result returns either adjacency pointer (index into MAC table), or ECMP pointer
• MAC table has output BD, rewrite MAC, and output port

[Figure: IP unicast lookup flow. Receive → Router MAC lookup in the MAC table on (BD,DMAC): “Is the DMAC a Router MAC?” → Route lookups in HRT/LPM on (VRF,IPDA): “What’s the longest match on the destination IP?” → the result is either an ADJ_PTR into the MAC table: “What are the output BD, rewrite MAC, and output port?”, or an ECMP_PTR into the ECMP table: “Which ECMP group should be used?”, which in turn resolves to an adjacency → BD, MAC, port → Transmit]
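The lookup chain described on this slide (router MAC match, exact match on VRF plus longest match on IPDA, then adjacency or ECMP pointer) can be sketched in a few lines of Python. This is an illustrative software model only: the table contents, VRF name, MAC addresses, pointer values and the `l3_lookup` function are all hypothetical, and a linear scan stands in for the hardware HRT/LPM lookup.

```python
import ipaddress

# Hypothetical table contents, for illustration only.
ROUTER_MACS = {"00:00:0c:9f:f0:01"}
FIB = {  # VRF -> list of (prefix, (result type, pointer))
    "tenant-a": [
        (ipaddress.ip_network("10.1.1.0/24"), ("adj", 7)),
        (ipaddress.ip_network("10.1.0.0/16"), ("ecmp", 3)),
    ],
}
ECMP_GROUPS = {3: [8, 9]}  # ECMP pointer -> member adjacency pointers
ADJACENCY = {  # adjacency pointer -> output BD, rewrite MAC, output port
    7: {"bd": 100, "rewrite_mac": "00:11:22:33:44:07", "port": "Eth1/1"},
    8: {"bd": 200, "rewrite_mac": "00:11:22:33:44:08", "port": "Eth1/49"},
    9: {"bd": 201, "rewrite_mac": "00:11:22:33:44:09", "port": "Eth1/50"},
}


def l3_lookup(dmac, vrf, ipda, flow_hash=0):
    """Return the adjacency for a routed packet, or None if not routed."""
    if dmac not in ROUTER_MACS:
        return None  # no router MAC match: no L3 lookup is triggered
    addr = ipaddress.ip_address(ipda)
    # Exact match on VRF, longest match on destination IP.
    matches = [(net, res) for net, res in FIB.get(vrf, []) if addr in net]
    if not matches:
        return None
    _, (kind, ptr) = max(matches, key=lambda m: m[0].prefixlen)
    if kind == "ecmp":
        members = ECMP_GROUPS[ptr]
        ptr = members[flow_hash % len(members)]
    return ADJACENCY[ptr]


print(l3_lookup("00:00:0c:9f:f0:01", "tenant-a", "10.1.1.237"))
```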

IP Tables
Several methods for storing IP prefixes in hardware:
• HRT – Hash table used for IPv4 /32 and IPv6 /128 host entries
• Provisioned from flex tiles
• LPM – Traditional prefix/mask entries, or combination of “pivot” and “trie” tiles,
used for other prefix lengths
• Provisioned from flex tiles
• TCAM – Handles overflow/hash collisions
• Traditional TCAM memory, front-ending flexible forwarding lookups

Pivot / Trie Tiles for Scaling LPM
• “Pivot” tiles are hash tables containing base prefixes – match “base mask” bits
• “Trie” tiles contain leaf entries for corresponding pivots – match up to 3 least-significant prefix bits
• Combination of pivot and trie lookups returns longest-match prefix entry and adjacency pointer

[Figure: pivot/trie lookup example for destination IP address 10.1.1.237 with a pivot tile mask range of /24-/27. The base prefix mask is the shortest mask length in the pivot tile’s mask range. The masked bits of the destination are hashed; the hash result picks a pivot table entry (Prefix | Trie | Index), and the masked bits are compared to the pivot entry contents (10.1.1.0/24, hit). The up to 3 unmasked bits (x.x.x.bbb) are compared to the trie entries in the trie tile (Trie | Offset, hit). Index (from pivot) + offset (from trie) indexes the info table to get the adjacency / ECMP pointer.]

Trie Tile Lookup
• Trie tiles contain leaf entries for corresponding pivots
• Up to 15 prefixes can be packed into one trie entry
• Much more efficient than consuming one table entry per prefix

Pivot tile mask range: /24-/27
IP from packet → 10.1.1.237
Binary → 00001010.00000001.00000001.11101101
• First 24 bits: matched using hash lookup in pivot table
• Next 3 bits: matched using trie lookup
• Last 5 bits: don’t care

Prefixes in FIB:
IP Prefix        Binary                                 Trie node
10.1.1.0/24      00001010.00000001.00000001.********    *
10.1.1.128/25    00001010.00000001.00000001.1*******    1
10.1.1.64/26     00001010.00000001.00000001.01******    01
10.1.1.224/27    00001010.00000001.00000001.111*****    111

Trie node order: * | 1 0 | 11 10 01 00 | 111 110 101 100 011 010 001 000
Trie bitmap: 110001010000000 (one bit per node in the order above; bits set for *, 1, 01, and 111)
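A small Python sketch of the trie half of this lookup, assuming the node order drawn on the slide (breadth-first, with the "1" branch listed before the "0" branch at each level) and a /24-/27 pivot range. The function names and the bitmap-as-list representation are illustrative, not a description of the ASIC's internal encoding; the pivot hash lookup is assumed to have already matched 10.1.1.0/24.

```python
import ipaddress

BASE_LEN = 24   # shortest mask length in the pivot tile's range (/24-/27)
TRIE_BITS = 3   # up to 3 bits beyond the base mask are matched in the trie
NODES = (1 << (TRIE_BITS + 1)) - 1  # 1 + 2 + 4 + 8 = 15 nodes per trie entry


def trie_node_index(extra_bits: str) -> int:
    """Index of a trie node, breadth-first with the '1' branch first."""
    depth = len(extra_bits)
    level_start = (1 << depth) - 1
    value = int(extra_bits, 2) if extra_bits else 0
    return level_start + ((1 << depth) - 1 - value)


def build_trie_bitmap(prefixes):
    """Set one bit per prefix that falls under the pivot's base prefix."""
    bitmap = [0] * NODES
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")
        bitmap[trie_node_index(bits[BASE_LEN:net.prefixlen])] = 1
    return bitmap


def longest_match_len(bitmap, dest_ip):
    """Walk from the deepest node upward; return matched prefix length."""
    bits = format(int(ipaddress.ip_address(dest_ip)), "032b")
    trie_bits = bits[BASE_LEN:BASE_LEN + TRIE_BITS]
    for depth in range(TRIE_BITS, -1, -1):
        if bitmap[trie_node_index(trie_bits[:depth])]:
            return BASE_LEN + depth
    return None


PREFIXES = ["10.1.1.0/24", "10.1.1.128/25", "10.1.1.64/26", "10.1.1.224/27"]
bitmap = build_trie_bitmap(PREFIXES)
print("".join(map(str, bitmap)), longest_match_len(bitmap, "10.1.1.237"))
```

Four prefixes share a single trie entry here, which is the packing benefit the slide describes.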
VXLAN Forwarding
• VXLAN and other tunnel encapsulation/decapsulation performed in single pass
• Encapsulation
  • L2/L3 lookup drives tunnel destination
  • Rewrite block drives outer header fields (tunnel MACs/IPs/VNID, etc.)
• Decapsulation
  • Packet parser determines whether and what type of tunnel packet
  • Forwarding pipeline determines whether tunnel is terminated locally, drives inner lookups
[Figure: Encapsulation – Receive → L2/L3 lookups (MAC / LPM / HRT) on BD,DMAC and VRF,IPDA: “Is packet destined to remote VTEP?” → DST_INTF / ADJ_PTR → tunnel rewrite via ADJ PTRs (outer MACs / destination IPs / VNID): “What are the tunnel header values?”. Decapsulation – Receive → Parser: “Is this a tunnel packet?” → My TEP table on (outer VRF,IPDA): “Is the tunnel destination a TEP I terminate?” → inner L2/L3 lookups on (inner MAC/VRF/IP): “If Yes, process inner packet headers” → Rewrite: strip outer header and rewrite inner packet.]

Load Sharing
Equal-Cost Multipath (ECMP)
• Static flow-based load-sharing
• Picks ECMP next-hop based on hash of packet fields and universal ID
  • Source / destination IPv4 / IPv6 address (L3)
  • Source / destination TCP / UDP ports (L4)
  • L3 + L4 (default)
  • GRE key field
Dynamic Load-Balancing (DLB)
• Supported on leaf switches in ACI fabric
• Congestion aware, flow-based or flowlet-based – rebalances flows/flowlets based on path congestion

Flow versus Flowlet
Flow
• 5-tuple of packet values
• All packets traverse same path
• Different flows may traverse different paths
Flowlet
• Series of back-to-back packets of 5-tuple flow
• Gap of a minimum period between packets represents flowlet boundary
• Different flowlets may traverse different paths
[Figure: Flow 1, Flow 2, and Flow 3 shown as separate flows; Flow 1 is then split into Flowlet 1, Flowlet 2, and Flowlet 3 wherever the gap between packets reaches the minimum gap]
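The flowlet boundary rule can be illustrated with a short Python sketch that splits the packet arrival times of one 5-tuple flow wherever the gap reaches a minimum period. The function name, the timestamps and the gap value are hypothetical; the slide does not state the actual gap used by the hardware.

```python
def split_into_flowlets(timestamps, min_gap):
    """Group packet arrival times of one flow into flowlets.

    A new flowlet starts whenever the gap since the previous packet
    is at least min_gap (same time unit as the timestamps).
    """
    flowlets = []
    for ts in timestamps:
        if flowlets and ts - flowlets[-1][-1] < min_gap:
            flowlets[-1].append(ts)   # still back-to-back: same flowlet
        else:
            flowlets.append([ts])     # gap reached: flowlet boundary
    return flowlets


# Hypothetical arrival times (microseconds) for a single flow.
arrivals = [0, 1, 2, 50, 51, 120]
for number, flowlet in enumerate(split_into_flowlets(arrivals, min_gap=10), 1):
    print(f"Flowlet {number}: {flowlet}")
```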

ECMP versus DLB Load-Sharing
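[Figure: ECMP – the flow hash from the packet, modulo the ECMP link count from the IP lookup, gives a link offset; the offset is added to the ECMP base (the ECMP pointer from the lookup) to index the ECMP table, which returns the output port → Transmit. DLB – the DLB flow hash from the packet indexes the flowlet table (64K entries); an existing flowlet keeps its output port; for a new flowlet, the dynamic rate estimator selects the least congested link among the DLB output port candidates, and the flowlet and port are added to the table → Transmit.]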
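The two selection schemes on this slide can be contrasted in a short Python sketch. It assumes CRC32 as a stand-in hash (the real hash function is not given in the source), a caller-supplied per-link load figure in place of the dynamic rate estimator, and a hypothetical minimum-gap value; `ecmp_select` and `DlbSelector` are illustrative names.

```python
import zlib


def ecmp_select(flow_key: bytes, ecmp_base: int, link_count: int) -> int:
    """Static ECMP: base pointer + (flow hash modulo link count)."""
    return ecmp_base + zlib.crc32(flow_key) % link_count


class DlbSelector:
    """Flowlet-based selection: new flowlets go to the least loaded link."""

    TABLE_SIZE = 65536  # 64K flowlet table entries, per the slide

    def __init__(self, links, min_gap):
        self.links = list(links)
        self.min_gap = min_gap       # hypothetical flowlet gap
        self.flowlet_table = {}      # table index -> (link, last seen time)

    def select(self, flow_key: bytes, now, link_load):
        index = zlib.crc32(flow_key) % self.TABLE_SIZE
        entry = self.flowlet_table.get(index)
        if entry is not None and now - entry[1] < self.min_gap:
            link = entry[0]          # existing flowlet keeps its port
        else:
            # New flowlet: pick the least congested candidate link.
            link = min(self.links, key=lambda l: link_load[l])
        self.flowlet_table[index] = (link, now)
        return link


dlb = DlbSelector(["Eth1/49", "Eth1/50"], min_gap=10)
print(dlb.select(b"flow-1", now=0, link_load={"Eth1/49": 5, "Eth1/50": 1}))
```

With ECMP a flow never moves while the link count is unchanged; with DLB it can move, but only at a flowlet boundary.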
Multicast Forwarding
• Multicast source and group forwarding entries populated in HRT
• Additional, secondary table for multicast also provisioned (“MC_INFO”) from flex
tiles
• MET table in egress slice holds output interface list (OIL)
• Replication is single copy, multiple reads
[Figure: multicast lookup flow. Receive → route lookups in HRT on (VRF,IPSA) and (VRF,IPDA): “Is there a (*,G) or (S,G) match?” → PTR into MC_INFO for RPF: “Did this packet pass RPF check?” and “Where’s the OIL for this mroute stored?” → MET_Index into MET: “Which interfaces are part of the OIL and require replication?” → replication over the MET list → Transmit]
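The three-table chain (HRT, then MC_INFO, then MET) can be modelled with plain dictionaries. This Python sketch is illustrative only: the VRF name, addresses, interface names, pointer values and the `multicast_lookup` function are hypothetical, and preferring an (S,G) entry over a (*,G) entry is assumed as the usual multicast behavior rather than stated on the slide.

```python
# Hypothetical table contents, for illustration only.
HRT = {  # (VRF, source or "*", group) -> pointer into MC_INFO
    ("red", "10.0.0.5", "239.1.1.1"): 1,   # (S,G) entry
    ("red", "*", "239.1.1.1"): 2,          # (*,G) entry
}
MC_INFO = {  # pointer -> RPF interface and MET index
    1: {"rpf_intf": "Eth1/1", "met_index": 12},
    2: {"rpf_intf": "Eth1/2", "met_index": 20},
}
MET = {  # MET index -> output interface list (OIL)
    12: ["Eth1/3", "Eth1/4"],
    20: ["Eth1/5"],
}


def multicast_lookup(vrf, source, group, in_intf):
    """Return the output interfaces for a multicast packet ([] = drop)."""
    ptr = HRT.get((vrf, source, group))        # try (S,G) first
    if ptr is None:
        ptr = HRT.get((vrf, "*", group))       # fall back to (*,G)
    if ptr is None:
        return []                              # no mroute match
    info = MC_INFO[ptr]
    if info["rpf_intf"] != in_intf:
        return []                              # RPF check failed
    return list(MET[info["met_index"]])        # one copy read per OIL entry


print(multicast_lookup("red", "10.0.0.5", "239.1.1.1", "Eth1/1"))
```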
Classification TCAM
• Dedicated TCAM for packet classification
• Capacity varies depending on platform
• Leveraged by variety of features:
  • RACL / VACL / PACL
  • L2/L3 QOS
  • SPAN / SPAN ACL
  • NAT
  • COPP
  • Flow table filter (LS1800FX / LS3600FX2)
[Figure: classification TCAM drawn as banks of 256 entries for the ingress slice and the egress slice. LSE: 4K ingress ACEs / 2K egress ACEs. LS1800FX / S6400 / LS3600FX2: 5K ingress ACEs / 2K egress ACEs.]
TCAM Region Resizing
• Default carving allocates 100% of TCAM and enables:
  • Ingress / Egress RACL
  • Ingress QOS
  • SPAN
  • SPAN ACLs
  • Flow table filter (LS1800FX / LS3600FX2 only)
  • Reserved regions
• Based on features required, user can resize TCAM regions to adjust scale
  • To increase size of a region, some other region must be sized smaller
• Region sizes defined at initialization – changing allocation requires system reboot
  • Configure all regions to desired size (“hardware access-list tcam region”), save configuration, and reload
[Figure: default TCAM carving for the ingress slice and the egress slice, showing RACL, QOS, SPAN, SACL, FT, and RSVD regions]
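Because the default carving already allocates 100% of the TCAM, any resize plan has to fit within the same total. A minimal Python sketch of that budget check, assuming regions are sized in multiples of the 256-entry banks drawn on the Classification TCAM slide; the region names and sizes below are hypothetical and do not reflect the actual default carving.

```python
BANK_SIZE = 256  # assumed carving granularity, from the 256-entry banks drawn


def validate_carving(regions: dict, capacity: int) -> int:
    """Check a proposed carving; return the number of unallocated entries."""
    for name, size in regions.items():
        if size < 0 or size % BANK_SIZE:
            raise ValueError(f"{name}: size {size} is not a multiple of {BANK_SIZE}")
    used = sum(regions.values())
    if used > capacity:
        raise ValueError(
            f"carving needs {used} entries but only {capacity} exist; "
            "shrink another region first"
        )
    return capacity - used


# Hypothetical ingress carving on a 4K-entry (LSE-sized) ingress TCAM.
plan = {"racl": 2048, "qos": 512, "span": 512, "span-acl": 256, "reserved": 768}
print(validate_carving(plan, capacity=4096))
```

Growing `racl` by one bank in this plan raises an error until some other region is reduced by the same amount, which is the trade-off the slide describes.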

Flow Table / Flow Table Events
• LSE / LS1800FX / LS3600FX2 platforms support hardware flow table logic
• 32K flow table entries per slice + triggered event-based flow data capture
• Collects full flow information plus metadata for:
• Tetration Analytics
• Fabric Insights or third-party analytics platform
• Netflow Data Export v9

Flow Table and Flow Table Events Logic
Flow table / FTE operation for telemetry:
1. Determine if FT/FTE enabled for flow
2. Install FT record; capture FTE records if triggered
3. Flush FT / FTE records, encapsulate in IP/UDP
4. Submit packet for lookup
[Figure: ingress flows pass through the parser; the flow key goes to the filter TCAM: “Is flow collection and/or flow events enabled for this packet?”. Admitted flows create flow records in the flow table, which are flushed to collector(s): “Is export interval reached?”. Event notifications create FTE records in the FTE FIFO: “Did this packet trigger any events?”, flushed to collector(s): “Is event FIFO full (or timer popped)?”. The lookup key is submitted to the lookup pipeline for the forwarding lookup, which returns the lookup result.]

Flow Table Events
Event triggers:
• Packet value match
• Buffer drop
• ACL drop
• Latency threshold
• Microburst threshold
• Forwarding exception

Flow Table Event Record: Flow Data | Timestamp | Port | Queue | Buffer Drop
[Figure: troubleshooting example – “Packet drops! I see queue drops – but who’s affected?!” → Enable Flow Table Event on packet buffer drops → “Got it!”]

Full Netflow
Netflow v9 support:
• 9300-FX TORs: 7.0(3)I7(1)
• 9300-EX TORs: 7.0(3)I7(2)
Flow table operation for full Netflow:
1. Install FT records as usual
2. Flush FT records every 100 milliseconds, send to switch CPU via forwarding pipeline
3. CPU builds traditional Netflow cache in software
4. CPU exports NDEv9 to collector(s) every 10 seconds
[Figure: ingress flows pass through the parser; the flow key goes to the filter TCAM, and admitted flows create flow records in the hardware flow table. When the hardware export interval is reached, records are flushed to the switch CPU; the switch control plane holds flow records in a software flow cache and, when its export interval is reached, exports NDEv9 to collector(s). The lookup key goes to the lookup pipeline, which returns the lookup result.]

Streaming Statistics Export (SSX)
• Streams statistics and other ASIC-level data
• Direct export from ASIC – no switch CPU involvement
• User defines streaming parameters – which statistics, how often, and to which collector
• Hardware support in S6400 / LS3600FX2
[Figure: on S6400 / LS3600FX2, SSX periodically pulls values from ASIC stats, tables, and registers, builds an SSX packet (L2/L3 headers, metadata, TLV-1, TLV-2 ... TLV-n, CRC), and submits it through the parser and lookup pipeline for forwarding lookup]
Buffering
• Cloud Scale platforms implement shared-memory egress buffered architecture
• Each ASIC slice has dedicated buffer – only ports on that slice can use that buffer
• Dynamic Buffer Protection adjusts max thresholds based on class and buffer occupancy
• Intelligent buffer options maximize buffer efficiency

ASIC         Slices   Buffer per slice   Total buffer
LSE          2        18.7MB             37.4MB
S6400        4        10.2MB             40.8MB
LS1800FX     1        40.8MB             40.8MB
LS3600FX2    2        20MB               40MB

Intelligent Buffering
Innovative Buffer Management for Cloud Scale switches
• Dynamic Buffer Protection (DBP) – Controls buffer allocation for congested
queues in shared-memory architecture
• Approximate Fair Drop (AFD) – Maintains buffer headroom per queue to
maximize burst absorption
• Dynamic Packet Prioritization (DPP) – Prioritizes short-lived flows to expedite
flow setup and completion

Miercom Report: Speeding Applications in Data Center Networks


http://miercom.com/cisco-systems-speeding-applications-in-data-center-networks/

Dynamic Buffer Protection (DBP)
• Prevents any output queue from consuming more than its fair share of buffer in
shared-memory architecture
• Defines dynamic max threshold for each queue
• If queue length less than threshold, packet is admitted
• Otherwise packet is discarded

• Threshold calculated by multiplying free memory by configurable Alpha (α) value (weight)
• “queue-limit dynamic alpha-value” in queuing policy
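The admission rule above (admit while queue length is below α × free buffer) implies a steady state when several queues are congested at once: if N queues each settle at depth Q in a buffer of size B, then Q = α · (B − N · Q), so Q = α · B / (1 + α · N). The Python sketch below works through that arithmetic. The function names are illustrative, the 40 MB buffer is just an example size, and the closed form assumes all N congested queues share the same α and nothing else occupies the buffer.

```python
def dbp_admit(queue_len: float, free_buffer: float, alpha: float) -> bool:
    """DBP rule from the slide: admit only below the dynamic threshold."""
    return queue_len < alpha * free_buffer


def steady_state(total_buffer: float, congested_queues: int, alpha: float):
    """Per-queue depth and free buffer once every congested queue
    sits exactly at its threshold: Q = alpha * (B - N * Q)."""
    per_queue = alpha * total_buffer / (1 + alpha * congested_queues)
    free = total_buffer - congested_queues * per_queue
    return per_queue, free


# Example: 40 MB of shared buffer, three alpha values, 1 to 64 congested queues.
for alpha in (0.5, 1, 14):
    for n in (1, 2, 4, 8, 16, 32, 64):
        per_queue, free = steady_state(40.0, n, alpha)
        print(f"alpha={alpha:<4} queues={n:<3} per-queue={per_queue:6.2f} MB "
              f"free={free:6.2f} MB")
```

A larger α lets a single congested queue take most of the buffer, while a smaller α always leaves more free buffer in reserve.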

Alpha Parameter Examples
[Figure: three charts plotting buffer per queue (MB) and free buffer (MB), on a 0-40 MB axis, against the number of oversubscribed queues (1, 2, 4, 8, 16, 32, 64). Alpha (α) = 0.5: buffer per queue == ½ free buffer. Alpha (α) = 1: buffer per queue == free buffer. Alpha (α) = 14: buffer per queue == 14 x free buffer. A callout at the top right of the slide reads “Default Alpha on Cloud Scale switches”.]

Buffering – Ideal versus Reality
Ideal buffer state
• Sustained-bandwidth TCP flows back off before all buffer consumed
• Part of the buffer is consumed by sustained-bandwidth TCP flows; the rest stays available for burst absorption
Actual buffer state
• Sustained-bandwidth TCP flows consume all available buffer before backing off
[Figure: side-by-side buffer diagrams, each divided into “Buffer available for burst absorption” and “Buffer consumed by sustained-bandwidth TCP flows”]

Approximate Fair Drop (AFD)
Maintain throughput while minimizing buffer consumption by elephant flows – keep buffer
state as close to the ideal as possible
1. Distinguish elephant flows from other flows
2. Track elephant flows and adjust AFD drop probability
3. Enforce AFD at egress queue
[Figure: AFD pipeline. Receive → flow data from packet is hashed to index into the Elephant Trap table (8K flows): “Does this flow exceed Elephant threshold?” → Track Elephants (1K flows): “Monitor Elephant flow bandwidth, set AFD drop probability” → egress queuing, driven by the drop probability and the AFD desired queue depth: “Has this queue reached AFD queue-desired watermark?” → Transmit or Drop]
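The three AFD steps can be sketched loosely in Python. This is not the ASIC's algorithm, which the source does not specify. It assumes a byte-count elephant threshold, a drop probability of 1 − fair rate / flow rate for flows above the fair rate, and a simple proportional adjustment of the fair rate toward the desired queue depth. All names, constants and the gain value are hypothetical.

```python
class ElephantTrap:
    """Step 1: flag flows whose byte count exceeds a threshold."""

    def __init__(self, threshold_bytes: int = 1_000_000):  # hypothetical value
        self.threshold_bytes = threshold_bytes
        self.bytes_seen = {}

    def observe(self, flow_id, packet_bytes: int) -> bool:
        total = self.bytes_seen.get(flow_id, 0) + packet_bytes
        self.bytes_seen[flow_id] = total
        return total > self.threshold_bytes  # True means elephant


def afd_drop_probability(flow_rate: float, fair_rate: float) -> float:
    """Step 2: elephants above the fair rate are dropped in proportion
    to how far they exceed it; flows at or below it are never dropped."""
    if flow_rate <= fair_rate:
        return 0.0
    return 1.0 - fair_rate / flow_rate


def update_fair_rate(fair_rate, queue_depth, desired_depth, gain=0.1):
    """Step 3 input: lower the fair rate when the queue is deeper than
    desired, raise it when the queue is shallower (assumed control law)."""
    error = (desired_depth - queue_depth) / desired_depth
    return max(0.0, fair_rate * (1 + gain * error))


trap = ElephantTrap()
print(trap.observe("flow-1", 600_000), trap.observe("flow-1", 600_000))
print(afd_drop_probability(flow_rate=40.0, fair_rate=10.0))
```

Mice flows never enter the tracking stage in this model, so only elephants see drops as the queue approaches the desired depth.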

Dynamic Packet Prioritization (DPP)
• Prioritize initial packets of new / short-lived flows
• Up to first 1K packets assigned to higher-priority qos-group

[Figure: DPP pipeline. Receive → flow data from packet is hashed to index into the Prioritization Flow Table (64K flows): “Has this flow exceeded prioritization threshold?” → No: drive new priority (prioritization); Yes: maintain original priority → egress queuing → Transmit]
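A minimal Python sketch of the DPP decision, assuming a per-entry packet counter in a hashed 64K-entry table and CRC32 as a stand-in hash. The class name, the return labels and the default threshold of 1000 packets (one reading of "1K") are illustrative assumptions, and aging of table entries is not modelled.

```python
import zlib


class DppClassifier:
    """Prioritize the first packets of each flow, then fall back."""

    def __init__(self, threshold: int = 1000, table_size: int = 65536):
        self.threshold = threshold      # packets that receive priority
        self.table_size = table_size    # 64K flows, per the slide
        self.packet_counts = {}         # table index -> packets seen

    def classify(self, flow_key: bytes) -> str:
        index = zlib.crc32(flow_key) % self.table_size
        count = self.packet_counts.get(index, 0) + 1
        self.packet_counts[index] = count
        # Below the threshold: drive the higher-priority qos-group.
        # Beyond it: maintain the packet's original priority.
        return "prioritized" if count <= self.threshold else "original"


dpp = DppClassifier(threshold=3)
print([dpp.classify(b"10.0.0.1:1234->10.0.0.2:80/tcp") for _ in range(5)])
```

Short-lived flows finish entirely inside the prioritized window, while long-lived flows quickly revert to their original class.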

Queuing and Scheduling
[Figure: per egress port, unicast/multicast queue pairs UC0/MC0 through UC7/MC7 feed Class 0 through Class 7, with the unicast and multicast queue of each class served 50/50 DWRR; the user classes use configurable weights / priority; a dedicated CPU class is strict priority and a dedicated SPAN class is best effort; the final winner is sent to the egress port]

• 8 user classes and 16 queues per output port (8 unicast, 8 multicast)
• QOS-group drives class; egress queuing policy defines class priority and weights
• Dedicated classes for CPU traffic and SPAN traffic

Ingress QOS / Egress Queuing Policies
• Default QOS behavior:
  • All user data goes to q-default
  • Trust received QOS markings
• To select egress queue, use “set qos-group” in ingress QOS policy
• To set/change packet markings, use “set cos / precedence / dscp” in ingress QOS policy
• To change queuing behavior, manipulate egress queuing policies
[Figure: ingress QOS policy – “set qos-group” selects the queue; “set dscp”, “set prec”, and “set cos” mark the packet. Egress queuing policy – queue-limit, priority, bandwidth, AFD, shape.]

Cloud Scale Platforms
Nexus 9300-EX and 9300-FX/FX2
• Premier TOR platforms
• Full Cloud Scale functionality
• ACI leaf / standalone leaf or spine
• FX option with MACSEC using LS1800FX silicon
• FX2 option with key enhancements using LS3600FX2 silicon

Nexus 9500 X9700-EX and X9700-FX Modules
• Switching modules for Nexus 9500 modular chassis
• Full Cloud Scale functionality
• ACI spine / standalone aggregation or spine
• FX option with MACSEC using LS1800FX silicon

Nexus 9300-EX Cloud Scale TOR Switches

Key Features
• Dual capability – ACI and NX-OS mode
• Flexible port configurations – 1/10/25/40/50/100G
• Native 25G server access ports
• Flow Table / FTE for Tetration Analytics, Fabric Insights, Netflow
• Smart buffer capability (AFD / DPP)

Models
• N9K-C93180YC-EX – LSE-based: 48-port 10/25G SFP28 + 6-port 100G QSFP28; ACI: 1.3(1); NX-OS: 7.0(3)I4(2)
• N9K-C93108TC-EX – LSE-based: 48-port 1/10GBASE-T + 6-port 100G QSFP28; ACI: 2.0(1); NX-OS: 7.0(3)I4(2)
• N9K-C93180LC-EX – LSE-based: 32-port 40G/50G/100G QSFP28; ACI: 2.2(1); NX-OS: 7.0(3)I6(1)

Nexus 9300-EX Switch Architectures

[Figure: C93180YC-EX (10/25G + 100G) / C93108TC-EX (10G + 100G) – CPU plus one LSE serving front-panel ports 1-48 and 49-54. C93180LC-EX (40/50G + 100G) – CPU plus one LSE serving front-panel ports 1-28 and 29-32. Port groups are marked by slice (Slice 0, Slice 1).]


Nexus 9300-FX Cloud Scale TOR Switches – Pervasive MACSEC
Key Features
• Dual capability – ACI and NX-OS mode
• Flexible port configurations – 100M/1/10/25/40/50/100G
• Line-rate 256-bit encryption on all ports
• 32G FC support on all SFP ports
• 25G distances beyond 3m (RS-FEC)
• Flow Table / FTE for Tetration Analytics, Fabric Insights, Netflow
• Smart buffer capability (AFD / DPP)

Models
• N9K-C93180YC-FX – LS1800FX-based: 48-port 10/25G SFP28 + 6-port 100G QSFP28; ACI: 2.2(2e); NX-OS: 7.0(3)I7(1)
• N9K-C93108TC-FX – LS1800FX-based: 48-port 1/10GBASE-T + 6-port 100G QSFP28; ACI: 2.2(2e); NX-OS: 7.0(3)I7(1)
• N9K-C9348GC-FXP – LS1800FX-based: 48-port 100M/1GBASE-T + 4-port 10G/25G + 2-port 100G QSFP28; ACI: 3.0(1); NX-OS: 7.0(3)I7(1)

Nexus 9300-FX Switch Architectures

[Figure: C93180YC-FX (10/25G + 100G) / C93108TC-FX (10G + 100G) – CPU plus one LS1800FX serving front-panel ports 1-48 and 49-54. C9348GC-FXP (100M/1G + 10/25G + 100G) – CPU plus one LS1800FX serving front-panel ports 1-48, 49-52, and 53-54. All ports are on Slice 0.]


Nexus 9364C 100G Cloud Scale Switch

Key Features
• Dual capability – ACI and NX-OS mode
• Compact, high-performance fixed ACI spine
• 100G/50G/40G/10G (single port mode – no breakout)
• 2 x 100M/1G/10G SFP+ ports
• MACSEC/CloudSec on 16 ports
• Streaming Statistics Export (SSX)
• Smart buffer capability (AFD / DPP)

Model
• N9K-C9364C – S6400-based: 64-port 100G QSFP28 + 2-port 10G SFP+; ACI: Roadmap; NX-OS: 7.0(3)I7(2)

Nexus 9364C Switch Architecture

[Figure: C9364C (100G + 10G) – CPU plus one S6400; eight CloudSec blocks (1-8); front-panel port groups 1-48, 49-64, and 65-66 (10G); port groups are marked by slice (Slice 0 through Slice 3)]

Nexus 9300-FX2 Cloud Scale TOR Switches

Key Features
• Dual capability – ACI and NX-OS mode
• Versatile standalone 100G switch
• Compact, high-performance fixed ACI spine
• 100G/50G/40G/10G with breakout capability
• Flow Table / FTE for Tetration Analytics, Fabric Insights, Netflow
• Streaming Statistics Export (SSX)
• MACSEC/CloudSec on all ports
• VXLAN ESI multi-homing
• Smart buffer capability (AFD / DPP)

Models
• N9K-C9336C-FX2 – LS3600FX2-based: 36-port 100G QSFP28; ACI/NX-OS: Roadmap
• N9K-C93240YC-FX2 – LS3600FX2-based: 48-port 10/25G SFP28 + 12-port 100G QSFP28; NX-OS: Roadmap

Nexus 9300-FX2 Switch Architecture

[Figure: C9336C-FX2 (100G) – CPU plus one LS3600FX2 serving front-panel ports 1-36. C93240YC-FX2 (10/25G + 100G) – CPU plus one LS3600FX2 serving front-panel ports 1-48 and 49-60. Port groups are marked by slice (Slice 0, Slice 1).]


Nexus 9500 Modular Cloud Scale Switches

[Figure: Nexus 9504 / Nexus 9508 / Nexus 9516 common equipment + E-Series fabric modules + EX / FX series line cards]

X9700-EX 100G Cloud Scale Modules
N9K-X9732C-EX / N9K-X9736C-EX
• Line-rate performance up to 3.2Tbps capacity
• Advanced features –
  • Smart buffer capability (AFD / DPP)
  • Flexible forwarding tables
  • VXLAN routing
• 32 / 36 x QSFP28-based 100G ports
  • Pin-compatible with 40G QSFP+
  • Flexible speed ports – 1 / 10 / 25 / 40 / 50 / 100G capability
• Supported in ACI and NX-OS standalone mode

Models
• X9732C-EX – LSE-based: 32-port 100G QSFP28; ACI: 1.3(1); NX-OS: 7.0(3)I4(2)
• X9736C-EX – LSE-based: 36-port 100G QSFP28; ACI: Roadmap; NX-OS: 7.0(3)I6(1)

N9K-X9732C-EX / N9K-X9736C-EX Architecture
[Figure: both modules have an LC CPU with EOBC and four LSEs (LSE 1-4), each LSE with 8 x 100G (with speedup) to the fabric modules. X9732C-EX – front-panel ports 1-8, 9-16, 17-24, 25-32; 8 x 100G front-panel ports per LSE. X9736C-EX – front-panel ports 1-9, 10-18, 19-27, 28-36; 9 x 100G front-panel ports per LSE. Port groups are marked by slice (Slice 0, Slice 1).]

X9700-EX 10/25G + 100G Cloud Scale Module
N9K-X97160YC-EX
• 1.6Tbps capacity with line-rate performance
• Advanced features –
  • Smart buffer capability (AFD / DPP)
  • Flexible forwarding tables
  • VXLAN routing
• 48 x SFP28-based 25G ports
  • Pin-compatible with 1G SFP and 10G SFP+
  • Flexible speed ports – 1 / 10 / 25G capability
• 4 x QSFP28-based 100G ports
  • Pin-compatible with 40G QSFP+
  • Flexible speed ports – 1 / 10 / 25 / 40 / 50 / 100G capability
• Supported in NX-OS standalone mode only

Model
• X97160YC-EX – LSE-based: 48p 10/25G SFP+ and 4p 100G QSFP28; NX-OS: 7.0(3)I5(2)

N9K-X97160YC-EX Architecture
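[Figure: X97160YC-EX – LC CPU with EOBC and two LSEs, each with 8 x 100G (with speedup) to the fabric modules. LSE 1 serves front-panel ports 1-12, 25-36, 49, and 50; LSE 2 serves front-panel ports 13-24, 37-48, 51, and 52; 24 x 10/25G and 2 x 100G front-panel ports per LSE. Port groups are marked by slice (Slice 0, Slice 1).]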

X9700-FX 100G Cloud Scale Module
N9K-X9736C-FX
• 3.2Tbps capacity, line-rate performance at 170-byte frames
• Advanced features –
  • Line-rate MACSEC on all ports
  • CloudSec encryption (8 ports)
  • Smart buffer capability (AFD / DPP)
  • Flexible forwarding tables
  • VXLAN routing
• 36 x QSFP28-based 100G ports
  • Pin-compatible with 40G QSFP+
  • Flexible speed ports – 1 / 10 / 25 / 40 / 50 / 100G capability
• Supported in ACI and NX-OS standalone mode

Model
• X9736C-FX – LS1800FX-based: 36p 100G QSFP28; ACI: Roadmap; NX-OS: Roadmap

N9K-X9736C-FX Architecture
[Figure: X9736C-FX – LC CPU with EOBC and four LS1800FX ASICs (1-4), each with 8 x 100G (with speedup) to the fabric modules, plus four CloudSec blocks (1-4). Front-panel ports per ASIC: 1-7 and 29-30; 8-14 and 31-32; 15-21 and 33-34; 22-28 and 35-36. 9 x 100G front-panel ports per LS1800FX: 7 x 100G MACSEC capable ports and 2 x 100G MACSEC / CloudSec capable ports. All ports are on Slice 0.]

MACSEC Hardware Encryption
• Provides link-level hop-by-hop encryption
• IEEE 802.1AE 128-bit and 256-bit AES encryption with MKA Key Exchange
• Native hardware support available on:
  • All ports on X9736C-FX linecard
  • All ports on Nexus 93180YC-FX / 93108TC-FX switches
  • 16 x 100G ports on Nexus 9364C switch
  • All ports on Nexus 9336C-FX2 / N9K-C93240YC-FX2 switches

MACSEC Frame Format
Original Frame:  DMAC | SMAC | EType | Payload | FCS
Encrypted Frame: DMAC | SMAC | SEC-Tag | EType | Payload | ICV | FCS

CloudSec Hardware Encryption
• Provides VTEP-to-VTEP encryption
• Encrypts VXLAN header and payload for transport over arbitrary IP network
• Hardware support available on:
  • 8 x 100G ports on X9736C-FX linecard
  • 16 x 100G ports on Nexus 9364C
  • All ports on 9300-FX2 TORs
  • No support on other TOR switches
CloudSec Frame Format
• Original VXLAN Packet: DMAC | SMAC | EType | IP | UDP | VXLAN | Payload | FCS
• Encrypted Packet: DMAC | SMAC | EType | IP | UDP* | SEC-Tag | VXLAN | Payload | ICV | FCS
* CloudSec UDP dest port
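The difference between the MACSEC and CloudSec frame formats comes down to where the SEC-Tag is placed. The sketch below only manipulates field names in the order shown on the slides; it performs no encryption, and the helper names (`protect`, `macsec_layout`, `cloudsec_layout`) are hypothetical.

```python
# Field order of the unprotected frames, as drawn on the slides.
ORIGINAL_FRAME = ["DMAC", "SMAC", "EType", "Payload", "FCS"]
ORIGINAL_VXLAN = ["DMAC", "SMAC", "EType", "IP", "UDP", "VXLAN", "Payload", "FCS"]


def protect(fields: list, sectag_after: str) -> list:
    """Insert SEC-Tag after the given field and ICV just before the FCS."""
    out = []
    for name in fields:
        if name == "FCS":
            out.append("ICV")  # integrity check value precedes the regenerated FCS
        out.append(name)
        if name == sectag_after:
            out.append("SEC-Tag")
    return out


def macsec_layout(fields: list) -> list:
    # MACSEC: SEC-Tag follows the MAC addresses, so everything after SMAC is protected.
    return protect(fields, "SMAC")


def cloudsec_layout(fields: list) -> list:
    # CloudSec: SEC-Tag follows the outer UDP header (which uses the CloudSec
    # destination port), so the outer IP/UDP headers stay routable.
    return protect(fields, "UDP")


if __name__ == "__main__":
    print(" | ".join(macsec_layout(ORIGINAL_FRAME)))
    print(" | ".join(cloudsec_layout(ORIGINAL_VXLAN)))
```

Keeping the outer IP and UDP headers ahead of the SEC-Tag is what lets a CloudSec packet cross an arbitrary IP network, whereas a MACSEC frame is only meaningful on the directly attached link.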


Cloud Scale Fabric Modules – FM-E and FM-E2
• Cloud Scale linecards require Cloud Scale fabric modules
• Provide up to 3.2Tbps capacity per IO module slot with 4 FMs
• Note: Cloud Scale FMs support X9700-EX and X9700-FX modules only
Fabric module ASICs:
• 9504-FM-E: 1 x ASE2 (32 x 100G)
• 9508-FM-E: 2 x ASE2 (32 x 100G each)
• 9508-FM-E2: 1 x S6400 (64 x 100G)
• 9516-FM-E: 4 x ASE2 (64 x 50G each)
• 9516-FM-E2: 2 x S6400 (64 x 100G each)
Software support:
• N9K-C9504-FM-E – ACI: 1.3(1), NX-OS: 7.0(3)I4(2)
• N9K-C9508-FM-E – ACI: 1.3(1), NX-OS: 7.0(3)I4(2)
• N9K-C9508-FM-E2 – ACI/NX-OS: Roadmap
• N9K-C9516-FM-E – ACI: Roadmap, NX-OS: 7.0(3)I5(2)
• N9K-C9516-FM-E2 – ACI/NX-OS: Roadmap
Cloud Scale Fabric Connectivity – Nexus 9504
[Figure: 4 x 9504-FM-E (FM2, FM3, FM4, FM6), one ASE2 per FM, connected to the LSE / LS1800FX ASICs on each linecard.]
• 8 x 100G for each LSE/LS1800FX, 2 x 100G to each ASE2
• X9732C-EX / X9736C-EX (LSE 1-4): 32 x 100G per module = 3.2T / slot
• X97160YC-EX (LSE 1-2): 16 x 100G per module = 1.6T / slot
• X9736C-FX (LS1800FX 1-4): 32 x 100G per module = 3.2T / slot
• 32 x 100G per FM / 4 slots = 800G per slot per FM
• 800G x 4 FM x 4 slots = 12.8T per system

Cloud Scale Fabric Connectivity – Nexus 9508
[Figure: 4 x 9508-FM-E (FM2, FM3, FM4, FM6), two ASE2 per FM, connected to the LSE / LS1800FX ASICs on each linecard.]
• 8 x 100G for each LSE/LS1800FX, 1 x 100G to each ASE2
• X9732C-EX / X9736C-EX (LSE 1-4): 32 x 100G per module = 3.2T / slot
• X97160YC-EX (LSE 1-2): 16 x 100G per module = 1.6T / slot
• X9736C-FX (LS1800FX 1-4): 32 x 100G per module = 3.2T / slot
• 64 x 100G per FM / 8 slots = 800G per slot per FM
• 800G x 4 FM x 8 slots = 25.6T per system
Cloud Scale Fabric Connectivity – Nexus 9516 FM-E
[Figure: 4 x 9516-FM-E (FM2, FM3, FM4, FM6), four ASE2 per FM, connected to the LSE / LS1800FX ASICs on each linecard.]
• 16 x 50G for each LSE/LS1800FX, 1 x 50G to each ASE2
• X9732C-EX / X9736C-EX (LSE 1-4): 64 x 50G per module = 3.2T / slot
• X97160YC-EX (LSE 1-2): 32 x 50G per module = 1.6T / slot
• X9736C-FX (LS1800FX 1-4): 64 x 50G per module = 3.2T / slot
• 256 x 50G per FM / 16 slots = 800G per slot per FM
• 800G x 4 FM x 16 slots = 51.2T per system
• Note: 50G flow limit in Nexus 9516 chassis with FM-E
Cloud Scale Fabric Connectivity – Nexus 9516 FM-E2
[Figure: 4 x 9516-FM-E2 (FM2, FM3, FM4, FM6), two S6400 per FM, connected to the LSE / LS1800FX ASICs on each linecard.]
• 8 x 100G for each LSE/LS1800FX, 1 x 100G to each S6400
• X9732C-EX / X9736C-EX (LSE 1-4): 32 x 100G per module = 3.2T / slot
• X97160YC-EX (LSE 1-2): 16 x 100G per module = 1.6T / slot
• X9736C-FX (LS1800FX 1-4): 32 x 100G per module = 3.2T / slot
• 128 x 100G per FM / 16 slots = 800G per slot per FM
• 800G x 4 FM x 16 slots = 51.2T per system
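The per-slot and per-system figures on the four fabric connectivity slides all follow the same arithmetic. As a quick cross-check, the sketch below recomputes them from the ASIC counts and link speeds given for each fabric module; the class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fabric:
    chassis: str
    slots: int            # linecard slots in the chassis
    asics_per_fm: int     # ASE2 or S6400 ASICs on each fabric module
    links_per_asic: int   # fabric links per ASIC
    link_gbps: int        # speed of each fabric link
    fms: int = 4          # fabric modules installed

    def per_slot_per_fm_gbps(self) -> int:
        total = self.asics_per_fm * self.links_per_asic * self.link_gbps
        return total // self.slots

    def per_slot_gbps(self) -> int:
        return self.per_slot_per_fm_gbps() * self.fms

    def system_tbps(self) -> float:
        return self.per_slot_gbps() * self.slots / 1000


FABRICS = [
    Fabric("Nexus 9504, FM-E", slots=4, asics_per_fm=1, links_per_asic=32, link_gbps=100),
    Fabric("Nexus 9508, FM-E", slots=8, asics_per_fm=2, links_per_asic=32, link_gbps=100),
    Fabric("Nexus 9516, FM-E", slots=16, asics_per_fm=4, links_per_asic=64, link_gbps=50),
    Fabric("Nexus 9516, FM-E2", slots=16, asics_per_fm=2, links_per_asic=64, link_gbps=100),
]

if __name__ == "__main__":
    for f in FABRICS:
        print(f"{f.chassis}: {f.per_slot_per_fm_gbps()}G per slot per FM, "
              f"{f.per_slot_gbps() / 1000}T per slot, {f.system_tbps()}T per system")
```

Every chassis works out to 800G per slot per FM and 3.2T per slot with four FMs; what changes is how many links of what speed carry it, which is why the 9516 with FM-E has a 50G flow limit while the FM-E2 variant does not.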

Agenda

• Data Center and Silicon Strategy


• Cloud Scale Architecture
• Cloud Scale ASICs
• Forwarding and Features
• Cloud Scale Switching Platforms
• Packet Walks
• Key Takeaways
Packet Walk (TOR) – IP Unicast
[Figure: 93180YC-EX, single LSE; Slice 0 (ingress) → Slice Interconnect → Slice 1 (egress).]
Slice 0 (MAC → Packet Parser → IFC Lookup Pipeline):
• Receive frame, FCS checking
• Extract header fields, generate lookup key
• VLAN checking, MTU checks
• Longest-match prefix lookup, adjacency/ECMP pointer, DLB/ECMP decision
• Ingress policy enforcement, AFD/DPP, flow table / FTE
Slice Interconnect:
• Receive from Slice 0, transmit to Slice 1
Slice 1 (EFC: Buffering / Queuing / Scheduling → Egress Policy → Rewrites → MAC):
• Receive from slice interconnect, buffer packet in queue (DBP), AFD drops, scheduling
• Egress policy enforcement
• Rewrite SMAC/DMAC, TTL decrement, write QOS fields
• FCS generation, transmit frame
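The core forwarding steps in this walk (longest-match prefix lookup, ECMP decision, then SMAC/DMAC rewrite and TTL decrement) can be illustrated in software. This is a conceptual sketch only: the FIB contents, MAC addresses, and the CRC32-based flow hash are hypothetical stand-ins and do not describe how the LSE implements these functions in hardware.

```python
import ipaddress
import zlib

# Hypothetical FIB: prefix -> list of ECMP next hops as (name, next-hop MAC).
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): [("nh-a", "00:00:00:00:00:0a")],
    ipaddress.ip_network("10.1.0.0/16"): [
        ("nh-b", "00:00:00:00:00:0b"),
        ("nh-c", "00:00:00:00:00:0c"),
    ],
}


def longest_match(dst: str):
    """Return the most specific FIB prefix covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [prefix for prefix in FIB if addr in prefix]
    return max(matches, key=lambda p: p.prefixlen) if matches else None


def ecmp_pick(next_hops: list, flow: tuple):
    """Pick one next hop per flow; the same 5-tuple always maps to the same path."""
    key = "|".join(map(str, flow)).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


def forward(pkt: dict, router_mac: str = "00:00:00:00:00:01"):
    """Ingress lookup plus egress rewrite; returns the rewritten packet or None."""
    prefix = longest_match(pkt["dst"])
    if prefix is None or pkt["ttl"] <= 1:
        return None  # no route, or TTL would expire
    flow = (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
    _, dmac = ecmp_pick(FIB[prefix], flow)
    return {**pkt, "smac": router_mac, "dmac": dmac, "ttl": pkt["ttl"] - 1}


if __name__ == "__main__":
    pkt = {"src": "192.0.2.1", "dst": "10.1.2.3", "proto": 6,
           "sport": 40000, "dport": 443, "ttl": 64}
    print(forward(pkt))
```

Hashing on the flow tuple rather than per packet keeps packets of one flow on one path, which avoids reordering.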

Packet Walk (TOR) – Multicast
[Figure: 9364C with one S6400 ASIC; Slices 0-3 joined by the slice interconnect, with Slice 0 as the ingress slice.]
Ingress slice (MAC → Packet Parser → IFC Lookup Pipeline):
• (*,G) or (S,G) lookup
• RPF check
• MET pointer
• Replication to all slices
Each egress slice (EFC: Replication → Buffering / Queuing / Scheduling → Egress Policy → Rewrites → MAC):
• MET lookup
• Replication to local OIL
• Policy and rewrites for each copy
• Slice with no receivers drops packet
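The two-stage replication in this walk (one copy to every slice, then per-slice replication to the local OIL) can be modeled in a few lines. The sketch is purely conceptual: the route table, interface names, and per-slice MET layout are hypothetical and only mirror the steps named on the slide.

```python
NUM_SLICES = 4  # the S6400 in the 9364C is drawn with Slices 0-3

# Hypothetical multicast route: group -> RPF interface and per-slice local OIL.
MROUTES = {
    "239.1.1.1": {
        "rpf": "Eth1/1",
        "met": {1: ["Eth1/17"], 3: ["Eth1/49", "Eth1/50"]},
    },
}


def replicate(group: str, in_port: str) -> list:
    """Return the (slice, egress port) copies produced for one ingress packet."""
    route = MROUTES.get(group)
    if route is None or in_port != route["rpf"]:
        return []  # lookup miss or RPF check failure
    copies = []
    for slice_id in range(NUM_SLICES):           # replication to all slices
        local_oil = route["met"].get(slice_id, [])
        if not local_oil:
            continue                             # slice with no receivers drops its copy
        for port in local_oil:                   # replication to the local OIL
            copies.append((slice_id, port))
    return copies


if __name__ == "__main__":
    print(replicate("239.1.1.1", "Eth1/1"))
    print(replicate("239.1.1.1", "Eth1/2"))
```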

Packet Walk (Modular) – Multicast
[Figure: linecards (X9732C-EX / X9736C-EX with LSE 1-4, X97160YC-EX with LSE 1-2) and an FM-E2 fabric module (S6400, Slices 0-3); each LSE has Slice 0 and Slice 1 joined by a slice interconnect.]
Ingress linecard LSE:
• (*,G) or (S,G) lookup
• RPF check
• MET pointer
• Replication to all slices
• MET lookup
• Replication to local OIL (front panel and/or fabric)
Fabric module (FM-E2, S6400):
• (*,G) or (S,G) lookup
• MET pointer
• Replication to all slices
• Multicast replication to local OIFs (fabric links)
Egress linecard LSE:
• (*,G) or (S,G) lookup
• MET pointer
• Replication to all slices
• Multicast replication to local OIFs

Packet Walk – VXLAN Encapsulation
[Figure: 93180YC-EX, single LSE; Slice 0 (ingress) → Slice Interconnect → Slice 1 (egress).]
Slice 0 (MAC → Packet Parser → IFC Lookup Pipeline):
• L2/L3 lookup
• Adjacency pointer
• Remote tunnel endpoint
• ECMP decision
Slice 1 (EFC: Buffering / Queuing / Scheduling → Egress Policy → Rewrites → MAC):
• Add L2 / IP / UDP / VXLAN header
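For reference, the VXLAN header added during encapsulation is 8 bytes long as defined in RFC 7348: a flags byte with the I bit (0x08) set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The sketch below builds and parses that header only; the outer L2 / IP / UDP headers are omitted, and the function names are illustrative.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI = 0x08   # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits). Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, as a decapsulating VTEP would."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


if __name__ == "__main__":
    hdr = vxlan_header(5000)
    print(hdr.hex(), parse_vni(hdr))
```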

Packet Walk – VXLAN Decapsulation
[Figure: 93180YC-EX, single LSE; in this walk Slice 1 is the ingress slice and Slice 0 the egress slice, joined by the slice interconnect.]
Slice 1 (MAC → Packet Parser → IFC Lookup Pipeline):
• Packet Parser: tunnel check, extract header fields, generate lookup key
• Lookup Pipeline: destination TEP lookup, inner L2/L3 lookups, adjacency/ECMP pointer, ECMP decision
Slice 0 (EFC: Buffering / Queuing / Scheduling → Egress Policy → Rewrites → MAC):
• Rewrites: remove outer headers, rewrite SMAC/DMAC, decrement TTL

Agenda

• Data Center and Silicon Strategy


• Cloud Scale Architecture
• Cloud Scale ASICs
• Forwarding and Features
• Cloud Scale Switching Platforms
• Packet Walks
• Key Takeaways
Nexus 9000 – Market Momentum
• 14,500+ Nexus 9K Customers Globally
• 4500+ ACI Customers
• 65+ Ecosystem Partners

Key Takeaways
• You should now have a thorough
understanding of the Nexus 9000 Cloud
Scale switching platform architecture
• Feature-rich, innovative switching
platform addresses virtually every
deployment scenario
• Nexus 9000 Cloud Scale platform forms
foundation of the ASAP Data Center

Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Tech Circle
• Meet the Engineer 1:1 meetings
• Related sessions

Thank you
