Monitoring and
Troubleshooting
Nexus 9000 Switches
Yogesh Ramdoss
Principal Engineer, Customer Experience
@YogiCisco
BRKDCN-3020
#CLUS
Cisco Webex Teams
Questions?
Use Cisco Webex Teams to chat
with the speaker after the session
How
1 Find this session in the Cisco Live Mobile App
2 Click “Join the Discussion”
3 Install Webex Teams or go directly to the team space
4 Enter messages/questions in the team space
#CLUS © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 3
Agenda
• Introduction
• Monitor and Health-Check
• Troubleshooting Tools
• Troubleshooting Traffic Forwarding
• Common Failure Scenarios and Recommendations
• Summary and Take-Aways
Introduction
Switching Architecture Changes
Shifting of Internal Architecture
[Figure: shift of internal architecture from a shared bus design (Data BUS, Result BUS, Ethernet Out of Band Channel) to CROSSBAR and then to multiple SOCs]
Consolidation of Functions
[Figure: functions of a legacy linecard (2 x 40Gbps fabric channels, EoBC, fabric interface, fabric ASIC, LC CPU, LC inband, arbitration aggregator, distributed forwarding card with L2 FWD and L3 FWD, FIRE ASICs, port ASICs, CTS ASICs) consolidated into SOCs: a linecard with SOC 1 to SOC 12 serving 4 x 10G ports each, and the S6400. FWD = Forwarding]
Generations of Nexus 9000
1. NFE: Merchant Silicon
2. NFE + ASE: Leverages merchant Silicon + Cisco ASIC to enhance services
3. SOC: Switch “is” the ASIC
4. Multiple SOCs (SOC SOC SOC SOC)
Nexus 9000 Product Family
Focus For This Session
ASICs                             Platforms
StrataXGS Trident*                94XX, 9636
StrataXGS Tomahawk*               9432C, C950X-FM-S
StrataXGS Trident* + Northstar    9396, 93128, 95XX
StrataXGS Trident* + Donner       9372, 9332, 93120
StrataDNX Jericho*                X9600-R/X-9600-RX
Tahoe-Sugarbowl                   93XX-EX, 97XX-E/EX
Tahoe-Lacrosse                    92XX, C950X-FM-E
Tahoe-Davos                       92160YC
Rocky-Homewood                    F/FX/FXP
Rocky-Bigsky                      9364C, C950X-FM-E2
Rocky-Heavenly                    FX2
* Merchant Silicon from Broadcom
This session is going to discuss …
Building Data Center Fabrics with Nexus 9000
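[Figure: data center fabric designs built with Nexus 9000 and managed by DCNM: L3 fabrics, a VXLAN / EVPN fabric with route reflectors (RR), and L2 / L3 designs with VPC, connecting hypervisors]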
Nexus 9000
… platform of possibilities
Monitor
and
Health-Check
Agenda
• Introduction
• Monitor and Health-Check
  • Hardware Diagnostics
  • On-board Failure Logging (OBFL)
  • Device Resource Usage
  • Control-Plane Policing (CoPP)
  • Hardware Rate-Limiters (HWRL)
  • Recommended Software
• Troubleshooting Tools
• Troubleshooting Traffic Forwarding
• Common Failure Scenarios and Recommendations
• Summary and Take-Aways
Hardware Diagnostics
Categories
Bootup Diagnostics
Run at bootup and detect faulty hardware before it is brought online by
NX-OS
Runtime or Health-Monitoring Diagnostics
Runtime diagnostics, also called Health Monitoring (HM) diagnostics, are
non-disruptive. They detect runtime hardware errors, memory errors,
hardware degradation, software faults, and resource exhaustion on
a live device.
On-demand Diagnostics
Run once or at user-designated intervals and help localize faults.
Hardware Diagnostics
Bootup Diagnostics
Supervisor Engine:
• USB – Checks integrity at the initialization of the USB Controller.
• ManagementPortLoopback – Tests loopback on the management port
• EOBCPortLoopback* – Checks health of Ethernet Out-of-Band Channel
(EOBC), which is used for communication between Supervisor engine(s) and
modules
• OBFL - Verifies the integrity of the On-Board Failure Log (OBFL) flash
Module
• OBFL - Verifies the integrity of the OBFL flash
Other tests (Supervisor Engine): NVRAM, RealTimeClock, PrimaryBootROM, SecondaryBootROM, BootFlash, USB, SystemMgmtBus, Console
Other tests (Module): ASICRegisterCheck, PrimaryBootROM, SecondaryBootROM, PortLoopback, RewriteEngineLoopback
Hardware Diagnostics
On-Demand Diagnostics
• On-demand tests help localize faults and are usually needed:
• to respond to an event that has occurred, such as isolating a fault.
• in anticipation of an event that may occur, such as a resource exceeding its
utilization limit.
Hardware Diagnostics
Configuration and Commands
Setting bootup diagnostic level
N93128# config t
N93128(config)# diagnostic bootup level [bypass | complete]
Hardware Diagnostics
Configuration and Commands
Diagnostics tests status and testing intervals:
N93128# show diagnostic content module <mod | all>
Diagnostics test suite attributes:
B/C/* - Bypass bootup level test / Complete bootup level test / NA
P/* - Per port test / NA
M/S/* - Only applicable to active / standby unit / NA
D/N/* - Disruptive test / Non-disruptive test / NA
H/O/* - Always enabled monitoring test / Conditionally enabled test / NA
F/* - Fixed monitoring interval test / NA
X/* - Not a health monitoring test / NA
E/* - Sup to line card test / NA
L/* - Exclusively run this test / NA
T/* - Not an ondemand test / NA
A/I/* - Monitoring is active / Monitoring is inactive / NA
Module 1: 1/10G-T Ethernet Module (Active)
Testing Interval
ID Name Attributes (hh:mm:ss)
____ __________________________________ ____________ _________________
1) USB---------------------------> C**N**X**T* -NA-
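The attribute string is easier to read once it is decoded against the legend. The sketch below assumes that each character position in the string corresponds to one legend line, in the order printed, and that `*` means NA; the function name `decode_attributes` is hypothetical and not part of NX-OS.

```python
# Legend transcribed from the "show diagnostic content" output above,
# one dictionary per attribute position (left to right).
# Assumption: position N in the attribute string maps to legend line N.
LEGEND = [
    {"B": "Bypass bootup level test", "C": "Complete bootup level test"},
    {"P": "Per port test"},
    {"M": "Only applicable to active unit", "S": "Only applicable to standby unit"},
    {"D": "Disruptive test", "N": "Non-disruptive test"},
    {"H": "Always enabled monitoring test", "O": "Conditionally enabled test"},
    {"F": "Fixed monitoring interval test"},
    {"X": "Not a health monitoring test"},
    {"E": "Sup to line card test"},
    {"L": "Exclusively run this test"},
    {"T": "Not an ondemand test"},
    {"A": "Monitoring is active", "I": "Monitoring is inactive"},
]


def decode_attributes(attrs):
    """Return the meanings of all non-NA flags in an attribute string."""
    if len(attrs) != len(LEGEND):
        raise ValueError("expected %d attribute characters" % len(LEGEND))
    meanings = []
    for position, (flag, options) in enumerate(zip(attrs, LEGEND)):
        if flag == "*":          # NA for this position
            continue
        if flag not in options:
            raise ValueError("unexpected flag %r at position %d" % (flag, position))
        meanings.append(options[flag])
    return meanings


for meaning in decode_attributes("C**N**X**T*"):
    print(meaning)
```

For the USB test shown above, this reads as: complete bootup level, non-disruptive, not a health monitoring test, and not an on-demand test.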
On-Board Failure Logging (OBFL)
Why do we need it, and what does it do?
• OBFL logs failure data to persistent storage
• Persistent storage – Non-volatile flash memory on the modules.
Accessible for analysis.
• Enabled by default for all features
• OBFL flash supports a limited number of read-write operations
On-Board Failure Logging (OBFL)
Configuration and Status
N93128(config)# hw-module logging onboard ?
<CR>
counter-stats Enable/Disable OBFL counter statistics
cpuhog Enable/Disable OBFL cpu hog events
environmental-history Enable/Disable OBFL environmental history
error-stats Enable/Disable OBFL error statistics
interrupt-stats Enable/Disable OBFL interrupt statistics
module Enable/Disable OBFL information for Module
obfl-logs Enable/Disable OBFL (boot-uptime/device-version/obfl-history)
N93128# show logging onboard status
----------------------------
OBFL Status
----------------------------
Switch OBFL Log: Enabled
Module: 1 OBFL Log: Enabled
card-boot-history Enabled
card-first-power-on Enabled
<snip>
On-Board Failure Logging (OBFL)
On-Board Failure Logging - Sample
Nearly 20 different options!
N93128# show logging onboard exception-log
---------------------------------------------------------------
Module: 1
---------------------------------------------------------------
<snip>
exception information --- exception instance 1 ----
Device Id : 49
Device Name : Temperature-sensor
Device Errorcode : 0xc3101203
Device ID : 49 (0x31)
Device Instance : 01 (0x01)
Dev Type (HW/SW) : 02 (0x02)
ErrNum (devInfo) : 03 (0x03)
System Errorcode : 0x4038001e Module recovered from minor temperature alarm
Error Type : Minor error
PhyPortLayer :
Port(s) Affected :
<snip>
Time : Sun Jan 20 19:42:08 2019
Device Resource Usage
Hardware Capacity
Resource What it gives?
Device Resource Usage
Hardware Capacity – Forwarding (Contd.)
----------------------------------------------------------------------------
Used Free Percent Utilization
----------------------------------------------------------------------------
<snip>
LOU 4 11 26.66
Both LOU Operands 4
Single LOU Operands 0
LOU L4 src port: 2
LOU L4 dst port: 2
LOU L3 packet len: 0
LOU IP tos: 0
LOU IP dscp: 0
LOU ip precedence: 0
LOU ip TTL: 0
TCP Flags 0 16 0.00
Protocol CAM 2 244 0.81
Mac Etype/Proto CAM 0 14 0.00
L4 op labels, Tcam 0 0 30 0.00
L4 op labels, Tcam 1 0 62 0.00
Ingress Dest info table 0 512 0.00
Egress Dest info table 0 512 0.00
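The Percent Utilization column can be cross-checked from the Used and Free columns. This is a minimal sketch, assuming utilization is simply used / (used + free); the function name is hypothetical. With the LOU row, 4 / (4 + 11) works out to 26.67 when rounded, so the 26.66 in the output suggests the device truncates rather than rounds.

```python
def utilization_percent(used, free):
    """Utilization as a percentage of the total (used + free) entries."""
    total = used + free
    if total == 0:
        return 0.0
    return 100.0 * used / total


# (used, free, percent shown in the output above)
rows = {
    "LOU": (4, 11, 26.66),
    "TCP Flags": (0, 16, 0.00),
    "Protocol CAM": (2, 244, 0.81),
}

for name, (used, free, shown) in rows.items():
    computed = utilization_percent(used, free)
    print("%-14s computed %.4f  shown %.2f" % (name, computed, shown))
```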
Device Resource Usage
Hardware Capacity – TCAM
N9K-C93180YC-EX# show system internal access-list globals
slot 1
=======
<snip>
INSTANCE 0 TCAM Region Information:
Ingress:
Region                      TID   Base   Size   Width
----------------------------------------------------------
NAT                          13      0      0      1
Ingress PACL                  1      0      0      1
Ingress VACL                  2      0      0      1
Ingress RACL                  3      0   1792      1
Ingress RBACL                 4      0      0      1
Ingress L2 QOS                5   1792    256      1
Ingress L3/VLAN QOS           6   2048    512      1
Ingress SUP                   7   2560    512      1
Ingress L2 SPAN ACL           8   3072    256      1
Ingress L3/VLAN SPAN ACL      9   3328    256      1
Ingress FSTAT                10      0      0      1
SPAN                         12   3584    512      1
Ingress REDIRECT             14      0      0      1
Ingress NBM                  30      0      0      1
-------------------------------------------------
Total configured size: 4096
Remaining free size: 0
Note: Ingress SUP region includes Redirect region
Egress:
Region                      TID   Base   Size   Width
-------------------------------------------------
Egress VACL                  15      0      0      1
Egress RACL                  16      0   1792      1
Egress SUP                   18   1792    256      1
Egress L2 QOS                19      0      0      1
Egress L3/VLAN QOS           20      0      0      1
Egress CoPP                  36      0      0      1
-------------------------------------------------
Total configured size: 2048
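The region sizes in this output add up exactly to the totals it reports, which is a quick sanity check when reviewing TCAM carving. The sketch below transcribes the Size column from the output above; the helper name `carve_check` is hypothetical, and treating 4096 as the ingress capacity is an assumption based on "Total configured size: 4096" together with "Remaining free size: 0".

```python
# Size column transcribed from the output above (region name -> entries).
INGRESS_REGIONS = {
    "NAT": 0, "Ingress PACL": 0, "Ingress VACL": 0, "Ingress RACL": 1792,
    "Ingress RBACL": 0, "Ingress L2 QOS": 256, "Ingress L3/VLAN QOS": 512,
    "Ingress SUP": 512, "Ingress L2 SPAN ACL": 256,
    "Ingress L3/VLAN SPAN ACL": 256, "Ingress FSTAT": 0, "SPAN": 512,
    "Ingress REDIRECT": 0, "Ingress NBM": 0,
}
EGRESS_REGIONS = {
    "Egress VACL": 0, "Egress RACL": 1792, "Egress SUP": 256,
    "Egress L2 QOS": 0, "Egress L3/VLAN QOS": 0, "Egress CoPP": 0,
}


def carve_check(regions, capacity):
    """Return (configured, free) entries for a set of TCAM regions."""
    configured = sum(regions.values())
    if configured > capacity:
        raise ValueError("regions exceed capacity by %d" % (configured - capacity))
    return configured, capacity - configured


print("ingress:", carve_check(INGRESS_REGIONS, 4096))
print("egress configured:", sum(EGRESS_REGIONS.values()))
```

With no free entries left on ingress, any region that needs to grow has to take space from another one.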
Device Resource Usage
iCAM - Introduction
• Intelligent CAM Analytics and Machine Learning (iCAM)* provides
visibility into which network traffic or applications use the system’s
TCAM/SRAM resources.
• Features that use these resources are FIB, ACL
(RACL/PACL/VACL), PBR, NAT, QoS, Multicast, WCCP and more.
• Ability to predict the future usage with high granularity.
• Monitors the scale.
Device Resource Usage
iCAM – Scale of the Features – Multicast Routing
N9504(config)# icam monitor scale multicast-routing multicast-routes limit 100
Device Resource Usage
iCAM – Scale of the Features - Thresholds
N9504(config)# icam monitor scale threshold info 75 warning 85 critical 95
Device Resource Usage
iCAM – Scale of the Features - Utilization
N9504# show icam scale utilization
==============================================
Info Threshold = 75 percent (configured) |
Warning Threshold = 85 percent (configured) |
Critical Threshold = 95 percent (configured) |
All timestamps are in UTC |
==============================================
(Values are since the feature was enabled)
-------------------------------------------------------------------------------------------------
Scale limits for L2 Switching Routing
-------------------------------------------------------------------------------------------------
Feature Verified Config Used Cur Avg 7-Day 7-Day Peak Peak Peak
Scale Scale Util Util Util Timestamp Util Timestamp
-------------------------------------------------------------------------------------------------
VLANs 3967 500 492 98.40 98.40 98.40 2019-02-20 19:21:46 98.40 2019-02-18 11:53:28
<snip>
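The utilization figure and the threshold it crosses can be reproduced by hand. In the VLANs row, Cur Util is Used divided by Config Scale (492 / 500), not by Verified Scale. The sketch below uses the configured thresholds of 75, 85 and 95 percent; the function name, the "ok" label and the use of "greater than or equal" for the comparison are assumptions for illustration.

```python
def scale_severity(used, config_scale, info=75.0, warning=85.0, critical=95.0):
    """Return (utilization percent, severity label) for one feature."""
    utilization = 100.0 * used / config_scale
    if utilization >= critical:
        level = "critical"
    elif utilization >= warning:
        level = "warning"
    elif utilization >= info:
        level = "info"
    else:
        level = "ok"  # hypothetical label for "below all thresholds"
    return utilization, level


utilization, level = scale_severity(used=492, config_scale=500)
print("VLANs: %.2f%% -> %s" % (utilization, level))
```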
Control-Plane Policing (CoPP)
Things to Check
• Choose either strict (default), moderate, lenient or dense policy.
• CoPP is performed per forwarding-engine. Configure rates to make
sure the aggregate traffic doesn’t overwhelm CPU.
• Monitor drop counters continuously. Was traffic dropped because of a
malfunction or an attack? Are there drops in the default class? Did we
miss classifying important traffic?
• CoPP configuration is an on-going process. Review configuration
after every major network change.
Control-Plane Policing (CoPP)
Quick Checks – Config and Stats
N9504# show copp status
Policy-map attached to the control-plane: copp-system-p-policy-strict
(match-any)
N9504# show policy-map interface control-plane | include class-map
class-map copp-system-p-class-l3uc-data (match-any)
class-map copp-system-p-class-critical (match-any)
class-map copp-system-p-class-important (match-any)
class-map copp-system-p-class-multicast-router (match-any)
class-map copp-system-p-class-multicast-host (match-any)
class-map copp-system-p-class-l3mc-data (match-any)
class-map copp-system-p-class-normal (match-any)
class-map copp-system-p-class-ndp (match-any)
<snip>
class-map copp-system-p-class-redirect (match-any)
class-map copp-system-p-class-exception (match-any)
class-map copp-system-p-class-exception-diag (match-any)
<snip>
class-map copp-system-p-class-undesirablev6 (match-any)
class-map copp-system-p-class-l2-default (match-any)
class-map class-default (match-any)
Control-Plane Policing (CoPP)
Quick Checks – Config and Stats (Contd.)
N9504# show policy-map interface control-plane
<snip>
class-map copp-system-p-class-important (match-any)
match access-group name copp-system-p-acl-hsrp
match access-group name copp-system-p-acl-vrrp
match access-group name copp-system-p-acl-hsrp6
match access-group name copp-system-p-acl-vrrp6
match access-group name copp-system-p-acl-mac-lldp
match access-group name copp-system-p-acl-icmp6-msgs
set cos 6
police cir 3000 pps , bc 128 packets
module 1 :
transmitted 2121674 packets;
dropped 143189 packets;
<snip>
class-map class-default (match-any)
set cos 0
police cir 50 pps , bc 32 packets
module 1 :
transmitted 2231318 packets;
dropped 4239 packets;
Tip: do "clear copp statistics" and check again!
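Scanning this output by eye gets tedious with many classes and modules. The following is a rough sketch of how the transmitted and dropped counters could be pulled out per class and turned into a drop percentage. The sample text is trimmed from the output above, the function names are hypothetical, and the parser assumes the line layout shown on this slide.

```python
import re

SAMPLE = """\
class-map copp-system-p-class-important (match-any)
  police cir 3000 pps , bc 128 packets
  module 1 :
    transmitted 2121674 packets;
    dropped 143189 packets;
class-map class-default (match-any)
  police cir 50 pps , bc 32 packets
  module 1 :
    transmitted 2231318 packets;
    dropped 4239 packets;
"""


def copp_counters(text):
    """Sum transmitted/dropped packets per class-map across all modules."""
    results = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"\s*class-map\s+(\S+)", line)
        if match:
            current = match.group(1)
            results[current] = {"transmitted": 0, "dropped": 0}
            continue
        match = re.match(r"\s*(transmitted|dropped)\s+(\d+)\s+packets", line)
        if match and current is not None:
            results[current][match.group(1)] += int(match.group(2))
    return results


def drop_percent(counters):
    """Dropped packets as a percentage of all packets seen by the class."""
    total = counters["transmitted"] + counters["dropped"]
    return 100.0 * counters["dropped"] / total if total else 0.0


for name, counters in copp_counters(SAMPLE).items():
    print("%-32s dropped %d (%.2f%%)" % (name, counters["dropped"], drop_percent(counters)))
```

Because the counters are cumulative, comparing two readings taken after clearing the statistics shows whether drops are still happening.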
Hardware Rate-Limiters (HWRL)
Things to Check
• Rate-limiters prevent redirected-due-to-exception packets from
overwhelming CPU. E.g., ACL Log or Layer3 Glean
• Enable/disable RLs, update their rates with the "hardware rate-limiter" command under "config t"
• Have a close look at the allowed and dropped stats
N9504# show hardware rate-limiter
Units for Config: packets per second (kilo bits per second for span-egress)
Allowed, Dropped & Total: aggregated since last clear counters
Module: 1
R-L Class           Config     Allowed       Dropped      Total
+----------------+----------+-------------+----------+-------+
L3 MTU                   0             0             0            0
L3 ttl                 500            65             0           65
L3 glean               100      28874211       9539369     38413580
L3 mcast loc-grp      3000             0             0            0
access-list-log        100             0             0            0
bfd                  10000             0             0            0
exception               50             0             0            0
span                    50             0             0            0
<snip>
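To pick out the rate-limiter classes that are dropping, the table rows can be parsed and filtered on the Dropped column. This is a sketch under the assumption that each data row is a class name followed by four integers (Config, Allowed, Dropped, Total), as in the output above; the function name is hypothetical.

```python
import re

SAMPLE = """\
R-L Class Config Allowed Dropped Total
L3 MTU 0 0 0 0
L3 ttl 500 65 0 65
L3 glean 100 28874211 9539369 38413580
L3 mcast loc-grp 3000 0 0 0
access-list-log 100 0 0 0
bfd 10000 0 0 0
exception 50 0 0 0
span 50 0 0 0
"""

# Class name (may contain spaces), then Config, Allowed, Dropped, Total.
ROW = re.compile(r"^\s*(.+?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def dropping_classes(text):
    """Return (class, config, dropped, drop percent) for rows with drops."""
    hits = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator or <snip> line
        name = match.group(1)
        config, allowed, dropped, total = (int(v) for v in match.groups()[1:])
        if dropped > 0:
            percent = 100.0 * dropped / total if total else 0.0
            hits.append((name, config, dropped, percent))
    return hits


for name, config, dropped, percent in dropping_classes(SAMPLE):
    print("%s: limit %d pps, dropped %d (%.1f%% of total)" % (name, config, dropped, percent))
```

In this sample only L3 glean is dropping, at roughly a quarter of the packets it has seen since the counters were last cleared.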
Command Line Interface
Programmability Support
We all know grep/egrep. Command outputs are tailored to highlight key features.
Recommended Software
Choose the right one
See the Nexus 9000 Recommended Software bulletin at Cisco.com
General Recommendation for New and Existing Deployments:
Platform Recommended Release
Nexus 9000 7.0(3)I7(6)
* If 9.2(x) is needed to deploy new hardware or features, use the latest version available on CCO.
Monitor and Health Check
Summary
• Hardware diagnostic capabilities: bootup, runtime and on-demand. They help
to check hardware failures and run-time issues.
• OBFL helps to keep an eye on the systems’
events and exceptions. Critical for analysis.
• Monitoring resource usage is critical to take
precautionary measures
• Fine-tune CoPP and HWRL to protect the
control-plane as well as to attain stability
• Pick the right software
Never underestimate the power of syslog (show logging log), and interface counters
and errors (show interface). You are going to find valuable things!!
Troubleshooting
Tools
Agenda
• Introduction
• Monitor and Health-Check
• Troubleshooting Tools
  • Ethanalyzer
  • SPAN to CPU
  • Consistency Checkers (CC)
  • Port ACL / Router ACL
Ethanalyzer
Introduction
• Built-in tool to analyze the traffic sent and received by CPU. Helpful to
troubleshoot High CPU or Control-plane issues like HSRP failover or OSPF
adjacency flaps.
• Based on tshark code
• Two filtering approaches for configuring a packet capture
Display-Filter Example                    Capture-Filter Example
"eth.addr==00:00:0c:07:ac:01"             "ether host 00:00:0c:07:ac:01"
"ip.src==10.1.1.1 && ip.dst==10.1.1.2"    "src host 10.1.1.1 and dst host 10.1.1.2"
"snmp"                                    "udp port 161"
"ospf"                                    "ip proto 89"
Ethanalyzer
Process and Configuration
[Figure: Capture Interface → Filters → Stop Criteria]
(1) Identify Capture Interface
• mgmt – captures traffic on the mgmt0 interface
• inband – captures traffic sent to the control-plane/CPU
(2) Configure Filter
• Display-Filter – captures all traffic but displays only the traffic meeting the criteria
• Capture-Filter – captures only the traffic meeting the criteria
(3) Define Stop Criteria
• By default, the capture stops after 10 frames. This can be changed with the
limit-captured-frames option; 0 means no limit, and the capture runs until the user presses Ctrl+C
• autostop can be used to stop the capture after a specified duration, file size, or
number of files.
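The three steps above map directly onto the parts of an ethanalyzer command line. As an illustration, the sketch below assembles a command string from an interface, one filter, and a frame limit. It only uses keywords that appear in this session (capture-filter, display-filter, limit-captured-frames, detail); the helper `ethanalyzer_command` is hypothetical, and it deliberately accepts only one filter type at a time because the combined syntax is not covered here.

```python
def ethanalyzer_command(interface="inband", capture_filter=None,
                        display_filter=None, limit=None, detail=False):
    """Build an ethanalyzer CLI string from the three steps above."""
    if interface not in ("inband", "mgmt"):
        raise ValueError("interface must be 'inband' or 'mgmt'")
    if capture_filter and display_filter:
        raise ValueError("this sketch supports one filter type at a time")
    parts = ["ethanalyzer", "local", "interface", interface]   # step 1
    if capture_filter:                                          # step 2
        parts += ["capture-filter", '"%s"' % capture_filter]
    if display_filter:
        parts += ["display-filter", '"%s"' % display_filter]
    if limit is not None:                                       # step 3
        if limit < 0:
            raise ValueError("limit must be 0 (no limit) or positive")
        parts += ["limit-captured-frames", str(limit)]
    if detail:
        parts.append("detail")
    return " ".join(parts)


print(ethanalyzer_command(capture_filter="ip proto 89", limit=0))
print(ethanalyzer_command(display_filter="ospf"))
```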
Real World Example
Slow Download Rate
• Server in VLAN 527
• Downloads/Uploads over the WAN are slow
• Downloads/Uploads on the LAN have no problem
• No incrementing errors on any interface and low average interface utilization
[Figure: Server (10.5.27.38, 000a.f31a.1c1c) and Gateway (10.5.27.1, 78da.6e19.4500) attach to the Nexus 9000 (Eth4/1, Eth4/2; SVI 527, 10.5.27.2/24, 64a0.e745.89c1); the Host (172.18.37.71) is reached over the WAN / Internet]
Real World Example
Slow Download Rate
Can we quickly validate on the Nexus 9000 if traffic is hardware or software switched? Ethanalyzer!
N9k# ethanalyzer local interface inband capture-filter "host 10.5.27.38 or host 172.18.37.71"
If traffic is software-switched, it would be seen on the inband.
Filter for any traffic between hosts experiencing the slow downloads.
Real World Example
Slow Download Rate
N9k# ethanalyzer local interface inband capture-filter "host 10.5.27.38 or host 172.18.37.71"
Capturing on inband
2017-01-17 07:28:16.406589 10.5.27.38 -> 172.18.37.71 TCP 60 [TCP Keep-Alive] 28123 > http [ACK] Seq=1 Ack=1 Win=8760 Len=0
2017-01-17 07:28:16.406603 10.5.27.2 -> 10.5.27.38 ICMP 70 Redirect (Redirect for host)
2017-01-17 07:28:16.406617 10.5.27.38 -> 172.18.37.71 TCP 60 [TCP Out-Of-Order] 28123 > http [FIN, ACK] Seq=1 Ack=1 Win=8760 Len=0
2017-01-17 07:28:16.407142 10.5.27.38 -> 172.18.37.71 TCP 60 28124 > http [SYN] Seq=0 Win=8760 Len=0 MSS=1460
2017-01-17 07:28:16.407175 10.5.27.38 -> 172.18.37.71 TCP 60 [TCP Out-Of-Order] 28124 > http [SYN] Seq=0 Win=8760 Len=0 MSS=1460
etc...
• All traffic from Server (10.5.27.38) to the Internet (172.18.37.71) is being software switched
• N9K (10.5.27.2) sends ICMP redirects to Server (10.5.27.38)
Real World Example
Slow Download Rate
N9k# ethanalyzer local interface inband capture-filter "host 10.5.27.38" limit-captured-frames 1 detail | i Ethernet|Internet
Capturing on inband
1 packet captured
Ethernet II, Src: Cisco_1a:1c:1c (00:0a:f3:1a:1c:1c), Dst: Cisco_45:89:c1 (64:a0:e7:45:89:c1)
Internet Protocol Version 4, Src: 10.5.27.38 (10.5.27.38), Dst: 172.18.37.71 (172.18.37.71)
But how do we differentiate the regular control-plane packets from SPAN to CPU packets?
SPAN to CPU
Troubleshooting Packet Loss
N9K-A:
monitor session 1
  source interface eth1/2 rx
  destination interface sup-eth 0
  filter access-group ACL1
  no shut
ip access-list ACL1
  permit icmp 10.214.10.5/32 any
N9K-B:
monitor session 1
  source interface eth1/1 rx
  destination interface sup-eth 0
  filter access-group ACL1
  no shut
ip access-list ACL1
  permit icmp 10.214.10.5/32 any
[Figure: Host A (10.214.10.5/24) attaches to N9K-A on Eth1/2; N9K-A (Eth1/1) and N9K-B (Eth1/1) connect through the network; Host B (10.214.50.11/24) sits behind N9K-B. ICMP traffic flows from Host A to Host B]
SPAN to CPU
Troubleshooting Packet Loss
Captures only the SPAN to CPU packets, not regular packets!!
N9K-A# ethanalyzer local interface inband mirror display-filter "icmp"
Capturing on inband
2018-12-12 04:41:32.164790 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.165562 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166266 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166930 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:34.167589 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
SPAN to CPU
Troubleshooting Packet Loss
N9K-B# ethanalyzer local interface inband mirror display-filter "icmp"
Capturing on inband
2018-12-12 04:41:32.164982 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.165941 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166611 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:34.167992 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
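Comparing the two captures hop by hop is what localizes the loss: N9K-A mirrored five echo requests, while N9K-B mirrored four. A minimal sketch of that comparison follows, using the capture lines shown on these slides. The function name is hypothetical, and it only counts matching lines; it does not try to pair individual packets across switches, since the clocks on the two devices are not guaranteed to be in sync.

```python
import re

N9K_A = """\
2018-12-12 04:41:32.164790 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.165562 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166266 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166930 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:34.167589 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
"""

N9K_B = """\
2018-12-12 04:41:32.164982 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.165941 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166611 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:34.167992 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
"""

# date, time, source, destination, then the ICMP echo request summary
LINE = re.compile(r"^\S+ \S+\s+(\S+) -> (\S+)\s+ICMP Echo \(ping\) request")


def count_requests(capture, src, dst):
    """Count ICMP echo requests from src to dst in ethanalyzer brief output."""
    count = 0
    for line in capture.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(1) == src and match.group(2) == dst:
            count += 1
    return count


seen_a = count_requests(N9K_A, "10.214.10.5", "10.214.50.11")
seen_b = count_requests(N9K_B, "10.214.10.5", "10.214.50.11")
print("N9K-A saw %d, N9K-B saw %d, missing between them: %d" % (seen_a, seen_b, seen_a - seen_b))
```

A packet that was mirrored on N9K-A ingress but never arrived on N9K-B ingress points at N9K-A itself or the network between the two switches.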
SPAN to CPU
VXLAN – Topology and Traffic Flow
[Figure: Host A (10.0.0.100) sends ICMP to Host B (10.0.0.101) across a VXLAN fabric through the Spine]
SPAN to CPU
VXLAN Decode Example
Available in release 7.0(3)I7(4), 9.2(1) and later releases
N9200# ethanalyzer local interf inband mirror display-filter icmp limit-cap 0 detail
Frame 1 (148 bytes on wire, 148 bytes captured)
<snip>
[Protocols in frame: eth:ip:udp:vxlan:eth:ip:icmp:data] <<< frame structure
Ethernet II, Src: 78:0c:f0:a2:2b:df (78:0c:f0:a2:2b:df), Dst: 70:0f:6a:f2:8c:05
(70:0f:6a:f2:8c:05)
<snip>
Type: IP (0x0800)
Internet Protocol, Src: 10.1.1.1 (10.1.1.1), Dst: 10.1.1.2 (10.1.1.2) <<< VTEPs
Version: 4
Header length: 20 bytes
<snip>
Source: 10.1.1.1 (10.1.1.1)
Destination: 10.1.1.2 (10.1.1.2)
User Datagram Protocol, Src Port: 22790 (22790), Dst Port: 4789 (4789) <<< VXLAN Attributes
Source port: 22790 (22790)
Destination port: 4789 (4789)
<snip>
SPAN to CPU
VXLAN Example (Contd.)
Virtual eXtensible Local Area Network
Flags: 0x08
<snip>
VXLAN Network Identifier (VNI): 10990010 <<< VNI for vlan 10
Reserved: 0
Ethernet II, Src: 00:aa:aa:aa:10:10 (00:aa:aa:aa:10:10), Dst: 00:bb:bb:bb:20:20
(00:bb:bb:bb:20:20) <<< Inner MAC
<snip>
Type: IP (0x0800)
Internet Protocol, Src: 10.0.0.100 (10.0.0.100), Dst: 10.0.0.101 (10.0.0.101) <<< Inner IPs
<snip>
Source: 10.0.0.100 (10.0.0.100)
Destination: 10.0.0.101 (10.0.0.101)
Internet Control Message Protocol <<< Original ICMP
Type: 8 (Echo (ping) request)
Code: 0 ()
Checksum: 0x3597 [correct]
Identifier: 0xb00f
<snip>
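The "Flags: 0x08" and "VNI: 10990010" fields in the decode come from the 8-byte VXLAN header that sits between the outer UDP header and the inner Ethernet frame. The sketch below builds and parses that header following the RFC 7348 layout (8 bits of flags, 24 reserved bits, a 24-bit VNI, 8 reserved bits). The function names are hypothetical, and the header bytes are constructed in the example rather than taken from a real capture.

```python
import struct


def build_vxlan_header(vni, flags=0x08):
    """Pack an 8-byte VXLAN header; flag 0x08 marks the VNI as valid."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags in the top byte, 24 reserved bits.
    # Word 1: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", flags << 24, vni << 8)


def parse_vxlan_header(header):
    """Unpack the first 8 bytes of a VXLAN header into its fields."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    word0, word1 = struct.unpack("!II", header[:8])
    flags = word0 >> 24
    return {
        "flags": flags,
        "vni_valid": bool(flags & 0x08),
        "vni": word1 >> 8,
        "reserved": word1 & 0xFF,
    }


header = build_vxlan_header(10990010)
print(header.hex())
print(parse_vxlan_header(header))
```

The parsed values line up with the decode above: flags 0x08, VNI 10990010, reserved 0.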
SPAN to CPU
Things to Know
• All SPAN replication is done in the hardware with no impact to CPU
• SPAN packets to CPU are rate-limited, and excess packets are dropped in
the inband path. Use “hardware rate-limiter span …” command to change the
rate.
• Starting with 7.0(3)I7(1), SPAN packet truncation is supported only
on Nexus 9300-EX/FX/FX2 platforms
• SPAN is not supported for management ports
Consistency Checkers
How do they help?
Consistency Checkers validate the software state with the hardware state, and report PASSED or
FAILED.
[Figure: Protocol States and Configurations → Software → Programming → Hardware Tables]
N9K# show consistency-checker ?
copp Verify copp programming from software context
egress-xlate Check PVLAN egress-xlate
fex-interfaces Compares software and hardware state of fex interfaces
forwarding Display Forwarding Information
l2 L2 consistency
l3 L3 consistency
l3-interface Compares software and hardware properties of L3 interf
link-state Compares software and hardware link state of interfaces
membership Check various memberships VLANs, Port-Channel
pacl Verify pacl programming in the hardware
racl Verify racl programming in the hardware
stp-state Verify spanning tree state in the hardware
vacl Verify vacl programming in the hardware
vpc Verify vpc state in the hardware
vxlan VxLAN consistency checker
Consistency Checkers
Example – Unicast Route and vPC
• Consistency-Checker for single ip address or prefix – helps to
focus on a broken flow
N9K# show consistency-checker forwarding single-route ipv4 10.127.101.1 prefix 32 vrf L3-Inner
Starting consistency check for v4 route 10.127.101.1/32 in vrf L3-Inner
Consistency checker passed for 10.127.101.1/32
Router ACL / Port ACL
Tool and Requirements
N9K# show run | include ignore-case tcam
Router ACL / Port ACL
Troubleshooting Packet Loss
root@Server~$ ping 172.18.1.100 -c 5000 -W 1 -i 0
<snip>
5000 packets transmitted, 4886 packets received, 0.2% packet loss,
Using a Port-ACL (PACL) to match bridged traffic on an L2 switchport
ip access-list 101
  10 permit icmp 10.0.1.100/32 172.18.1.100/32
  20 permit ip any any
! Apply to server ingress interface
interface port-channel101
  ip port access-group 101 in
N9K-1# show ip access-lists 101 statistics per-entry
IPV4 ACL 101
  statistics per-entry
  10 permit icmp 10.0.1.100/32 172.18.1.100/32 [match=5000]
  20 permit ip any any [match=323321]
[Figure: Host A (10.0.1.100), N9K-1 (Po101, 1/14), WAN, N9K-3 and Host B (172.18.1.100). Callouts: 5000 ICMP Requests received by N9K-1; 5000 ICMP Responses received by N9K-3]
More Tools
• SPAN / ERSPAN, SPAN-on-Drop
• Embedded Logic Analyzer Module (ELAM)
• Flow Tracer*
• VXLAN, DME and KSTACK Consistency Checkers*
• Streaming Hardware Telemetry
• Flexible Netflow / sFlow
Troubleshooting
Traffic
Forwarding
“It is a capital mistake to theorize
before one has data. Insensibly
one begins to twist facts to suit
theories, instead of theories to
suit facts.”
Sherlock Holmes (A Scandal in Bohemia)
Troubleshooting Methodology
• Define the problem, understand the impact,
and determine the scope of the problem
based on the information gathered. This
helps you to make progress towards
resolution.
[Figure: application symptoms (Call Drops, Slowness, Webpage Won’t Load, Choppy Video) feeding into Impact/Scope]
Agenda
• Introduction
• Monitor and Health-Check
• Troubleshooting Tools
• Troubleshooting Traffic Forwarding
  • Nexus 9000 Hardware Forwarding – Refresher
  • Path-of-the-Packet Troubleshooting
    • Control-Plane Traffic
    • Data-Plane Traffic
• Common Failure Scenarios and Recommendations
• Summary and Take-Aways
Nexus 9000 Product Family
LSE
• 1.8T chip – 2 slices of 9x 100G each
• X9700-EX modular linecards; 9300-EX TORs
LS1800FX
• 1.8T chip – 1 slice of 18x 100G with MACSEC
• X9700-FX modular linecards; 9300-FX TORs
LS3600FX2
• 3.6T chip – 2 slices of 18x 100G with MACSEC + CloudSec
• 9300-FX2 TORs
S6400
• 6.4T chip – 4 slices of 16x 100G each
• 9332C, 9364C TOR; E2 fabric modules
[Figure: slice layouts. LSE – 18x 100G: Slice 0 and Slice 1 at 900G each, joined by the Slice Interconnect. LS1800FX – 18x 100G: Slice 0 at 1.8T. LS3600FX2 – 36x 100G: Slice 0 and Slice 1 at 1.8T each, joined by the Slice Interconnect. S6400 – 64x 100G: Slice 0 to Slice 3 at 1.6T each, joined by the Slice Interconnect]
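The chip capacities follow directly from the slice and port counts: slices multiplied by ports per slice multiplied by 100G. A short check of that arithmetic, using only the figures on this slide (the function and dictionary names are hypothetical):

```python
# ASIC name -> (number of slices, 100G ports per slice), from the slide above.
ASICS = {
    "LSE": (2, 9),
    "LS1800FX": (1, 18),
    "LS3600FX2": (2, 18),
    "S6400": (4, 16),
}


def capacity_gbps(slices, ports_per_slice, port_speed_gbps=100):
    """Aggregate front-panel capacity of one ASIC in Gbps."""
    return slices * ports_per_slice * port_speed_gbps


for name, (slices, ports) in ASICS.items():
    total = capacity_gbps(slices, ports)
    print("%-10s %d x %d x 100G = %.1fT" % (name, slices, ports, total / 1000.0))
```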
Nexus 9000 Traffic Forwarding
Slice
• Self-contained forwarding complex controlling a subset of ports on a single ASIC
• Separated into Ingress and Egress functions
[Figure: Ingress Slice 1 / Egress Slice 1 and Ingress Slice 2 / Egress Slice 2, connected by the Slice Interconnect]
Slice Forwarding Path
[Figure: slice forwarding path. Ingress: Ingress Packets, Ingress MAC, Ingress Forwarding Controller with Packet Parser (lookup key sent to the Lookup Pipeline, lookup result returned; packet payload carried alongside), Replication, Slice Interconnect, SSX (S6400 / LS3600FX2 only). Egress: Slice Interconnect, Egress Buffering / Queuing / Scheduling, Egress Policy, Egress Packet Rewrites, Egress MAC, Egress Packets]
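Ingress Lookup Pipeline
[Figure: ingress lookup pipeline. The Ingress Forwarding Controller (Packet Parser) takes packets from the Ingress MAC, sends a lookup key into the Lookup Pipeline, receives the lookup result, and hands packets to the Egress Slice. Pipeline blocks: Forwarding Lookup (Flex Tiles, TCAM), Ingress Classification (TCAM), Load Balancing / AFD / DPP, and Flow Table with Flush (LSE / LS1800FX / LS3600FX2 only)]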
Flexible Forwarding Tiles
[Figure: a row of Flex Tiles]
Path of the Packet
Control-Plane Traffic
[Figure: path of a control-plane packet on a Nexus9500, from Router-A on Eth3/1 (192.168.15.25/30) up to the routing processes, with the check available at each stage]
• Linecard: Interface counters, ASIC Counters, CoPP Stats
• Fabric Modules (FM1 .. FM4, HiGig): Fabric Stats
• System Controllers (SC-A, SC-B*): System Controller Stats
• Supervisor Engine Inband (NIC): Inband Stats
• IP Stack (Netstack, PktMgr): Packet Manager Debug
• OSPF, BGP, PIM: Process-level Debug
* Standby System Controller
Path of the Packet
Control-Plane Traffic: Interface Counters
N9508-A# show interface e3/1
Ethernet3/1 is up
admin state is up, Dedicated Interface
<snip>
RX
0 unicast packets 11 multicast packets 2 broadcast packets
13 input packets 2294 bytes
0 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
0 unicast packets 3 multicast packets 0 broadcast packets
3 output packets 702 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
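Reading these counters by eye gets tedious across many ports. A minimal Python sketch, assuming the RX/TX block has already been captured as text, that pulls out every "value name" pair and reports only the non-zero fault counters. The function names and the list of counters treated as normal traffic are our own choices, not part of NX-OS:

```python
import re

# Trimmed copy of the RX/TX counters shown above.
SAMPLE = """\
RX
0 unicast packets 11 multicast packets 2 broadcast packets
13 input packets 2294 bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
TX
0 unicast packets 3 multicast packets 0 broadcast packets
3 output packets 702 bytes
0 output error 0 collision 0 deferred 0 late collision
"""

# Counters that measure traffic volume rather than a fault (assumption).
TRAFFIC = {"unicast packets", "multicast packets", "broadcast packets",
           "input packets", "output packets", "bytes", "jumbo packets"}

# A number, a space, then a name that runs until the next number or line end.
PAIR_RE = re.compile(r"(\d+) ([A-Za-z][A-Za-z ]*?)(?= \d|$)")

def parse_counters(text):
    """Return {(direction, counter name): value} for the RX and TX blocks."""
    counters, direction = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line in ("RX", "TX"):
            direction = line
        elif direction:
            for value, name in PAIR_RE.findall(line):
                counters[(direction, name)] = int(value)
    return counters

def nonzero_errors(counters):
    """Keep only non-zero counters that are not plain traffic counters."""
    return {k: v for k, v in counters.items() if v and k[1] not in TRAFFIC}

counters = parse_counters(SAMPLE)
print(nonzero_errors(counters))
```

For the output above nothing is reported, which matches a clean interface.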
Path of the Packet
Control-Plane Traffic: ASIC Counters
N9508-A# show system internal interface ii3/1/1 counters
Internal Port Statistics for Slot: ii3/1/1 If_Index 0x4a100000
================================================================
<snip>
Mac Pktflow:
Rx Counters:
<snip>
Tx Counters:
<snip>
Mac Control:
Rx Pause: 0x0000000000000000/0
Tx Pause: 0x0000000000000000/0
Reset: 0x0000000000000000/0
Mac Errors:
Undersize: 0x0000000000000000/0
Runt: 0x0000000000000000/0
Crc: 0x0000000000000000/0
Input Errors: 0x0000000000000001/1
In Discard: 0x0000000000000000/0
Giants: 0x0000000000000001/1
Output Errors: 0x0000000000000000/0
Output Discard: 0x0000000000000000/0
Bad Proto: 0x0000000000000000/0
Collision: 0x0000000000000000/0
Late Collision: 0x0000000000000000/0
No Carrier: 0x0000000000000000/0
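Each internal counter is printed as a 64-bit hex value followed by the same value in decimal. A minimal Python sketch, assuming the output has been saved as text, that extracts the "Name: 0xHEX/DEC" pairs and lists the non-zero ones. The helper name is hypothetical:

```python
import re

# A few of the counters shown above.
SAMPLE = """\
Mac Control:
Rx Pause: 0x0000000000000000/0
Mac Errors:
Runt: 0x0000000000000000/0
Crc: 0x0000000000000000/0
Input Errors: 0x0000000000000001/1
Giants: 0x0000000000000001/1
Output Discard: 0x0000000000000000/0
"""

# Counter name, colon, hex value, slash, decimal value.
COUNTER_RE = re.compile(r"([A-Za-z][A-Za-z ]*?):\s+0x([0-9a-fA-F]+)/(\d+)")

def parse_asic_counters(text):
    """Return {counter name: value}; section headings carry no value and are skipped."""
    counters = {}
    for name, hex_value, dec_value in COUNTER_RE.findall(text):
        value = int(hex_value, 16)
        if value != int(dec_value):
            raise ValueError(f"{name}: hex and decimal values disagree")
        counters[name] = value
    return counters

asic = parse_asic_counters(SAMPLE)
print({name: value for name, value in asic.items() if value})
```

Here the only non-zero entries are Input Errors and Giants, each at 1, which suggests a single oversized frame was counted as an input error.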
Path of the Packet
Control-Plane Traffic: ASIC Counters (for front-panel ports)
N9508-A# show hardware internal interface asic counters mod 3
Important Counters/Drops
--------------- -----------------------------------------------------------------------------
Interface Drop Reasons for the Interface, See below output for detail if any
--------- -----------------------------------------------------------------------------------
|9|9|9|9|9|9|8|8|8|8|8|8|8|8|8|8|7|7|7|7|7|7|7|7|7|7|6|6|6|6|6|6|6 ……… 0|0|0|0|0|0
|5|4|3|2|1|0|9|8|7|6|5|4|3|2|1|0|9|8|7|6|5|4|3|2|1|0|9|8|7|6|5|4|3 ……… 6|5|4|3|2|1
Eth3/1 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|X|.|.|.|. ……… .|.|.|.|.|.
Eth3/2 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
Eth3/3 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
Eth3/4 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
<snip>
Eth3/32 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
Drop Conditions
--------------- -----------------------------------------------------------------------------
67 : TAHOE Ingress DROP_ACL_DROP
Do “clear hardware internal interface-all asic counters mod <mod#>” to clear the conditions
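The drop matrix is read vertically: each column is a drop-condition ID whose tens digit is in the first header row and units digit in the second, and an X marks a condition that fired on that port. In the output above, the X for Eth3/1 sits under column 67, which the Drop Conditions list names as TAHOE Ingress DROP_ACL_DROP. A minimal Python sketch of that decoding, using a shortened five-column sample because the real output elides columns; the function names are hypothetical:

```python
def split_cells(row):
    """Split a '|'-separated row into its cell strings."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]

def decode_drop_matrix(tens_row, units_row, port_rows):
    """Return {port: [drop condition IDs marked with X]}."""
    ids = [int(t + u) for t, u in zip(split_cells(tens_row), split_cells(units_row))]
    result = {}
    for row in port_rows:
        port, _, cells = row.partition("|")
        marks = split_cells(cells)
        result[port.strip()] = [i for i, m in zip(ids, marks) if m == "X"]
    return result

# Shortened sample covering only columns 69..65 (not the full CLI output).
TENS = "|6|6|6|6|6|"
UNITS = "|9|8|7|6|5|"
PORTS = [
    "Eth3/1 |.|.|X|.|.",
    "Eth3/2 |.|.|.|.|.",
]

drops = decode_drop_matrix(TENS, UNITS, PORTS)
print(drops)
```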
Path of the Packet
Control-Plane Traffic: FM and Linecards Connectivity 9508-FM-E Fabric Module
Drop Conditions
--------------- -----------------------------------------------------------------------------
Do “clear hardware internal fabric interface-all asic counters mod <mod#>” to clear the conditions
Sup Engine
Path of the Packet
Control-Plane Traffic: Inband Statistics
N9508-A# show hardware internal cpu-mac inband stats
<snip>
eth3 stats:
RMON counters Rx Tx
----------------------+------------+--------------------
total packets 8406058 4481386
<snip>
65-127 bytes packets 8391840 4470748
<snip>
broadcast packets 15 561531
multicast packets 0 0
<snip>
Error counters
--------------------------------+--
CRC errors ..................... 0
Alignment errors ............... 0
Symbol errors .................. 0
Sequence errors ................ 0
RX errors ...................... 0
Missed packets (FIFO overflow) 0
Single collisions .............. 0
Excessive collisions ........... 0
Multiple collisions ............ 0
Late collisions ................ 0
Collisions ..................... 0
Defers ......................... 0
Tx no CRS ..................... 0
Carrier extension errors ....... 0
Rx length errors ............... 0
FC Rx unsupported .............. 0
Rx no buffers .................. 0
Rx undersize ................... 0
Rx fragments ................... 0
Rx oversize .................... 0
Rx jabbers ..................... 0
Rx management packets dropped .. 0
Tx TCP segmentation context .... 0
Tx TCP segmentation context fail 0
Rate statistics
-----------------------------+---------
Rx packet rate (current/peak) 160 / 1254 pps
Tx packet rate (current/peak) 112 / 889 pps
<snip>
Good health-check. Set a baseline!!
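One way to act on the "set a baseline" advice is to record the inband packet rates during normal operation and compare later readings against them. A minimal Python sketch, assuming the command output is available as text; the baseline numbers and the 2x threshold are hypothetical values chosen for illustration:

```python
import re

# The two rate lines from the output above.
SAMPLE = """\
Rx packet rate (current/peak) 160 / 1254 pps
Tx packet rate (current/peak) 112 / 889 pps
"""

RATE_RE = re.compile(
    r"(Rx|Tx) packet rate \(current/peak\)\s+(\d+)\s*/\s*(\d+)\s*pps"
)

def parse_rates(text):
    """Return {'Rx': {'current': n, 'peak': n}, 'Tx': {...}}."""
    return {
        direction: {"current": int(current), "peak": int(peak)}
        for direction, current, peak in RATE_RE.findall(text)
    }

def exceeds_baseline(rates, baseline, factor=2.0):
    """List the directions whose current rate is above factor x baseline."""
    return [d for d, r in rates.items() if r["current"] > factor * baseline[d]]

rates = parse_rates(SAMPLE)
baseline = {"Rx": 150, "Tx": 100}  # hypothetical pps recorded on a quiet day
print(rates)
print(exceeds_baseline(rates, baseline))
```

A sustained inband rate well above the baseline is a hint to look at what is being punted to the CPU before blaming the routing protocols.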
Path of the Packet
Data-Plane
Issue: Communication failure for an L3 Flow
Eth1/18
Network
N9K-C92160-YC
Eth1/1 10.200.1.1
10.10.5.3/24 172.16.23.23
E865.4994.8C3F
547F.EE5D.41FC
1
Path of the Packet N9K-C92160-YC
eth1/18
Network
Check the Forwarding Information Base (FIB) in hardware and make sure the results match
2
Path of the Packet
Data-Plane: L3 Flow – Route Programmed in ASIC
Check routing table in the ASIC
module-1# show hardware internal tah l3 172.16.23.23/32 table 1
DLeft location: 0x182604
FP location : 0/0/0x1826
*Flags:
CC=Copy To CPU, SR=SA Sup Redirect,
DR=DA Sup Redirect, TD=Bypass TTL Dec,
DC=SA Direct Connect,DE=Route Default Entry,
LI=Route Learn Info
HW Loc | Ip Entry | VRF | MPath | NumP | Base/L2ptr |CC|SR|DR|TD|DC|DE|LI|
-----------|----------------|---------|-------|------|------------|--|--|--|--|--|--|--|
0/0/0x1826 | 172.16.23.23 | 1 | No | 0 | 0x90003 | | | | | | | |
Note: Table #1 is for default VRF. To find table number for other VRFs, use “show hardware internal tah l3 v4host” command
Note: … the physical interface where the packet is going to be sent out. “show hardware internal tah interface ethernet 1/18 | inc src” should report “69”
4
Path of the Packet
Data-Plane: L3 Flow – Adjacency Programmed in ASIC
Adjacency Information in SW
N9K-C92160-YC# show ip adjacency 10.200.1.1
<snip>
IP Adjacency Table for VRF default
Total number of entries: 1
Address MAC Address Pref Source Interface Egress Interface
10.200.1.1 e865.4994.8c3f 50 arp Ethernet1/18
Note: 10.200.1.1 is the next-hop IP address; e865.4994.8c3f is the destination mac-address
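When comparing the software adjacency with other sources (the topology notes, or the hardware table in the next step), MAC addresses may appear in different case or punctuation. A minimal Python sketch that parses the adjacency row above and compares MACs after normalizing them; the helper names are hypothetical and the row format is assumed to match the output shown:

```python
HEX_DIGITS = set("0123456789abcdef")

def normalize_mac(mac):
    """Reduce a MAC in any common notation to 12 lowercase hex digits."""
    digits = "".join(ch for ch in mac.lower() if ch in HEX_DIGITS)
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits

def parse_adjacency_row(row):
    """Split one row of 'show ip adjacency' into named fields."""
    address, mac, pref, source, interface = row.split()[:5]
    return {
        "address": address,
        "mac": mac,
        "pref": int(pref),
        "source": source,
        "interface": interface,
    }

adj = parse_adjacency_row("10.200.1.1 e865.4994.8c3f 50 arp Ethernet1/18")

# MAC of the next hop as written in the topology diagram earlier in this section.
expected_mac = "E865.4994.8C3F"
print(adj["interface"], normalize_mac(adj["mac"]) == normalize_mac(expected_mac))
```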
5&6
Path of the Packet
Data-Plane: L3 Flow – Adjacency Programmed in ASIC
Entry in the hardware Adjacency Table From Step #2
Common Failure Scenarios and Recommendations
• Layer 1 Issues
• Dynamic Routing over vPC
Common Failure Scenarios
Layer 1 Issues - Symptoms
• Link-level issues
• Port flaps
• Port not coming up
• Transceiver Issues
• Transceiver not recognized
• Breakout cables not working
• Digital Optical Monitoring (DOM) Info missing
Link flaps or Port not coming up
Recommendation
• Connect the cable/media at both ends, insert the transceivers
completely, and use the following commands to verify speed, duplex,
capabilities, supported modes and DOM values.
show interface eth x/y transceiver details
show interface eth x/y capabilities
show interface brief - check for the interface tuple display and others
show interface eth x/y status
• Enable auto-negotiation at both ends. Yes, we need it!
• Check for a transparent device or circuit in the middle
• Find who initiated the link-down event first
Use attach module <mod#> and show system internal port-client event-history port <port#> to find the events at microsecond granularity
Transceiver Issues
Recommendation
Frame Corruption / CRC Issues
Locate Frame Corruption
N9K-C93180YC-FX# show switching-mode
Configured switching mode: Cut through
Module Number Operational Mode
1 Cut-Through
N9K-C93180YC#
cut-through (default) switching mode will propagate CRC errors.
Eth1/18 transmits stomped packets
N9K-C93180YC-FX# show interface counters errors
----------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
----------------------------------------------------------------------
Eth1/18 0 0 9 0 0 0
N9K-C93180YC# show interface e1/18
<snip>
TX
9 output error 0 collision 0 deferred 0 late collision
<snip>
N9K-C93180YC#
Frame Corruption / CRC Issues
Recommendation
• Track the Xmit-Err and Stomped counters to find the source of
frame corruption
• Check for transceiver failures or power level issues. Should
transceiver and/or fiber be swapped out?
• Check for ports/link reporting excessive flaps
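Tracking the counters port by port can be scripted. A minimal Python sketch that parses a "show interface counters errors" table and separates ports that receive corrupted frames (FCS-Err) from ports that only transmit stomped frames (Xmit-Err), on the assumption that the switch is in cut-through mode. The Eth1/1 row is hypothetical, added to illustrate an ingress port; the function names are our own:

```python
SAMPLE = """\
----------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
----------------------------------------------------------------------
Eth1/1 0 9 0 9 0 0
Eth1/18 0 0 9 0 0 0
"""

def parse_error_counters(text):
    """Return {port: {column name: value}} from the error-counter table."""
    header, rows = None, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or set(line.strip()) == {"-"}:
            continue  # blank line or separator
        if fields[0] == "Port":
            header = fields[1:]
        elif header and fields[0].startswith("Eth") and len(fields) == len(header) + 1:
            rows[fields[0]] = dict(zip(header, map(int, fields[1:])))
    return rows

def classify(rows):
    """Label each affected port by its likely role in the corruption path."""
    findings = {}
    for port, counters in rows.items():
        if counters["FCS-Err"] > 0:
            findings[port] = "receives corrupted frames"
        elif counters["Xmit-Err"] > 0:
            findings[port] = "transmits stomped frames"
    return findings

rows = parse_error_counters(SAMPLE)
print(classify(rows))
```

A port that only transmits stomped frames is passing on damage that entered elsewhere, so the transceiver and fiber checks belong on the port (or the upstream device) where FCS errors are received.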
Dynamic Routing over vPC
Fail to Build Routing Adjacency
• Dynamic Adjacencies with vPC Peers - B
a vPC.
Dynamic Routing over vPC
Fail to Build Routing Adjacency – Solution
• Traffic sent over the peer link will not have the TTL decremented.
• The peer-gateway feature allows the vPC peer (SVI-X) to forward packets on behalf of the other peer (SVI-Y). This saves bandwidth by avoiding traffic over the peer link.
• With peer-gateway and peer-router*
[Diagram labels: B, Po1, Po2, Sw1, Sw2, SVI-X, SVI-Y, P]
Dynamic Routing over vPC
Supported Designs
[Diagrams: vPC peers Sw1 and Sw2 with "P" marks on the links. Labels: Legend, Router, N9K with IPv4 unicast, Router/ Firewall]
Dynamic Routing over vPC
Supported Designs (Continued)
[Diagram 1: Router1 and Router2 attached to vPC peers Sw1 and Sw2]
Both the Routers peer with both vPC peers. It is done over vPC peer-link and using vPC VLAN.
[Diagram 2: vPC peers Sw1 and Sw2 in DC1, Sw3 and Sw4 in DC2]
Each Nexus device peers with two vPC peers. It is done over Data Center Interconnect (DCI)
Common Failure Scenarios
To reach the destination of greater stability…
• Don’t overlook layer 1 configurations or issues
• Implement vPC recommendations and best practices
Summary & Take-Aways
Summary
• Higher gate density and bandwidth are driving the transition in
hardware architecture and the consolidation of functions. Nexus 9000 is
at the core of these transitions, and is flexible enough to fit your
datacenter design. It is a platform of possibilities!!
• There are many avenues to monitor the health of the devices and their
resource usage.
• Nexus 9000 supports several simple and powerful tools. Familiarize
yourself with them.
• Understand the path-of-the-packet for various traffic flows. It helps to
get the facts before building a theory.
• Don’t overlook Layer 1 settings or Dynamic Routing over vPC
configuration. They can impact stability.
Take-Aways
Nexus 9000 has a RICH SET OF CLIs, FEATURES
and TOOLS that are developed keeping all of you
in mind.
network downtime
Nexus 9000
… platform of possibilities
Thank you