
#CLUS

Monitoring and
Troubleshooting
Nexus 9000 Switches
Yogesh Ramdoss
Principal Engineer, Customer Experience
@YogiCisco
BRKDCN-3020

Cisco Webex Teams
Questions?
Use Cisco Webex Teams to chat
with the speaker after the session

How
1 Find this session in the Cisco Live Mobile App
2 Click “Join the Discussion”
3 Install Webex Teams or go directly to the team space
4 Enter messages/questions in the team space

Webex Teams will be moderated by the speaker until June 16, 2019.
cs.co/ciscolivebot#BRKDCN-3020

#CLUS © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 3
Agenda
• Introduction
• Monitor and Health-Check
• Troubleshooting Tools
• Troubleshooting Traffic Forwarding
• Common Failure Scenarios and Recommendations
• Summary and Take-Aways

Introduction
Switching Architecture Changes
Shifting of Internal Architecture
[Figure: internal switch architecture over time, from shared buses (Data BUS, Result BUS, Ethernet Out of Band Channel) to a CROSSBAR connecting linecards, to interconnected SOCs. Port speeds across these stages: 10/100M, 100M/1G, 1G/10G, 10G/100/400G.]
Design Shifts Resulting from Increasing Gate Density and Bandwidth


SOC – Switch on Chip

Switching Architecture Changes
Consolidation of Functions
[Figure: consolidation of linecard functions across three platforms: Catalyst 6807-XL (32 x 10G ports), Nexus 7700 (48 x 10G ports), and Nexus 9508 (64 x 100G ports). Labels in the figure include port ASICs, CTS ASICs, L2 FWD and L3 FWD on a Distributed Forwarding Card, FIRE ASICs, a fabric interface with 40Gbps fabric channels and EoBC, LC CPU, fabric ASIC, arbitration aggregator, LC inband, 12 SOCs each serving 4 x 10G ports, and S6400.]
Design Shifts Resulting from Increasing Gate Density and Bandwidth
FWD – Forwarding
FIRE – Fabric Interface and Replication Engine ASIC
CTS – Cisco TrustSec
SOC – Switch on Chip
Generations of Nexus 9000
1. NFE: merchant silicon
2. NFE + ASE: leverages merchant silicon + Cisco ASIC to enhance services
3. SOC: the switch “is” the ASIC
4. Multiple SOCs: non-blocking leaf-and-spine based CLOS network inside the switch
NFE – Network Forwarding Engine
ASE – Application Spine Engine
SOC – Switch On Chip

Nexus 9000 Product Family
Focus For This Session
This session is going to discuss …
ASICs                             Platforms
StrataXGS Trident*                94XX, 9636
StrataXGS Tomahawk*               9432C, C950X-FM-S
StrataXGS Trident* + Northstar    9396, 93128, 95XX
StrataXGS Trident* + Donner       9372, 9332, 93120
StrataDNX Jericho*                X9600-R/X-9600-RX
Tahoe-Sugarbowl                   93XX-EX, 97XX-E/EX
Tahoe-Lacrosse                    92XX, C950X-FM-E
Tahoe-Davos                       92160YC
Rocky-Homewood                    F/FX/FXP
Rocky-Bigsky                      9364C, C950X-FM-E2
Rocky-Heavenly                    FX2
* Merchant Silicon from Broadcom
Building Data Center Fabrics with Nexus 9000
Deployment models:
• Application Centric Infrastructure (ACI) – Turnkey Fabric
• Standalone – Programmable Fabric with VXLAN+EVPN
• Standalone – Programmable IP Network
• Standalone – Traditional Data Center Network
[Figure: one topology per model. Labels include DCNM, RR, VXLAN/EVPN, L2 and L3 boundaries, VPC, and Hypervisor.]
DCNM – Data Center Network Manager
Just to let you know…
• With a wide range of Nexus 9000 platforms on the market, this session focuses on the cutting-edge models built with Cloud-scale ASICs.
• We will not discuss hardware architecture in detail, but will provide a quick refresher.
• With a good number of topics to cover, we are not going to discuss Multicast, QoS or Buffering.
• Please hold your questions until the end of each section.
• At any point during or after the presentation, ask your question in the Webex Teams room.
Nexus 9000
… platform of possibilities

Monitor and Health-Check
Agenda
• Introduction
• Monitor and Health-Check
  • Hardware Diagnostics
  • On-board Failure Logging (OBFL)
  • Device Resource Usage
  • Control-Plane Policing (CoPP)
  • Hardware Rate-Limiters (HWRL)
  • Recommended Software
• Troubleshooting Tools
• Troubleshooting Traffic Forwarding
• Common Failure Scenarios and Recommendations
• Summary and Take-Aways
Hardware Diagnostics
Categories
Bootup Diagnostics
Run at bootup and detect faulty hardware before it is brought online by
NX-OS
Runtime or Health-Monitoring Diagnostics
Also called Health Monitoring (HM) diagnostics; these are non-disruptive. They detect runtime hardware errors, memory errors, hardware degradation, software faults, and resource exhaustion on a live device.
On-demand Diagnostics
Run once or at user-designated intervals, and help localize faults.
Hardware Diagnostics
Bootup Diagnostics
Supervisor Engine:
• USB – Checks integrity at the initialization of the USB Controller.
• ManagementPortLoopback – Tests loopback on the management port
• EOBCPortLoopback* – Checks health of Ethernet Out-of-Band Channel
(EOBC), which is used for communication between Supervisor engine(s) and
modules
• OBFL - Verifies the integrity of the On-Board Failure Log (OBFL) flash

Module
• OBFL - Verifies the integrity of the OBFL flash

* Only applicable to Nexus 9500s


Hardware Diagnostics
Runtime / Health Monitoring Diagnostics

Sup Engine Tests:
• NVRAM
• RealTimeClock
• PrimaryBootROM
• SecondaryBootROM
• BootFlash
• USB
• SystemMgmtBus
• Console

Module Tests:
• ASICRegisterCheck
• PrimaryBootROM
• SecondaryBootROM
• PortLoopback
• RewriteEngineLoopback
Hardware Diagnostics
On-Demand Diagnostics
• On-demand tests help localize faults and are usually needed:
• to respond to an event that has occurred, such as isolating a fault.
• in anticipation of an event that may occur, such as a resource exceeding its
utilization limit.

• Health Monitoring tests can also be run on demand, and the default interval of a health monitoring test can be modified.
Hardware Diagnostics
Configuration and Commands
Setting bootup diagnostic level
N93128# config t
N93128(config)# diagnostic bootup level [bypass | complete]

Activating a runtime diagnostic test and setting interval
N93128(config)# diagnostic monitor interval module <mod#> test <test-id | name | all> hour <hour> min <min> second <sec>
N93128(config)# diagnostic monitor module <mod#> test <test-id | name | all>

Setting on-demand test parameters, starting and stopping
N93128# diagnostic ondemand iteration <count>
N93128# diagnostic ondemand action-on-failure {continue failure-count <num-fails> | stop}
N93128# diagnostic start module <mod#> test [test-id | name | all | non-disruptive] [port port-number | all]
N93128# diagnostic stop module <mod#> test [test-id | name | all]
Hardware Diagnostics
Configuration and Commands
Diagnostics tests status and testing intervals:
N93128# show diagnostic content module <mod | all>
Diagnostics test suite attributes:
B/C/* - Bypass bootup level test / Complete bootup level test / NA
P/* - Per port test / NA
M/S/* - Only applicable to active / standby unit / NA
D/N/* - Disruptive test / Non-disruptive test / NA
H/O/* - Always enabled monitoring test / Conditionally enabled test / NA
F/* - Fixed monitoring interval test / NA
X/* - Not a health monitoring test / NA
E/* - Sup to line card test / NA
L/* - Exclusively run this test / NA
T/* - Not an ondemand test / NA
A/I/* - Monitoring is active / Monitoring is inactive / NA
Module 1: 1/10G-T Ethernet Module (Active)
Testing Interval
ID Name Attributes (hh:mm:ss)
____ __________________________________ ____________ _________________
1) USB---------------------------> C**N**X**T* -NA-
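The attribute string in the last column can be read against the legend one character at a time. The sketch below is a minimal illustration in Python, assuming that position i of the string corresponds to row i of the legend (this matches the USB sample above, but is inferred from the output rather than stated by it). The function name `decode_attributes` is hypothetical.

```python
# Legend rows from "Diagnostics test suite attributes", in the order displayed.
# Assumption: character i of the attribute string maps to legend row i.
LEGEND = [
    {"B": "Bypass bootup level test", "C": "Complete bootup level test"},
    {"P": "Per port test"},
    {"M": "Only applicable to active unit", "S": "Only applicable to standby unit"},
    {"D": "Disruptive test", "N": "Non-disruptive test"},
    {"H": "Always enabled monitoring test", "O": "Conditionally enabled test"},
    {"F": "Fixed monitoring interval test"},
    {"X": "Not a health monitoring test"},
    {"E": "Sup to line card test"},
    {"L": "Exclusively run this test"},
    {"T": "Not an ondemand test"},
    {"A": "Monitoring is active", "I": "Monitoring is inactive"},
]


def decode_attributes(attrs):
    """Translate an attribute string such as 'C**N**X**T*' into legend text."""
    if len(attrs) != len(LEGEND):
        raise ValueError(f"expected {len(LEGEND)} characters, got {len(attrs)}")
    decoded = []
    for flag, meanings in zip(attrs, LEGEND):
        if flag == "*":  # '*' means "not applicable" for this position
            continue
        if flag not in meanings:
            raise ValueError(f"unexpected flag {flag!r}")
        decoded.append(meanings[flag])
    return decoded


if __name__ == "__main__":
    for meaning in decode_attributes("C**N**X**T*"):
        print(meaning)
```

For the USB test shown above, this reads as a complete-level bootup test that is non-disruptive, is not a health monitoring test, and cannot be run on demand.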

On-Board Failure Logging (OBFL)
Why do we need it and what does it do?
• OBFL logs failure data to persistent storage.
• Persistent storage: non-volatile flash memory on the modules, accessible for later analysis.
• Enabled by default for all features.
• OBFL flash supports a limited number of read-write operations.
On-Board Failure Logging (OBFL)
Configuration and Status
N93128(config)# hw-module logging onboard ?
<CR>
counter-stats Enable/Disable OBFL counter statistics
cpuhog Enable/Disable OBFL cpu hog events
environmental-history Enable/Disable OBFL environmental history
error-stats Enable/Disable OBFL error statistics
interrupt-stats Enable/Disable OBFL interrupt statistics
module Enable/Disable OBFL information for Module
obfl-logs Enable/Disable OBFL (boot-uptime/device-version/obfl-history)
N93128# show logging onboard status
----------------------------
OBFL Status
----------------------------
Switch OBFL Log: Enabled
Module: 1 OBFL Log: Enabled
card-boot-history Enabled
card-first-power-on Enabled
<snip>

On-Board Failure Logging (OBFL)
Sample
Nearly 20 different options!
N93128# show logging onboard exception-log
---------------------------------------------------------------
Module: 1
---------------------------------------------------------------
<snip>
exception information --- exception instance 1 ----
Device Id : 49
Device Name : Temperature-sensor
Device Errorcode : 0xc3101203
Device ID : 49 (0x31)
Device Instance : 01 (0x01)
Dev Type (HW/SW) : 02 (0x02)
ErrNum (devInfo) : 03 (0x03)
System Errorcode : 0x4038001e Module recovered from minor temperature alarm
Error Type : Minor error
PhyPortLayer :
Port(s) Affected :
<snip>
Time : Sun Jan 20 19:42:08 2019

Device Resource Usage
Hardware Capacity
show hardware capacity <options>
Resource: what it gives
• Module: usage of Bootflash, Logflash, and NVRAM
• Interface: total Tx/Rx drops (per module) and ports with highest drop count
• Forwarding: L2 CAM table resource, ACL resources, IPv4/v6 Unicast Host and Route entries resources, IPv4/v6 Multicast entries resources, QoS resources (aggregate and distributed policers), per module and per forwarding engine instance
• Fabric: fabric channel bandwidth, current ingress and egress traffic rate
• Power: PSU redundancy mode, total capacity, power reserved (for Sup, fabric modules and fans), and power drawn
• EOBC (Ethernet Out of Band Channel): total packets forwarded, transmit rate, dropped packets
Device Resource Usage
Hardware Capacity – Forwarding
(Command outputs are tailored to highlight key features.)
N9504# show hardware capacity FORWARDING
<snip>
INSTANCE 0x0: ACL Hardware Resource Utilization (Mod 1)
----------------------------------------------------------------------------
                                Used    Free    Percent Utilization
----------------------------------------------------------------------------
Ingress L2 QOS                  2       254     0.78
Ingress L2 QOS IPv4             0               0.00
Ingress L2 QOS IPv6             0               0.00
Ingress L2 QOS MAC              0               0.00
Ingress L2 QOS ALL              2               0.78
Ingress L2 QOS OTHER            0               0.00
Ingress L2 SPAN ACL             0       256     0.00
Ingress RACL                    2       1534    0.13
Ingress L3/VLAN QOS             24      488     4.68
Ingress L3/VLAN SPAN ACL        0       256     0.00
SPAN                            0       512     0.00
Egress RACL                     2       1790    0.11
Feature BFD                     3       103     2.83
<snip>
Device Resource Usage
Hardware Capacity – Forwarding (Contd.)
----------------------------------------------------------------------------
Used Free Percent Utilization
----------------------------------------------------------------------------
<snip>
LOU 4 11 26.66
Both LOU Operands 4
Single LOU Operands 0
LOU L4 src port: 2
LOU L4 dst port: 2
LOU L3 packet len: 0
LOU IP tos: 0
LOU IP dscp: 0
LOU ip precedence: 0
LOU ip TTL: 0
TCP Flags 0 16 0.00
Protocol CAM 2 244 0.81
Mac Etype/Proto CAM 0 14 0.00
L4 op labels, Tcam 0 0 30 0.00
L4 op labels, Tcam 1 0 62 0.00
Ingress Dest info table 0 512 0.00
Egress Dest info table 0 512 0.00
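When this output is collected regularly, it can help to pick out the rows whose utilization is above a chosen level. The following is a minimal sketch in Python that parses a few rows copied from the output above. The helper names (`parse_rows`, `over_threshold`) and the 20 percent cut-off are hypothetical, and the parsing assumes each data row ends with a decimal percentage, with an optional Free column before it.

```python
import re

# A few rows copied from the "show hardware capacity forwarding" sample above.
SAMPLE = """\
Ingress L2 QOS 2 254 0.78
Ingress L2 QOS IPv4 0 0.00
Ingress RACL 2 1534 0.13
Ingress L3/VLAN QOS 24 488 4.68
Feature BFD 3 103 2.83
LOU 4 11 26.66
Both LOU Operands 4
TCP Flags 0 16 0.00
"""

# name, used, optional free, percent utilization (always has a decimal point)
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<used>\d+)\s+(?:(?P<free>\d+)\s+)?(?P<pct>\d+\.\d+)$"
)


def parse_rows(text):
    """Return (name, used, free_or_None, percent) for every row that has a percentage."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # rows without a percentage (e.g. operand counts) are skipped
        free = match.group("free")
        rows.append((
            match.group("name"),
            int(match.group("used")),
            int(free) if free is not None else None,
            float(match.group("pct")),
        ))
    return rows


def over_threshold(rows, percent):
    """Rows whose utilization is at or above the given percentage."""
    return [row for row in rows if row[3] >= percent]


if __name__ == "__main__":
    for name, used, free, pct in over_threshold(parse_rows(SAMPLE), 20.0):
        print(f"{name}: used={used} free={free} utilization={pct}%")
```

With the sample rows, only the LOU entry crosses the 20 percent mark.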

Device Resource Usage
Hardware Capacity – TCAM
N9K-C93180YC-EX# show system internal access-list globals
slot 1
=======
<snip>
INSTANCE 0 TCAM Region Information:
Ingress:
Region                      TID   Base   Size   Width
----------------------------------------------------------
NAT                         13    0      0      1
Ingress PACL                1     0      0      1
Ingress VACL                2     0      0      1
Ingress RACL                3     0      1792   1
Ingress RBACL               4     0      0      1
Ingress L2 QOS              5     1792   256    1
Ingress L3/VLAN QOS         6     2048   512    1
Ingress SUP                 7     2560   512    1
Ingress L2 SPAN ACL         8     3072   256    1
Ingress L3/VLAN SPAN ACL    9     3328   256    1
Ingress FSTAT               10    0      0      1
SPAN                        12    3584   512    1
Ingress REDIRECT            14    0      0      1
Ingress NBM                 30    0      0      1
----------------------------------------------------------
Total configured size: 4096
Remaining free size: 0
Note: Ingress SUP region includes Redirect region

Egress:
Region                      TID   Base   Size   Width
-------------------------------------------------
Egress VACL                 15    0      0      1
Egress RACL                 16    0      1792   1
Egress SUP                  18    1792   256    1
Egress L2 QOS               19    0      0      1
Egress L3/VLAN QOS          20    0      0      1
Egress CoPP                 36    0      0      1
-------------------------------------------------
Total configured size: 2048
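The region sizes in this output add up to the reported totals, and each non-empty region starts where the previous one ends. The sketch below, in Python, reproduces that arithmetic from the Base and Size columns above. The function name `free_after_layout` is hypothetical, and the contiguity check is an observation about this sample, not a documented guarantee.

```python
# (region, base, size) copied from the Base and Size columns of the output above.
INGRESS = [
    ("NAT", 0, 0), ("Ingress PACL", 0, 0), ("Ingress VACL", 0, 0),
    ("Ingress RACL", 0, 1792), ("Ingress RBACL", 0, 0),
    ("Ingress L2 QOS", 1792, 256), ("Ingress L3/VLAN QOS", 2048, 512),
    ("Ingress SUP", 2560, 512), ("Ingress L2 SPAN ACL", 3072, 256),
    ("Ingress L3/VLAN SPAN ACL", 3328, 256), ("Ingress FSTAT", 0, 0),
    ("SPAN", 3584, 512), ("Ingress REDIRECT", 0, 0), ("Ingress NBM", 0, 0),
]
EGRESS = [
    ("Egress VACL", 0, 0), ("Egress RACL", 0, 1792), ("Egress SUP", 1792, 256),
    ("Egress L2 QOS", 0, 0), ("Egress L3/VLAN QOS", 0, 0), ("Egress CoPP", 0, 0),
]


def free_after_layout(regions, total_size):
    """Walk the non-empty regions in order and return the unallocated space.

    Raises ValueError if a region does not start at the running offset,
    which is how the regions are laid out in this particular sample.
    """
    offset = 0
    for name, base, size in regions:
        if size == 0:
            continue  # unallocated regions occupy no TCAM space
        if base != offset:
            raise ValueError(f"{name}: base {base} does not match offset {offset}")
        offset += size
    return total_size - offset


if __name__ == "__main__":
    print("ingress free:", free_after_layout(INGRESS, 4096))
    print("egress configured:", sum(size for _, _, size in EGRESS))
```

A remaining free size of 0 on ingress means that a region can only grow if another region is reduced first.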
Device Resource Usage
iCAM - Introduction
• Intelligent CAM Analytics and Machine Learning (iCAM)* provides visibility into which network traffic or applications use the system’s TCAM/SRAM resources.
• Features that use these resources include FIB, ACL (RACL/PACL/VACL), PBR, NAT, QoS, Multicast, WCCP and more.
• Predicts future usage with high granularity.
• Monitors feature scale against configured limits and thresholds.

* name may be changed
Device Resource Usage
iCAM – Scale of the Features
1) Enable the feature, and enable it to monitor scale
N9K(config)# icam monitor scale

2) Configure scale and interval parameters:
N9K(config)# icam monitor scale <feature> <scale parameter> limit <scale limit>
N9K(config)# icam monitor interval <time> num_intervals <number-of-intervals>
N9K(config)# icam monitor scale threshold info <%> warning <%> critical <%>
N9K(config)# icam monitor prediction scale <feature> <scale parameter>

N9K# show icam scale ?
  history            Show scale history
  l2-switching       Layer 2 switching
  thresholds         Show thresholds statistics
  unicast-routing    Unicast routing
  utilization        Show utilization statistics
Device Resource Usage
iCAM – Scale of the Features – Multicast Routing
N9504(config)# icam monitor scale multicast-routing multicast-routes limit 100
N9504# show icam scale multicast-routing
============================================
Info Threshold = 80 percent (default) |
Warning Threshold = 90 percent (default) |
Critical Threshold = 100 percent (default) |
All timestamps are in UTC |
============================================
-------------------------------------------------------------------------------
Scale limits for Multicast Routing
-------------------------------------------------------------------------------
Feature            Verified  Config  Used  Cur    Threshold  Polled
                   Scale     Scale         Util   Exceeded   Timestamp
-------------------------------------------------------------------------------
Multicast Routes   32000     100     1     1.00   None       2019-02-20 17:53:11
PIM Neighbors      250       250     0     0.00   None       2019-02-20 17:53:11
IGMP Groups        8000      8000    0     0.00   None       2019-02-20 17:53:11
Notes:
• Each feature can be configured with an independent parameter limit.
• The Mroutes configured scale is now set to 100.
• Other multicast parameters are still at default.
Device Resource Usage
iCAM – Scale of the Features - Thresholds
N9504(config)# icam monitor scale threshold info 75 warning 85 critical 95
N9504# show icam scale
==============================================
Info Threshold = 75 percent (configured) |
Warning Threshold = 85 percent (configured) |
Critical Threshold = 95 percent (configured) |
All timestamps are in UTC |
==============================================
-------------------------------------------------------------------------------
Scale limits for L2 Switching Routing
-------------------------------------------------------------------------------
Feature            Verified  Config  Used  Cur    Threshold  Polled
                   Scale     Scale         Util   Exceeded   Timestamp
-------------------------------------------------------------------------------
VLANs              3967      500     492   98.40  Critical   2019-02-20 19:21:46
MAC Addresses      92000     92000   4     0.00   None       2019-02-20 19:21:46
Notes:
• New threshold levels are in effect (defaults: 80, 90 and 100).
• VLANs: critical threshold level exceeded.
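The "Cur Util" and "Threshold Exceeded" columns follow from the Used and Config Scale columns. The sketch below, in Python, reproduces the values shown above. It assumes a threshold counts as exceeded when utilization is greater than or equal to it; the output does not say whether the comparison is strict, so treat that as an assumption. The function names are hypothetical.

```python
def utilization(used, config_scale):
    """Current utilization as a percentage of the configured scale limit."""
    return round(100 * used / config_scale, 2)


def threshold_exceeded(util, info=80, warning=90, critical=100):
    """Highest threshold reached; defaults mirror the default levels shown above.

    Assumption: a threshold is treated as exceeded when util >= threshold.
    """
    if util >= critical:
        return "Critical"
    if util >= warning:
        return "Warning"
    if util >= info:
        return "Info"
    return "None"


if __name__ == "__main__":
    # VLANs row: 492 used against a configured limit of 500, thresholds 75/85/95
    vlans = utilization(492, 500)
    print(vlans, threshold_exceeded(vlans, info=75, warning=85, critical=95))
    # Multicast Routes row: 1 used against a configured limit of 100, default thresholds
    mroutes = utilization(1, 100)
    print(mroutes, threshold_exceeded(mroutes))
```

Lowering the configured scale (500 VLANs instead of the verified 3967) is what makes 492 VLANs register as critical in this example.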

Device Resource Usage
iCAM – Scale of the Features - Utilization
N9504# show icam scale utilization
==============================================
Info Threshold = 75 percent (configured) |
Warning Threshold = 85 percent (configured) |
Critical Threshold = 95 percent (configured) |
All timestamps are in UTC |
==============================================
-------------------------------------------------------------------------------------------------
Scale limits for L2 Switching Routing
-------------------------------------------------------------------------------------------------
Feature Verified Config Used Cur Avg 7-Day 7-Day Peak Peak Peak
Scale Scale Util Util Util Timestamp Util Timestamp
-------------------------------------------------------------------------------------------------
VLANs 3967 500 492 98.40 98.40 98.40 2019-02-20 19:21:46 98.40 2019-02-18 11:53:28
<snip>
Note: average and peak values are measured since the feature was enabled.
Control-Plane Policing (CoPP)
Things to Check
• Choose one of the strict (default), moderate, lenient or dense policies.
• CoPP is performed per forwarding engine. Configure rates so that the aggregate traffic doesn’t overwhelm the CPU.
• Monitor drop counters continuously. Is traffic dropped because of a malfunction or an attack? Are there drops in the default class? Did we fail to classify important traffic?
• CoPP configuration is an ongoing process. Review the configuration after every major network change.
Control-Plane Policing (CoPP)
Quick Checks – Config and Stats
N9504# show copp status
Policy-map attached to the control-plane: copp-system-p-policy-strict
(match-any)
N9504# show policy-map interface control-plane | include class-map
class-map copp-system-p-class-l3uc-data (match-any)
class-map copp-system-p-class-critical (match-any)
class-map copp-system-p-class-important (match-any)
class-map copp-system-p-class-multicast-router (match-any)
class-map copp-system-p-class-multicast-host (match-any)
class-map copp-system-p-class-l3mc-data (match-any)
class-map copp-system-p-class-normal (match-any)
class-map copp-system-p-class-ndp (match-any)
<snip>
class-map copp-system-p-class-redirect (match-any)
class-map copp-system-p-class-exception (match-any)
class-map copp-system-p-class-exception-diag (match-any)
<snip>
class-map copp-system-p-class-undesirablev6 (match-any)
class-map copp-system-p-class-l2-default (match-any)
class-map class-default (match-any)
Control-Plane Policing (CoPP)
Quick Checks – Config and Stats (Contd.)
N9504# show policy-map interface control-plane
<snip>
class-map copp-system-p-class-important (match-any)
  match access-group name copp-system-p-acl-hsrp
  match access-group name copp-system-p-acl-vrrp
  match access-group name copp-system-p-acl-hsrp6
  match access-group name copp-system-p-acl-vrrp6
  match access-group name copp-system-p-acl-mac-lldp
  match access-group name copp-system-p-acl-icmp6-msgs
  set cos 6
  police cir 3000 pps , bc 128 packets
  module 1 :
    transmitted 2121674 packets;
    dropped 143189 packets;
<snip>
class-map class-default (match-any)
  set cos 0
  police cir 50 pps , bc 32 packets
  module 1 :
    transmitted 2231318 packets;
    dropped 4239 packets;
Tip: Do “clear copp statistics” and check again!
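To compare classes, the raw counters can be turned into a drop percentage per class. The following is a minimal sketch in Python that parses the two classes from the sample above; the helper names (`drop_report`, `drop_percent`) are hypothetical, and the parsing only relies on the "class-map", "transmitted" and "dropped" lines as they appear in this output.

```python
import re

# Trimmed copy of the "show policy-map interface control-plane" sample above.
SAMPLE = """\
class-map copp-system-p-class-important (match-any)
  police cir 3000 pps , bc 128 packets
  module 1 :
    transmitted 2121674 packets;
    dropped 143189 packets;
class-map class-default (match-any)
  police cir 50 pps , bc 32 packets
  module 1 :
    transmitted 2231318 packets;
    dropped 4239 packets;
"""


def drop_report(text):
    """Sum transmitted/dropped packet counters per class-map, across modules."""
    report = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"class-map (\S+)", line)
        if match:
            current = match.group(1)
            report[current] = {"transmitted": 0, "dropped": 0}
            continue
        match = re.match(r"(transmitted|dropped) (\d+) packets", line)
        if match and current is not None:
            report[current][match.group(1)] += int(match.group(2))
    return report


def drop_percent(counters):
    """Dropped packets as a percentage of all packets seen by the policer."""
    total = counters["transmitted"] + counters["dropped"]
    return 100 * counters["dropped"] / total if total else 0.0


if __name__ == "__main__":
    for name, counters in drop_report(SAMPLE).items():
        print(f"{name}: {drop_percent(counters):.2f}% dropped")
```

Because the counters are cumulative, a percentage is only meaningful when measured over a known window, which is why clearing the statistics and checking again matters.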

Hardware Rate-Limiters (HWRL)
Things to Check
• Rate-limiters prevent packets that are redirected to the CPU due to exceptions (e.g., ACL Log or Layer 3 Glean) from overwhelming the CPU.
• Have a close look at the allowed and dropped stats.
• Enable/disable RLs and update their rates with the “hardware rate-limiter” command under “config t”.
N9504# show hardware rate-limiter
Units for Config: packets per second (kilo bits per second for span-egress)
Allowed, Dropped & Total: aggregated since last clear counters
Module: 1
R-L Class           Config     Allowed       Dropped     Total
+----------------+----------+-------------+----------+-------+
L3 MTU              0          0             0           0
L3 ttl              500        65            0           65
L3 glean            100        28874211      9539369     38413580
L3 mcast loc-grp    3000       0             0           0
access-list-log     100        0             0           0
bfd                 10000      0             0           0
exception           50         0             0           0
span                50         0             0           0
<snip>
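A quick way to read this table is to list only the classes that have dropped packets and the share of traffic they dropped. The sketch below, in Python, works on three rows copied from the sample. Since class names contain spaces, it splits each row from the right; the function name `classes_with_drops` is hypothetical.

```python
# Rows copied from the "show hardware rate-limiter" sample above:
# class name, configured rate, allowed, dropped, total.
SAMPLE = """\
L3 ttl 500 65 0 65
L3 glean 100 28874211 9539369 38413580
L3 mcast loc-grp 3000 0 0 0
"""


def classes_with_drops(text):
    """Return {class: percent dropped} for every class with a non-zero drop count."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # The last four fields are numeric; everything before them is the class name.
        name, config, allowed, dropped, total = line.rsplit(None, 4)
        allowed, dropped, total = int(allowed), int(dropped), int(total)
        if allowed + dropped != total:
            raise ValueError(f"{name}: counters do not add up")
        if dropped:
            result[name] = 100 * dropped / total
    return result


if __name__ == "__main__":
    for name, percent in classes_with_drops(SAMPLE).items():
        print(f"{name}: {percent:.2f}% of packets dropped")
```

In the sample, only L3 glean is dropping: roughly a quarter of the glean packets exceeded the configured rate of 100 pps.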
Command Line Interface
Programmability Support
We all know grep/egrep, or include/exclude. But there are more!
N93180TC-EX# show version | ?
  awk          Mini AWK
  cut          Print selected parts of lines.
  diff         Show difference between current and previous invocation
  email        Email command output
  head         Display first lines
  human        Output in human format
  json         Output in json format
  json-pretty  Output in json pretty print format
  section      Show lines that include the pattern as well as the subsequent lines
  sed          Stream Editor
  sort         Stream Sorter
  tr           Translate, squeeze, and/or delete characters
  uniq         Discard all but one of successive identical lines
  vsh          The shell that understands cli command
  wc           Count words, lines, characters
  xml          Output in xml format (according to .xsd definitions)
  xmlin        Convert CLI show commands to their XML formats
  xmlout       Output in xml format (according to the latest .xsd version)
  begin        Begin with the line that matches
  count        Count number of lines
Recommended Software
Choose the right one
(See the Nexus 9000 Recommended Software bulletin at Cisco.com.)
General Recommendation for New and Existing Deployments:
Platform      Recommended Release
Nexus 9000    7.0(3)I7(6)

Earlier Recommendations and Releases:
Type                            Release Number
Previous Recommended Release    7.0(3)I4(8b)
Current Long-lived Release      7.0(3)I7(x)
Previous Long-lived Release     7.0(3)I4(x)
Short-lived Releases            7.0(3)I1(x), 7.0(3)I3(x), 7.0(3)I5(x), 7.0(3)I6(x), and 9.2(x)*

* If 9.2(x) is needed to deploy new hardware or features, use the latest version available on CCO.
Monitor and Health Check
Summary
• Hardware diagnostic capabilities (bootup, runtime and on-demand) help check for hardware failures and run-time issues.
• OBFL helps keep an eye on the system’s events and exceptions. Critical for analysis.
• Monitoring resource usage is critical for taking precautionary measures.
• Fine-tune CoPP and HWRL to protect the control-plane as well as to attain stability.
• Pick the right software.
• Never underestimate the power of syslog (show logging log), and interface counters and errors (show interface). You are going to find valuable things!
Troubleshooting Tools
Agenda
• Introduction
• Monitor and Health-Check
• Troubleshooting Tools
  • Ethanalyzer
  • SPAN to CPU
  • Consistency Checkers (CC)
  • Port ACL / Router ACL
• Troubleshooting Traffic Forwarding
• Common Failure Scenarios and Recommendations
• Summary and Take-Aways
Ethanalyzer
Introduction
• Built-in tool to analyze the traffic sent and received by CPU. Helpful to
troubleshoot High CPU or Control-plane issues like HSRP failover or OSPF
adjacency flaps.
• Based on tshark code
• Two filtering approaches for configuring a packet capture:
Display-Filter Example                       Capture-Filter Example
"eth.addr==00:00:0c:07:ac:01"                "ether host 00:00:0c:07:ac:01"
"ip.src==10.1.1.1 && ip.dst==10.1.1.2"       "src host 10.1.1.1 and dst host 10.1.1.2"
"snmp"                                       "udp port 161"
"ospf"                                       "ip proto 89"
Ethanalyzer
Process and Configuration
Capture Interface → Filters → Stop Criteria
(1) Identify Capture Interface
• mgmt – captures traffic on the mgmt0 interface
• inband – captures traffic sent to the control-plane/CPU
(2) Configure Filter
• Display-Filter – captures all traffic but displays only the traffic meeting the criteria
• Capture-Filter – captures only the traffic meeting the criteria
(3) Define Stop Criteria
• By default, the capture stops after 10 frames. This can be changed with the limit-captured-frames option; 0 means no limit, and the capture runs until the user presses Ctrl+C.
• autostop can be used to stop the capture after a specified duration, file size, or number of files.
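The three steps map directly onto the parts of an Ethanalyzer command line. As a rough illustration, the Python sketch below assembles the command string from those choices, for example when generating commands for several switches. It uses only keywords that appear in this session (inband, mgmt, capture-filter, display-filter, limit-captured-frames); the function name is hypothetical, and the sketch deliberately accepts one filter type per command because the ordering of both together is not shown here.

```python
def build_ethanalyzer_command(interface="inband", capture_filter=None,
                              display_filter=None, limit_frames=None):
    """Assemble an 'ethanalyzer local interface ...' command string.

    Step 1: capture interface, step 2: filter, step 3: stop criteria.
    """
    if interface not in ("inband", "mgmt"):
        raise ValueError("interface must be 'inband' or 'mgmt'")
    if capture_filter and display_filter:
        raise ValueError("this sketch supports one filter type per command")

    parts = ["ethanalyzer", "local", "interface", interface]
    if capture_filter:
        parts += ["capture-filter", f'"{capture_filter}"']
    if display_filter:
        parts += ["display-filter", f'"{display_filter}"']
    if limit_frames is not None:
        # 0 means no limit; the capture then runs until interrupted
        parts += ["limit-captured-frames", str(limit_frames)]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_ethanalyzer_command(capture_filter="ip proto 89", limit_frames=0))
    print(build_ethanalyzer_command(interface="mgmt", display_filter="snmp"))
```

Note that the filter strings use straight double quotes; typographic quotes pasted from slides are a common reason for a filter being rejected.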

Real World Example
Slow Download Rate
• Server in VLAN 527
• Downloads/Uploads over the WAN are slow
• Downloads/Uploads on the LAN have no problem
• No incrementing errors on any interface and low average interface utilization
[Topology: Host 172.18.37.71 is reached across the WAN. The Nexus 9000 (Eth4/1, Eth4/2) has SVI 527 with address 10.5.27.2/24 and MAC 64a0.e745.89c1. The Internet Gateway is 10.5.27.1 with MAC 78da.6e19.4500. The Server is 10.5.27.38 with MAC 000a.f31a.1c1c.]
Real World Example
Slow Download Rate
Can we quickly validate on the Nexus 9000 if traffic is hardware or software switched? Ethanalyzer!
N9k# ethanalyzer local interface inband capture-filter "host 10.5.27.38 or host 172.18.37.71"
If traffic is software-switched, it would be seen on the inband. Filter for any traffic between the hosts experiencing the slow downloads.
Real World Example
Slow Download Rate
N9k# ethanalyzer local interface inband capture-filter "host 10.5.27.38 or host 172.18.37.71"
Capturing on inband
2017-01-17 07:28:16.406589 10.5.27.38 -> 172.18.37.71 TCP 60 [TCP Keep-Alive] 28123 > http [ACK] Seq=1 Ack=1 Win=8760 Len=0
2017-01-17 07:28:16.406603 10.5.27.2 -> 10.5.27.38 ICMP 70 Redirect (Redirect for host)
2017-01-17 07:28:16.406617 10.5.27.38 -> 172.18.37.71 TCP 60 [TCP Out-Of-Order] 28123 > http [FIN, ACK] Seq=1 Ack=1 Win=8760 Len=0
2017-01-17 07:28:16.407142 10.5.27.38 -> 172.18.37.71 TCP 60 28124 > http [SYN] Seq=0 Win=8760 Len=0 MSS=1460
2017-01-17 07:28:16.407175 10.5.27.38 -> 172.18.37.71 TCP 60 [TCP Out-Of-Order] 28124 > http [SYN] Seq=0 Win=8760 Len=0 MSS=1460
etc...
• All traffic from Server (10.5.27.38) to the Internet (172.18.37.71) is being software switched.
• N9K (10.5.27.2) sends ICMP redirects to Server (10.5.27.38).
Real World Example
Slow Download Rate
N9k# ethanalyzer local interface inband capture-filter "host 10.5.27.38" limit-captured-frames 1 detail | i Ethernet|Internet
Capturing on inband
1 packet captured
Ethernet II, Src: Cisco_1a:1c:1c (00:0a:f3:1a:1c:1c), Dst: Cisco_45:89:c1 (64:a0:e7:45:89:c1)
Internet Protocol Version 4, Src: 10.5.27.38 (10.5.27.38), Dst: 172.18.37.71 (172.18.37.71)
The Server (10.5.27.38) should be using the Internet Gateway, but is sending traffic destined to the Nexus 9000 MAC address.
Real World Example
Slow Download Rate
Root cause:
• Server’s Default Gateway: 10.5.27.2
• Server has a firewall enabled to block ALL ICMP Redirects to avoid poisoning
Fix Options:
1. Re-configure the firewall to allow ICMP redirects
2. Add a route for WAN subnets to the Server, with Internet Gateway as next-hop
3. Configure “no ip redirects” on the Nexus 9000 under the SVI
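The redirect in this case follows from the addressing: the Server, the Nexus 9000 SVI and the Internet Gateway all sit in 10.5.27.0/24, so the switch has to send the packet back out of the VLAN it arrived on. The Python sketch below expresses that condition in a simplified form. It is an illustration only: real routers apply additional checks before sending a redirect, the function name is hypothetical, and 10.5.27.1 as the next hop toward the WAN is taken from the topology in this example.

```python
import ipaddress


def would_trigger_redirect(ingress_interface, source, next_hop):
    """Simplified ICMP redirect condition.

    True when both the packet's source and the router's next hop for the
    destination are on the subnet of the interface the packet arrived on,
    i.e. the sender could have reached the next hop directly.
    """
    network = ipaddress.ip_interface(ingress_interface).network
    return (ipaddress.ip_address(source) in network
            and ipaddress.ip_address(next_hop) in network)


if __name__ == "__main__":
    # SVI 527 on the Nexus 9000, Server as source, Internet Gateway as next hop
    print(would_trigger_redirect("10.5.27.2/24", "10.5.27.38", "10.5.27.1"))
```

Each packet that meets this condition is handled by the CPU so that the redirect can be generated, which is why the flow shows up on the inband and the download is slow. The three fix options either let the Server act on the redirect, stop the Server from sending this traffic to the switch, or stop the switch from generating redirects.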
SPAN to CPU
Introduction and Configuration
Switch Port ANalyzer (SPAN) mirrors the traffic from source ports/VLANs to destination port(s).
monitor session 1
  source interface eth1/1
  destination interface eth1/6
[Figure: traffic on SPAN source Eth1/1 is replicated to SPAN destination Eth1/6, where a sniffer device is attached.]

In SPAN to CPU, the destination port is the CPU in the switch.
monitor session 1
  source interface eth1/1
  destination interface sup-eth 0
  <options>
[Figure: traffic on SPAN source Eth1/1 is replicated to the switch CPU as the SPAN destination.]

But how do we differentiate regular control-plane packets from SPAN to CPU packets?
SPAN to CPU
Troubleshooting Packet Loss
N9K-A:
monitor session 1
  source interface eth1/2 rx
  destination interface sup-eth 0
  filter access-group ACL1
  no shut
ip access-list ACL1
  permit icmp 10.214.10.5/32 any

N9K-B:
monitor session 1
  source interface eth1/1 rx
  destination interface sup-eth 0
  filter access-group ACL1
  no shut
ip access-list ACL1
  permit icmp 10.214.10.5/32 any

[Topology: Host A (10.214.10.5/24) connects to N9K-A on Eth1/2. N9K-A (Eth1/1) reaches N9K-B (Eth1/1) across the network. Host B is 10.214.50.11/24. ICMP traffic flows from Host A to Host B.]
SPAN to CPU Captures only the SPAN to CPU
packets, not regular packets!!
Troubleshooting Packet Loss
N9K-A# ethanalyzer local interface inband mirror display-filter "icmp”
Capturing on inband
2018-12-12 04:41:32.164790 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.165562 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166266 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166930 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:34.167589 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request

SPAN to CPU
Troubleshooting Packet Loss
N9K-B# ethanalyzer local interface inband mirror display-filter "icmp"
Capturing on inband
2018-12-12 04:41:32.164982 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.165941 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166611 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:34.167992 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
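
The two captures above can be compared by counting the mirrored echo requests seen on each switch. The following is a minimal Python sketch, not part of NX-OS: the function name `count_echo_requests` is hypothetical, and it assumes the ethanalyzer text output was saved from both switches over the same time window. Timestamps come from two different clocks, so only the counts are compared.

```python
import re

# One summary line per packet, as printed by ethanalyzer in the outputs above.
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+"
    r"(?P<src>\S+) -> (?P<dst>\S+)\s+ICMP Echo \(ping\) request"
)


def count_echo_requests(capture: str, src: str, dst: str) -> int:
    """Count ICMP echo requests from src to dst in ethanalyzer text output."""
    count = 0
    for line in capture.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("src") == src and match.group("dst") == dst:
            count += 1
    return count


# Capture text copied from the N9K-A and N9K-B slides.
N9K_A = """\
2018-12-12 04:41:32.164790 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.165562 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166266 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166930 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:34.167589 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
"""

N9K_B = """\
2018-12-12 04:41:32.164982 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.165941 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:32.166611 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
2018-12-12 04:41:34.167992 10.214.10.5 -> 10.214.50.11 ICMP Echo (ping) request
"""

seen_a = count_echo_requests(N9K_A, "10.214.10.5", "10.214.50.11")
seen_b = count_echo_requests(N9K_B, "10.214.10.5", "10.214.50.11")
print(f"N9K-A saw {seen_a}, N9K-B saw {seen_b}, difference {seen_a - seen_b}")
```

A non-zero difference suggests that requests were lost between the two SPAN sources, provided the SPAN rate-limiter did not drop mirrored copies on either switch.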

SPAN to CPU
VXLAN – Topology and Traffic Flow

Spine

Leaf 10.1.1.1 10.1.1.2

10.0.0.100 10.0.0.101
Host A ICMP Host B

SPAN to CPU
VXLAN Decode Example
Available in release 7.0(3)I7(4), 9.2(1) and later releases

N9200# ethanalyzer local interf inband mirror display-filter icmp limit-cap 0 detail
Frame 1 (148 bytes on wire, 148 bytes captured)
<snip>
[Protocols in frame: eth:ip:udp:vxlan:eth:ip:icmp:data] <<< frame structure
Ethernet II, Src: 78:0c:f0:a2:2b:df (78:0c:f0:a2:2b:df), Dst: 70:0f:6a:f2:8c:05 (70:0f:6a:f2:8c:05)
<snip>
Type: IP (0x0800)
Internet Protocol, Src: 10.1.1.1 (10.1.1.1), Dst: 10.1.1.2 (10.1.1.2) <<< VTEPs
Version: 4
Header length: 20 bytes
<snip>
Source: 10.1.1.1 (10.1.1.1)
Destination: 10.1.1.2 (10.1.1.2)
User Datagram Protocol, Src Port: 22790 (22790), Dst Port: 4789 (4789) <<< VXLAN Attributes
Source port: 22790 (22790)
Destination port: 4789 (4789)
<snip>

SPAN to CPU
VXLAN Example (Contd.)
Virtual eXtensible Local Area Network
Flags: 0x08
<snip>
VXLAN Network Identifier (VNI): 10990010 <<< VNI for vlan 10
Reserved: 0
Ethernet II, Src: 00:aa:aa:aa:10:10 (00:aa:aa:aa:10:10), Dst: 00:bb:bb:bb:20:20 (00:bb:bb:bb:20:20) <<< Inner MAC
<snip>
Type: IP (0x0800)
Internet Protocol, Src: 10.0.0.100 (10.0.0.100), Dst: 10.0.0.101 (10.0.0.101) <<< Inner IPs
<snip>
Source: 10.0.0.100 (10.0.0.100)
Destination: 10.0.0.101 (10.0.0.101)
Internet Control Message Protocol <<< Original ICMP
Type: 8 (Echo (ping) request)
Code: 0 ()
Checksum: 0x3597 [correct]
Identifier: 0xb00f
<snip>
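
The Flags and VNI fields in the decode above come from the 8-byte VXLAN header defined in RFC 7348 (8 flag bits, 24 reserved bits, a 24-bit VNI, 8 reserved bits). The following Python sketch shows how those two fields are extracted from raw bytes. The helper names are hypothetical and the header bytes are built locally for illustration, not taken from the capture.

```python
import struct


def build_vxlan_header(vni: int) -> bytes:
    """Build an 8-byte VXLAN header with the I flag (0x08) set."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> dict:
    """Extract the flags byte and the 24-bit VNI from a VXLAN header."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    word0, word1 = struct.unpack("!II", header[:8])
    flags = word0 >> 24          # first byte of the header
    vni = word1 >> 8             # upper 24 bits of the second word
    return {"flags": flags, "vni_valid": bool(flags & 0x08), "vni": vni}


# VNI value taken from the decode above.
fields = parse_vxlan_header(build_vxlan_header(10990010))
print(hex(fields["flags"]), fields["vni_valid"], fields["vni"])
```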

SPAN to CPU
Things to Know
• All SPAN replication is done in hardware, with no impact on the CPU
• SPAN packets to CPU are rate-limited, and excess packets are dropped in the inband path. Use the “hardware rate-limiter span …” command to change the rate.
• Starting from 7.0(3)I7(1), SPAN packet truncation is supported only on Nexus 9300-EX/FX/FX2 platforms
• SPAN is not supported for management ports

Consistency Checkers
How do they help?
Consistency Checkers validate the software state with the hardware state, and report PASSED or FAILED.

[Diagram labels: Configurations, Protocol States, Software, Programming, Hardware Tables]

N9K# show consistency-checker ?
  copp            Verify copp programming from software context
  egress-xlate    Check PVLAN egress-xlate
  fex-interfaces  Compares software and hardware state of fex interfaces
  forwarding      Display Forwarding Information
  l2              L2 consistency
  l3              L3 consistency
  l3-interface    Compares software and hardware properties of L3 interf
  link-state      Compares software and hardware link state of interfaces
  membership      Check various memberships VLANs, Port-Channel
  pacl            Verify pacl programming in the hardware
  racl            Verify racl programming in the hardware
  stp-state       Verify spanning tree state in the hardware
  vacl            Verify vacl programming in the hardware
  vpc             Verify vpc state in the hardware
  vxlan           VxLAN consistency checker

Consistency Checkers
Example – Unicast Route and vPC
• Consistency-Checker for single ip address or prefix – helps to
focus on a broken flow
N9K# show consistency-checker forwarding single-route ipv4 10.127.101.1 prefix 32 vrf L3-Inner
Starting consistency check for v4 route 10.127.101.1/32 in vrf L3-Inner
Consistency checker passed for 10.127.101.1/32

• Consistency-Checker for vPC


N9K# show consistency-checker vpc source-interface port-channel 45
VPC 45 name Po45
Validating vpc 45 member: Ethernet1/1/3
Error vpc 45, is_vpc is not 1 and remote vpc state is Up
VPC Consistency Check Failed

Router ACL / Port ACL
Tool and Requirements

• For intermittent packet loss issues, specifically in scenarios where the exact packet count can be defined, Router ACL (RACL) and Port ACL (PACL) can be a useful tool
• Requires TCAM allocation for PACL followed by switch reload.

N9K(config-if)# ip port access-group test1 in
ERROR: TCAM region is not configured. Please configure TCAM region and retry the command

TCAM space is limited. The choice for what is best for you depends entirely on the specific use-case. By default, all TCAM space is already allocated, so you need to decide where you want to 'steal' TCAM space from in order to allocate elsewhere.

N9K# show run | include ignore-case tcam
hardware access-list tcam region ing-ifacl 512
hardware access-list tcam region ing-racl 1024

IFACL = PACL

N9K# show system internal access-list globals
<snip>
----------------------------------------------------------
INSTANCE 0 TCAM Region Information:
----------------------------------------------------------
Ingress:
--------
Region                TID   Base   Size   Width
-----------------------------------------------------------
NAT                   13    0      0      1
Ingress PACL          1     0      512    1
Ingress VACL          2     0      0      1
Ingress RACL          3     512    1024   1
Ingress RBACL         4     0      0      1
Ingress L2 QOS        5     1536   256    1
Ingress L3/VLAN QOS   6     1792   512    1
<snip>
Total configured size: 4096
Remaining free size: 0
Note: Ingress SUP region includes Redirect region

Egress:
--------
<snip>

Router ACL / Port ACL
Troubleshooting Packet Loss
Using a Port-ACL (PACL) to match bridged traffic on an L2 switchport

root@Server~$ ping 172.18.1.100 -c 5000 -W 1 -i 0
<snip>
5000 packets transmitted, 4886 packets received, 0.2% packet loss,

ip access-list 101
  statistics per-entry
  10 permit icmp 10.0.1.100/32 172.18.1.100/32
  20 permit ip any any
! Apply to server ingress interface
interface port-channel101
  ip port access-group 101 in

N9K-1# show ip access-lists 101
IPV4 ACL 101
  statistics per-entry
  10 permit icmp 10.0.1.100/32 172.18.1.100/32 [match=5000]
  20 permit ip any any [match=323321]

[Diagram: Host A (10.0.1.100), Po101, N9K-1, N9K-3, WAN, Host B (172.18.1.100); areas labeled Branch and Data Center. Callout: 5000 ICMP Requests received by N9K-1.]

Can you tell in which direction the packets are getting lost?
Router ACL / Port ACL
Troubleshooting Packet Loss
Using a Port-ACL (PACL) to match bridged traffic on an L2 switchport

ip access-list 101
  statistics per-entry
  10 permit icmp 172.18.1.100/32 10.0.1.100/32
  20 permit ip any any
! Apply to server ingress interface
interface Ethernet1/14
  ip port access-group 101 in

N9K-3# show ip access-lists 101
IPV4 ACL 101
  statistics per-entry
  10 permit icmp 172.18.1.100/32 10.0.1.100/32 [match=5000]
  20 permit ip any any [match=221747]

[Diagram: Host A (10.0.1.100), N9K-1, N9K-3 (interface 1/14), WAN, Host B (172.18.1.100); areas labeled Branch and Data Center. Callout: 5000 ICMP Responses received by N9K-3.]
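
The per-entry counters above can be read programmatically and compared with the number of packets the ping sent. This is a minimal Python sketch, assuming the `show ip access-lists` text was collected separately; the function name `acl_hits` is hypothetical and is not an NX-OS or library API.

```python
import re

# Matches ACL entries that carry a per-entry counter, e.g.
# "10 permit icmp 10.0.1.100/32 172.18.1.100/32 [match=5000]"
ENTRY_RE = re.compile(r"^\s*(?P<seq>\d+)\s+.+?\s+\[match=(?P<hits>\d+)\]\s*$")


def acl_hits(show_output: str) -> dict:
    """Return {sequence number: match count} from 'show ip access-lists' text."""
    hits = {}
    for line in show_output.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            hits[int(match.group("seq"))] = int(match.group("hits"))
    return hits


# Output copied from the N9K-1 slide (requests) and the N9K-3 slide (responses).
N9K_1 = """\
IPV4 ACL 101
  statistics per-entry
  10 permit icmp 10.0.1.100/32 172.18.1.100/32 [match=5000]
  20 permit ip any any [match=323321]
"""
N9K_3 = """\
IPV4 ACL 101
  statistics per-entry
  10 permit icmp 172.18.1.100/32 10.0.1.100/32 [match=5000]
  20 permit ip any any [match=221747]
"""

SENT = 5000  # ping -c 5000
requests_missing = SENT - acl_hits(N9K_1)[10]
responses_missing = SENT - acl_hits(N9K_3)[10]
print(f"requests missing at N9K-1: {requests_missing}")
print(f"responses missing at N9K-3: {responses_missing}")
```

Entry 10 is the only entry that isolates the test flow, so the comparison uses it alone. Clearing the counters before each test run keeps the numbers comparable.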

More Tools
• SPAN / ERSPAN, SPAN-on-Drop
• Embedded Logic Analyzer Module (ELAM)
• Flow Tracer*
• VXLAN, DME and KSTACK Consistency Checkers*
• Streaming Hardware Telemetry
• Flexible Netflow / sFlow

* will be released soon


Tools and Supported Products
Summary

Tool                   Nexus 9000 (Broadcom)   Nexus 9000 EX/FX/FX2 (Tahoe/Rocky)
Ethanalyzer            yes                     yes
SPAN to CPU            yes (1)                 yes
SPAN on Drop           no                      yes
SPAN Filter            yes                     yes
Consistency Checkers   yes                     yes
PACL/RACL              yes (2)                 yes (2)
ELAM                   yes (3)                 yes
Packet tracer          yes                     no

(1) = dMirror feature
(2) = TCAM carving needed
(3) = Northstar / Donner

Nexus 9000
… platform of possibilities

Troubleshooting
Traffic
Forwarding
“It is a capital mistake to theorize
before one has data. Insensibly
one begins to twist facts to suit
theories, instead of theories to
suit facts.”
Sherlock Holmes (A Scandal in Bohemia)

Troubleshooting Methodology
• Define the problem, understand the impact, and determine the scope of the problem based on the information gathered. This helps you to make progress towards resolution.
• Perform network-wide assessment. Check SNMP, syslogs, Netflow data, real-time performance/SLA monitoring tools for alerts, unexpected events, threshold violations etc.
• Choose the right tool(s) and troubleshooting procedure(s) to isolate the problem at a granular level and diagnose to achieve a fast resolution.

[Diagram: Application symptoms (Call Drops, Slowness, Webpage Won’t Load, Choppy Video), then Impact/Scope, Network-Wide Assessment, Problem Isolation]

Agenda
• Introduction
• Monitor and Health-Check
• Troubleshooting Tools
• Troubleshooting Traffic Forwarding
  • Nexus 9000 Hardware Forwarding – Refresher
  • Path-of-the-Packet Troubleshooting
    • Control-Plane Traffic
    • Data-Plane Traffic
• Common Failure Scenarios and Recommendations
• Summary and Take-Aways

Nexus 9000 Product Family
LSE
• 1.8T chip – 2 slices of 9x 100G each
• X9700-EX modular linecards; 9300-EX TORs
LS1800FX
• 1.8T chip – 1 slice of 18x 100G with MACSEC
• X9700-FX modular linecards; 9300-FX TORs
LS3600FX2
• 3.6T chip – 2 slices of 18x 100G with MACSEC + CloudSec
• 9300-FX2 TORs
S6400
• 6.4T chip – 4 slices of 16x 100G each
• 9332C, 9364C TOR; E2 fabric modules

[Diagrams: LSE – 18x 100G (Slice 0 and Slice 1 at 900G each, Slice Interconnect); LS1800FX – 18x 100G (Slice 0 at 1.8T); LS3600FX2 – 36x 100G (Slice 0 and Slice 1 at 1.8T each, Slice Interconnect); S6400 – 64x 100G (Slice 0 to Slice 3 at 1.6T each, Slice Interconnect)]
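
The chip bandwidth figures above follow from slices × ports per slice × 100G. A short Python check of that arithmetic, using only the slice and port counts listed on this slide (the dictionary and function names are illustrative):

```python
# (slices, 100G ports per slice) as listed on the slide
ASICS = {
    "LSE": (2, 9),
    "LS1800FX": (1, 18),
    "LS3600FX2": (2, 18),
    "S6400": (4, 16),
}


def chip_bandwidth_tbps(slices: int, ports_per_slice: int, port_gbps: int = 100) -> float:
    """Aggregate front-panel bandwidth of the ASIC in Tbps."""
    return slices * ports_per_slice * port_gbps / 1000


for name, (slices, ports) in ASICS.items():
    print(f"{name}: {slices * ports}x 100G = {chip_bandwidth_tbps(slices, ports)}T")
```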
Nexus 9000 Traffic Forwarding
Slice
• Self-contained forwarding complex controlling subset of ports on single ASIC
• Separated into Ingress and Egress functions
• Ingress of each slice connected to egress of all slices
• Slice interconnect provides non-blocking any-to-any interconnection between slices

[Diagram: Ingress Slice 1, Ingress Slice 2 ... Ingress Slice n connected through the Slice Interconnect to Egress Slice 1, Egress Slice 2 ... Egress Slice n]

Slice Forwarding Path

[Diagram: slice forwarding path. Ingress blocks: Ingress Packets, Ingress MAC, Packet Parser, Lookup Pipeline (Lookup Key, Lookup Result), Ingress Forwarding Controller (Packet Payload), Replication, Slice Interconnect. Egress blocks: Egress Forwarding Controller, Egress Buffering / Queuing / Scheduling, Egress Policy, Egress Packet Rewrites, Egress MAC, Egress Packets. SSX is marked “S6400 / LS3600FX2 only”.]

Ingress Lookup Pipeline

[Diagram: ingress lookup pipeline. From Ingress MAC, Packet Parser, Lookup Pipeline (Lookup Key, Lookup Result), Ingress Forwarding Controller, To Egress Slice. Lookup Pipeline blocks: Flex Tiles, TCAM, Forwarding Lookup, Ingress Classification, Load Balancing / AFD / DPP, Flow Table (Flush). A note reads “LSE / LS1800FX / LS3600FX2 only”.]

Flexible Forwarding Tiles
• Provide fungible pool of table entries for lookups
• Number of tiles and number of entries in each tile varies between ASICs
• Variety of functions, including:
  • IPv4/IPv6 unicast longest-prefix match (LPM)
  • IPv4/IPv6 unicast host-route table (HRT)
  • IPv4/IPv6 multicast (*,G) and (S,G)
  • MAC address/adjacency tables
  • ECMP tables

[Diagram: a grid of Flex Tiles used by the Forwarding Lookup]

Recommended session: BRKARC-3222 – Cisco Nexus 9000 Architecture
Path of the Packet
Control-Plane Traffic - Setup
Nexus 9508 with 97XX modules
N9508-A# show mod
Mod Ports Module-Type                           Model                 Status
--- ----- ------------------------------------- --------------------- ---------
2   52    48x10/25G + 4x40/100G Ethernet Module N9K-X97160YC-EX       ok
3   32    32x100G Ethernet Module               N9K-X9732C-EX         ok
5   36    36x100G Ethernet Module               N9K-X9736C-EX         ok
22  0     8-slot Fabric Module                  N9K-C9508-FM-E        ok
23  0     8-slot Fabric Module                  N9K-C9508-FM-E        ok
24  0     8-slot Fabric Module                  N9K-C9508-FM-E        ok
26  0     8-slot Fabric Module                  N9K-C9508-FM-E        ok
27  0     Supervisor Module                     N9K-SUP-B             active *
28  0     Supervisor Module                     N9K-SUP-B             ha-standby
29  0     System Controller                     N9K-SC-A              active
30  0     System Controller                     N9K-SC-A              standby

Slots 2, 3, 5: Modules. Slots 22, 23, 24, 26: Fabric Modules. Slots 27, 28: Supervisor Engines. Slots 29, 30: System Controllers.

Path of the Packet
Control-Plane Traffic

[Diagram: control-plane path on a Nexus9500. Components shown: Router-A, Eth3/1, Linecard, Fabric Modules (FM1 .. FM4, HiGig), System Controllers (SC-A, SC-B*), Supervisor Engine (Inband, NIC), NX-OS (Netstack, PktMgr, IP Stack), and the OSPF / BGP / PIM processes. Address labels: 192.168.15.26/30 near SVI 102, 192.168.15.25/30 near Router-A and Eth3/1.]
* Standby System Controller

Checks shown alongside the path:
• Process-level Debug
• SysMgr
• Ethanalyzer
• Packet Manager Debug
• Inband Stats
• System Controller Stats
• Fabric Stats
• ASIC Counters
• CoPP Stats
• Interface counters
Path of the Packet
Control-Plane Traffic: Interface Counters
N9508-A# show interface e3/1
Ethernet3/1 is up
admin state is up, Dedicated Interface
<snip>
RX
0 unicast packets 11 multicast packets 2 broadcast packets
13 input packets 2294 bytes
0 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
0 unicast packets 3 multicast packets 0 broadcast packets
3 output packets 702 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
Path of the Packet
Control-Plane Traffic: ASIC Counters
N9508-A# show system internal interface ii3/1/1 counters
Internal Port Statistics for Slot: ii3/1/1 If_Index 0x4a100000
================================================================
<snip>
Mac Pktflow:
  Rx Counters:
  <snip>
  Tx Counters:
  <snip>
Mac Control:
  Rx Pause: 0x0000000000000000/0
  Tx Pause: 0x0000000000000000/0
  Reset: 0x0000000000000000/0
Mac Errors:
  Undersize: 0x0000000000000000/0
  Runt: 0x0000000000000000/0
  Crc: 0x0000000000000000/0
  Input Errors: 0x0000000000000001/1
  In Discard: 0x0000000000000000/0
  Giants: 0x0000000000000001/1
  Output Errors: 0x0000000000000000/0
  Output Discard: 0x0000000000000000/0
  Bad Proto: 0x0000000000000000/0
  Collision: 0x0000000000000000/0
  Late Collision: 0x0000000000000000/0
  No Carrier: 0x0000000000000000/0

Path of the Packet
Control-Plane Traffic: ASIC Counters (for front-panel ports)
N9508-A# show hardware internal interface asic counters mod 3
Important Counters/Drops
--------------- -----------------------------------------------------------------------------
Interface Drop Reasons for the Interface, See below output for detail if any
--------- -----------------------------------------------------------------------------------
|9|9|9|9|9|9|8|8|8|8|8|8|8|8|8|8|7|7|7|7|7|7|7|7|7|7|6|6|6|6|6|6|6 ……… 0|0|0|0|0|0
|5|4|3|2|1|0|9|8|7|6|5|4|3|2|1|0|9|8|7|6|5|4|3|2|1|0|9|8|7|6|5|4|3 ……… 6|5|4|3|2|1
Eth3/1 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|X|.|.|.|. ……… .|.|.|.|.|.
Eth3/2 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
Eth3/3 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
Eth3/4 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
<snip>
Eth3/32 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.

Drop Conditions
--------------- -----------------------------------------------------------------------------
67 : TAHOE Ingress DROP_ACL_DROP

Use “clear hardware internal interface-all asic counters mod <mod#>” to clear the conditions
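
In the table above, each column is a drop-condition ID written vertically (tens digit on the first header row, units digit on the second), and an X marks a condition seen on that interface. The following Python sketch decodes one row into condition IDs. It is an illustration only: the function name is hypothetical, and it assumes the rows are supplied without the “………” elision used on the slide, so the example rebuilds the visible columns 95 down to 63.

```python
def decode_drop_row(tens_row: str, units_row: str, data_row: str):
    """Return (interface, [drop condition IDs]) for one row of the drop table."""
    tens = tens_row.strip().strip("|").split("|")
    units = units_row.strip().strip("|").split("|")
    name, _, cells_part = data_row.partition("|")
    cells = cells_part.strip().strip("|").split("|")
    if not (len(tens) == len(units) == len(cells)):
        raise ValueError("header rows and data row must have the same width")
    ids = [
        int(t + u)
        for t, u, cell in zip(tens, units, cells)
        if cell.strip().upper() == "X"
    ]
    return name.strip(), ids


# Rebuild the columns visible before the elision (IDs 95 down to 63),
# with an X in column 67 as on the Eth3/1 row of the slide.
column_ids = list(range(95, 62, -1))
tens_row = "|" + "|".join(str(i // 10) for i in column_ids)
units_row = "|" + "|".join(str(i % 10) for i in column_ids)
data_row = "Eth3/1 |" + "|".join("X" if i == 67 else "." for i in column_ids)

print(decode_drop_row(tens_row, units_row, data_row))
```

The decoded ID is then looked up under “Drop Conditions”, where the slide lists 67 as TAHOE Ingress DROP_ACL_DROP.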

Path of the Packet
Control-Plane Traffic: FM and Linecards Connectivity

[Diagram: Module 3 (Slice 0 to Slice 3, Front Panel Ports) connects over HiGig links (iEth03 on the linecard, iEth12 on the fabric module) to Fabric Modules FM22, FM23, FM24 and FM26. 9508-FM-E Fabric Module: ASE2, ASE2, 32x 100G each. FMs have Tahoe-Lacrosse ASICs.]

• “show hardware internal cpu-mac inband active-fm traffic-to-cpu” reports (in hex) the FM used to send traffic to CPU. Use “... traffic-from-cpu” for traffic in the reverse direction.
• “show system internal fabric connectivity module 22” reports connectivity from the Fabric Modules’ perspective.

N9508-A# show system internal fabric connectivity mod 3
Internal Link-info Linecard slot:3
--------------------------------------------------------------------
LC-Slot LC-Unit LC-iEthLink MUX FM-Slot FM-Unit FM-iEthLink
--------------------------------------------------------------------
3       0       iEth03      -   22      0       iEth12
3       0       iEth05      -   22      1       iEth44
<snip>
3       1       iEth11      -   22      0       iEth11
3       1       iEth13      -   22      1       iEth43
<snip>

Callout beside the table: “slice #”
Path of the Packet
Control-Plane Traffic: Linecards Drops (on HiGig links towards Fabric Modules)
N9508-A# show hardware internal fabric interface asic counters mod 3
Important Counters/Drops
--------------- -----------------------------------------------------------------------------
Interface Drop Reasons for the Interface, See below output for detail if any
--------- -----------------------------------------------------------------------------------
|9|9|9|9|9|9|8|8|8|8|8|8|8|8|8|8|7|7|7|7|7|7|7|7|7|7|6|6|6|6|6|6|6 ……… 0|0|0|0|0|0
|5|4|3|2|1|0|9|8|7|6|5|4|3|2|1|0|9|8|7|6|5|4|3|2|1|0|9|8|7|6|5|4|3 ……… 6|5|4|3|2|1
iEth1 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
iEth2 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
iEth3 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
iEth4 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.
<snip>
iEth32 |.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|. ……… .|.|.|.|.|.

Drop Conditions
--------------- -----------------------------------------------------------------------------

Use “clear hardware internal fabric interface-all asic counters mod <mod#>” to clear the conditions

Path of the Packet
Control-Plane Traffic: Inband Counters

[Diagram: Sup Engine running NX-OS (IP Stack, Packet Manager, PS-INB, Eth2, Eth3), connected to SC-A]

Check the Eth counters of the Inband Interface. PS-INB is the Pseudo Inband Interface.

N9508-A# show hardware internal cpu-mac inband counters
eth2 Link encap:Ethernet HWaddr 00:00:00:01:1b:01
  BROADCAST MULTICAST MTU:9400 Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth3 Link encap:Ethernet HWaddr 00:00:00:01:1b:01
  UP BROADCAST RUNNING MULTICAST MTU:9400 Metric:1
  RX packets:8484226 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4523271 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:860671333 (820.8 MiB) TX bytes:493276319 (470.4 MiB)
ps-inb Link encap:Ethernet HWaddr 00:00:00:01:1b:01
  UP BROADCAST RUNNING MULTICAST MTU:9400 Metric:1
  RX packets:14327 errors:0 dropped:0 overruns:0 frame:0
  TX packets:14312 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:38890552 (37.0 MiB) TX bytes:37871460 (36.1 MiB)

Path of the Packet
Control-Plane Traffic: Inband Statistics

Good health-check. Set a baseline!!

N9508-A# show hardware internal cpu-mac inband stats
<snip>
eth3 stats:
RMON counters                   Rx           Tx
----------------------+------------+--------------------
total packets              8406058      4481386
<snip>
65-127 bytes packets       8391840      4470748
<snip>
broadcast packets               15       561531
multicast packets                0            0
<snip>
Error counters
--------------------------------+--
CRC errors ..................... 0
Alignment errors ............... 0
Symbol errors .................. 0
Sequence errors ................ 0
RX errors ...................... 0
Missed packets (FIFO overflow) 0
Single collisions .............. 0
Excessive collisions ........... 0
Multiple collisions ............ 0
Late collisions ................ 0
Collisions ..................... 0
Defers ......................... 0
Tx no CRS ..................... 0
Carrier extension errors ....... 0
Rx length errors ............... 0
FC Rx unsupported .............. 0
Rx no buffers .................. 0
Rx undersize ................... 0
Rx fragments ................... 0
Rx oversize .................... 0
Rx jabbers ..................... 0
Rx management packets dropped .. 0
Tx TCP segmentation context .... 0
Tx TCP segmentation context fail 0
Rate statistics
-----------------------------+---------
Rx packet rate (current/peak) 160 / 1254 pps
Tx packet rate (current/peak) 112 / 889 pps
<snip>

Path of the Packet
Data-Plane
Issue: Communication failure for an L3 Flow

[Diagram: 10.10.5.3/24 on Eth1/1 of N9K-C92160-YC; Eth1/18 leads to next hop 10.200.1.1 (E865.4994.8C3F) and on through the Network to 172.16.23.23. A second MAC label reads 547F.EE5D.41FC.]

1. Check SW / HW FIB
2. Check Routing Table in the ASIC
3. Check route programmed in the ASIC, and find Adjacency index
4. Check SW / HW Adjacency
5. Check Adjacency Table in the ASIC
6. Check Adjacency programmed in the ASIC, and verify re-write mac and egress interface

Step 1
Path of the Packet
Data-Plane: L3 Flow – Check SW/HW FIB

[Diagram: N9K-C92160-YC, eth1/18, next hop 10.200.1.1, Network, 172.16.23.23]

Check Forwarding Information Base (FIB) in Software

N9K-C9216O-YC# show ip route 172.16.23.23/32
IP Route Table for VRF "default"
<snip>
172.16.23.23/32, ubest/mbest: 1/0
*via 10.200.1.1, Eth1/18, [90/128576], 00:01:10, eigrp-TEST, internal

Check Forwarding Information Base (FIB) in Hardware (make sure the results are matching)

N9K-C9216O-YC# show forwarding route 172.16.23.23/32

slot 1
=======
IPv4 routes for table default/base
------------------+---------------+-----------------+------------+-----------------+
Prefix | Next-hop | Interface | Labels | Partial Install |
------------------+---------------+-----------------+------------+-----------------+
172.16.23.23/32 10.200.1.1 Ethernet1/18

Step 2
Path of the Packet
Data-Plane: L3 Flow – Route Programmed in ASIC
Check routing table in the ASIC

Table #1 is for default VRF. To find table number for other VRFs, use “show hardware internal tah l3 v4host” command.

module-1# show hardware internal tah l3 172.16.23.23/32 table 1
DLeft location: 0x182604
FP location : 0/0/0x1826
*Flags:
CC=Copy To CPU, SR=SA Sup Redirect,
DR=DA Sup Redirect, TD=Bypass TTL Dec,
DC=SA Direct Connect,DE=Route Default Entry,
LI=Route Learn Info
HW Loc     | Ip Entry       | VRF     | MPath | NumP | Base/L2ptr |CC|SR|DR|TD|DC|DE|LI|
-----------|----------------|---------|-------|------|------------|--|--|--|--|--|--|--|
0/0/0x1826 | 172.16.23.23   | 1       | No    | 0    | 0x90003    |  |  |  |  |  |  |  |

AdjId      | FP           | BD    | DMac              | DstIdx | DstIsPtr |
-----------|--------------|-------|-------------------|--------|----------|
0x90003    | 9/0/0x3      | 4122  | e8:65:49:94:8c:3f | 69     | No       |

Callouts on the adjacency row:
• AdjId: Adj entry ID
• FP: Location for the entry in the ASIC
• BD, DMac: BD/VLAN and Destination MAC for egress traffic
• DstIdx: the physical interface where the packet is going to be sent out. “show hardware internal tah interface ethernet 1/18 | inc src” should report “69”
Step 3
Path of the Packet
Data-Plane: L3 Flow – Route Programmed in ASIC
Check route programmed in Tahoe ASIC

module-1# debug hardware internal dav dump asic 0 slice 0 fp 0 table 0:tah_dav_fpx_fptile 0x1826 field-per-line | grep v4_hrt      <<< 0x1826: Location for the entry in the ASIC
tile_entry_v4_hrt_info_nonecmp_fields_ecmp=0x00000000
tile_entry_v4_hrt_info_nonecmp_fields_l2ptr=0x00020003      <<< Index for Adjacency Table entry
tile_entry_v4_hrt_info_nonecmp_fields_padfield=0x00000000
tile_entry_v4_hrt_info_ecmp_fields_ecmpinfo_base=0x00020003
tile_entry_v4_hrt_info_ecmp_fields_ecmpinfo_num_paths=0x00000000
tile_entry_v4_hrt_info_ecmp_fields_ecmpinfo_hash_sel=0x00000000
tile_entry_v4_hrt_info_ecmp_fields_ecmp=0x00000000
tile_entry_v4_hrt_vld=0x00000001
tile_entry_v4_hrt_ip_type=0x00000000
tile_entry_v4_hrt_vrf_type=0x00000000
tile_entry_v4_hrt_vrf=0x00000001
tile_entry_v4_hrt_host_ip=0xAC101717      <<< 172.16.23.23 in hex
tile_entry_v4_hrt_padfield=0x00000000
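
The host_ip field above is the IPv4 address written as a 32-bit hex value. A minimal Python sketch of the conversion in both directions, using only the standard library (the helper names are illustrative):

```python
import ipaddress


def hrt_field_to_ipv4(value: str) -> str:
    """Convert a hex field such as '0xAC101717' to dotted-decimal IPv4."""
    return str(ipaddress.IPv4Address(int(value, 16)))


def ipv4_to_hrt_field(address: str) -> str:
    """Convert dotted-decimal IPv4 to the 8-digit hex form used in the dump."""
    return f"0x{int(ipaddress.IPv4Address(address)):08X}"


print(hrt_field_to_ipv4("0xAC101717"))    # 172.16.23.23
print(ipv4_to_hrt_field("172.16.23.23"))  # 0xAC101717
```

Converting the destination address to hex first makes it easy to grep for the expected value in the dump output.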

Step 4
Path of the Packet
Data-Plane: L3 Flow – Adjacency Programmed in ASIC

Adjacency Information in SW
N9K-C9216O-YC# show ip adjacency 10.200.1.1      <<< next-hop IP address
<snip>
IP Adjacency Table for VRF default
Total number of entries: 1
Address         MAC Address     Pref Source     Interface
10.200.1.1      e865.4994.8c3f  50   arp        Ethernet1/18      <<< Destination mac-address and Egress Interface

Adjacency Information in HW (make sure the results are matching)

N9K-C9216O-YC# show forwarding adjacency 10.200.1.1

slot 1
=======
IPv4 adjacency information
next-hop        rewrite info    interface
-------------- --------------- -------------
10.200.1.1      e865.4994.8c3f  Ethernet1/18

Steps 5 & 6
Path of the Packet
Data-Plane: L3 Flow – Adjacency Programmed in ASIC

Entry in the hardware Adjacency Table

module-1# show hardware internal tah l3 adjacency 0x90003      <<< From Step #2

AdjId   | FP       | BD    | DMac              | DstIdx | DstIsPtr |
--------|----------|-------|-------------------|--------|----------|
0x90003 | 9/0/0x3  | 4122  | e8:65:49:94:8c:3f | 69     | No       |

Hardware entry for the specific adjacency

0x2003 reported as Adj entry by the route programmed in the HW (step 4)

module-1# debug hardware internal dav dump asic 0 slice 0 fp 9 table 0:tah_dav_fpx_fptile 0x2003 field-per-line | grep l2_entry_mac
tile_entry_l2_entry_mac_entry_mackey_vld=0x00000001
tile_entry_l2_entry_mac_entry_mackey_fid_type=0x00000000
tile_entry_l2_entry_mac_entry_mackey_fid_vld=0x00000001
tile_entry_l2_entry_mac_entry_mackey_fid=0x0000101a
tile_entry_l2_entry_mac_entry_mackey_mac=0x0000e865:0x49948c3f      <<< Re-write destination mac-addr
tile_entry_l2_entry_mac_entry_entry_type=0x00000000
tile_entry_l2_entry_mac_entry_intf=0x00000045      <<< Interface: 0x45 = 69 = Eth1/18
tile_entry_l2_entry_mac_entry_learn_info=0x00000002
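
The hex fields in the dump above can be converted back to the values shown in the adjacency table for a quick cross-check. The following Python sketch does this for the MAC, fid and intf fields; the helper name is hypothetical, and reading the fid value as the BD is an observation from the numbers on this slide (0x101a is 4122), not a documented field definition.

```python
def mac_from_words(field: str) -> str:
    """Convert '0x0000e865:0x49948c3f' to Cisco dotted notation 'e865.4994.8c3f'."""
    high, low = (int(part, 16) for part in field.split(":"))
    digits = f"{(high << 32) | low:012x}"
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


# Field values copied from the dump above.
rewrite_mac = mac_from_words("0x0000e865:0x49948c3f")
fid = int("0x0000101a", 16)
intf = int("0x00000045", 16)

print(rewrite_mac)  # e865.4994.8c3f, matches DMac in the adjacency table
print(fid)          # 4122, matches the BD column
print(intf)         # 69, matches DstIdx
```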

Common Failure
Scenarios and
Recommendations
Agenda
• Introduction
• Monitor and Health-Check
• Troubleshooting Tools
• Troubleshooting Traffic Forwarding
• Common Failure Scenarios and Recommendations
  • Layer 1 Issues
  • Dynamic Routing over vPC
• Summary and Take-Aways

Common Failure Scenarios
Layer 1 Issues - Symptoms
• Link-level issues
• Port flaps
• Port not coming up

• Transceiver Issues
• Transceiver not recognized
• Breakout cables not working
• Digital Optical Monitoring (DOM) Info missing

• Corrupted Frames / CRC Errors


• Devices connected to Nexus9000 report CRC frames received

Link flaps or Port not coming up
Recommendation
• Connect the cable/media at both ends, insert the transceivers completely, and use the following commands to verify speed, duplex, capabilities, supported modes and DOM values.
show interface eth x/y transceiver details
show interface eth x/y capabilities
show interface brief - check for the interface tuple display and others
show interface eth x/y status
• Enable auto-negotiation at both ends. Yes, we need it!
• Check for a transparent device or circuit in the middle
• Find who initiated the link-down event first

Use attach module <mod#> and show system internal port-client event-history port <port#> to find the events at microsecond granularity

Transceiver Issues
Recommendation

Verify that the transceiver is compatible. Check the compatibility matrix at https://tmgmatrix.cisco.com/

Results of these two commands help to check the transceiver compatibility:

N9508# show module 2
Mod Ports Module-Type Model Status
--- ----- ------------------------------------- --------------------- ---------
2 52 48x10/25G + 4x40/100G Ethernet Module N9K-X97160YC-EX ok

N9508# show interface eth2/6 transceiver
Ethernet2/6
transceiver is present
type is SFP-H10GB-CU3M
name is CISCO-MOLEX
<snip>
nominal bitrate is 10300 MBit/sec
Link length supported for copper is 3 m
cisco id is 3
cisco extended id number is 4
cisco part number is 37-0961-03
cisco product id is SFP-H10GB-CU3M

Frame Corruption / CRC Issues
Locate Frame Corruption

[Diagram: traffic flow enters N9K-C93180YC-FX on Eth1/1 and leaves on Eth1/18; CRC]

N9K-C93180YC-FX# show switching-mode
Configured switching mode: Cut through
Module Number    Operational Mode
1                Cut-Through

Cut-through (default) switching mode will propagate CRC errors.

Eth1/18 transmits stomped packets:

N9K-C93180YC-FX# show interface counters errors
----------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
----------------------------------------------------------------------
Eth1/18 0 0 9 0 0 0

N9K-C93180YC# show interface e1/18
<snip>
TX
9 output error 0 collision 0 deferred 0 late collision
<snip>

N9K-C93180YC-FX# show hardware internal errors module 1
|---------------------------------------------------------|
| Device:Homewood Role:MAC Mod: 1 |
| Last cleared @ Fri Apr 19 15:19:24 2019
| Device Statistics Category :: ERROR
|---------------------------------------------------------|
Instance:0
ID Name Value Ports
-- ---- ----- -----
<snip>
1048581 Interface Inbound CRC Error Stomped 000167 4:0
<snip>

Callout on the Ports column (4:0): Eth1/1. Use “show internal hardware-map” and look at MacId and MacSP columns.

Frame Corruption / CRC Issues
Recommendation
• Track the Xmit-Err and Stomped counters to find the source of
frame corruption
• Check for transceiver failures or power level issues. Consider
whether the transceiver and/or fiber should be swapped out.
• Check for ports/links reporting excessive flaps
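
Tracking the Xmit-Err and Stomped counters can be scripted once the CLI text has been collected. The sketch below is one possible approach, assuming the column layout of "show interface counters errors" and the row layout of "show hardware internal errors module 1" match the samples on the previous slide; the function names are hypothetical and the parsing is illustrative rather than a supported parser.

```python
import re

# Sample text modelled on the slide output.
COUNTERS = """\
----------------------------------------------------------------------
Port       Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize  OutDiscards
----------------------------------------------------------------------
Eth1/18            0        0         9        0          0            0
"""

HW_ERRORS = """\
ID      Name                                 Value   Ports
--      ----                                 -----   -----
1048581 Interface Inbound CRC Error Stomped  000167  4:0
"""

# ID, counter name, value, ports (for example "4:0").
ROW_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+)\s+(\S+)\s*$")


def to_int(token):
    """Convert a counter token to int; treat non-numeric tokens as 0."""
    return int(token) if token.isdigit() else 0


def parse_error_counters(text):
    """Return {port: {column: value}} from 'show interface counters errors'."""
    rows, header = {}, None
    for line in text.splitlines():
        parts = line.split()
        if not parts or set(line.strip()) == {"-"}:
            continue  # skip blank lines and separator rules
        if parts[0] == "Port":
            header = parts[1:]
        elif header and len(parts) == len(header) + 1:
            rows[parts[0]] = dict(zip(header, map(to_int, parts[1:])))
    return rows


def ports_with_xmit_errors(rows):
    """Ports that transmit errored (for example stomped) frames."""
    return [port for port, c in rows.items() if c.get("Xmit-Err", 0) > 0]


def stomped_counters(text):
    """Return (name, value, ports) for non-zero counters whose name mentions Stomped."""
    found = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m and "Stomped" in m.group(2) and int(m.group(3)) > 0:
            found.append((m.group(2), int(m.group(3)), m.group(4)))
    return found


rows = parse_error_counters(COUNTERS)
xmit_ports = ports_with_xmit_errors(rows)
stomped = stomped_counters(HW_ERRORS)
print(xmit_ports, stomped)
```

Run against successive snapshots, a rising Xmit-Err value on an egress port together with a rising inbound Stomped counter points back toward the ingress side as the place to keep looking.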

Dynamic Routing over vPC
Fail to Build Routing Adjacency
• Dynamic Adjacencies with vPC Peers - Routing protocol control
packets (hellos) exchanged between Router A and SVI Y via SW1 will
be dropped (TTL 1 decremented with a single hop). So Router A
cannot form routing adjacency with SVI Y.
• Loop Avoidance Rule for Traffic over Peer Link - Data Traffic
received on the peer link cannot be forwarded and routed on a vPC.

[Diagram: B (Po2) above vPC peers Sw1 (SVI-X) and Sw2 (SVI-Y); A (Po1) below; P marks a routing protocol peer]

Dynamic Routing over vPC
Fail to Build Routing Adjacency – Solution
• Traffic sent over the peer link will not have the TTL decremented.
• The peer-gateway feature allows the vPC peer (SVI-X) to forward
packets on behalf of the other peer (SVI-Y). This saves bandwidth by
avoiding traffic over the peer link.
• With peer-gateway and peer-router*, L3 devices can now form
peering adjacency with both vPC peers.

* Starting from 7.0(3)I5(1) release

[Diagram: B (Po2) above vPC peers Sw1 (SVI-X) and Sw2 (SVI-Y); A (Po1) below; P marks a routing protocol peer]
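
The TTL reasoning on these two slides can be sketched as a toy model. It is only an illustration of the argument (a hello sent with TTL 1 that has to cross to the other vPC peer), not a model of NX-OS forwarding; the `Packet` class, the `delivered` function and the `peer_gateway_and_peer_router` flag are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Which switch owns which SVI, as labelled in the slide diagram.
SVI_OWNER = {"SVI-X": "Sw1", "SVI-Y": "Sw2"}


@dataclass
class Packet:
    dst_svi: str  # "SVI-X" or "SVI-Y"
    ttl: int      # routing protocol hellos are typically sent with TTL 1


def delivered(packet, ingress_switch, peer_gateway_and_peer_router=False):
    """Return True if a hello from Router A reaches its destination SVI.

    ingress_switch is the vPC peer that the port-channel hash selected.
    """
    if SVI_OWNER[packet.dst_svi] == ingress_switch:
        return True  # addressed to the local SVI, no extra hop involved
    # The packet has to reach the other vPC peer. Without the features it
    # is treated as a routed hop and the TTL is decremented; with them the
    # TTL is left untouched across the peer link (per the slide).
    ttl = packet.ttl if peer_gateway_and_peer_router else packet.ttl - 1
    return ttl > 0


hello_to_y = Packet(dst_svi="SVI-Y", ttl=1)
before = delivered(hello_to_y, "Sw1")
after = delivered(hello_to_y, "Sw1", peer_gateway_and_peer_router=True)
print(before, after)
```

In this model the hello hashed to Sw1 but addressed to SVI-Y is lost in the first case and delivered in the second, which is the difference between the failure slide and the solution slide.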

Dynamic Routing over vPC
Supported Designs

• Router peers with both vPC peers. It is done over the vPC
peer-link.
[Diagram: Router below vPC peers Sw1 and Sw2]
• Router peers with both vPC peers. It is done over STP links
using a vPC VLAN.
[Diagram: Router below vPC peers Sw1 and Sw2]
• Orphan device peering with both vPC peers over vPC VLAN.
[Diagram: Router attached to vPC peers Sw1 and Sw2]

Legend:
N9K with IPv4 unicast
Router/ Firewall
P: Routing Protocol Peer
Dynamic Peering Relationship

Dynamic Routing over vPC
Supported Designs (Continued)

• Both the Routers peer with both vPC peers. It is done over vPC
peer-link and using vPC VLAN.
[Diagram: Router1 and Router2 below vPC peers Sw1 and Sw2]
• Each Nexus device peers with two vPC peers. It is done over
Data Center Interconnect (DCI).
[Diagram: Sw1 and Sw2 in DC1; Sw3 and Sw4 in DC2]

Common Failure Scenarios
To reach the destination of greater stability…
• Don’t overlook layer 1 configurations or issues
• Implement vPC recommendations and best practices

Summary & Take-Aways
Summary
• Higher gate density and bandwidth achievements are driving a
transition in hardware architecture and a consolidation of functions.
Nexus 9000 is at the core of these transitions, and flexible to fit
your datacenter design. It is a platform of possibilities!
• There are many avenues to monitor the health of the devices and
their resource usage.
• Nexus 9000 supports several simple and powerful tools. Familiarize
yourself with them.
• Understand path-of-the-packet for various traffic flows. It helps to
get the facts before building a theory.
• Don’t overlook Layer 1 settings or Dynamic Routing over vPC
configuration. They can impact stability.
Take-Aways
Nexus 9000 has a RICH SET OF CLIs, FEATURES
and TOOLS that are developed keeping all of you
in mind.

Closely monitoring devices’ health, and knowing
troubleshooting techniques, significantly reduces
network downtime.

The wealth of knowledge shared in this session
ENABLES AND EMPOWERS EACH ONE OF YOU
to achieve the goals of your organization.

Nexus 9000
… platform of possibilities

References and Useful Links
• Nexus 9000 Configuration Guide
• Transceiver Compatibility Matrix
• FEX and Nexus 9000 Ports Compatibility
• CLI Analyzer – A tool at Cisco.com for Nexus Diagnostics
• Intelligent CAM Analytics and Machine Learning (iCAM) – Config Guide
• Nexus 3000/9000 Series Telemetry Sources
• Cisco Nexus Data Broker – Data sheets and literature
• Nexus 9000 Programmability Guide
• Cisco Nexus 3000/9000 NX-API REST SDK User Guide and API Reference
• Open NX-OS Programmability – User Guide
• Nexus 9000 GitHub Repository

Thank you
