
#CLUS

Troubleshooting ARP storms on the Nexus7k

Ruvin Conganige
@Ruvin_Anthony
CTHDCN-2303

Cisco Webex Teams
Questions?
Use Cisco Webex Teams to chat
with the speaker after the session

How
1 Find this session in the Cisco Live Mobile App
2 Click “Join the Discussion”
3 Install Webex Teams or go directly to the team space
4 Enter messages/questions in the team space

Webex Teams will be moderated by the speaker until June 16, 2019.
cs.co/ciscolivebot#CTHDCN-2303

#CLUS © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 3
Who we are
We are Technical Consulting Engineers from the Customer Experience (CX) Support Services organization, also known as TAC.

What we do
We help customers resolve their technical issues 24x7.

Where to find us during Cisco Live
We are available at the Technical Solution Clinic (TSC), Upper Level, Sails Pavilion. No appointment required.

Agenda
• Problem Symptoms
• Data collection
• Analyzing Data
• Fix for the problem
• Conclusion
• Q&A

Problem Symptoms
Instability of the control plane at frequent intervals

2019 Feb 17 12:57:05 Nexus7700 %OSPFV3-5-ADJCHANGE: ospfv3-20 [8777] Nbr 172.16.118.244 on port-channel4 went INIT

2019 Feb 17 12:57:05 Nexus7700 %OSPF-5-ADJCHANGE: ospf-10 [8778] Nbr 172.16.118.168 on port-channel3 went INIT

2019 Feb 17 12:57:05 Nexus7700 %OSPFV3-5-ADJCHANGE: ospfv3-20 [8777] Nbr 172.16.118.244 on port-channel4 went EXSTART

2019 Feb 17 12:57:05 Nexus7700 %BFD-5-SESSION_STATE_DOWN: BFD session 1090519060 to neighbor gone down

2019 Feb 17 12:57:05 Nexus7700 %BFD-5-SESSION_REMOVED: BFD session to neighbor 10.80.1.62 on vlan 6 has been removed

2019 Feb 17 12:57:05 Nexus7700 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer keep-alive receive has failed

Plan of Action
• Validate configuration
• Check for any interface drops/errors
  show queuing interface | inc Ethernet|Drop
  show interface counters errors
• Check system resources
  show system resources / show process cpu sort
• Check Control Plane Policing (CoPP) for any drops
  show policy-map interface control-plane
• Check inband driver stats
  show hardware internal cpu-mac inband counters / stats
• Check drops at the packet manager level
  show system internal pktmgr internal vdc inband
• Take Ethanalyzer captures at the CPU level
  ethanalyzer local interface inband

Data collection

Nexus7700 # sh policy-map interface | in Ethernet|dropped|pkts
Ethernet1/1
Service-policy (queuing) input: default-8e-4q8q-in-policy
SNMP Policy Index: 301989931
Class-map (queuing): 8e-4q8q-in-q1 (match-any)
queue-limit percent 10
bandwidth percent 49
queue dropped pkts : 0
queue dropped bytes : 0
queue transmit pkts: 622945 queue transmit bytes: 198261063

Nexus7700 # show system resources
Load average: 1 minute: 3.78 5 minutes: 3.46 15 minutes: 3.51
Processes : 31346 total, 4 running
CPU states : 8.30% user, 10.40% kernel, 81.28% idle
Memory usage: 32939304K total, 7202944K used, 25736360K free
Current memory status: OK

Data collection

Nexus7700 # show policy-map interface control-plane

Nexus7700 # show policy-map interface control-plane | beg p-class-critical

class-map copp-system-p-class-critical (match-any)

match access-group name copp-system-p-acl-bgp6

match access-group name copp-system-p-acl-lisp

match access-group name copp-system-p-acl-ospf

module 3:

conformed 39921281 bytes,

5-min offered rate 38 bytes/sec

peak rate 1703 bytes/sec at Mon Feb 17 04:29:04 2019

violated 0 bytes, 5-min violate rate 0 bytes/sec peak rate 0 bytes/sec

Data collection

Nexus7700 # show policy-map interface control-plane | beg p-class-normal

class-map copp-system-p-class-normal (match-any)


match access-group name copp-system-p-acl-mac-dot1x
match protocol arp
set cos 1
police cir 680 kbps bc 375 ms
conform action: transmit violate action: drop

module 1:

conformed 1917945316 bytes, 5-min offered rate 1919 bytes/sec


peak rate 4068 bytes/sec at Sun Feb 17 12:57:05 2019
violated 1141300 bytes, peak rate 2998 bytes/sec at Sun Feb 17 12:57:05 2019

module 3:

conformed 1065321624 bytes, 5-min offered rate 1017 bytes/sec


violated 71880 bytes, peak rate 239 bytes/sec at Sun Feb 17 12:57:05 2019
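The policer in copp-system-p-class-normal is configured in kbps and milliseconds, while the per-module counters are reported in bytes. A minimal Python sketch to put both in the same units; it assumes 1 kbps = 1,000 bit/s and that the conformed/violated counters are cumulative since the last clear. The helper names are hypothetical.

```python
# Put the copp-system-p-class-normal policer and its counters in the same units.
# Assumptions: 1 kbps = 1000 bit/s; counters are cumulative bytes since last clear.

CIR_KBPS = 680  # police cir 680 kbps
BC_MS = 375     # bc 375 ms


def cir_bytes_per_sec(cir_kbps: float) -> float:
    """Committed information rate converted from kbps to bytes/sec."""
    return cir_kbps * 1000 / 8


def burst_bytes(cir_kbps: float, bc_ms: float) -> float:
    """Committed burst: bytes accepted at the CIR over bc_ms milliseconds."""
    return cir_bytes_per_sec(cir_kbps) * bc_ms / 1000


def violated_share(conformed: int, violated: int) -> float:
    """Fraction of all policed bytes that were dropped (violated)."""
    total = conformed + violated
    return violated / total if total else 0.0


if __name__ == "__main__":
    print(f"CIR:   {cir_bytes_per_sec(CIR_KBPS):.0f} bytes/sec")  # CIR:   85000 bytes/sec
    print(f"Burst: {burst_bytes(CIR_KBPS, BC_MS):.0f} bytes")     # Burst: 31875 bytes
    # Module 1 and module 3 counters from the output above
    for module, conformed, violated in [(1, 1917945316, 1141300), (3, 1065321624, 71880)]:
        print(f"module {module}: {violated_share(conformed, violated):.4%} violated")
```

The violated share looks small because the counters are cumulative; the peak-rate timestamps (12:57:05), which match the time of the adjacency flaps in the logs, are the more telling signal.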

Data collection
Nexus7700 # show hardware internal cpu-mac inband counters
eth1 Link encap:Ethernet HWaddr 28:52:61:f0:fb:0c
UP BROADCAST RUNNING MTU:1500 Metric:1
RX packets:158264 errors:0 dropped:3335 overruns:0 frame:0
TX packets:101576 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:18989157 (18.1 MiB) TX bytes:10603183 (10.1 MiB)

show system internal mts sup sap 278 description
client uuid 268: IP MTS QUEUE
(IP MTS queue: IP protocol packets to or from the Sup, and glean adjacency packets, i.e. ARP not programmed yet)

Nexus7700 # show system internal pktmgr client
Client uuid: 268, 4 filters, pid 7881
Filter 1: EthType 0x0806, Rx: 136090542, Drop: 2463
Ctrl SAP: 278 (ARP MTS SAP)
Total Data tags : 2 Data tag 1: 131072 Data tag 2: 131073
Total Rx: 136097062, Drop: 2463, Tx: 124834078, Drop: 0

show system internal adjmgr client index
Protocol Name Alias UUID Index
bfd bfd 706 7
netstack Static 545 6
IPv4 Static 268 4
arp arp 268 3
IP IP 545 2
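The inband driver and packet manager counters above are easier to compare as ratios. A minimal Python sketch, assuming the exact line formats shown on this slide ("RX packets:... dropped:..." and "Rx: ..., Drop: ..."); the function names are hypothetical, and the ratio is simply drops relative to the received count, since the output does not say whether drops are included in that count.

```python
import re
from typing import Tuple

# Lines copied from the slide above
INBAND_RX = "RX packets:158264 errors:0 dropped:3335 overruns:0 frame:0"
ARP_FILTER = "Filter 1: EthType 0x0806, Rx: 136090542, Drop: 2463"


def parse_inband_rx(line: str) -> Tuple[int, int]:
    """Return (rx_packets, dropped) from an ifconfig-style RX line."""
    m = re.search(r"RX packets:(\d+)\s+errors:\d+\s+dropped:(\d+)", line)
    if m is None:
        raise ValueError(f"unrecognised inband RX line: {line!r}")
    return int(m.group(1)), int(m.group(2))


def parse_pktmgr_filter(line: str) -> Tuple[int, int]:
    """Return (rx, drop) from a pktmgr client filter line."""
    m = re.search(r"Rx:\s*(\d+),\s*Drop:\s*(\d+)", line)
    if m is None:
        raise ValueError(f"unrecognised pktmgr line: {line!r}")
    return int(m.group(1)), int(m.group(2))


def drop_ratio(rx: int, dropped: int) -> float:
    """Drops relative to the received count (not a strict loss rate)."""
    return dropped / rx if rx else 0.0


if __name__ == "__main__":
    for label, (rx, dropped) in [
        ("inband driver", parse_inband_rx(INBAND_RX)),
        ("pktmgr ARP filter", parse_pktmgr_filter(ARP_FILTER)),
    ]:
        print(f"{label}: {dropped} drops / {rx} rx = {drop_ratio(rx, dropped):.4%}")
```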

Data collection
Nexus7700 # show process cpu sort

PID Runtime(ms) Invoked uSecs 1Sec Process


----- ----------- -------- ----- ------ -----------
5050 56 28 2012 49.0% arp
5052 165 90 1835 15.0% netstack
4087 875 2196 8171 0 1.0% platform
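To pick out the busy processes from show process cpu sort, a small parser can read the 1Sec column. This sketch assumes each data row starts with a numeric PID and ends with an "NN.N%" value followed by the process name, as in the output above; parse_cpu_sort and busy are hypothetical helper names and the 10% threshold is arbitrary.

```python
from typing import List, Tuple

SAMPLE = """\
PID Runtime(ms) Invoked uSecs 1Sec Process
----- ----------- -------- ----- ------ -----------
5050 56 28 2012 49.0% arp
5052 165 90 1835 15.0% netstack
4087 875 2196 8171 0 1.0% platform
"""


def parse_cpu_sort(text: str) -> List[Tuple[int, str, float]]:
    """Return (pid, process, one_sec_pct) for each data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and separator: data rows start with a numeric PID
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pct = fields[-2]
        if not pct.endswith("%"):
            continue
        rows.append((int(fields[0]), fields[-1], float(pct.rstrip("%"))))
    return rows


def busy(rows: List[Tuple[int, str, float]], threshold: float = 10.0) -> List[Tuple[str, float]]:
    """Processes at or above `threshold` percent in the 1Sec column."""
    return [(name, pct) for _, name, pct in rows if pct >= threshold]


if __name__ == "__main__":
    print(busy(parse_cpu_sort(SAMPLE)))  # [('arp', 49.0), ('netstack', 15.0)]
```

Reading the percentage from the end of the row keeps the parser tolerant of the extra column in the platform row.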

Nexus7700 # ethanalyzer local interface inband display-filter arp limit-captured-frames 0

2019 Feb 17 12:57:05.048950 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
2019 Feb 17 12:57:05.048954 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
2019 Feb 17 12:57:05.049075 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
2019 Feb 17 12:57:05.049079 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
2019 Feb 17 12:57:05.049200 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
2019 Feb 17 12:57:05.049204 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
2019 Feb 17 12:57:05.049324 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
2019 Feb 17 12:57:05.049328 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
2019 Feb 17 12:57:05.049450 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
2019 Feb 17 12:57:05.049454 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 ARP Who has 10.240.3.35? Tell 10.240.3.3
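The capture above shows the same request repeated every few microseconds. A hedged Python sketch that tallies ARP requests per (sender, target) pair and estimates the frame rate from the Ethanalyzer timestamps; it assumes the one-line format shown above and an English locale for the month name, and all names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

# The ten frames from the capture above differ only in their microsecond suffix.
SUFFIXES = ["048950", "048954", "049075", "049079", "049200",
            "049204", "049324", "049328", "049450", "049454"]
CAPTURE = [
    f"2019 Feb 17 12:57:05.{s} 00:1a:64:db:9f:a2 -> 03:bf:0a:f0:03:23 "
    "ARP Who has 10.240.3.35? Tell 10.240.3.3"
    for s in SUFFIXES
]

LINE_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} \d{1,2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<src>[0-9a-f:]{17})\s+->\s+(?P<dst>[0-9a-f:]{17})\s+"
    r"ARP Who has (?P<target>[\d.]+)\? Tell (?P<sender>[\d.]+)"
)


def summarize(lines: Iterable[str]) -> Tuple[Counter, List[datetime]]:
    """Count ARP requests per (sender, target) and collect their timestamps."""
    counts: Counter = Counter()
    stamps: List[datetime] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not an ARP request line in the expected format
        counts[(m["sender"], m["target"])] += 1
        stamps.append(datetime.strptime(m["ts"], "%Y %b %d %H:%M:%S.%f"))
    return counts, stamps


def rate_fps(stamps: List[datetime]) -> float:
    """Frames per second across the sample (intervals / elapsed time)."""
    if len(stamps) < 2:
        return 0.0
    span = (max(stamps) - min(stamps)).total_seconds()
    return (len(stamps) - 1) / span if span > 0 else float("inf")


if __name__ == "__main__":
    counts, stamps = summarize(CAPTURE)
    for (sender, target), n in counts.most_common():
        print(f"{sender} asking for {target}: {n} requests")
    print(f"approx. {rate_fps(stamps):.0f} frames/sec")
```

On these ten frames every request is 10.240.3.3 asking for 10.240.3.35, at a burst rate of roughly 18,000 frames/sec over about half a millisecond. A sample this short only shows the burst rate; a longer capture is needed to judge the sustained rate.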

Conclusion from the data collected

show ip arp summary vrf all

IP ARP Table - Adjacency Summary


Resolved : 14523
Incomplete : 15186 (Throttled : 8192)
Unknown : 0
Total : 29709
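A quick sanity check on the ARP summary: the three states should add up to the total, and here more than half of the adjacencies are incomplete. A minimal Python sketch assuming the "Key : value" format shown above; parse_arp_summary is a hypothetical helper.

```python
import re
from typing import Dict

SUMMARY = """\
IP ARP Table - Adjacency Summary

Resolved : 14523
Incomplete : 15186 (Throttled : 8192)
Unknown : 0
Total : 29709
"""


def parse_arp_summary(text: str) -> Dict[str, int]:
    """Extract the counters from 'show ip arp summary' style output."""
    result = {}
    for key in ("Resolved", "Incomplete", "Throttled", "Unknown", "Total"):
        m = re.search(rf"\b{key}\s*:\s*(\d+)", text)
        if m:
            result[key.lower()] = int(m.group(1))
    return result


if __name__ == "__main__":
    s = parse_arp_summary(SUMMARY)
    # Resolved + Incomplete + Unknown should account for every adjacency
    assert s["resolved"] + s["incomplete"] + s["unknown"] == s["total"]
    share = s["incomplete"] / s["total"]
    print(f"incomplete: {share:.1%} of all adjacencies, {s['throttled']} throttled")
```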

Conclusion from the data collected

Special case:
• An ARP storm can also occur on a VLAN for which no VLAN interface (SVI) is configured on the switch.
• In this case, the ARP packets are not punted to the CPU (process). However, they are still subject to the control plane policy in hardware at the line card level.
• These ARPs can still crowd out valid ARP packets destined to the CPU on other VLANs, which can disrupt control plane traffic that relies on ARP.
• This can still be identified by using the commands below:

sh policy-map interface control-plane class copp-system-p-class-normal | in violated|module

sh interface | in Ethernet|broadcast|RX

Mitigation of the Issue

Background Information

• Packets are forwarded to the supervisor (SUP) if the ARP entry for the destination IP (DIP) is not resolved
• The SUP initiates an ARP request for the DIP
• The receiving line card sends all incoming packets for that DIP to the SUP until ARP is resolved
• The SUP keeps generating ARP requests until ARP is resolved
• A hardware rate limiter called "glean" is in place to protect the CPU
• A single IP can consume the whole rate limiter and deny legitimate IPs access to the CPU
• To address this scenario, "hardware ip glean throttle" was created

Mitigation using Glean Throttle

• With "hardware ip glean throttle", only a single packet of a given flow is sent to the CPU
• A single packet is enough to generate an ARP request
• Software adds a /32 drop adjacency in hardware, preventing excess packets from reaching the CPU
• The drop adjacency is installed for a short, configurable period of time
• After the timer expires, one packet is again sent to the CPU and the process repeats
• The number of entries installed this way is limited to 1000 by default; the limit is configurable
• The limit of 1000 exists to limit the impact on the Routing Information Base (RIB) table size
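The glean throttle behaviour described above can be modelled in a few lines. This is a toy Python simulation, not NX-OS code: it assumes one drop adjacency per destination IP, a cap on the number of entries (1000, as on the slide) and a lifetime for each entry. The 300-second timeout is an illustrative value, not taken from this document, and the model keeps punting once the table is full, which is an assumption about overflow behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GleanThrottle:
    """Toy model of per-destination glean throttling (illustrative, not NX-OS code)."""
    max_entries: int = 1000   # entry cap described on the slide
    timeout: float = 300.0    # drop-adjacency lifetime in seconds (illustrative value)
    drop_adj: Dict[str, float] = field(default_factory=dict)  # DIP -> expiry time
    punted: int = 0           # packets that reached the CPU
    dropped_in_hw: int = 0    # packets dropped by a /32 drop adjacency

    def _expire(self, now: float) -> None:
        """Remove drop adjacencies whose timer has run out."""
        for dip in [d for d, expiry in self.drop_adj.items() if expiry <= now]:
            del self.drop_adj[dip]

    def packet(self, dip: str, now: float) -> str:
        """Handle one packet for an unresolved destination IP at time `now`."""
        self._expire(now)
        if dip in self.drop_adj:
            self.dropped_in_hw += 1
            return "drop"
        # First packet (or first after expiry): punt it so the SUP sends an ARP request
        self.punted += 1
        if len(self.drop_adj) < self.max_entries:
            self.drop_adj[dip] = now + self.timeout
        # Assumption: when the table is full, further packets keep being punted
        return "punt"


if __name__ == "__main__":
    g = GleanThrottle(timeout=2.0)
    for i in range(10_000):  # 10,000 packets to one dead server within 1 second
        g.packet("172.28.191.200", now=i / 10_000)
    print(g.punted, g.dropped_in_hw)  # 1 9999
    g.packet("172.28.191.200", now=2.5)  # timer expired: one packet is punted again
    print(g.punted)  # 2
```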

Glean Throttle: Example

The server with IP 172.28.191.200 is down. However, the line card is still receiving traffic for this server.

Nexus7700 # show ip route vrf VRF_ABC 172.28.191.200

IP Route Table for VRF "VRF_ABC"


'*' denotes best ucast next-hop

172.28.191.192/28, ubest/mbest: 1/0, attached
*via 172.28.191.195, Vlan1601, [0/0], 02:01:17, direct

There is no /32 entry.

Traffic is sent to the supervisor in order to generate an ARP request

Mitigation using Glean Throttle

Nexus7700 # show system internal forwarding vrf VRF_ABC ipv4 route 172.28.191.200 detail
slot 1

RPF Flags legend:


S - Directly attached route (S_Star)
V - RPF valid
M - SMAC IP check enabled
G - SGT valid
E - RPF External table valid
172.28.191.192/28 , sup-eth2

Dev: 0 , Idx: 0x65fb , Prio: 0x8487 , RPF Flags: VS , DGT: 0 , VPN: 9


RPF_Intf_5: Vlan1601 (0x19 )
AdjIdx: 0x5a , LIFB: 0 , LIF: sup-eth2 (0x1fe1 ), DI: 0xc01
DMAC: 0000.0000.0000 SMAC: 0000.0000.0000

Mitigation using Glean Throttle

Nexus7700 # show hardware rate-limiter

Module: 1

R-L Class Config Allowed Dropped Total

---------------------------------------------------------

L3 mtu 500 0 0 0

L3 ttl 500 0 0 0

L3 glean 100 3326 3190 6516
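The rate limiter table can be checked the same way: Allowed plus Dropped should equal Total, and the dropped share shows how hard the glean limiter is being hit. A Python sketch assuming the six-column layout shown above; parse_rate_limiter and dropped_share are hypothetical names.

```python
from typing import Dict

BEFORE = """\
R-L Class Config Allowed Dropped Total
---------------------------------------------------------
L3 mtu 500 0 0 0
L3 ttl 500 0 0 0
L3 glean 100 3326 3190 6516
"""


def parse_rate_limiter(text: str) -> Dict[str, Dict[str, int]]:
    """Parse rows of the form '<layer> <class> config allowed dropped total'."""
    rows = {}
    for line in text.splitlines():
        f = line.split()
        # Header and separator lines fail the numeric check and are skipped
        if len(f) == 6 and all(x.isdigit() for x in f[2:]):
            config, allowed, dropped, total = map(int, f[2:])
            rows[f"{f[0]} {f[1]}"] = {
                "config": config, "allowed": allowed,
                "dropped": dropped, "total": total,
            }
    return rows


def dropped_share(row: Dict[str, int]) -> float:
    """Fraction of packets seen by this rate limiter class that were dropped."""
    return row["dropped"] / row["total"] if row["total"] else 0.0


if __name__ == "__main__":
    for name, row in parse_rate_limiter(BEFORE).items():
        # Allowed + Dropped should account for every packet seen by the limiter
        assert row["allowed"] + row["dropped"] == row["total"], name
        print(f"{name}: {dropped_share(row):.1%} dropped")
```

Fed the output taken after the drop adjacency is installed, the same parser reports zero drops for the glean class.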

Mitigation using Glean Throttle

Nexus7700 # show ip route vrf VRF_ABC 172.28.191.200

IP Route Table for VRF "VRF_ABC"


'*' denotes best ucast next-hop
172.28.191.200/32, ubest/mbest: 1/0, attached
*via 172.28.191.200, Vlan1601, [250/0], 00:01:37, am

An adjacency is installed in the RIB.

Mitigation using Glean Throttle

Nexus7700 # show system internal forwarding vrf VRF_ABC ipv4 route 172.28.191.200 detail

slot 1

RPF Flags legend:


S - Directly attached route (S_Star)
V - RPF valid
M - SMAC IP check enabled
G - SGT valid
E - RPF External table valid
172.28.191.200/32 , Drop

Dev: 0 , Idx: 0x65fb , Prio: 0x8487 , RPF Flags: VS , DGT: 0 , VPN: 9


RPF_Intf_5: Vlan1601 (0x19 )
AdjIdx: 0x5a , LIFB: 0 , LIF: Drop (0x1fe1 ), DI: 0xc01
DMAC: 0000.0000.0000 SMAC: 0000.0000.0000

Mitigation using Glean Throttle

Nexus7700 # show hardware rate-limiter

Module: 1

R-L Class Config Allowed Dropped Total
----------------------------------------------------------
L3 mtu 500 0 0 0
L3 ttl 500 0 0 0
L3 glean 100 0 0 0

The hardware rate limiter does not see drops.

Helpful commands and information

• sh queuing interface ethernet 1/25 stats | in dropped

• sh interface | in Ethernet|discard

• sh system resources

• sh processes cpu sort | ex 0.0

• sh ip arp summary vrf all

• clear copp statistics

• sh policy-map interface control-plane | in class|violate|module

• sh hardware internal cpu-mac inband stats

• sh hardware internal cpu-mac inband counters

Helpful commands and information

• sh system internal pktmgr client

• show system internal adjmgr client index

• show system internal mts buffers summary

• sh system internal mts sup sap <#> description

• ethanalyzer local interface inband display-filter arp limit-captured-frames

More information on Ethanalyzer

https://www.cisco.com/c/en/us/support/docs/switches/nexus-7000-series-switches/116136-trouble-ethanalyzer-nexus7000-00.html

More information on glean throttling

https://www.cisco.com/c/en/us/support/docs/switches/nexus-7000-series-switches/200677-Nexus-7000-Understanding-hardware-ip-g.html

Complete your online session evaluation
• Please complete your session survey after each session. Your feedback is very important.
• Complete a minimum of 4 session surveys and the Overall Conference survey (starting on Thursday) to receive your Cisco Live water bottle.
• All surveys can be taken in the Cisco Live Mobile App or by logging in to the Session Catalog on ciscolive.cisco.com/us.

Cisco Live sessions will be available for viewing on demand after the event at ciscolive.cisco.com.

Continue your education
• Demos in the Cisco campus
• Walk-in labs
• Meet the engineer 1:1 meetings
• Related sessions

Thank you
