
Troubleshooting Cisco
Secure Firewall Cluster
Failures and Packet Drops

Oscar Montoya Torres


Security Team Captain
TACSEC-2006

Your presenter

Oscar Montoya Torres


• Mexico City
• Technical Consulting Engineer CX Security
• Security Team Captain
• 4+ years in Cisco TAC
• Focused on Secure Firewall FTD/ASA

“Simple can be harder than complex.
You have to work hard to get your
thinking clean to make it simple.”
-Steve Jobs

Agenda

• Introduction
• Connection Flags and Packet Flow
• Unit Join Failures
• MTU Issues
• NAT/PAT Failures
• Troubleshooting Packet Drops
• Q&A

Introduction
Introduction – What is a Cluster?
Clustering lets you group multiple units together as a single logical device
while providing increased throughput and redundancy.

Introduction – Requirements
All units in a cluster:
• Must be the same model
• Must run the same software version
• Must use the same firewall mode:
• Physical appliance: routed/transparent
• Virtual appliance: routed only

Supported on:
• FPR9300/FPR4100 - Up to 16 units
• Secure Firewall 3100 - Up to 8 units
• vFTD on AWS, GCP, Azure - Up to 16 units
• vFTD on KVM, VMware - Up to 4 units

Introduction – How is an FTD cluster deployed?

• Spanned EtherChannels: all data links are grouped into one EtherChannel on
the switch side (a switch-side sketch follows below).
• The Cluster Control Link (CCL) carries both control and data traffic.
• A per-unit EtherChannel is recommended on the CCL for link redundancy.
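
A minimal NX-OS sketch of the switch side; the member interfaces, port-channel
numbers and trunk/MTU details are illustrative assumptions, not taken from the
slides (on some Nexus platforms jumbo MTU is enabled through a network-qos
policy instead of a per-interface command):

! Spanned data EtherChannel: one port-channel whose members connect to all units
interface ethernet 1/1-3
  channel-group 1 mode active
interface port-channel 1
  switchport mode trunk

! Per-unit CCL EtherChannel: a separate port-channel per firewall (FTD1 shown)
interface ethernet 1/11-12
  channel-group 10 mode active
interface port-channel 10
  mtu 9216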

Introduction – Cluster Node terminology
Control Unit - One node is elected as the Control Unit.
Data Unit - The rest of the nodes are Data Units.

The following centralized features are handled by the
control unit only:
• Inspections: NetBIOS, TFTP, SUNRPC, SQLNET
• Site-to-Site VPN
• Dynamic routing
• Security Group Tags (SGT) are learned by the
Control Unit and propagated to the Data Units
• FMC communicates with the Control Unit for policy
deployment changes; the configuration is then
replicated to the Data Units
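
Because these features run only on the control unit, it helps to confirm which
node currently holds that role before troubleshooting them, for example:

FTD1# show cluster info
FTD1# show cluster history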

Connection Flags
and Packet Flow
Connection Flags - Cluster flow terminology
• Flow Owner: whichever unit receives the first packet of a new connection
becomes the flow owner.
FTD1# cluster exec show conn add 192.0.2.152 | in 59718
FTD1(LOCAL):******************************************************
TCP VLAN2401 192.0.2.152:59718 VLAN2401 172.16.10.131:1523, idle 0:06:32, bytes 15826, flags UIO
FTD2:*************************************************************
TCP VLAN2401 192.0.2.152:59718 VLAN2401 172.16.10.131:1523, idle 0:06:32, bytes 0, flags Y

• Flow Director: the connection state is backed up on a different unit,
called the director. The director tells other units which unit is
the flow owner.
FTD1# cluster exec show conn add 192.0.2.152 | in 59718
FTD1(LOCAL):******************************************************
TCP VLAN2401 192.0.2.152:59718 VLAN2401 172.16.10.131:1523, idle 0:06:32, bytes 15826, flags UIO
FTD2:*************************************************************
TCP VLAN2401 192.0.2.152:59718 VLAN2401 172.16.10.131:1523, idle 0:06:32, bytes 0, flags Y

Connection Flags - Cluster flow terminology
• Forwarder Flow: if a unit receives a packet for a flow that it doesn't own, it
contacts the director for that flow to learn which unit owns it. Once it knows,
it becomes a forwarder and forwards any packets it receives on that connection
directly to the owner.
FTD1# cluster exec show conn add 192.0.2.152 | in 59718
[output omitted]
FTD3:*************************************************************
TCP VLAN2401 192.0.2.152:59718 VLAN2401 172.16.10.131:1523, idle 0:06:32, bytes 0, flags z

• Backup director flow: if the unit chosen as director for a flow is also the
owner, a 'backup director' flow is created on another unit.
FTD1# cluster exec show conn add 192.0.2.152 | in 59718
[output omitted]
FTD4:*************************************************************
TCP VLAN2401 192.0.2.152:59718 VLAN2401 172.16.10.131:1523, idle 0:06:32, bytes 0, flags y

Packet Flow

1. A TCP SYN originates from the Client and arrives at FTD1. FTD1 becomes the flow owner. FTD2 is elected as the flow director.
2. The TCP SYN/ACK packet arrives from the Server at FTD3.
3. FTD3 asks the director for the flow owner, then forwards the packet to the owner.
4. The owner unit sends a state update to the director unit.
5. The owner reinjects the packet on the OUTSIDE interface and then forwards it towards the Client.
6. Any subsequent packets delivered to the director or a forwarder are forwarded to the owner.

Unit Join Failures
Case Study 1: Interface health check failure
• By default, interface monitoring is enabled.
• In case of a link failure, the node is removed from the cluster until the issue is
fixed.
• In the following example FTD1 is out of the cluster due to a CCL link failure; the
same issue can also happen due to a data link failure.

[Topology: FTD1 (Control Unit), FTD2 and FTD3 (Data Units) each connect to the
switching infra with CCL Po48 and Data Po1; the CCL of FTD1 has failed.]

Troubleshooting commands (an FXOS example follows below):
• show cluster history
• show cluster info trace
• scope eth-uplink
  scope fabric a
  show port-channel
• connect fxos
  show port-channel summary
  show port-channel database
  show lacp neighbor
  show lacp counters interface port-channel ID
  show lacp interface ethernet x/x
  show lacp internal event-history errors

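For example, running the FXOS LACP checks for the CCL (Po48 assumed here,
matching the diagram):

FTD1> connect fxos
FTD1(fxos)# show port-channel summary
FTD1(fxos)# show lacp neighbor interface port-channel 48
FTD1(fxos)# show lacp counters interface port-channel 48
FTD1(fxos)# show lacp internal event-history errors
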
Case Study 1: Interface health check failure
• show cluster history command:
FTD1> show cluster history

09:01:34 UTC May 2 2023
CONTROL_NODE    DISABLED     Cluster interface down

FTD2> show cluster history

09:01:34 UTC May 2 2023
DATA_NODE       DATA_NODE    Event: Broadcast announce drop message
                             to all units with reason
                             CLUSTER_QUIT_REASON_MASTER_UNIT_HC
09:01:34 UTC May 2 2023
DATA_NODE       DATA_NODE    Event: Cluster unit FTD1 state
                             is DISABLED

• show cluster info trace command:

FTD1> show cluster info trace

May 02 09:01:31.162 [DBUG]Cluster state machine client Cluster Unit_Test Client returns is done with progression
May 02 09:01:31.162 [DBUG]Cluster state machine notify client Cluster Unit_Test Client of progression
May 02 09:01:31.162 [INFO]State machine changed from state CONTROL_NODE to DISABLED
May 02 09:01:31.162 [INFO]Interface Port-channel48 is going down
May 02 09:01:31.162 [CRIT]Unit FTD1 is quitting due to Cluster Control Link down (1 times after last rejoin). Rejoin will be attempted after 5
minutes.
May 02 09:01:31.162 [DBUG]Send event (DISABLE, RESTART | INTERNAL-EVENT, 300000 msecs, Cluster interface down) to FSM. Current state CONTROL_NODE
May 02 09:01:29.932 [DBUG]RPC call, Cluster SVM Client to id 0 with parameter 0x0000000000000000, returns RPC_SUCCESS

Case Study 1: Interface health check failure
• LACP errors on FXOS:
FTD1(fxos)# show lacp internal event-history interface ethernet 1/2

10) FSM:<Ethernet2/1> Transition at 258423 usecs after Tue May 2 09:01:31 2023
Previous state: [LACP_ST_PORT_MEMBER_COLLECTING_AND_DISTRIBUTING_ENABLED]
Triggered event: [LACP_EV_UNGRACEFUL_DOWN]
Next state: [LACP_ST_PORT_IS_DOWN_OR_LACP_IS_DISABLED]

11) FSM:<Ethernet2/1> Transition at 350583 usecs after Tue May 2 09:01:31 2023
Previous state: [LACP_ST_PORT_IS_DOWN_OR_LACP_IS_DISABLED]
Triggered event: [LACP_EV_PORT_HW_PATH_DISABLED]
Next state: [FSM_ST_NO_CHANGE]

12) FSM:<Ethernet2/1> Transition at 434181 usecs after Tue May 2 09:01:31 2023
Previous state: [LACP_ST_PORT_IS_DOWN_OR_LACP_IS_DISABLED]
Triggered event: [LACP_EV_CLNUP_PHASE_II]
Next state: [LACP_ST_PORT_IS_DOWN_OR_LACP_IS_DISABLED]

Interfaces on the switch were modified, causing their operational state to
change to down; the CCL port-channel went down because LACP PDUs were no
longer received.
• Switch Logs:
Nexus# show logging

2023 May 2 09:01:11 Nexus1 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel10: first operational port changed from Ethernet1/1 to none
2023 May 2 09:01:11 Nexus1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel10: Ethernet1/1 is down
2023 May 2 09:01:11 Nexus1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel10 is down (No operational members)
2023 May 2 09:01:11 Nexus1 %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel10,bandwidth changed to 100000 Kbit
2023 May 2 09:01:11 Nexus1 %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/1 is down (Link failure)
2023 May 2 09:01:11 Nexus1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel10 is down (No operational members)
2023 May 2 09:01:11 Nexus1 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel1: first operational port changed from Ethernet1/2 to none
2023 May 2 09:01:11 Nexus1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel1: Ethernet1/2 is down

Next Actions:
• Verify the port-channel configuration and make sure the port-channels
are up.
• Schedule switch interface/vPC/port-channel configuration changes during
a maintenance window.
• If data port-channels will be modified on the switch, disable health
monitoring on the cluster side to avoid a cluster event.
• Take advantage of the auto-rejoin configuration to tweak the existing unit
rejoin timers (see the sketch below). In FMC: Device Management > Edit
Cluster > Cluster Tab > Edit Cluster Health Monitor Settings.
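
For reference, the equivalent ASA CLI under the cluster group looks like the
sketch below; the group name and timer values are illustrative assumptions (on
FTD these settings are pushed from the FMC dialog above):

cluster group ftd_cluster
  health-check holdtime 3
  health-check auto-rejoin cluster-interface unlimited 5 1
  health-check auto-rejoin data-interface 3 5 2

For each auto-rejoin line the three values are the number of attempts, the
interval in minutes, and the interval multiplier.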

Case Study 2: Snort engine failure
In the following example FTD2 is out of the cluster due to a snort failure.

[Topology: FTD1 (Control Unit), FTD2 and FTD3 (Data Units) each connect to the
switching infra with CCL Po48 and Data Po1; snort has failed on FTD2.]

Troubleshooting commands:
• show cluster history
• show cluster info trace
• show logging | inc 7481
• For a snort failure:
  > expert
  sudo su
  cd /ngfw/var/log/
  less messages | grep crash
  less messages | grep snort
  top
  pmtool status | grep <process>
  ls -lrth /ngfw/var/log/crashinfo/ - for snort3
  ls -lrth /ngfw/var/data/cores/ - for snort2
• For high disk usage:
  df -ah
  du -hsx
  find /ngfw -type f -exec du -Sh {} + | sort -rh | head -n 15

Case Study 2: Snort engine failure
• show cluster history command:
FTD1> show cluster history

06:40:41 UTC May 2 2023
CONTROL_NODE    CONTROL_NODE    Event: Asking data node FTD2
                                to quit due to snort Application
                                health check failure, and
                                data node's application state
                                is down
06:40:41 UTC May 2 2023
CONTROL_NODE    CONTROL_NODE    Event: Cluster unit FTD2 state
                                is DISABLED

FTD2> show cluster history

06:40:44 UTC May 2 2023
DATA_NODE       DISABLED        Received control message DISABLE (application health check failure)

• show cluster info trace command:

FTD2> show cluster info trace
[output omitted]
May 02 06:40:41.882 [INFO]State machine changed from state DATA_NODE to DISABLED
May 02 06:40:41.882 [DBUG]Send event (DISABLE, RESTART | MESSAGE, permanent, Received control message DISABLE (application health check
failure)) to FSM. Current state DATA_NODE
May 02 06:40:41.882 [INFO]ASLR enabled, text region 55d3c6367000-55d3caa12cfd
May 02 06:40:41.882 [INFO]Notify chassis de-bundle port for blade unit-1-1, stack 0x000055d3c7c6da2f 0x000055d3ca3de0a3 0x000055d3c7c68edd
May 02 06:40:41.882 [DBUG]Receive CCP message: CCP_MSG_QUIT from FTD2 to FTD1 for reason CLUSTER_QUIT_REASON_APP_HC
May 02 06:40:41.882 [DBUG]Send CCP message to all: CCP_MSG_APP_STATE
May 02 06:40:41.882 [INFO]snort application status is changed from up to down

Case Study 2: Snort engine failure
• Syslog logs:
FTD1# show logging | include 7481

May 02 06:40:41 %FTD-3-748101: Clustering: Peer unit FTD2(1) reported its snort application status is down
May 02 06:40:41 %FTD-3-748103: Clustering: Asking data node FTD2 to quit due to snort Application health check failure, and data node's
application state is down
May 02 06:40:41 %FTD-3-748101: Clustering: Peer unit FTD2(1) reported its diskstatus application status is up
May 02 06:40:41 %FTD-3-748101: Clustering: Peer unit FTD2(1) reported its snort application status is down

• Firepower logs from FTD2:


root@FTD2:/ngfw/var/log# less messages | grep snort
May 2 06:40:41 firepower SF-IMS[20813]: [20813] pm:process [INFO] Calling crash command '/ngfw/usr/local/sf/bin/snort3-save-crashinfo.py' for process
'37c0a584-e89c-11ed-b8ee-1a2fcfb32d3f'.
May 2 06:40:41 firepower SF-IMS[20964]: [21102] ndclientd:ndclientd [WARN] [snort] Received a signal of snort failure
May 2 06:40:41 firepower SF-IMS[20964]: [21102] ndclientd:ndclientd [WARN] [snort] Critical process failures have exceeded the threshold!
May 2 06:40:41 firepower SF-IMS[20964]: [21069] ndclientd:ndclientd [WARN] [snort] Service has failed, stopping Notification Daemon heartbeats.
May 2 06:40:41 firepower SF-IMS[20964]: [21069] ndclientd:ndclientd [WARN] [snort] sending version [2] HB stop message
May 2 06:40:41 firepower Notification Daemon[20963]: Notification Daemon :NGFW-1.0-snort-1.0--->OFFLINE
May 2 06:40:41 firepower Notification Daemon[20963]: Notification Daemon: Sending a Status Down for NGFW-1.0-snort-1.0 with failure reason More than 50
percent of snort instances are down
May 2 06:40:41 firepower SF-IMS[20964]: [21069] ndclientd:ndclientd [INFO] [snort] Received valid HeartbeatStop response from ND.
May 2 06:40:41 firepower snort3-save-crashinfo.py: /snort3-save-crashinfo.py: Successfully generated snort3 crash information to the
file /ngfw/var/log/crashinfo/snort3-crashinfo.1682531972.473076

May 2 06:40:42 firepower snort[56909]: --------------------------------------------------


May 2 06:40:42 firepower snort[56909]: o")~ Snort++ 3.1.36.100-2
May 2 06:40:42 firepower snort[56909]: --------------------------------------------------

Next Actions:
• By default, the snort engine and disk status are monitored by the ndclientd process as part
of the cluster health check.
• If snort fails or the disk is full, the unit is removed from the cluster as it is not healthy.
• For a snort failure:
• Check the /ngfw/var/log/messages file for the failure reason.
• Snort tracebacks and core files can be collected from /ngfw/var/log/crashinfo and
/ngfw/var/data/cores, respectively.
• Engage TAC with a troubleshooting file for further RCA.
• If high disk usage is detected, remove unnecessary files to free disk space (a sketch
follows below); see also Troubleshoot Excessive Disk Utilization.
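
A minimal expert-mode sequence to see what is consuming space; the exact
directories worth pruning vary per system, so treat /ngfw/var/common (a common
location for old troubleshoot files) as an assumed example:

> expert
sudo su
df -h /ngfw
du -hsx /ngfw/var/* | sort -rh | head -n 10
ls -lrth /ngfw/var/common/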

MTU Issues
Case Study 3: CCL MTU mismatch
In the following example FTD2 is not joining the cluster due to a CCL MTU
test failure.

FTD1# show run mtu
mtu inside 1500
mtu outside 1500
mtu cluster 1600

FTD2# show run mtu
mtu inside 1500
mtu outside 1500
mtu cluster 1600

FTD3# show run mtu
mtu inside 1500
mtu outside 1500
mtu cluster 1600

[Topology: switch-side CCL port-channels Po10 to FTD1 (MTU 1600), Po11 to FTD2
(MTU 1500) and Po12 to FTD3 (MTU 1600); each FTD uses CCL Po48 and Data Po1,
with CCL IPs 127.2.1.1, 127.2.2.1 and 127.2.3.1.]

Troubleshooting commands:

Firewall:
• show cluster history
• show cluster info trace
• show running-config mtu
• show interface
• show interface port-channel ID
• ping <interface-name> <IP> size <value>

Switch:
• show interface Ethernet x/x
• show port-channel xx
• show run | in mtu

Case Study 3: CCL MTU mismatch
• show cluster history command:
FTD1> show cluster history

21:22:17 UTC May 7 2023
CONTROL_NODE    CONTROL_NODE    Event: Cluster unit FTD2 state
                                is DATA_NODE_APP_SYNC
[output omitted]
21:22:27 UTC May 7 2023
CONTROL_NODE    CONTROL_NODE    Event: CCL MTU test to unit FTD2
                                failed

• show cluster info trace command:

FTD1# show cluster info trace | inc MTU
May 07 21:45:08.500 [WARN]CCL MTU test to unit FTD2 failed
May 07 21:22:27.183 [WARN]CCL MTU test to unit FTD2 failed
May 07 21:09:45.853 [WARN]CCL MTU test to unit FTD2 failed

Case Study 3: CCL MTU size mismatch
• Ping test over the CCL to verify the MTU:
FTD2# ping cluster 127.2.2.1 size 1600
Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 127.2.2.1, timeout is 2 seconds:
?????
Success rate is 0 percent (0/5)

FTD1# ping cluster 127.2.1.1 size 1600
Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 127.2.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms

• Console logs at the moment of the join failure:

FTD2# WARNING: Unit FTD2 is not reachable in CCL jumbo frame ICMP test, please check cluster interface and switch MTU configuration
The data unit has left the cluster because application configuration sync is timed out on this unit. Disabling cluster now!
Cluster disable is performing cleanup..done.
Unit FTD2 is quitting due to system failure for 1 time(s) (last failure is data unit application configuration sync timeout).
Rejoin will be attempted after 5 minutes.
All data interfaces have been shutdown due to clustering being disabled. To recover either enable clustering or remove cluster group
configuration.

Case Study 3: CCL MTU size mismatch
• Check interface MTU configuration on FTD2 and switch:
FTD2# show interfaces port-channel 1
Interface Port-channel1 "inside", is up, line protocol is up
Hardware is EtherSVI, BW 80000 Mbps, DLY 1600 usec
MAC address f8e5.7e1f.418e, MTU 1500
IP address 172.20.1.1, subnet mask 255.255.255.0

FTD2# show interface port-channel 48
Interface Port-channel48 "cluster", is up, line protocol is up
Hardware is EtherSVI, BW 40000 Mbps, DLY 10 usec
Description: Clustering Interface
MAC address 0015.c500.028f, MTU 1600
IP address 192.168.2.1, subnet mask 255.255.0.0

Nexus1# show interface port-channel 10
port-channel10 is up
admin state is up,
vPC Status: Up, vPC number: 10
Hardware: Port-Channel, address: 4488.1618.ee24 (bia 4488.1618.ee24)
MTU 1600 bytes, BW 40000000 Kbit , DLY 10 usec

Nexus1# show interface port-channel 11
port-channel11 is up,
admin state is up,
vPC Status: Up, vPC number: 11
Hardware: Port-Channel, address: 88fc.5dba.d788 (bia 88fc.5dba.d788)
MTU 1500 bytes, BW 40000000 Kbit , DLY 10 usec

The MTU is wrongly configured on CCL Po11 on the switch.

Case Study 4: Database connections timeout through
the Firewall
In the following example, users report intermittent connection problems
between an application and a database.

Troubleshooting commands (instantiated for this flow in the sketch below):

• Define a flow (source IP / destination IP / destination port /
protocol).
• Apply packet captures on all units on ingress, egress and asp-drop.
• Take packet captures on the source and destination endpoints when
possible.
• Review the cluster configuration:
• show run cluster
• show run mtu
• cluster exec show conn detail <IP_address>
• cluster exec show conn long address <IP_address>
• Collect syslogs

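A sketch of the capture setup for this flow; the capture names and the buffer
size are arbitrary choices, and the interface names follow the syslogs on the
next slides:

FTD1# cluster exec capture CAPDB interface Inside match tcp host 172.16.20.1 host 192.168.100.2 eq 1524
FTD1# cluster exec capture CAPDBO reinject-hide interface Outside match tcp host 172.16.20.1 host 192.168.100.2 eq 1524
FTD1# cluster exec capture ASPDB type asp-drop all buffer 33554432 match ip host 172.16.20.1 host 192.168.100.2
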
Case Study 4: Database connections timeout through
the Firewall
• show conn detail command for the specific flow:
FTD1# show conn detail port 46638
TCP Inside:172.16.20.1/46638 Outside: 192.168.100.2/1524,
flags UIO , idle 12m5s, uptime 12m5s, timeout 1h0m, bytes 28576, cluster sent/rcvd bytes
[output omitted]
From director/backup FTD2: 16858 (23 byte/s)
Initiator: 172.16.20.1, Responder: 192.168.100.2
Connection lookup keyid: 1345113130

• ASP drop captures:

FTD1# show capture

capture ASP type asp-drop all circular-buffer headers-only [Capturing - 3700 bytes]
match ip host 172.16.20.1 host 192.168.100.2

FTD1# cluster exec show cap ASP | i 172.16.20.1.46638

FTD1(LOCAL):******************************************************

FTD2:*************************************************************
1: 19:31:40.797093 Outside P0 192.168.100.2.1524 > 172.16.20.1.46638: P 81975167:81975360(193) ack
17763954 win 122 Drop-reason: (tcp-not-syn) First TCP packet not SYN,
Drop-location: frame 0x000055d587d1c36a flow (NA)/NA

Case Study 4: Database connections timeout through
the Firewall
• Syslogs about this flow:
FTD1> show logging | inc 192.168.100.2

May 23 19:26:53 10.129.10.34 : %FTD-6-302023: Teardown director TCP connection for
Inside:172.16.20.1/46638 to Outside:192.168.100.2/1524 duration 0:01:15 forwarded bytes 16858 Cluster
flow with CLU removed from due to idle timeout

May 23 19:25:38 10.129.10.34 : %FTD-6-302022: Built director stub TCP connection for Inside:/46638
(172.16.20.1/46638) to Outside:192.168.100.2/1524 (192.168.100.2/1524)

May 23 19:25:38 10.129.10.34 : %FTD-6-302023: Teardown forwarder TCP connection for
Outside:192.168.100.2/1524 to unknown:172.16.20.1/46638 duration 0:00:00 forwarded bytes 0 Forwarding or
redirect flow removed to create director or backup flow

May 23 19:25:38 10.129.10.34 : %FTD-6-302022: Built forwarder stub TCP connection for
Outside:192.168.100.2/1524 (192.168.100.2/1524) to unknown:172.16.20.1/46638 (172.16.20.1/46638)

May 23 19:25:38 10.129.10.33 : %FTD-6-302013: Built inbound TCP connection 796624636 for
Inside:172.16.20.1/46638 (172.16.20.1/46638) to Outside:192.168.100.2/1524 (192.168.100.2/1524)

Case Study 4: Database connections timeout through
the Firewall
• cluster-cflow-clu-timeout:
A cluster flow with CLU is considered idle if the director/backup
unit no longer receives periodic updates from the owner.
• show conn detail confirms there is no director/backup
flow for the connection on FTD2.

FTD1# cluster exec show conn detail port 46638 port 1524
FTD1(LOCAL):******************************************************
TCP Inside: 172.16.20.1/46638 Outside: 192.168.100.2/1524,
flags UIO , idle 12m5s, uptime 12m5s, timeout 1h0m, bytes 28576, cluster sent/rcvd bytes
[output omitted]
From director/backup FTD2: 16858 (23 byte/s)
Initiator: 172.16.20.1, Responder: 192.168.100.2
Connection lookup keyid: 1345113130

FTD2:*************************************************************

Case Study 4: Database connections timeout
through the Firewall

MTU on FTD1 and FTD2:

FTD1# show run mtu
mtu Inside 9000
mtu Outside 9000
mtu diagnostic 1500
mtu cluster 9184

FTD2# show run mtu
mtu Inside 9000
mtu Outside 9000
mtu diagnostic 1500
mtu cluster 9184

MTU on the switch:

switch# show int po7 | grep MTU
MTU 9000 bytes, BW 10000000 Kbit, DLY 1 usec
switch# show int po17 | grep MTU
MTU 9000 bytes, BW 10000000 Kbit, DLY 1 usec
switch# show int po9 | grep MTU
MTU 9000 bytes, BW 20000000 Kbit, DLY 1 usec

The FTD cluster CCL interface MTU is set to 9184, while the switch-side cluster
interface MTU is set to 9000. CLU control messages (communication between
cluster members) follow the MTU settings over the CCL, so make sure the MTU
configuration matches between the Firewalls and the Switch.

Next Actions:
The cluster control link traffic includes data packet forwarding, so the cluster
control link needs to accommodate the entire size of a data packet plus
cluster traffic overhead.

• Always make sure:

CCL MTU = Data Interface MTU + at least 100 bytes of trailer
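
If the switch-side CCL MTU must be raised to match, a minimal NX-OS sketch (the
port-channel ID is a placeholder, and on some Nexus platforms jumbo frames are
enabled through a network-qos policy instead of a per-interface command):

switch# configure terminal
switch(config)# interface port-channel <ccl-po-id>
switch(config-if)# mtu 9216

Then verify from the firewall with a jumbo ping over the CCL, for example:
FTD1# ping cluster <peer-ccl-ip> size 9184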

NAT/PAT Failures
Case Study 5: PAT allocation imbalance (Firepower 6.6)
In the following example, FTD1 is unable to create new NAT connections
after it rejoined the cluster following a cluster failure.

Troubleshooting commands:

• cluster exec show nat pool cluster
• cluster exec show nat pool
• cluster exec show xlate | inc <IP>
• show run nat
• show run all xlate
• cluster exec capture <name> interface <name> trace match <protocol>
host <IP1> host <IP2>
• cluster exec capture <name> type asp-drop all <protocol> host <IP1>
host <IP2>
• cluster exec show asp drop

Case Study 5: PAT allocation imbalance (Firepower 6.6)
• Before the failure, each unit is owner of one IP address of the pool:

FTD1# show running-config object
object network inside-net
subnet 192.168.100.0 255.255.255.0
object network Mapped-IPGroup
range 192.0.2.150 192.0.2.151

FTD1# show running-config nat
object network inside-net
nat (Inside,Outside) dynamic pat-pool Mapped-IPGroup

FTD1# show nat pool cluster
IP outside 192.0.2.150, owner FTD1, backup FTD2
IP outside 192.0.2.151, owner FTD2, backup FTD1

[Topology: Client → switch infra → Inside → FTD1 (Control Unit) / FTD2 (Data
Unit) → Outside (PAT pool 192.0.2.150-192.0.2.151) → switch infra]

• show nat pool command shows allocation on both IPs:

FTD1# show nat pool
UDP PAT pool outside:Mapped-IPGroup, address 192.0.2.150, range 1-511, allocated 1
UDP PAT pool outside:Mapped-IPGroup, address 192.0.2.150, range 512-1023, allocated 2
UDP PAT pool outside:Mapped-IPGroup, address 192.0.2.150, range 1024-65535, allocated 12312
UDP PAT pool outside:Mapped-IPGroup, address 192.0.2.151, range 1-511, allocated 3
UDP PAT pool outside:Mapped-IPGroup, address 192.0.2.151, range 512-1023, allocated 12
UDP PAT pool outside:Mapped-IPGroup, address 192.0.2.151, range 1024-65535, allocated 421

Case Study 5: PAT allocation imbalance (Firepower 6.6)
• FTD1 failed and left the cluster. FTD2 now becomes the owner of both
IP addresses:
FTD2# show nat pool cluster
IP outside 192.0.2.150, owner FTD2, backup FTD1
IP outside 192.0.2.151, owner FTD2, backup FTD1

• Due to the high traffic volume, FTD2 created connections
using both public IP addresses.
• FTD1 recovers and re-joins the cluster.
• The control unit (FTD2) will attempt to find an unused IP
address in the PAT pool to assign to FTD1.

Case Study 5: PAT allocation imbalance (Firepower 6.6)
• Since the PAT pool is composed of only two IP addresses,
FTD2 keeps ownership of both, as it has active xlates
on each of them.

FTD2# show nat pool cluster


IP outside 192.0.2.150, owner FTD2, backup FTD1
IP outside 192.0.2.151, owner FTD2, backup FTD1

• show xlate command displays the NAT sessions:

FTD2# show xlate
138693 in use, 3046971 most used
Flags: D - DNS, e - extended, I - identity, i - dynamic, r - portmap,
s - static, T - twice, N - net-to-net

TCP PAT from inside:192.168.100.10/53740 to outside:192.0.2.150/53740 flags ri idle 0:07:36 timeout 0:00:30
TCP PAT from inside:192.168.100.23/63850 to outside:192.0.2.150/63850 flags ri idle 0:38:16 timeout 0:00:30
TCP PAT from inside:192.168.100.12/63841 to outside:192.0.2.151/33683 flags ri idle 0:42:38 timeout 0:00:30
TCP PAT from inside:192.168.100.114/62036 to outside:192.0.2.151/62036 flags ri idle 2:02:13 timeout 0:00:30

Next Actions:
• An IP address can be re-balanced only when zero xlates exist for that IP
address.
• As a workaround, the xlates can be cleared from the IP to make it
available for redistribution (see the sketch below).
• Starting with Firepower 6.7, the cluster uses port block-based PAT
distribution.
• More enhancements were made in Firepower 7.0 and 7.1.
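
A minimal sketch of the workaround, using the pool IP from this example; note
that clearing xlates is disruptive and tears down the translations (and their
connections) currently using that IP:

FTD2# clear xlate global 192.0.2.151
FTD2# show nat pool cluster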

Troubleshooting
Packet Drops
How to troubleshoot packet drops through a
cluster?
1. Define a specific flow.
2. What service is impacted?
3. Define a source host, destination host,
destination port and protocol.
4. Define the ingress and egress interfaces.

Source Host: 172.16.10.10
Destination Host: 72.163.4.161 (cisco.com)
Destination port: 443
Protocol: TCP
Ingress Interface: Inside
Egress Interface: Outside

5. Collect packet captures: they can be applied on the data plane, the CCL and ASP drops.

• cluster exec: enables captures on all cluster members
• reinject-hide: hides packets reinjected from the CCL

FTD1# cluster exec capture CAPI interface INSIDE match tcp host 172.16.10.10 host 72.163.4.161 eq 443
FTD1# cluster exec cap CAPO reinject-hide interface OUTSIDE match tcp host 192.0.2.150 host 72.163.4.161 eq 443
FTD1# cluster exec cap ASP type asp-drop all buffer 33554432 headers-only match ip host 172.16.10.10 host 72.163.4.161
FTD1# cluster exec capture capccl interface cluster trace match icmp any any

FTD1# cluster exec show capture CAPI
FTD1(LOCAL):******************************************************
capture CAPI type raw-data buffer interface INSIDE [Capturing - 5140 bytes]
match tcp host 172.16.10.10 host 72.163.4.161 eq 443

FTD2:*************************************************************
capture CAPI type raw-data buffer 33554432 interface INSIDE [Capturing - 260 bytes]
match tcp host 172.16.10.10 host 72.163.4.161 eq 443

• trace: to see how packets are handled by the data plane
FTD1# show cap CAPI packet-number 1 trace
25985 packets captured
1: 08:42:09.362697 802.1Q vlan#201 P0 172.16.10.10.45954 > 72.163.4.161.443: S 992089269:992089269(0) win 29200
<mss 1460,sackOK,timestamp 495153655 0,nop,wscale 7>
...
Phase: 4
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) got initial, attempting ownership.

Phase: 5
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) am becoming owner
...

6. Connection logs and syslogs:

• show conn command shows the connection table:

FTD1# cluster exec show conn add 172.16.10.10
FTD1(LOCAL):******************************************************
TCP Outside 72.163.4.161:443 INSIDE 172.16.10.10:1526, idle 0:06:32, bytes 15826, flags UIO
FTD2:*************************************************************
TCP Outside 72.163.4.161:443 INSIDE 172.16.10.10:1526, idle 0:06:32, bytes 0, flags Y

• Syslogs are helpful for reviewing connection event data; see the
Cisco Secure Firewall Threat Defense Syslog Messages guide.
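
For example, filtering the syslogs for the flow defined earlier (same
destination host assumed):

FTD1# cluster exec show logging | include 72.163.4.161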

Key Takeaways

• CCL MTU = Data Interface MTU + at least 100 bytes of trailer

• MTU configuration must match between the cluster and the switch

• Interface health monitoring is enabled on all interfaces by default

• Snort engine and disk status monitoring are enabled by default

• For packet drops, define a specific source host, destination host,
destination port and protocol to set up packet captures

Cluster Troubleshooting commands:
show cluster history: Shows the event history for the cluster.
show cluster access-list: Shows hit counters for access policies.
show cluster conn: Shows the aggregated count of in-use connections for all units.
show cluster conn count: Shows only the connection count.
show cluster interface-mode: Shows the cluster interface mode, either spanned or individual.
show cluster memory: Shows system memory utilization.
show cluster resource usage: Shows system resources and usage.
show cluster traffic: Shows traffic statistics.
show cluster xlate count: Shows current translation information.
show cluster info: Shows cluster information.
show cluster info trace: Shows the debug information.
show cluster info trace module hc: Shows the debug information regarding health checks.
show cluster info health details: Verifies the heartbeat frequency.
show cluster info conn-distribution: Shows the connection distribution in the cluster.
show cluster info packet-distribution: Shows the packet distribution in the cluster.
cluster exec show nat pool cluster: Checks whether the pool is distributed.
cluster exec show nat pool: Displays statistics of NAT pool usage on all units.
cluster exec show conn detail: Displays connections in detail, including translation type and interface
information.
cluster exec show conn long address: Displays connections in long format.
cluster exec capture <name> interface <name> trace match <protocol> host <IP1> host <IP2>: Configures captures on
the data plane and CCL.
cluster exec capture <name> type asp-drop all <protocol> host <IP1> host <IP2>: Configures ASP drop captures.
cluster exec show cap <name>: Displays the details of a capture.
cluster exec show asp drop: Debugs packets or connections dropped by the accelerated security path.

Fill out your session surveys!

Attendees who fill out a minimum of four session
surveys and the overall event survey will get
Cisco Live-branded socks (while supplies last)!

Attendees will also earn 100 points in the
Cisco Live Game for every survey completed.

These points help you get on the leaderboard and increase your chances of winning daily and grand prizes.

Q&A

Continue your education
• Visit the Cisco Showcase for related demos
• Book your one-on-one Meet the Engineer meeting
• Attend the interactive education with DevNet,
Capture the Flag, and Walk-in Labs
• Visit the On-Demand Library for more sessions at
www.CiscoLive.com/on-demand

Thank you
