Article Content
Symptoms
How to troubleshoot Fibre Channel node to switch port or SFP communication problems by means of elimination?
Link failure.
G port.
No light.
Loss of Signal.
Faulty SFP
Troubleshoot FC port
errors on FC port
Cause
Too many SFPs are proactively replaced when the problem actually lies outside the SFP / switch.
Resolution
1. Identify the node and switch port involved in the communications failure.
2. Verify that the switch port is administratively up (unblocked, no shut), or enabled.
3. Make sure there are redundant paths available to the attached device before proceeding.
Warning: Before proceeding any further, make sure you know how your node will react if it gets a new FCID. Some OS versions of AIX
and HP-UX do not react well to such changes, since the FCID is built into the hardware path to the storage device. If you move the
cable, data might become unavailable. If you have any doubts, consult an EMC Technical Support Engineer.
Note: If there is an issue with the SFP, this procedure is the quickest way of bringing the device back online.
5. Change the disabled port to enable state (or administratively up) and bring the device back online.
6. Clear / reset the stats / counters to zero on the switch. (See notes How to...)
7. Monitor the port with the respective commands for 4-6 hours.
If the error counters increase, the problem lies outside the switch, and the customer needs to be advised that:
- The new port SFP and the cable require cleaning (to prevent a dirty cable from contaminating the SFP). Consider using the EMC cleaning kit.
- The attached device needs to be investigated further by whoever supports the device.
- On a Cisco switch, if the "errdisabled" state comes back with no counter increase, an SR needs to be opened for further back-end investigation.
If the errors do not increase (or the "errdisabled" state on a Cisco switch does not come back), the SFP on the previous port
is defective. Raise an SR for SFP replacement, providing the above analysis results and the SFP details (SM or MM, speed, etc.).
Note: You can repeat the procedure from Step 6 onwards after replacing the cable and / or the attached device, and check the
counters again.
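The decision made at the end of the monitoring window can be sketched in Python. This is only an illustration, assuming the link was moved to a different switch port before the counters were cleared (as the references to "the new port" and "the previous port" imply). The function name, the counter dictionaries, and the returned wording are hypothetical and are not part of any switch CLI.

```python
def interpret_monitoring(before, after, errdisabled_returned=False):
    """Interpret counters on the NEW port after the monitoring window.

    before / after: dicts of counter name -> value, sampled right after
    clearing the statistics and again 4-6 hours later (hypothetical inputs).
    errdisabled_returned: Cisco only, True if the port went errdisabled again.
    Returns (verdict, dict of counters that increased).
    """
    increased = {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value > before.get(name, 0)
    }
    if increased:
        verdict = ("outside switch: clean SFP and cable, "
                   "have the attached device investigated")
    elif errdisabled_returned:
        verdict = "errdisabled with clean counters: open SR for back-end investigation"
    else:
        verdict = "previous port SFP defective: raise SR for SFP replacement"
    return verdict, increased


# Example: CRC errors kept increasing on the new port as well.
verdict, delta = interpret_monitoring({"crc_err": 0}, {"crc_err": 7})
print(verdict, delta)
```

Keeping the increased counters alongside the verdict makes it easy to paste the analysis results into the SR.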
Additional Information
Note:
Most of the time, when an SFP optical transceiver fails outright, you will see a clear optics failure in the event log.
Hardware failures can easily be isolated by applying a simple algorithm to the problem: if it is not this piece of hardware, then it is
the other piece. Loop until you isolate the failure.
BROCADE EXAMPLES:
Example 1:
porterrshow :
frames      enc   crc   crc    too   too   bad   enc   disc  link  loss  loss  frjt  fbsy  c3timeout   pcs
tx    rx    in    err   g_eof  shrt  long  eof   out   c3    fail  sync  sig               tx    rx    err
General Reason:
Only valid, if port statistics have been cleared within the last 24 hours. Otherwise classify these counters as historical. Please clear
port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
From the errors, we can see link fail and loss of sync plus enc out errors; these can also include loss sig errors.
This combination of errors generally indicates a host reboot or a link reset external to the switch. The enc out errors are caused
during the speed negotiation that is part of link initialization.
Expected Actions:
Verify that the device attached to the port had a legitimate reason to go offline / online, e.g. a host reboot. If not, raise an SR.
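The validity rule stated under "General Reason" can be expressed as a small check. This is a sketch only. The function name and the "too early" result for samples taken less than 4 hours after the clear are assumptions based on the 4-6 hour monitoring window, and are not switch behavior.

```python
from datetime import datetime, timedelta


def counters_status(cleared_at, sampled_at):
    """Classify a porterrshow sample by the age of the last statistics clear."""
    age = sampled_at - cleared_at
    if age > timedelta(hours=24):
        return "historical"   # clear the stats and retake the data
    if age < timedelta(hours=4):
        return "too early"    # assumption: wait for the 4-6 hour window
    return "valid"


cleared = datetime(2024, 1, 1, 8, 0)
print(counters_status(cleared, datetime(2024, 1, 1, 13, 0)))  # 5 hours after the clear
```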
Example 2:
porterrshow :
CURRENT CONTEXT -- 3 , 111
frames      enc   crc   crc    too   too   bad   enc   disc  link  loss  loss  frjt  fbsy  c3timeout   pcs
tx    rx    in    err   g_eof  shrt  long  eof   out   c3    fail  sync  sig               tx    rx    err
General Reason:
Only valid, if port statistics have been cleared within the last 24 hours. Otherwise classify these counters as historical. Please clear
port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
Enc out errors without any associated errors indicate a dirty cable.
Expected Actions:
Inspect and clean all optic faces on cable and SFP connected to this port and attached devices.
Example 3:
porterrshow :
frames      enc   crc   crc    too   too   bad   enc   disc  link  loss  loss  frjt  fbsy  c3timeout   pcs
tx    rx    in    err   g_eof  shrt  long  eof   out   c3    fail  sync  sig               tx    rx    err
General Reason:
Only valid, if port statistics have been cleared within the last 24 hours. Otherwise classify these counters as historical. Please clear
port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
The frame is entering the switch port with a bad CRC but with the end of the frame still marked as good.
This indicates that this is the first port to register the bad frame, so the issue lies with the SFP, the cable, or the attached device
interface on this specific port.
Expected Actions:
See default action in the resolution.
For an ISL port, clear the stats with the statsclear and slotstatsclear commands, wait 4-6 hours, collect supportsaves from both
switches, and open an SR for normal troubleshooting.
Example 4 CRC:
porterrshow :
frames      enc   crc   crc    too   too   bad   enc   disc  link  loss  loss  frjt  fbsy  c3timeout   pcs
tx    rx    in    err   g_eof  shrt  long  eof   out   c3    fail  sync  sig               tx    rx    err
General Reason:
Only valid, if port statistics have been cleared within the last 24 hours. Otherwise classify these counters as historical. Please clear
port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
The port is recording a frame entering the switch with a bad CRC, but with the frame already marked as bad. This is normally seen
on ISL and NPIV F-ports.
Expected Actions:
If CRC errors are logging on an NPIV port, have the device investigated by the maintaining vendor.
For an ISL port, check all ports in the fabric for any port logging crc g_eof and action as in Example 3.
Example 5:
porterrshow :
frames      enc   crc   crc    too   too   bad   enc   disc  link  loss  loss  frjt  fbsy  c3timeout   pcs
tx    rx    in    err   g_eof  shrt  long  eof   out   c3    fail  sync  sig               tx    rx    err
General Reason:
Only valid, if port statistics have been cleared within the last 24 hours. Otherwise classify these counters as historical. Please clear
port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
This is applicable only on platforms that support 10 Gbps or 16 Gbps ports (6505/6510/6520/DCX-8510). It was introduced with
the Condor3 ASIC (the Gen 5 platform). ER_PCS_BLK shows the number of Physical Coding Sublayer (PCS) block errors. This counter is
the equivalent of enc_out on 8 Gb / 4 Gb links and is used only at 10 Gb and 16 Gb speeds.
From the errors, we can see link fail and loss of sync plus pcs err errors; these can also include loss sig errors.
This combination of errors generally indicates a host reboot or a link reset external to the switch.
The pcs err errors are caused during the speed negotiation that is part of link initialization.
Expected Actions:
Verify that the device attached to the port had a legitimate reason to go offline / online, e.g. a host reboot. If not, raise an SR.
Example 6:
porterrshow :
frames      enc   crc   crc    too   too   bad   enc   disc  link  loss  loss  frjt  fbsy  c3timeout   pcs
tx    rx    in    err   g_eof  shrt  long  eof   out   c3    fail  sync  sig               tx    rx    err
PCS ERR errors without any associated errors indicate a dirty cable.
Expected Actions:
Inspect and clean all optic faces on cable and SFP connected to this port and attached devices.
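The counter patterns in the Brocade examples above can be summarized as a rough Python classifier. This is a sketch under stated assumptions and is not a Brocade tool. The dictionary keys are the porterrshow column names joined with underscores (a hypothetical convention). "Without any associated errors" is read here as "all other error columns are zero". A bad CRC with a good end of frame is taken to be the crc g_eof column.

```python
# Error columns other than enc_out / pcs_err (hypothetical key names).
OTHER_ERRORS = ("enc_in", "crc_err", "crc_g_eof", "too_shrt", "too_long",
                "bad_eof", "disc_c3", "link_fail", "loss_sync", "loss_sig")


def classify(counters):
    """Map counter increases since the last clear to the article's readings."""
    def c(name):
        return counters.get(name, 0)

    findings = []
    coding = c("enc_out") + c("pcs_err")
    if coding and c("link_fail") and c("loss_sync"):
        # Examples 1 and 5
        findings.append("link reset or host reboot external to the switch")
    elif coding and not any(c(name) for name in OTHER_ERRORS):
        # Examples 2 and 6
        findings.append("dirty cable: inspect and clean all optic faces")
    if c("crc_g_eof"):
        # Example 3
        findings.append("bad CRC with good EOF: SFP, cable, or attached device on this port")
    elif c("crc_err"):
        # Example 4
        findings.append("bad CRC already marked bad: check ISL peers or the NPIV device")
    return findings


print(classify({"enc_out": 12, "link_fail": 1, "loss_sync": 2}))
print(classify({"pcs_err": 40}))
```

The function returns a list because more than one pattern can be present on the same port.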
CISCO EXAMPLES:
Example 1:
Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN).
0 discards, 0 errors
0 discards, 0 errors
General Reason:
The "Errdisabled" state of an interface can be misleading: the interface counters can be clean on the front end while the
switch downs the port with the "errdisabled" state because error counters are increasing on the back end (ASIC / internal /
linecard).
Expected Actions:
See default action in the resolution. If the state recurs, collect the tech support details output and open an SR.
Note: Information on "Errdisabled" state from Cisco: The bit errors can occur for the following reasons:
A bit error rate threshold is detected when 15 error bursts occur in a 5-minute period. By default, the switch disables the interface
when the threshold is reached. You can enter a shutdown and no shutdown command sequence to reenable the interface.
You can configure the switch to not disable an interface when the threshold is crossed.
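The threshold described in the note (15 error bursts in a 5-minute period) can be modeled as follows. This is a sketch only. The note does not say how the switch measures the 5-minute period, so a sliding window is assumed here, and the class and method names are hypothetical.

```python
from collections import deque


class BitErrorThreshold:
    """Sliding-window model of '15 error bursts in a 5-minute period'."""

    def __init__(self, max_bursts=15, window_s=300):
        self.max_bursts = max_bursts
        self.window_s = window_s
        self.bursts = deque()  # timestamps (seconds) of recent bursts

    def record_burst(self, t):
        """Record a burst at time t; return True if the threshold is reached."""
        self.bursts.append(t)
        # Drop bursts that fell out of the window (assumed sliding, not fixed).
        while self.bursts and t - self.bursts[0] >= self.window_s:
            self.bursts.popleft()
        return len(self.bursts) >= self.max_bursts


monitor = BitErrorThreshold()
for t in range(0, 150, 10):        # one burst every 10 seconds
    if monitor.record_burst(t):
        print(f"threshold reached at t={t}s: interface would be disabled by default")
```

With bursts spaced far enough apart (for example one every 30 seconds) the window never holds 15 bursts and the threshold is not reached.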
Example 2:
CRCs incrementing
0 discards, 17 errors
17 CRC, 0 unknown class
2 discards, 0 errors
General Reason:
The port is recording a frame entering the switch with a bad CRC but a good end of frame. The CRC counter only increments on
the specific ingress port logging the error and any investigations should be done on this physical link.
Expected Actions:
Article Properties
Affected Product: Connectrix
Article Type: Solution