2011-5-5

Security Level:

A

Handling Common Faults and Alarms on the RTN Network
www.huawei.com

HUAWEI TECHNOLOGIES CO., LTD.

Huawei Confidential

Contents
1 Process of Locating Common Faults
2 Locating Link Faults
3 Locating Faults of TDM Services
4 Locating Faults of Packet Services
5 Locating Faults of Protection Schemes
6 Locating Clock Faults
7 Locating DCN Faults
8 Locating Other Faults
9 Handling Common Alarms
10 Typical Cases of Fault Locating
11 Reference Documents

Process of Locating Common Faults
Start Check alarms.

Check service flows and locate faults. Check key configurations.

Check black box, errlog, debugbuf, dopra records.

Record network configurations, operation procedures, fault symptoms, and time points of key events.

Collect data.

End


Process of Locating Common Faults
1. Check the time when service interruption occurs and the triggered events, and the time when services recover and the triggered events.
2. Check current and historical alarms, and current and historical 15-minute and 24-hour events reported by the NMS and NEs.
3. Check the flow of services, including service add/drop nodes, network topologies, and configuration of convergence NEs.
4. Check version information, NE configuration, and board configuration.
5. Check records of manual operations, operation records on the NMS, and oplog records on NEs.
6. Check black box records, errlog records, debugbuf logs, and dopra logs. Collect fault information by using specific tools.


Locating Faults of Microwave Links
Common Locating Process

[Flowchart] Start
1. Are there any wrong operations? Yes: perform rollbacks. No: go to 2.
2. Are there any ODU or IF board faults? Yes: handle alarms. No: go to 3.
3. Is Tx power normal? No: handle the fault. Yes: go to 4.
4. Is Rx power lower than normal? Yes: handle the fault. No: go to 5.
5. Does fading cause abnormal Rx power? Yes: handle the fault. No: go to 6.
6. Are links faulty unidirectionally? Yes: handle the fault. No: go to 7.
7. Locate faults by performing loopbacks.
After each handling action: Are faults rectified? Yes: End. No: go to the next step.
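The locating process above can be read as an ordered series of checks. The following is a minimal sketch of that ordering, not a tool shipped with the equipment; the `LinkState` fields and the function name are hypothetical and would have to be filled in from NMS queries.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    """Observations about one microwave link (hypothetical field names)."""
    wrong_operation: bool
    odu_or_if_board_fault: bool
    tx_power_normal: bool
    rx_power_low: bool
    fading_detected: bool
    unidirectional_fault: bool


def next_microwave_action(state: LinkState) -> str:
    """Return the first handling step that applies, in flowchart order."""
    if state.wrong_operation:
        return "1: perform rollbacks"
    if state.odu_or_if_board_fault:
        return "2: handle equipment alarms"
    if not state.tx_power_normal:
        return "3: handle abnormal Tx power"
    if state.rx_power_low:
        return "4: handle lower-than-normal Rx power"
    if state.fading_detected:
        return "5: handle fading"
    if state.unidirectional_fault:
        return "6: handle interference"
    return "7: locate the fault by performing loopbacks"
```

If the returned step does not rectify the fault, the corresponding flag can be cleared and the function called again to obtain the next step.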

Locating Faults of Microwave Links
Common Symptoms and Causes

Fault Type: Tx power is abnormal.
Common Cause: The ODU is faulty.

Fault Type: Rx power is always lower than normal value.
Common Cause:
• Antennas are not aligned.
• Antennas have different polarization directions.
• Transmission is blocked by mountains or buildings.
• Antennas malfunction or the connection between antennas and ODUs is faulty, such as wet waveguide interface and loosely-installed flexible waveguide.
• The ODU is faulty.

Fault Type: Slow up-fading causes abnormal Rx power.
Common Cause: There is external interference.

Fault Type: Slow down-fading causes abnormal Rx power.
Common Cause: Fading margin is insufficient.

Fault Type: Fast fading causes abnormal Rx power.
Common Cause: Multipath fading is severe.

Fault Type: Rx power is normal, but the microwave link is faulty unidirectionally.
Common Cause: There is external interference.

Locating Faults of Microwave Links
Handling Method

Handling Procedure 1. Check for incorrect operations.
Handling Method: Focus on:
1. Whether the ODU is powered off
2. Whether the ODU is muted
3. Whether a loopback is performed on the IF board
4. Whether the configuration is consistent at the two ends
5. Whether the configuration matches the models of ODUs and combiners
6. Whether the E1 capacity is consistent at the two ends for the Hybrid microwave

Locating Faults of Microwave Links
Handling Method

Handling Procedure 2. Handle equipment faults.
Handling Method: Focus on the following alarms: VOLT_LOS, CONFIG_NOSUPPORT, HARD_BAD, TEMP_ALARM, IF_INPWR_ABN, RADIO_MUTE, RADIO_TSL_HIGH, RADIO_TSL_LOW, RADIO_RSL_HIGH, IF_CABLE_OPEN

Handling Procedure 3. Handle abnormal Tx power.
Handling Method: Replace the ODU.

Locating Faults of Microwave Links
Handling Method

Handling Procedure 4. Handle lower-than-normal Rx power.
Handling Method:
• Check the antenna direction. Especially, check whether the received signal is from the main lobe.
• If antennas are not aligned, align antennas.
• Check the polarization directions of antennas and adjust the incorrect polarization direction.
• Check the antenna gain at the two ends and replace the antennas that do not provide required antenna gain.
• Check whether transmission is blocked by any mountains or buildings.
• If the RSL difference between the two ends is higher than 10 dB, replace ODUs to determine the faulty component.
• On a 1+1 HSB microwave link, if the Rx power difference between the active and standby ODUs at one end is higher than 9 dB (for non-balanced combiners) or 5 dB (for balanced combiners), perform 1+1 switching or replace ODUs/combiners to determine the faulty component.
• If Rx power declines rapidly and remains lower than normal, check the installation of antennas and ensure that the azimuth of antennas meets the planning requirements.
• Replace ODUs/combiners to determine the faulty component.
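The two numeric criteria on this slide (10 dB between the two ends, and 9 dB or 5 dB between the active and standby ODUs on a 1+1 HSB link) can be checked mechanically. The sketch below only encodes those thresholds; the function names and the way power readings are obtained are assumptions, not part of any NMS interface.

```python
# Thresholds taken from the slide; "higher than" is treated as strictly greater.
RSL_END_TO_END_LIMIT_DB = 10.0
HSB_LIMIT_DB = {"non-balanced": 9.0, "balanced": 5.0}


def rsl_difference_exceeded(local_rsl_dbm: float, remote_rsl_dbm: float) -> bool:
    """True if the RSL difference between the two ends is higher than 10 dB."""
    return abs(local_rsl_dbm - remote_rsl_dbm) > RSL_END_TO_END_LIMIT_DB


def hsb_difference_exceeded(active_dbm: float, standby_dbm: float, combiner: str) -> bool:
    """True if active/standby Rx power differs by more than the combiner limit."""
    return abs(active_dbm - standby_dbm) > HSB_LIMIT_DB[combiner]
```

For example, readings of -40 dBm and -46 dBm differ by 6 dB, which exceeds the limit for a balanced combiner but not for a non-balanced one.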

Locating Faults of Microwave Links
Handling Method

Handling Procedure 5. Handle fading.
Handling Method:
To handle up-fading:
• Check for co-channel interference.
• Use a spectrum analyzer to analyze interference sources.
• Contact the spectrum management department for clearing the interference spectrum, or change plans to minimize the interference.
To handle down-fading:
• Increase the installation heights of antennas.
• Reduce the transmission distance.
• Increase the antenna gain.
• Increase Tx power.
To handle fast fading: Contact the network planning department for appropriate plan changes, such as:
• Adjust the position of the antenna to block the reflected wave or make the reflection point fall on the ground that has a small reflection coefficient, reducing multipath fading.
• Configure 1+1 SD for microwave links.
• For microwave links with 1+1 SD, adjust the height difference between two antennas to make one's Rx power higher than the other's Rx power.
• Increase fading margins by using larger-diameter antennas or raising antennas' Tx power.

Locating Faults of Microwave Links
Handling Method

Handling Procedure 6. Handle interference.
Handling Method:
1. Check for co-channel interference.
2. Check for adjacent channel interference.
3. Use a spectrum analyzer to analyze interference sources.
4. Contact the spectrum management department for clearing the interference spectrum, or change plans to minimize the interference.

Handling Procedure 7. Locate faults by performing loopbacks.
Handling Method:
1. Perform an inloop on the IF port.
2. Replace the IF board if the fault persists.
3. Check cable connectors and redo the substandard ones.
4. Check IF cables and replace those that are soggy, broken, or pressed.
5. Replace the ODU. If the fault is rectified after replacement, you can infer that the ODU is faulty.

Locating Faults of Microwave Links
Common Alarms

The CONFIG_NOSUPPORT is an alarm indicating that the configuration is not supported.

Possible Causes
Cause 1: The model and configuration parameters of the ODU do not match the requirements.
Cause 2: On Hybrid microwave links, which are composed of IFH2 boards, the configured ODU's Tx power is beyond the allowed range. (On Hybrid microwave links, the maximum Tx power of ODUs is determined by the IF modulation mode and AM enabling status.)

Handling Procedure
Cause 1: The model and configuration parameters of the ODU do not meet the requirements.
• Check the alarm parameters to determine the configuration parameters that do not meet the requirements.
• If the alarm parameter is 0x01-0x03, check whether the configuration parameters of the ODU port meet the requirements of network planning.
• If the alarm parameter is 0x04-0x06, check whether the configuration parameters of the IF port meet the requirements of network planning.
• If not, change the parameter settings.
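As an illustration of the parameter check for Cause 1, the sketch below maps a CONFIG_NOSUPPORT alarm parameter to the port whose settings should be compared with the network plan. It assumes the pairing 0x01-0x03 with the ODU port and 0x04-0x06 with the IF port; the function name is hypothetical, and values outside these ranges are not covered by the slide.

```python
def config_nosupport_target(alarm_parameter: int) -> str:
    """Return which port's configuration to check for a given alarm parameter.

    Assumed pairing: 0x01-0x03 -> ODU port, 0x04-0x06 -> IF port.
    """
    if 0x01 <= alarm_parameter <= 0x03:
        return "ODU port"
    if 0x04 <= alarm_parameter <= 0x06:
        return "IF port"
    # The slide does not describe any other parameter values.
    raise ValueError(f"alarm parameter {alarm_parameter:#04x} is not covered")
```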

Locating Faults of Microwave Links
Common Alarms

The RADIO_RSL_LOW is an alarm indicating that the RSL is too low.

Possible Causes
Cause 1: Certain other alarms occur at the opposite site.
Cause 2: The opposite Tx power is too low.
Cause 3: Signal attenuation on the microwave link is heavy.

Locating Faults of Microwave Links
Common Alarms

Handling Procedure
Cause 1: Certain other alarms occur at the opposite site.
• Check whether the opposite NE is powered off.
• Check whether any of the following alarms is reported at the opposite site. If yes, clear the alarm immediately: RADIO_MUTE, CONFIG_NOSUPPORT, RADIO_TSL_LOW, BD_STATUS
Cause 2: The opposite Tx power is too low.
• Check whether the opposite Tx power is normal. If not, replace the opposite ODU.

Locating Faults of Microwave Links
Common Alarms

Cause 3: Signal attenuation on the microwave link is heavy.
• Check whether the alarm is repeatedly reported among historical alarms. If the alarm is reported occasionally, contact the network planning department for improving anti-fading performance.
• Check whether the antennas at both ends are aligned. If not, realign the antennas.
• Check whether the polarization direction is set correctly for the antennas, ODUs, and combiners at both ends. If not, correct the polarization direction.
• Check the antenna gain at the two ends and replace the antennas that do not provide required antenna gain.
• Check whether the outdoor units such as antennas, ODUs, combiners, and flexible waveguides are wet, damp, or damaged. If yes, replace the faulty component.
• Check whether transmission is blocked by any mountains or buildings. If yes, contact the network planning department for avoiding the block.
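The Cause 1 check for RADIO_RSL_LOW amounts to intersecting the alarms reported at the opposite site with the four alarms named on the slide. A minimal sketch follows, assuming the alarm names are available as plain strings (how they are retrieved from the NMS is outside its scope, and the function name is hypothetical).

```python
# Alarms the slide lists for Cause 1 of RADIO_RSL_LOW.
OPPOSITE_SITE_ALARMS = frozenset(
    {"RADIO_MUTE", "CONFIG_NOSUPPORT", "RADIO_TSL_LOW", "BD_STATUS"}
)


def alarms_to_clear_first(reported_alarms):
    """Return, sorted by name, the reported alarms that Cause 1 names."""
    return sorted(OPPOSITE_SITE_ALARMS.intersection(reported_alarms))
```

An empty result suggests moving on to Cause 2 (opposite Tx power) and Cause 3 (link attenuation).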

Locating Faults of Microwave Links
Common Alarms

The MW_LOF is an alarm indicating the loss of microwave frames.

Possible Causes
Cause 1: The microwave link performance degrades.
Cause 2: The IF working mode of the local site is different from that of the opposite site.
Cause 3: The operating frequency of the local ODU is different from that of the opposite ODU.
Cause 4: The transmit unit of the opposite site is faulty.
Cause 5: The receive unit of the local site is faulty.

Handling Procedure
See "Handling Faults of Microwave Links."

Locating Faults of Microwave Links
Common Alarms

The MW_FEC_UNCOR is an alarm indicating that uncorrectable errors exist in the forward error correction (FEC) coding of microwave frames.

Possible Causes
Cause 1: The receive power of the ODU is abnormal.
Cause 2: The transmit unit of the opposite site is faulty.
Cause 3: The receive unit of the local site is faulty.
Cause 4: Interference exists.

Handling Procedure
See "Handling Faults of Microwave Links."

Locating Faults of Microwave Links
Common Alarms

The HARD_BAD is an alarm indicating hardware errors.

Possible Causes
Cause 1: Clock tracing is looped.
Cause 2: The alarmed board has hardware errors.

Handling Procedure
Cause 1: Clock tracing is looped. Check the alarm parameter. The value 0x06 indicates that clock signals are interlocked and therefore the timing loop needs to be cleared.
Cause 2: The alarmed board has hardware errors. Replace the alarmed board.

Locating Faults of Microwave Links
Common Alarms

The BD_STATUS is an alarm indicating that the board cannot be detected.

Possible Causes
Cause 1: If the IDU reports the alarm, the possible causes are as follows:
• The board is installed in an incorrect slot.
• The board and the backplane are connected incorrectly.
• The slot that houses the board is faulty.
• The board is faulty.
Cause 2: If the ODU reports the alarm, the possible causes are as follows:
• Check whether the IF board reports the HARD_BAD, BD_STATUS, IF_CABLE_OPEN, or VOLT_LOS alarm. If yes, clear the alarm immediately.
• The ODU is faulty. Replace the faulty ODU.

Locating Faults of Microwave Links
Common Alarms

Handling Procedure
Cause 1: If the IDU reports the alarm, handle the alarm as follows:
• Check whether the physical slot and logical slot of the alarmed board are the same.
• Re-install the alarmed board.
• Install the board in another slot.
• Replace the alarmed board.
Cause 2: If the ODU reports the alarm, handle the alarm as follows:
• Check whether the alarm is caused by other alarms.
• Replace the faulty ODU.

Locating Faults of Ethernet Links
Common Locating Process

[Flowchart] Start
1. Compute the boards and physical links that services traverse.
2. Any alarms on the boards? Yes: reset or replace alarmed boards. No: check alarms according to the types of NNI ports.
3. Types of NNI ports:
• ETH: Any laser alarms? Any ETH physical-layer alarms?
• C-STM MLPPP: Any laser alarms? Any alarms on SDH ports? Any VC-12 alarms?
• E1 MLPPP: Any alarms on E1 ports?
• MW: Any alarms on IF ports?
4. Yes to any of the checks: handle alarms. No to all: there are no physical-layer alarms.

Checking Alarms on Ethernet Links

[Figure: network diagram with BTS 1 to BTS 3, RTN NEs, an MPLS network, BSCs, and the core network, connected by GE/FE, 10G/GE, and STM-1 links]

ETH_LOS: Loss of optical signals
Possible Causes
• Fiber cuts
• Excessive optical attenuation
• Faulty optical modules

ETH_LINK_DOWN: Connection fault on the network port
Possible Causes
• Negotiation fails due to different working modes at the two ends.
• Electrical cables, fiber connections, or opposite units are faulty.

MAC_FCS_EXC: Excessive bit errors detected at the MAC layer
Possible Causes
• Line signals degrade.
• Optical ports are dirty.
• Fiber performance deteriorates.

Common Alarms on Ethernet Ports (1)

The ETH_LOS is an alarm indicating loss of connection on Ethernet ports.

Possible Causes
Cause 1: The electrical cable or fiber on the Ethernet port is incorrectly connected.
Cause 2: The electrical cable or fiber on the Ethernet port is faulty.
Cause 3: The local Rx power is too low.
Cause 4: The alarmed board is faulty.

Handling Procedure
Cause 1: The electrical cable or fiber on the Ethernet port is incorrectly connected. Verify that the electrical cable or fiber on the Ethernet port is correctly connected.
Cause 2: The electrical cable or fiber on the Ethernet port is faulty. Replace the faulty electrical cable or fiber.
Cause 3: The local Rx power is too low. Check for the OUT_PWR_ABN alarm on the opposite NE and clear the alarm immediately if it is reported. If the alarm persists, clean the receive optical port and fiber connector. If the alarm persists, verify that the flange and optical attenuator are used correctly. If the alarm persists, add or remove optical attenuators to achieve normal Rx power.
Cause 4: The alarmed board is faulty. Replace the alarmed board. If the alarm persists, replace the mapping board at the opposite end.

Common Alarms on Ethernet Ports (2)

The ETH_LINK_DOWN is an alarm indicating that the connection on the network port is faulty.

Possible Causes
Cause 1: Negotiation fails due to different working modes at the two ends.
Cause 2: An inloop is performed on the port.
Cause 3: The fiber is connected to an incorrect port.
Cause 4: A certain board is faulty.

Handling Procedure
Cause 1: Negotiation fails due to different working modes at the two ends. Verify that the working modes are the same at the two ends.
Cause 2: An inloop is performed on the port. Check for the LOOP_ALM alarm at the two ends and clear the alarm immediately if it is reported.
Cause 3: The fiber is connected to an incorrect port. Check whether the fiber on the alarmed port is connected to an incorrect port. If yes, connect the fiber to a correct port.
Cause 4: A certain board is faulty. Check for hardware-related alarms (such as HARD_BAD) at the two ends and replace the board that reports any of these alarms.

Common Alarms on Ethernet Ports (3)

The MAC_FCS_EXC is an alarm indicating that excessive bit errors are detected at the MAC layer.

Possible Causes
Cause 1: The line signals deteriorate.
Cause 2: The input optical power is abnormal.
Cause 3: The fiber connector is dirty.

Handling Procedure
Cause 1: The line signals deteriorate. Check for the LOOP_ALM alarm on the NMS and clear the alarm immediately if it is reported. If the alarm persists, verify that the fiber and electrical cable are normal. If the alarm persists, check for DoS attacks and eradicate any sources that transmit a large amount of invalid data.
Cause 2: The input optical power is abnormal. Check whether the alarmed port also reports IN_PWR_ABN. If yes, clear the IN_PWR_ABN alarm immediately.
Cause 3: The fiber connector is dirty. Clean the fiber connector and the receive optical port.

Checking Alarms on SDH Links

[Figure: network diagram with BTS 1 to BTS 3, RTN NEs, an MPLS network, BSCs, and the core network, connected by GE/FE, GE/10GE, and STM-1 links]

R_LOS: Loss of optical signals
Possible Causes
1. Fiber cuts
2. Excessive loss on the line
3. Malfunction of opposite transmit units

R_LOC: Loss of clock
Possible Causes
1. Failure in received signals
2. Malfunction of clock extraction modules

R_LOF: Loss of frame
Possible Causes
1. Excessive attenuation of received signals
2. Unframed structure of signals from the opposite site
3. Malfunction of local receive units

Common Alarms on SDH Ports (1)

The R_LOS is an alarm indicating loss of signals on the receive side of the line.

Possible Causes
Cause 1 of lasers: The local optical port is not used but the local laser is open.
Cause 2 of lasers: The local laser is open but the opposite laser is closed, so there is no output of optical signals.
Cause 1 of fibers: No pigtail is connected to the local optical port or the pigtail on the local optical port is connected incorrectly.
Cause 2 of fibers: Fiber cuts occur.
Cause 3 of fibers: Rx power is too low.
Cause 1 of boards: The local receive board is faulty.
Cause 2 of boards: The opposite transmit board is faulty.

Handling Procedure
Cause 1 of lasers: The local optical port is not used but the local laser is open. Check the enabling status of the local laser on the NMS and close the laser if it is open.
Cause 2 of lasers: The local laser is open but the opposite laser is closed, so there is no output of optical signals. Check the enabling status of the opposite laser on the NMS and open the laser if it is closed.

Common Alarms on SDH Ports (1)

Handling Procedure
Cause 1 of fibers: No pigtail is connected to the local optical port or the pigtail on the local optical port is connected incorrectly. Verify that the pigtail on the local optical port is correctly connected.
Cause 2 of fibers: Fiber cuts occur. Replace broken fibers.
Cause 3 of fibers: Rx power is too low. Check for the OUT_PWR_ABN alarm on the opposite transmit port and clear the alarm immediately if it is reported. If the alarm persists, clean the receive optical port and fiber connector. If the alarm persists, verify that the flange and optical attenuator are used correctly. If the alarm persists, add or remove optical attenuators to achieve normal Rx power.
Cause 1 of boards: The local receive board is faulty. If the alarm persists, set an inloop for the local receive port. If the local Rx power is normal, the local board is faulty and needs to be replaced.
Cause 2 of boards: The opposite transmit board is faulty. Replace the opposite transmit board. If the alarm persists, replace the opposite cross-connect board.

Common Alarms on SDH Ports (2)

The R_LOF is an alarm indicating loss of frames on the receive side of the line.

Possible Causes
Cause 1: Different types of optical modules are used at the two ends.
Cause 2: The receive power of the ODU is abnormal.
Cause 3: Fibers are misconnected.
Cause 4: The signals transmitted from the opposite site do not have the frame structure.
Cause 5: The local receive board is faulty.

Handling Procedure
Cause 1: Different types of optical modules are used at the two ends. Verify that optical modules of one type are used at the two ends.
Cause 2: The receive power of the ODU is abnormal. Check whether the alarmed port also reports IN_PWR_ABN. If yes, clear the IN_PWR_ABN alarm immediately.

Common Alarms on SDH Ports (2)

Handling Procedure
Cause 3: Fibers are misconnected. Verify that fibers are connected correctly.
Cause 4: The signals transmitted from the opposite site do not have the frame structure. Check for the HARD_BAD alarm on the opposite transmit board and clear this alarm immediately if it is reported.
Cause 5: The local receive board is faulty. Check for the HARD_BAD alarm on the local receive board and clear this alarm immediately if it is reported.

Checking Alarms on E1 Links

[Figure: network diagram with BTS 1 to BTS 3, RTN NEs, an MPLS network, BSCs, and the core network, connected by GE/FE, GE/10GE, and STM-1 links]

T_ALOS: Loss of signals
Possible Causes
• E1/T1 services are not received.
• Fibers on the DDF-side E1/T1 output ports are disconnected or loosely connected.
• Fibers on local E1/T1 output ports are disconnected or loosely connected.
• A certain board is faulty.
• The electrical cable is faulty.

ALM_E1RAI: Far-end alarm indication
Possible Causes
• Some alarms are reported on the opposite site.

Common Alarms on E1 Ports (1)

The T_ALOS is an alarm indicating loss of signals on E1 ports.

Possible Causes
Cause 1: The opposite site does not transmit any E1 services.
Cause 2: E1 cables are disconnected or loosely connected.
Cause 3: The opposite equipment is faulty.
Cause 4: The electrical cable is faulty.
Cause 5: The alarmed board is faulty.

Handling Procedure
Cause 1: The opposite site does not transmit any E1 services. Verify that the opposite site transmits E1 services properly.
Cause 2: E1 cables are disconnected or loosely connected. Verify that E1 cables are correctly connected.

Common Alarms on E1 Ports (1)

Handling Procedure
Cause 3: The opposite equipment is faulty. Perform a self-loop for the alarmed channel on the DDF side. If the alarm clears, the opposite equipment is faulty and the fault needs to be rectified.
Cause 4: The electrical cable is faulty. Perform a self-loop for the alarmed channel on the DDF side. If the alarm persists, perform a self-loop for the alarmed channel on the interface board side. If the alarm clears, the E1 cable is faulty and needs to be replaced.
Cause 5: The alarmed board is faulty. Perform a self-loop for the alarmed channel on the interface board side. If the alarm persists, set an inloop for the alarmed channel on the NMS. If the alarm clears, the interface board is faulty and needs to be replaced.
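The T_ALOS handling for Causes 3 to 5 narrows down the fault by looping the alarmed channel progressively closer to the equipment. The sketch below captures that reading, under the assumption that the three loopback results are recorded as booleans; the function and parameter names are hypothetical.

```python
def locate_t_alos_fault(ddf_loop_clears: bool,
                        board_loop_clears: bool,
                        nms_inloop_clears: bool) -> str:
    """Map the results of the three loopbacks to the suspected component.

    Each flag is True if the T_ALOS alarm clears while that loop is applied:
    self-loop on the DDF side, self-loop on the interface board side,
    and inloop set on the NMS.
    """
    if ddf_loop_clears:
        return "opposite equipment is faulty"
    if board_loop_clears:
        return "E1 cable is faulty and needs to be replaced"
    if nms_inloop_clears:
        return "interface board is faulty and needs to be replaced"
    # The slide gives no conclusion when none of the loops clears the alarm.
    return "undetermined"
```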

Common Alarms on E1 Ports (2)

The UP_E1_AIS is an alarm indication for upstream E1 signals. This alarm is reported when the upstream E1 signal is all 1s.

Possible Causes
Cause 1: The opposite site reports the T_ALOS alarm.
Cause 2: An inloop is set for the E1 port.
Cause 3: Some boards are faulty.

Handling Procedure
Cause 1: The opposite site reports the T_ALOS alarm. Check for the T_ALOS alarm on the opposite site and clear this alarm immediately if it is reported.
Cause 2: An inloop is set for the E1 port. Check whether the E1 port reports the LOOP_ALM alarm on the NMS. If yes, release the inloop on the E1 port.
Cause 3: Some boards are faulty. On the NMS, check whether the local NE and the opposite NE report any hardware-related alarms such as HARD_BAD. If yes, perform a cold reset for the boards that report hardware-related alarms. If the alarm persists, replace the boards that may be faulty.


Common Alarms on E1 Ports (3)

The DOWN_E1_AIS is an alarm indication for downstream 2 Mbit/s signals. This alarm is reported when the downstream E1 signal is all 1s.

Possible Causes
Cause 1: The alarmed board also reports the UP_E1_AIS or T_ALOS alarm.
Cause 2: Some boards are faulty.

Handling Procedure
Cause 1: The alarmed board also reports the UP_E1_AIS or T_ALOS alarm. Check whether the alarmed board reports the UP_E1_AIS or T_ALOS alarm on the NMS. If yes, clear the UP_E1_AIS or T_ALOS alarm immediately.
Cause 2: Some boards are faulty. On the NMS, check whether the alarmed board and local cross-connect board report any hardware-related alarms such as HARD_BAD. If yes, perform a cold reset for the boards that report hardware-related alarms. If the alarm persists, replace the boards that may be faulty.


Common Alarms on Other Links (1)

The IN_PWR_ABN is an alarm indicating that the input optical power is abnormal.

Possible Causes
Cause 1: The opposite transmit power is abnormal.
Cause 2: The local receive power is higher than the upper threshold.
Cause 3: The local receive power is lower than the lower threshold.
Cause 4: The receive board is faulty.

Handling Procedure
Cause 1: The opposite transmit power is abnormal. On the NMS, check whether the opposite site reports the OUT_PWR_ABN alarm. If yes, clear this alarm immediately and check whether the IN_PWR_ABN is cleared. If the alarm persists, query the local receive power and handle the alarm according to other causes.
Cause 2: The local receive power is higher than the upper threshold. Add proper optical attenuators to the receive optical port and adjust the input optical power to a normal value.


Common Alarms on Other Links (1)

Handling Procedure
Cause 3: The local receive power is lower than the lower threshold. Verify that the bending radius of the pigtail on the local site is no smaller than 6 cm. If the alarm persists, use proper optical attenuators and correctly connect the local optical module. If the alarm persists, replace the optical module and clean the fiber connectors at the two ends.
Cause 4: The receive board is faulty. Check whether the processing board and cross-connect board on the local site report any hardware-related alarms such as HARD_BAD and TEMP_OVER. If yes, replace the boards that report hardware-related alarms.
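Causes 2 and 3 of IN_PWR_ABN are distinguished by comparing the queried local receive power with the upper and lower thresholds. The following sketch shows that comparison only; the threshold values belong to the optical module in use and are passed in as parameters, and the function name is hypothetical.

```python
def classify_input_power(rx_dbm: float, lower_dbm: float, upper_dbm: float) -> str:
    """Classify the local receive power against the module thresholds."""
    if rx_dbm > upper_dbm:
        return "Cause 2: add proper optical attenuators"
    if rx_dbm < lower_dbm:
        return "Cause 3: check pigtail bending radius, attenuators, and connectors"
    # Power within range: the slide points to Cause 1 or Cause 4 instead.
    return "within thresholds: check opposite Tx power and the receive board"
```

For instance, with illustrative thresholds of -28 dBm and -8 dBm, a reading of -30 dBm falls under Cause 3.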

Common Alarms on Other Links (2)

The OUT_PWR_ABN is an alarm indicating that the output optical power is abnormal.

Possible Causes
Cause 1: The output optical power is too high or too low.
Cause 2: The alarmed board is faulty.

Handling Procedure
Cause 1: The output optical power is too high or too low. Replace the optical module of the alarmed port.
Cause 2: The alarmed board is faulty. Replace the alarmed board.

Common Alarms on Other Links (3)

The LOOP_ALM is an alarm of loopbacks.

Possible Causes
Cause 1: The port is looped back.
Cause 2: The service is looped back.

Handling Procedure
Cause 1: The port is looped back. On the NMS, check whether the alarmed port is looped back. If yes, release the loopback.
Cause 2: The service is looped back. On the NMS, check whether the service is looped back. If yes, release the loopback. For Ethernet services, enable the automatic shutdown function for looped-back ports.

Common Alarms on Other Links (4)

The FLOW_OVER is an alarm indicating the traffic received by the port is higher than the threshold.

Possible Causes
Cause 1: The traffic received by the port is higher than the preset threshold of the port.

Handling Procedure
Cause 1: The traffic received by the port is higher than the preset threshold of the port.
• Check whether the actual received traffic indicated by the alarm parameter is higher than the port bandwidth. If yes, reduce the data transmitted by the opposite site.
• Configure the service on an unused port.


Locating Faults of TDM Services
Common Locating Process

[Flowchart] Start
1. Any equipment alarms? Yes: handle alarms. No: go to 2.
2. Any pointer justifications? Yes: handle pointer justifications. No: go to the RS check.
RS check. Any alarms or events related to RS errors? Yes: process RS errors on different boards:
3. SDH optical interface boards: handle RS errors on SDH optical interface boards.
4. IF boards: handle RS errors on IF boards.
5. STM-1 electrical boards: handle RS errors on STM-1 electrical interface boards.
6. Any alarms or events related to MS errors or HOP errors? Yes: handle MS errors and HOP errors. No: go to 7.
7. Any alarms related to LOP errors? Yes: handle LOP errors. No: locate faults by performing sectional loopbacks.
After each handling action: Are faults rectified? Yes: End. No: go to the next step.
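The TDM locating process is again an ordered sequence of checks, with the RS branch depending on the type of board that reports the errors. A minimal sketch under that reading follows; the board-type keys and the function name are hypothetical labels chosen for illustration.

```python
from typing import Optional

# Step numbers follow the flowchart; the dictionary keys are illustrative labels.
RS_STEPS = {
    "sdh_optical": "3: handle RS errors on SDH optical interface boards",
    "if": "4: handle RS errors on IF boards",
    "stm1_electrical": "5: handle RS errors on STM-1 electrical interface boards",
}


def next_tdm_action(equipment_alarm: bool,
                    pointer_justification: bool,
                    rs_error_board: Optional[str],
                    ms_or_hop_error: bool,
                    lop_error: bool) -> str:
    """Return the first applicable handling step for a TDM service fault."""
    if equipment_alarm:
        return "1: handle alarms"
    if pointer_justification:
        return "2: handle pointer justifications"
    if rs_error_board is not None:
        return RS_STEPS[rs_error_board]
    if ms_or_hop_error:
        return "6: handle MS errors and HOP errors"
    if lop_error:
        return "7: handle LOP errors"
    return "locate faults by performing sectional loopbacks"
```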

Locating Faults of TDM Services
Common Symptoms and Causes

Fault Type: Equipment faults
Common Cause:
• The board reports the HARD_BAD alarm, and clock tracing needs to be checked or some boards need to be replaced.
• Over high board temperature causes bit errors.
• Clock tracing fails and the upstream link clocks need to be checked.

Fault Type: Regenerator section (RS) errors
Common Cause:
• The line is faulty. On optical lines, optical power is abnormal, fiber performance deteriorates, or fiber splice and fiber connectors are dirty. On STM-1 electrical lines, grounding is incorrect, electrical cables deteriorate, or connectors are incorrectly connected. On microwave links, the MW_FEC_UNCOR or RPS_INDI alarm is reported.
• The line board fails.
• The clock unit fails.
• Clock quality deteriorates on the network.
• Clock quality deterioration on the network causes pointer justifications.

Locating Faults of TDM Services
Common Symptoms and Causes

Fault Type: There are multiplex section (MS) errors and higher order path (HOP) errors, but not RS errors.
Common Cause:
• Operating temperature on the line board is over high.
• The line board is faulty.
• Clock quality deteriorates on the network.
• Clock quality deterioration on the network causes pointer justifications.
• Unstable power supply, incorrect grounding, or external interference exists.

Fault Type: There are only lower order path (LOP) errors.
Common Cause:
• The working temperature on the cross-connect board is over high.
• The cross-connect board is faulty.
• The PDH service processing board or Ethernet service processing board has over high working temperature.
• The PDH service processing board or Ethernet service processing board is faulty.

Locating Faults of TDM Services - Handling Method
Procedure: 1. Handle alarms.
Handling Method:
Focus on:
• TEMP_ALARM
• SYN_BAD
• HARD_BAD
• MW_CFG_MISMATCH
Procedure: 2. Handle pointer justifications.
Handling Method:
1. Analyze and process clock alarms.
2. Ensure that the configuration is correct and fibers are correctly connected.
3. Locate the sites with clock asynchronization by changing clock configuration.
4. Replace the components with poor performance.

Locating Faults of TDM Services - Handling Method
Procedure: 3. Handle the RS errors on the SDH optical interface board.
Handling Method:
1. Exchange the fiber cores in the transmit and receive directions on a section of optical channel.
2. If errors change after the fiber cores are exchanged, the fibers are faulty or the equipment malfunctions at the two ends. If the alarm is still reported by the alarmed board, the alarmed board is faulty.
3. If fibers are faulty, check whether the fiber from the equipment to the optical distribution frame (ODF) and the fiber that is led out from the telecommunications room are pressed, and whether any fiber connector is dirty or damaged.
4. If the equipment at the two ends is faulty, locate the fault by performing loopbacks on optical ports. If the fault persists after the loopback on a site, the line board on the site is faulty.
5. If the equipment at the two ends is faulty, replace the alarmed board or exchange the slots of the alarmed board and another working SDH optical interface board.
Procedure: 4. Handle the RS errors on the IF board.
Handling Method:
1. Check for the MW_FEC_UNCOR and RPS_INDI alarms. If any of these alarms is reported, clear the alarm immediately.
2. If none of these alarms is reported, replace the IF board.

Locating Faults of TDM Services - Handling Method
Procedure: 5. Handle the RS errors on the STM-1 electrical interface board.
Handling Method:
1. Exchange the electrical cables in the receive and transmit directions. If errors change after the exchange, the electrical cables are faulty or the equipment at the two ends is faulty. If the alarm is still reported by the alarmed board, the alarmed board is faulty.
2. Check whether the electrical cables are grounded properly and whether the cable connectors and cables are damaged.
3. If the equipment at the two ends is faulty, locate the fault by performing loopbacks on electrical ports. If the fault persists after a loopback is performed on a site, the line board on the site is faulty.
4. If the equipment at the two ends is faulty, replace the alarmed board or exchange the slots of the alarmed board and another working SDH electrical interface board.
Procedure: 6. Handle the MS errors and HOP errors.
Handling Method:
1. Perform a loopback on the alarmed board.
2. If the alarm persists, replace the alarmed board.
3. If the alarm clears, replace the transmit line board, which corresponds to the alarmed board.
4. If the alarm persists after board replacement, check for unstable power supply, improper grounding, and external interference on the SDH electrical interface board.

Locating Faults of TDM Services - Handling Method
Procedure: 7. Handle LOP errors.
Handling Method:
1. Replace PDH service processing boards, Ethernet service processing boards, or cross-connect boards along the overlapped route of errored services.
2. If the alarm persists after board replacement, check for unstable power supply, improper grounding, and external interference.

Locating Faults of TDM Services - Common Alarms
The MW_CFG_MISMATCH is an alarm indicating a configuration mismatch on microwave links.

Possible Causes
Cause 1: The number of E1 signals is different on both ends of a microwave link (including the number of E1 signals on the active page and the number of E1 signals on the standby page).
Cause 2: The AM enabling is different on both ends of a microwave link.
Cause 3: The IEEE 1588 overhead enabling is different on both ends of a microwave link.
Cause 4: The modulation mode is different on both ends of a microwave link.
Cause 5: The channel spacing is different on both ends of a microwave link.

Handling Procedure
Cause 1: The number of E1 signals is different on both ends of a microwave link.
Cause 2: The AM enabling is different on both ends of a microwave link.
Cause 3: The IEEE 1588 overhead enabling is different on both ends of a microwave link.
Cause 4: The modulation mode is different on both ends of a microwave link.
Cause 5: The channel spacing is different on both ends of a microwave link.
Determine the possible cause of the alarm according to the alarm parameters. Then, check the configuration on both ends of the microwave link. Ensure that the configuration is the same on both ends of the microwave link.
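The end-to-end comparison described for MW_CFG_MISMATCH can be sketched as a field-by-field diff. This is a minimal illustration that assumes the configuration of each end has already been exported into a dictionary; the key names, the sample values, and the function `find_mismatches` are hypothetical and do not correspond to any NMS interface.

```python
# Parameters named in the possible causes of MW_CFG_MISMATCH.
# The key names are hypothetical labels chosen for this sketch.
CHECKED_PARAMETERS = [
    "e1_count",
    "am_enabled",
    "ieee1588_overhead_enabled",
    "modulation_mode",
    "channel_spacing_mhz",
]


def find_mismatches(local, remote):
    """Return (parameter, local value, remote value) for each parameter that differs."""
    return [
        (name, local.get(name), remote.get(name))
        for name in CHECKED_PARAMETERS
        if local.get(name) != remote.get(name)
    ]


# Illustrative values only.
local_end = {
    "e1_count": 16,
    "am_enabled": True,
    "ieee1588_overhead_enabled": False,
    "modulation_mode": "16QAM",
    "channel_spacing_mhz": 28,
}
remote_end = dict(local_end, e1_count=8, channel_spacing_mhz=14)

for name, here, there in find_mismatches(local_end, remote_end):
    print(f"{name}: local={here} remote={there}")
```

An empty result means the five checked parameters agree on both ends, so the cause would have to be sought elsewhere.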


Locating Faults of CES Services - Common Locating Process
Start
1. HARD_BAD, TEMP_OVER, BUS_ERR, or COMMUN_FAIL occurs? If yes: board hardware errors or inter-board communication failure. Reset/Reseat/Replace boards.
2. T_ALOS, R_LOS, or LASER_MOD_ERR occurs? If yes: signal loss or degrade. Troubleshoot fibers, optical modules, or network cables. Reset/Reseat/Replace boards. Troubleshoot the opposite equipment.
3. MPLS_TUNNEL_LOCV occurs? If yes: tunnel faults. Troubleshoot physical links, fibers, optical modules, and connections. Troubleshoot the opposite equipment.
4. SYNC_C_LOS or LTI occurs? If yes: loss of synchronization clock. Troubleshoot clock faults.
5. CES_LOSPKT_EXC or CES_JTRUDR_EXC occurs? If yes: excessive errored packets, lost packets, or jitters. Change network configurations.
Faults are rectified? If yes, end. If no, contact Huawei engineers.
End

Locating Faults of CES Services - Common Symptoms
Symptom: CES services are interrupted.
Alarm Reported:
• HARD_BAD, TEMP_OVER, or BUS_ERR
• COMMUN_FAIL
• T_ALOS
• UP_E1_AIS or DOWN_E1_AIS
• R_LOS, LASER_MOD_ERR, or IN_PWR_ABN
• MPLS_TUNNEL_LOCV

. CES_MALPKT_EXC. IN_PWR_ABN. CES_STRAYPKT_EXC. LTD.Common Symptoms Symptom CES services have errors and the signal quality degrades. TEM_HA. or BUS_ERR SYNC_C_LOS or LTI CES_LOSPKT_EXC. CES_MISORDERPKT_EXC.Locating Faults of CES Services . or CES_JTROVR_EXC LSR_WILL_DIE. CES_JTRUDR_EXC. TEMP_OVER. Huawei Confidential Page 7 . Alarm Reported HARD_BAD. or LSR_BCM_ALM HUAWEI TECHNOLOGIES CO.

Locating Faults of CES Services - Common Causes
Cause 1: The board carrying CES services cannot work properly due to hardware errors, over-high temperature, or inter-board communication failure.
Cause 2: The signal transmitted to the processing board or interface board is lost or degrades.
Cause 3: The tunnel or PW carrying CES services is interrupted.
Cause 4: On the NE, the priority of synchronization clock source is lost, or the synchronization clock source is lost.
Cause 5: On the PW carrying CES services, the number of lost packets, errored packets, or jitters within a time unit crosses the threshold.


Locating Faults of CES Services - Handling Method
Cause 1: The board carrying CES services cannot work properly due to hardware errors, over-high temperature, or inter-board communication failure.
Handle the HARD_BAD, TEMP_OVER, COMMUN_FAIL, or BUS_ERR alarm if any of them is reported.
Cause 2: The signal transmitted to the processing board or interface board is lost or degrades.
Handle the T_ALOS, UP_E1_AIS, DOWN_E1_AIS, R_LOS, LASER_MOD_ERR, LSR_WILL_DIE, IN_PWR_ABN, TEM_HA, or LSR_BCM_ALM alarm if any of them is reported.
Cause 3: The tunnel or PW carrying CES services is interrupted.
Enable MPLS OAM. Handle the MPLS_TUNNEL_LOCV alarm if it is reported.
Cause 4: On the NE, the priority of synchronization clock source is lost, or the synchronization clock source is lost.
Handle the SYNC_C_LOS or LTI alarm if any of them is reported.
Cause 5: On the PW carrying CES services, the number of lost packets, errored packets, or jitters within a time unit crosses the threshold.
Handle the CES_LOSPKT_EXC, CES_MISORDERPKT_EXC, CES_STRAYPKT_EXC, CES_JTRUDR_EXC, or CES_JTROVR_EXC alarm if any of them is reported.
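The alarm-to-cause grouping in this handling method can be sketched as a small lookup. The alarm names are the ones listed for each cause; the data structure, the shortened descriptions, and the function name `match_ces_causes` are assumptions made for illustration, and the list of reported alarms is assumed to come from the NMS.

```python
# Cause number -> (short description, alarms listed for that cause in the handling method).
CES_CAUSES = {
    1: ("Hardware error, over-high temperature, or inter-board communication failure",
        {"HARD_BAD", "TEMP_OVER", "COMMUN_FAIL", "BUS_ERR"}),
    2: ("Signal to the processing board or interface board is lost or degrades",
        {"T_ALOS", "UP_E1_AIS", "DOWN_E1_AIS", "R_LOS", "LASER_MOD_ERR",
         "LSR_WILL_DIE", "IN_PWR_ABN", "TEM_HA", "LSR_BCM_ALM"}),
    3: ("Tunnel or PW carrying CES services is interrupted",
        {"MPLS_TUNNEL_LOCV"}),
    4: ("Synchronization clock source or its priority is lost",
        {"SYNC_C_LOS", "LTI"}),
    5: ("Lost packets, errored packets, or jitters cross the threshold",
        {"CES_LOSPKT_EXC", "CES_MISORDERPKT_EXC", "CES_STRAYPKT_EXC",
         "CES_JTRUDR_EXC", "CES_JTROVR_EXC"}),
}


def match_ces_causes(reported_alarms):
    """Map each cause number to the reported alarms that point to it."""
    reported = set(reported_alarms)
    matches = {}
    for number, (_description, alarms) in CES_CAUSES.items():
        hits = sorted(reported & alarms)
        if hits:
            matches[number] = hits
    return matches


for number, hits in match_ces_causes(["LTI", "CES_JTRUDR_EXC"]).items():
    print(f"Cause {number}: {CES_CAUSES[number][0]} ({', '.join(hits)})")
```

Alarms that are not listed under any cause are ignored by this sketch, so an empty result does not prove that the CES service is healthy.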


Common Alarms of CES Services (1)

The CES_JTROVR_EXC/CES_JTRUDR_EXC is an alarm indicating the overflow/underflow of CES jitters.

Possible Causes
Cause 1: Clock synchronization cannot be performed.
Cause 2: Link quality deteriorates, causing more jitters.
Cause 3: The size of buffer area is set to a low value.
Cause 4: There are too many hops of microwave link on the network side, which generates a large number of jitters.

Handling Procedure
Cause 1: Clock synchronization cannot be performed.
On the NMS, check whether the LTI or other clock alarms are reported. If yes, clear these alarms.
Cause 2: Link quality deteriorates, causing more jitters.
Check whether the alarmed port also reports IN_PWR_ABN or TEM_HA. If yes, clear the IN_PWR_ABN or TEM_HA alarm immediately.
Cause 3: The size of buffer area is set to a low value.
On the NMS, increase the size of buffer area if possible.
Cause 4: There are too many hops of microwave link on the network side, which generates a large number of jitters.
Reduce the number of hops on the network side.


Common Alarms of CES Services (2)
The CES_LOSPKT_EXC is an alarm indicating packet loss of CES services.

Possible Causes
Cause 1: Clock synchronization cannot be performed.
Cause 2: Parameter settings are different at the two ends of CES services.
Cause 3: The tunnel or PW carrying CES services is congested.
Cause 4: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical modules.

Handling Procedure
Cause 1: Clock synchronization cannot be performed.
On the NMS, check whether the LTI or other clock alarms are reported. If yes, clear these alarms.
Cause 2: Parameter settings are different at the two ends of CES services.
Modify the parameter settings so that they are the same at the two ends.
Cause 3: The tunnel or PW carrying CES services is congested.
On the NMS, check whether the bandwidth configured for the tunnel or PW is too low and whether the QoS parameters are set properly. If the bandwidth and QoS settings cannot meet the requirements of CES services, replan the service trail, increase the bandwidth, and change QoS settings.
Cause 4: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical modules.
Verify that electrical cables and fibers are correctly connected to the ports. Clean the fiber connectors and optical modules. If the alarm persists, replace the cables, fibers, or optical modules that may be faulty.

Common Alarms of CES Services (3)
The CES_MALPKT_EXC is an alarm indicating malformed packets of CES services.

Possible Causes
Cause 1: Parameters of CES services are set incorrectly.
Cause 2: The tunnel or PW carrying CES services is congested.
Cause 3: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical modules.

Handling Procedure
Cause 1: Parameters of CES services are set incorrectly.
Modify the incorrect parameter settings on the NMS.
Cause 2: The tunnel or PW carrying CES services is congested.
On the NMS, check whether the bandwidth configured for the tunnel or PW is too low and whether the QoS parameters are set properly. If the bandwidth and QoS settings cannot meet the requirements of CES services, replan the service trail, increase the bandwidth, and change QoS settings.
Cause 3: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical modules.
Verify that electrical cables and fibers are correctly connected to the ports. Clean the fiber connectors and optical modules. If the alarm persists, replace the cables, fibers, or optical modules that may be faulty.

Common Alarms of CES Services (4)
The CES_MISORDERPKT_EXC is an alarm indicating disordered packets of CES services.

Possible Causes
Cause 1: Clock synchronization cannot be performed.
Cause 2: The tunnel or PW carrying CES services is congested.
Cause 3: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical modules.

Handling Procedure
Cause 1: Clock synchronization cannot be performed.
On the NMS, check whether the LTI or other clock alarms are reported. If yes, clear these alarms.
Cause 2: The tunnel or PW carrying CES services is congested.
On the NMS, check whether the bandwidth configured for the tunnel or PW is too low and whether the QoS parameters are set properly. If the bandwidth and QoS settings cannot meet the requirements of CES services, replan the service trail, increase the bandwidth, and change QoS settings.
Cause 3: The link signal deteriorates or is interrupted due to a fault of cables, optical fibers, or optical modules.
Verify that electrical cables and fibers are correctly connected to the ports. Clean the fiber connectors and optical modules. If the alarm persists, replace the cables, fibers, or optical modules that may be faulty.

Common Alarms of CES Services (5)
The CES_STRAYPKT_EXC is an alarm indicating errored packets of CES services.

Possible Causes
Cause 1: Parameter settings are different at the two ends of CES services.
Cause 2: Fibers or cables are connected incorrectly.

Handling Procedure
Cause 1: Parameter settings are different at the two ends of CES services.
Modify the parameter settings so that they are the same at the two ends.
Cause 2: Fibers or cables are connected incorrectly.
Reconnect the fibers or cables correctly.

Locating Faults of ETH Services - Common Locating Process
Start
1. HARD_BAD, TEMP_OVER, BUS_ERR, or COMMUN_FAIL occurs? If yes: board hardware errors or inter-board communication failure. Reset/Reseat/Replace boards.
2. ETH_LOS occurs? If yes: signal loss or degrade. Troubleshoot fibers, optical modules, or network cables. Reset/Reseat/Replace boards.
3. ETH_LINK_DOWN occurs? If yes: incorrect connections on network ports, port negotiation failure. Change parameter settings on ports.
4. LOOP_ALM occurs? If yes: loopbacks on ports. Release loopbacks.
5. FLOW_OVER occurs? If yes: service configuration faults. Rectify service configuration faults.
Faults are rectified? If yes, end. If no, contact Huawei engineers.
End

Locating Faults of ETH Services - Common Symptoms
Symptom: Ethernet services are interrupted.
Alarm Reported:
• HARD_BAD, TEMP_OVER, or BUS_ERR
• COMMUN_FAIL
• ETH_LOS, ETH_LINK_DOWN, ETH_AUTO_LINK_DOWN, or LOOP_ALM
• LASER_SHUT or LSR_WILL_DIE
Symptom: Ethernet services have packet loss or errored packets.
Alarm Reported:
• HARD_BAD, TEMP_OVER, or BUS_ERR
• LSR_WILL_DIE
• FLOW_OVER

Locating Faults of ETH Services - Common Causes
Cause 1: The board carrying ETH services cannot work properly due to hardware errors, over-high temperature, or inter-board communication failure.
Cause 2: The signal is lost in the receive direction.
Cause 3: Negotiation between Ethernet ports fails due to incorrect connections on Ethernet ports.
Cause 4: Loopbacks are performed for Ethernet ports.
Cause 5: Traffic limit on Ethernet ports is set to a low value or parameter settings are different on source and sink ports.

Locating Faults of ETH Services - Handling Method
Cause 1: The board carrying ETH services cannot work properly due to hardware errors, over-high temperature, or inter-board communication failure.
Handle the HARD_BAD, TEMP_OVER, COMMUN_FAIL, or BUS_ERR alarm if any of them is reported.
Cause 2: The signal is lost in the receive direction.
Handle the ETH_LOS, R_LOS, LASER_SHUT, or LSR_WILL_DIE alarm if any of them is reported.
Cause 3: Negotiation between Ethernet ports fails due to incorrect connections on Ethernet ports.
1. Handle the ETH_LINK_DOWN alarm if it is reported.
2. Check whether the working modes of interconnected Ethernet ports are the same.
Cause 4: Loopbacks are performed for Ethernet ports.
Handle the LOOP_ALM or ETH_EFM_LOOPBACK alarm if any of them is reported.
Cause 5: Traffic limit on Ethernet ports is set to a low value or parameter settings are different on source and sink ports.
Handle the FLOW_OVER or ETH_CFM_UNEXPERI alarm if any of them is reported.

Locating Tunnel Faults - Common Locating Process
1. Perform tunnel ping tests. If the ping tests are successful, the tunnel layer is normal. Start link-layer detection.
2. Perform TraceRoute tests. Locate faulty NEs and links.
3. The chip checks tunnel labels along the service flow. Are there incorrect NE labels? If yes, inform users of incorrect NE labels and suggest modifications.

Locating Tunnel Faults - Common Symptoms and Causes
Common Symptoms
• MPLS tunnels cannot be created, and therefore services cannot be provisioned.
• MPLS tunnels are faulty, causing service interruption, packet loss, or bit errors.
• Protection switching fails, causing service interruption.

Common Causes
Cause 1: Cross-connections cannot be created.
Cause 2: The physical links carrying the tunnels are faulty.
Cause 3: Protection switching fails.

Locating Tunnel Faults - Handling Method
Cause 1: Cross-connections cannot be created.
1. Check whether the number of created tunnels reaches the maximum value. If yes, replan tunnels or delete redundant tunnels.
2. Check the IP address of each NE on the LSP. If the IP addresses of two NEs are on the same network segment, change the IP addresses to values on different network segments.
3. Check whether incompatible features are configured for the tunnel.
Cause 2: The physical links carrying the tunnels are faulty.
1. Handle the HARD_BAD, R_LOS, ETH_LOS, MPLS_TUNNEL_BDI, MPLS_TUNNEL_FDI, MPLS_TUNNEL_Excess, or MPLS_TUNNEL_LOCV alarm if any of them is reported.
2. Check whether any exceptions (such as board failure or NE reset) occur on the opposite equipment. If yes, handle the exceptions.
Cause 3: Protection switching fails.
1. MPLS APS protection switching fails. Handle the failure.
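The check for NE IP addresses on the same network segment can be sketched with Python's standard `ipaddress` module. The NE names, the addresses, and the function name `nes_on_same_segment` are hypothetical; the sketch assumes each NE address is available as an "address/prefix" string.

```python
import ipaddress
from itertools import combinations


def nes_on_same_segment(ne_addresses):
    """Return pairs of NE names whose addresses fall in overlapping network segments.

    ne_addresses maps an NE name to an "address/prefix" string.
    """
    networks = {
        name: ipaddress.ip_interface(value).network
        for name, value in ne_addresses.items()
    }
    return [
        (first, second)
        for first, second in combinations(sorted(networks), 2)
        if networks[first].overlaps(networks[second])
    ]


# Illustrative addresses only.
lsp_nes = {
    "NE1": "10.0.1.1/24",
    "NE2": "10.0.2.1/24",
    "NE3": "10.0.1.2/24",
}
print(nes_on_same_segment(lsp_nes))
```

Using `overlaps` instead of strict equality also catches a shorter prefix that contains a longer one, which equality alone would miss.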

Locating Tunnel Faults - Common Alarms (1)
The MPLS_TUNNEL_LOCV is an alarm indicating the loss of tunnel connectivity.

Possible Causes
Cause 1: The ingress node on the tunnel stops transmitting CV/FFD packets.
Cause 2: The physical link carrying the tunnel is faulty.
Cause 3: Some boards on the ingress node are being reset.
Cause 4: The service interface is configured incorrectly.
Cause 5: Severe congestion occurs on the network.
Cause 6: The CPU is highly occupied and cannot process ARP protocol packets.

Handling Procedure
Cause 1: The ingress node on the tunnel stops transmitting CV/FFD packets.
1. Check the parameter of CV/FFD status on the ingress node. If the CV/FFD status is disabled, change it to enabled.
2. Check whether the settings of detection mode and detection packet type are consistent on the two ends. If not, make consistent settings.
Cause 2: The physical link carrying the tunnel is faulty.
On the NMS, check whether the egress node reports the HARD_BAD, ETH_LOS, or ETH_LINK_DOWN alarm. If yes, clear this alarm.

Locating Tunnel Faults – Common Alarms (1)
Handling Procedure
Cause 3: Some boards on the ingress node are being reset.
On the NMS, check whether the ingress node reports the COMMUN_FAIL alarm. If yes, clear this alarm.
Cause 4: The service interface is configured incorrectly.
Check whether the tunnel is configured on a proper port according to the NE planning table.
Cause 5: Severe congestion occurs on the network.
Check the bandwidth utilization of each port on the LSP. If the bandwidth of some ports is exhausted, allocate some traffic to other links or increase the bandwidth of congested ports.
Cause 6: The CPU is highly occupied and cannot process ARP protocol packets.
Check for the CPU_BUSY alarm on the NMS and clear this alarm immediately if it is reported.
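The per-port bandwidth check for Cause 5 can be sketched as a simple utilization filter. The port names, the traffic figures, and the 90% threshold are assumptions chosen for illustration; the source only says that exhausted ports should be relieved, and the function `congested_ports` is hypothetical.

```python
def congested_ports(port_stats, threshold=0.9):
    """Return (port, utilization) pairs at or above the threshold, busiest first.

    port_stats maps a port name to (used_mbps, capacity_mbps).
    The default threshold of 0.9 is an assumption, not a documented value.
    """
    result = []
    for port, (used, capacity) in port_stats.items():
        if capacity <= 0:
            continue  # skip ports without a usable capacity figure
        utilization = used / capacity
        if utilization >= threshold:
            result.append((port, round(utilization, 2)))
    return sorted(result, key=lambda item: item[1], reverse=True)


# Illustrative figures only.
lsp_ports = {
    "NE1-port1": (95.0, 100.0),
    "NE2-port1": (40.0, 100.0),
    "NE3-port2": (1000.0, 1000.0),
}
for port, utilization in congested_ports(lsp_ports):
    print(f"{port}: {utilization:.0%} used")
```

Ports returned by the function are the candidates for moving traffic to other links or for a bandwidth increase.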


Locating Tunnel Faults – Common Alarms (2)

The MPLS_TUNNEL_BDI/MPLS_TUNNEL_FDI is an alarm indicating defects in the backward/forward direction of a tunnel.


Possible Causes
Cause: The upstream NE detects that the tunnel at the physical layer is faulty.

Handling Procedure
Cause: The upstream NE detects that the tunnel at the physical layer is faulty.
On the physical link between the local NE and its upstream NE, check for faults such as fiber cuts, failure in optical modules, and board failure. Rectify the fault if any.


Locating PW Faults - Common Symptoms and Causes
Common Symptoms
1. PWs cannot be created, and therefore services cannot be provisioned.
2. PWs are faulty, causing service interruption, packet loss, or bit errors.

Common Causes
Cause 1: The physical link carrying the PW is faulty.
Cause 2: Cross-connections of PWs cannot be created.
Cause 3: The tunnels carrying PWs are faulty.


Locating PW Faults - Handling Method
Cause 1: The physical link carrying the PW is faulty.
1. Check whether the physical link between the ingress and egress nodes is normal. Handle the HARD_BAD, R_LOS, LASER_MOD_ERR, or ETH_LOS alarm if any of them is reported.
2. Check whether any exceptions (such as board failure or NE reset) occur on the opposite equipment. If yes, handle the exceptions.
Cause 2: Cross-connections of PWs cannot be created.
1. Check whether the number of created PWs reaches the maximum value. If yes, replan PWs or delete redundant PWs.
Cause 3: The tunnels carrying PWs are faulty.
1. Handle the faults on tunnels.

Locating PW Faults - Common Alarms
The PW_DROPPKT_EXC is an alarm indicating that the number of lost packets on the PW crosses the threshold.

Possible Causes
Cause: A small number of packets are lost on the PW.

Handling Procedure
Cause: A small number of packets are lost on the PW.
Check whether any service ports on the PW are congested. If yes, replan the trail of services or increase the bandwidth of congested ports.


Locating Faults of 1+1 Protection - Common Symptoms
Fault Symptoms
• 1+1 protection switching cannot be triggered.
• 1+1 protection switching is delayed.
• After the working channel of a 1+1 protection group is restored, services cannot be switched from the protection channel to the working channel.
• The packet services transmitted on the Hybrid microwave link are unavailable.
• The following hardware- and service-related alarms occur: POWER_FAIL, VOLT_LOS, HARD_BAD, CONFIG_NOSUPPORT, R_LOS, R_LOF, R_LOC, MW_LOF, MW_RDI, IF_INPWR_ABN, RADIO_RSL_HIGH, RADIO_TSL_LOW, RADIO_TSL_HIGH.

Locating Faults of 1+1 Protection - Common Causes
Possible Causes
Cause 1: The 1+1 protection group is in forced switching state.
Cause 2: The 1+1 protection group works in non-revertive mode or works in RDI state.
Cause 3: Hardware-related alarms occur, the NE is being reset, or the switching between active and standby SCC boards is being performed.
Cause 4: Connections between the IF board and the EMS6 board are incorrect, or the cable connectors are in poor contact.
Cause 5: Switching is triggered again upon the RDI alarm; anti-jitter function is performed upon service alarms and the RDI alarm.
Cause 6: IF cables are connected incorrectly.

Locating Faults of 1+1 Protection - Handling Method
Handling Procedure
Cause 1: The 1+1 protection group is in forced switching state.
Clear the forced switching state.
Cause 2: The 1+1 protection group works in non-revertive mode or works in RDI state.
Set the revertive mode of the protection group to revertive.
Cause 3: Hardware-related alarms occur, the NE is being reset, or the switching between active and standby SCC boards is being performed.
Handle these alarms.
Cause 4: Connections between the IF board and the EMS6 board are incorrect, or the cable connectors are in poor contact.
Re-connect the network cables between the IF board and the EMS6 board or use new cable connectors.
Cause 5: Switching is triggered again upon the RDI alarm; anti-jitter function is performed upon service alarms and the RDI alarm.
Perform the 1+1 switching 30 minutes later.
Cause 6: IF cables are connected incorrectly.
Connect IF cables correctly.

Locating Faults of 1+1 Protection - Common Alarms
The RPS_INDI is a microwave protection switching alarm indication.

Possible Causes
Services are transmitted on the standby channel.

Handling Procedure
1. Troubleshoot the working channel.
2. Set the revertive mode of the 1+1 protection group to revertive.

Locating Faults of SNCP Protection - Common Symptoms and Causes
Fault Symptoms
• SNCP switching fails.
• The following hardware- and service-related events occur: performance events of SDH SNCP protection switching.

Possible Causes
Cause 1: SNCP switching fails because the NE software version mismatches the board software version.
Cause 2: The working and protection channels of an SNCP protection group fail.
Cause 3: TU_AIS insertion upon E1_AIS is not provided (for OptiX RTN 600 V100R005 and OptiX RTN 900 V100R002C01 and later versions).

Locating Faults of SNCP Protection - Handling Method
Handling Procedure
Cause 1: SNCP switching fails because the NE software version mismatches the board software version.
Upgrade the NE software or board software.
Cause 2: The working and protection channels of an SNCP protection group fail.
Troubleshoot the channels.
Cause 3: TU_AIS insertion upon E1_AIS is not provided (for OptiX RTN 600 V100R005 and OptiX RTN 900 V100R002C01 and later versions).
Set the TU_AIS insertion upon E1_AIS on the NMS.

Locating Faults of SNCP Protection - Common Alarms
The PS is an alarm indicating protection switching.

Possible Causes
Services are transmitted on the standby channel.

Handling Procedure
1. Troubleshoot the active channel.
2. Set the revertive mode of the SNCP protection group to revertive.

Locating APS Faults - Common Locating Process
Start
1. ETH_APS_PATH_MISMATCH occurs? If yes: working and protection channels of an APS group differ on the two ends.
• Fibers or cables are connected incorrectly? If yes, reconnect fibers or cables.
• Configurations differ on two ends? If yes, change the configurations to the same.
2. ETH_APS_LOST occurs? If yes: APS frames are lost on the protection channel.
• Configurations are the same on two ends? If no, change the configurations to the same.
• APS protocol is enabled on both ends? If no, enable APS protocol on both ends.
• Hardware alarms occur? If yes, rectify board hardware faults.
• Clock alarms occur? If yes, troubleshoot clocks.
• Tunnel-level alarms occur on the protection channel? If yes, troubleshoot the protection channel.
Faults are rectified? If yes, end. If no, contact Huawei engineers.
End

Locating APS Faults - Common Symptoms
Symptom: The APS protection group is configured incorrectly or APS frames cannot be received.
Alarm Reported:
• ETH_APS_PATH_MISMATCH
• ETH_APS_LOST
• ETH_APS_SWITCH_FAIL
• ETH_APS_TYPE_MISMATCH
Symptom: The working tunnel or protection tunnel is faulty.
Alarm Reported:
• MPLS_TUNNEL_LOCV
• MPLS_TUNNEL_MISMERGE
• MPLS_TUNNEL_MISMATCH
• MPLS_TUNNEL_Excess
• MPLS_TUNNEL_SD
• MPLS_TUNNEL_SF
• MPLS_TUNNEL_UNKNOWN

Locating APS Faults - Common Causes
Cause 1: The settings of the APS protection group differ between the two ends.
Cause 2: The APS protection group is deactivated.
Cause 3: Fibers or cables are connected incorrectly.
Cause 4: APS frames cannot be transmitted because hardware-related alarms occur on the board that carries the protection channel.
Cause 5: The system reports clock alarms.
Cause 6: The working tunnel or protection tunnel is faulty.

Locating APS Faults - Handling Method
Cause 1: The settings of the APS protection group differ between the two ends.
Check for the ETH_APS_PATH_MISMATCH and ETH_APS_TYPE_MISMATCH alarms. If any of them is reported, handle the alarm.
Cause 2: The APS protection group is deactivated.
Check for the ETH_APS_LOST and ETH_APS_SWITCH_FAIL alarms. If any of them is reported, handle the alarm.
Cause 3: Fibers or cables are connected incorrectly.
Reconnect the fibers or cables.

Locating APS Faults - Handling Method
Cause 4: APS frames cannot be transmitted because hardware-related alarms occur on the board that carries the protection channel.
Check whether any hardware-related alarms (such as HARD_BAD, COMMUN_FAIL, and BUS_ERR) occur on the board that carries the protection channel. If yes, clear these alarms.
Cause 5: The system reports clock alarms.
Check whether the system reports clock alarms such as TR_LOC, SYNC_C_LOS, and LTI. If yes, clear these alarms.
Cause 6: The working tunnel or protection tunnel is faulty.
Check for tunnel-level alarms. If a tunnel reports a tunnel-level alarm, the tunnel is faulty. Troubleshoot the tunnel.

Common APS Alarms (1)
The ETH_APS_LOST is an alarm indicating that APS frames are lost.

Possible Causes
Cause 1: The opposite NE is not configured with APS protection.
Cause 2: The settings of the APS protection group differ between the two ends.
Cause 3: The APS protection group is deactivated.
Cause 4: The service on the protection channel is interrupted.

Handling Procedure
Cause 1: The opposite NE is not configured with APS protection.
On the NMS, check whether the opposite NE is configured with APS protection. If the opposite NE is not configured with APS protection, create a matching APS protection group on the opposite NE and activate the APS protocol.
Cause 2: The settings of the APS protection group differ between the two ends.
On the NMS, check whether the settings of the APS protection group are the same at the two ends. If the settings differ between the two ends, change them to the same.
Cause 3: The APS protection group is deactivated.
Check whether the APS protocol is activated at both ends. If the APS protocol is deactivated at one end, deactivate the APS protocol at the other end and then activate the APS protocol at both ends.
Cause 4: The service on the protection channel is interrupted.
Check whether the protection channel reports an alarm related to signal loss or signal degrade, such as ETH_LOS. If yes, clear the alarm immediately.

Common APS Alarms (2)

The ETH_APS_SWITCH_FAIL is an alarm indicating a protection switching failure.

Possible Causes
Cause 1: The settings of the APS protection group differ between the two ends.

Handling Procedure
Cause 1: The settings of the APS protection group differ between the two ends.
On the NMS, check whether the settings of the APS protection group are the same at the two ends. If the settings differ between the two ends, change them to the same. Then, deactivate and activate the APS protection group at the two ends.

The ETH_APS_TYPE_MISMATCH is an alarm indicating a protection scheme mismatch.

Possible Causes
Cause 1: The switching type is different.
Cause 2: The switching mode is different.
Cause 3: The revertive mode is different.

Handling Procedure
Cause: The switching type, switching mode, or revertive mode of the protection group differs between the two ends.
On the NMS, check whether the settings of the APS protection group are the same at the two ends. If the settings differ between the two ends, change them to the same. Then, deactivate and activate the APS protection group at the two ends.
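The comparison of APS settings at the two ends can be sketched as follows. This is a minimal illustration only: the attribute names, the dictionary layout, and the sample values are assumptions, and the settings would in practice be read from the NMS.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute names; the slides name only these three settings.
APS_ATTRIBUTES = ("switching_type", "switching_mode", "revertive_mode")


def find_aps_mismatches(local: Dict[str, str],
                        remote: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (attribute, local value, remote value) for each setting that differs."""
    mismatches = []
    for attr in APS_ATTRIBUTES:
        local_value = local.get(attr, "<unset>")
        remote_value = remote.get(attr, "<unset>")
        if local_value != remote_value:
            mismatches.append((attr, local_value, remote_value))
    return mismatches


# Sample values are made up for the example.
local_end = {"switching_type": "1+1", "switching_mode": "dual-ended",
             "revertive_mode": "revertive"}
remote_end = {"switching_type": "1:1", "switching_mode": "dual-ended",
              "revertive_mode": "non-revertive"}

for attr, here, there in find_aps_mismatches(local_end, remote_end):
    print(f"{attr}: local={here} remote={there}")
```

Any attribute reported by the sketch would be changed to the same value at both ends before the APS protection group is deactivated and activated again.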

Locating ETH LAG Faults - Common Locating Process

Locating ETH LAG Faults - Common Symptoms

Symptom: The LAG is invalid, all the member ports cannot be used, and the services are interrupted.
Alarm Reported: LAG_DOWN

Symptom: The member ports in the LAG cannot be used, and the service has packet loss.
Alarm Reported: LAG_MEMBER_DOWN, LOOP_ALM, ETH_LOS, ETH_LINK_DOWN

Locating ETH LAG Faults - Common Causes

Cause 1: The NEs at the two ends of the LAG are incorrectly configured.
Cause 2: The working mode of the member ports in the LAG is set to half-duplex.
Cause 3: The loopback is configured on the member ports in the LAG.
Cause 4: The connections of the member ports in the LAG are faulty or lost.

Locating ETH LAG Faults - Handling Method

Cause 1: The NEs at the two ends of the LAG are incorrectly configured.
(1) Query current alarms and check whether the LAG_DOWN or LAG_MEMBER_DOWN alarm exists.
(2) Check whether the configurations of the NEs at the two ends of the LAG are consistent. If the configurations are inconsistent, make them the same, and then check whether the alarm is cleared.

Cause 2: The working mode of the member ports in the LAG is set to half-duplex.
Check whether the working mode of each member port in the LAG is set to half-duplex. If the working mode is set to half-duplex, modify the working mode of each port to full-duplex.

Cause 3: The loopback is configured on the member ports in the LAG.
(1) Check whether the LOOP_ALM alarm exists on each member port in the LAG. If yes, release the loopback on each port to clear the LOOP_ALM alarm.
(2) Check whether the ETH_EFM_LOOPBACK alarm exists on each member port in the LAG. If yes, release the remote loopback to clear the ETH_EFM_LOOPBACK alarm.

Cause 4: The connections of the member ports in the LAG are faulty or lost.
Check whether the ETH_LOS or ETH_LINK_DOWN alarm exists on each member port in the LAG. If yes, clear the ETH_LOS or ETH_LINK_DOWN alarm.
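The alarm checks in this handling method amount to a lookup from alarm name to next action. The sketch below illustrates that idea; the alarm names come from the slide, while the function, the port names, and the data layout are hypothetical. The half-duplex check (Cause 2) is a configuration check rather than an alarm and is therefore not covered.

```python
from typing import Dict, List

# Alarm names are from the handling method; the action wording is abridged.
LAG_ACTIONS: Dict[str, str] = {
    "LAG_DOWN": "Check that the NE configurations at the two ends of the LAG are consistent.",
    "LAG_MEMBER_DOWN": "Check that the NE configurations at the two ends of the LAG are consistent.",
    "LOOP_ALM": "Release the loopback on the port.",
    "ETH_EFM_LOOPBACK": "Release the remote loopback.",
    "ETH_LOS": "Check the port connection and clear the alarm.",
    "ETH_LINK_DOWN": "Check the port connection and clear the alarm.",
}


def triage_lag_alarms(port_alarms: Dict[str, List[str]]) -> List[str]:
    """Return one 'port: alarm -> action' line per LAG-related alarm, ordered by port."""
    findings = []
    for port in sorted(port_alarms):
        for alarm in port_alarms[port]:
            action = LAG_ACTIONS.get(alarm)
            if action is not None:  # alarms unrelated to the LAG are skipped
                findings.append(f"{port}: {alarm} -> {action}")
    return findings


# Hypothetical current alarms per member port.
current_alarms = {
    "port-2": ["ETH_LOS"],
    "port-1": ["LOOP_ALM", "TEMP_OVER"],
}

for line in triage_lag_alarms(current_alarms):
    print(line)
```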

Common ETH LAG Alarms (1)

The LAG_DOWN is an alarm indicating that the LAG is unavailable.

Possible Causes
Cause 1: The opposite NE is not configured with any LAGs.
Cause 2: All member ports in the LAG are unavailable.

Handling Procedure
Cause 1: The opposite NE is not configured with any LAGs.
On the NMS, check whether the opposite NE is configured with a LAG. If the opposite NE is not configured with a LAG, configure one on the opposite NE and check whether the alarm clears.

Cause 2: All member ports in the LAG are unavailable.
When a member port in the LAG is unavailable, the system generates an ETH_LOS, ETH_LINK_DOWN, or LAG_MEMBER_DOWN alarm. Handle and clear the alarm and activate the member port.

Common ETH LAG Alarms (2)

The LAG_MEMBER_DOWN is an alarm indicating that a member port of a LAG is unavailable.

Possible Causes
Cause 1: The port link is unavailable.
Cause 2: The port receives no LACP packet.
Cause 3: The port works in half-duplex mode.
Cause 4: The port is looped back.

Handling Procedure
Cause 1: The port link is unavailable.
On the NMS, check whether the port in the LAG is enabled. If the port is not enabled, enable the port in the LAG and check whether the alarm clears. If the alarm persists, check whether an ETH_AUTO_LINK_DOWN alarm occurs on the port that reports the LAG_MEMBER_DOWN alarm. If yes, clear the LAG_MEMBER_DOWN alarm.

Cause 2: The port receives no LACP packet.
On the NMS, check whether the opposite port is added to the LAG. If the opposite port is not added to the LAG, add the opposite port to the LAG and check whether the alarm clears. If the alarm persists, check whether an ETH_LOS or FLOW_OVER alarm occurs on the port that reports the LAG_MEMBER_DOWN alarm. If yes, clear the LAG_MEMBER_DOWN alarm.

Cause 3: The port works in half-duplex mode.
Change the working mode of the port to auto-negotiation or full-duplex.

Cause 4: The port is looped back.
Release the loopback on the port.


Locating Clock Faults - Common Symptoms and Causes

Fault Symptoms
The service has bit errors or is interrupted.
The system control, cross-connect, and timing board reports an EXT_SYNC_LOS/LTI/S1_SYN_CHANGE/SYNC_C_LOS/SYNC_DISABLE alarm.

Possible Causes
Cause 1: The priority of the synchronous clock source on the service board is absent from the priority list.
Cause 2: The synchronous clock source is lost and the clock of the NE works improperly.
Cause 3: The clock source is switched in SSM mode and the clock source traced by the NE is also switched.
Cause 4: The signals of the synchronous clock source are degraded.
Cause 5: The external clock source is lost.
Cause 6: The settings of clock tracing are incorrect.

Locating Clock Faults - Handling Method

Handling Procedure
Cause 1: The priority of the synchronous clock source on the service board is absent from the priority list.
Check for the SYNC_C_LOS alarm. If the SYNC_C_LOS alarm occurs, clear the SYNC_C_LOS alarm.

Cause 2: The synchronous clock source is lost and the clock of the NE works improperly.
Check for the LTI alarm. If the LTI alarm occurs, clear the LTI alarm.

Cause 3: The clock source is switched in SSM mode and the clock source traced by the NE is also switched.
Check for the S1_SYN_CHANGE alarm. If the S1_SYN_CHANGE alarm occurs, clear the S1_SYN_CHANGE alarm.

Cause 4: The signals of the synchronous clock source are degraded.
Select a different clock source (by performing a clock source switchover or re-configuring the clock source priority list) and find out the causes of signal degrade along the clock tracing path.

Cause 5: The external clock source is lost.
Check for the EXT_SYNC_LOS alarm. If the EXT_SYNC_LOS alarm occurs, clear the EXT_SYNC_LOS alarm.

Cause 6: The settings of clock tracing are incorrect.
Set clock tracing again according to network planning information.

Common Clock Alarms (1)

The EXT_SYNC_LOS is an alarm indicating the loss of the external clock source.

Possible Causes
Cause 1: The external clock source is configured in the clock source priority list, but the external clock source cannot be detected or becomes invalid.

Handling Procedure
Cause 1: The external clock source is configured in the clock source priority list, but the external clock source cannot be detected or becomes invalid.
Check whether the equipment that provides the external clock source is faulty, and check whether the cable that connects the external clock source is normal.

Common Clock Alarms (2)

The LTI is an alarm indicating that the synchronous clock source is lost.

Possible Causes
Cause 1: The clock configuration is incorrect.
Cause 2: All the clock sources in the clock source priority list fail.

Handling Procedure
Cause 1: The clock configuration is incorrect.
Query the clock synchronization status and check whether the data in the clock source priority list meets the network planning requirement.

Cause 2: All the clock sources in the clock source priority list fail.
Troubleshoot the synchronization sources based on the clock source priority list. If the synchronization source is an external clock, handle the EXT_SYNC_LOS alarm; if the synchronization source is a line clock, handle the alarm that occurs on the line board; if the synchronization source is a tributary clock, handle the alarm that occurs on the tributary board; if the synchronization source is an IF clock, handle the alarm that occurs on the IF board; if the synchronization source is an Ethernet clock, handle the alarm that occurs on the Ethernet board.
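The per-source handling for Cause 2 is a dispatch on the type of each synchronization source in the priority list. A minimal sketch follows, assuming the priority list is available as (source name, source type) pairs; the type keys, source names, and function name are illustrative and not part of any NMS interface.

```python
from typing import Dict, List, Tuple

# Source types and actions follow the handling procedure for the LTI alarm.
SOURCE_HANDLING: Dict[str, str] = {
    "external": "Handle the EXT_SYNC_LOS alarm.",
    "line": "Handle the alarm that occurs on the line board.",
    "tributary": "Handle the alarm that occurs on the tributary board.",
    "if": "Handle the alarm that occurs on the IF board.",
    "ethernet": "Handle the alarm that occurs on the Ethernet board.",
}


def lti_actions(priority_list: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Walk the clock source priority list in order and pair each source with its action."""
    actions = []
    for source_name, source_type in priority_list:
        if source_type not in SOURCE_HANDLING:
            raise ValueError(f"unknown clock source type: {source_type}")
        actions.append((source_name, SOURCE_HANDLING[source_type]))
    return actions


# Hypothetical priority list, highest priority first.
priority_list = [("ext-clock-1", "external"), ("IF-slot3", "if")]

for name, action in lti_actions(priority_list):
    print(f"{name}: {action}")
```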

Common Clock Alarms (3)

The S1_SYN_CHANGE is an alarm indicating that the clock source is switched in SSM or extended SSM mode.

Possible Causes
Cause 1: The original clock source is lost when the SSM protocol or extended SSM protocol is enabled.

Handling Procedure
Cause 1: The original clock source is lost when the SSM protocol or extended SSM protocol is enabled.
Handle the SYNC_C_LOS alarm that is related to the original clock source.

Common Clock Alarms (4)

The SYNC_C_LOS is an alarm indicating that the synchronization source is lost.

Possible Causes
Cause 1: The clock source is lost.

Handling Procedure
Cause 1: The clock source is lost.
Based on the clock source priority list, determine the synchronization source corresponding to the lost clock source.

Common Clock Alarms (5)

The SYNC_DISABLE is an alarm indicating that the automatic synchronization of SCC boards is disabled.

Possible Causes
Cause 1: The status of the automatic synchronization of SCC boards changes from enabled to disabled.

Handling Procedure
Cause 1: The status of the automatic synchronization of SCC boards changes from enabled to disabled.
Change the status of the automatic synchronization of SCC boards from disabled to enabled. Then, check whether the alarm clears. If the alarm persists, replace the board that reports the alarm.


Locating Inband DCN Faults - Common Locating Process

Locating Inband DCN Faults - Common Symptoms and Causes

Common Symptoms
The communication between the NMS and the NE is interrupted. The NE icon on the NMS is gray, and the NE is unreachable to the NMS.
Operations on the NMS receive no response. If the response interruption lasts for more than two minutes, the communication between the NMS and the NE is interrupted.
When you query certain information on the NMS, the query result contains incomplete information.

Common Causes
Cause 1: On a network, the NE IDs, NE IP addresses or subnet masks conflict, or parameter settings for the interconnected ports are inconsistent.
Cause 2: The inband DCN port of the faulty NE is not enabled.
Cause 3: The physical connection between the faulty NE and the NMS is interrupted.
Cause 4: The received signals of the faulty NE are lost, or the received optical power is excessively low.
Cause 5: A certain board is faulty, and therefore the DCN packets cannot be extracted.
Cause 6: A DCN storm or DCN interruption occurs as the third-party network that the DCN packets traverse is faulty.
Cause 7: The bandwidth configured for the inband DCN channel is excessively small.
Cause 8: The SCC board on the faulty NE is being reset or switched, and therefore the NE cannot respond to the inband DCN packets.

Locating Inband DCN Faults - Handling Method

Cause 1: On a network, the NE IDs, NE IP addresses or subnet masks conflict, or parameter settings for the interconnected ports are inconsistent.
It is usually caused by the new NE on the network. According to the NE plan table, check whether the NE ID, NE IP address and subnet mask of the new NE are correctly configured. If any parameters are incorrect or conflict with the configuration of another NE, re-configure these parameters.

Cause 2: The inband DCN port of the faulty NE is not enabled.
(1) Check whether the ports, which support the DCN function by default, are connected to fibers or cables. If the fibers or cables are not connected to the ports whose DCN function is enabled by default, change the present port to a port whose DCN function is enabled by default.
(2) Check whether the ports at the two ends of the link are enabled. If not, enable the inband DCN for the ports.
(3) Check whether the configurations of the ports at the two ends are consistent, such as the working mode of the Ethernet port. If inconsistent, modify the configurations to match each other.

Cause 3: The physical connection between the faulty NE and the NMS is interrupted.
Check whether the network cables or fibers of the faulty NE are disconnected from the ports. If the network cables or fibers are disconnected from the ports, insert the network cables or fibers again.

Cause 4: The received signals of the faulty NE are lost, or the received optical power is excessively low.
Check whether the R_LOS, ETH_LOS, or IN_PWR_ABN alarm exists on the board configured with the inband DCN channel. If the alarm exists, clear it.

Cause 5: A certain board is faulty, and therefore the DCN packets cannot be extracted.
Check whether the HARD_BAD or TEMP_OVER alarm exists on the board configured with the inband DCN channel. If the alarm exists, replace the board that reports the alarm.
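The check of a new NE against the NE plan table (Cause 1) can be sketched as a duplicate search. The example below is only an illustration: the plan table layout, the NE names, and the addresses are assumptions, and it covers duplicate NE IDs and duplicate NE IP addresses only, not subnet mask planning.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List


def find_plan_conflicts(plan: List[Dict]) -> List[str]:
    """Report NE IDs and NE IP addresses that appear more than once in the plan."""
    by_id = defaultdict(list)
    by_ip = defaultdict(list)
    for ne in plan:
        by_id[ne["ne_id"]].append(ne["name"])
        # ip_address() normalizes the text form and rejects malformed addresses.
        by_ip[ipaddress.ip_address(ne["ip"])].append(ne["name"])

    conflicts = []
    for ne_id, names in sorted(by_id.items()):
        if len(names) > 1:
            conflicts.append(f"NE ID {ne_id} is used by {', '.join(names)}")
    for ip, names in sorted(by_ip.items()):
        if len(names) > 1:
            conflicts.append(f"IP address {ip} is used by {', '.join(names)}")
    return conflicts


# Hypothetical plan table; NE3 is the newly added NE.
plan_table = [
    {"name": "NE1", "ne_id": 1, "ip": "129.9.0.1"},
    {"name": "NE2", "ne_id": 2, "ip": "129.9.0.2"},
    {"name": "NE3", "ne_id": 2, "ip": "129.9.0.1"},
]

for conflict in find_plan_conflicts(plan_table):
    print(conflict)
```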

Locating Inband DCN Faults - Handling Method (Continued)

Cause 6: A DCN storm or DCN interruption occurs as the third-party network that the DCN packets traverse is faulty.
If the DCN packets traverse a third-party network, check whether a port loop or physical link interruption occurs in the third-party network. If yes, rectify the faults in the third-party network first.

Cause 7: The bandwidth configured for the inband DCN channel is excessively small.
(1) When the number of services configured on the port exceeds a certain number, a network congestion may occur. If a network congestion occurs, part of the query information may be lost. In this case, you should properly increase the bandwidth configured for the inband DCN channel.
(2) If a DCN gateway manages a large number of NEs, change the position of the DCN gateway and the number of NEs that the DCN gateway manages, especially during package loading. Generally, a DCN gateway manages a maximum of 64 NEs.

Cause 8: The SCC board on the faulty NE is being reset or switched, and therefore the inband DCN packets cannot be responded.
(1) Observe whether the PROG indicator on the SCC board is blinking green. If the indicator is blinking green, it indicates that the SCC board is in the reset state. After the PROG indicator is steady on (green), the reset of the SCC board is complete and the DCN connection is automatically recovered.
(2) If the DCN connection is not recovered, check whether a protection switchover occurs on a board. A protection switchover on a board will reroute DCN packets.
(3) If a protection switchover occurs on a board, the DCN connection is automatically recovered after rerouting is complete.
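The guideline that a DCN gateway generally manages a maximum of 64 NEs can be checked with a simple count. The sketch below assumes a mapping from NE name to gateway name is available; the names and the function are hypothetical.

```python
from collections import Counter
from typing import Dict

# From the slide: "Generally, a DCN gateway manages a maximum of 64 NEs."
MAX_NES_PER_GATEWAY = 64


def overloaded_gateways(ne_to_gateway: Dict[str, str],
                        limit: int = MAX_NES_PER_GATEWAY) -> Dict[str, int]:
    """Return each gateway that manages more NEs than the limit, with its NE count."""
    counts = Counter(ne_to_gateway.values())
    return {gateway: count for gateway, count in counts.items() if count > limit}


# Hypothetical network: 70 NEs behind GW-A and 10 NEs behind GW-B.
ne_to_gateway = {f"NE{i}": "GW-A" for i in range(70)}
ne_to_gateway.update({f"NE{i}": "GW-B" for i in range(70, 80)})

print(overloaded_gateways(ne_to_gateway))
```

A gateway listed by the sketch is a candidate for moving the DCN gateway or reducing the number of NEs it manages.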


Locating NE Resets - Fault Symptoms and Possible Causes

Fault Symptoms
A cold reset or warm reset occurs on an NE.

Possible Causes
Cause 1: A manual operation causes the reset.
Cause 2: The power supply of the NE is abnormal.
Cause 3: A certain board is faulty.
Cause 4: Certain tasks have high CPU usage.
Cause 5: Other reasons cause the reset.

Locating NE Resets - Handling Method

Handling Procedure
Cause 1: A manual operation causes the reset.
Check operation records on the NMS and oplog/errlog records on the NE.

Cause 2: The power supply of the NE is abnormal.
Check for a low-voltage reset record or an exception record among errlog records on the NE and records in the black box, check whether the voltage of the power supply is stable, and check whether the environment causes abnormal power supply.

Cause 3: A certain board is faulty.
Replace the faulty board.

Cause 4: Certain tasks have high CPU usage.
Check whether the current network has a large scale and whether the number of routes is far greater than the recommended value.

Cause 5: Other reasons cause the reset.
Collect the current and historical alarms on the NMS and the NE, oplog records, errlog records, debugbuf records, dopra records, records in the black box, and other information required for fault locating, and send all the information to Huawei engineers.

Locating Package Loading Failures - Fault Symptoms and Possible Causes

Fault Symptoms
Package loading fails.

Possible Causes
Cause 1: The NE is abnormal in the process of software loading.
Cause 2: Backing up databases fails.
Cause 3: A rollback occurs due to a failure in package downloading.
Cause 4: A rollback occurs due to an error in the software activation process.
Cause 5: The upgrade task is not rolled back when an error occurs in the software activation process.
Cause 6: The SWDL_INPROCESS alarm persists after the upgrade is complete.
Cause 7: User interfaces stop responding in the upgrade process.
Cause 8: A board is reseated in the upgrade process.
Cause 9: The PCBs of the active and standby SCC boards are of different versions.

. HUAWEI TECHNOLOGIES CO.Possible Causes Possible Causes     Cause 10: The NE is in the Undispensed state when an upgrade task is being created. Cause 13: No CF card is installed on the SCC board or the memory in the CF card is insufficient. Cause 11: The NE is in the Unactivated state when an upgrade task is being created. Huawei Confidential Page 113 .  Cause 14: Other reasons cause the failure. LTD. Cause 12: The NE is in the Uncommitted state when an upgrade task is being created.Locating Package Loading Failures .

Locating Package Loading Failures - Handling Method

Handling Procedure
Cause 1: The NE is abnormal in the process of software loading.
Load software 10 minutes later because the NE is in an unstable state.

Cause 2: Backing up databases fails.
Upload the NE databases to the U2000 again. If the backing-up fails again, perform a warm reset on the NE. Then, create another upgrade task and run the task.

Cause 3: A rollback occurs due to a failure in package downloading.
Check whether DCN communication is normal and whether bandwidth is sufficient. If no fault is found, check whether a correct software package is downloaded. If the downloaded software package is correct, check whether the remaining space on the flash memory is greater than the space required by the software package. If the remaining space on the flash memory is sufficient, change the gateway NE.

Cause 4: A rollback occurs due to an error in the software activation process.
Check whether any board is removed or whether the NE is manually reset during the upgrade, and activate the software again.

Locating Package Loading Failures - Handling Method

Handling Procedure
Cause 5: The upgrade task is not rolled back when an error occurs in the software activation process.
Select the task and click Ignore to commit the task. Then, check the version of each board. For boards whose version information is not updated, perform a cold reset on them.

Cause 6: The SWDL_INPROCESS alarm persists after the upgrade is complete.
Check whether the NE is in a normal state. If yes, perform a warm reset on the NE. If a resetting command cannot be issued, perform a warm reset on the SCC board if the NE has only one SCC board, or perform active/standby switching between SCC boards if the NE has two SCC boards.

Cause 7: User interfaces stop responding in the upgrade process.
Restart the tool and create a new task that runs directly from the NE state, which is displayed when the task is originally created.

Locating Package Loading Failures - Handling Method

Handling Procedure
Cause 8: A board is reseated in the upgrade process.
Remove the board, and then insert the board after the NE enters the normal state. If automatic matching still fails, check whether an SWDL_INPROCESS alarm occurs. If yes, clear the SWDL_INPROCESS alarm first.

Cause 9: The PCBs of the active and standby SCC boards are of different versions.
Run the :mon-get-dump:bid,"SWDL.ISWDL.CSWDL","" command on the Navigator and check whether the returned values of m_byPCB for the active and standby SCC boards are the same. If the returned values of m_byPCB for the two SCC boards are different, replace any SCC board to ensure that the SCC boards use the same PCB.

Cause 10: The NE is in the Undispensed state when an upgrade task is being created.
Skip Load Package and create a task from the Dispense state, or enter the :swdl-dnldswmem command on the Navigator.

Locating Package Loading Failures - Handling Method

Handling Procedure
Cause 11: The NE is in the Unactivated state when an upgrade task is being created.
Skip Load Package and Dispense and create a task from the Active state, or enter the :mon-init-sys:0.swdl command on the Navigator. (An activation operation will interrupt services. Therefore, check whether an activation operation is allowed.)

Cause 12: The NE is in the Uncommitted state when an upgrade task is being created.
Skip Load Package, Dispense, and Active, and create a task from the Commit state, or enter the :swdl-commit-swmem command on the Navigator.

Cause 13: No CF card is installed on the SCC board or the memory in the CF card is insufficient.
If no CF card is installed on the SCC board, install a CF card; if the memory in the CF card is insufficient, delete unnecessary files in the CF card.

Cause 14: Other reasons cause the failure.
Collect data and send the data to Huawei engineers.
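For Causes 10 to 12, the NE state determines which upgrade steps are skipped and from which step the new task is created. The mapping can be sketched as below. The state and step names are taken from the slides; the function is hypothetical, and starting from Load Package for any other state is an assumption of this sketch.

```python
from typing import List, Tuple

# Upgrade steps in the order used by the slides.
UPGRADE_STEPS = ["Load Package", "Dispense", "Active", "Commit"]

# NE state -> step from which the new upgrade task is created.
START_STEP_BY_STATE = {
    "Undispensed": "Dispense",
    "Unactivated": "Active",
    "Uncommitted": "Commit",
}


def plan_upgrade_task(ne_state: str) -> Tuple[List[str], List[str]]:
    """Return (steps to skip, steps to run) for an NE in the given state."""
    # Assumption: any state not listed above starts from the first step.
    start = START_STEP_BY_STATE.get(ne_state, UPGRADE_STEPS[0])
    index = UPGRADE_STEPS.index(start)
    return UPGRADE_STEPS[:index], UPGRADE_STEPS[index:]


skipped, to_run = plan_upgrade_task("Unactivated")
print("skip:", skipped)
print("run:", to_run)
```

Because an activation operation interrupts services, a plan that still contains the Active step would be run only after confirming that activation is allowed.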

Common Alarms

The CFCARD_FAILED is an alarm indicating that the operation on the CF card fails.

Possible Causes
Cause 1: The CF card is faulty, resulting in an initialization failure.
Cause 2: The SCC board is faulty, resulting in a failure to create a CF file.

Handling Procedure
Cause 1: The CF card is faulty, resulting in an initialization failure.
Replace the CF card and check whether the alarm is cleared.

Cause 2: The SCC board is faulty, resulting in a failure to create a CF file.
Check whether the HARD_BAD alarm exists on the SCC board. If yes, perform a cold reset on the SCC board. Then, check whether the alarm is cleared. If the alarm persists, replace the SCC board.

Common Alarms

The CFCARD_OFFLINE is an alarm indicating that the CF card is offline.

Possible Causes
Cause 1: The CF card is not installed.
Cause 2: The CF card is in poor contact with the SCC board.
Cause 3: The SCC board is faulty.

Handling Procedure
Cause 1: The CF card is not installed.
Check whether the CF card is installed on the SCC board. If not, install a CF card.

Cause 2: The CF card is in poor contact with the SCC board.
Check whether the CF card is loosened. If yes, re-install the CF card. Then, check whether the alarm is cleared. If the alarm persists, replace the CF card.

Cause 3: The SCC board is faulty.
Check whether the HARD_BAD alarm exists on the SCC board. If yes, perform a cold reset on the SCC board. Then, check whether the alarm is cleared. If the alarm persists, replace the SCC board.


Handling Common Alarms (1)

The AM_DOWNSHIFT is an alarm indicating the downshift of the AM scheme.

Possible Causes
Cause 1: The external factors (for example, the climate) cause the degradation of the working channels.
Cause 2: There are interferences around the working channels.
Cause 3: The ODU at the transmit end has abnormal transmit power.
Cause 4: The ODU at the receive end has abnormal receive power.
Cause 5: Multi-path fading occurs due to atmospheric and ground effects.

Handling Procedure
Cause 1: The external factors (for example, the climate) cause the degradation of the working channels.
When the external factors (for example, the climate) cause the degradation of the working channels, the downshift of the AM scheme is normal. Therefore, no measure should be taken to handle the alarm.

Cause 2: There are interferences around the working channels.
Eliminate the interferences around the working channels.

Cause 3: The ODU at the transmit end has abnormal transmit power.
Use the NMS to check whether the transmit power of the ODU at the transmit end is normal. For details on troubleshooting at the transmit end, see Locating Link Faults.

Cause 4: The ODU at the receive end has abnormal receive power.
Use the NMS to check whether the receive power of the ODU at the receive end is normal. For details on troubleshooting at the receive end, see Locating Link Faults.

Cause 5: Multi-path fading occurs due to atmospheric and ground effects.
Adjust the elevation angles of the antennas, use large-diameter antennas, and re-plan transmission links to avoid areas with severe multi-path fading.

Handling Common Alarms (2)

The BD_STATUS is an alarm indicating that the board cannot be detected.

Possible Causes
Cause 1 of the alarm reported by a board of the IDU: The board is installed in an incorrect slot.
Cause 2 of the alarm reported by a board of the IDU: The board and the backplane are not connected properly.
Cause 3 of the alarm reported by a board of the IDU: The slot is faulty.
Cause 4 of the alarm reported by a board of the IDU: The alarmed board is faulty.
Cause 5: The ODU is faulty, the power that the IF board supplies to the ODU is abnormal, the IF cable is damaged or is not properly connected.

Handling Procedure
Cause 1: The board is installed in an incorrect slot.
Check whether the physical slot and logical slot of the alarmed board are the same.

Cause 2: The board and the backplane are not connected properly.
Re-install the alarmed board.

Cause 3: The slot is faulty.
Check whether the slot has broken or bent pins. If yes, insert the board in a vacant slot.

Cause 4: The alarmed board is faulty.
Replace the board.

Cause 5: The ODU is faulty, the power that the IF board supplies to the ODU is abnormal, the IF cable is damaged or is not properly connected.
Replace the ODU that reports the alarm, check the voltage at the RF port on the IF board, check whether the IF cable is wet or abnormal, and re-connect the IF cable.

Handling Common Alarms (3)

The BUS_ERR is an alarm of bus errors.

Possible Causes
Cause 1: The board and the backplane are not connected properly.
Cause 2: The alarmed board is faulty.
Cause 3: The inter-board bus is faulty.

Handling Procedure
Cause 1: The board and the backplane are not connected properly.
Re-install the alarmed board, and check whether the backplane has broken or bent pins. If the backplane has broken or bent pins, insert the board in a vacant slot or replace the backplane.

Cause 2: The alarmed board is faulty.
Perform a cold reset on the alarmed board. If the alarm persists, replace the alarmed board.

Cause 3: The inter-board bus is faulty.
On the NMS, check whether an alarm indicating loss/deterioration of a clock source is reported. If yes, clear clock alarms and then check whether the BUS_ERR alarm clears. If the alarm still persists, perform a cold reset on the SCC board.

Handling Common Alarms (4)

The COMMUN_FAIL is an alarm indicating an inter-board communication failure.

Possible Causes
Cause 1: The alarmed board is reset.
Cause 2: The board and the backplane are not connected properly.
Cause 3: The alarmed board is faulty.
Cause 4: The slot is faulty.
Cause 5: The SCC board is faulty.

Handling Procedure
Cause 1: The alarmed board is reset.
Perform a reset on the alarmed board, and the alarm disappears automatically.

Cause 2: The board and the backplane are not connected properly.
Re-install the alarmed board. Then, check whether the backplane has broken or bent pins. If the backplane has broken or bent pins, insert the board in a vacant slot or replace the backplane.

Cause 3: The alarmed board is faulty.
Replace the alarmed board.

Cause 4: The slot is faulty.
Check whether the slot has broken or bent pins. If yes, insert the board in a vacant slot.

Cause 5: The SCC board is faulty.
Perform a cold reset on the SCC board. Then, check whether the alarm clears. If the alarm persists, replace the SCC board.

Handling Common Alarms (5)

The FAN_FAIL is an alarm indicating that a fan is faulty.

Possible Causes
Cause 1: The alarmed board and the backplane are not connected properly.
Cause 2: A fan is faulty.

Handling Procedure
Cause 1: The alarmed board and the backplane are not connected properly.
Re-install the alarmed board, and check whether the backplane has broken or bent pins. If the backplane has broken or bent pins, insert the board in a vacant slot or replace the backplane.

Cause 2: A fan is faulty.
Remove the fan board and clean the fans. Then, install the fan board and check whether the alarm clears. If the alarm persists, replace the fan board.

Handling Common Alarms (6)
The HARD_BAD is an alarm indicating that the hardware is faulty.

Possible Causes
Cause 1: The external power supply fails.
Cause 2: The alarmed board and the backplane are not connected properly.
Cause 3: The alarmed board has hardware errors.
Cause 4: The SCC board is faulty.

Handling Procedure
Cause 1: The external power supply fails. Check the external power supply.
Cause 2: The alarmed board and the backplane are not connected properly. Re-install the alarmed board and check whether the backplane has broken or bent pins. If the backplane has broken or bent pins, insert the board in a vacant slot or replace the backplane.
Cause 3: The alarmed board has hardware errors. Perform a cold reset on the alarmed board and check whether the alarm clears. If the alarm persists, replace the alarmed board.
Cause 4: The SCC board is faulty. Perform a cold reset on the SCC board. Then, check whether the alarm clears. If the alarm persists, replace the SCC board.

Handling Common Alarms (7)
The IF_CABLE_OPEN is an alarm indicating that the IF cable is open.

Possible Causes
Cause 1: The IF cable is loose or faulty.
Cause 2: The IF port on the IF board is damaged.
Cause 3: The power module of the ODU is faulty.

Handling Procedure
Cause 1: The IF cable is loose or faulty. Check whether the connector of the IF cable is damaged/wet/corroded/loose or whether the connector is made properly. (The connectors to be checked include the connector between the IF pigtail and the IF board, the connector between the IF pigtail and the IF cable, and the connector between the IF cable and the ODU.)
Cause 2: The IF port on the IF board is damaged. Replace the alarmed IF board.
Cause 3: The power module of the ODU is faulty. Replace the ODU connected to the alarmed IF board.

Handling Common Alarms (8)
The IF_INPWR_ABN is an alarm indicating that the power supplied by an IF board to an ODU is abnormal.

Possible Causes
Cause 1: The IF board is faulty.
Cause 2: The IF cable is faulty.
Cause 3: The ODU is faulty.

Handling Procedure
Cause 1: The IF board is faulty. Replace the alarmed IF board.
Cause 2: The IF cable is faulty. Check whether the connector of the IF cable is damaged/wet/corroded/loose or whether the connector is made properly. (The connectors to be checked include the connector between the IF pigtail and the IF board, the connector between the IF pigtail and the IF cable, and the connector between the IF cable and the ODU.)
Cause 3: The ODU is faulty. Perform a cold reset on the ODU and check whether the alarm clears. If the alarm persists, replace the ODU.

Handling Common Alarms (9)
The MW_CFG_MISMATCH is an alarm indicating a configuration mismatch on microwave links.

Possible Causes
Cause 1: The number of E1 signals is different on both ends of a microwave link (including the number of E1 signals on the active page and the number of E1 signals on the standby page).
Cause 2: The AM enabling is different on both ends of a microwave link.
Cause 3: The IEEE 1588 overhead enabling is different on both ends of a microwave link.
Cause 4: The modulation mode is different on both ends of a microwave link.
Cause 5: The channel spacing is different on both ends of a microwave link.

Handling Procedure
The same procedure applies to Causes 1 to 5. Determine the possible cause of the alarm according to the alarm parameters. Then, check the configuration on both ends of the microwave link. Ensure that the configuration is the same on both ends of the microwave link.
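The comparison of the two link ends can be sketched in Python. This is only an illustration and not an NMS or NE function: the parameter names (`e1_count`, `am_enabled`, and so on) and the sample values are hypothetical and were chosen to mirror the five causes of the MW_CFG_MISMATCH alarm.

```python
# Hypothetical parameter names, one per cause of MW_CFG_MISMATCH.
CHECKED_PARAMETERS = (
    "e1_count",             # Cause 1: number of E1 signals
    "am_enabled",           # Cause 2: AM enabling
    "ieee1588_overhead",    # Cause 3: IEEE 1588 overhead enabling
    "modulation_mode",      # Cause 4: modulation mode
    "channel_spacing_mhz",  # Cause 5: channel spacing
)


def find_mismatches(local, remote):
    """Return {parameter: (local_value, remote_value)} for each differing parameter."""
    return {
        name: (local.get(name), remote.get(name))
        for name in CHECKED_PARAMETERS
        if local.get(name) != remote.get(name)
    }


# Sample (hypothetical) configurations of the two ends of one microwave link.
local_end = {
    "e1_count": 16,
    "am_enabled": True,
    "ieee1588_overhead": False,
    "modulation_mode": "16QAM",
    "channel_spacing_mhz": 28,
}
remote_end = dict(local_end, e1_count=8, modulation_mode="QPSK")

for name, (local_value, remote_value) in find_mismatches(local_end, remote_end).items():
    print(f"{name}: local={local_value!r}, remote={remote_value!r}")
```

An empty result means that the checked parameters are the same on both ends, which is the condition the handling procedure asks for.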

Handling Common Alarms (10)
The MW_LOF is an alarm indicating that microwave frames are lost.

Possible Causes
Cause 1: Certain other alarms occur.
Cause 2: The IF working mode or channel spacing at both ends of a microwave link does not match the preset modulation mode.
Cause 3: The operating frequency of the ODU at the local site is inconsistent with the operating frequency of the ODU at the opposite site.
Cause 4: An IF/RF transmit/receive channel is faulty.
Cause 5: An interference event occurs, resulting in abnormal receive power.

Handling Procedure
Cause 1: Certain other alarms occur. Check for HARD_BAD, VOLT_LOS, IF_CABLE_OPEN, BD_STATUS, RADIO_RSL_LOW, CONFIG_NOSUPPORT, and TEMP_ALARM alarms. If any of these alarms are reported, clear them immediately.
Cause 2: The IF working mode or channel spacing at both ends of a microwave link does not match the preset modulation mode. Modify the settings of IF parameters according to network planning requirements to ensure a match with the preset modulation mode.
Cause 3: The operating frequency of the ODU at the local site is inconsistent with the operating frequency of the ODU at the opposite site. Set the transmit frequency of the local site to the same value as the receive frequency of the opposite site. In addition, set the receive frequency of the local site to the same value as the transmit frequency of the opposite site.
Cause 4: An IF/RF transmit/receive channel is faulty. Perform loopbacks section by section to check whether the ODU/IF transmit/receive channel is faulty. If a fault is found, replace the ODU/IF board.
Cause 5: An interference event occurs, resulting in abnormal receive power. Eliminate the interference source. Then, ensure that the receive power of the ODU at both ends of the microwave link meets the planned value.
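The frequency rule for Cause 3 of the MW_LOF alarm is a crossed comparison: the local transmit frequency is checked against the opposite receive frequency, and the local receive frequency against the opposite transmit frequency. A minimal Python sketch follows. The dictionary keys and the frequency values are hypothetical. Frequencies are kept as integers in kHz so that the comparison is exact.

```python
def frequency_mismatches(local, opposite):
    """Return a list of problems found in the Tx/Rx frequency pairing of one link."""
    problems = []
    if local["tx_khz"] != opposite["rx_khz"]:
        problems.append("local Tx frequency differs from opposite Rx frequency")
    if local["rx_khz"] != opposite["tx_khz"]:
        problems.append("local Rx frequency differs from opposite Tx frequency")
    return problems


# Hypothetical planned values for a correctly paired link.
local_site = {"tx_khz": 14_500_000, "rx_khz": 14_920_000}
opposite_site = {"tx_khz": 14_920_000, "rx_khz": 14_500_000}
print(frequency_mismatches(local_site, opposite_site))  # no problems expected

# The opposite site receives on the wrong frequency.
wrong_opposite = {"tx_khz": 14_920_000, "rx_khz": 14_528_000}
print(frequency_mismatches(local_site, wrong_opposite))
```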

Handling Common Alarms (11)
The MW_LIM is an alarm indicating that a mismatched microwave link identifier is detected.

Possible Causes
Cause 1: The link ID of the local site does not match the link ID of the opposite site.
Cause 2: The services on other microwave links are received due to the incorrect configuration of the microwave link receive frequency at the local or opposite site.
Cause 3: The antenna receives the signals from the other sites, because the direction of the antenna is set incorrectly.
Cause 4: The polarization direction of the XPIC is incorrect.

Handling Procedure
Cause 1: The link ID of the local site does not match the link ID of the opposite site. Check whether the link ID of the local site matches the link ID of the opposite site. If not, set the link IDs of the two sites to the same value according to the requirements of the networking planning.
Cause 2: The services on other microwave links are received due to the incorrect configuration of the microwave link receive frequency at the local or opposite site. Check whether the receive and transmit frequencies of the local site are consistent with the receive and transmit frequencies of the opposite site. If not, set the receive and transmit frequencies of the two sites again.
Cause 3: The antenna receives the signals from the other sites, because the direction of the antenna is set incorrectly. Align the antennas at the two ends.
Cause 4: The polarization direction of the XPIC is incorrect. Check whether the configuration of XPIC work groups is correct. Ensure that the V-polarized XPIC IF boards at the two ends are interconnected through the V-polarized microwave link, and the H-polarized XPIC IF boards at the two ends are interconnected through the H-polarized microwave link. In addition, check and adjust the IFX2 board and ODU, and the mapping between the ODU and the feed.

Handling Common Alarms (12)
The POWER_ALM is an alarm indicating that the power module is abnormal.

Possible Causes
Cause 1: If the alarm is reported on the board on the IDU, the input power or the PIU is abnormal.
Cause 2: If the alarm is reported on the board on the IDU, the power module is abnormal.
Cause 3: If the alarm is reported on the ODU, the power module of the ODU is faulty.

Handling Procedure
Cause 1: If the alarm is reported on the board on the IDU, the input power or the PIU is abnormal. Check whether any alarms are reported on the PIU. If yes, clear the alarms immediately.
Cause 2: If the alarm is reported on the board on the IDU, the power module is abnormal. Replace the alarmed board.
Cause 3: If the alarm is reported on the ODU, the power module of the ODU is faulty. Replace the ODU.

Handling Common Alarms (13)
The POWER_ABNORMAL is an alarm indicating that the input power supply is abnormal.

Possible Causes
Cause 1: The power cable is cut, damaged, or not connected.
Cause 2: The input power is abnormal.
Cause 3: The PIU board is faulty.

Handling Procedure
Cause 1: The power cable is cut, damaged, or not connected. Check whether the power cable is cut, damaged, or not connected. If the power cable is not connected, connect it. If the power cable is cut or damaged, replace it with a proper power cable.
Cause 2: The input power is abnormal. Contact power supply engineers to rectify the fault.
Cause 3: The PIU board is faulty. Replace the PIU board.

Handling Common Alarms (14)
The RADIO_TSL_HIGH is an alarm indicating that the microwave transmit power is too high.

Possible Causes
Cause 1: The ODU is faulty.

Handling Procedure
Cause 1: The ODU is faulty. Perform a cold reset on the ODU and check whether the alarm clears. If the alarm persists, replace the ODU.

The RADIO_TSL_LOW is an alarm indicating that the microwave transmit power is too low.

Possible Causes
Cause 1: The ODU is faulty.
Cause 2: The signals from the IF board to the ODU are abnormal.

Handling Procedure
Cause 1: The ODU is faulty. Perform a cold reset on the ODU and check whether the alarm clears. If the alarm persists, replace the ODU.
Cause 2: The signals from the IF board to the ODU are abnormal. Perform a cold reset on the IF board and check whether the alarm clears. If the alarm persists, replace the IF board.

Handling Common Alarms (15)
The TEMP_OVER is an alarm indicating that the operating temperature of the board crosses the threshold.

Possible Causes
Cause 1: The ambient temperature is very high or very low due to a fault in the cooler or heater equipment.
Cause 2: The configuration of the upper and lower thresholds of the temperature alarm is not proper.
Cause 3: The fan stops working or the air filter is too dusty, which impedes heat dissipation.
Cause 4: The alarmed board is faulty.

Handling Procedure
Cause 1: The ambient temperature is very high or very low due to a fault in the cooler or heater equipment. Check whether the ambient temperature is higher than 45ºC or lower than 0ºC. If the temperature is abnormal, check whether the cooler or heater equipment is faulty. Troubleshoot the equipment fault first.
Cause 2: The configuration of the upper and lower thresholds of the temperature alarm is not proper. Check the current operating temperature of the board and the configuration of the upper and lower temperature thresholds. In addition, check whether the configuration is proper according to actual requirements. If the configuration is not proper, modify the configuration.
Cause 3: The fan stops working or the air filter is too dusty, which impedes heat dissipation. Check whether the FAN_FAIL alarm occurs. If yes, clear the alarm first. You can feel the airflow and its temperature at the air outlet. Check whether the air filter is covered with dust. If heat dissipation is impeded due to the dusty air filter, remove the air filter and clean it.
Cause 4: The alarmed board is faulty. Check whether the alarmed board reports other hardware alarms such as HARD_BAD. If yes, replace the alarmed board.


Services on an IF Board Were Interrupted Because the IF Board Was Reset Due to Low Voltage

Fault Symptoms
On an NE that was not powered off or reset, an IF board reported BD_STATUS alarms but it was not reseated or reset.
The services that the IF board carried were automatically restored after an interruption.

Cause Analysis
The software black box contained a record indicating a board reset due to low voltage. It was found that a transient voltage dip occurred on the IF board.

Solutions
Check the power supply records of the NE.

Software Watchdogs of RTN NEs Were Frequently Reset Due to a Large Network Scale

Fault Symptoms
The software watchdogs of RTN NEs on a live network were frequently reset.

Cause Analysis
(1) Tasks SOCK, tNetTask, and tL2TSvR1b58 accounted for more than 60% of the CPU usage.
(2) Task SOCK is a communication task of the TCP/IP protocol stack, task tNetTask is a communication task of the VxWorks operating system, and task tL2TSvR1b58 is an internal communication task of NEs. When these three tasks had high CPU usage simultaneously, the communication traffic was very heavy. As a result, task VIDL could not be carried out.
(3) Route query results showed that some NEs had 600 routes. Generally, it is recommended that an NE has a maximum of 64 routes (or 100 routes in particular cases).

Solutions
Divide the network into more subnets.
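The route count recommendation (a maximum of 64 routes per NE, or 100 in particular cases) can be applied to route query results with a short script. The sketch below is illustrative only: the NE names and route counts are hypothetical, and the route counts are assumed to have been collected beforehand by whatever query method is available.

```python
RECOMMENDED_MAX_ROUTES = 64   # general recommendation from the case above
EXCEPTIONAL_MAX_ROUTES = 100  # allowed in particular cases


def classify_route_count(route_count):
    """Classify the number of routes on one NE against the recommended limits."""
    if route_count <= RECOMMENDED_MAX_ROUTES:
        return "ok"
    if route_count <= EXCEPTIONAL_MAX_ROUTES:
        return "acceptable only in particular cases"
    return "too many routes"


# Hypothetical query results: NE name -> number of routes.
route_counts = {"NE-A": 40, "NE-B": 90, "NE-C": 600}

for ne_name, count in route_counts.items():
    print(f"{ne_name}: {count} routes, {classify_route_count(count)}")
```

NEs classified as having too many routes are candidates for the subnet division described in the solution.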

An NE Failed to Trace an External Clock Due to Inconsistent Settings for the NEs at the Two Ends

Fault Symptoms
On a live network, an NE failed to trace an external clock and entered the free-run state, but its opposite NE properly output clock signals. The NMS displayed an LTI alarm but not an EXT_SYNC_LOS alarm.

Cause Analysis
Due to inconsistent settings for the SSM protocol at the two ends, the local NE could not correctly obtain the S1 byte and, as a result, reported an LTI alarm.
The NE finally restored to normal after its SSM protocol was disabled.

Solutions
(1) Check the network topology, check which type of equipment outputs clock signals at the opposite end, check the connection of the clock line, and check whether the external clock is available.
(2) Check whether the clock output mode is set to the same value at the two ends. If the clock output mode is set to 2 Mbit/s, check whether the settings for the S1 byte and SSM protocol are consistent between the two ends. In addition, check whether the local external clock port is configured with DCC overheads.

An IF Board Reported an LPUAS Performance Event But Did Not Report a Lower Order Alarm Because LP_UNEQ Alarms Were Suppressed

Fault Symptoms
An IF board reported an LPUAS performance event but did not report a lower order alarm.

Cause Analysis
The IF board and line board were configured with cross-connections but the services that the line board carried were not completely cut over. Therefore, the value of the V5 byte carried in the lower order channel was 0 and accordingly the IF board reported an LPUAS event.
It was found that the reporting status of LP_UNEQ alarms was set to DISABLE. As a result, LP_UNEQ alarms could not be reported. In addition, BIP_EXC, LP_RDI, LP_REI, LP_RFI, LP_TIM, and BIP_SD alarms could not be reported either because they were suppressed by LP_UNEQ alarms.

Solutions
Set the reporting status of LP_UNEQ alarms to ENABLE.

Service Interruption Due to Incorrect IF Cable Connections

Fault Symptoms
The network diagram is provided in the following figure. After NE2108 was powered off, the services between NE2108, NE2199, NE2299, and NE2120 were interrupted. The services, however, were not restored even after NE2108 restarted.
Note: NE2199 and NE2299 were at the same site. NE2108 and NE2120 were at different sites.

Cause Analysis
Normally, after NE2108 is powered off, the active and standby ODUs of NE2199 will report MW_LOF and RADIO_RSL_LOW alarms simultaneously and NE2299's services will not be affected. ODU 15 of NE2199 and ODU 17 of NE2299, however, simultaneously reported MW_LOF and RADIO_RSL_LOW alarms. In addition, IF board 7 of NE2199 and IF board 5 of NE2299 simultaneously reported MW_RDI alarms.
It is suspected that the IF cables for the standby links of NE2199 and NE2299 were incorrectly connected. Based on the reported alarms, it is confirmed that IF board 7 of NE2199 was connected to ODU 17 of NE2299 and IF board 7 of NE2299 was connected to ODU 17 of NE2199. For the connections, see the figure provided in the next slide.

Solutions
Exchange the IF cables between IF boards 7 of NE2199 and NE2299.

[Figure: Incorrect IF cable connections]

A Login to an NE Failed Due to Conflicting IP Addresses

Fault Symptoms
A computer has two network adapters, one connected to a public network and the other connected to an NE. The IP addresses of the two network adapters and that of the NE were in the same network segment. The subnet mask of the network adapter connected to the public network and that of the NE were set to 255.255.0.0, and that of the private network was set to 255.255.255.0. A user could not find the NE using the Web LCT but could find the NE using the Navigator. The user, however, could not log in to the NE or successfully ping the NE.

Cause Analysis
(1) Ran the arp -a command to query the IP addresses of the devices connected to the computer, and found that the public network and private network had the same IP addresses.
(2) Disconnected the network cable that connected one network adapter to the public network. Then, added the NE again. After the addition of the NE, the Web LCT properly communicated with the NE.

Solutions
Check IP addresses on a network and ensure that every IP address on the network is unique.
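Overlapping network segments on a computer with several network adapters can be detected before the NE is added. The following Python sketch uses the standard `ipaddress` module. The adapter names, IP addresses, and subnet masks are hypothetical examples and are not taken from the case.

```python
import ipaddress


def overlapping_adapters(adapters):
    """Return pairs of adapter names whose network segments overlap.

    adapters maps an adapter name to an "address/netmask" string.
    """
    networks = {
        name: ipaddress.ip_interface(address).network
        for name, address in adapters.items()
    }
    names = sorted(networks)
    return [
        (first, second)
        for index, first in enumerate(names)
        for second in names[index + 1:]
        if networks[first].overlaps(networks[second])
    ]


# Hypothetical adapter settings on the maintenance computer.
adapters = {
    "public": "129.9.1.20/255.255.0.0",
    "ne_side": "129.9.0.5/255.255.255.0",
    "other": "10.0.0.2/255.255.255.0",
}
print(overlapping_adapters(adapters))
```

A non-empty result indicates that packets destined for the NE may be sent through the wrong adapter, which matches the symptom of being unable to ping or log in to the NE.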

A Newly Deployed BTS Encountered Service Interruptions Because IF Parameter Settings Were Not in Compliance with the Network Planning Document

Fault Symptoms
After being activated, a newly deployed BTS frequently encountered transient service interruptions. It took 20 ms or even 1,000 ms for a user to successfully ping a BSC from the BTS. It, however, always took less than 30 ms for a user to successfully ping the BSC from the microwave equipment that was connected to the BTS.

Cause Analysis
(1) The pinging duration is normal if small packets are transmitted but is abnormal if large packets are transmitted.
(2) Based on IF configurations, it is found that IF parameters were not set according to the network planning document. The bandwidth allocated to data services was very low.

Solutions
Modify the IF parameter settings according to the network planning document.

Note: Bandwidth available to data services = Service bandwidth - E1-used bandwidth (for Hybrid microwave)
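The bandwidth formula in the note can be worked through with a short calculation. The sketch below assumes that each E1 occupies its nominal line rate of 2048 kbit/s on the Hybrid microwave link, and the service bandwidth value is a hypothetical example. Actual figures depend on the IF working mode defined in the network planning document.

```python
E1_RATE_KBPS = 2048  # nominal E1 line rate, assumed to be the bandwidth used per E1


def data_bandwidth_kbps(service_bandwidth_kbps, e1_count):
    """Bandwidth available to data services = service bandwidth - E1-used bandwidth."""
    e1_used_kbps = e1_count * E1_RATE_KBPS
    if e1_used_kbps > service_bandwidth_kbps:
        raise ValueError("the configured E1s exceed the service bandwidth")
    return service_bandwidth_kbps - e1_used_kbps


# Hypothetical link with a service bandwidth of 40,000 kbit/s.
print(data_bandwidth_kbps(40_000, 16))  # many E1s leave little bandwidth for data
print(data_bandwidth_kbps(40_000, 4))   # fewer E1s leave more bandwidth for data
```

With 16 E1s configured, only 7232 kbit/s remains for data services in this example, which illustrates how an E1 count that deviates from the plan can starve data services.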

A Remote Login to an NE Failed Due to Repeated NE IDs

Fault Symptoms
On a new OptiX RTN network, NE01, NE02, and NE03 formed a chain. A user could log in to NE03 from NE02 but could not log in to NE03 from NE01.

Cause Analysis
Possible cause 1: NE03 has a hardware fault.
Possible cause 2: The network configuration is incorrect, causing a DCN communication failure.

Handling Procedure
(1) Queried NE03's adjacent routes and found that the NE IDs of NE01 and NE02 were displayed.
(2) Performed a reset on NE03 and found that the fault persisted.
(3) Checked NE03 on site, and found that one optical port of the EG2 board was connected to NE02 and another optical port of the EG2 board was connected to NE04.
(4) Logged in to NE04 and found that the NE ID of NE04 was the same as that of NE01.
(5) Changed the NE ID of NE04 to a unique value on the network. Then, logged in to NE03 from NE01. The login was successful.
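Repeated NE IDs like the one found in this case can be caught by checking the planned or collected NE ID list for duplicates. The Python sketch below is illustrative. The NE ID values are hypothetical, and the list is assumed to come from the NE planning table or from a manual inventory.

```python
from collections import defaultdict


def duplicate_ne_ids(ne_ids):
    """Return {ne_id: [NE names]} for every NE ID that is used by more than one NE."""
    names_by_id = defaultdict(list)
    for ne_name, ne_id in ne_ids.items():
        names_by_id[ne_id].append(ne_name)
    return {ne_id: names for ne_id, names in names_by_id.items() if len(names) > 1}


# Hypothetical inventory: NE04 was configured with the same NE ID as NE01.
inventory = {"NE01": 1, "NE02": 2, "NE03": 3, "NE04": 1}
print(duplicate_ne_ids(inventory))
```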

A Service Created on a Static Tunnel Could Not Be Set Up Due to Incorrect Fiber Connections

Fault Symptoms
The DCN communication between two NEs was normal but the service created on the static tunnel between the two NEs could not be set up.

Cause Analysis
Possible cause 1: The physical link is faulty.
Possible cause 2: The IP addresses of the ports are incorrect.
Possible cause 3: The ARP protocol works improperly.

Handling Procedure
(1) Checked the current alarms of the system and found that none of the ETH_LOS, ETH_LINK_DOWN, and HARD_BAD alarms was reported. In addition, as the DCN communication was normal, link/port/board hardware malfunctions were ruled out.
(2) Checked the IP addresses of the concerned ports and found that the IP addresses were correct and were in the same network segment.
(3) Checked the entries of the ARP table and found that the IP address of the opposite port could not be learned.
(4) The DCN communication could be normal only after the two interconnected ports successfully learned their opposite ports' MAC addresses. Based on query results, it was found that the MAC address of the port on the sink NE was not the planned one.
(5) Checked the fiber connections and found that the fibers were incorrectly connected. Due to the incorrect fiber connection, the ARP protocol worked improperly and the service created on the static tunnel could not be set up.
(6) Re-connected the fibers according to the NE planning table.

Services Carried by a LAG Were Interrupted Due to Abnormal Laser States

Fault Symptoms
The Ethernet services carried by a LAG, which consisted of one main port and three slave ports, were interrupted. The four ports simultaneously reported LASER_SHUT alarms but were enabled. The lasers actually did not emit light.

Cause Analysis
The LAG detected that the board entered an abnormal state and then shut down the lasers at all ports. When the HARD_BAD alarm cleared, however, the board was not accordingly restored to normal. As a result, LASER_SHUT alarms persisted.

Handling Procedure
(1) Checked current alarms and the status of the alarmed ports, and found that LASER_SHUT alarms were reported and that the alarmed ports were in the Enabled state, but could not change the port state to disabled.
(2) Queried historical alarms and found a HARD_BAD alarm. The HARD_BAD alarm indicated that the board was faulty. Due to the board fault, the LAG shut down the lasers at all ports. Consequently, the lasers entered an abnormal state and the services were interrupted.
(3) Performed a cold reset on the board. Then, the board restored to normal, LASER_SHUT alarms cleared, and the services were restored.
(4) Replaced the board to prevent this fault from occurring on the network.

Locating a CES Service Fault by Performing Loopbacks

Fault Symptoms
A BER tester detected that a large number of bit errors occurred in the CES service between the BSC and the BTS.

[Figure: network topology showing the BTS, NE01, NE02, NE03, NE04, and a 10GE link]

Handling Procedure
(1) Connected a BER tester to NE01 and set an inloop at one 2 Mbit/s port of NE04. The BER tester detected a large number of bit errors.
(2) Configured a static ARP entry at NE03 with the MAC address of the egress port of NE03 and the IP address of NE04, and created a tunnel whose egress label was the same as its ingress label between NE03 and NE04.
(3) Set an outloop at the network-side port of NE04. Then, on NE03, set an inloop at the network-side port that was connected to NE04. In both cases, the BER tester detected bit errors.
(4) On NE03, set an outloop at the network-side port that was connected to NE02 and found that no bit error occurred. Therefore, it was inferred that NE03 malfunctioned.
(5) On NE03, replaced the 10GE line board that was connected to NE02.

Cases of Troubleshooting CES Services

Case 1: Bit Errors Occurred in CES Services Due to Insufficient Tunnel Bandwidth
[Fault Symptoms]: Configured a 15-timeslot CES service that traversed two NEs into an MLPPP group. After the setting, the service was set up. Bit errors, however, occurred in a 31-timeslot service that was created after the 15-timeslot service was deleted.
[Cause Analysis]: The MLPPP group had only one PPP member, whose bandwidth was insufficient for the CES services encapsulated on one PW. As a result, a large number of packets were discarded.
[Solutions]: Add a new member to the MLPPP group.

Case 2: Bit Errors Occurred When NEs Traced Different Clock Sources
[Fault Symptoms]: On a network shown in the following figure, an LSS alarm persisting for one second was reported and a PW performance event indicating a jitter buffer overflow was also reported.
[Cause Analysis]: The OptiX RTN 910 NE and the ANT20 traced different clock sources. As a result, slip frames occurred due to clock wander and delay variation.
[Solutions]: Let the ANT20 trace the equipment clock.

Case 3: CES Services Could Not Be Set Up Due to a Mismatch of E1 Framing Mode
[Fault Symptoms]: An inloop was set at one end of a CES service traversing two NEs and a tester was connected to the other end of the CES service. After the setting, the tester displayed an LSS alarm and the used E1 port reported an LMFA alarm.
[Cause Analysis]: The alarmed E1 port was set to CRC4-multiframe, whereas the tester was set to Unframe. As a result, the chip could not correctly align frames.
[Solutions]: Set the tester to PCM31C.
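The bandwidth shortage in Case 1 can be illustrated with a rough estimate. The sketch below rests on several assumptions that are not stated in the case: each PPP member is taken to be an E1 link of 2048 kbit/s, each PW packet is taken to carry 8 E1 frames, and the per-packet encapsulation overhead is taken to be 30 bytes. The last two values are hypothetical and only serve to show the calculation.

```python
TIMESLOT_RATE_KBPS = 64        # one E1 timeslot carries 64 kbit/s
PPP_MEMBER_RATE_KBPS = 2048    # assumption: each PPP member is one E1 link
FRAMES_PER_PACKET = 8          # hypothetical number of E1 frames per PW packet
OVERHEAD_BYTES = 30            # hypothetical encapsulation overhead per packet


def ces_rate_kbps(timeslots):
    """Estimate the line rate of a CES service, including encapsulation overhead."""
    # Each E1 frame contributes one byte per timeslot to the packet payload.
    payload_bytes = timeslots * FRAMES_PER_PACKET
    payload_rate_kbps = timeslots * TIMESLOT_RATE_KBPS
    return payload_rate_kbps * (payload_bytes + OVERHEAD_BYTES) / payload_bytes


def fits_in_mlppp_group(timeslots, ppp_members):
    """Check whether the estimated CES rate fits into the MLPPP group bandwidth."""
    return ces_rate_kbps(timeslots) <= ppp_members * PPP_MEMBER_RATE_KBPS


for timeslots, members in ((15, 1), (31, 1), (31, 2)):
    verdict = "fits" if fits_in_mlppp_group(timeslots, members) else "does not fit"
    print(f"{timeslots} timeslots, {members} PPP member(s): {verdict}")
```

Under these assumptions the 15-timeslot service fits into one member, the 31-timeslot service does not, and adding a second member resolves the shortage, which is consistent with the solution of the case.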

Large Clock Frequency Deviations Occurred on NodeBs Due to a Timing Loop

Fault Symptoms
All NodeBs connected to NE01 (an OptiX RTN 950 NE) reported an alarm indicating a large clock frequency deviation.

Handling Procedure
(1) Suspected that the clock configuration of NE01 was incorrect because NE01 did not report an alarm.
(2) Queried the clock source priority lists of NE01 and NE02, and found that NE01 traced the line clock from optical port 1 on the EG2 board in slot 1 (of NE01) and NE02 traced the line clock from optical port 1 on the EG2 board in slot 2 (of NE02). The two optical ports were directly interconnected. As a result, the clock signals traced by NE01 and NE02 formed a loop, resulting in clock quality deterioration and large clock frequency deviations on the NodeBs connected to NE01.
(3) Changed the clock source of NE01 according to the NE planning table.
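A timing loop like the one between NE01 and NE02 can be found by following the clock tracing relationships from NE to NE and checking whether the path returns to an NE that was already visited. The Python sketch below is a simplified illustration: it assumes that each NE traces exactly one clock source, and the tracing data is hypothetical input that would have to be compiled from the clock source priority lists.

```python
def find_timing_loop(traces):
    """Return the list of NEs that form a timing loop, or None if there is no loop.

    traces maps each NE to the NE whose clock it traces. Sources that do not
    appear as keys (for example an external clock) end the tracing path.
    """
    for start in traces:
        path = []
        visited = set()
        node = start
        while node is not None and node not in visited:
            visited.add(node)
            path.append(node)
            node = traces.get(node)
        if node is not None:
            # The path came back to an NE that was already visited.
            return path[path.index(node):]
    return None


# NE01 and NE02 trace the line clock from each other, as in the case above.
print(find_timing_loop({"NE01": "NE02", "NE02": "NE01"}))

# After the correction, NE02 traces a (hypothetical) external clock source.
print(find_timing_loop({"NE01": "NE02", "NE02": "external clock"}))
```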

Service Interruptions Due to Incorrect Setting of MPLS Next-Hop IP Address
Fault Symptoms
All link services converged to an OptiX RTN 900 NE of V100R001C00/C02 were interrupted and several tunnels passing the NE frequently reported MPLS_TUNNEL_LOCV alarms. The alarms, however, transiently cleared. In addition, the DCN communication between all NEs was normal.

Handling Procedure
(1) Ruled out a physical link fault because the DCN communication was normal.
(2) Analyzed the distribution of the affected NEs and found that all interrupted services were first converged to an NE and then backhauled to the BSC.
(3) Found that an ARP entry was frequently and automatically added and deleted on the convergence NE. Changed the ARP entry to a static entry. Then, the tunnel alarms cleared and some services were restored.
(4) Checked the configurations of the convergence NE, and found that the NE was configured with multiple tunnels and that the next-hop IP address of the port was set to the same value as the next-hop IP address of the convergence port. The incorrect settings caused abnormal ARP learning and further interrupted tunnel services.
(5) Deleted incorrectly configured services and tunnels, and re-configured services and tunnels according to the network planning document. The services were normal even after the static ARP entry was deleted.


Service Interruptions Due to Inconsistent Bandwidth Planning for TDM Services and Ethernet Services
Fault Symptoms

Two OptiX RTN 950 NEs of a version earlier than V100R002C02SPC100 were interconnected. They carried TDM services and Ethernet services. The microwave link was correctly configured, but the Ethernet services could not be set up and no alarm related to microwave links was reported.

Handling Procedure

(1) Checked the configurations of the two interconnected NEs and found that the number of E1s was set to different values at the two ends. The data discrepancy caused inconsistent bandwidths at the two ends and resulted in service interruptions. (2) Changed the number of E1s at the two ends to the same value.

Notes

Hybrid microwave bandwidth is equal to the sum of the TDM service bandwidth and the Ethernet service bandwidth. For TDM services carried on a microwave link, the number of E1s must be the same at the two ends. Otherwise, the TDM services cannot be set up. Besides, if the set E1 bandwidth uses up all microwave bandwidth, Ethernet services will be interrupted due to absence of bandwidth.

Reference Documents
http://support.huawei.com/support/pages/navigation/gotoKBNavi.do?actionFlag=intoKBNavigation&autoFlag=autoThink&colID=ROOTENWEB|CO0000000173&itemId0=29-2&itemId1=3-400

Reference Documents (Continued)
Maintenance Guide for the OptiX RTN Equipment
http://support.huawei.com/support/pages/kbcenter/view/product.do?actionFlag=detailProductSimple&web_doc_id=SE0000498683&doc_type=123-2

For the preceding documents, please download the latest versions from support.huawei.com.
For any comments or suggestions on the documents, please send your feedback to Chen Shaoying (employee ID: 59800).
