Troubleshooting the MSTP+
OSN 7500/3500/1500
Contents
1 Packet Networking Solutions of the MSTP+ Equipment
2 Common Procedure of Fault Locating
3 Fault Locating Methods
4 Tunnel Fault Locating
5 Protection Fault Locating
6 Common Problems with the MSTP+ Equipment

Packet Networking Solutions of the MSTP+ Equipment
Coexistence of the TDM domain and the packet domain on the MSTP+ equipment
Packet services raise new requirements for bandwidth utilization, service management, protection, and equipment maintenance of the transport network. Hence, the transport network has to be upgraded for packet services.
Packet services coexist with SDH services on the live network. The access layer and convergence layer of the transport network can be upgraded simultaneously or separately.
[Figure: network diagram. Labels: RNC, BSC; OSN 3500 convergence nodes (10GE, STM-16/64); OSN 1500 and Metro 1000 access nodes (GE ring, STM-1/4); BTS and NodeB sites attached over E1 and FE.]

Packet Networking Solutions of the MSTP+ Equipment
Hybrid networking of the MSTP+ equipment and the OptiX PTN equipment
If certain equipment on the live network is not ready for an upgrade, the other equipment can be upgraded for hybrid networking with the OptiX PTN equipment.
At the convergence layer, the equipment can be upgraded to the MSTP+ equipment, which provides the packet switching kernel and uses the 10GE board to form a 10GE ring with the OptiX PTN 3900.
At the access layer, the OptiX OSN 1500 and the OptiX Metro 1000 can be networked with the OptiX PTN 910/950/1900 to build a GE ring that is connected to the convergence layer.
The MSTP+ equipment and the OptiX PTN equipment can be managed together by the U2000.
[Figure: network diagram. Labels: RNC, BSC, T2000; OSN 3500 and PTN 3900 at the convergence layer (10GE, STM-16/64); OSN 1500, Metro 1000, and PTN 910/950/1900 at the access layer (GE ring, STM-1/4); BTS and NodeB sites attached over E1 and FE.]


Common Procedure of Fault Locating
The simplified topology is as follows:
BSC (#4) - OSN 3500 (#3) - OSN 1500 (#2) - BTS (#1)
Locate a fault as follows:
To locate a service degradation or service interruption fault on the packet transport network, first find the equipment where packet loss occurs. Just as a doctor needs to know where a patient hurts before writing a prescription, an engineer needs to find the fault spot on a network before troubleshooting the fault.
Take network C of 3G equipment as an example. From the NodeB to the RNC, pure packet services are transmitted. The basic topology shown above contains four NEs: two transport NEs and two wireless NEs. When a fault occurs, locate it by checking the NEs one by one along the direction of the data stream. On the transport NEs, identify the faulty NE by checking the performance statistics at the ports and by using the OAM functions. After the faulty NE is identified, rectify the fault according to the equipment configuration and service requirements.
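The hop-by-hop check described above can be sketched as a small script. This is only an illustration: the NeCounters structure, the counter values, and the first_lossy_hop helper are hypothetical, and real counters would be read from the port performance statistics on the NMS.

```python
from dataclasses import dataclass

@dataclass
class NeCounters:
    """Hypothetical per-NE packet counters taken over the same interval."""
    name: str
    rx_packets: int  # packets received from the previous NE
    tx_packets: int  # packets sent toward the next NE

def first_lossy_hop(path):
    """Walk the NEs along the data stream and report where packets first go missing."""
    for i, ne in enumerate(path):
        if ne.tx_packets < ne.rx_packets:
            return f"inside {ne.name}"
        if i + 1 < len(path) and path[i + 1].rx_packets < ne.tx_packets:
            return f"between {ne.name} and {path[i + 1].name}"
    return None  # no loss visible in these counters

# Example values only, ordered along the data stream (BTS toward BSC).
path = [
    NeCounters("BTS (#1)", 1000, 1000),
    NeCounters("OSN 1500 (#2)", 1000, 1000),
    NeCounters("OSN 3500 (#3)", 1000, 940),
    NeCounters("BSC (#4)", 940, 940),
]
print(first_lossy_hop(path))  # inside OSN 3500 (#3)
```

Once the lossy hop is known, the alarms, loopbacks, and OAM functions can be applied to that NE or link section only.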
 6

Common Procedure of Fault Locating
1. Check whether the fault occurs at a convergence node, an access node, or a node interconnected with the base station.
2. Check whether the fault occurs on the packet transmission equipment or on the equipment interconnected with the packet transmission equipment.
3. Troubleshoot the fault at a convergence node, an access node, or a node interconnected with the base station. The troubleshooting process is basically the same in each case and is shown in the figure on the right.


Fault Locating
Fault locating methods
The following methods are commonly used to locate a fault on the MSTP+ equipment:
- Alarms on the faulty NE
  For most faults, an NE reports the corresponding alarms to guide the user in troubleshooting. For details about alarms and alarm handling, see the Feature Description in the documents delivered with the products.
- Traffic statistics
  When the traffic becomes abnormal and no relevant alarms are reported, traffic statistics can be used to narrow down the fault spot. Then, the fault can be located exactly with other measures. Traffic statistics include interface traffic statistics and RMON performance counts.
- Loopback (LB) tests
  When service interruption occurs on a link section, loopbacks can be performed on the service trail to locate the faulty node. Then, the fault can be located exactly with other measures. The commonly performed loopbacks are PHY-layer loopback and MAC-layer loopback. Do not loop back E-LAN services; otherwise, a broadcast storm occurs.
- OAM functions
  OAM functions such as MPLS OAM, PW OAM, and ETH OAM are available at the different service layers. For different troubleshooting purposes, OAM functions such as ping, LB, traceroute, and PW ping are available.
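As a rough illustration of how the four methods above fit together, the following sketch picks a starting method from the observed symptoms. The function name, its parameters, and the "E-Line" example service type are assumptions made for this example, not part of the equipment or the NMS.

```python
def suggest_methods(alarms_reported, traffic_abnormal, service_interrupted,
                    service_type="E-Line"):
    """Return a list of fault locating methods to try first (illustrative only)."""
    methods = []
    if alarms_reported:
        methods.append("handle the reported alarms")
    elif traffic_abnormal:
        # No relevant alarm: narrow down the fault spot with statistics.
        methods.append("check interface traffic statistics and RMON counts")
    if service_interrupted:
        if service_type == "E-LAN":
            # Looping back an E-LAN service would cause a broadcast storm.
            methods.append("use OAM functions instead of a loopback")
        else:
            methods.append("perform a PHY-layer or MAC-layer loopback")
    return methods

print(suggest_methods(False, True, True, "E-LAN"))
```

The only hard rule encoded here is the one stated on the slide: a loopback is never suggested for an E-LAN service.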
 9

Troubleshooting Based on Unexpected Alarms
[Figure]

Checking the RMON Statistics of a Port
[Figure]

Tunnel-Based Traffic Statistics
[Figure]

PW-Based Traffic Statistics
[Figure]

Port-Based Inloops and Outloops
[Figure]

MPLS OAM: Continuity Detection
[Figure: NodeBs (NB1, NB2) connected over ETH (GE/FE) to MSTP+ nodes and, across the MPLS core network (SDH or ETH), to the RNCs. The MPLS_TUNNEL_LOCV alarm is marked.]

MPLS OAM: Forwarding Errors (Mismatch)
[Figure: NodeBs (NB1, NB2) connected over ETH (GE/FE) to MSTP+ nodes and, across the MPLS core network, to the RNCs. The MPLS_TUNNEL_LOCV and MPLS_TUNNEL_MISMATCH alarms are marked.]

MPLS OAM: Forwarding Errors (Mismerge)
[Figure: NodeBs (NB1, NB2, NB3) connected over ETH (GE/FE) to MSTP+ nodes and, across the MPLS core network, to the RNCs. The MPLS_TUNNEL_MISMERGE and MPLS_TUNNEL_LOCV alarms are marked.]

MPLS OAM: Defect Indication (BDI)
Reverse tunnels are bound.
[Figure: NodeBs (NB1, NB2, NB3) connected over ETH (GE/FE) to MSTP+ nodes and, across the MPLS core network, to the RNCs. The MPLS_TUNNEL_LOCV and MPLS_TUNNEL_BDI alarms are marked.]

MPLS OAM: Defect Indication (FDI)
[Figure: NodeBs (NB1, NB2, NB3) connected over ETH (GE/FE) to MSTP+ nodes and, across the MPLS core network, to the RNCs. The MPLS_TUNNEL_FDI alarm is marked.]

LSP Ping
[Figure: NodeBs (NB1, NB2, NB3) connected over ETH (GE/FE) to MSTP+ nodes and, across the MPLS core network, to the RNCs.]

LSP TraceRoute
[Figure: NodeBs (NB1, NB2, NB3) connected over ETH (GE/FE) to MSTP+ nodes and, over STM-X and GE links across the MPLS core network, to the RNCs.]

MPLS Tunnel OAM Ping Test
A tunnel ping test is performed: [Figure]
Test result: [Figure]

MPLS Tunnel OAM Traceroute Test
A tunnel traceroute test is performed: [Figure]
Test result: [Figure]

ETH OAM
The maintenance of Ethernet services is mainly implemented by using ETH OAM functions (defined in IEEE 802.1ag/ITU-T Y.1731). The ETH OAM functions include:
– Continuity check (CC), for proactive continuity check
– Loopback (LB), for on-demand continuity check
– Link trace (LT), for on-demand Ethernet link tracing
– Ethernet remote defect indication (RDI)

ETH OAM (CC)
[Figure: NodeBs (NB1, NB2, NB3) connected over ETH (GE/FE) to MSTP+ nodes and, over STM-X and GE links across the MPLS core network, to the RNCs. The legend marks the MEPs and the MD; the ETH_CFM_LOC alarm is marked.]

ETH OAM (LB)
[Figure: same topology as for ETH OAM (CC). The legend marks the MEPs and the MD.]

ETH OAM (LT)
[Figure: same topology as for ETH OAM (CC). The legend marks the MIPs, the MEPs, and the MD.]

ETH OAM Test
An ETH OAM test is performed: [Figure]
Test result: [Figure]

Symptoms and Causes of Common Tunnel Faults
- Common Symptoms
  - Creating an MPLS tunnel fails, and therefore the services become unavailable.
  - An MPLS tunnel becomes faulty, and therefore the services are interrupted.
  - Protection switching fails, and therefore the services are interrupted, or packet loss or bit errors occur in the services.
- Possible Causes
  - Cause 1: Creating cross-connections fails.
  - Cause 2: The physical link that carries the faulty tunnel becomes faulty.
  - Cause 3: Protection switching fails.

Common Troubleshooting Methods for Tunnels
Cause 1: Creating cross-connections fails.
1. Check whether the IP addresses of the ports on different NEs belong to the same network segment. If yes, modify the IP addresses of the ports.
2. Check whether one label is allocated to multiple tunnels.
3. Check whether the number of tunnels has reached the maximum value. If yes, adjust tunnels or delete redundant tunnels.
Cause 2: The physical link that carries the faulty tunnel becomes faulty.
4. Check for the following alarms and handle them if any: HARD_BAD, ETH_LOS, MPLS_TUNNEL_BDI, MPLS_TUNNEL_FDI, and MPLS_TUNNEL_LOCV.
5. Check whether the opposite NE has any board fault or whether the opposite NE is reset. If yes, handle the fault on the opposite NE.
Cause 3: Protection switching fails.
The MPLS APS protection switching fails. Handle the failure.
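Checks 2 and 3 under Cause 1 can be sketched against an exported tunnel list. The (name, label) tuple format and the max_tunnels value are assumptions for this example; the real limit depends on the equipment and is not given here.

```python
from collections import defaultdict

def find_shared_labels(tunnels):
    """Return {label: [tunnel names]} for every label allocated to more than one tunnel."""
    by_label = defaultdict(list)
    for name, label in tunnels:
        by_label[label].append(name)
    return {label: names for label, names in by_label.items() if len(names) > 1}

def tunnel_limit_reached(tunnels, max_tunnels):
    """True when no further tunnel can be created under the given (assumed) limit."""
    return len(tunnels) >= max_tunnels

# Example data only: (tunnel name, label) pairs on one NE.
tunnels = [("Tunnel-1", 100), ("Tunnel-2", 101), ("Tunnel-3", 100)]
print(find_shared_labels(tunnels))        # {100: ['Tunnel-1', 'Tunnel-3']}
print(tunnel_limit_reached(tunnels, 3))   # True
```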
 36

Common Tunnel-Related Alarms
- MPLS_TUNNEL_LOCV: loss of tunnel continuity
Possible Causes
- Cause 1: The ingress node of the tunnel stops transmitting CV/FFD packets.
- Cause 2: The physical link is faulty.
- Cause 3: The board which functions as the ingress node is being reset.
- Cause 4: The service interface is configured incorrectly.
- Cause 5: The CPU is fully loaded and cannot process ARP packets.
Handling Procedure
Cause 1: The ingress node of the tunnel stops transmitting CV/FFD packets.
1. Check whether the Detection Mode and Detection Packet Type parameters take the same values on the two ends. If not, set the parameters to the same values.
2. Check whether the CV/FFD status of the ingress node is Disabled. If yes, change the status to Enabled.
Cause 2: The physical link is faulty.
Check whether the egress node reports the HARD_BAD, ETH_LOS, or ETH_LINK_DOWN alarm. If yes, clear the alarm.
Cause 3: The board which functions as the ingress node is being reset.
Check whether the ingress node reports the COMMUN_FAIL alarm. If yes, clear the alarm.
Cause 4: The service interface is configured incorrectly.
Check whether the tunnel is configured on the correct interface based on the NE planning table. For example, check the IP address of the next hop.
Cause 5: The CPU is fully loaded and cannot process ARP packets.
On the NMS, check whether the CPU_BUSY alarm is reported. If yes, clear the CPU_BUSY alarm and then check whether the fault is rectified.
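The two checks under Cause 1 amount to comparing the OAM settings of the two tunnel ends. A minimal sketch follows; the dictionary keys and the example values ("manual", "CV", "FFD") are placeholders chosen for this example and are not the NMS field names.

```python
def locv_config_issues(ingress, egress):
    """List the OAM configuration problems that can lead to MPLS_TUNNEL_LOCV (Cause 1)."""
    issues = []
    # Check 1: both ends must use the same detection mode and packet type.
    for key in ("detection_mode", "detection_packet_type"):
        if ingress[key] != egress[key]:
            issues.append(f"{key} differs: ingress={ingress[key]!r}, egress={egress[key]!r}")
    # Check 2: the ingress node must actually be sending CV/FFD packets.
    if not ingress["cv_ffd_enabled"]:
        issues.append("CV/FFD is disabled on the ingress node")
    return issues

# Example values only.
ingress = {"detection_mode": "manual", "detection_packet_type": "FFD", "cv_ffd_enabled": False}
egress = {"detection_mode": "manual", "detection_packet_type": "CV"}
for issue in locv_config_issues(ingress, egress):
    print(issue)
```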
 38

MPLS_TUNNEL_MISMATCH
MPLS OAM packet mismatch
Possible Causes
1. The tunnel configuration is incorrect.
2. The physical link is misconnected.
Handling Procedure
1. Check the tunnel configuration at the ingress node and egress node. Specifically, check whether the tunnel ID, ingress ID, and egress ID take the same values at the two nodes.
2. Check whether the optical fiber is correctly connected. In the example figure, NE A is meant to be connected to NE B, but NE C is connected to NE B by mistake. Because NE C has a tunnel with the same OUT label as the tunnel on NE A, NE B receives OAM packets from the wrong tunnel and reports the MPLS_TUNNEL_MISMATCH alarm.

MPLS_TUNNEL_MISMERGE
Both correct and wrong OAM packets are received.
Possible Causes
1. The physical link is misconnected.
Handling Procedure
1. Check whether there are misconnections like the one in the example figure.
NE A is meant to be connected to NE B, but NE C is also connected to NE B by mistake.
Thus, NE B receives both correct packets and wrong packets.
When NE B receives the packets from NE C, NE B reports the MPLS_TUNNEL_MISMERGE alarm.
In addition, NE D reports the MPLS_TUNNEL_LOCV alarm.
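The difference between the LOCV, MISMATCH, and MISMERGE alarms can be summarized by which OAM packets reach the egress node. The sketch below models only the descriptions on these slides; the function, its arguments, and the way sources are identified are assumptions, and the detection logic on the equipment may differ.

```python
def classify_tunnel_defect(received_sources, expected_source):
    """Classify what the egress node sees during one detection window.

    received_sources: source identifiers carried by the OAM packets that arrived.
    expected_source:  identifier of the tunnel ingress configured for this tunnel.
    """
    got_expected = expected_source in received_sources
    got_unexpected = any(src != expected_source for src in received_sources)
    if got_expected and got_unexpected:
        return "MPLS_TUNNEL_MISMERGE"   # both correct and wrong packets
    if got_unexpected:
        return "MPLS_TUNNEL_MISMATCH"   # only wrong packets
    if not got_expected:
        return "MPLS_TUNNEL_LOCV"       # no packets at all
    return None                          # only correct packets: no defect

print(classify_tunnel_defect(["NE A", "NE C"], "NE A"))  # MPLS_TUNNEL_MISMERGE
print(classify_tunnel_defect(["NE C"], "NE A"))          # MPLS_TUNNEL_MISMATCH
print(classify_tunnel_defect([], "NE A"))                # MPLS_TUNNEL_LOCV
```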
 40

MPLS_TUNNEL_FDI
At a transit node, the physical connection of the IN port is in LinkDown state.
Possible Causes
1. The fiber or network cable is disconnected from the IN port, or the IN port is abnormal.
Handling Procedure
1. Check the OAM status at the egress node and locate the faulty NE.
2. Check the alarms on the faulty NE. Check for the alarms about the IN port and laser at the transmit node.
3. Alternatively, you can locate the faulty NE by using the MPLS traceroute function.


ETH_APS_LOST
Loss of APS protocol packets
Possible Causes
1. The APS protocol is disabled or no APS protection group is configured on the opposite NE.
2. The physical link fails.
Handling Procedure
1. Check whether an APS protection group is configured on the opposite NE. If yes, ensure that the APS protocol is enabled.
2. Check whether the OAM configuration is correct.

ETH_APS_TYPE_MISMATCH
APS type mismatch
Possible Causes
1. The APS configuration (such as the APS protection type and switching mode) is different at the two ends.
Handling Procedure
1. Check whether the APS protection group is configured identically at the two ends. For example, a mismatch exists if the APS protection type is 1+1 at one end and 1:1 at the other end.

Switching Failure
When the working channel fails, services are not switched to the protection channel and thus services are interrupted.
Possible Causes
1. No APS protection group is configured.
2. The APS protocol is disabled.
3. The APS protection type is different at the two ends.
Handling Procedure
1. Locate the fault by using the alarm-based methods described for ETH_APS_LOST and ETH_APS_TYPE_MISMATCH.

Switching Time Exceeds the Threshold
When the working channel fails, services are switched to the protection channel, but the switching time exceeds 50 ms (the requirement for carrier-class service protection).
Possible Causes
1. Hold-off time is set.
2. OAM packets are not FFD packets with a transmission period of 3.3 ms.
Handling Procedure
1. Check whether hold-off time is set for the APS protection group. If yes, subtract the hold-off time from the measured switching time.
2. Check whether OAM packets on the working and protection channels are FFD packets with a transmission period of 3.3 ms. If not, the switching time may exceed 50 ms.
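The arithmetic behind both checks can be sketched as follows. The function names are made up for this example, and the rule that a defect is declared after three missed OAM packets is an assumption (a common convention for this kind of detection), not a value stated on the slide.

```python
CARRIER_CLASS_LIMIT_MS = 50.0

def effective_switching_time_ms(measured_ms, hold_off_ms=0.0):
    """Switching time with the configured hold-off time subtracted (check 1)."""
    return measured_ms - hold_off_ms

def meets_limit(measured_ms, hold_off_ms=0.0, limit_ms=CARRIER_CLASS_LIMIT_MS):
    return effective_switching_time_ms(measured_ms, hold_off_ms) <= limit_ms

def detection_time_ms(period_ms, missed_packets=3):
    """Time needed to notice the failure, assuming a fixed number of missed packets (check 2)."""
    return period_ms * missed_packets

# Example: 140 ms measured with a 100 ms hold-off is 40 ms of real switching time.
print(meets_limit(140.0, hold_off_ms=100.0))   # True
print(meets_limit(140.0))                      # False
# With FFD at 3.3 ms, detection alone takes roughly 10 ms under the assumption above;
# a much longer packet period would by itself push the total past 50 ms.
print(round(detection_time_ms(3.3), 1))        # 9.9
```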
 46

Possible Causes of APS Faults
Cause 1: The settings of the APS protection group differ between the two ends. Check whether the ETH_APS_PATH_MISMATCH or ETH_APS_TYPE_MISMATCH alarm occurs. If yes, handle the alarm.
Cause 2: The APS protocol is disabled. Check whether the ETH_APS_LOST or ETH_APS_SWITCH_FAIL alarm occurs. If yes, handle the alarm.
Cause 3: The fiber or electrical cable is misconnected. Check and ensure that the fiber or electrical cable is connected correctly.
Cause 4: The board that carries the protection channel reports hardware-related alarms and cannot send APS frames. Check whether the board that carries the protection channel reports the HARD_BAD, COMMUN_FAIL, or BUS_ERR alarm. If yes, clear the alarm and check whether the fault is rectified.
Cause 5: The system reports clock-related alarms. Check whether the system reports the TR_LOC, SYNC_C_LOS, or LTI alarm. If yes, clear the alarm and check whether the fault is rectified.
Cause 6: The protection tunnel is faulty. Check whether the protection channel reports any tunnel-level alarm. If yes, clear the alarm and check whether the fault is rectified.
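The alarm-to-cause mapping listed above lends itself to a simple lookup. The table below only restates the slide; the candidate_causes helper is hypothetical. Causes 3 and 6 have no single named alarm and are therefore left out.

```python
# Alarm name -> cause number, as listed on the slide.
ALARM_TO_CAUSE = {
    "ETH_APS_PATH_MISMATCH": 1,
    "ETH_APS_TYPE_MISMATCH": 1,
    "ETH_APS_LOST": 2,
    "ETH_APS_SWITCH_FAIL": 2,
    "HARD_BAD": 4,      # only when reported by the board carrying the protection channel
    "COMMUN_FAIL": 4,
    "BUS_ERR": 4,
    "TR_LOC": 5,
    "SYNC_C_LOS": 5,
    "LTI": 5,
}

CAUSE_TEXT = {
    1: "The settings of the APS protection group differ between the two ends.",
    2: "The APS protocol is disabled.",
    4: "The board that carries the protection channel reports hardware-related alarms.",
    5: "The system reports clock-related alarms.",
}

def candidate_causes(active_alarms):
    """Return the causes suggested by the active alarms, ordered by cause number."""
    numbers = sorted({ALARM_TO_CAUSE[a] for a in active_alarms if a in ALARM_TO_CAUSE})
    return [f"Cause {n}: {CAUSE_TEXT[n]}" for n in numbers]

print(candidate_causes(["LTI", "ETH_APS_LOST"]))
```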
 47

Common APS-Related Alarms
Symptom: The APS protection group is configured incorrectly or APS frames are not received.
Alarm: ETH_APS_PATH_MISMATCH, ETH_APS_TYPE_MISMATCH, ETH_APS_LOST, ETH_APS_SWITCH_FAIL
Possible Causes
Cause 1: The opposite NE is not configured with APS protection.
Cause 2: The settings of the APS protection group differ between the two ends.
Cause 3: The APS protocol is disabled.
Cause 4: The services on the protection channel are interrupted.
Handling Procedure
Cause 1: The opposite NE is not configured with APS protection.
On the NMS, check whether the opposite NE is configured with APS protection. If not, create an APS protection group with the same settings as those on the local NE. Then, enable the APS protocol.
Cause 2: The settings of the APS protection group differ between the two ends.
On the NMS, check whether the APS settings at the two ends are the same. If not, modify the settings and ensure that the APS settings at the two ends are the same.
Cause 3: The APS protocol is disabled.
Check whether the APS protocol is disabled at the two ends. If the APS protocol is enabled at only one end, disable the APS protocol at that end and then enable the APS protocol at the two ends.
Cause 4: The services on the protection channel are interrupted.
Check whether the protection channel reports an alarm about signal loss or signal degrade, such as ETH_LOS. If yes, clear the alarm before you proceed.

Common APS-Related Alarms
- ETH_APS_SWITCH_FAIL: protection switching failure
Possible Causes
- The settings of the APS protection group differ between the two ends.
Handling Procedure
The settings of the APS protection group differ between the two ends. Modify the settings and ensure that the APS settings at the two ends are the same. Then, disable and enable the APS protocol at the two ends.
- ETH_APS_TYPE_MISMATCH: protection type mismatch
Possible Causes
- Cause 1: The protection type is different.
- Cause 2: The switching mode is different.
- Cause 3: The revertive mode is different.
Handling Procedure
The protection type, switching mode, or revertive mode of the APS protection group is different at the two ends. Modify the settings and ensure that the APS settings at the two ends are the same. Then, disable and enable the APS protocol at the two ends.
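The comparison of the three settings named above can be sketched as a small diff. The key names and the example values (such as "single-ended") are illustrative assumptions and may not match the labels shown on the NMS.

```python
APS_SETTINGS = ("protection_type", "switching_mode", "revertive_mode")

def aps_setting_differences(local, remote):
    """Return {setting: (local value, remote value)} for every setting that differs."""
    return {
        key: (local[key], remote[key])
        for key in APS_SETTINGS
        if local[key] != remote[key]
    }

# Example values only.
local = {"protection_type": "1+1", "switching_mode": "single-ended", "revertive_mode": "revertive"}
remote = {"protection_type": "1:1", "switching_mode": "single-ended", "revertive_mode": "revertive"}
print(aps_setting_differences(local, remote))  # {'protection_type': ('1+1', '1:1')}
```

An empty result means the three settings agree, so the cause of the alarm lies elsewhere.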
 49


Common Problems with the MSTP+ Equipment
FAQs
- Q: When a user tries to set a port to exclusive use or to a null port, the system reports that the port is occupied by services. Why?
A: If a port transmits any Ethernet service, the service uses a VLAN ID of the port. If the Ethernet DCN is enabled at a port, DCN packets use a VLAN ID of the port. In either case, the system treats the port as occupied.
- Q: When a user tries to change the attribute of a port from Layer 3 to Layer 2, the system reports that the port is occupied by services. Why?
A: A Layer 3 port can be configured with tunnels and PWs for carrying far-end services. Before changing the attribute to Layer 2, the user needs to delete all services, PWs, and tunnels configured at the port and disable the MPLS protocol.
- Q: A port of an MPLS tunnel is configured as a LAG member, but the traffic at the port cannot be shared. Why?
A: By default, traffic is shared among LAG members according to the MAC addresses of packets. At a port of an MPLS tunnel, service packets are encapsulated before transmission. Therefore, all service packets have the same source and destination MAC addresses, and the traffic cannot be shared among LAG members. This problem can be solved by adopting the load sharing mode based on MPLS labels, because the services are distinguished by PW labels or tunnel labels.
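The LAG answer can be illustrated with a toy hash. The two selection functions below are deliberately simple stand-ins, not the hash used by the equipment, and the MAC addresses and PW labels are made-up example values.

```python
def member_by_mac(src_mac: bytes, dst_mac: bytes, n_members: int) -> int:
    """Toy MAC-based selection: XOR all address bytes, then take the remainder."""
    h = 0
    for b in src_mac + dst_mac:
        h ^= b
    return h % n_members

def member_by_label(label: int, n_members: int) -> int:
    """Toy label-based selection: the label value decides the member."""
    return label % n_members

# After MPLS encapsulation, every packet carries the same outer MAC pair.
src = bytes.fromhex("00e0fc000001")
dst = bytes.fromhex("00e0fc000002")
pw_labels = [100, 101, 102, 103]  # one label per service

mac_members = {member_by_mac(src, dst, 2) for _ in pw_labels}
label_members = {member_by_label(label, 2) for label in pw_labels}
print(len(mac_members), len(label_members))  # 1 2
```

With MAC-based selection, all four services land on one member link. With label-based selection, they are spread over both links.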
 51

FAQs
- Q: When an MSTP+ NE transmits a large number of CS7 packets, the NE may become unreachable to the NMS. Why?
A: The MSTP+ equipment adopts the inband DCN, which means that DCN packets and service packets share the fixed bandwidth of a link. To ensure that protocol packets can be transmitted in most cases, the highest priority, CS7, is assigned to protocol packets. If the bandwidth of a link is fully utilized, protocol packets are discarded at random. As a result, any protocol, such as DCN, IS-IS, or LAG, may fail temporarily. When the traffic becomes lighter, the failed protocol can recover.
- Q: The services received from an FE interface are transmitted through an MPLS tunnel, but the rate of the services is lower than 100 Mbit/s. Why?
A: Before entering an MPLS tunnel, services are encapsulated and the length of each packet increases by 22 bytes. Because of the larger data amount, packet loss occurs when the encapsulated services go through an FE port at the ingress node. At the egress node, the encapsulation bytes of the services are stripped, and thus the service rate becomes lower than 100 Mbit/s.
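The effect of the 22-byte encapsulation can be estimated with a simplified model: if the FE port carries at most 100 Mbit/s of encapsulated traffic, only the fraction frame / (frame + 22) of that is service data. The sketch ignores the preamble and inter-frame gap, and the function name and frame sizes are chosen for this example.

```python
ENCAP_OVERHEAD_BYTES = 22    # per-packet increase stated in the answer above
FE_LINE_RATE_MBPS = 100.0

def max_service_rate_mbps(frame_bytes, overhead=ENCAP_OVERHEAD_BYTES,
                          line_rate=FE_LINE_RATE_MBPS):
    """Upper bound on the service rate left after encapsulation (simplified model)."""
    return line_rate * frame_bytes / (frame_bytes + overhead)

for size in (64, 1518):
    print(size, round(max_service_rate_mbps(size), 2))
# 64 74.42
# 1518 98.57
```

The shortfall is largest for small packets, because the fixed 22 bytes make up a bigger share of each packet.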
 52

FAQs
- Q: The nominal range of the maximum transmission unit (MTU) of a port is 46 to 9600, but the usable range is 960 to 9600. Why?
A: If the MTU is lower than 960, DCN packets are discarded. Therefore, the usable MTU range is 960 to 9600.
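A planned MTU value can be checked against both ranges before it is applied. This is a minimal sketch; the constant names, the check_mtu helper, and its return strings are invented for this example.

```python
NOMINAL_MTU_RANGE = (46, 9600)  # range accepted for the port parameter
DCN_SAFE_MIN_MTU = 960          # below this value, DCN packets are discarded

def check_mtu(mtu):
    """Classify a planned MTU value (illustrative helper)."""
    low, high = NOMINAL_MTU_RANGE
    if not low <= mtu <= high:
        return "out of range"
    if mtu < DCN_SAFE_MIN_MTU:
        return "in range, but DCN packets are discarded"
    return "ok"

print(check_mtu(500))    # in range, but DCN packets are discarded
print(check_mtu(1500))   # ok
```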
 53

Thank You
www.huawei.com
