
Troubleshooting the MSTP+

OSN 7500/3500/1500
Contents

1 Packet Networking Solutions of the MSTP+ Equipment


2 Common Procedure of Fault Locating

3 Fault Locating Methods

4 Tunnel Fault Locating

5 Protection Fault Locating


6 Common Problems with the MSTP+ Equipment

Packet Networking Solutions of the MSTP+ Equipment

Coexistence of the TDM domain and the packet domain on the MSTP+ equipment

Packet services raise new requirements for bandwidth utilization, service management, protection, and equipment maintenance of the transport network. Hence, the transport network has to be upgraded for packet services.

Packet services coexist with SDH services on the live network. The access layer and convergence layer of the transport network can be upgraded simultaneously or separately.

[Figure: network diagram. Labels: RNC, BSC, OSN 3500 (convergence node), 10GE, STM-16/64, OSN 1500, Metro 1000, GE Ring, STM-1/4, BTS (E1), NodeB (FE).]

Packet Networking Solutions of the MSTP+ Equipment

Hybrid networking of the MSTP+ equipment and the OptiX PTN equipment

If certain equipment on the live network is not ready for an upgrade, the other equipment can be upgraded for hybrid networking with the OptiX PTN equipment.

At the convergence layer, the equipment can be upgraded to the MSTP+ equipment, which provides the packet switching kernel, uses the 10GE board, and thus composes the 10GE ring with the OptiX PTN 3900.

At the access layer, the OptiX OSN 1500 and the OptiX Metro 1000 can be networked with the OptiX PTN 910/950/1900 to build a GE ring that is connected to the convergence layer.

The MSTP+ equipment and the OptiX PTN equipment can be managed together by the U2000.

[Figure: network diagram. Labels: RNC, BSC, T2000, PTN 3900, OSN 3500, 10GE, STM-16/64, OSN 1500, Metro 1000, PTN 950/1900, PTN 910/950/1900, GE Ring, STM-1/4, BTS (E1), NodeB (FE).]

2 Common Procedure of Fault Locating

Common Procedure of Fault Locating

The simplified topology is as follows:

BSC OSN 3500 OSN 1500 BTS

#4 #3 #2 #1

Locate a fault as follows:


To locate a fault of service degradation or service interruption on the packet transport network, it is essential to find the equipment where packet loss occurs. Just as a doctor needs to know where a patient feels pain before writing a prescription, an engineer needs to locate the fault spot of a network before troubleshooting the fault.

Take network C of 3G equipment as an example. From the NodeB to the RNC, pure packet services are transmitted. In the basic topology shown above, there are four NEs: two transport NEs and two wireless transmission NEs. When a fault occurs, it can be located by checking the NEs along the direction of the data stream. On the transport NEs, the faulty NE can be identified by checking the performance statistics at ports and by using the OAM functions. After the faulty NE is identified, the fault can be rectified according to the equipment configuration and service requirements.
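
The hop-by-hop check described above can be sketched in Python. This is only an illustration: the `NeCounters` structure, the counter values, and the `find_lossy_hop` helper are hypothetical, and the sketch assumes that the receive and transmit packet counts of every NE were sampled over the same interval for one traffic direction.

```python
from dataclasses import dataclass


@dataclass
class NeCounters:
    """Hypothetical per-NE packet counters for one traffic direction."""
    name: str
    rx_packets: int  # packets received on the ingress port
    tx_packets: int  # packets sent on the egress port


def find_lossy_hop(path, tolerance=0):
    """Return the first point along the data stream where packets disappear.

    Loss inside an NE shows up as tx < rx on that NE. Loss on a link shows up
    as the next NE receiving fewer packets than the previous NE sent.
    """
    for i, ne in enumerate(path):
        if ne.rx_packets - ne.tx_packets > tolerance:
            return f"inside {ne.name}"
        if i + 1 < len(path):
            nxt = path[i + 1]
            if ne.tx_packets - nxt.rx_packets > tolerance:
                return f"link {ne.name} -> {nxt.name}"
    return None  # no loss visible in these counters


# Made-up counters for the simplified topology (#1 to #4).
path = [
    NeCounters("BTS (#1)", 10_000, 10_000),
    NeCounters("OSN 1500 (#2)", 10_000, 10_000),
    NeCounters("OSN 3500 (#3)", 10_000, 9_400),
    NeCounters("BSC (#4)", 9_400, 9_400),
]
print(find_lossy_hop(path))  # inside OSN 3500 (#3)
```

On real equipment the counters would come from port performance statistics, and a small non-zero `tolerance` may be needed because the counters of different NEs are never read at exactly the same instant.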

 6

Common Procedure of Fault Locating

1. Check whether the fault occurs at a convergence node, an access node, or a node interconnected with the base station.
2. Check whether the fault occurs on the packet transmission equipment or on the equipment interconnected with the packet transmission equipment.
3. Troubleshoot the fault at a convergence node, an access node, or a node interconnected with the base station. The troubleshooting process is basically the same and is shown in the figure on the right.

3 Fault Locating Methods

Fault Locating

Fault locating methods

The following methods are commonly used to locate a fault on the MSTP+ equipment:
• Alarms on the faulty NE
  – For most faults, an NE reports the corresponding alarms to guide the user in troubleshooting. For details about alarms and alarm handling, see the Feature Description in the documents delivered with the products.
• Traffic statistics
  – When the traffic becomes abnormal and no relevant alarms are reported, traffic statistics can be used to narrow down the fault location. Then, the fault can be located exactly with other measures. Traffic statistics include interface traffic statistics and RMON performance counts.
• Loopback (LB) tests
  – When service interruption occurs on a link section, loopbacks can be performed on the service trail to locate the faulty node. Then, the fault can be located exactly with other measures. The commonly performed loopbacks are PHY-layer loopback and MAC-layer loopback. Do not loop back E-LAN services. Otherwise, a broadcast storm occurs.
• OAM functions
  – At different service layers, OAM functions such as MPLS OAM, PW OAM, and ETH OAM are available. For different troubleshooting purposes, OAM functions such as ping, LB, traceroute, and PW ping are available.
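
As a rough aid, the selection logic in the list above can be written as a small Python function. The function name, its parameters, and the returned strings are hypothetical, and always appending the OAM functions as the last resort is an assumption of this sketch rather than a rule stated in the slides.

```python
def suggest_methods(has_alarms, traffic_abnormal, service_interrupted, service_type):
    """Return candidate fault-locating methods in the order the list introduces them."""
    methods = []
    if has_alarms:
        methods.append("handle alarms on the faulty NE")
    if traffic_abnormal and not has_alarms:
        methods.append("compare interface traffic statistics and RMON counts")
    if service_interrupted:
        if service_type == "E-LAN":
            # Looping back an E-LAN service causes a broadcast storm.
            methods.append("skip loopback (E-LAN)")
        else:
            methods.append("PHY-layer or MAC-layer loopback")
    # Assumption: OAM tests are always available as a further step.
    methods.append("OAM functions (ping, LB, traceroute, PW ping)")
    return methods


print(suggest_methods(False, True, True, "E-LAN"))
```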

 9

Troubleshooting Based on Unexpected Alarms

Checking the RMON Statistics of a Port

Tunnel-Based Traffic Statistics

PW-Based Traffic Statistics

Port-Based Inloops and Outloops

MPLS OAM: Continuity Detection

[Figure: network diagram with NB 1, NB 2, MSTP+ nodes, an MPLS core network (SDH or ETH), and the RNC, connected by ETH GE/FE and GE links. Alarm shown: MPLS_TUNNEL_LOCV.]

MPLS OAM: Forwarding Errors (Mismatch)

[Figure: network diagram with NB 1, NB 2, MSTP+ nodes, an MPLS core network, and the RNC, connected by ETH GE/FE and GE links. Alarms shown: MPLS_TUNNEL_LOCV and MPLS_TUNNEL_MISMATCH.]
MPLS OAM: Forwarding Errors (Mismerge)

[Figure: network diagram with NB 1, NB 2, NB3, MSTP+ nodes, an MPLS core network, and the RNC, connected by ETH GE/FE and GE links. Alarms shown: MPLS_TUNNEL_MISMERGE and MPLS_TUNNEL_LOCV.]

MPLS OAM: Defect Indication (BDI)

Reverse tunnels are bound.

[Figure: network diagram with NB 1, NB2, NB3, MSTP+ nodes, an MPLS core network, and the RNC, connected by ETH GE/FE and GE links. Alarms shown: MPLS_TUNNEL_BDI and MPLS_TUNNEL_LOCV.]

MPLS OAM: Defect Indication (FDI)

[Figure: network diagram with NB 1, NB2, NB3, MSTP+ nodes, an MPLS core network, and the RNC, connected by ETH GE/FE and GE links. Alarm shown: MPLS_TUNNEL_FDI.]

LSP Ping

[Figure: network diagram with NB 1, NB2, NB3, MSTP+ nodes, an MPLS core network, and the RNC, connected by ETH GE/FE and GE links.]

LSP TraceRoute

[Figure: network diagram with NB 1, NB2, NB3, MSTP+ nodes, an MPLS core network, and the RNC, connected by ETH GE/FE, GE, and STM-X links.]
MPLS Tunnel OAM Ping Test

A tunnel ping test is performed:

Test result:

 24
MPLS Tunnel OAM Traceroute Test

A tunnel traceroute test is performed:

Test result:

ETH OAM

The maintenance of Ethernet services is mainly implemented by using ETH OAM functions (defined in IEEE 802.1ag/ITU-T Y.1731). The ETH OAM functions include:
– Continuity check (CC), for proactive continuity check
– Loopback (LB), for on-demand continuity check
– Link trace (LT), for on-demand Ethernet link tracing
– Ethernet remote defect indication (RDI)

ETH OAM (CC)

[Figure: network diagram with NB 1, NB2, NB3, MSTP+ nodes, an MPLS core network, and the RNC, connected by ETH GE/FE, GE, and STM-X links. Legend: MEP, MD. Alarm shown: ETH_CFM_LOC.]

ETH OAM (LB)

[Figure: network diagram with NB 1, NB2, NB3, MSTP+ nodes, an MPLS core network, and the RNC, connected by ETH GE/FE, GE, and STM-X links. Legend: MEP, MD.]

ETH OAM (LT)

[Figure: network diagram with NB 1, NB2, NB3, MSTP+ nodes, an MPLS core network, and the RNC, connected by ETH GE/FE, GE, and STM-X links. Legend: MIP, MEP, MD.]

ETH OAM Test

An ETH OAM test is performed:

Test result:

4 Tunnel Fault Locating

Symptoms and Causes of Common Tunnel Faults

• Common Symptoms
  – Creating an MPLS tunnel fails, and therefore the services become unavailable.
  – An MPLS tunnel becomes faulty, and therefore the services are interrupted.
  – Protection switching fails, and therefore the services are interrupted, or packet loss or bit errors occur in the services.

• Possible Causes
  – Cause 1: Creating cross-connections fails.
  – Cause 2: The physical link that carries the faulty tunnel becomes faulty.
  – Cause 3: Protection switching fails.

Common Troubleshooting Methods for Tunnels

Cause 1: Creating cross-connections fails.
1. Check whether the IP addresses of the ports on different NEs belong to the same network segment. If yes, modify the IP addresses of the ports.
2. Check whether one label is allocated to multiple tunnels.
3. Check whether the number of tunnels reaches the maximum value. If yes, adjust tunnels or delete redundant tunnels.
Cause 2: The physical link that carries the faulty tunnel becomes faulty.
4. Check for the following alarms and handle them if any: HARD_BAD, ETH_LOS, MPLS_TUNNEL_BDI, MPLS_TUNNEL_FDI, and MPLS_TUNNEL_LOCV.
5. Check whether the opposite NE has any board fault or whether the opposite NE is reset. If yes, handle the fault on the opposite NE.
Cause 3: Protection switching fails.
6. Check whether MPLS APS protection switching has failed. If yes, handle the APS failure.
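
The three checks under Cause 1 lend themselves to a short offline script run against exported configuration data. The following Python sketch assumes a hypothetical data layout (interface addresses as strings such as `10.0.0.1/30`, and a mapping from tunnel name to label). The maximum number of tunnels is equipment-specific and must be supplied by the user.

```python
import ipaddress
from collections import Counter


def same_segment(addr_a, addr_b):
    """True if two interface addresses (e.g. '10.0.0.1/30') share a network segment."""
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    return net_a == net_b


def duplicate_labels(tunnel_labels):
    """Return the labels that are allocated to more than one tunnel.

    tunnel_labels maps a tunnel name to its label (hypothetical layout).
    """
    counts = Counter(tunnel_labels.values())
    return sorted(label for label, n in counts.items() if n > 1)


def tunnel_capacity_reached(tunnel_count, max_tunnels):
    """True if no further tunnel can be created on the NE."""
    return tunnel_count >= max_tunnels


print(same_segment("10.0.0.1/30", "10.0.0.2/30"))              # True
print(duplicate_labels({"t1": 100, "t2": 101, "t3": 100}))     # [100]
print(tunnel_capacity_reached(tunnel_count=512, max_tunnels=512))  # True
```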

 36

Common Tunnel-Related Alarms

• MPLS_TUNNEL_LOCV: loss of tunnel continuity

Possible Causes
  – Cause 1: The ingress node of the tunnel stops transmitting CV/FFD packets.
  – Cause 2: The physical link is faulty.
  – Cause 3: The board at the ingress node is being reset.
  – Cause 4: The service interface is configured incorrectly.
  – Cause 5: The CPU is fully loaded and cannot process ARP packets.

Handling Procedure
Cause 1: The ingress node of the tunnel stops transmitting CV/FFD packets.
1. Check whether the parameters of Detection Mode and Detection Packet Type take the same values on the two ends. If not, set the parameters to the same values.
2. Check whether the CV/FFD status of the ingress node is Disabled. If yes, change the status to Enabled.
Cause 2: The physical link is faulty.
Check whether the egress node reports the HARD_BAD, ETH_LOS, or ETH_LINK_DOWN alarm. If yes, clear the alarm.
Cause 3: The board at the ingress node is being reset.
Check whether the ingress node reports the COMMUN_FAIL alarm. If yes, clear the alarm.
Cause 4: The service interface is configured incorrectly.
Check whether the tunnel is configured on the correct interface based on the NE planning table. For example, check the IP address of the next hop.
Cause 5: The CPU is fully loaded and cannot process ARP packets.
On the NMS, check whether the CPU_BUSY alarm is reported. If yes, clear the CPU_BUSY alarm and then check whether the fault is rectified.
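
The handling procedure for MPLS_TUNNEL_LOCV can be walked through programmatically once the state of both tunnel ends has been collected. The sketch below is illustrative only: the `TunnelEnd` structure and the `Manual` value for Detection Mode are hypothetical, and Cause 4 is left out because it requires comparison with the NE planning table.

```python
from dataclasses import dataclass, field


@dataclass
class TunnelEnd:
    """Hypothetical snapshot of one tunnel endpoint, filled in from the NMS."""
    detection_mode: str   # value of the Detection Mode parameter
    packet_type: str      # value of Detection Packet Type: "CV" or "FFD"
    oam_enabled: bool     # CV/FFD status
    alarms: set = field(default_factory=set)


LINK_ALARMS = {"HARD_BAD", "ETH_LOS", "ETH_LINK_DOWN"}


def diagnose_locv(ingress, egress):
    """Return the causes from the handling procedure that match the snapshots."""
    findings = []
    if (ingress.detection_mode, ingress.packet_type) != (
        egress.detection_mode,
        egress.packet_type,
    ):
        findings.append("Cause 1: OAM detection parameters differ between the two ends")
    if not ingress.oam_enabled:
        findings.append("Cause 1: CV/FFD is disabled at the ingress node")
    if egress.alarms & LINK_ALARMS:
        findings.append("Cause 2: physical link alarm at the egress node")
    if "COMMUN_FAIL" in ingress.alarms:
        findings.append("Cause 3: board at the ingress node is being reset")
    if "CPU_BUSY" in (ingress.alarms | egress.alarms):
        findings.append("Cause 5: CPU is busy")
    return findings


ingress = TunnelEnd("Manual", "FFD", oam_enabled=False)
egress = TunnelEnd("Manual", "FFD", oam_enabled=True, alarms={"ETH_LOS"})
for finding in diagnose_locv(ingress, egress):
    print(finding)
```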

 38

MPLS_TUNNEL_MISMATCH

MPLS OAM packet mismatch

Possible Causes
1. The tunnel configuration is incorrect.
2. The physical link is misconnected.

Handling Procedure
1. Check the tunnel configuration at the ingress node and egress node. Specifically, check whether the TUNNELID, Ingress-ID, and Egress-ID take the same values at the two nodes.
2. Check whether the optical fiber is correctly connected. As shown in the figure below, it is intended to connect NE A and NE B, but NE C and NE B are connected by mistake. Because NE C has a tunnel with the same OUT label as the tunnel on NE A, NE B receives the OAM packets from NE C and reports the MPLS_TUNNEL_MISMATCH alarm.

MPLS_TUNNEL_MISMERGE

Both correct and wrong OAM packets are received.

Possible Causes
1. The physical link is misconnected.

Handling Procedure
1. Check whether there are misconnections as shown in the figure below.
It is intended to connect NE A and NE B, but NE C and NE B are connected by mistake.
Thus, NE B receives both correct packets and wrong packets.
When NE B receives the packets from NE C, NE B reports the MPLS_TUNNEL_MISMERGE alarm.
In addition, NE D reports the MPLS_TUNNEL_LOCV alarm.

 40

MPLS_TUNNEL_FDI

At a transit node, the physical connection of the IN port is in LinkDown state.

Possible Causes
1. The fiber or network cable is disconnected from the IN port or the IN port is abnormal.

Handling Procedure
1. Check the OAM status at the egress node and locate the faulty NE.
2. Check the alarms on the faulty NE. Check for the alarms about the IN port and laser at the transit node.
3. Alternatively, you can locate the faulty NE by using the MPLS traceroute function.

5 Protection Fault Locating

ETH_APS_LOST

Loss of APS protocol packets

Possible Causes
1. The APS protocol is disabled or no APS protection group is configured on the opposite NE.
2. The physical link fails.

Handling Procedure
1. Check whether an APS protection group is configured on the opposite NE. If yes, ensure that the APS
protocol is enabled.
2. Check whether the OAM configuration is correct.

ETH_APS_TYPE_MISMATCH
APS type mismatch

Possible Causes
1. The APS configuration (such as the APS protection type and switching mode) is different at the two ends.

Handling Procedure
1. Check whether the APS protection group is configured identically at the two ends. A mismatch exists if, for example, the APS protection type is 1+1 at one end and 1:1 at the other end.

Switching Failure

When the working channel fails, services are not switched to the protection channel and thus services are
interrupted.

Possible Causes
1. No APS protection group is configured.
2. The APS protocol is disabled.
3. The APS protection type is different at the two ends.

Handling Procedure
1. Locate the fault by using the alarm-based methods described previously.

Switching Time Exceeds the Threshold

When the working channel fails, services are switched to the protection channel, but the
switching time exceeds 50 ms (the requirement for carrier-level service protection).

Possible Causes
1. Hold-off time is set.
2. OAM packets are not FFD packets with a transmission period of 3.3 ms.

Handling Procedure
1. Check whether hold-off time is set for the APS protection group. If yes, subtract the hold-off time from the measured switching time before comparing it with the 50 ms threshold.
2. Check whether OAM packets on the working and protection channels are FFD packets with a transmission period of 3.3 ms. If not, the switching time may exceed 50 ms.
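
The arithmetic behind these two checks can be sketched as follows. The function names are hypothetical. The rule that a continuity defect is declared after three consecutive missing OAM packets, and the 1 s period used for CV packets in the example, are assumptions of this sketch and are not stated in the slides.

```python
THRESHOLD_MS = 50.0  # carrier-level switching requirement quoted above


def effective_switching_time_ms(measured_ms, hold_off_ms=0.0):
    """Switching time with the configured hold-off time subtracted."""
    return measured_ms - hold_off_ms


def meets_threshold(measured_ms, hold_off_ms=0.0):
    """True if the switching time, net of hold-off, is within 50 ms."""
    return effective_switching_time_ms(measured_ms, hold_off_ms) <= THRESHOLD_MS


def detection_time_ms(period_ms, missed_periods=3):
    """Time to detect loss of continuity.

    Assumption: the defect is declared after `missed_periods` consecutive
    OAM packets are missing.
    """
    return period_ms * missed_periods


print(meets_threshold(measured_ms=130, hold_off_ms=100))  # True
print(meets_threshold(measured_ms=130))                   # False
print(detection_time_ms(3.3) < THRESHOLD_MS)              # True: FFD at 3.3 ms
print(detection_time_ms(1000) < THRESHOLD_MS)             # False: assumed 1 s CV period
```

Under these assumptions, detection alone already takes far longer than 50 ms when the OAM period is long, which is why the second check matters.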

 46

Possible Causes of APS Faults
Cause 1: The settings of the APS protection group differ between the two ends. Check whether the
ETH_APS_PATH_MISMATCH or ETH_APS_TYPE_MISMATCH alarm occurs. If yes, handle the alarm.

Cause 2: The APS protocol is disabled. Check whether the ETH_APS_LOST or ETH_APS_SWITCH_FAIL alarm
occurs. If yes, handle the alarm.

Cause 3: The fiber or electrical cable is misconnected. Check and ensure that the fiber or electrical cable is
connected correctly.

Cause 4: The board that carries the protection channel reports hardware-related alarms, and cannot send APS
frames. Check whether the board that carries the protection channel reports the HARD_BAD, COMMUN_FAIL, or
BUS_ERR alarm. If yes, clear the alarm and check whether the fault is rectified.

Cause 5: The system reports clock-related alarms. Check whether the system reports the TR_LOC, SYNC_C_LOS,
or LTI alarm. If yes, clear the alarm and check whether the fault is rectified.

Cause 6: The protection tunnel is faulty. Check whether the protection channel reports any tunnel-level alarm. If yes,
clear the alarm and check whether the fault is rectified.

 47

Common APS-Related Alarms

Symptom: The APS protection group is configured incorrectly or APS frames are not received.
Alarms: ETH_APS_PATH_MISMATCH, ETH_APS_TYPE_MISMATCH, ETH_APS_LOST, ETH_APS_SWITCH_FAIL
Possible Causes
Cause 1: The opposite NE is not configured with APS protection.
Cause 2: The settings of the APS protection group differ between the two ends.
Cause 3: The APS protocol is disabled.
Cause 4: The services on the protection channel are interrupted.
Handling Procedure
Cause 1: The opposite NE is not configured with APS protection.
On the NMS, check whether the opposite NE is configured with APS protection. If not, create an APS protection
group with the same settings as those on the local NE. Then, enable the APS protocol.
Cause 2: The settings of the APS protection group differ between the two ends.
On the NMS, check whether the APS settings at the two ends are the same. If not, modify the settings and ensure
that the APS settings at the two ends are the same.
Cause 3: The APS protocol is disabled.
Check whether the APS protocol is disabled at the two ends. If the APS protocol is enabled at only one end,
disable the APS protocol at that end and then enable the APS protocol at the two ends.
Cause 4: The services on the protection channel are interrupted.
Check whether the protection channel reports an alarm about signal loss or signal degrade, such as ETH_LOS. If
yes, clear the alarm before you proceed.

 48

• ETH_APS_SWITCH_FAIL: protection switching failure

Possible Causes
  – The settings of the APS protection group differ between the two ends.
Handling Procedure
The settings of the APS protection group differ between the two ends.
On the NMS, check whether the APS settings at the two ends are the same. If not, modify the settings and ensure that the APS settings at the two ends are the same. Then, disable and enable the APS protocol at the two ends.

• ETH_APS_TYPE_MISMATCH: protection type mismatch

Possible Causes
  – Cause 1: The protection type is different.
  – Cause 2: The switching mode is different.
  – Cause 3: The revertive mode is different.
Handling Procedure
The protection type, switching mode, or revertive mode of the APS protection group is different at the two ends.
On the NMS, check whether the APS settings at the two ends are the same. If not, modify the settings and ensure that the APS settings at the two ends are the same. Then, disable and enable the APS protocol at the two ends.
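
Comparing the APS settings of the two ends is a simple field-by-field diff. The Python sketch below assumes the settings have been exported into dictionaries. The key names and the example values are hypothetical and do not correspond to actual NMS parameter names.

```python
APS_KEYS = ("protection_type", "switching_mode", "revertive_mode")


def aps_mismatches(local, remote, keys=APS_KEYS):
    """Return {setting: (local value, remote value)} for every setting that differs."""
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }


local_end = {
    "protection_type": "1+1",
    "switching_mode": "dual-ended",
    "revertive_mode": "revertive",
}
remote_end = {
    "protection_type": "1:1",
    "switching_mode": "dual-ended",
    "revertive_mode": "revertive",
}
print(aps_mismatches(local_end, remote_end))
# {'protection_type': ('1+1', '1:1')}
```

An empty result means that the three settings agree, in which case other causes should be examined.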

 49

6 Common Problems with the MSTP+ Equipment

Common Problems with the MSTP+ Equipment

FAQs
• Q: When you attempt to set a port to exclusive use or to a null port, the system prompts that the port is occupied by services. Why?
A: The port is occupied as long as a VLAN ID of the port is in use. If a port transmits any Ethernet service, the service uses a VLAN ID of the port. If the Ethernet DCN is enabled at a port, DCN packets use a VLAN ID of the port.

• Q: When you attempt to change the attribute of a port from Layer-3 attribute to Layer-2 attribute, the system prompts that the port is occupied by services. Why?
A: A port of Layer-3 attribute can be configured with tunnels and PWs for carrying far-end services. Before changing the attribute to Layer-2 attribute, the user needs to delete all services, PWs, and tunnels configured at the port and disable the MPLS protocol.

• Q: A port of an MPLS tunnel is configured as a LAG member, but the traffic at the port cannot be shared. Why?
A: By default, traffic is shared among LAG members according to the MAC addresses of packets. At a port of an MPLS tunnel, service packets are encapsulated before transmission. Therefore, all service packets have the same source and destination MAC addresses and thus the traffic cannot be shared among LAG members. This problem can be solved by adopting the load sharing mode based on MPLS labels, because the services are distinguished by PW labels or tunnel labels.
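
The LAG behavior in the last answer can be reproduced with a toy hash. This Python sketch is only a model: real equipment uses its own hash algorithm, and the CRC32-based MAC hash, the modulo on the label, and the MAC addresses and labels below are all made up for illustration.

```python
import zlib


def member_by_mac(src_mac, dst_mac, members):
    """Pick a LAG member from the outer MAC addresses (toy hash)."""
    key = f"{src_mac}-{dst_mac}".encode()
    return zlib.crc32(key) % members


def member_by_label(label, members):
    """Pick a LAG member from the PW or tunnel label (toy hash)."""
    return label % members


MEMBERS = 2
# After MPLS encapsulation, every packet carries the same outer MAC pair,
# while the PW labels still differ per service.
outer_src, outer_dst = "00:e0:fc:00:00:01", "00:e0:fc:00:00:02"
pw_labels = range(100, 108)

mac_choice = {member_by_mac(outer_src, outer_dst, MEMBERS) for _ in pw_labels}
label_choice = {member_by_label(label, MEMBERS) for label in pw_labels}

print(len(mac_choice))    # 1: all traffic lands on a single member
print(len(label_choice))  # 2: traffic is spread over both members
```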

 51

• Q: When an MSTP+ NE transmits a large number of CS7 packets, the NE may become unreachable to the NMS. Why?
A: The MSTP+ equipment adopts the inband DCN, which means that DCN packets and service packets share the fixed bandwidth of a link. To ensure that protocol packets can be transmitted in most cases, the highest priority, CS7, is assigned to protocol packets. If CS7 service packets fully utilize the bandwidth of a link, protocol packets are discarded at random. As a result, any protocol, such as DCN, IS-IS, and LAG, may fail temporarily. When the traffic becomes lighter, the failed protocol can recover.

• Q: The services received from an FE interface are transmitted through an MPLS tunnel, but the rate of the services is lower than 100 Mbit/s. Why?
A: Before entering an MPLS tunnel, services are encapsulated and the length of each packet increases by 22 bytes. Because the encapsulated traffic is larger than the received traffic, packet loss occurs when the encapsulated services go through an FE port at the ingress node. At the egress node, the encapsulation bytes are stripped, and thus the service rate becomes lower than 100 Mbit/s.
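
The effect of the 22-byte encapsulation on the achievable service rate can be estimated with a one-line formula. The sketch below is a simplification: it assumes that the FE port is filled entirely by encapsulated packets of one size and it ignores the Ethernet preamble and inter-frame gap, so the results are approximate.

```python
def payload_rate_mbps(frame_bytes, line_rate_mbps=100.0, encap_bytes=22):
    """Approximate service rate that survives a line of the given rate.

    Each frame grows by encap_bytes inside the tunnel, so only the fraction
    frame_bytes / (frame_bytes + encap_bytes) of the line carries the service.
    """
    return line_rate_mbps * frame_bytes / (frame_bytes + encap_bytes)


for size in (64, 512, 1518):
    print(size, round(payload_rate_mbps(size), 1))
```

The relative overhead is largest for short frames, so the shortfall against 100 Mbit/s is most visible in tests with small packets.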

 52

• Q: The configurable maximum transmission unit (MTU) range of a port is 46 to 9600 bytes, but the effective range is 960 to 9600 bytes. Why?
A: If the MTU is lower than 960, DCN packets are discarded. Therefore, the effective MTU range is 960 to 9600.
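
A planning script could guard against MTU values that are accepted by the port but break the DCN. The Python sketch below uses the two ranges from the answer above. The function name and the returned messages are hypothetical.

```python
CONFIGURABLE_MTU = range(46, 9601)  # 46 to 9600 inclusive
DCN_SAFE_MIN_MTU = 960              # below this, DCN packets are discarded


def check_mtu(mtu):
    """Classify a planned MTU value against the ranges given above."""
    if mtu not in CONFIGURABLE_MTU:
        return "invalid"
    if mtu < DCN_SAFE_MIN_MTU:
        return "accepted, but DCN packets would be discarded"
    return "ok"


for value in (45, 500, 960, 9600):
    print(value, check_mtu(value))
```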

 53

Thank You
www.huawei.com
