Troubleshooting The MSTP+
OSN 7500/3500/1500
Contents
Packet Networking Solutions of the MSTP+ Equipment
Convergence
... live network. The access layer and convergence node ...
[Figure labels: BTS (E1), NodeB (FE).]
Hybrid networking of the MSTP+ equipment and the OptiX PTN equipment
If certain equipment on the live network is not ready ... upgraded to the MSTP+ equipment, which provides
the packet switching kernel, uses the 10GE board, and thus composes the 10GE ring with the OptiX PTN 3900.
At the access layer, the OptiX OSN 1500 and the OptiX Metro 1000 can be networked with the OptiX
PTN 910/950/1900 to build a GE ring that is ...
[Figure labels: RNC, BSC, T2000, PTN 3900, OSN 3500, OSN 1500, Metro 1000, PTN 910/950/1900, STM-16/64, STM-1/4, GE ring, BTS (E1), NodeB (FE).]
Common Procedure of Fault Locating
[Figure: basic topology with four NEs, #1 to #4.]
... where packet loss occurs. Just as a doctor needs to find where a patient hurts before writing a prescription, an engineer needs to
locate the fault in a network before rectifying it.
Take network C of 3G equipment as an example. Pure packet services are transmitted from the NodeB to the RNC. The basic
topology shown above contains four NEs: two transport NEs and two wireless transmission NEs. When a fault occurs, it can be
located by checking the NEs one by one along the direction of the data stream. On the transport NEs, the faulty NE can be identified
by checking the port performance statistics and by using the OAM functions. After the faulty NE is identified, the fault can be
rectified according to the equipment configuration and service requirements.
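The hop-by-hop check described above can be sketched in a few lines. This is only an illustration: the checkpoint labels and packet counts are made up, and on real equipment the counts would be read from the port performance statistics on the NMS.

```python
def locate_loss(checkpoints, tolerance=0):
    """Return the first pair of consecutive checkpoints where packets go missing.

    checkpoints: list of (label, packet_count) in the direction of the data stream.
    Returns (upstream_label, downstream_label, lost_packets) or None if no loss.
    """
    for (prev_label, prev_count), (label, count) in zip(checkpoints, checkpoints[1:]):
        lost = prev_count - count
        if lost > tolerance:
            return prev_label, label, lost
    return None


# Hypothetical counters collected along NE #1 -> NE #4 (illustrative values only).
checkpoints = [
    ("NE#1 tx", 10000),
    ("NE#2 rx", 10000),
    ("NE#2 tx", 10000),
    ("NE#3 rx", 9400),
    ("NE#3 tx", 9400),
    ("NE#4 rx", 9400),
]

print(locate_loss(checkpoints))
```

In this made-up data set the loss appears between the transmit side of NE #2 and the receive side of NE #3, so the link between the two transport NEs would be checked first.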
Fault Locating
The following methods are commonly used to locate a fault on the MSTP+ equipment:
Alarms on the faulty NE
For most faults, an NE reports the corresponding alarms, which guide the user in troubleshooting. For details
about alarms and alarm handling, see the Feature Description in the documents delivered with the products.
Traffic statistics
When the traffic becomes abnormal and no relevant alarms are reported, traffic statistics can be used to narrow
down the fault location. The fault can then be located exactly with other measures. Traffic statistics include
interface traffic statistics and RMON performance counts.
Loopback (LB) tests
When a service is interrupted on a link section, loopbacks can be performed along the service trail to locate the
faulty node. The fault can then be located exactly with other measures. The commonly performed loopbacks
are the PHY-layer loopback and the MAC-layer loopback. Do not loop back E-LAN services; otherwise, a broadcast
storm occurs.
OAM functions
OAM functions such as MPLS OAM, PW OAM, and ETH OAM are available at the different service layers. Tools
such as ping, LB, traceroute, and PW ping are available for different troubleshooting purposes.
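As a rough illustration of how the four methods relate, the sketch below picks a starting method from the symptoms. The function name, the argument names, and the order of the checks are assumptions made for this example; only the conditions themselves (alarms first, statistics when no alarm is reported, no loopback on E-LAN services) come from the list above.

```python
def suggest_method(alarms_reported, traffic_abnormal, service_interrupted, service_type):
    """Suggest a first fault-locating method (hypothetical helper, not an NMS function)."""
    if alarms_reported:
        # Most faults raise alarms, so start from them.
        return "alarms"
    if service_interrupted:
        # Looping back an E-LAN service causes a broadcast storm, so use OAM instead.
        if service_type.upper() == "E-LAN":
            return "OAM"
        return "loopback"
    if traffic_abnormal:
        # Abnormal traffic without alarms: narrow down with traffic statistics.
        return "traffic statistics"
    return "OAM"


print(suggest_method(False, False, True, "E-LAN"))
```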
Troubleshooting Based on Unexpected Alarms
Checking the RMON Statistics of a Port
Tunnel-Based Traffic Statistics
PW-Based Traffic Statistics
Port-Based Inloops and Outloops
MPLS OAM: Continuity Detection
[Figure: NB 1 and NB2 reach the RNC through MSTP+ NEs over the core network. Alarm shown: MPLS_TUNNEL_LOCV.]
MPLS OAM: Forwarding Errors (Mismatch)
[Figure: NB 1 and NB2 reach the RNC through MSTP+ NEs over the MPLS core network. Alarms shown: MPLS_TUNNEL_LOCV and MPLS_TUNNEL_MISMATCH.]
MPLS OAM: Forwarding Errors (Mismerge)
[Figure: NB 1, NB2, and NB3 reach the RNC through MSTP+ NEs over the MPLS core network. Alarms shown: MPLS_TUNNEL_MISMERGE and MPLS_TUNNEL_LOCV.]
MPLS OAM: Defect Indication (BDI)
MPLS_TUNNEL_BDI
MPLS OAM: Defect Indication (FDI)
[Figure: NB 1, NB2, and NB3 reach the RNC through MSTP+ NEs over the MPLS core network. Alarm shown: MPLS_TUNNEL_FDI.]
LSP Ping
[Figure: LSP ping in a network where NB 1, NB2, and NB3 reach the RNC through MSTP+ NEs over the MPLS core network.]
LSP TraceRoute
[Figure: LSP traceroute in a network where NB 1, NB2, and NB3 reach the RNC through MSTP+ NEs over the MPLS core network.]
MPLS Tunnel OAM Ping Test
MPLS Tunnel OAM Traceroute Test
ETH OAM
ETH OAM (CC)
[Figure: ETH OAM continuity check between MEPs in an MD; NB 1, NB2, and NB3 reach the RNC through MSTP+ NEs over the core network. Alarm shown: ETH_CFM_LOC.]
ETH OAM (LB)
[Figure: ETH OAM loopback between MEPs in an MD; NB 1, NB2, and NB3 reach the RNC through MSTP+ NEs over the core network.]
ETH OAM (LT)
[Figure: ETH OAM link trace through a MIP between MEPs in an MD; NB 1, NB2, and NB3 reach the RNC through MSTP+ NEs over the core network.]
ETH OAM Test
Symptoms and Causes of Common Tunnel Faults
Common Symptoms
Creating an MPLS tunnel fails, and therefore the services become unavailable.
An MPLS tunnel becomes faulty, and therefore the services are interrupted.
Protection switching fails, and therefore the services are interrupted, or packet
loss or bit errors occur in the services.
Possible Causes
Cause 1: Creating cross-connections fails.
Cause 2: The physical link that carries the faulty tunnel becomes faulty.
Cause 3: Protection switching fails.
Common Troubleshooting Methods for Tunnels
Common Tunnel-Related Alarms
Handling Procedure
Cause 3: The board which functions as the ingress node is being reset.
Check whether the ingress node reports the COMMUN_FAIL alarm. If yes, clear the alarm.
Cause 4: The service interface is configured incorrectly.
Check whether the tunnel is configured on the correct interface based on the NE planning
table. For example, check the IP address of the next hop.
Cause 5: The CPU is fully loaded and cannot process ARP packets.
On the NMS, check whether the CPU_BUSY alarm is reported. If yes, clear the CPU_BUSY
alarm and then check whether the fault is rectified.
MPLS_TUNNEL_MISMATCH
Possible Causes
1. The tunnel configuration is incorrect.
2. The physical link is misconnected.
Handling Procedure
1. Check the tunnel configuration at the ingress node and the egress node. Specifically, check whether the tunnel ID, ingress ID, and egress ID
take the same values at the two nodes.
2. Check whether the optical fiber is correctly connected. As shown in the figure below, NE A is supposed to be connected to NE B. NE C,
however, has a tunnel with the same OUT label as the tunnel on NE A. If NE C is connected to NE B by mistake, NE B therefore reports the
MPLS_TUNNEL_MISMATCH alarm.
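Step 1 amounts to a field-by-field comparison of the two tunnel endpoints. A minimal sketch follows; the class, the field names, and the sample values are hypothetical and only mirror the three parameters named in the step.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TunnelEndConfig:
    """Tunnel parameters as configured on one node (illustrative field names)."""
    tunnel_id: int
    ingress_id: str
    egress_id: str


def mismatched_fields(ingress_cfg, egress_cfg):
    """Return the names of the fields whose values differ between the two nodes."""
    a, b = asdict(ingress_cfg), asdict(egress_cfg)
    return [name for name in a if a[name] != b[name]]


# Made-up values: the egress ID was entered incorrectly on the egress node.
at_ingress = TunnelEndConfig(tunnel_id=100, ingress_id="10.0.0.1", egress_id="10.0.0.2")
at_egress = TunnelEndConfig(tunnel_id=100, ingress_id="10.0.0.1", egress_id="10.0.0.3")

print(mismatched_fields(at_ingress, at_egress))
```

An empty list means the configuration is consistent, in which case the fiber connection in step 2 is the next thing to check.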
MPLS_TUNNEL_MISMERGE
Possible Causes
1. The physical link is misconnected.
Handling Procedure
1. Check whether there are misconnections as shown in the figure below.
It is intended to connect NE A and NE B, but NE C and NE B are connected by mistake.
Thus, NE B receives both correct packets and wrong packets.
When NE B receives the packets from NE C, NE B reports the MPLS_TUNNEL_MISMERGE alarm.
In addition, NE D reports the MPLS_TUNNEL_LOCV alarm.
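The difference between the LOCV, MISMATCH, and MISMERGE alarms can be summarized by what the sink node receives. The sketch below is a simplification based only on the descriptions in this document: it ignores detection timers, and the function name and the use of NE names as source identifiers are assumptions for illustration.

```python
def classify_sink(expected_source, received_sources):
    """Classify the OAM state at a tunnel sink from the sources of the packets it receives."""
    received = set(received_sources)
    expected_seen = expected_source in received
    unexpected_seen = bool(received - {expected_source})

    if expected_seen and unexpected_seen:
        # Correct and wrong packets arrive together.
        return "MPLS_TUNNEL_MISMERGE"
    if unexpected_seen:
        # Only wrong packets arrive.
        return "MPLS_TUNNEL_MISMATCH"
    if not expected_seen:
        # Nothing arrives at all.
        return "MPLS_TUNNEL_LOCV"
    return "OK"


# NE B expects packets from NE A but also receives packets from NE C.
print(classify_sink("NE A", ["NE A", "NE C"]))
# NE D expects packets from NE C but receives nothing.
print(classify_sink("NE C", []))
```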
MPLS_TUNNEL_FDI
Possible Causes
1. The fiber or network cable is disconnected from the IN port or the IN port is abnormal.
Handling Procedure
1. Check the OAM status at the egress node and locate the faulty NE.
2. Check the alarms on the faulty NE. Check for the alarms about the IN port and laser at the transmit node.
3. Alternatively, you can locate the faulty NE by using the MPLS traceroute function.
ETH_APS_LOST
Possible Causes
1. The APS protocol is disabled or no APS protection group is configured on the opposite NE.
2. The physical link fails.
Handling Procedure
1. Check whether an APS protection group is configured on the opposite NE. If yes, ensure that the APS
protocol is enabled.
2. Check whether the OAM configuration is correct.
ETH_APS_TYPE_MISMATCH
APS type mismatch
Possible Causes
1. The APS configuration (such as the APS protection type and switching mode) is different at the two ends.
Handling Procedure
1. Check whether the APS protection group is configured identically at the two ends. A mismatch exists if, for
example, the APS protection type is 1+1 at one end and 1:1 at the other end.
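The comparison can be sketched as a dictionary diff. The keys and the switching-mode values below are illustrative assumptions, not parameter names taken from the NMS.

```python
def aps_differences(local, remote):
    """Return {setting: (local_value, remote_value)} for every setting that differs."""
    keys = sorted(set(local) | set(remote))
    return {k: (local.get(k), remote.get(k)) for k in keys if local.get(k) != remote.get(k)}


# Hypothetical settings read from the two ends of the protection group.
local_end = {"protection_type": "1+1", "switching_mode": "dual-ended"}
remote_end = {"protection_type": "1:1", "switching_mode": "dual-ended"}

print(aps_differences(local_end, remote_end))
```

Any non-empty result points to a setting that must be aligned before the alarm can clear.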
Switching Failure
When the working channel fails, services are not switched to the protection channel and thus services are
interrupted.
Possible Causes
1. No APS protection group is configured.
2. The APS protocol is disabled.
3. The APS protection type is different at the two ends.
Handling Procedure
1. Locate the fault by using the previous alarm detection methods.
Switching Time Exceeds the Threshold
When the working channel fails, services are switched to the protection channel, but the
switching time exceeds 50 ms (the requirement for carrier-level service protection).
Possible Causes
1. A hold-off time is set.
2. The OAM packets are not FFD packets with a transmission period of 3.3 ms.
Handling Procedure
1. Check whether a hold-off time is set for the APS protection group. If yes, subtract the hold-off time from the
measured switching time.
2. Check whether the OAM packets on the working and protection channels are FFD packets with a transmission
period of 3.3 ms. If not, the switching time may exceed 50 ms.
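The two checks come down to simple arithmetic, sketched below. The assumption that a defect is declared after three missed OAM periods is not stated in this document and is used here only to show why a long transmission period breaks the 50 ms budget; the sample times are made up.

```python
CARRIER_LIMIT_MS = 50.0  # switching-time requirement quoted above


def meets_limit(measured_ms, hold_off_ms=0.0):
    """Compare the switching time, net of any configured hold-off time, with the limit."""
    return (measured_ms - hold_off_ms) <= CARRIER_LIMIT_MS


def detection_time_ms(period_ms, missed_periods=3):
    """Time to detect a failure, ASSUMING detection after `missed_periods` lost packets."""
    return period_ms * missed_periods


# A measured 140 ms switch with a 100 ms hold-off is still within the limit.
print(meets_limit(140.0, hold_off_ms=100.0))
# A 3.3 ms period leaves most of the 50 ms budget for the switch itself ...
print(detection_time_ms(3.3) < CARRIER_LIMIT_MS)
# ... whereas a 1 s period alone exceeds the budget.
print(detection_time_ms(1000.0) < CARRIER_LIMIT_MS)
```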
Possible Causes of APS Faults
Cause 1: The settings of the APS protection group differ between the two ends. Check whether the
ETH_APS_PATH_MISMATCH or ETH_APS_TYPE_MISMATCH alarm occurs. If yes, handle the alarm.
Cause 2: The APS protocol is disabled. Check whether the ETH_APS_LOST or ETH_APS_SWITCH_FAIL alarm
occurs. If yes, handle the alarm.
Cause 3: The fiber or electrical cable is misconnected. Check and ensure that the fiber or electrical cable is
connected correctly.
Cause 4: The board that carries the protection channel reports hardware-related alarms, and cannot send APS
frames. Check whether the board that carries the protection channel reports the HARD_BAD, COMMUN_FAIL, or
BUS_ERR alarm. If yes, clear the alarm and check whether the fault is rectified.
Cause 5: The system reports clock-related alarms. Check whether the system reports the TR_LOC, SYNC_C_LOS,
or LTI alarm. If yes, clear the alarm and check whether the fault is rectified.
Cause 6: The protection tunnel is faulty. Check whether the protection channel reports any tunnel-level alarm. If yes,
clear the alarm and check whether the fault is rectified.
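The cause list above can be read as a lookup from active alarms to probable causes. The sketch below encodes exactly that mapping. Cause 3 has no alarm in the list and is therefore absent, and treating every alarm whose name starts with MPLS_TUNNEL_ as a tunnel-level alarm (cause 6) is an assumption made for this example.

```python
# Alarm names and cause numbers as given in the cause list.
ALARM_TO_CAUSE = {
    "ETH_APS_PATH_MISMATCH": 1,
    "ETH_APS_TYPE_MISMATCH": 1,
    "ETH_APS_LOST": 2,
    "ETH_APS_SWITCH_FAIL": 2,
    "HARD_BAD": 4,
    "COMMUN_FAIL": 4,
    "BUS_ERR": 4,
    "TR_LOC": 5,
    "SYNC_C_LOS": 5,
    "LTI": 5,
}
TUNNEL_ALARM_PREFIX = "MPLS_TUNNEL_"  # assumption: marks a tunnel-level alarm


def probable_causes(active_alarms):
    """Return the sorted cause numbers suggested by the alarms currently reported."""
    causes = set()
    for alarm in active_alarms:
        if alarm in ALARM_TO_CAUSE:
            causes.add(ALARM_TO_CAUSE[alarm])
        elif alarm.startswith(TUNNEL_ALARM_PREFIX):
            causes.add(6)
    return sorted(causes)


print(probable_causes(["LTI", "ETH_APS_LOST"]))
```

If the result is empty, cause 3 (a misconnected fiber or cable) remains to be checked by hand.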
Common APS-Related Alarms
Symptom: The APS protection group is configured incorrectly or APS frames are not received.
Alarms: ETH_APS_PATH_MISMATCH, ETH_APS_TYPE_MISMATCH, ETH_APS_LOST, ETH_APS_SWITCH_FAIL
Possible Causes
Cause 1: The opposite NE is not configured with APS protection.
Cause 2: The settings of the APS protection group differ between the two ends.
Cause 3: The APS protocol is disabled.
Cause 4: The services on the protection channel are interrupted.
Handling Procedure
Cause 1: The opposite NE is not configured with APS protection.
On the NMS, check whether the opposite NE is configured with APS protection. If not, create an APS protection
group with the same settings as those on the local NE. Then, enable the APS protocol.
Cause 2: The settings of the APS protection group differ between the two ends.
On the NMS, check whether the APS settings at the two ends are the same. If not, modify the settings and ensure
that the APS settings at the two ends are the same.
Cause 3: The APS protocol is disabled.
Check whether the APS protocol is disabled at the two ends. If the APS protocol is enabled at only one end,
disable the APS protocol at that end and then enable the APS protocol at the two ends.
Cause 4: The services on the protection channel are interrupted.
Check whether the protection channel reports an alarm about signal loss or signal degrade, such as ETH_LOS. If
yes, clear the alarm before you proceed.
Common Problems with the MSTP+ Equipment
FAQs
Q: In an attempt to set a port to be occupied exclusively or to a null port, the system prompts that the port is occupied by
services. Why?
A: If a port transmits any Ethernet service, the service uses a VLAN ID of the port. If the Ethernet DCN is enabled at a port, DCN
packets use a VLAN ID of the port.
Q: In an attempt to change the attribute of a port from Layer-3 attribute to Layer-2 attribute, the system prompts that the port is
occupied by services. Why?
A: A port of Layer-3 attribute can be configured with tunnels and PWs for carrying far-end services. Before changing the attribute
to Layer-2 attribute, the user needs to delete all services, PWs, and tunnels configured at the port and disable the MPLS
protocol.
Q: A port of an MPLS tunnel is configured as a LAG member, but the traffic at the port cannot be shared. Why?
A: By default, traffic is shared among LAG members according to the MAC addresses of packets. At a port of an MPLS tunnel,
service packets are encapsulated before transmission. Therefore, all service packets have the same source and destination
MAC addresses and thus the traffic cannot be shared among LAG members. This problem can be solved by adopting the load
sharing mode based on MPLS labels, because the services are distinguished by PW labels or tunnel labels.
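The effect can be reproduced with a toy hash. In the sketch below, a plain modulo stands in for the hardware hash, and the MAC addresses and PW labels are made up. With MAC-based hashing every packet of the tunnel lands on the same LAG member, whereas label-based hashing spreads the packets.

```python
def pick_member(key, member_count):
    """Toy stand-in for the hardware hash: map a key to a LAG member index."""
    return key % member_count


def mac_key(src_mac, dst_mac):
    """Build a hash key from the outer source and destination MAC addresses."""
    return int(src_mac.replace(":", "") + dst_mac.replace(":", ""), 16)


# After encapsulation, all packets carry the same outer MACs but different PW labels.
packets = [
    ("00:11:22:33:44:55", "66:77:88:99:aa:bb", label)
    for label in (16, 17, 18, 19)
]

by_mac = {pick_member(mac_key(src, dst), 2) for src, dst, _ in packets}
by_label = {pick_member(label, 2) for _, _, label in packets}

print(len(by_mac), len(by_label))
```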
Q: When an MSTP+ NE transmits a large number of CS7 packets, the NE may become unreachable to the
NMS. Why?
A: The MSTP+ equipment adopts the inband DCN, which means that DCN packets and service packets share
the fixed bandwidth of a link. To ensure that protocol packets can be transmitted in most cases, the highest
priority, CS7, is assigned to protocol packets. If the bandwidth of a link is fully utilized by CS7 traffic, however,
protocol packets are discarded at random. As a result, any protocol, such as DCN, IS-IS, or LAG, may fail
temporarily. When the traffic becomes lighter, the failed protocol recovers.
Q: The services received from an FE interface are transmitted through an MPLS tunnel, but the rate of the
services is lower than 100 Mbit/s. Why?
A: Before entering an MPLS tunnel, services are encapsulated and the length of each packet increases by 22
bytes. Due to a larger data amount, packet loss occurs when the encapsulated services go through an FE port
at the ingress node. At the egress node, the encapsulation bytes of the services are stripped and thus the
service rate becomes lower than 100 Mbit/s.
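A back-of-the-envelope calculation shows the size of the effect. The sketch assumes that the FE port is the only bottleneck and ignores the preamble and inter-frame gap; the frame lengths are example values.

```python
ENCAP_BYTES = 22  # per-packet encapsulation overhead quoted above


def max_service_rate_mbps(frame_len, line_rate_mbps=100.0, encap=ENCAP_BYTES):
    """Service rate left after stripping the encapsulation from a fully loaded port."""
    return line_rate_mbps * frame_len / (frame_len + encap)


for frame_len in (64, 512, 1518):
    print(frame_len, round(max_service_rate_mbps(frame_len), 2))
```

The shorter the frames, the larger the share of the line rate consumed by the 22 encapsulation bytes, so the shortfall is most visible with small packets.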
Q: The nominal maximum transmission unit (MTU) range of a port is 46 to 9600, but the actual range is 960 to 9600. Why?
A: If the MTU is lower than 960, DCN packets are discarded. Therefore, the usable MTU range is 960 to 9600.
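A pre-check of a planned MTU value could look like the sketch below. The function name and the returned messages are hypothetical; the limits are the ones quoted in the answer.

```python
NOMINAL_MTU_RANGE = (46, 9600)  # range accepted by the port
DCN_MIN_MTU = 960               # below this value, DCN packets are discarded


def check_mtu(mtu):
    """Classify a planned MTU value against the nominal and the usable range."""
    low, high = NOMINAL_MTU_RANGE
    if not low <= mtu <= high:
        return "out of range"
    if mtu < DCN_MIN_MTU:
        return "accepted, but DCN packets are discarded"
    return "ok"


for value in (40, 500, 960, 9600):
    print(value, check_mtu(value))
```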