
Maintenance Experience
Bimonthly for Data Products No. 12 Issue 262, August 2011

Maintenance Experience Editorial Committee
Director: Qiu Weizhao
Deputy Director: Huang Dabin
Editors: Fang Xi, Wang Zhaozheng, Xu Xinyong, Zhang Jian, Zhang Jiebin, Zhao Cen, Zhou Guifeng, Xiao Shuqing, Ge Jun, Zhao Haitao, Huang Ying, Xu Zhijun, Jiang Haijun, Dong Yemin, Dong Wenbin
Technical Senior Editors: Hu Jia, Tao Minjuan, Zhang Jianping
Executive Editor: Zhang Fan

Maintenance Experience Newsroom
Address: ZTE Plaza, No. 55, Hi-tech Road South, ShenZhen, P.R.China
Postal code: 518057
Contact: Ning Jiating
Tel: +86-755-26776049
Fax: +86-755-26772236
Document support Email: doc@zte.com.cn
Technical support website: http://ensupport.zte.com.cn

Preface

In this issue of ZTE's Maintenance Experience, we continue to pass on various field reports and resolutions that are gathered by ZTE engineers and technicians around the world. The content presented in this issue is as follows:

A Special Document
Nine Maintenance Cases of ZTE's Data Products

Have you examined your service policies and procedures lately? Are you confident that your people are using all the tools at their disposal? Are they trained to analyze each issue in a logical manner that provides for less downtime and maximum customer service? A close look at the cases reveals how to isolate suspected faulty or misconfigured equipment and how to solve a problem step by step. As success in commissioning and service is usually a mix of both discovery and analysis, we present this type of approach as an example of successful troubleshooting investigations.

While corporate leaders maintain and grow plans for expansion, ZTE employees in all regions contribute their individual efforts towards the internationalization of the company. Momentum continues to build at all levels, from office interns to veteran engineers, who work together to bring global focus into their daily work.

If you would like to subscribe to this magazine (electronic version) or review additional articles and relevant technical materials concerning ZTE products, please visit the technical support website of ZTE CORPORATION (http://ensupport.zte.com.cn). If you have any ideas and suggestions or want to offer your contributions, you can contact us at any time via the following email: doc@zte.com.cn. Thank you for making ZTE a part of your telecom experience!

Maintenance Experience Editorial Committee
ZTE Corporation
August 2011

Contents

Technical Specials
Hub&Spoke Networking of L3 MPLS VPN

Case Study
Interconnection between MPLS Bearer Network and IP NMS Network
8912 Network Management System Disconnection and COS2 Queue Alarms
Multicast Service Interruption for OSPF Type 7 LSA Translation Failure
LDP Interconnection Failure between T600 and NE80
691 Error for Incorrect Status of bras Sub-Interface
ACL Failure of ZXR10 29 Switch
L2VPN Service Interruption
Service Interruption Resulting from Incorrect Black-hole Route Configuration
Route Failure between IPS and VRRP Network


Hub&Spoke Networking of L3 MPLS VPN


Qian Yuemei / ZTE Corporation
Abstract: This section describes three Hub&Spoke networking schemes for L3 VPN and the corresponding configuration commands.
Key words: MPLS, VPN, Hub and Spoke

Function Description
Hub&Spoke networking is one of the networking schemes of the L3 VPN. You can select the Hub&Spoke networking scheme if, for security reasons, you want to set a central access control device in the VPN. After this central access control device is set, the subscribers can access each other only through this device, which monitors and filters the traffic between the sites. In the Hub&Spoke networking scheme, a device that cannot be accessed directly is called a Spoke and the central device is called the Hub. The communication between these devices is controlled by the RT attribute of the L3 VPN.

The Hub&Spoke networking structure is shown in Figure 1. In the VPN networking, Hub-PE needs to be configured with two VRFs. The import value of one VRF should be the same as the export value of the Spoke-PEs, and the export value of the other VRF should be the same as the import value of the Spoke-PEs. In the figure below, the communication between the Spoke nodes must pass through the Hub node. The arrows indicate the route from site 2 to site 1.
1. Hub-PE receives the VPN-IPv4 routes distributed by all Spoke-PE nodes.
2. All Spoke-PE nodes receive the VPN-IPv4 routes distributed by Hub-PE.
3. Hub-PE distributes the routes learnt from one Spoke-PE to another Spoke-PE, so one Spoke-PE accesses another Spoke-PE through the Hub node.

Figure 1. BGP/MPLS VPN Model


www.zte.com.cn

4. The import-target value of one Spoke-PE should be different from the export-target value of any other Spoke-PE. In this case, no Spoke-PE can distribute a VPN-IPv4 route to another Spoke-PE directly, and no Spoke-PE can access another Spoke-PE directly.
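The four rules above can be illustrated with a minimal sketch of RT matching. The RT values and VRF names below are hypothetical (the article only uses the placeholder XXX:XXX), and the model assumes the usual rule that a VRF installs a VPN-IPv4 route when the route carries at least one RT that the VRF imports.

```python
# Hypothetical RT values and VRF names, chosen only for illustration.
SPOKE_EXPORT = "100:1"  # exported by every Spoke-PE, imported by one Hub VRF
HUB_EXPORT = "100:2"    # exported by the other Hub VRF, imported by every Spoke-PE

vrfs = {
    "spoke1": {"import": {HUB_EXPORT}, "export": {SPOKE_EXPORT}},
    "spoke2": {"import": {HUB_EXPORT}, "export": {SPOKE_EXPORT}},
    "hub_in": {"import": {SPOKE_EXPORT}, "export": set()},
    "hub_out": {"import": set(), "export": {HUB_EXPORT}},
}


def accepts(receiver, sender):
    """True when the receiver VRF imports at least one RT exported by the sender VRF."""
    return bool(vrfs[receiver]["import"] & vrfs[sender]["export"])


def direct_paths():
    """List every (sender, receiver) pair whose routes are installed directly."""
    return sorted(
        (s, r) for s in vrfs for r in vrfs if s != r and accepts(r, s)
    )


if __name__ == "__main__":
    for sender, receiver in direct_paths():
        print(f"{sender} -> {receiver}")
```

In this model the spokes never appear as a direct pair, so traffic between them has to follow the routes re-advertised by the Hub.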

Networking Application
The Hub&Spoke networking has the following networking schemes:
1. Hub-CE and Hub-PE, and Spoke-PE and Spoke-CE use the External Border Gateway Protocol (EBGP).
2. Hub-CE and Hub-PE, and Spoke-PE and Spoke-CE use the Interior Gateway Protocol (IGP).
3. Hub-CE and Hub-PE use EBGP, and Spoke-PE and Spoke-CE use IGP.

The following details the three networking schemes:

Scheme 1: Hub-CE and Hub-PE, and Spoke-PE and Spoke-CE use EBGP.

Figure 2. Networking Structure of Hub-CE and Hub-PE, and Spoke-PE and Spoke-CE with EBGP

As shown in Figure 2, in the Hub&Spoke networking structure, the route from the Spoke-CE is sent to another Spoke-PE through Hub-CE and Hub-PE. If EBGP is used between Hub-PE and Hub-CE, Hub-PE performs the AS-loop check on the route and discards it, because it finds that the route includes its own AS number. Therefore, when EBGP is used between Hub-PE and Hub-CE, you must manually configure Hub-PE to allow a repeated AS number to implement the Hub&Spoke networking structure.

Scheme 2: Hub-CE and Hub-PE, and Spoke-PE and Spoke-CE use IGP. Figure 3 is taken as an example to describe the networking of scheme 2.

Figure 3. Networking Structure of Hub-CE and Hub-PE, and Spoke-PE and Spoke-CE with IGP


As shown in Figure 3, all PEs exchange routing information with all CEs through the IGP protocol. An IGP route does not carry the AS_PATH attribute, so the AS-Path parameter of the BGP VPNv4 route is empty.

Scheme 3: Hub-CE and Hub-PE use EBGP, and Spoke-PE and Spoke-CE use IGP. Figure 4 is taken as an example to describe the networking of scheme 3.

The implementation principle is similar to that of scheme 2. When Hub-PE receives from Hub-CE the routes originated by a Spoke-CE, the AS-Path parameter includes the number of the AS where Hub-PE is located. In this case, you must manually configure Hub-PE to allow a repeated AS number.
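The AS-loop check that makes this manual configuration necessary can be sketched as follows. This is a simplified model, assuming the common semantics in which a router rejects a route whose AS-Path already contains its own AS number unless a configured limit of repetitions is allowed; the function and parameter names are illustrative, not taken from the device.

```python
def passes_as_loop_check(as_path, local_as, allowed_repeats=0):
    """Return True when local_as appears in as_path no more than allowed_repeats times.

    allowed_repeats=0 models the default loop check; a positive value models
    a configuration that tolerates a repeated AS number.
    """
    return as_path.count(local_as) <= allowed_repeats


if __name__ == "__main__":
    hub_pe_as = 65522
    path_from_hub_ce = [100, 65522]  # route that already crossed the provider AS
    print(passes_as_loop_check(path_from_hub_ce, hub_pe_as))                     # default check
    print(passes_as_loop_check(path_from_hub_ce, hub_pe_as, allowed_repeats=3))  # repeats allowed
    print(passes_as_loop_check([], hub_pe_as))                                   # IGP-learned, empty path
```

An empty AS-Path, as in scheme 2, always passes, which is why the all-IGP scheme needs no extra configuration.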

Configuration Commands
1. The Hub&Spoke networking structure is implemented through matching the RT attribute on the PE.
PE(config-vrf)#route-target import XXX:XXX /*Configure one or more depending on the specific conditions.*/
PE(config-vrf)#route-target export XXX:XXX /*Configure one or more depending on the specific conditions.*/

In addition, the AS-Path attribute of BGP provides loop detection, so the allowas-in command and the as-override command are required when the AS numbers on the CE side are repeated.
2. Configure the allowas-in command for the vpnv4 address family and the ipv4 address family in BGP route mode.
PE(config-router-af)#neighbor <neighbor-address> allowas-in

3. Configure the as-override command for the ipv4 vrf address family in BGP route mode.
PE(config-router-af)#neighbor <neighbor-address> as-override

Note: Use these commands flexibly depending on the specific conditions.
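The effect of as-override can be sketched in a few lines. This is a simplified model, assuming the common behaviour in which the PE replaces every occurrence of the neighbor's AS number with its own AS number and then prepends its own AS number when advertising over EBGP; the function names are illustrative.

```python
def advertise(as_path, local_as, neighbor_as, as_override=False):
    """Build the AS-Path a PE sends to an EBGP neighbor (simplified model)."""
    path = list(as_path)
    if as_override:
        # Replace the neighbor's own AS so the neighbor's loop check does not fire.
        path = [local_as if asn == neighbor_as else asn for asn in path]
    return [local_as] + path


def neighbor_accepts(as_path, neighbor_as):
    """Default AS-loop check on the receiving CE."""
    return neighbor_as not in as_path


if __name__ == "__main__":
    plain = advertise([100], local_as=65522, neighbor_as=100)
    overridden = advertise([100], local_as=65522, neighbor_as=100, as_override=True)
    print(plain, neighbor_accepts(plain, 100))
    print(overridden, neighbor_accepts(overridden, 100))
```

With the AS numbers used in the configuration instance, the path [65522 100] would be rejected by a CE in AS 100, while the overridden path [65522 65522] is accepted.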

Configuration Instance
As shown in Figure 5, perform the following four steps to configure a Hub&Spoke networking scheme.
1. Use the IGP protocol among K2, J1 and K4 and establish the LDP neighbor relationship between any two nodes.

Figure 4. Networking Structure of Hub-CE and Hub-PE with EBGP, and Spoke-PE and Spoke-CE with IGP


2. Establish the MP-BGP neighbor relationship between K2 and J1 and between K4 and J1. For the RD and RT configuration of the VRFs, refer to the networking figure.
3. Establish the EBGP neighbor relationship between test ports 202/3 and 202/4 and the VRF port of the interconnected device.
4. Bind two ports of the Hub PE (J1) with vrf hub1 and vrf hub2 separately and establish the EBGP neighbor relationship between the Hub PE (J1) and the interconnected CE.

Figure 5. Hub&Spoke Networking Instances

The following describes the configuration for Spoke-PE and Hub-PE separately.

The configuration for K2 is as follows:
router bgp 65522
address-family ipv4 vrf spoke
neighbor 101.0.0.2 remote-as 100
neighbor 101.0.0.2 activate
neighbor 101.0.0.2 as-override
$
address-family vpnv4
neighbor 10.28.0.105 activate
neighbor 10.28.0.105 allowas-in 3

The configuration for K4 is as follows:
address-family ipv4 vrf spoke
neighbor 100.0.0.2 remote-as 100
neighbor 100.0.0.2 activate
neighbor 100.0.0.2 as-override
$
address-family vpnv4
neighbor 100.28.0.105 activate
neighbor 100.28.0.105 allowas-in 3

The configuration for J1 is as follows:
router bgp 65522
address-family ipv4 vrf hub1 /*hub1 RT import is the same as spoke export*/
neighbor 102.0.0.2 remote-as 100
neighbor 102.0.0.2 activate
neighbor 102.0.0.2 as-override
$
address-family ipv4 vrf hub2 /*hub2 RT export is the same as spoke import*/
neighbor 103.0.0.2 remote-as 100
neighbor 103.0.0.2 activate
neighbor 103.0.0.2 allowas-in 3


After the above configuration is completed, test the networking structure. After an EBGP route is sent from port 202/3 to K2, the engineer finds the following debugging information.

1. After receiving the route information for the 99.99.99.0/24 network segment from test port 202/3, the Spoke K2 sends the route information together with the RT attribute to Hub J1.
00:50:36: BGP: spoke 101.0.0.2 rcv UPDATE w/ attr: origin ? as-path as_sequence 100 next-hop 101.0.0.2
00:50:36: BGP: spoke 101.0.0.2 recv UPDATE about 99.99.99.0/24
00:50:36: BGP: spoke route installing 99.99.99.0/24 -> 101.0.0.2
00:50:38: BGP: 10.28.0.5 send UPDATE w/ attr: origin ? as-path as_sequence 100 localpref 100 route target 1:22 mp nlri afi:1 safi:128 next-hop:10.28.0.1 nlri 000411 1:21 99.99.99.0/24

2. The Hub J1 sends the received route information to the CE J3 through vrf hub1.
06:50:45: BGP: 10.28.0.1 rcv UPDATE w/ attr: origin ? as-path as_sequence 100 localpref 100 route target 1:22 mp nlri afi:1 safi:128 next-hop:10.28.0.1 nlri 000211 1:21 99.99.99.0/24
06:50:47: BGP: hub1 102.0.0.2 send UPDATE w/ attr: origin ? as-path as_sequence 65522 next-hop 102.0.0.1 nlri 99.99.99.0/24
Note: The CE receives this route only when the ipv4 vrf hub1 address family is configured with the as-override command. Otherwise, the route is denied.

3. After receiving the normal EBGP routing information, J3 sends it to all BGP neighbors.
08:25:53: BGP: 102.0.0.1 rcv UPDATE w/ attr: origin ? as-path as_sequence 65522 next-hop 102.0.0.1
08:25:53: BGP: 102.0.0.1 recv UPDATE about 99.99.99.0/24
08:25:53: BGP: IP route installing 99.99.99.0/24 -> 102.0.0.1
08:25:55: BGP: 103.0.0.1 send UPDATE w/ attr: origin ? as-path as_sequence 100 65522 next-hop 103.0.0.2 nlri 99.99.99.0/24


Note: The AS-Path parameter of the BGP route received by the CE should be [65522 100]. The as-override command is configured on the PE, so the AS-Path parameter is changed to [65522 65522]. Because of a special feature of our device, a private AS number is carried only once, so the AS-Path of the BGP route received by the CE is [65522].

4. The Hub J1 receives the routing information through vrf hub2 and sends it to its MP-IBGP neighbors, including the Spoke PEs.
06:50:48: BGP: hub2 103.0.0.2 rcv UPDATE w/ attr: origin ? as-path as_sequence 100 65522 next-hop 103.0.0.2
06:50:48: BGP: hub2 103.0.0.2 recv UPDATE about 99.99.99.0/24
06:50:48: BGP: hub2 route installing 99.99.99.0/24 -> 103.0.0.2
06:50:50: BGP: 10.28.0.1 send UPDATE w/ attr: origin ? as-path as_sequence 100 65522 localpref 100 route target 1:21 mp nlri afi:1 safi:128 next-hop:10.28.0.105 nlri 000821 100:102 99.99.99.0/24
06:50:50: BGP: 10.28.0.5 send UPDATE w/ attr: origin ? as-path as_sequence 100 65522 localpref 100 route target 1:21 mp nlri afi:1 safi:128 next-hop:10.28.0.105 nlri 000821 100:102 99.99.99.0/24
Note: The allowas-in command is configured in the ipv4 vrf hub2 address family, so the route information with the AS-Path value [100 65522] is received.

5. After receiving the routing information, the Spoke PE K2 sends it to the CE 202/3.
07:54:05: BGP: 10.28.0.105 rcv UPDATE w/ attr: origin ? as-path as_sequence 100 65522 localpref 100 route target 1:21 mp nlri afi:1 safi:128 next-hop:10.28.0.105 nlri 000821 100:102 99.99.99.0/24
07:54:07: BGP: spoke 101.0.0.2 send UPDATE w/ attr: origin ? as-path as_sequence 65522 100 next-hop 101.0.0.1 nlri 99.99.99.0/24
Note: The allowas-in command is configured in the vpnv4 address family, so the route information with the AS-Path value [100 65522] is received.

6. After receiving the routing information, the Spoke PE K4 sends it to the CE 202/3.
08:27:04: BGP: 10.28.0.105 rcv UPDATE w/ attr: origin ? as-path as_sequence 100 65522 localpref 100 route target 1:21 mp nlri afi:1 safi:128 next-hop:10.28.0.105 nlri 000821 100:102 99.99.99.0/24
08:27:06: BGP: 100.0.0.2 send UPDATE w/ attr: origin ? as-path as_sequence 65522 100 next-hop:100.0.0.2 nlri 99.99.99.0/24
Note: The allowas-in command is configured in the vpnv4 address family, so the route information with the AS-Path value [100 65522] is received.

The routing information of Spoke-PE K2 is as follows:
K2#show ip protocol routing vrf spoke
Routes of vpn:
Status codes: *valid, >best, s-stale
     Dest            NextHop       Intag  Outtag  RtPrf  Protocol
...
*>   99.99.99.0/24   101.0.0.2     33     notag   20     bgp-ext
*    99.99.99.0/24   10.28.0.105   35     130     200    bgp-int
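The K2 output lists two valid routes for the same prefix, with the EBGP route marked as best. A minimal sketch of that selection follows, assuming that the route with the lower RtPrf (route preference) value wins, which is consistent with the table; the dictionary layout is illustrative.

```python
# The two entries shown in the K2 routing table for vrf spoke.
routes = [
    {"dest": "99.99.99.0/24", "nexthop": "101.0.0.2", "rtprf": 20, "protocol": "bgp-ext"},
    {"dest": "99.99.99.0/24", "nexthop": "10.28.0.105", "rtprf": 200, "protocol": "bgp-int"},
]


def best_route(table, dest):
    """Pick the candidate with the lowest preference value for dest, or None."""
    candidates = [r for r in table if r["dest"] == dest]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["rtprf"])


if __name__ == "__main__":
    best = best_route(routes, "99.99.99.0/24")
    print(best["protocol"], best["nexthop"])
```

K2 therefore keeps forwarding to its own CE directly, while the copy learnt back from the Hub stays in the table as a valid but non-best route.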


Interconnection between MPLS Bearer Network and IP NMS Network

Zuo Jiye / ZTE Corporation

Abstract: In the MPLS network, the LDP protocol is not enabled on the link to the network management system, so some devices in the network allocate labels for the route to the network management server and the communication is interrupted. This problem can be solved by configuring a label policy through the mpls ldp egress command.
Key words: Network management server, MPLS, POP, label policy and untagged

Symptom
As shown in Figure 1, only the TAN1 device in the MPLS network can ping the IP address of the NMS successfully after the IP address and the route of the interface are configured. The TAN2 and TAN3 devices cannot ping the IP address of the NMS.

Figure 1. Topology Structure of MPLS Bearer Network and IP NMS Network


Fault Analysis
1. By checking the routing tables of the TAN1, TAN2 and TAN3 devices, the engineer finds the routes to the NMS. By checking the 7604 router, the engineer finds that the routes from the NMS are correct. Therefore, the routing table entries are correct.
T1200-2(config-router)#show ip route
IPv4 Routing Table:
Dest          Mask             Gw            Interface     Owner    Pri  Metric
10.18.1.200   255.255.255.255  10.150.2.250  fei_2/2.4082  static   1    0
10.150.0.0    255.255.255.252  10.150.0.2    smartgroup1   direct   0    0
10.150.0.2    255.255.255.255  10.150.0.2    smartgroup1   address  0    0
10.150.0.4    255.255.255.252  10.150.0.1    smartgroup1   ospf     110  2
10.150.0.8    255.255.255.252  10.150.0.1    smartgroup1   ospf     110  2
10.150.0.32   255.255.255.252  10.150.0.33   fei_13/1      direct   0    0
10.150.0.33   255.255.255.255  10.150.0.33   fei_13/1      address  0    0

T1200-1#show ip route
IPv4 Routing Table:
Dest          Mask             Gw          Interface  Owner    Pri  Metric
10.18.1.200   255.255.255.255  10.150.0.5  fei_2/1    ospf     110  2
10.150.0.4    255.255.255.252  10.150.0.5  fei_2/1    direct   0    0
10.150.0.5    255.255.255.255  10.150.0.5  fei_2/1    address  0    0

2. By checking the MPLS label forwarding table, the engineer finds that the outgoing label of the route from TAN1 to the network management server is Untagged. However, the route from TAN2/TAN3 to the network management server is allocated a specific label.
T1200-2(config)#show mpls forwarding-table
Mpls Ldp Forwarding-table:
InLabel  OutLabel  Dest         Pfxlen  Interface     NextHop
17       Untagged  10.18.1.200  32      fei_2/2.4082  10.150.2.250
19       Pop tag   10.150.2.1   32      smartgroup1   10.150.0.1
27       31        10.150.2.3   32      smartgroup1   10.150.0.1
20       Pop tag   10.150.2.4   32      fei_13/1      10.150.0.34
24       25        10.150.2.5   32      smartgroup1   10.150.0.1
25       26        10.150.2.6   32      smartgroup1   10.150.0.1

3. The link to the network management system uses an ordinary static route and the LDP protocol is disabled on it, so the TAN1 device receives no label from the next hop for the route to the network management server. The whole network uses MPLS, so a data packet sent from the TAN2/TAN3 device to the TAN1 device carries a label. The TAN1 device finds that the outgoing label for this route is Untagged, so it discards the data packet. This is the reason for the above fault.
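When a label forwarding table is long, entries of this kind can be picked out with a short script. The sketch below hard-codes the rows of the T1200-2 table shown in this case; the tuple layout and the function name are illustrative assumptions, not a ZTE tool.

```python
# Rows of the T1200-2 forwarding table from this case:
# (in_label, out_label, dest, prefix_len, interface, next_hop)
lfib = [
    (17, "Untagged", "10.18.1.200", 32, "fei_2/2.4082", "10.150.2.250"),
    (19, "Pop tag", "10.150.2.1", 32, "smartgroup1", "10.150.0.1"),
    (27, "31", "10.150.2.3", 32, "smartgroup1", "10.150.0.1"),
    (20, "Pop tag", "10.150.2.4", 32, "fei_13/1", "10.150.0.34"),
    (24, "25", "10.150.2.5", 32, "smartgroup1", "10.150.0.1"),
    (25, "26", "10.150.2.6", 32, "smartgroup1", "10.150.0.1"),
]


def untagged_prefixes(table):
    """Return the prefixes that have an incoming label but an Untagged outgoing label."""
    return [
        f"{dest}/{plen}"
        for _in_label, out_label, dest, plen, _ifc, _nh in table
        if out_label.strip().lower() == "untagged"
    ]


if __name__ == "__main__":
    for prefix in untagged_prefixes(lfib):
        print("labelled traffic to", prefix, "arrives but leaves untagged")
```

For this table the only prefix reported is the network management server route, which matches the fault analysis.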


Solutions
Configure an ACL-based pop label policy on the TAN1 device so that it advertises a pop tag upstream for the route to the network management server. In this case, the packet sent from the TAN2/TAN3 device to the TAN1 device carries no label, so the TAN1 device can forward the packet properly. The following describes two configuration methods:

1. For the configuration based on the destination address, only the route that matches the destination address reports a pop tag to the upstream.
T1200-2(config)#mpls ldp egress for 20
T1200-2(config)#acl standard number 20
T1200-2(config-std-acl)#rule 1 permit 10.18.1.200 0.0.0.0
T1200-2(config-std-acl)#exit

2. For the configuration based on the next hop, all the routes that match the next hop report a pop tag to the upstream.
T1200-2(config)#mpls ldp egress nexthop 20
T1200-2(config)#acl standard number 20
T1200-2(config-std-acl)#rule 1 permit 10.150.2.250 0.0.0.0
T1200-2(config-std-acl)#exit

The first configuration method is flexible and secure, but the configuration must be extended whenever a network management server or a destination network segment is added. The second method is more general: every route that matches the next hop reports a pop tag to the upstream device, no matter which destination network it points to.

The configuration result is as follows:
T1200-2(config)#show mpls forwarding-table
Mpls Ldp Forwarding-table:
InLabel  OutLabel  Dest        Pfxlen  Interface    NextHop
19       Pop tag   10.150.2.1  32      smartgroup1  10.150.0.1
27       31        10.150.2.3  32      smartgroup1  10.150.0.1
20       Pop tag   10.150.2.4  32      fei_13/1     10.150.0.34
24       25        10.150.2.5  32      smartgroup1  10.150.0.1
25       26        10.150.2.6  32      smartgroup1  10.150.0.1

T1200-1(config)#show mpls forwarding-table
Mpls Ldp Forwarding-table:
InLabel  OutLabel  Dest         Pfxlen  Interface  NextHop
17       Pop tag   10.18.1.200  32      fei_2/1    10.150.0.5
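The difference between the two methods comes down to which field of a route is compared against the ACL rule. The sketch below models that comparison with ordinary wildcard-mask matching; the second route in the list is a hypothetical additional server behind the same next hop, added only to show the contrast, and the function names are illustrative.

```python
import ipaddress


def wildcard_match(address, base, wildcard):
    """True when address equals base on every bit where the wildcard bit is 0."""
    addr = int(ipaddress.IPv4Address(address))
    ref = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care) == (ref & care)


def pop_tag_routes(routes, mode, base, wildcard="0.0.0.0"):
    """Return destinations that get a pop tag.

    mode "for" compares the destination with the rule;
    mode "nexthop" compares the next hop with the rule.
    """
    field = 0 if mode == "for" else 1
    return [r[0] for r in routes if wildcard_match(r[field], base, wildcard)]


routes = [
    ("10.18.1.200", "10.150.2.250"),  # NMS route from this case
    ("10.18.1.201", "10.150.2.250"),  # hypothetical second server, same next hop
]

if __name__ == "__main__":
    print(pop_tag_routes(routes, "for", "10.18.1.200"))
    print(pop_tag_routes(routes, "nexthop", "10.150.2.250"))
```

The destination-based rule covers only the listed server, while the next-hop rule also covers the hypothetical new server without any configuration change.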


Lessons Learned
On the link between the MPLS bearer network and the network management system, the LDP protocol is disabled, so the LER has no outgoing label for the route to the network management server. However, the LER still allocates a label for this external route and advertises it into the MPLS network, as the MPLS protocol requires for routes that are not directly connected. As a result, a message sent from the upstream device to the LER carries a label, and the message is discarded because the outgoing label on the LER is Untagged. In this case, a label policy must be configured on the LER through the mpls ldp egress command.

8912 Network Management System Disconnection and COS2 Queue Alarms

Qian Yuemei / ZTE Corporation

Abstract: By default, the 89 switch sends all ARP messages to the CPU for processing, so the COS2 queue is often congested. To avoid COS2 queue congestion, the ARP protocol protection function can be disabled on ports that do not transparently transmit a VLAN with a layer-3 interface.
Key words: SNMP, ARP, COS2 and protocol protection

Symptom
As shown in Figure 1, ZXR10 8912, as a convergence-level switch of a metro network, is connected to the upper SR and BRAS and to the lower OLT and DSLAM. ZXR10 8912 is interconnected with the 6509 through the gei_2/19 interface to transmit VLAN 50 transparently. The faults reported by the subscribers are as follows:
1. ZXR10 8912 is often disconnected from the NMS server. Some packets are discarded when the subscriber pings the IP address of the switch 8912 from the NMS server.
2. There are a lot of COS2 alarms on the switch 8912.

Figure 1. Networking Scheme between 8912 and NMS Server

Fault Analysis
1. After logging in to the switch remotely and executing the show logging alarm command to check the alarms on the switch, the engineer finds that there are a lot of COS2 alarms.

If too many ARP messages are sent to the CPU for processing, many COS2 alarms are generated because the COS2 queue is congested. The SNMP protocol, which the network management system uses for message interaction, belongs to the COS5 queue, and the ICMP protocol used by Ping belongs to the COS3 queue. It is suspected that the COS2 queue congestion causes processing failures in the other COS queues.
2. Enter the global mode and print the ARP messages on slot 2 through the capture command. The printing results are as follows:


From the above information, the engineer finds that a lot of ARP messages are sent from the address 192.168.1.1 (0019.e0e4.83c2), which explains the large number of COS2 queue congestion alarms. When checking this MAC address on the network side, the engineer finds that the IP address belongs to a normal downstream subscriber. However, the engineer cannot handle the subscriber side at the moment.
3. Enable the SNMP and ICMP protocol protection functions on the upper port gei_2/19 of the switch 8912 to ensure that SNMP and ICMP messages are handled first. At the same time, set the ARP aging duration on the network management VLAN 50 to 30 minutes (the default value is 10 minutes). The configuration is as follows:
interface gei_2/19
no negotiation auto
ipv4 protocol-protect mode snmp enable
ipv4 protocol-protect mode icmp enable
switchport mode trunk
switchport trunk native vlan 50
switchport trunk vlan 50
switchport trunk vlan 92
switchport trunk vlan 1781-1790
smartgroup 10 mode active
interface vlan 50
ip address 172.16.2.1 255.255.255.0
arp timeout 30
!
4. After the configuration, the engineer finds that there is no alarm on the network management system and no packet is discarded during the Ping operation. However, there are still a lot of COS2 alarms on the switch 8912.
5. When reading the manuals on the ARP message processing mechanism of the switch 8912, the engineer finds that the ARP protocol protection function is enabled on the ports of the switch 8912 by default. In this case, all ARP messages received on the physical ports are sent to the CPU for processing. In fact, if a port does not transparently transmit a VLAN with a layer-3 interface, its ARP messages can be forwarded directly by the port instead of being sent to the CPU.

Solutions
1. When checking the configuration of the switch 8912, the engineer finds that only the network management VLAN has a layer-3 address. Disable the ARP protocol protection function on the physical ports that do not transparently transmit this VLAN.
interface gei_2/12
no negotiation auto
ipv4 protocol-protect mode ARP disable
switchport mode trunk
switchport trunk native vlan 20
switchport trunk vlan 20
switchport trunk vlan 705
switchport qinq normal
!
2. When executing the show logging alarm command to observe the alarms on the switch 8912, the engineer does not find any COS2 alarms. After entering the global mode and printing the ARP messages, the engineer does not find any abnormal ARP messages either.
3. After disabling the SNMP and ICMP protocol protection functions on the upper port gei_2/19 and setting the ARP aging duration on the network management VLAN 50 back to the default value (10 minutes), the engineer finds that there is no alarm and no packet is discarded. The problem is solved.

Lessons Learned
This fault results from COS2 queue congestion. The switch is often disconnected from the network management system because the SNMP and ICMP messages cannot be sent to the CPU for processing.

At present, many switches in metro networks report COS2 alarms. The COS2 alarms are generated when the COS2 queue is congested because a lot of ARP messages are sent to the CPU for processing. To handle this fault, focus first on finding the source of the ARP messages.

The ARP protocol protection function is enabled on all physical ports of the 89 switch, so all ARP messages are sent to the CPU for processing. In this case, the COS2 queue is often congested and the normal operation of the switch is affected. It is recommended to disable the ARP protocol protection function during service commissioning on ports that do not transparently transmit a VLAN with a layer-3 interface.

Through this fault, we also learn the processing mechanism of the COS queues of the switch. The COS queues are classified as follows:
1. COS7: telnet, bpdu, lacp and so on
2. COS6: ospf, bgp, pim, ldp, igmp, rip and so on
3. COS5: vrrp, dhcp, snmp, bfd, isis and so on
4. COS4: vrrp(TTL=1), udp_ntp, arp_reply, 802.1x and so on
5. COS3: icmp
6. COS2: arp_request
7. COS1: None
8. COS0: TTL and untrust
For each COS queue alarm, you can find the cause of the fault according to the protocol message types that belong to the queue.
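The COS classification can be kept as a small lookup table so that an alarm on a given queue points straight at the candidate protocols. The sketch below encodes only the protocols listed in this case (each list in the article ends with "and so on", so the mapping is not exhaustive); the function names are illustrative.

```python
# Protocol-to-queue mapping as listed in this case; not an exhaustive list.
COS_QUEUES = {
    7: ["telnet", "bpdu", "lacp"],
    6: ["ospf", "bgp", "pim", "ldp", "igmp", "rip"],
    5: ["vrrp", "dhcp", "snmp", "bfd", "isis"],
    4: ["vrrp(TTL=1)", "udp_ntp", "arp_reply", "802.1x"],
    3: ["icmp"],
    2: ["arp_request"],
    1: [],
    0: ["TTL", "untrust"],
}


def queue_of(protocol):
    """Return the COS queue that handles the protocol, or None if it is not listed."""
    wanted = protocol.lower()
    for cos, protocols in COS_QUEUES.items():
        if wanted in (p.lower() for p in protocols):
            return cos
    return None


def suspects(cos):
    """Return the message types to inspect when the given COS queue raises alarms."""
    return list(COS_QUEUES.get(cos, []))


if __name__ == "__main__":
    print("COS2 alarm, check:", suspects(2))
    print("snmp queue:", queue_of("snmp"), "icmp queue:", queue_of("icmp"))
```

For the fault in this case, a COS2 alarm leads directly to ARP requests, while SNMP and ICMP sit in queues 5 and 3.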


Multicast Service Interruption for OSPF Type 7 LSA Translation Failure

Fu Tao / ZTE Corporation

Abstract: This section describes a case in which the ABR of an NSSA area fails to translate type 7 LSAs, so the route to the multicast source is interrupted and the live broadcast video freezes. The solution is to advertise the IP address of the multicast source to the NE40 device by using the network command instead of type 7 LSAs. In this way, the LSA translation is avoided and the fault is eliminated.
Key words: OSPF, Type 7, LSA, translation and multicast

Symptom
The subscriber network structure is shown in Figure 1. Two ZXR10 8912 switches are connected to two NE40s through the OSPF protocol. The OSPF area used for the interconnection between ZXR10 8912 and NE40 is an NSSA area, and the area used for the interconnection between NE40 and the metro network is the backbone area 0. ZXR10 8912 is connected to a lower multicast server which provides multicast service for the remote subscribers in a metro network. This server is connected with the two 8912 switches through a layer-2 switch. The multicast streams sent on each uplink carry different channels. Each of the two 8912 switches acts as a DR to receive the multicast streams (this is implemented by modifying the priority). An MSDP neighbor relationship is established between the two NE40s. These two NE40s share the load and implement RP hot backup through the Anycast RP mode. The remote subscribers report that the live broadcast picture freezes periodically, every several minutes.

Fault Analysis
1. When multicast messages are discarded, the multicast data cannot be transmitted to the subscribers completely. In this case, the video freezes, is interrupted or shows a mosaic.

Figure 1. Networking Structure of Multicast Subscribers

2. For a packet loss fault, it is required to check the physical link and the protocol processing. The feedback from the site shows that all subscribers in the network are affected shortly after a fault appears. Because all subscribers are affected, the fault source should lie in the core equipment room instead of the node of a single subscriber. When pinging the interconnected devices in the core equipment room from the switch 8912, the engineer finds that no packet is discarded. In this case, it is required to check the protocol processing.
3. The multicast data cannot be sent to the subscriber. In this case, the engineer needs to check the multicast routing table to see whether the multicast messages are sent from the 8912 switch.
(1) When executing the show ip mroute command, the engineer finds that most channels have multicast routing entries, the (*, G) and (S, G) entries are normal and the message counters are increasing.
(2) When executing the show ip pimsm neighbor command, the engineer finds that the PIM neighbor relationship between the 8912 switch and the NE40 is established normally.
(3) When executing the show logging alarm command, the engineer does not find any alarm.
4. If no abnormality is found, the engineer needs to contact the person who is responsible for the interconnected device to find the reason together. After contacting the person who is responsible for the NE40 device, the engineers find that the NE40 device continuously reports OSPF route withdrawal alarms. All the withdrawn routes (including the network segment of the multicast source) point to the lower server connected with the 8912 switch. The obvious reason is that the NE40 device withdraws the route to the lower server connected with the 8912 switch. As a result, the RPF check on the NE40 device fails after the multicast data is sent to the NE40 device, and the live broadcast video freezes because the subscriber fails to receive the multicast messages. The RPF check succeeds again when the NE40 device installs the route successfully. The multicast messages can then be transmitted and the live broadcast video becomes normal.
5. The interconnection between the 8912 switch and the NE40 device is in the NSSA area, and the OSPF configuration on the 8912 switch advertises the IP address of the multicast source through the redistribute connected command. The 8912 generates type 7 LSAs for these routes. After these LSAs are sent to the NE40 devices, the NE40 devices, which act as Area Border Routers (ABRs), need to translate the type 7 LSAs into type 5 LSAs and then calculate the route. If something goes wrong with the LSA translation, the NE40 device withdraws the IP route of the multicast source.
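The link between the withdrawn unicast route and the lost multicast traffic is the RPF check. The sketch below is a simplified model, assuming the usual rule that a multicast packet is accepted only when the best unicast route towards its source points out of the interface on which the packet arrived. The interface name is hypothetical, and the prefix is derived from the address and wildcard that appear in the Solutions section of this case.

```python
import ipaddress


def rpf_interface(routes, source):
    """Longest-prefix match: return the interface towards source, or None if no route exists."""
    ip = ipaddress.ip_address(source)
    matches = [(net, ifc) for net, ifc in routes if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_check(routes, source, incoming_interface):
    """Accept the multicast packet only if it arrived on the interface towards its source."""
    return rpf_interface(routes, source) == incoming_interface


# Hypothetical unicast table on the NE40; "to_8912" is an illustrative interface name.
routes_ok = [(ipaddress.ip_network("192.168.125.140/30"), "to_8912")]
routes_withdrawn = []  # the route to the multicast source has been withdrawn

if __name__ == "__main__":
    print(rpf_check(routes_ok, "192.168.125.141", "to_8912"))
    print(rpf_check(routes_withdrawn, "192.168.125.141", "to_8912"))
```

As long as the route is installed the check passes; once the route is withdrawn, the same packet on the same interface fails the check, which matches the periodic freezing seen by the subscribers.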

Solutions
According to the above analysis, it is only required to advertise the network segment of the multicast source to the NE40 device by executing the network command instead of through type 7 LSAs. In this case, the NE40 device does not need to translate the LSA. The command is as follows:
router ospf 1
network 192.168.125.140 0.0.0.3 area 192.168.125.128

The route information of the detailed network segment distributed by this command is carried in the type 1 LSA. After receiving the route information, the NE40 device calculates the OSPF route. After the above configuration, the video service becomes normal.
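
The RPF failure described in this case can be illustrated with a minimal Python sketch. It is only a simplified model: the routing table layout, the interface name and the addresses are hypothetical and do not come from the NE40 or 8912 configuration. A multicast packet passes the check only if the longest-match unicast route back to its source points out of the interface on which the packet arrived.

```python
import ipaddress

def rpf_check(routes, source_ip, arrival_interface):
    """Return True if a multicast packet passes the reverse path forwarding check.

    routes maps a prefix string to the outgoing interface of that unicast route.
    """
    source = ipaddress.ip_address(source_ip)
    best = None
    for prefix, interface in routes.items():
        network = ipaddress.ip_network(prefix)
        if source in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, interface)
    # No unicast route back to the source: the RPF check fails.
    if best is None:
        return False
    return best[1] == arrival_interface

# Hypothetical values for illustration only.
routes_withdrawn = {}                            # route to the source segment withdrawn
routes_present = {"10.1.1.0/30": "to-8912"}      # route to the source segment installed
multicast_source = "10.1.1.1"

print(rpf_check(routes_withdrawn, multicast_source, "to-8912"))
print(rpf_check(routes_present, multicast_source, "to-8912"))
```

With the route withdrawn the check fails and the stream is dropped; once the source segment is advertised and installed, the same packet passes.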


Lessons Learned
When a fault occurs, collect the area distribution information of the fault site to narrow down the fault range and confirm the faulty device. If a ZTE device is interconnected with a device of another manufacturer, check the fault reason together with the engineers of the interconnected device to improve efficiency. During the fault check, it is required to check the important information related to the service. For the multicast problem in this instance, you need to check the status of the multicast routing table (including whether the message counters increase normally and whether the aging duration is normal), the PIM neighbor status, the IGMP group status and the DR election status. For a unicast route, you need to check the OSPF neighbor status, the route learning and the database. Besides packet loss, the video can also freeze or show a mosaic if the subscriber receives multiple multicast streams at the same time. One stream may be discarded because the access bandwidth of the ADSL line on the subscriber side is insufficient, so the video freezes or shows a mosaic.

Cao Xiangyue / ZTE Corporation

LDP Interconnection Failure between T600 and NE80


Abstract: This section describes a label forwarding failure that results from a failure in establishing the LDP neighbor relationship between the T600 device and a router of another manufacturer.
Key words: T600, NE80, LDP, MPLS VPN and label

Symptom
As shown in Figure 1, the engineer can learn the internal and external VPN labels of a provincial platform or another metro platform from the T600 device. However, the engineer cannot successfully ping the VPN address. The OSPF and BGP routes are normal.
Figure 1. MPLS VPN Networking Figure


Fault Analysis
1. The on-site engineer can successfully ping the address of the provincial platform from the 7609 device, so it is suspected that the fault lies in the interconnection between the NE80 device and the T600 device.
2. The engineer checks the label forwarding entries for the Loopback address of the T600 device on the 7609 device and the NE80 device. Check the label on the 7609 device. The command is as follows:
7609#show mpls forwarding-table 120.193.15.240
Local  Outgoing    Prefix             Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id       switched   interface
7160   Untagged    120.193.15.240/32  293        GE4/1      211.140.160.129

Show the label of the loopback address of the T600 device on the NE80 device. The command is as follows:
<NE80>display mpls lsp brief

The engineer finds that no entry related to the 120.193.15.240 address exists.
3. However, the engineer can find the label forwarding entry for the 7609 device on the T600 device and the NE80 device. Check the label of the 7609 device on the T600 device. The command is as follows:
T600#show mpls forwarding-table 211.138.130.73
Mpls Ldp Forwarding-table:
InLabel  OutLabel  Dest            Pfxlen  Interface  NextHop
329      2974      211.138.130.73  32      gei_1/1    120.193.15.85

Check the label forwarding table on the NE80 device. The command is as follows:
<NE80>display mpls lsp brief
ID    I/O-Label  In-Interface  Prefix/Mask        Next-Hop
1342  2800/--    GE7/0/1       211.138.130.73/32  211.140.160.130

From the above information, the engineer finds that the external label for the loopback address 211.138.130.73 of the 7609 device is sent to the NE80 device and that the NE80 device sends this external label on to the T600 device. Step 2 shows that the T600 device does not send the external label for its loopback address 120.193.15.240 to the NE80 device, so the NE80 device cannot send it to the 7609 device. As a result, the 7609 device shows the external label of the loopback address 120.193.15.240 as Untagged.


4. The engineer further analyzes why the T600 device does not send the external label to the NE 80 device and whether there is something wrong with the LDP neighbor connection. In this case, the engineer checks the LDP neighbor connection.

T600#show mpls ldp neighbor
Peer LDP Ident: 211.138.130.215:12; Local LDP Ident 120.193.15.240:0
  TCP connection: 120.193.15.85.646 - 120.193.15.240.1041
  state: Oper; Msgs sent/rcvd: 1023/3710; Downstream
  Up Time: 05:31:51
  LDP discovery sources:
    gei_1/1; Src IP addr: 120.193.15.85
  Addresses bound to peer LDP Ident:
    120.193.15.85
Peer LDP Ident: 120.193.15.241:0; Local LDP Ident 120.193.15.240:0
  TCP connection: 120.193.15.241.1048 - 120.193.15.240.646
  state: Oper; Msgs sent/rcvd: 1111/1111; Downstream
  Up Time: 05:54:53
  LDP discovery sources:
    gei_1/2; Src IP addr: 120.193.15.94
  Addresses bound to peer LDP Ident:
    120.193.15.241  192.168.1.3  120.193.15.94  120.193.15.90

The above information shows that the LDP identifier used between the NE80 device and the T600 device is abnormal. The label space value in the LDP identifier sent by the NE80 (211.138.130.215:12) is 12 instead of 0. However, the only value accepted by the ZTE device is 0. Through check and verification, the engineer finds that the value sent by the NE80 is not the standard value specified by the RFC.
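
The check performed by eye on the show mpls ldp neighbor output can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes the identifier is printed as an LSR ID followed by a colon and the label space number, where 0 denotes the platform-wide label space.

```python
def parse_ldp_identifier(ident):
    """Split an LDP identifier of the form '<LSR ID>:<label space>'."""
    lsr_id, _, label_space = ident.rpartition(":")
    return lsr_id, int(label_space)

def uses_platform_label_space(ident):
    """A label space of 0 denotes the platform-wide label space."""
    return parse_ldp_identifier(ident)[1] == 0

# Peer identifiers taken from the neighbor output above.
peers = ["211.138.130.215:12", "120.193.15.241:0"]
for peer in peers:
    status = "ok" if uses_platform_label_space(peer) else "non-zero label space"
    print(peer, status)
```

Running a check like this against every peer line makes a non-zero label space stand out immediately, even when the session state is Oper.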

Solutions
After updating the NE80 device, the engineer finds that this fault is solved.

Lessons Learned
An MPLS VPN problem needs to be solved step by step and layer by layer. First, the engineer needs to determine whether the problem results from a route fault or a label forwarding fault. Then, if it results from a label fault, the engineer needs to confirm whether the external-layer or the internal-layer label is at fault. Finally, the engineer needs to check the corresponding label. When troubleshooting this kind of problem in a metro network, the engineer needs to coordinate with the customer and handle the problem together with the engineers of the other manufacturers.


Zuo Jiye / ZTE Corporation

691 Error for Incorrect Status of BRAS Sub-Interface


Abstract: This section describes a 691 error that results from an incorrect SAL configuration when a subscriber dials up. The problem is solved after the SAL configuration is modified.
Key words: 691 error, sub-interface is down, SAL and UAS

Symptom
As shown in Figure 1, a 691 error appears when a modem subscriber connected to the new DSL9800 dials a number. This subscriber fails to dial a number and obtain the IP address. However, the other services and the DSLAM network management are normal.

Fault Analysis
1. Based on the 691 error, the engineer rules out faults in the transmission channel and in the data configuration on the access side, and suspects that an incorrect account or password causes the authentication failure. However, the subscriber reports that the BRAS sub-interface corresponding to the service is in down status while the other interfaces corresponding to the service are in up status. The details are as follows:
gei_1/2.2630  unassigned     unassigned       up  up    up
gei_1/2.2661  172.27.57.121  255.255.255.252  up  down  up

Figure 1. UAS Networking


2. According to the software and hardware principles of the router, a sub-interface should be in up status when its main interface is in up status. The engineer considers why the sub-interface is in down status and whether this is related to the fault. When checking the sub-interface and comparing its configuration with that of a normal sub-interface, the engineer does not find any difference. After deleting the configuration and creating the sub-interface again, the result is the same.
3. When checking the BRAS function configuration on the T600 device, the engineer finds that the SAL configuration is incorrect. The mandatory domain is not configured. The details are as follows:
interface gei_1/2.2661
  bras
  dot1Q 2661
  out_index 22
  access-type ethernet
  encapsulation dot1q
  pppover-ethernet bind authentication chappap maximum 32000
  ppp idle interval 300 traffic-limit 0
  ppp keepalive timer 60 count 10
  ppp check-magic-number
  sal 1

Solutions
After adding and modifying the configuration, the engineer finds that the subscriber can dial a number successfully and the problem is solved. At this time, the status of the sub-interface is in up status. The details are as follows:
sal 1
  default domain kptc
  deny any
sal 1
  default domain kptc
  permit domain kptc
  deny any
10:01:18 07/01/2010 UTC alarm 18721 cleared %IP% Interface up on gei_1/2.2661 sent by UPC(RPU) 1

Lessons Learned
Based on the 691 error, the engineer first suspects an authentication fault. The engineer is then misled by the status of the sub-interface and does not check the related configuration carefully. If there were something wrong with the interface, the 678 error instead of the 691 error would appear. The conclusions are as follows: the BRAS sub-interface is related to the PPPoE protocol, and this interface is in up status only after a subscriber passes the PPPoE authentication. The interface is in down status when all subscribers are offline. When handling this kind of fault, pay attention to this behavior and to the SAL configuration.


Ye Lihui / ZTE Corporation

ACL Failure of ZXR10 29 Switch


Abstract: This section describes a case in which the subscriber network is interrupted after IP+MAC address binding because a hybrid ACL rule is incorrect. The problem is solved after the hybrid ACL configuration is modified.
Key words: hybrid acl, ARP, IP+MAC binding

Symptom
In one network, the customer requires IP+MAC address binding on the ZXR10 switch (29 series). The binding is implemented with the hybrid ACL of the ZXR10 switch (29 series). The configuration is as follows:
config acl hybrid number 300
rule 1 permit ip 192.168.1.100 255.255.255.255 any 00.e0.0c.9a.15.9c ff.ff.ff.ff.ff.ff any
rule 2 permit ip 192.168.1.101 255.255.255.255 any 00.1f.16.1d.bb.b7 ff.ff.ff.ff.ff.ff any
rule 3 permit ip 192.168.1.102 255.255.255.255 any 00.09.0f.fe.00.01 ff.ff.ff.ff.ff.ff any
rule 4 deny any any any
exit
set port 2-3 acl 300 enable

After the configuration, the engineer finds that the PC cannot successfully ping the external address.

Fault Analysis
1. After applying the ACL to the port, the engineer finds that the PC connected to this port cannot successfully ping the IP address of the gateway or any other internal device. When executing the arp -a command in the cmd window of this PC to check the ARP table, the engineer finds that ARP entries cannot be generated on the PC normally.
2. When checking this ACL carefully, the engineer finds that the problem results from the last rule, rule 4 deny any any any. All other messages through this port are forbidden by this rule. The preceding rules allow the IP messages that comply with the policy to pass in the network. However, the ARP message is a type of message that sits between layer 2 and layer 3. Before communicating, the device needs to resolve the IP address to the corresponding MAC address through the ARP protocol in order to transmit data messages. The ACL permits the IP protocol, but it does not permit the ARP protocol. In this case, the data messages fail to be transmitted because the device cannot resolve the IP address to the corresponding MAC address.
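
The effect described above can be reproduced with a small first-match model in Python. It is a sketch under stated assumptions, not the switch's actual matching logic: rules are reduced to protocol, source IP and source MAC, and frames that match no rule are assumed to be forwarded.

```python
def first_match(rules, frame):
    """Return the action of the first rule that matches the frame."""
    for protocol, src_ip, src_mac, action in rules:
        if protocol not in ("any", frame["protocol"]):
            continue
        if src_ip not in ("any", frame["src_ip"]):
            continue
        if src_mac not in ("any", frame["src_mac"]):
            continue
        return action
    return "permit"  # assumption: frames that match no rule are forwarded

bound = ("ip", "192.168.1.100", "00.e0.0c.9a.15.9c", "permit")
original_acl = [bound, ("any", "any", "any", "deny")]   # rule 4 deny any any any
modified_acl = [bound, ("ip", "any", "any", "deny")]    # rule 4 deny ip any any

arp_request = {"protocol": "arp", "src_ip": "192.168.1.100",
               "src_mac": "00.e0.0c.9a.15.9c"}
ip_packet = dict(arp_request, protocol="ip")

for name, acl in (("original", original_acl), ("modified", modified_acl)):
    print(name, first_match(acl, arp_request), first_match(acl, ip_packet))
```

In the original ACL the ARP request from the bound host falls through to the catch-all deny, while restricting the last rule to IP lets ARP through and still blocks IP traffic from unbound hosts.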

Solutions
After the reason is found, the engineer only needs to modify the ACL configuration so that ARP messages can be transmitted. The modified ACL is as follows:
config acl hybrid number 300
rule 1 permit ip 192.168.1.100 255.255.255.255 any 00.e0.0c.9a.15.9c ff.ff.ff.ff.ff.ff any
rule 2 permit ip 192.168.1.101 255.255.255.255 any 00.1f.16.1d.bb.b7 ff.ff.ff.ff.ff.ff any
rule 3 permit ip 192.168.1.102 255.255.255.255 any 00.09.0f.fe.00.01 ff.ff.ff.ff.ff.ff any
rule 4 deny ip any any
exit


Wu Lifeng / ZTE Corporation

L2VPN Service Interruption


Abstract: This section describes an L2VPN service interruption fault. Through the fault analysis, the engineer finds that this fault results from the number of the labels supported by the device. The engineer executes the related command to ensure that only the loopback address is allocated with a label. After that, the fault is solved. Key words: Black-hole route, longest matching and load sharing

Symptom
As shown in Figure 1, the NE40 device and the T600-B/C/D/E devices are PE devices and the T600-A device is the P device. An L2VPN instance is established between each of the T600-B/C/D/E devices and the NE40 device to transmit layer-2 VLANs transparently. The OSPF routing protocol is used between the devices to implement the interworking of the Loopback addresses. When the link between the T600-A device and the T600-B device recovers after an interruption, the engineer can learn all the routes. However, the services of XDSL-A and XDSL-B are interrupted. The services of XDSL-A and XDSL-B recover by themselves two or three hours after the link recovery. Sometimes, it is required to restart the T600-A device to recover the services of XDSL-A or XDSL-B.
Figure 1. L2VPN Networking Structure

Fault Analysis
During fault handling, the main principle is to narrow down the fault range and find the fault point. In this fault, the engineer finds that all the routes and the Loopback address interworking are normal even after the transmission link interruption. From this, the engineer guesses that the MPLS labels may cause the fault. The engineer needs to go to the site and verify the fault together with the customer. First, the engineer checks the network status and then checks the device configuration to see whether any service-related configuration is lost. After that, the engineer finds that the vfi between the T600-B/C devices and the NE40 device is in down status. This is the direct reason for the XDSL-A/B service interruption. The engineer then checks further to locate the fault range. The T600-B device is a fault point, so the engineer checks the T600-B device first.
1. Confirm whether the loopback address can be pinged successfully. During the onsite test, the engineer finds that the IP address can be pinged successfully. If it cannot, check the OSPF neighbor by executing the show ip ospf neighbor command.
2. Check whether the status of the LDP neighbor is normal. The LDP neighbor is normal because it is in Oper status.
T600-B#show mpls ldp neighbor
Peer LDP Ident: 192.168.167.72:0; Local LDP Ident 192.168.167.61:0
  TCP connection: 192.168.167.72.2317 - 192.168.167.61.646
  state: Oper; Msgs sent/rcvd: 4002/4022; Downstream
  Up Time: 01:51:23
Peer LDP Ident: 192.168.167.62:0; Local LDP Ident 192.168.167.61:0
  TCP connection: 192.168.167.62.1047 - 192.168.167.61.646
  state: Oper; Msgs sent/rcvd: 4117/4117; Downstream
  Up Time: 03:32:20
Peer LDP Ident: 192.168.167.7:0; Local LDP Ident 192.168.167.61:0
  TCP connection: 192.168.167.7.646 - 192.168.167.61.1034
  state: Oper; Msgs sent/rcvd: 4943/1475; Downstream
  Up Time: 03:44:46

3. Check whether the label forwarding table related to the NE40 device is normal. The details are as follows:
T600-B#show mpls forwarding-table 192.168.167.7
Mpls Ldp Forwarding-table:
InLabel  OutLabel  Dest           Pfxlen  Interface  NextHop
2968     1409      192.168.167.7  32      pos3_8/1   124.217.126.249

4. Check the detailed VC information of the L2VPN.


T600-B#show mpls l2 vc vpls detail


The detailed VC information is abnormal because the returned result is empty. 5. The data is bidirectional, so it is required to check the label forwarding table on the T600-A device to confirm the fault.
T600-A#show mpls forwarding-table 192.168.167.61

The returned result is also empty. From this, it is known that the forwarding table related to the 192.168.167.61 does not exist on the T600-A device. The LDP neighbor is not the only condition for the MPLS forwarding. 6. Check the distribution of the external label.

T600-A#show mpls ldp bindings
192.168.167.7/32
  local binding: label: 1409
  remote binding: lsr: 192.168.167.7:0, label: imp-null(inuse)
  remote binding: lsr: 192.168.167.65:0, label: 801
  remote binding: lsr: 192.168.167.69:0, label: 8130
  remote binding: lsr: 192.168.167.67:0, label: 682
  remote binding: lsr: 192.168.167.61:0, label: 2968

The above information shows that only the external label for the NE40 device exists, and the labels for the destination addresses 192.168.167.61 and 192.168.167.62 do not exist.
7. When executing the show mpls ldp bindings command on the T600-A device to check the label distribution, the engineer finds that the T600-A device allocates labels for all routes.
8. When checking the route table on the T600-A device, the engineer finds that the number of routes (more than 4K) exceeds the number of labels (4K) supported by the device. The fault occurs because the T600-A device does not have sufficient labels.

Solutions
1. When checking the configuration, the engineer finds that no label distribution policy is configured on the T600-A device. The mpls ldp access-fec host-route-only command is used to control the label distribution policy. This command means that labels are distributed only for the Loopback addresses. The details are as follows:
T600-A#mpls ldp access-fec host-route-only
2. Restart the MPLS process.
T600-A#clear mpls ldp
The service recovers normally after the above modification. To avoid similar faults, it is required to configure the mpls ldp access-fec host-route-only command on every device on which MPLS IP is enabled.

Lessons Learned
The idea for handling an L2 MPLS VPN fault is as follows:
1. Check the IGP routes by executing the show ip ospf neighbor command and check the status of the OSPF neighbor. Generally, the OSPF neighbor should be in FULL status.
2. Check the LDP neighbors by executing the show mpls ldp neighbor command. Generally, the LDP neighbor is in Oper status.
3. Check the forwarding table for the external label by executing the show mpls forwarding-table command. Generally, the external label is distributed for the loopback address of the PE neighbor.
4. Check the detailed VC information by executing the show mpls l2 vc vpls detail command.
5. Check the alarms by executing the show logging alarm command.
6. Collect the debug information of the MPLS LDP by executing the debug mpls ldp all command and then turn off the debug output by executing the no debug all command.
In addition, when MPLS is used, it is recommended to control the label distribution of the device by using an ACL or by executing the mpls ldp access-fec host-route-only command.
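
The label shortage in this case can be checked with a rough budget calculation. The Python sketch below is illustrative only: the routing table is generated, the capacity constant simply restates the 4K figure from the analysis, and host routes are assumed to mean /32 prefixes.

```python
import ipaddress

def prefixes_needing_labels(prefixes, host_route_only):
    """Return the prefixes that would be assigned a local label."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    if host_route_only:
        # Keep only host routes (/32 for IPv4), such as Loopback addresses.
        networks = [n for n in networks if n.prefixlen == n.max_prefixlen]
    return networks

LABEL_CAPACITY = 4096  # assumption: the "4K" labels mentioned in the analysis

# Hypothetical routing table: 5000 subnet routes plus three Loopback addresses.
table = [f"10.{i // 256}.{i % 256}.0/24" for i in range(5000)]
table += ["192.168.167.7/32", "192.168.167.61/32", "192.168.167.62/32"]

for host_only in (False, True):
    needed = len(prefixes_needing_labels(table, host_only))
    print(host_only, needed, needed <= LABEL_CAPACITY)
```

Without the filter the demand exceeds the capacity, so some prefixes, possibly the PE Loopback addresses, end up without a label. With the filter only the Loopback addresses consume labels.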

Hao Bin / ZTE Corporation

Service Interruption Resulting from Incorrect Black-hole Route Configuration


Abstract: This section describes a network interruption that results from incorrect black-hole routes configured on a router with two outgoing interfaces. The route configuration is modified to eliminate the fault and realize load sharing.
Key words: Black-hole route, longest matching and load sharing

Networking Description
As shown in Figure 1, the ZXR10 T600, acting as the general egress of the Wimax core network, is interconnected with two ISPs. When the traffic from the Wimax core network to T600 passes through the firewall, the system translates the private addresses into two private network segments, 192.168.8.0/22 and 192.168.12.0/22.
Figure 1. MPLS VPN Networking Figure


The EBGP neighbor relationship is established between T600 and two ISPs. The traffic from T600 to Wimax core network uses the static route.

Symptom
The configuration on T600 is as follows:

router bgp 45904
 network 192.168.8.0 255.255.252.0
 network 192.168.12.0 255.255.252.0
 network 192.168.8.0 255.255.255.0
 network 192.168.9.0 255.255.255.0
 network 192.168.10.0 255.255.255.0
 network 192.168.11.0 255.255.255.0
 network 192.168.12.0 255.255.255.0
 network 192.168.13.0 255.255.255.0
 network 192.168.14.0 255.255.255.0
 network 192.168.15.0 255.255.255.0
 neighbor 10.10.21.29 remote-as 17806
 neighbor 10.10.21.29 activate
 neighbor 10.10.21.29 route-map to-Mango-out out
 neighbor 10.10.51.65 remote-as 17494
 neighbor 10.10.51.65 activate
 neighbor 10.10.51.65 route-map to-Btcl-out out
!
ip route 192.168.8.0 255.255.252.0 192.168.11.2
ip route 192.168.12.0 255.255.252.0 192.168.11.2
ip route 192.168.8.0 255.255.255.0 null1 200
ip route 192.168.9.0 255.255.255.0 null1 200
ip route 192.168.10.0 255.255.255.0 null1 200
ip route 192.168.11.0 255.255.255.0 null1 200
ip route 192.168.12.0 255.255.255.0 null1 200
ip route 192.168.13.0 255.255.255.0 null1 200
ip route 192.168.14.0 255.255.255.0 null1 200
ip route 192.168.15.0 255.255.255.0 null1 200
!
ip prefix-list Mango-out seq 5 permit 192.168.8.0 24
ip prefix-list Mango-out seq 10 permit 192.168.9.0 24
ip prefix-list Mango-out seq 15 permit 192.168.10.0 24
ip prefix-list Mango-out seq 20 permit 192.168.11.0 24
ip prefix-list Mango-out seq 25 permit 192.168.8.0 22
ip prefix-list Mango-out seq 30 permit 192.168.12.0 22
!
ip prefix-list Btcl-out seq 5 permit 192.168.12.0 24
ip prefix-list Btcl-out seq 10 permit 192.168.13.0 24
ip prefix-list Btcl-out seq 15 permit 192.168.14.0 24
ip prefix-list Btcl-out seq 20 permit 192.168.15.0 24
ip prefix-list Btcl-out seq 25 permit 192.168.8.0 22
ip prefix-list Btcl-out seq 30 permit 192.168.12.0 22
!
route-map to-Mango-out permit 10
 match ip address prefix-list Mango-out
!
route-map to-Btcl-out permit 10
 match ip address prefix-list Btcl-out
!

After the above configuration, all traffic returned from T600 to Wimax core network is interrupted.

Fault Analysis
1. When pinging a public address on the Internet from T600, the engineer finds that the address can be pinged successfully. When pinging the address directly connected to the firewall from T600, the engineer finds that this IP address can also be pinged successfully. When tracing an address translated through the firewall from T600, the engineer obtains the following result:
T600#trace 192.168.8.5
tracing the route to 192.168.8.5
1 * *
[finished]

2. Check the route entry related to this address on T600.
T600#show ip route 192.168.8.5
IPv4 Routing Table:
Dest         Mask           Gw       Interface  Owner   Pri  Metric
192.168.8.0  255.255.255.0  0.0.0.0  null1      static  200  0

The engineer finds that the 192.168.8.5 address selects the black-hole route.


3. Check the static route table. The details are as follows:


T600#show ip route static
IPv4 Routing Table:
Dest          Mask           Gw            Interface  Owner   Pri  Metric
192.168.8.0   255.255.252.0  192.168.11.2  gei_4/2    static  1    0
192.168.8.0   255.255.255.0  0.0.0.0       null1      static  200  0
192.168.9.0   255.255.255.0  0.0.0.0       null1      static  200  0
192.168.10.0  255.255.255.0  0.0.0.0       null1      static  200  0
192.168.11.0  255.255.255.0  0.0.0.0       null1      static  200  0
192.168.12.0  255.255.252.0  192.168.11.2  gei_4/2    static  1    0
192.168.12.0  255.255.255.0  0.0.0.0       null1      static  200  0
192.168.13.0  255.255.255.0  0.0.0.0       null1      static  200  0
192.168.14.0  255.255.255.0  0.0.0.0       null1      static  200  0
192.168.15.0  255.255.255.0  0.0.0.0       null1      static  200  0
192.168.31.0  255.255.255.0  0.0.0.0       null1      static  200  0
192.168.49.0  255.255.255.0  192.168.11.2  gei_4/2    static

Based on the longest matching rule, the router selects the more specific /24 black-hole routes instead of the 192.168.8.0/22 and 192.168.12.0/22 static routes whose next hop is the interface address of the firewall.
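
The route selection above can be reproduced with a short Python sketch of longest-prefix matching. The function is a simplified illustration and not the router's lookup code. Note that the route priority (200 versus 1) plays no part here, because priority only decides between routes to the same prefix.

```python
import ipaddress

def lookup(routes, destination):
    """Return the (prefix, next hop) entry selected by longest-prefix match."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    network, next_hop = max(candidates, key=lambda item: item[0].prefixlen)
    return str(network), next_hop

# Routes taken from the static route table in this case.
faulty = [("192.168.8.0/22", "192.168.11.2"), ("192.168.8.0/24", "null1")]
fixed = [("192.168.8.0/22", "192.168.11.2")]

print(lookup(faulty, "192.168.8.5"))
print(lookup(fixed, "192.168.8.5"))
```

With the /24 black-hole route present, 192.168.8.5 resolves to null1. Once it is removed, the /22 route toward the firewall is used.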

Solutions
When the black-hole routes to all private network segments are deleted, the services recover. At that time, only the following two static routes exist:
ip route 192.168.8.0 255.255.252.0 192.168.11.2
ip route 192.168.12.0 255.255.252.0 192.168.11.2

To realize load sharing, add static routes to the detailed private network segments whose next hop is the interface address of the firewall. The details are as follows:
ip route 192.168.8.0 255.255.252.0 192.168.11.2
ip route 192.168.12.0 255.255.252.0 192.168.11.2
ip route 192.168.8.0 255.255.255.0 192.168.11.2
ip route 192.168.9.0 255.255.255.0 192.168.11.2
ip route 192.168.10.0 255.255.255.0 192.168.11.2
ip route 192.168.11.0 255.255.255.0 192.168.11.2
ip route 192.168.12.0 255.255.255.0 192.168.11.2
ip route 192.168.13.0 255.255.255.0 192.168.11.2
ip route 192.168.14.0 255.255.255.0 192.168.11.2
ip route 192.168.15.0 255.255.255.0 192.168.11.2


In this case, you can find that the routes distributed by two ISPs are normal.
T600#show ip bgp neighbor out 10.10.21.29
Routes Sent To This Neighbor:
Origin codes: i-IGP, e-EGP, ?-incomplete
Dest              NextHop      Metric  LocPrf  Path
192.168.8.0/22    10.10.21.30                  i
192.168.8.0/24    10.10.21.30                  i
192.168.9.0/24    10.10.21.30                  i
192.168.10.0/24   10.10.21.30                  i
192.168.11.0/24   10.10.21.30                  i
192.168.12.0/22   10.10.21.30                  i

T600#show ip bgp neighbor out 10.10.51.65
Routes Sent To This Neighbor:
Origin codes: i-IGP, e-EGP, ?-incomplete
Dest              NextHop      Metric  LocPrf  Path
192.168.8.0/22    10.10.51.66                  i
192.168.12.0/22   10.10.51.66                  i
192.168.12.0/24   10.10.51.66                  i
192.168.13.0/24   10.10.51.66                  i
192.168.14.0/24   10.10.51.66                  i
192.168.15.0/24   10.10.51.66                  i

Lessons Learned
In the router, the longest match rule is used for route lookup. When BGP advertises aggregate routes, the network command combined with a black-hole route is used to reduce flapping of the BGP aggregate route caused by flapping of the internal routes and to prevent routing loops. However, this configuration is not required if the internal route is a static route. Do not overlook the actual internal routes when a black-hole route is configured.


Li Yueyang / ZTE Corporation

Route Failure between IPS and VRRP Network


Abstract: This section describes how the security feature of the IPS affects the forwarding path of the source network traffic in a network application.
Key words: VRRP, IPS, trusted port, untrusted port and dual-route links

Symptom
As shown in Figure 1, T64G-1 and T64G-2 are connected to the CEs through default routes. The IPS devices work in transparent transmission mode and detect invalid messages and network attacks. The active IP address of VRRP is on T64G-1. T64G-1 and T64G-2 enable the VRRP protocol and act as the gateway of the downstream servers. When the servers connected to T64G-1 or T64G-2 access the public network segment 163, the engineer finds that some network segments cannot be accessed.

Figure 1. Networking Structure between VRRP and IPS


Fault Analysis
1. Log in to T64G-1 and T64G-2 and ping the public address separately. The engineer can ping the IP address successfully on both T64G devices if the source address is not specified. However, if the source address is specified, the ping succeeds on only one T64G device. The details are as follows:
T64G-1#ping 219.151.39.1
Sending 5,100-byte ICMP echoes to 219.151.39.1,timeout is 2 seconds.
!!!!!
Success rate is 100 percent(5/5),round-trip min/avg/max= 0/0/0 ms.
T64G-1#ping 219.151.39.1 source 10.10.204.50
Sending 5,100-byte ICMP echoes to 219.151.39.1,timeout is 2 seconds.
.....
Success rate is 0 percent(0/5).
T64G-1#
T64G-2#ping 219.151.39.1
Sending 5,100-byte ICMP echoes to 219.151.39.1,timeout is 2 seconds.
!!!!!
Success rate is 100 percent(5/5),round-trip min/avg/max= 0/0/0 ms.
T64G-2#ping 219.151.39.1 source 10.10.204.51
Sending 5,100-byte ICMP echoes to 219.151.39.1,timeout is 2 seconds.
!!!!!
Success rate is 100 percent(5/5),round-trip min/avg/max= 0/4/20 ms.
T64G-2#

2. The engineer suspects that the route is not distributed by the upstream device. However, the route on the upstream device is normal.
3. When checking the ARP entries on the T64G devices, the engineer finds that the ARP tables are normal.
T64G-1#show arp exvlanID 11
The count is 5
IP Address    Age(min)  Hardware Address  Interface  External VlanID  Internal VlanID  Sub Interface
----------------------------------------------------------------------
10.10.204.56  2         0010.dbff.2060    vlan11     11               N/A              gei_1/7
10.10.204.52  1         0010.dbff.2060    vlan11     11               N/A              gei_1/7
10.10.204.51  1         0019.c606.dcf1    vlan11     11               N/A              gei_1/47
10.10.204.53  2         0010.dbff.2060    vlan11     11               N/A              gei_1/7
10.10.204.55  2         0010.dbff.2060    vlan11     11               N/A              gei_1/7

T64G-2#show arp exvlanID 11
The count is 6
IP Address    Age(min)  Hardware Address  Interface  External VlanID  Internal VlanID  Sub Interface
----------------------------------------------------------------------
10.10.204.53  0         0010.dbff.2060    vlan11     11               N/A              gei_1/47
10.10.204.49  3         0000.5e00.010d    vlan11     11               N/A              gei_1/47
10.10.204.56  1         0010.dbff.2060    vlan11     11               N/A              gei_1/47
10.10.204.55  2         0010.dbff.2060    vlan11     11               N/A              gei_1/47
10.10.204.52  3         0010.dbff.2060    vlan11     11               N/A              gei_1/47
10.10.204.50  3         0019.c606.dcd1    vlan11     11               N/A              gei_1/47

4. When executing the ping 172.16.189.198 source 10.10.204.50 command on the T64G-1 device, the engineer finds that the IP address can be pinged successfully. However, when executing the ping 172.16.189.194 source 10.10.204.50 command, the engineer finds that the IP address cannot be pinged successfully. The route address for the outgoing packet is 192.168.90.26 and that for the incoming packet is 192.168.90.29. In the same way, when executing the ping 172.16.189.194 source 10.10.204.51 command on the T64G-2 device, the engineer finds that the IP address can be pinged successfully. However, when executing the ping 172.16.189.198 source 10.10.204.51 command, the engineer finds that the IP address cannot be pinged successfully. The route address for the outgoing packet is 192.168.90.30 and that for the incoming packet is 192.168.90.25.
5. When executing the ping 192.168.90.30 source 10.10.204.50 command on the T64G-1 device, the engineer finds that the IP address can be pinged successfully. When executing the ping 192.168.90.26 source 10.10.204.51 command on the T64G-2 device, the engineer finds that the IP address can also be pinged successfully. The reason is that the OSPF protocol is enabled between the T64G devices and the direct routes are distributed, so the route for the outgoing packet and that for the incoming packet are the same.
6. To find the fault reason, the engineer captures and analyzes packets on the T64G and CE devices. The engineer pings 172.16.189.194 from the T64G-1 device with the source address 10.10.204.50 and captures the packets (the outgoing route is to CE4 and the incoming route is from CE3 to T64G-2 and then to T64G-1, so the packet capture is performed on the T64G-2 device). The capture result shows no response from the CE device.


By capturing the packets on the CE3 port connected to T64G-2, the engineer finds that the CE device has sent the response messages to the T64G device. In this case, the engineer suspects that the packets are lost on the IPS device between T64G and CE.

Solutions
After the engineers of the IPS manufacturer cancel the safety policies on the IPS device, the service becomes normal.

Lessons Learned
The outgoing route from T64G-1 to CE3 (192.168.90.30) is inconsistent with that from CE3 (192.168.90.30) to T64G-1. However, the outgoing route from T64G-1 to CE3 (192.168.90.30) is consistent with that from CE3 (172.16.189.194) to T64G-1. After negotiating with the engineers of the IPS manufacturer, the engineer learns that the IPS checks whether each message is valid or is a malicious attack and inspects the source and destination IP addresses of the packet, because the IPS device classifies its ports into trusted ports and untrusted ports just like a firewall. If the IPS has not detected a session from a trusted port (source IP) to an untrusted port (destination IP) but detects messages from the untrusted port (source IP) to the trusted port (destination IP), the messages are discarded. In this case, the fault occurs. The reason why some public IP addresses cannot be pinged successfully is as follows: T64G-1 is the active device, so all the messages are sent out through the default route of T64G-1. However, the incoming messages take the route through either T64G-1 or T64G-2. When the route through T64G-2 is selected, the incoming messages are discarded, so the IP address cannot be pinged successfully. To solve this problem, it is recommended to add a heartbeat line between the two IPS devices. After that, IPS-2 can copy the session from IPS-1 when the outgoing route passes through IPS-1 but the incoming route passes through IPS-2. In this case, normal messages are not discarded. Alternatively, this policy can be cancelled on the IPS. This IPS behavior also applies to firewalls.
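
The session check and the effect of the heartbeat line can be modelled with a toy Python class. This is a sketch under simplifying assumptions: the class and method names are hypothetical, a session is reduced to a source and destination address pair, and real IPS devices track far more state.

```python
class StatefulInspector:
    """Toy model of a device that tracks sessions opened from the trusted side."""

    def __init__(self):
        self.sessions = set()

    def outbound(self, src, dst):
        # Traffic from the trusted port creates a session entry.
        self.sessions.add((src, dst))
        return True

    def inbound(self, src, dst):
        # Return traffic is accepted only if the matching session exists.
        return (dst, src) in self.sessions

    def sync_from(self, other):
        # Models the heartbeat line: copy the peer's session table.
        self.sessions |= other.sessions

ips1, ips2 = StatefulInspector(), StatefulInspector()
ips1.outbound("10.10.204.50", "172.16.189.194")        # request leaves through IPS-1
print(ips2.inbound("172.16.189.194", "10.10.204.50"))  # reply arrives at IPS-2
ips2.sync_from(ips1)
print(ips2.inbound("172.16.189.194", "10.10.204.50"))  # reply after session sync
```

The reply is rejected while IPS-2 has no record of the session and accepted once the session table has been copied, which matches the asymmetric-path behavior observed in this case.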
