
For a circuit-related issue (link flap or degraded performance), this is what you need to
check:
Where does the circuit reside?
What interface is it connected to?
E.g.
Logs:
Jul 20 19:20:22 CST: %CONTROLLER-5-UPDOWN: Controller T1 0/0/1, changed state
to down (RAI detected)
Jul 20 19:20:38 CST: %CONTROLLER-5-UPDOWN: Controller T1 0/0/1, changed state
to up
Jul 20 19:20:38 CST: %ISDN-6-LAYER2UP: Layer 2 for Interface Se0/0/1:23, TEI 0
changed to up
Jul 20 19:20:40 CST: %LINK-3-UPDOWN: Interface Serial0/0/1:23, changed state to
up
Jul 20 19:37:28 CST: %CONTROLLER-5-UPDOWN: Controller T1 0/0/1, changed state
to down (AIS detected)
Sh int desc:
Interface      Status  Protocol  Description
Se0/0/0:0      up      up        AT&T MPLS T1.5 Ckt. ID: DHEC.455470 (DLCI 400 / EPVC 778)
Se0/0/0:0.1    up      up        AT&T MPLS PVC Ckt ID: DHEC.455470 (DLCI 400 / EPVC 778)


sh controller t1 0/0/0:
T1 0/0/0 is up.
Applique type is Channelized T1
Cablelength is long 0db
Description: AT&T MPLS T1 Circuit ID: DHEC.445470 (Data)
sh controller t1 0/0/1:
T1 0/0/1 is up.
Applique type is Channelized T1
Cablelength is long 0db
Description: AT&T T1 Pri Circuit ID: DHEC.104668.100 (Voice)
So in this case the T1 0/0/1 link flapped, and it is used for VOICE. We open a ticket
with AT&T under the circuit ID DHEC.104668.100.
How can you be sure there was packet loss and degraded performance?
After entering sh controller t1 0/0/1, scroll all the way down. You will see counters
like the ones below, which show that the circuit is bad.
Total Data (last 24 hours)
4 Line Code Violations, 5939 Path Code Violations,
224 Slip Secs, 2 Fr Loss Secs, 3 Line Err Secs, 44 Degraded Mins,
395 Errored Secs, 166 Bursty Err Secs, 7 Severely Err Secs, 60 Unavail Secs
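
If you want to pull these counters out of a saved capture instead of reading them by eye, the following is a minimal sketch in Python. It assumes the output has the same "<number> <counter name>" layout as the sample above; the function names are made up for illustration and are not part of any Cisco tool.

```python
import re

# Sample text copied from the "Total Data (last 24 hours)" output above.
SAMPLE = """\
Total Data (last 24 hours)
4 Line Code Violations, 5939 Path Code Violations,
224 Slip Secs, 2 Fr Loss Secs, 3 Line Err Secs, 44 Degraded Mins,
395 Errored Secs, 166 Bursty Err Secs, 7 Severely Err Secs, 60 Unavail Secs
"""


def parse_counters(text):
    """Return {counter name: value} for every '<number> <name>' pair found."""
    counters = {}
    for line in text.splitlines():
        # A counter is a number, a space, then words up to a comma or end of line.
        for value, name in re.findall(r"(\d+) ([A-Za-z ]+?)(?:,|$)", line):
            counters[name.strip()] = int(value)
    return counters


def nonzero_counters(text):
    """Keep only the counters that are above zero (hypothetical helper)."""
    return {name: value for name, value in parse_counters(text).items() if value > 0}


if __name__ == "__main__":
    for name, value in nonzero_counters(SAMPLE).items():
        print(f"{name}: {value}")
```

Any non-zero error counter is worth quoting in the carrier ticket, since it is the evidence that the circuit took errors.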

----------------------------------------------------------------------------------------------------

What do RAI and AIS mean?


LOS = loss of signal - the customer router has lost signal to the ATT/Verizon smart jack.
The fault is onsite: either the SJ itself, the cabling between the CSU and the SJ, or the
WIC (possibly the entire router, but not usually if we have OOB).
----------------------------------------------------------------------
LOF = loss of frame
-> if it's Frame Relay: between you and the telco frame switch, your next-hop
physically connected device
-> if it's PPP: the last-hop device
----------------------------------------------------------------------
RAI = remote alarm indication
--> occurs between the telco devices in the link path
----------------------------------------------------------------------
AIS = alarm indication signal - issue between the smart jack and the telco

(Router(CSU))====SJ=======TELCO GEAR======TELCO GEAR========BGP NEIGHBOR Router
--->between CSU and SJ is LOS
--->between SJ and TELCO is AIS
--->between telco and telco is RAI
*SJ is smart jack*
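
The rule of thumb in the diagram above can be written as a small lookup. This is only a sketch in Python: the mapping reflects how these notes describe each alarm, not a formal definition, and the function names are hypothetical. LOF is left out because its location depends on the encapsulation (Frame Relay or PPP).

```python
import re

# Fault location per alarm, as described in these notes.
ALARM_SEGMENT = {
    "LOS": "between the CSU and the smart jack (onsite)",
    "AIS": "between the smart jack and the telco",
    "RAI": "between telco devices in the link path",
}


def alarm_from_log(line):
    """Return the alarm keyword from a '(XXX detected)' suffix, or None."""
    match = re.search(r"\((\w+) detected\)", line)
    return match.group(1) if match else None


def likely_segment(line):
    """Map a CONTROLLER-5-UPDOWN log line to the likely fault segment."""
    return ALARM_SEGMENT.get(alarm_from_log(line), "unknown")


if __name__ == "__main__":
    log_line = ("Jul 20 19:20:22 CST: %CONTROLLER-5-UPDOWN: Controller T1 0/0/1, "
                "changed state to down (RAI detected)")
    print(alarm_from_log(log_line), "->", likely_segment(log_line))
```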
----------------------------------------------------------------------------------------------------

BGP
BGP is always related to a WAN link, as the session goes to another router in the cloud.
Sh ip bgp summary:
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.7.252.194    4 13979   50988   39516    52533    0    0 3w4d             2514
This means the BGP session has been up for 3 weeks and 4 days.
How do you tell which BGP session runs over which link?
Look at the neighbor IP in the BGP summary. The link it uses is the interface in sh ip
int brief whose own IP address sits next to the neighbor's (here 10.7.252.193 faces
neighbor 10.7.252.194):
Serial0/0/0:0.1   10.7.252.193   YES NVRAM   up   up   >>>>>
AT&T MPLS PVC Ckt ID: DHEC.455470 (DLCI 400 / EPVC 778)
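
The same matching can be sketched in Python with the standard ipaddress module. The /30 prefix length below is an assumption for illustration only; the notes do not show the real mask, so check it with sh int or sh run before relying on this. The function name is hypothetical.

```python
import ipaddress

# Interface addresses as seen in "sh ip int brief".
# ASSUMPTION: /30 mask, typical for a point-to-point WAN link; not shown in the notes.
INTERFACES = {
    "Serial0/0/0:0.1": "10.7.252.193/30",
}


def interface_for_neighbor(neighbor_ip, interfaces):
    """Return the interface whose subnet contains the BGP neighbor, or None."""
    neighbor = ipaddress.ip_address(neighbor_ip)
    for name, cidr in interfaces.items():
        if neighbor in ipaddress.ip_interface(cidr).network:
            return name
    return None


if __name__ == "__main__":
    print(interface_for_neighbor("10.7.252.194", INTERFACES))
```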
Useful commands:
sh ip bgp sum
sh log
sh clock (to make sure the log time looks correct)
Noticed that the configuration was changed.
Looked for changes. None found.
Looked for a site diagram. None in the MDL.
Looked for config changes in HP NA, but there was only one backup.
The 2 neighbors that are up are on the Vzb MPLS network.
---------------------------------------------------------------------------------------------------
OOB: Out of Band
When you get access, open Procomm. Go to the top menu "Data" and select "Modem
Command Mode".
Type in the window; if the modem responds with OK, enter ats37=9 (sets modem to 9.6k).
To dial, type atdt8,1 followed by the phone number, for example atdt8,1602 272-0802
Use 8 for long distance from Plano. Note: if the number doesn't include the long
distance access number, please include it. That would be a
1 for US and Canada and 011 for other countries.
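
As a small illustration of how the dial string is put together, here is a sketch in Python. The function and parameter names are hypothetical; the 8 (long distance from Plano) and the 1 / 011 access numbers come from the notes above.

```python
def dial_string(number, access_number="1", outside_digit="8"):
    """Build an 'atdt' dial string: outside digit, comma (pause), access number, phone number."""
    return f"atdt{outside_digit},{access_number}{number}"


if __name__ == "__main__":
    # US / Canada number, as in the example in these notes.
    print(dial_string("602 272-0802"))
    # For other countries the notes say to use 011 instead of 1, e.g.:
    # dial_string("<country code and number>", access_number="011")
```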
----------------------------------------------------------------------------------------------------

International Paper (IP)
- Only open tickets in the US.
- IP HD will open tickets with Orange on all international routers for IP (monitored
devices).
- Just ignore those alarms.
-------------------------------------------------------------------
Towers Perrin
Ignore TP alarms for crypto tunnels.
Note: TP (Towers Perrin) crypto alarms are bogus. We do not manage those alarms;
just close them.
Jpmntarcvtc001 is another bogus alarm.
-------------------------------------------------------------------
CE router circuits that are attached to PE routers are handled by the Transport
group.
Never open a ticket on a CE device without asking if Transport is working it already.
####Event Update####
Spoke to ATT; they advised that the NI can be looped but not the CSU. This suggests a power outage.
Spoke to IP HD; they confirmed that there is power on site.
Note: Never trust what they say. All they do is dial the modem and see if it rings.

---------------------------------------------------------------------------------------------------
boq-bql4984rt05 had a tunnel bounce.
This is very common.
Just make sure that it comes back up and close the alarm.
Also, the last line in the log says Tunnel1 went from LOADING to FULL, Loading Done.
This is okay; it is not an issue.
Jul 11 20:59:41 CST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1,
changed state to down
Jul 11 20:59:41 CST: %OSPF-5-ADJCHG: Process 20, Nbr 10.63.0.20 on Tunnel1 from
FULL to DOWN, Neighbor Down: Interface down or detached
Jul 11 21:01:36 CST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1,
changed state to up
Jul 11 21:01:38 CST: %OSPF-5-ADJCHG: Process 20, Nbr 10.63.0.20 on Tunnel1 from
LOADING to FULL, Loading Done
The boq-bqlppdrt01, pqlpsdrt01 and bqlpwart routers are the far end of tunnels. If
they are showing neighbor changes, they are not the issue. There will be a branch
somewhere with the issue.
When the serial (SE) interface goes down, the BRI (ISDN backup) comes up. If it bounces
3 times, a ticket needs to be opened.
What happened here was that the circuit took a hit and the ISDN backup came up. Then the
circuit restored.
On those, I very seldom open a ticket unless it is happening multiple times.
If the circuit is up, then all we can do is ask for an RFO, which is at the bottom of the
priority list and can take days.
Jul 11 22:02:47 CST: %OSPF-5-ADJCHG: Process 20, Nbr 10.63.0.20 on Tunnel1 from
FULL to DOWN, Neighbor Down: Dead timer expired
Jul 11 22:02:47 CST: %OSPF-5-ADJCHG: Process 20, Nbr 10.63.128.20 on Tunnel2
from FULL to DOWN, Neighbor Down: Dead timer expired
Jul 11 22:02:48 CST: %LINK-3-UPDOWN: Interface BRI0/0/0:2, changed state to up
Jul 11 22:02:48 CST: %DIALER-6-BIND: Interface BR0/0/0:2 bound to profile Di1
Jul 11 22:02:48 CST: %ISDN-6-CONNECT: Interface BRI0/0/0:2 is now connected to
1800180277 N/A
Jul 11 22:02:49 CST: %LINEPROTO-5-UPDOWN: Line protocol on Interface BRI0/0/0:2,
changed state to up
Jul 11 22:02:53 CST: %OSPF-5-ADJCHG: Process 20, Nbr 10.63.0.20 on Tunnel1 from
LOADING to FULL, Loading Done
Jul 11 22:03:03 CST: %FR-5-DLCICHANGE: Interface Serial0/1/0 - DLCI 16 state
changed to INACTIVE
Jul 11 22:03:03 CST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
Serial0/1/0.16, changed state to down
Jul 11 22:03:03 CST: %FR-5-DLCICHANGE: Interface Serial0/1/0 - DLCI 17 state
changed to INACTIVE
Jul 11 22:03:03 CST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
Serial0/1/0.17, changed state to down
Jul 11 22:03:03 CST: %LINK-3-UPDOWN: Interface Virtual-Access1, changed state to
down
Jul 11 22:03:03 CST: %LINK-3-UPDOWN: Interface Virtual-Access2, changed state to
down
Jul 11 22:03:03 CST: %LINK-3-UPDOWN: Interface Virtual-Access4, changed state to
down
Jul 11 22:03:03 CST: %LINK-3-UPDOWN: Interface Virtual-Access5, changed state to
down
Jul 11 22:03:03 CST: %OSPF-5-ADJCHG: Process 10, Nbr 10.63.128.10 on Virtual-Access4
from FULL to DOWN, Neighbor Down: Interface down or detached
Jul 11 22:03:04 CST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
Virtual-Access1, changed state to down
Jul 11 22:03:04 CST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
Virtual-Access2, changed state to down
Jul 11 22:03:04 CST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
Virtual-Access4, changed state to down
Jul 11 22:03:04 CST: %LINEPROTO-5-UPDOWN: Line protocol on Interface
Virtual-Access5, changed state to down
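
To apply the three-bounce rule of thumb from these notes to a saved log, a rough sketch in Python follows. It assumes that each %ISDN-6-CONNECT message counts as one time the backup came up; that definition of a "bounce" is an assumption, as are the function names and the idea of counting over whatever log window you feed in.

```python
FLAP_THRESHOLD = 3  # "bounces 3 times" rule of thumb from these notes


def count_backup_activations(log_lines):
    """Count %ISDN-6-CONNECT messages (assumed: one per backup activation)."""
    return sum(1 for line in log_lines if "%ISDN-6-CONNECT" in line)


def needs_ticket(log_lines, threshold=FLAP_THRESHOLD):
    """True when the backup came up at least 'threshold' times in the given log."""
    return count_backup_activations(log_lines) >= threshold


if __name__ == "__main__":
    log = [
        "Jul 11 22:02:48 CST: %LINK-3-UPDOWN: Interface BRI0/0/0:2, changed state to up",
        "Jul 11 22:02:48 CST: %ISDN-6-CONNECT: Interface BRI0/0/0:2 is now connected to 1800180277 N/A",
    ]
    print(count_backup_activations(log), needs_ticket(log))
```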

---------------------------------------------------------------------------------------------------
DUAL-5-NBRCHANGE: IP-EIGRP 132: Neighbor 10.3.246.89 (Dialer0) is down
usiwbrpd2303#sh int dialer 0
Dialer0 is up (spoofing), line protocol is up (spoofing) ack
Hardware is Unknown
Internet address is 10.3.246.1/24
MTU 1500 bytes, BW 128 Kbit, DLY 21000 usec,
reliability 255/255, txload 1/255, rxload 1/255
++++++++++++++++++++++++++++++++++++++++++++++
CRYPTO-5-SESSION_STATUS: Crypto tunnel is DOWN. Peer 192.85.7.12:500 Id:
192.85.7.12
MER015#sh crypto isakmp sa
dst             src             state          conn-id slot
152.161.97.78   192.85.7.12     QM_IDLE

++++++++++++++++++++++++++++++++++++++++++++++
CONTROLLER-5-UPDOWN: Controller E1 0/0 changed state to down (RAI detected)
RAI = remote alarm indication
--> occurs between the telco devices in the link path
reic-paris-477609#sh Controller E1 0/0
E1 0/0 is up.
Applique type is Channelized E1 - balanced

No alarms detected.
alarm-trigger is not set
Version info Firmware: 20040408, FPGA: 11
Framing is NO-CRC4, Line Code is HDB3, Clock Source is Line.
CRC Threshold is 320. Reported from firmware is 320.
Total Data (last 24 hours)
70 Line Code Violations, 4 Path Code Violations,
0 Slip Secs, 0 Fr Loss Secs, 4 Line Err Secs, 0 Degraded Mins, ack!!
4 Errored Secs, 0 Bursty Err Secs, 0 Severely Err Secs, 0 Unavail Secs

----------------------------------------------------------------------------------------------------
E.g.:
Aug 14 18:50:19 CST: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition
detected: TM_NPP_PARITY_ERROR
Aug 14 18:50:19 CST: %SYSTEM_CONTROLLER-SP-3-EXCESSIVE_RESET: System
Controller is getting reset so frequently
Problem:
The switch reports this error message:
%SYSTEM_CONTROLLER-SP-3-ERROR: Error condition detected:
TM_NPP_PARITY_ERROR
This example shows the console output that is displayed when this problem occurs:
Feb 23 21:55:00: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition detected:
TM_NPP_PARITY_ERROR
Feb 23 22:51:32: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition detected:
TM_NPP_PARITY_ERROR
Feb 23 23:59:01: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition detected:
TM_NPP_PARITY_ERROR
Description:
The most common errors from the Mistral ASIC on the MSFC are
TM_DATA_PARITY_ERROR, SYSDRAM_PARITY_ERROR, SYSAD_PARITY_ERROR, and
TM_NPP_PARITY_ERROR. Possible causes of these parity errors are random static
discharge or other external factors. This error message indicates that there was a
parity error. Processor Memory Parity Errors (PMPEs) are broken down into two
types: single event upsets (SEUs) and repeated errors.
SEUs are single-bit errors that occur when a bit in a data word changes unexpectedly
due to external events (which causes, for example, a zero to spontaneously change to a
one). SEUs are a universal phenomenon irrespective of vendor or technology. They
occur very infrequently, but all computer and network systems, even a PC, are
subject to them. SEUs are also called soft errors. They are caused by noise and
result in a transient, inconsistent error in the data. They are unrelated to a component
failure and are most often the result of cosmic radiation.
Repeated errors (often referred to as hard errors) are caused by a failed component or
a board-level problem, such as an improperly manufactured printed circuit board, that
results in repeated occurrences of the same error.
Workaround:
If you see the error message only once or rarely, monitor the switch syslog in order
to confirm that the error message is an isolated incident. If these error messages
recur, reseat the supervisor engine blade. If the errors stop, it was a hard parity
error. If these error messages continue to recur, open a case with Cisco Technical
Support: http://www.cisco.com/en/US/support/tsd_cisco_worldwide_contacts.html
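
The workaround above boils down to counting how often the message shows up. A minimal sketch in Python is below. The threshold of 3 is a hypothetical cut-off for "recurring" (the text only says "once or rarely"), and the function names are made up for illustration. Wrapped log lines must be joined into one line per message first.

```python
PARITY_TAG = "%SYSTEM_CONTROLLER-SP-3-ERROR"


def count_parity_errors(log_lines, keyword="TM_NPP_PARITY_ERROR"):
    """Count syslog lines that carry the parity error message."""
    return sum(1 for line in log_lines if PARITY_TAG in line and keyword in line)


def suggested_action(log_lines, recurring_threshold=3):
    """Map the error count to the workaround step (threshold is an assumption)."""
    count = count_parity_errors(log_lines)
    if count == 0:
        return "no action"
    if count < recurring_threshold:
        return "monitor syslog"
    return "reseat supervisor engine"


if __name__ == "__main__":
    log = [
        "Feb 23 21:55:00: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR",
        "Feb 23 22:51:32: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR",
        "Feb 23 23:59:01: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition detected: TM_NPP_PARITY_ERROR",
    ]
    print(count_parity_errors(log), suggested_action(log))
```

If the errors continue after the reseat, the next step per the workaround is to open a case with Cisco.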
Useful commands:
Sh log
Sh ip bgp summary
Sh controller T1 or E1
Sh isdn active
Sh isdn history
Sh isdn status
sh Controller E1 0/2/0
sh ip int bri
sh int desc
sh inv
show interfaces serial
show controllers
sh frame-relay end-to-end kee
show frame-relay lmi
show frame-relay map
show frame-relay pvc