
I can see from the case description that there is a fabric reachability issue:

Dec 19 18:35:05.921 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: FPC
color=RED, class=CHASSIS, reason=FPC 2 Major Errors
Dec 19 18:35:57.868 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: CB
color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 0
Dec 19 18:35:57.870 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: CB
color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 1
Dec 19 18:36:03.962 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: FPC
color=RED, class=CHASSIS, reason=FPC 2 has unreachable destinations
Dec 19 18:36:14.000 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: FPC
color=RED, class=CHASSIS, reason=FPC 2 offlined due to unreachable destinations
Dec 19 18:36:14.597 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm cleared: CB
color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 0
Dec 19 18:36:14.598 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm cleared: CB
color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 1
Dec 19 18:48:57.032 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm cleared: FPC
color=RED, class=CHASSIS, reason=FPC 2 offlined due to unreachable destinations

Can you please let me know the following:

If the FPC has been restarted to recover. From the FPC shell, can you please provide the following:
Start shell pfe network fpc 2
Show syslog messages
Show nvram

Came across PR#1148241, which seems related.

[aabdalmoniem@svl-jtac-tool02 /volume/CSdata/aabdalmoniem/2015-1221-0064]$ cat chassisd.txt | egrep 0x1b
Dec 18 13:01:38 -- Error/Interrupt Status (0x10) 0x1b
Dec 18 13:01:56 -- Error/Interrupt Status (0x10) 0x1b
Dec 18 13:02:14 -- Error/Interrupt Status (0x10) 0x1b
Dec 18 13:02:32 -- Error/Interrupt Status (0x10) 0x1b
Dec 18 13:02:49 -- Error/Interrupt Status (0x10) 0x1b
Dec 18 13:03:07 -- Error/Interrupt Status (0x10) 0x1b
Dec 18 13:03:24 -- Error/Interrupt Status (0x10) 0x1b
Dec 18 13:03:42 -- Error/Interrupt Status (0x10) 0x1b

For Error/Interrupt Status (at address 0x10 of i2csc), the customer is using SCBE2:

CB 0 REV 03 750-055976 CAEP4899 Enhanced MX SCB 2
CB 1 REV 03 750-055976 CAEP4844 Enhanced MX SCB 2

I assume SCBE2 shares the same I2C circuitry spec as SCBE, so you can refer to the following spec (see Table 3-1 and/or section 3.2.10):

http://www-in/eng/cvs_pdf/spec/atlas/fpga/i2cs_scbe/i2cs_design.pdf

So for 0x1b = 0b11011, you see the following bits are set:

bit[0]: I2C Error Interrupt

bit[1]: Mastership error interrupt

bit[3]: External CPU Interrupt

bit[4]: Power Fail Int

So I would think that this is more of a symptom of the power failure, represented
by bit[4] in particular. For now we can treat the "Power Volt Fail Status" registers
(around 0x2A, 0x24, 0x2E, 0x3A, 0x40) as the significant error symptom.
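
For reference, here is a minimal decoding sketch (Python; the bit positions and names are taken from the list above, so please verify them against the i2cs spec before relying on them):

    # Decode the i2cs Error/Interrupt Status register (0x10) into the bit
    # names listed above. Bit positions are assumed from that list.
    STATUS_BITS = {
        0: "I2C Error Interrupt",
        1: "Mastership error interrupt",
        3: "External CPU Interrupt",
        4: "Power Fail Int",
    }

    def decode_status(value):
        # Return the names of all status bits set in the register value.
        return [name for bit, name in STATUS_BITS.items() if value & (1 << bit)]

    print(decode_status(0x1b))
    # -> ['I2C Error Interrupt', 'Mastership error interrupt',
    #     'External CPU Interrupt', 'Power Fail Int']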

davidr@re0.glasbo-rbr1> show chassis alarms
1 alarms currently active
Alarm time                Class  Description
2015-12-18 12:46:16 UTC   Major  FPC 2 Hard errors

davidr@re0.glasbo-rbr1> show chassis fpc 2 detail
Slot 2 information:
  State                      Offline
  Reason: Unresponsive
  Temperature                Absent
  Total CPU DRAM             0 MB
  Total RLDRAM               259 MB
  Total DDR DRAM             20352 MB
  Max Power Consumption      474 Watts

davidr@re0.glasbo-rbr1> show chassis hardware models
Hardware inventory:
Item             Version  Part number  Serial number  FRU model number
FPC 2            REV 11   750-054904   CAEL5197       MPC2E-3D-NG

-----------------------------------------------------------------------------------

From looking at the associated logs:

Dec 19 18:35:05.921 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Major alarm set, FPC
2 Major Errors
Dec 19 18:35:57.869 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Minor alarm set,
Check CB 0 Fabric Chip 0
Dec 19 18:35:57.882 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Minor alarm set,
Check CB 0 Fabric Chip 1
Dec 19 18:36:03.962 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Major alarm set, FPC
2 has unreachable destinations
Dec 19 18:36:14.001 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Major alarm set, FPC
2 offlined due to unreachable destinations
Dec 19 18:36:14.597 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Minor alarm cleared,
Check CB 0 Fabric Chip 0
Dec 19 18:36:14.598 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Minor alarm cleared,
Check CB 0 Fabric Chip 1
Dec 19 18:48:57.033 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Major alarm cleared,
FPC 2 offlined due to unreachable destinations
Dec 19 18:48:57.047 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Major alarm cleared,
FPC 2 has unreachable destinations
Dec 19 18:48:57.059 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Major alarm cleared,
FPC 2 Major Errors
Dec 20 13:04:46.874 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Minor alarm set, Loss
of communication with Backup RE
Dec 20 13:04:56.240 re0.glasbo-rbr1 craftd[1762]: %DAEMON-4: Minor alarm cleared,
Loss of communication with Backup RE

----------------------

Dec 19 18:35:05.921 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: FPC
color=RED, class=CHASSIS, reason=FPC 2 Major Errors
Dec 19 18:35:57.868 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: CB
color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 0
Dec 19 18:35:57.870 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: CB
color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 1
Dec 19 18:36:03.962 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: FPC
color=RED, class=CHASSIS, reason=FPC 2 has unreachable destinations
Dec 19 18:36:14.000 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: FPC
color=RED, class=CHASSIS, reason=FPC 2 offlined due to unreachable destinations
Dec 19 18:36:14.597 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm cleared: CB
color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 0
Dec 19 18:36:14.598 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm cleared: CB
color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 1
Dec 19 18:48:57.032 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm cleared: FPC
color=RED, class=CHASSIS, reason=FPC 2 offlined due to unreachable destinations
Dec 19 18:48:57.032 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm cleared: FPC
color=RED, class=CHASSIS, reason=FPC 2 has unreachable destinations
Dec 19 18:48:57.032 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm cleared: FPC
color=RED, class=CHASSIS, reason=FPC 2 Major Errors
Dec 20 13:04:46.872 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm set: RE
color=YELLOW, class=CHASSIS, reason=Loss of communication with Backup RE
Dec 20 13:04:56.240 re0.glasbo-rbr1 alarmd[2163]: %DAEMON-4: Alarm cleared: RE
color=YELLOW, class=CHASSIS, reason=Loss of communication with Backup RE
-----------------------------------------------------------------------------------

For case#2015-1221-0064, let's divide this case into three issues:

1- FPC#2 power failure:

> FPC was changed.
> We need to raise an RMA for this one; not sure if you managed to get the RMA
details for us.

[aabdalmoniem@svl-jtac-tool01 /volume/CSdata/aabdalmoniem/2015-1221-0064]$ cat full-chassisd.txt | egrep -i removal
Dec 18 16:18:55 FH: Informing the DPCs to act on CB removal
Dec 18 16:19:55 CHASSISD_SNMP_TRAP7: SNMP trap generated: FRU removal
(jnxFruContentsIndex 7, jnxFruL1Index 3, jnxFruL2Index 0, jnxFruL3Index 0,
jnxFruName FPC: MPC2E NG PQ & Flex Q @ 2/*/*, jnxFruType 3, jnxFruSlot 2)
Dec 18 16:19:55 CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC 2 offline: Removal
Dec 18 16:19:55 fpc_down slot 2 reason Removal cargs 0x0
Dec 18 16:19:55 fpc_offline_now - slot 2, reason: Removal, error OK transition
state 1
Dec 18 16:23:11 FH: Informing the DPCs to act on CB removal

[aabdalmoniem@svl-jtac-tool01 /volume/CSdata/aabdalmoniem/2015-1221-0064]$ cat inventory.txt | egrep "FPC 2 - part number"
Jul 23 10:17:13 FPC 2 - part number 750-054904, serial number CAEL5197
Jul 23 10:17:19 FPC 2 - part number 750-054904, serial number CAEL5197
Jul 23 11:04:55 FPC 2 - part number 750-054904, serial number CAEL5197
Jul 23 11:05:00 FPC 2 - part number 750-054904, serial number CAEL5197
Jul 23 11:26:19 FPC 2 - part number 750-054904, serial number CAEL5197
Jul 23 11:26:24 FPC 2 - part number 750-054904, serial number CAEL5197
Dec 18 16:22:17 FPC 2 - part number 750-054904, serial number CAEH9749
........

2- On Dec 19 at 04:57 AM, we start to see FRU online messages for RE-0:

Dec 19 04:57:56 fru_is_present: out of range slot 0 for
Dec 19 04:58:06 CHASSISD_SNMP_TRAP7: SNMP trap generated: Fru Online
(jnxFruContentsIndex 9, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0,
jnxFruName Routing Engine 0, jnxFruType 6, jnxFruSlot 0)
Dec 19 04:58:06 ch_vchassis_become_backup: vc not enabled
Dec 19 05:05:42 fru_is_present: out of range slot 0 for
Dec 19 05:06:01 CHASSISD_SNMP_TRAP7: SNMP trap generated: Fru Online
(jnxFruContentsIndex 9, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0,
jnxFruName Routing Engine 0, jnxFruType 6, jnxFruSlot 0)
Dec 19 05:06:01 ch_vchassis_become_backup: vc not enabled
Dec 19 05:06:08 fru_is_present: out of range slot 0 for

> During that time, RE-1 was trying to become the master RE, which indicates that
there might be a communication issue between the two REs.

Dec 19 04:57:56.906 re1.glasbo-rbr1 /kernel: %KERN-5: mastership: routing engine 1
becoming master
Dec 19 04:57:56.906 re1.glasbo-rbr1 /kernel: %KERN-4: mastership: timed out
becoming master mstr_conf=6d
Dec 19 04:57:56.906 re1.glasbo-rbr1 /kernel: %KERN-5: mastership: returned 16.
Failed to become masters
Dec 19 04:57:56.906 re1.glasbo-rbr1 chassisd[1856]: %DAEMON-4: No response from the
other routing engine for the last 2 seconds.
Dec 19 04:57:56.906 re1.glasbo-rbr1 chassisd[1856]: %DAEMON-4: Keepalive timeout of
2 seconds expired. Assuming RE mastership.
Dec 19 04:57:58.002 re1.glasbo-rbr1 /kernel: %KERN-5: mastership: routing engine 1
becoming master
Dec 19 04:57:58.002 re1.glasbo-rbr1 /kernel: %KERN-4: mastership: timed out
becoming master mstr_conf=6d
Dec 19 04:57:58.002 re1.glasbo-rbr1 /kernel: %KERN-5: mastership: returned 16.
Failed to become masters
Dec 19 04:57:59.002 re1.glasbo-rbr1 /kernel: %KERN-5: mastership: routing engine 1
becoming master
Dec 19 04:57:59.002 re1.glasbo-rbr1 /kernel: %KERN-4: mastership: timed out
becoming master mstr_conf=6d
Dec 19 04:57:59.002 re1.glasbo-rbr1 /kernel: %KERN-5: mastership: returned 16.
Failed to become masters

Dec 19 13:37:52.730 re1.glasbo-rbr1 /kernel: %KERN-4: mastership: timed out
becoming master mstr_conf=6d
Dec 19 13:37:52.730 re1.glasbo-rbr1 /kernel: %KERN-5: mastership: returned 16.
Failed to become masters
Dec 19 13:37:53.730 re1.glasbo-rbr1 /kernel: %KERN-5: mastership: routing engine 1
becoming master
Dec 19 13:37:53.730 re1.glasbo-rbr1 /kernel: %KERN-4: mastership: timed out
becoming master mstr_conf=6d
Dec 19 13:37:53.730 re1.glasbo-rbr1 /kernel: %KERN-5: mastership: returned 16.
Failed to become masters
Dec 19 13:37:54.640 re1.glasbo-rbr1 chassisd[1856]: %DAEMON-5-CHASSISD_SNMP_TRAP7:
SNMP trap generated: Fru Online (jnxFruContentsIndex 9, jnxFruL1Index 1,
jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName Routing Engine 0, jnxFruType 6,
jnxFruSlot 0)

> From the logs, we don't see fabric drops.

3- Later, the system reported fabric drops to FPC#2, which triggered the FPC to be reset:

http://www.juniper.net/techpubs/en_US/junos12.3/topics/concept/fabric-failures-
corrective-actions-mx-routers.html

{master}
dave@re0.glasbo-rbr1> show chassis fabric reachability

Fabric reachability status: Fabric degradation condition healed
Detected on : 2015-12-19 18:36:03 UTC
Reason : Fabric Degradation due to grant timeouts seen by DPCs

Fabric reachability action:
Fabric reachability action : Plane and FPC action
Current phase : Plane and FPC Restart
Phase is completed
Action started : 2015-12-19 18:36:13 UTC
Action completed : 2015-12-19 18:49:22 UTC

Fabric reachability resolution: Fabric degradation healed after phase Plane and FPC
restart

> The logs below are due to the fabric issue between FPC#2 and CB#0:
Dec 19 18:35:57 FR: Enqueueing AFPC dest disable event for slot 2
Dec 19 18:35:57 FR: Enqueueing AFPC dest disable event for slot 2
Dec 19 18:35:57 send: yellow alarm set, device CB 0, reason Check CB 0 Fabric Chip
0
Dec 19 18:35:57 CHASSISD_SNMP_TRAP7: SNMP trap generated: fabric plane check
(jnxFruContentsIndex 12, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0,
jnxFruName CB 0, jnxFruType 5, jnxFruSlot 0)
Dec 19 18:35:57 send: yellow alarm set, device CB 0, reason Check CB 0 Fabric Chip
1
Dec 19 18:35:57 CHASSISD_SNMP_TRAP7: SNMP trap generated: fabric plane check
(jnxFruContentsIndex 12, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0,
jnxFruName CB 0, jnxFruType 5, jnxFruSlot 0)
Dec 19 18:35:57 FM: Message rcvd from pb 1 (type 4, subtype 325), pb_up:1
Dec 19 18:35:57 FM: Received plane control ack from PFE board 1, stage:NULL stage,
pb_up:1
Dec 19 18:35:57 FM: plane ctl ack pb 1 toggle_plane_mask:0x00 ...
Dec 19 18:35:57 FM: plane status ...
Dec 19 18:35:57 FM: 2 0 0 0

[aabdalmoniem@svl-jtac-tool01 /volume/CSdata/aabdalmoniem/2015-1221-0064]$ cat chassisd-re0.txt | egrep "Dec 19" | egrep "board dest disable"
Dec 19 18:35:57 FM: Waiting for PFE board dest disable ack from PFE board 0
Dec 19 18:35:57 FM: Waiting for PFE board dest disable ack from PFE board 0
Dec 19 18:35:57 FM: End of PFE board dest disable fabric event for slot 2, evq=0x0
Dec 19 18:35:57 FM: Waiting for PFE board dest disable ack from PFE board 1
Dec 19 18:35:57 FM: Waiting for PFE board dest disable ack from PFE board 2
Dec 19 18:35:57 FM: End of PFE board dest disable fabric event for slot 2, evq=0x0
Dec 19 18:35:58 FM: Waiting for PFE board dest disable ack from PFE board 0
Dec 19 18:35:58 FM: Waiting for PFE board dest disable ack from PFE board 2
Dec 19 18:35:58 FM: End of PFE board dest disable fabric event for slot 2, evq=0x0
Dec 19 18:35:58 FM: Waiting for PFE board dest disable ack from PFE board 0
Dec 19 18:35:58 FM: Waiting for PFE board dest disable ack from PFE board 2
Dec 19 18:36:02 FM: End of PFE board dest disable fabric event for slot 2, evq=0x0

[aabdalmoniem@svl-jtac-tool01 /volume/CSdata/aabdalmoniem/2015-1221-0064]$ cat chassisd-re0.txt | egrep desti
Dec 18 16:25:17 FM: Sending destination control message to PFE board 2
Dec 18 16:25:18 FM: Sending destination control message to PFE board 0
Dec 18 16:25:18 FM: Sending destination control message to PFE board 1
Dec 18 16:25:18 FM: Sending destination control message to PFE board 2
Dec 19 18:36:03 send: red alarm set, device FPC 2, reason FPC 2 has unreachable
destinations
Dec 19 18:36:14 send: red alarm set, device FPC 2, reason FPC 2 offlined due to
unreachable destinations
Dec 19 18:36:14 CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC 2 offline: FPC offlined due
to unreachable destinations
Dec 19 18:36:14 fpc_down slot 2 reason FPC offlined due to unreachable
destinations cargs 0x90c4060
Dec 19 18:36:14 fpc_disconnect_generic: fpc 2 state Online cargs 0x90c4060
clean_shutdown 1, offline_reason=FPC offlined due to unreachable destinations
Dec 19 18:36:14 fpc_offline_now - slot 2, reason: FPC offlined due to unreachable
destinations, error OK transition state 1
Dec 19 18:36:14 FPC#2 - power off [addr 0x14] reason: FPC offlined due to
unreachable destinations
Dec 19 18:49:21 FM: Sending destination control message to PFE board 2
Dec 19 18:49:22 FM: Sending destination control message to PFE board 0
Dec 19 18:49:22 FM: Sending destination control message to PFE board 1
Dec 19 18:49:22 FM: Sending destination control message to PFE board 2

Based on issues 2 and 3, we might suspect an issue with RE0, so we need to perform
the below action plan:

Perform an RE switchover; after that we need to monitor the status of RE0 and CB0.

Please provide the following logs from the FPC:
Start shell pfe network fpc 2
Show nvram
Show syslog messages

During Dec 19, was there any change in traffic behavior?

After the RE switchover, please provide the full /var/log from both REs.

-----------------------------------------------
1) Collect the below outputs (a small collection sketch follows the list):

show chassis fabric summary
show chassis fabric reachability
show chassis fabric plane
show chassis fabric plane-location
show chassis fabric fpcs
show chassis fabric destinations
show chassis fpc
show class-of-service fabric statistics
show chassis power
show chassis power detail
show chassis power sequence
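
If it helps to gather all of these in one pass, here is a minimal collection sketch using PyEZ (junos-eznc); the hostname and credentials are placeholders, and the command list is the one above:

    # Minimal sketch, assuming junos-eznc (PyEZ) is installed and the router is
    # reachable over NETCONF. Replace host/user/password with real values.
    from jnpr.junos import Device

    COMMANDS = [
        "show chassis fabric summary",
        "show chassis fabric reachability",
        "show chassis fabric plane",
        "show chassis fabric plane-location",
        "show chassis fabric fpcs",
        "show chassis fabric destinations",
        "show chassis fpc",
        "show class-of-service fabric statistics",
        "show chassis power",
        "show chassis power detail",
        "show chassis power sequence",
    ]

    with Device(host="glasbo-rbr1", user="labuser", password="labpass") as dev:
        for cmd in COMMANDS:
            print("=== {} ===".format(cmd))
            # cli() returns the text output of the operational command
            print(dev.cli(cmd, warning=False))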
-----------------------------------------------------------------------------------

I need the following output from both REs to find out whether there is any issue with
the internal RE-to-PFE or RE-to-RE communication:

show tnp addresses
show system statistics tnp
show system statistics ttp
show chassis ethernet-switch statistics
show chassis ethernet-switch errors

--------------------------------------------

1. There is no fabric-related issue after the RE switchover.

imtech@re0.glasbo-rbr1> show chassis fabric reachability
Dec 31 16:20:03

Fabric reachability status: No Fabric degradation detected now

imtech@re0.glasbo-rbr1> show chassis fabric summary extended
Dec 31 16:20:39
Plane   State    Link    Link   Destination errors   Uptime
                 Error   TF     Local / Remote
 0      Online   NO      NO     NO / NO              1 day, 9 hours, 10 minutes, 54 seconds
 1      Online   NO      NO     NO / NO              1 day, 9 hours, 10 minutes, 47 seconds
 2      Online   NO      NO     NO / NO              1 day, 9 hours, 10 minutes, 47 seconds
 3      Online   NO      NO     NO / NO              1 day, 9 hours, 10 minutes, 40 seconds

2. I see frequent increments in the input error counter on the em0 interface of RE1:

imtech@re1.glasbo-rbr1> show interfaces extensive em0 | match error
Dec 31 18:11:03
Input errors:
Errors: 8218027, Drops: 0, Framing errors: 0, Runts: 0, Giants: 0,
Policed discards: 0, Resource errors: 0
Output errors:
Carrier transitions: 0, Errors: 0, Drops: 0, MTU errors: 0,
Resource errors: 0

imtech@re1.glasbo-rbr1> show interfaces extensive em0 | match error
Dec 31 18:18:27
Input errors:
Errors: 8221111, Drops: 0, Framing errors: 0, Runts: 0, Giants: 0,
Policed discards: 0, Resource errors: 0
Output errors:
Carrier transitions: 0, Errors: 0, Drops: 0, MTU errors: 0,
Resource errors: 0
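
As a rough check on how fast that counter is moving, here is a minimal sketch (Python; the numbers and timestamps are taken from the two samples above):

    # Input-error rate between the two em0 samples above.
    from datetime import datetime

    t1, errors1 = datetime(2015, 12, 31, 18, 11, 3), 8218027
    t2, errors2 = datetime(2015, 12, 31, 18, 18, 27), 8221111

    delta_errors = errors2 - errors1            # 3084 new input errors
    delta_seconds = (t2 - t1).total_seconds()   # 444 seconds
    print("%.1f errors/second" % (delta_errors / delta_seconds))  # ~6.9/s

So the counter is climbing at roughly 7 input errors per second between the two readings.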

I tried to find the reason behind the rapid increase in the input errors, but the
increase is not accounted for in the other counters; details can be seen below.

3. There are some historic errors that you can see in the dmesg output, but they are
not increasing. On RE0 these counters are all 0.

root@re1% dmesg | grep em0
em0: Excessive collisions = 0
em0: Symbol errors = 0
em0: Sequence errors = 0
em0: Defer count = 0
em0: Missed Packets = 0
em0: Receive No Buffers = 0
em0: Receive length errors = 173353
em0: Receive errors = 4013379
em0: Crc errors = 4017012
em0: Alignment errors = 0
em0: Carrier extension errors = 0
em0: RX overruns = 0
em0: watchdog timeouts = 0
em0: XON Rcvd = 0
em0: XON Xmtd = 0
em0: XOFF Rcvd = 0
em0: XOFF Xmtd = 0
em0: Good Packets Rcvd = 4001806070
em0: Good Packets Xmtd = 13942745

4. In "show chassis ethernet-switch statistics" there are no error on any of the


port of both the CB and all ethernet-switch link are looking good. From RE1 tnp
ping to RE0 and FPC2 is working fine is also working fine.

imtech@re1.glasbo-rbr1> show chassis ethernet-switch
Dec 31 16:25:20

Link is good on GE port 0 connected to device: FPC0

Link is good on GE port 1 connected to device: FPC1

Link is good on GE port 2 connected to device: FPC2

Link is good on GE port 12 connected to device: Other RE

Link is good on GE port 13 connected to device: RE-GigE

Link is good on GE port 17 connected to device: Debug-GigE

imtech@re0.glasbo-rbr1> show tnp addresses
Dec 31 16:23:11
Name TNPaddr MAC address IF MTU E H R
master 0x1 02:00:00:00:00:04 em0 1500 0 0 3
master 0x1 02:00:01:00:00:04 em1 1500 0 1 3
re0 0x4 02:00:00:00:00:04 em0 1500 0 0 3
re0 0x4 02:00:01:00:00:04 em1 1500 0 1 3
re1 0x5 02:01:00:00:00:05 em1 1500 2 0 3
backup 0x6 02:01:00:00:00:05 em1 1500 2 0 3
fpc0 0x10 02:00:00:00:00:10 em0 1500 5 0 3
fpc1 0x11 02:00:00:00:00:11 em0 1500 5 0 3
fpc2 0x12 02:00:00:00:00:12 em0 1500 5 0 3

imtech@re1.glasbo-rbr1> show tnp connectivity 0x12 count 20
.................... 20 of 20 pings

{backup}
imtech@re1.glasbo-rbr1> show tnp connectivity 0x4 count 20
.................... 20 of 20 pings

-------------------------------------

2) It seems that FPC#2 reports hard errors:

Dec 18 13:19:05 CHASSISD_POWER_CHECK: FPC 2 not powering up
Dec 18 13:19:05 fpc_offline_now - slot 2, reason: Error, error Unresponsive
transition state 2
Dec 18 13:19:05 fpc_offline_now: fpc 2 state unexpected, fpc will be powered
off/on

Dec 18 13:19:05 CHASSISD_SNMP_TRAP10: SNMP trap generated: Fru Offline
(jnxFruContentsIndex 7, jnxFruL1Index 3, jnxFruL2Index 0, jnxFruL3Index 0,
jnxFruName FPC: MPC2E NG PQ & Flex Q @ 2/*/*, jnxFruType 3, jnxFruSlot
2,jnxFruOfflineReason 3, jnxFruLastPowerOff 154824767, jnxFruLastPowerOn
1279395637)
Dec 18 13:19:05 fpc_offline_now - slot 2, is_resync_ready cleared
Dec 18 13:19:05 fpc/pic 2/2: No handle for i2c_id 0x0
Dec 18 13:19:05 fpc/pic 2/3: No handle for i2c_id 0x0
Dec 18 13:19:05 hwdb: entry for fpc 3076 at slot 2 deleted

Dec 18 13:19:05 send: red alarm set, device FPC 2, reason FPC 2 Hard errors

Dec 18 13:19:05 CHASSISD_SNMP_TRAP7: SNMP trap generated: Fru Failed
(jnxFruContentsIndex 7, jnxFruL1Index 3, jnxFruL2Index 0, jnxFruL3Index 0,
jnxFruName FPC: MPC2E NG PQ & Flex Q @ 2/*/*, jnxFruType 3, jnxFruSlot 2)
Dec 18 13:19:05 -- Scratch (0x00) 0x55
Dec 18 13:19:05 -- Version (0x01) 0x66
Dec 18 13:19:05 -- Master Status (0x02) 0x07
Dec 18 13:19:05 -- Mastership Timeout (0x03) 0x3f
Dec 18 13:19:05 -- Master Force (0x04) 0x00
Dec 18 13:19:05 -- Interface 0 (0x05) 0x10
Dec 18 13:19:05 -- Interface 1 (0x06) 0x00
Dec 18 13:19:05 -- Soft Reset (0x07) 0x00
Dec 18 13:19:05 -- Date Code (0x0F) 0x02
Dec 18 13:19:05 -- Error/Interrupt Status (0x10) 0x1b
Dec 18 13:19:05 -- Interrupt Enable (0x11) 0x00
Dec 18 13:19:05 -- FRU LED Control (0x12) 0x02
Dec 18 13:19:05 -- Misc IO Status (0x13) 0x08
Dec 18 13:19:05 -- Button Status (0x14) 0x03
Dec 18 13:19:05 -- Button Interrupt Enable (0x15) 0x00
Dec 18 13:19:05 -- Power Control (0x20) 0x00
Dec 18 13:19:05 -- Power Up Status (0x21) 0x00
Dec 18 13:19:05 -- Power Disable Status (0x22) 0x00
Dec 18 13:19:05 -- Power Disable Cause (0x23) 0x40
Dec 18 13:19:05 -- Power Volt Fail Status (0x24) 0x1f
Dec 18 13:19:05 -- Power Volt Fail Cause (0x25) 0x01
Dec 18 13:19:05 -- Power Volt Fail Status (0x2A) 0x51
Dec 18 13:19:05 -- Power Volt Fail Cause (0x2B) 0x00
Dec 18 13:19:05 -- Power Volt Fail Status (0x2E) 0x33
Dec 18 13:19:05 -- Power Volt Fail Cause (0x2F) 0x00
Dec 18 13:19:05 -- Power Volt Fail Status (0x3A) 0x30
Dec 18 13:19:05 -- Power Volt Fail Cause (0x3B) 0x00
Dec 18 13:19:05 -- Power Volt Fail Status (0x40) 0x7f
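
If it is useful to pull just the power-related registers out of a chassisd dump like the one above, here is a minimal parsing sketch (Python; the line format is assumed to match the dump shown here, and chassisd.txt is the file name used earlier in the case):

    import re

    # Matches lines such as:
    # "Dec 18 13:19:05 -- Power Volt Fail Status (0x24) 0x1f"
    PATTERN = re.compile(
        r"--\s+(?P<name>.+?)\s+\((?P<addr>0x[0-9A-Fa-f]+)\)\s+(?P<value>0x[0-9A-Fa-f]+)"
    )

    def power_registers(path):
        # Yield (name, address, value) for the Power* registers in the dump.
        with open(path) as fh:
            for line in fh:
                m = PATTERN.search(line)
                if m and m.group("name").startswith("Power"):
                    yield m.group("name"), m.group("addr"), m.group("value")

    for name, addr, value in power_registers("chassisd.txt"):
        print(name, addr, value)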

>> Can we open two sessions to the router and try to collect the FPC boot messages:

Start shell user root
Cty fpc2

> From the other session, try to restart the FPC.

3- Can we have the full /var/log from both RE0 and RE1 uploaded to the case?
