
TECHNICAL COMMUNICATION No. TC0827 Ed. 01

OmniPCX Enterprise Nb of pages : 2 Date : 02 October 2006

URGENT

NOT URGENT TEMPORARY PERMANENT

SUBJECT: FREEZE OF CPU6 STEP2 OR CS ON RACK SERVER IN MAIN OR STAND-BY ROLE

1. PROBLEM
Systems equipped with a CPU6 Step2 or a CS on Rack Server (Common Hardware) can be subject to a
freeze of the CS. The Main or Stand-By CS goes out of service and remains blocked. A hardware reset
is then required to restart the CS.
The freeze mostly happens when the hard disk is under heavy load, for example during checkdb, the
daily backup, a remote download, etc.
Reference of Anomaly Report: XTSce81548.

2. CAUSE
Two causes can lead to the blocking of the CS:
− a hardware issue with the hard disk or
− a software issue

3. IMPACTED VERSIONS OF THE SOFTWARE ISSUE
Versions impacted by the software issue are:
− F2.502.10.a
− F2.502.11
− F3.301.15
− F4.401
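When many sites have to be screened, the list above can be turned into a simple lookup. The following is only a sketch: the function name is_impacted_version is hypothetical, and it matches the listed version strings exactly, because this communication does not state whether intermediate patch levels are impacted as well.

```shell
#!/bin/sh
# Return success (0) if the given version string is one of the versions
# listed in this communication as impacted by the software issue.
# Assumption: exact string match only; no version-range logic is applied.
is_impacted_version() {
    case "$1" in
        F2.502.10.a|F2.502.11|F3.301.15|F4.401) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use:
#   if is_impacted_version "F3.301.15"; then echo "impacted"; fi
```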

4. INVESTIGATION TO PERFORM
4.1 Detection of hardware issue
To detect a hardware issue on the hard disk, check the /var/log/syslog* files for disk error
messages. The command to run under the root login is:
zegrep "ide|hda" /var/log/syslog*
Example of disk errors:
syslog:Oct 18 19:31:13 boul0a80 kernel: ide0: reset timed-out, status=0xd0
syslog:Oct 18 19:31:13 boul0a80 kernel: hda: drive not ready for command
syslog:Oct 18 19:31:43 boul0a80 kernel: hda: status timeout: status=0xd0 { Busy }
syslog:Oct 18 19:31:43 boul0a80 kernel: end_request: I/O error, dev 03:05 (hda), sector 966674
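
The syslog check can also be scripted so that current and rotated log files are counted in one pass. The following is a minimal sketch, not part of the original procedure: the function name count_disk_errors is hypothetical, and it assumes that compressed rotated logs carry a .gz suffix and that the pattern "ide|hda" is enough to catch the relevant kernel messages.

```shell
#!/bin/sh
# Count log lines that mention the IDE controller or the hda drive.
# Arguments: one or more syslog files, plain or gzip-compressed (.gz).
# Note: the pattern is broad and may also match unrelated words that
# contain "ide", so any non-zero result should be reviewed by hand.
count_disk_errors() {
    cde_total=0
    for cde_file in "$@"; do
        # Skip files that do not exist or cannot be read.
        [ -r "$cde_file" ] || continue
        case "$cde_file" in
            *.gz) cde_n=$(gzip -cd "$cde_file" | grep -cE 'ide|hda' || true) ;;
            *)    cde_n=$(grep -cE 'ide|hda' "$cde_file" || true) ;;
        esac
        cde_total=$((cde_total + cde_n))
    done
    echo "$cde_total"
}

# Example use (under root login):
#   count_disk_errors /var/log/syslog*
```

A result of 0 only means that no matching line was found in the files given; it does not rule out a disk fault that is visible elsewhere.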


Moreover, the disk keeps its own error log, which can be read with the smartctl tool:
− Enter the smartctl -e /dev/hda command to enable the reading of the data
− Enter the smartctl -a /dev/hda command to read the data
The errors are listed at the end of the output.
Example with errors:
...
SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 01
ATA Error Count: 199
Non-Fatal Count: 0

Error Log Structure 1:
DCR  FR  SC  SN  CL  SH  D/H  CR  Timestamp
0c   ff  ff  ff  ff  ff  ff   ff  0
00   00  3f  3f  68  97  af   91  4082097
00   00  3f  3f  68  97  e0   10  4082097
00   00  01  6c  37  01  e0   30  4082097
00   00  01  6c  37  01  e0   30  4082106
00   04  00  6c  37  01  e0   51  0
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 5463 (life of the drive in hours)

Sometimes a hard disk failure is visible only in the system messages or only in the smartctl
output; at other times it appears in both. Whenever a hard disk failure is found in either log,
the hard disk must be replaced.
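
When several systems have to be checked, the error counter can be extracted from the smartctl output automatically. The sketch below is an illustration only: ata_error_count is a hypothetical helper, it assumes the output contains an "ATA Error Count:" line formatted as in the example above, and treating any non-zero value as a reason to inspect the disk is an assumption, not a rule stated in this communication.

```shell
#!/bin/sh
# Read smartctl output on standard input and print the value of the
# "ATA Error Count:" line. Prints 0 when no such line is present.
# Assumption: the line format matches the example in this communication.
ata_error_count() {
    awk -F': *' '
        /ATA Error Count/ { print $2 + 0; found = 1; exit }
        END { if (!found) print 0 }
    '
}

# Example use (under root login):
#   smartctl -a /dev/hda | ata_error_count
```

The helper only reads text, so it can also be run on a saved copy of the smartctl output.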
4.2 Detection of software issue
If the freeze is caused by the software issue, the incvisu file shows no incident concerning a
shutdown of the system before the blocking. In that case, update the system with one of the
correcting patches specified hereafter.

5. SOLUTION
The correction of this issue is available in the following patches:
− F2.502.20
− F3.301.24
− F4.401.16.c
− F4.401.18
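
As an aid when planning updates, the patch list can be expressed as a lookup from the installed release to the correcting patch or patches. This is a sketch under assumptions: the function name correcting_patches is hypothetical, and selecting the patch by release prefix (F2.502, F3.301, F4.401) is an interpretation of the list, not something this communication states explicitly.

```shell
#!/bin/sh
# Print the correcting patch level(s) listed in this communication for the
# release that the given version string belongs to.
# Assumption: the release is identified by the version prefix.
correcting_patches() {
    case "$1" in
        F2.502*) echo "F2.502.20" ;;
        F3.301*) echo "F3.301.24" ;;
        F4.401*) echo "F4.401.16.c F4.401.18" ;;
        *)       echo "unknown" ;;
    esac
}

# Example use:
#   correcting_patches "F3.301.15"
```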
