
Seagate Confidential

FW RELEASE:

OEM 0005

REL DATE: 09-18-2012

Customer(s): STD_OEM
Product(s): MantaRay
Interface(s): SAS
NOTES:

0004 to 0005

Drive is unresponsive after servo code corruption in the flash during download
Doc Reference

CDD-169723

Likelihood

Low

Severity

Major

System Level Error

Drive Hang

Failure Scenario

Drives were returned from a customer as DNR (Drive Not Ready). Investigation determined that the servo code
in the flash was corrupted. The servo code was corrupted because power was pulled during download while flash
was being programmed, causing Servo unload failure on a download reset. When the drive is in this state,
download of a combination of Controller plus servo code could result in a drive hang.

Root Cause

Inadvertent corruption of Servo programmed in flash.

F/W Change Descr

If servo code in the flash is corrupted, the drive will not be ready. With this change, when an entire firmware file
(one that includes servo code) is downloaded to a drive in this state, all parts except the servo code are discarded
and the servo code is re-flashed. After that, the entire firmware file can be downloaded again and the drive will be
restored.
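
The two-pass recovery described above can be sketched roughly as follows. This is only an illustrative model, not the
drive's firmware: the Segment type, the "controller"/"servo" labels, and both function names are hypothetical, and the
sketch assumes the servo re-flash succeeds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str       # hypothetical labels: "controller" or "servo"
    payload: bytes


def segments_to_program(segments, servo_flash_ok):
    """Select which segments of a downloaded firmware file get programmed."""
    if servo_flash_ok:
        return list(segments)
    # Servo image in flash is corrupted: keep only the servo part, discard the rest.
    return [s for s in segments if s.kind == "servo"]


def restore_drive(segments, servo_flash_ok):
    """Return the programming passes needed to bring the drive back."""
    passes = []
    if not servo_flash_ok:
        # First download of the full file: only servo is re-flashed.
        passes.append(segments_to_program(segments, servo_flash_ok=False))
        servo_flash_ok = True  # assumption: the servo re-flash succeeded
    # Second (or only) download of the full file: everything is programmed.
    passes.append(segments_to_program(segments, servo_flash_ok))
    return passes
```

With a corrupted servo image, restore_drive yields two passes (servo only, then the full file); with a healthy image it
yields a single full pass.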

Re-Transmit bit Incorrectly set to 1 in SAS response frame


Doc Reference

CDD-173832

Likelihood

Low

Severity

Moderate

System Level Error

Spec Violation

Failure Scenario

- The drive attempts to send a response for a command, but the response fails for some reason (e.g., NAK'ed by
initiator, port down, etc.).
- The command associated with the response is aborted (e.g., due to hard reset).
- The Re-Transmit bit may be incorrectly set to 1 in the response frame for a subsequent command.

Root Cause

When a response needs to be re-transmitted, a flag is set to indicate that the Re-Transmit bit should be set in the
response frame. If the command which is associated with the response is aborted prior to the response being
re-transmitted, the flag may not be cleaned up properly. A subsequent response frame for a different command
may have the Re-Transmit bit incorrectly set to 1.

F/W Change Descr

When a command is aborted, properly clean up the flag that is used to indicate that the Re-Transmit bit is to be
set.
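
A minimal sketch of the flag cleanup, assuming a single pending-retransmit flag per port. The class, its methods, and
the dictionary standing in for a response frame are all hypothetical and only model the behavior described above.

```python
class ResponsePath:
    """Toy model of the re-transmit flag handling for one port."""

    def __init__(self):
        self.retransmit_flag = False
        self.flag_owner = None  # tag of the command whose response failed

    def response_failed(self, tag):
        # The response for `tag` must be sent again with the Re-Transmit bit set.
        self.retransmit_flag = True
        self.flag_owner = tag

    def abort_command(self, tag):
        # The fix: aborting the owning command also clears the flag.
        if self.flag_owner == tag:
            self.retransmit_flag = False
            self.flag_owner = None

    def build_response(self, tag):
        bit = 1 if self.retransmit_flag else 0
        self.retransmit_flag = False
        self.flag_owner = None
        return {"tag": tag, "retransmit": bit}
```

Without the cleanup in abort_command, the response built for the next command would inherit the stale flag.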

Multiple Start Immediate commands can cause unexpected status


Doc Reference

CDD-174655

Likelihood

Low

Severity

Moderate

System Level Error

Unexpected status

Failure Scenario

Multiple Start Immediate commands, issued together, can cause unexpected status to be returned (e.g., 02/04/02,
Need SSU, when it should have been 02/04/01, in process of coming ready). This can lead to unexpected
system behavior.

Root Cause

When processing the second of two Start Immediate commands received together, a structure that typically
contains command context is referenced after being freed and reused by a new command.

F/W Change Descr

When evaluating the properties of a Start Stop Unit command, only reference fields that are still valid after the
command returns status to the host.
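
One common way to avoid this kind of stale reference is to copy the needed fields before status is returned. The sketch
below assumes that approach; the source does not say how the firmware implements the fix. The single-slot ContextPool
and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CommandContext:
    opcode: int
    immed: bool
    start: bool


class ContextPool:
    """Single-slot pool: once freed, the slot is overwritten by the next command."""

    def __init__(self):
        self.slot = CommandContext(0, False, False)

    def load(self, opcode, immed, start):
        self.slot.opcode = opcode
        self.slot.immed = immed
        self.slot.start = start
        return self.slot


def handle_start_stop_unit(ctx, return_status):
    # Snapshot the fields needed later while the context is still valid.
    immed, start = ctx.immed, ctx.start
    return_status(ctx)  # after this call the context may be freed and reused
    # Only the local copies are referenced from here on.
    return {"immed": immed, "start": start}
```

In the sketch, return_status stands in for whatever step frees the context; reading ctx.immed after that call would
observe the next command's values.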

Flash LED and RAID Degradation Following System Reboot


Doc Reference

CDD-175009

Likelihood

Low

Severity

Major

System Level Error

Assert (Flash LED)

Failure Scenario

Power cycle a machine with a RAID controller. Upon booting, the controller hangs the bus in the middle of an RLA
read. After hanging for 20 seconds, the controller sends a hard reset to the drive. As a result, the drive asserts.

Root Cause

When the host dropped the bus in the middle of a read, the drive would not properly abort RLA reads if the bus was
dropped in a very specific window. If a reset is sent in that ~4 microsecond window, the drive could free an
internal data pointer repeatedly, leading to an assert.

F/W Change Descr

Updated aborting command logic to prevent data pointers from being freed repeatedly.
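
A small sketch of a release-once guard, one common way to keep an abort path from freeing the same pointer twice. It is
an assumption that the firmware does something comparable. BufferPool, ReadCommand, and the AssertionError standing in
for a drive assert are hypothetical.

```python
class BufferPool:
    """Toy buffer pool that asserts on a double free."""

    def __init__(self, count):
        self.free_list = list(range(count))

    def allocate(self):
        return self.free_list.pop()

    def release(self, buf_id):
        if buf_id in self.free_list:
            raise AssertionError("double free")  # stands in for the drive assert
        self.free_list.append(buf_id)


class ReadCommand:
    def __init__(self, pool):
        self.pool = pool
        self.data_ptr = pool.allocate()

    def abort(self):
        # Release the buffer at most once, then forget the pointer so that a
        # second abort (e.g. bus drop followed by hard reset) is a no-op.
        if self.data_ptr is not None:
            self.pool.release(self.data_ptr)
            self.data_ptr = None
```

Calling abort twice on the same command leaves the pool intact instead of raising.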

Internal error events unnecessarily logged


Doc Reference

CDD-175689

Likelihood

Low

Severity

Minor

System Level Error

Spec Violation

Failure Scenario

After a stop command is issued, an internal process may still send a write request to the read/write subsystem.
The request fails, causing data to be logged to internal error logs. The event is nonfatal and does not affect drive
operation.

Root Cause

A routine that tries to save state information to the media was not checking the state of the drive before issuing
its request to the read/write subsystem.

F/W Change Descr

In the routine that tries to save state information to the media, return an error if the disc is not in a state to accept
writes, before sending the request to the read/write subsystem.
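
The early state check can be illustrated as below. This is a sketch under assumed names: DriveState, Drive, and the
error log list are hypothetical and only model the ordering of the check and the write request.

```python
from enum import Enum, auto


class DriveState(Enum):
    READY = auto()
    STOPPED = auto()


class Drive:
    def __init__(self, state):
        self.state = state
        self.error_log = []

    def _rw_write(self, data):
        # Models the read/write subsystem: a failed request gets logged.
        if self.state is not DriveState.READY:
            self.error_log.append("write request failed")
            return False
        return True

    def save_state_info(self, data):
        # Check the drive state first; return an error without ever
        # reaching the read/write subsystem, so nothing is logged.
        if self.state is not DriveState.READY:
            return False
        return self._rw_write(data)
```

On a stopped drive, save_state_info returns False and the error log stays empty.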

Drive incorrectly reports link reset received during data transfer error
Doc Reference

CDD-177916

Likelihood

Low

Severity

Minor

System Level Error

Protocol Violation

Failure Scenario

A loss of sync that leads to a drive-initiated OOB could occur at a point in time such that the drive later
incorrectly initiated a data transfer, even though the loss of sync had already been detected. As a result, the drive
incorrectly aborted a command and sent a link reset received during data transfer error (0B/4B/03/01) instead
of holding off the data transfer until re-synced.

Root Cause

The function called by firmware to check if the port was blocked did not include an out of sync indication and so
firmware was not blocked even though the port was in a loss of sync state at the time.

F/W Change Descr

Changed the function that determines if the port is blocked in firmware to include the out of sync condition.
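
As a rough illustration of the changed check, the sketch below folds an out-of-sync indication into the blocked test.
The PortStatus fields and function names are hypothetical; in particular, reset_in_progress is only a placeholder for
whatever conditions the function already covered.

```python
from dataclasses import dataclass


@dataclass
class PortStatus:
    reset_in_progress: bool = False  # placeholder for pre-existing conditions
    out_of_sync: bool = False        # condition added by this change


def port_is_blocked(status):
    # Loss of sync now blocks the port just like the earlier conditions.
    return status.reset_in_progress or status.out_of_sync


def try_start_data_transfer(status):
    # Hold off the transfer while the port is blocked instead of starting it.
    return "deferred" if port_is_blocked(status) else "started"
```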

Pseudo Read Errors Incorrectly Counted Toward A Hardware Error SMART Trip

Doc Reference

CDD-184858

Likelihood

Low

Severity

Major

System Level Error

Spec Violation

Failure Scenario

A pseudo unrecovered error can be generated on an LBA by the WRITE LONG command with the COR_DIS bit
set to one and the WR_UNCOR bit set to one. When these pseudo unrecovered errors were read, the resulting
unrecovered read errors were getting counted against the Hardware Error SMART trip counter. If a number of
these errors are encountered over a defined interval of time, then a Hardware Error SMART trip would occur.

Root Cause

FW did not adequately filter pseudo read errors from the Hardware Error SMART trip counter.

F/W Change Descr

Changed FW to not increment the Hardware Error SMART counter if the read error is a pseudo error that results in
an 03/11/00/83.
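
A minimal sketch of the filter, treating the 03/11/00/83 code as a four-field tuple exactly as written in this note. The
class and method names are hypothetical, and the trip threshold and time interval are left out because the note does
not give them.

```python
# The four fields of 03/11/00/83 as written in this note.
PSEUDO_UNRECOVERED = (0x03, 0x11, 0x00, 0x83)


class HardwareErrorCounter:
    """Toy model of the Hardware Error SMART counter."""

    def __init__(self):
        self.count = 0

    def record_read_error(self, sense):
        # Pseudo unrecovered errors (created via WRITE LONG) are not counted.
        if tuple(sense) == PSEUDO_UNRECOVERED:
            return
        self.count += 1
```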

BMS does not restart after temperature gets out of range


Doc Reference

SDD-176401

Likelihood

Low

Severity

Major

System Level Error

BMS Failure

Failure Scenario

When the drive temperature is out of range (hot or cold), BMS is suspended. BMS does not restart, even when
the temperature returns to normal, until the drive is power cycled.

Root Cause

After BMS is suspended due to temperature being out of range, a request to restart BMS is not issued.

F/W Change Descr

If BMS is suspended due to temperature being out of range, an attempt to restart it will be performed every 5
minutes. If the temperature gets within the valid range, BMS will restart.
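
The periodic restart attempt could look roughly like the sketch below. Only the 5-minute interval comes from this note;
the BmsController class, its tick method, and the temp_in_range callable are hypothetical.

```python
RETRY_INTERVAL_S = 5 * 60  # restart attempt every 5 minutes, per this note


class BmsController:
    """Toy model: suspend BMS on out-of-range temperature, retry periodically."""

    def __init__(self, temp_in_range):
        self.temp_in_range = temp_in_range  # callable returning True or False
        self.running = True
        self.next_retry = None

    def tick(self, now):
        """Advance the model to time `now` (in seconds)."""
        if self.running:
            if not self.temp_in_range():
                self.running = False
                self.next_retry = now + RETRY_INTERVAL_S
            return
        if now >= self.next_retry:
            if self.temp_in_range():
                self.running = True       # temperature back in range: restart
                self.next_retry = None
            else:
                self.next_retry = now + RETRY_INTERVAL_S  # try again later
```

Before the fix, the equivalent of the retry branch was missing, so a suspended BMS stayed suspended until a power cycle.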