Troubleshooting RAID Multiple Drive Failures

When addressing a multiple drive failure, there are several key pieces of
information that need to be determined prior to performing any state modifications.

RAID Level
o Is it a RAID 6?
RAID 6 volume group failures occur after three drives have failed in the
volume group.

o Is it a RAID 3/5 or RAID 1?
RAID 3 and RAID 5 volume group failures occur after two drives have failed in
a volume group.

o RAID 1 volume group failures occur when enough drives fail to cause an
incomplete mirror.
This could be as few as two drives (both members of one mirrored pair) and is
guaranteed once half the drives + 1 have failed.

o RAID 0 volume groups fail upon the first drive failure.

Despite the drive failures, is each individual volume group configuration
complete?
i.e., are all drives accounted for, regardless of whether they are failed or
optimal?

How many drives have failed, and to which volume group does each drive belong?

In what order did the drives fail in each individual volume group?

Are there any global hot spares?
o Are any of the hot spares in use?
o Are there any hot spares not in use and, if so, are they in an optimal
condition?

Are there any backend errors that led to the initial drive failures?
o This is the most common cause of multiple drive failures. All backend
issues must be fixed or isolated before continuing any further.
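
The questions about which drives failed, which volume group each belongs to, and the order of failure can be organized before any state is changed. The following is a minimal sketch of that bookkeeping, not a tool from the storage vendor. The `DriveFailure` record, the drive labels, the volume group names, and the numeric timestamps are all hypothetical and would need to be filled in from the array's own event log.

```python
from collections import defaultdict
from typing import NamedTuple


class DriveFailure(NamedTuple):
    drive: str          # hypothetical drive label, e.g. a tray/slot string
    volume_group: str   # hypothetical volume group name
    failed_at: float    # failure time as a sortable number (assumed epoch seconds)


def failures_by_group(events):
    """Group failure events by volume group, oldest failure first."""
    groups = defaultdict(list)
    for event in events:
        groups[event.volume_group].append(event)
    return {
        name: [e.drive for e in sorted(group, key=lambda e: e.failed_at)]
        for name, group in groups.items()
    }


# Illustrative data only; real values come from the array's event log.
events = [
    DriveFailure("t1s4", "VG1", 300.0),
    DriveFailure("t2s1", "VG2", 120.0),
    DriveFailure("t1s2", "VG1", 100.0),
]
ordered = failures_by_group(events)
```

Each list in `ordered` gives the failure order within one volume group, which is the order the checklist asks for.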

Multiple Drive Failures: Why RAID Level Is Important

RAID 6 Volume Groups

o RAID 6 volume groups can survive two drive failures due to the P+Q
redundancy model. After the third drive failure, the volume group is
marked as failed.
o Up until the third drive failure, data in the stripe is consistent across
the drives.

RAID 5 and RAID 3 Volume Groups

o After the second drive failure, the volume group and associated volumes are
marked as failed. No I/Os are accepted once the second drive has failed.
o Up until the second drive failure, data in the stripe is consistent across
the drives.

RAID 1 Volume Groups

o RAID 1 volume groups can survive multiple drive failures as long as one
side of each mirror is still optimal.
o RAID 1 volume groups can fail after only two drive failures if both the
data drive and its mirror drive fail.
o Until a mirror becomes incomplete, the RAID 1 pairs will function normally.

RAID 0
o As there is no redundancy, these arrays cannot generally be recovered.
The drives can be revived and checked, but no guarantees can be made
that the data will be recovered.
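
The per-level rules above can be summarized in a short sketch. This is an illustration of the thresholds stated in this section, under the assumption that a RAID 1 group is described as a list of (data drive, mirror drive) pairs. The function name `volume_group_failed` and its parameters are hypothetical and do not correspond to any controller command.

```python
def volume_group_failed(raid_level, failed_drives, mirror_pairs=None):
    """Return True if the failure count marks the volume group as failed.

    Thresholds follow the rules in this section:
      RAID 0      -> failed on the first drive failure
      RAID 3 / 5  -> failed on the second drive failure
      RAID 6      -> failed on the third drive failure
      RAID 1      -> failed when both drives of any mirror pair have failed
    """
    failed = set(failed_drives)
    if raid_level == 0:
        return len(failed) >= 1
    if raid_level in (3, 5):
        return len(failed) >= 2
    if raid_level == 6:
        return len(failed) >= 3
    if raid_level == 1:
        if mirror_pairs is None:
            raise ValueError("RAID 1 needs the list of mirror pairs")
        # The mirror is incomplete as soon as one pair has lost both members.
        return any(a in failed and b in failed for a, b in mirror_pairs)
    raise ValueError(f"unsupported RAID level: {raid_level}")


# Hypothetical four-drive RAID 1 group made of two mirrored pairs.
pairs = [("d1", "m1"), ("d2", "m2")]
survives = volume_group_failed(1, ["d1", "d2"], pairs)   # one side of each pair remains
lost = volume_group_failed(1, ["d1", "m1"], pairs)       # one pair lost both drives
```

The RAID 1 case shows why the count alone is not enough: two failures on different pairs leave the group functional, while two failures on the same pair do not.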

Multiple Drive Failures: Configuration Considerations

Although there are several mechanisms to ensure configuration integrity, there
are failure scenarios that may result in configuration corruption.

If the failed volume group's configuration is incomplete, reviving and
reconstructing drives could permanently corrupt user data.

If any of the drives have an offline status (06.xx), reviving drives could
revert them to an unassigned state

How can this be avoided?
o Check to see if the customer has an old profile that shows the appropriate
configuration for the failed volume group(s).
o If the volume group configuration appears to be incomplete or corrupted, or
if there is any doubt, escalate immediately.
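
Comparing an old profile against what the array currently reports can be sketched as a simple set comparison. This is only an illustration of the check described above. The function name `check_configuration` and the drive labels are hypothetical, and treating an unexpected extra drive as grounds for escalation is an assumption that follows the "any doubt" rule rather than something the procedure states outright.

```python
def check_configuration(profile_drives, reported_drives):
    """Compare the drives listed in an old profile with those reported now.

    Drives count as accounted for whether they are failed or optimal;
    only their presence in the volume group configuration matters here.
    """
    expected = set(profile_drives)
    reported = set(reported_drives)
    missing = sorted(expected - reported)      # in the profile, not reported now
    unexpected = sorted(reported - expected)   # reported now, not in the profile
    return {
        "missing": missing,
        "unexpected": unexpected,
        # Assumption: any mismatch is treated as doubt and therefore escalated.
        "escalate": bool(missing or unexpected),
    }


# Hypothetical drive labels taken from an old profile and the current report.
result = check_configuration(
    profile_drives=["t1s1", "t1s2", "t1s3", "t1s4"],
    reported_drives=["t1s1", "t1s2", "t1s4"],
)
```

A non-empty `missing` list corresponds to an incomplete configuration, which is the case where reviving or reconstructing drives should not be attempted.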