How To Determine If A Storage Drive Should Be Replaced in A VxRail vSAN Cluster and Preparation Steps (27 Jun 2019) - EMC COMMUNITY
Product:
VxRail Appliance Family, VxRail Appliance Series, VxRail 460 and 470 Nodes, VxRail E Series Nodes,
VxRail G Series Nodes, VxRail Gen 5, VxRail Gen2 Hardware, VxRail P Series Nodes, VxRail S Series
Nodes, VxRail V Series Nodes, VxRail Software
Instructions:
About VxRail Remote Support's failed drive troubleshooting and hardware replacement
process:
Timely resolution is a main focus of VxRail Support's approach to situations where drives
may need to be replaced. In general, all that is required to send replacement drives is to identify a
faulted disk and ensure the correct part(s) are matched up for dispatch. In most cases, this can be
done quickly, and arrangements to send any needed replacements should follow promptly.
It is also important to confirm that drive replacement is the appropriate solution and that the correct
hardware information has been identified, both to avoid sending incorrect hardware and to avoid
unexpected problems during or after replacement. Sometimes it may appear that a drive has failed
because vCenter or host clients show a disk or diskgroup in an unexpected state. Even if a drive
shows 'Permanent Device Loss' (PDL), that drive may not actually be faulted, and replacement might
not resolve the issue.
The steps below are intended to be quick while maximizing the chance of identifying the
root issue, the appropriate resolution, and any additional impact or problems that should be addressed.
Barring complications, all verification steps can be completed and a determination to send a disk
replacement can be made in less than 30 minutes, unless the checks indicate an issue that
requires a different approach.
https://community.emc.com/docs/DOC-76634 1/4
4/11/2020 EMC Community Network - DECN: Dell EMC VxRail: How to determine if a storage drive should be replaced in a VxRail vSAN cluster an…
2. Check vCenter and the host to determine if a vSAN drive is marked PDL. On a host with
a suspected drive failure, you can use the command line:
3. Determine whether the 'failed' device is real and whether it is the drive that is actually faulted:
If the 'Name' of a drive is missing (the name should start with "naa.") and only the UUID
appears, the cause can be bad device partitions or diskgroup metadata. This often doesn't
involve any actual hardware fault. See https://support.emc.com/kb/529488 about these "ghost
devices"/"phantom drives" and how to address them. In some cases, if the steps in that article don't
resolve the situation, a factory reset of the node will be the best way to move forward.
For clusters with 'Deduplication and Compression' enabled, a failure may be indicated on a
cache drive when the fault is actually present on a capacity drive in the diskgroup. Refer to article
https://support.emc.com/kb/495439 for details.
In this image, the bottom drive is a "ghost device" where a failed capacity drive used to be. The top
one shows failures in 'Operational...' columns but doesn't actually indicate a hardware fault.
Removing the diskgroup from vSAN should result in the physically healthy disks showing up
normally again. You can re-create the diskgroup later, after replacing any drives that have actually
failed.
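The checks in step 3 can be summarized as a small decision sketch. This is only an illustration of the reasoning described above, not a Dell EMC tool: the `DriveReport` fields, the `Action` names, and the `triage` function are hypothetical, and the values would be read by the engineer from vCenter or the host. It follows this article's statement that a device name should start with "naa."; device names in other environments may not follow that pattern.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NO_FAILURE_REPORTED = "No failure reported for this device"
    GHOST_DEVICE = "Likely ghost/phantom device: see https://support.emc.com/kb/529488"
    CHECK_CAPACITY_DRIVES = "Fault may be on a capacity drive: see https://support.emc.com/kb/495439"
    REPLACEMENT_CANDIDATE = "Candidate for replacement: continue verification"


@dataclass
class DriveReport:
    """Hypothetical record of what the engineer observed for one device."""
    name: str                        # "" when only a UUID is shown
    reported_failed: bool            # e.g. shown as failed or PDL
    is_cache_tier: bool
    dedup_compression_enabled: bool  # cluster-level setting


def triage(report: DriveReport) -> Action:
    if not report.reported_failed:
        return Action.NO_FAILURE_REPORTED
    # A missing "naa." name with only a UUID points at bad partitions or
    # diskgroup metadata rather than a hardware fault.
    if not report.name.startswith("naa."):
        return Action.GHOST_DEVICE
    # With dedup and compression on, a capacity-drive fault can surface
    # as a failure on the cache drive of the same diskgroup.
    if report.dedup_compression_enabled and report.is_cache_tier:
        return Action.CHECK_CAPACITY_DRIVES
    return Action.REPLACEMENT_CANDIDATE
```

The order of the checks mirrors the text: rule out a ghost device first, then the deduplication case, and only then treat the drive as a replacement candidate. Even a replacement candidate still needs the remaining verification, since a PDL state alone does not prove a hardware fault.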
Steps AFTER determining that drive replacement should resolve the issue:
1. The VxRail Support Engineer (TSE) checks for and notes any additional issues that might
impact drive replacement or that will need to be addressed afterwards.
2. Collect and upload logs to the Service Request (SR) if not done already. Expected logs are TSR
logs (in Dell hardware environments), the VxRail Manager log bundle, and vCenter and host logs. For
information about collecting VxRail logs, see https://support.emc.com/kb/333684
4. The customer chooses whether to replace the drives themselves or to have a field resource sent
on site to do it.
5. TSE updates case notes with details about the situation, upcoming disk replacement, and any
special considerations.
6. TSE provides node replacement documentation (the customer can also download it from SolVe). If
'Deduplication and Compression' is enabled on the cluster, TSE should attach
https://support.emc.com/kb/528355 to the SR, ensure it is provided to the customer, and include it in
'CE Instructions' if a field resource will be sent.
7. The VxRail Support Engineer creates a work order for dispatch of part(s) and labor (if needed).
At this point, the SR ownership moves from VxRail 'Remote Support' to 'Field Support'. If the original
SR was not specifically for a failed drive, the TSE should create a new SR for the dispatch and retain
ownership of the original one.
8. Someone from Dell EMC's scheduling team reaches out to the customer contact to finalize
shipping/dispatch details.
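The conditional parts of the steps above (which logs to expect, which article to attach, and when a separate SR is needed) can be written down as a short checklist sketch. The function names and string labels below are hypothetical and exist only for illustration; the conditions themselves come from the steps listed in this article.

```python
from typing import List

# KB referenced in this article for clusters with Deduplication and Compression.
KB_DEDUP_REPLACEMENT = "https://support.emc.com/kb/528355"


def expected_logs(dell_hardware: bool) -> List[str]:
    """Logs expected on the SR before dispatch."""
    logs = []
    if dell_hardware:
        # TSR logs apply only to Dell hardware environments.
        logs.append("TSR logs")
    logs.extend(["VxRail Manager log bundle", "vCenter logs", "host logs"])
    return logs


def sr_attachments(dedup_compression_enabled: bool) -> List[str]:
    """Articles to attach to the SR and provide to the customer."""
    return [KB_DEDUP_REPLACEMENT] if dedup_compression_enabled else []


def dispatch_needs_new_sr(original_sr_is_for_failed_drive: bool) -> bool:
    """A new SR is created for dispatch when the original SR was opened for something else."""
    return not original_sr_is_for_failed_drive
```

For example, `expected_logs(dell_hardware=False)` omits the TSR logs, and `dispatch_needs_new_sr(False)` indicates that the TSE should open a new SR for the dispatch and keep ownership of the original one.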
Notes: