
Technical white paper

HP 3PAR sparing

Table of contents
Executive summary
Definitions
Spare space
  Distributed sparing
  Rebuilds
How much spare space do I need?
  Spare policy—schemes
  Spare rate
Sparing algorithms defined
  Nearline (NL) drives are special
Spare space implementation: admithw
How does it all work?
  Evacuating drives
  Viewing spare space
  Changing spare space policy
Sparing performance
  System performance during sparing
Sizing: NinjaSTARS
Adaptive sparing
  Adaptive sparing examples
Predictive drive failures
Conclusion

Executive summary
HP 3PAR storage arrays are designed for high availability and include many features that keep the array running when a failure occurs. One of these features is the sparing policy.
The HP 3PAR StoreServ sparing policy provides space and processes to handle mechanical failures of spinning media
devices as well as flash devices. The sparing policy gives the user some measure of control in defining different levels of
space overhead to match the needs of the environment. HP 3PAR StoreServ sparing architecture provides many-to-many
rebuilds resulting in speedy recovery. The reduced recovery time limits the exposure of a second failure occurring while still
recovering from the first failure.
HP 3PAR StoreServ sparing policy and algorithms are an integral part of the HP 3PAR StoreServ operating system and are
part of every HP 3PAR storage array. This white paper explains the sparing policy and algorithms of HP 3PAR StoreServ
arrays so applications achieve maximum benefit.

Definitions
• Chunklet: Physical disks are divided into chunklets. Each chunklet occupies contiguous space on a disk. Chunklets on
HP 3PAR StoreServ 10000 and 7000 Storage arrays are 1 GB.
• Sparing policy: The HP 3PAR StoreServ policy that manages the sparing process. The policy includes user options (schemes) and algorithms to reserve spare chunklets and move data to these chunklets when necessary.
• Devtype: A category of device such as FC, NL, and SSD.
– FC = Fast Class. 10K and 15K RPM spinning media devices
– NL = Near Line. 7.2K spinning media devices
– SSD = Solid State Drive (see below)
• Scheme: Sparing policy configuration options that include:
– Default
– Minimal
– Maximal
– Custom
• Relocation: The movement of data in a chunklet from one place, such as a failing drive, to another place, such as a spare
chunklet on a good drive.
• SSD: Solid State Drive. Non-volatile flash memory chips packaged in a hard disk form factor. Functions as a disk device,
but with the properties of flash including lower power consumption and faster random read performance.

Spare space
Storage arrays today take a variety of approaches to the basic challenge of keeping an array running when failures occur. RAID has long been the basic building block of data protection and remains so today. Another key element of data protection is sparing.
Sparing is a process by which a RAID storage system restores redundancy to data stored across a collection of disks after a
single disk fails. All RAID arrays in the industry today implement some type of sparing algorithm.
Traditionally, storage array vendors have architected spare disks as part of array design. Spare disks, sometimes called hot
spares or dedicated spares, are extra disks that are standing by waiting for a failure to occur before they are utilized. When a
primary disk fails, the spare disk is immediately called into service to replace the failed disk.
One problem with the dedicated spare disk approach is lost performance. A dedicated spare disk is not used until another disk fails, so the space, power, potential performance, and capital used to purchase the disk are wasted resources for the majority of the disk's life. HP 3PAR addresses these shortcomings by virtualizing the function of the spare disk, an approach called distributed sparing.

Distributed sparing
On HP 3PAR StoreServ arrays, physical disks are divided into chunklets when a disk is admitted to the system. Some
chunklets on each disk are used to hold user data and some chunklets are designated as spares. This spare space serves
the same function as a dedicated spare, but provides the performance benefit of having all drives in the system active.


Rebuilds
Another benefit of distributed sparing is many-to-many rebuilds. When a drive fails, a process begins to recover the lost
redundancy by rebuilding the data that was on the failed drive. All used chunklets on the failed drive are rebuilt on spare or
free chunklets on other drives. Used chunklets are reconstructed by reading data from the remaining chunklets in the RAID
set and computing the missing data from the parity information if necessary.
The rebuild process chooses a target spare chunklet using several criteria. These criteria are prioritized to maintain the
same level of performance and availability as the source chunklet if possible. The following list shows the priority used to
select a spare chunklet.
1. Locate a chunklet on the same type of drive (e.g., NL, FC).
2. Maintain the same HA characteristics as the failed chunklet (e.g., HA cage, HA magazine).
3. Keep the chunklet on the same node as the failed drive.

The best case is a spare chunklet on the same node as the failed drive with the same availability characteristics. When no spare chunklets meet these criteria, free chunklets with the same characteristics are considered. If the number of free chunklets used during the sparing process exceeds a threshold set in the HP 3PAR StoreServ OS, spare chunklets on another node are considered, which helps keep the array balanced.
The sparing algorithm will locate target chunklets spread around many different disks. The disk being rebuilt will also have
its remaining good chunklets spread among many disks creating a many-to-many rebuild.
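To make this selection order concrete, the following minimal Python sketch ranks candidate chunklets by the three criteria above, preferring reserved spares over free chunklets at each level. The class, fields, and ranking function are illustrative assumptions; the free-chunklet threshold and node-balancing logic are omitted, and the real selection is internal to the HP 3PAR StoreServ OS.

from dataclasses import dataclass

@dataclass
class Chunklet:
    devtype: str    # "FC", "NL", or "SSD"
    ha_level: str   # HA characteristic, e.g., "cage" or "magazine"
    node: int       # owning controller node
    is_spare: bool  # reserved spare chunklet vs. free chunklet

def rank(candidate: Chunklet, failed: Chunklet) -> tuple:
    """Lower tuples sort first: same drive type, then same HA
    characteristics, then same node; spares beat free chunklets."""
    return (
        candidate.devtype != failed.devtype,    # 1. same type of drive
        candidate.ha_level != failed.ha_level,  # 2. same HA characteristics
        candidate.node != failed.node,          # 3. same node
        not candidate.is_spare,                 # prefer spare over free
    )

def choose_target(candidates: list, failed: Chunklet) -> Chunklet:
    """Pick the best rebuild target from the available chunklets."""
    return min(candidates, key=lambda c: rank(c, failed))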

How much spare space do I need?


Reserved spare space is not available for user data, so the goal is to reserve just enough to handle failures, but no more. There are several factors to consider when reserving spare space, but the defaults usually work best.
Spare space is used to hold data from a failing or failed drive. The minimum space needed is the capacity of one drive.
Since the number of drive failures is related to the number of drives in the array, expect to configure more spare space
when there are more drives.
Another factor to consider is the architecture of the array. The current HP 3PAR StoreServ 10000 arrays use a magazine
that holds four drives as part of the physical hardware. When a failed drive must be replaced, the entire magazine must be
removed to access the failed drive. The service process will relocate the used chunklets from the three good drives in the
magazine to spare chunklets before the magazine is removed. Since a minimum of four drives will be physically removed
from the system during drive replacement, HP recommends a minimum spare space on these systems of four drives.
There is an option when replacing a failed drive on an HP 3PAR StoreServ 10000 to use log space to hold write I/Os destined for the three good drives during drive replacement. Writes made to the three good drives in the magazine are held in log space while the drives are unavailable; when the replacement procedure is complete, the data is replayed from the log space to the drives. This option can save the considerable time otherwise needed to copy three drives' worth of space to spare chunklets, making drive replacement faster, but it is not always possible. Therefore, HP recommends a minimum of four drives' worth of spare space on systems with drive magazines.

Spare policy—schemes
The HP 3PAR StoreServ OS provides four user-settable options relating to spare space sizing. These options, referred to as
schemes, are:
• Minimal
• Default
• Maximal
• Custom

Best practice: The default spare policy provides the best balance of reserved spare space and usable space and is the
HP recommended option for most configurations.
The spare policy is set when the system is installed and can be changed later using HP 3PAR StoreServ CLI commands: the policy is set with the setsys command and implemented with the admithw command.


The first three sparing policies—Minimal, Default, and Maximal—are automatically managed by the HP 3PAR StoreServ OS,
while the custom setting requires the administrator to actively manage spare space. The custom setting requires the use of
the HP 3PAR StoreServ CLI commands createspare, removespare, and showspare.
HP 3PAR StoreServ CLI commands used to manage sparing are documented in the HP 3PAR Command Line Interface
Reference manual.
The HP 3PAR StoreServ CLI setsys command allows many system parameters to be set. The specific form of the command
to set the sparing policy is shown below. This example sets the sparing policy to “Default”.
lab-eosxx cli% setsys SparingAlgorithm Default

Spare rate
A key parameter in the spare policy is the spare rate. The spare rate is a target amount of space to set aside for sparing
expressed relative to the number of disks of a given type in the system. A spare rate of 24, for example, sets target spare
space equal to the size of one drive for every 24 drives in the array.
The spare rate is determined by the hardware configuration of the array as follows (a sizing sketch appears after the list):
• Spare rate = 40 if any drive magazines are present in the array (e.g., HP 3PAR StoreServ 10000)
• Spare rate = 24 for all other models, such as the 7400/7400c and 7450/7450c
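As a minimal Python sketch, the spare rate translates into a spare space target as follows, assuming the target is simply the drive count divided by the spare rate, rounded up (the rounding behavior is an assumption for illustration):

import math

def target_spare_drives(num_drives: int, has_magazines: bool) -> int:
    """Spare space target for one device type, expressed in drives'
    worth of space, using the spare rates listed above."""
    spare_rate = 40 if has_magazines else 24
    return math.ceil(num_drives / spare_rate)

# Example: 96 FC drives in an array without magazines (spare rate 24)
print(target_spare_drives(96, has_magazines=False))  # -> 4 drives' worth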

Note
The two terms spare rate and sparing rate are similar but distinct; please don't confuse them. The term spare rate as used here expresses the amount of spare space relative to the number of drives, as just discussed. The sparing rate is the rate at which data can be moved to reconstruct a failed drive; it depends on many factors, including array load and the number of drives.

Sparing algorithms defined


The sparing option is set for the entire array and applied to each disk type (SSD, FC, NL) in the following way (a sizing sketch appears after the list).
• Default: Roughly 2.5 percent, with minimums
Small configurations (e.g., 7000-series arrays with fewer than 48 disks, 10000-series arrays with fewer than 80 disks) have a spare space target equal to the size of two of the largest drives of the type being spared.
• Minimal: Roughly 2.5 percent, without minimums
Minimal is the same as Default except for small configurations, where it sets aside spare space equal to only one of the largest drives (compared to two for Default) of the type being sized.
• Maximal: One disk's worth in every cage
Spare space is calculated as the space of the largest drive of the drive type being sized, for each disk cage.
• Custom: Implemented manually by the administrator using the createspare, showspare, and removespare commands.
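The scheme rules can be summarized in a short Python sketch. This is an approximation under stated assumptions (chunklet-level accounting, with the small-configuration minimums applied as a floor on the roughly 2.5 percent target); the actual algorithm is internal to the HP 3PAR StoreServ OS.

def spare_chunklet_target(scheme: str, chunklets_per_drive: list,
                          num_cages: int, small_config: bool) -> int:
    """Approximate spare target, in chunklets, for one device type."""
    largest = max(chunklets_per_drive)
    total = sum(chunklets_per_drive)
    if scheme == "Maximal":
        return largest * num_cages            # one largest drive per cage
    target = round(total * 0.025)             # roughly 2.5 percent
    if small_config:                          # minimums for small systems
        minimum = largest * (2 if scheme == "Default" else 1)
        target = max(target, minimum)
    return target

# 24 drives of 1,787 chunklets each in a small two-cage configuration
drives = [1787] * 24
print(spare_chunklet_target("Default", drives, 2, small_config=True))  # 3574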

Nearline (NL) drives are special


Nearline (NL) drives get special treatment in calculating spare space using the automatically managed options (Default,
Minimal, Maximal). In general, spare space should be created and used in each drive type to address the spare needs of that
drive type. There is a special case, however, where FC spare space can be used to serve the spare needs of the NL tier.
Because FC disks generally perform better than NL drives, there should be no negative performance impact from relocating chunklets of a failing NL drive to spare space on FC disks.
Spare space is calculated as a number of spare chunklets, drive type by drive type, starting with FC drives. Spare space for NL drives is calculated as described above and then reduced by the number of spare chunklets in the FC tier. For example, if 5,000 spare chunklets are needed in the FC tier and 8,000 spare chunklets are needed for the NL tier, only the incremental need of 3,000 (8,000 - 5,000) spare chunklets will be allocated on the NL tier.
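The example above reduces to a simple offset, sketched here in Python:

def nl_spare_chunklets(nl_target: int, fc_spare_chunklets: int) -> int:
    """NL spare need is offset by spare chunklets already reserved
    on the faster FC tier (the rule described above)."""
    return max(nl_target - fc_spare_chunklets, 0)

print(nl_spare_chunklets(8_000, 5_000))  # -> 3000 chunklets on the NL tier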


Spare space implementation: admithw


Setting one of the automatically managed sparing policies does not immediately change any spare space on the system. The sparing policy is implemented with the admithw command. As its name suggests, admithw is intended to be run when new hardware is added to the array, but it can also be run to implement a new sparing policy.

Note
The admithw command can be run in a production environment to change the sparing scheme. The back-end workload to
implement a sparing policy change is low.

The sparing policy setting can be displayed using the showsys -param command. As the example below shows, this command displays the current value of many system parameters, including the sparing algorithm value, which here is set to Default.
lab-eosxx cli% showsys -param
System parameters from configured settings

------Parameter------ --Value--
RawSpaceAlertFC : 0
RawSpaceAlertNL : 0
RawSpaceAlertSSD : 0
RemoteSyslog : 0
RemoteSyslogHost : 0.0.0.0
SparingAlgorithm : Default
EventLogSize : 3M
VVRetentionTimeMax : 336 Hours
UpgradeNote :
PortFailoverEnabled : yes
AutoExportAfterReboot : yes
AllowR5OnNLDrives : no
AllowR0 : no
ThermalShutdown : yes
FailoverMatchedSet : no
SessionTimeout : 01:00:00

lab-eosxx cli%


How does it all work?


The HP 3PAR StoreServ OS uses spare space to rebuild chunklets. The intent of spare space is to reserve capacity to manage a failure from the time a physical disk (PD) is identified as failed or failing until the PD is replaced and the system returns to a healthy state.
Disks can fail in many ways and we have learned a lot over the years about disk failures. A disk could fail catastrophically
making it immediately unavailable. In this case, the data on the disk is destroyed and must be reconstructed from the
remaining good data on other disks using mirroring or parity information.
A more common case is a slow increase in errors (e.g., bad tracks) on a specific disk that will eventually lead to a complete failure. In these cases, the error trend is recognized by the HP 3PAR StoreServ OS, and when it exceeds a threshold, the disk is proactively flagged. Since the disk is still functional, it is not handled as a catastrophic failure; instead it is marked as degraded while chunklets are moved off the disk. When all data has been moved off the failing disk, it is marked as failed and removed from the system.

Evacuating drives
The process of moving chunklets off a disk is called evacuating the drive. This process is used when addressing a failing, but
not yet failed, disk. This process can take considerable time, depending on how much data must be evacuated off the drive
and how much I/O is active on the back-end of the array.

Note
Some service commands also cause drives to be evacuated. These commands are intended for service engineers (e.g., TS).

Viewing spare space


As mentioned above, the sparing policy can be displayed using the showsys -param command. Current spare space usage can be displayed with the showspare command, which lists all spare chunklets as well as free chunklets in use as spares. Since most systems have many spare chunklets and showspare lists each one, its output may be quite lengthy; you may wish to use the cmore command with showspare to page the output for easier reading. Here is an example of the showspare command from a typical system where no spare chunklets are in use:
Lab-eosxx cli% showspare
Pdid Chnk LdName LdCh State Usage Media Sp Cl From To
0 257 ---- --- none available valid Y Y --- ---
0 258 ---- --- none available valid Y Y --- ---
0 259 ---- --- none available valid Y Y --- ---
0 260 ---- --- none available valid Y Y --- ---
0 261 ---- --- none available valid Y Y --- ---
0 262 ---- --- none available valid Y Y --- ---
0 263 ---- --- none available valid Y Y --- ---
0 264 ---- --- none available valid Y Y --- ---
0 265 ---- --- none available valid Y Y --- ---
0 266 ---- --- none available valid Y Y --- ---
0 267 ---- --- none available valid Y Y --- ---
0 268 ---- --- none available valid Y Y --- ---
0 269 ---- --- none available valid Y Y --- ---
0 270 ---- --- none available valid Y Y --- ---
1 257 ---- --- none available valid Y Y --- ---
1 258 ---- --- none available valid Y Y --- ---


Here is an example from a 7400c with a failed drive. This example uses the -used option to show only chunklets currently in use as spare chunklets.
Lab-eosxx cli% showspare -used
Pdid Chnk LdName LdCh State Usage Media Sp Cl From To
12 1616 tp-3-sd-0.13 10 normal ld valid Y N 2:92 ---
14 1609 tp-3-sd-0.1 24 normal ld valid Y N 2:6 ---
14 1610 tp-3-sd-0.1 192 normal ld valid Y N 2:15 ---
14 1611 tp-3-sd-0.5 112 normal ld valid Y N 2:40 ---
14 1612 tp-3-sd-0.11 42 normal ld valid Y N 2:80 ---
16 1609 tp-3-sd-0.1 0 normal ld valid Y N 2:5 ---
16 1610 tp-3-sd-0.1 181 normal ld valid Y N 2:14 ---
16 1611 tp-3-sd-0.3 114 normal ld valid Y N 2:27 ---
---------------------------------------------------------------
Total chunklets: 8

Changing spare space policy


Implementing a change to the spare space policy requires two steps. The first step is to change the policy as configured on
the system. The second step is to implement the change using the admithw command. The following is an example starting
with the showsys -param command to check the current setting before the change. This example changes the policy from
“Minimal” to “Default”.
lab-eosxx cli% showsys -param
System parameters from configured settings

------Parameter------ --Value--
RawSpaceAlertFC : 0
RawSpaceAlertNL : 0
RawSpaceAlertSSD : 0
RemoteSyslog : 0
RemoteSyslogHost : 0.0.0.0
SparingAlgorithm : Minimal
EventLogSize : 3M
VVRetentionTimeMax : 336 Hours
UpgradeNote :
PortFailoverEnabled : yes
AutoExportAfterReboot : yes
AllowR5OnNLDrives : no
AllowR0 : no
ThermalShutdown : yes
FailoverMatchedSet : no
SessionTimeout : 01:00:00

lab-eosxx cli% setsys SparingAlgorithm Default


lab-eosxx cli% admithw

Checking for drive table upgrade packages


Checking nodes...
Checking volumes...
...
Rebalancing and adding FC spares...
FC spare chunklets rebalanced; number of FC spare chunklets increased by 408
for a total of 816.
Rebalancing and adding NL spares...
No NL PDs present
Rebalancing and adding SSD spares...
SSD spare chunklets rebalanced; number of SSD spare chunklets increased by
446 for a total of 892.
...
Lab-eosxx cli%
This change from Minimal to Default added 408 spare chunklets to the FC spare space and 446 spare chunklets to the SSD spare space.
Space must be available to implement a change that increases spare space; the change will not be made if the required space is unavailable. In the example above, changing the sparing policy from Minimal to Default requires 408 additional spare chunklets on the FC devices. If these additional chunklets are not available, no change to spare space is made and the admithw command terminates, as in the example below.
lab-eosxx cli% admithw

Checking for drive table upgrade packages


Checking nodes...
Checking volumes...
...
Rebalancing and adding FC spares…
408 spares were requested, but only 1 are possible.
lab-eosxx cli%
Notice the admithw command stops when it cannot make the spare space change. In this case, since spare space is
allocated to FC disks first, no changes were attempted to NL or SSD disks.


Sparing performance
When a drive fails, it reduces the data protection for the RAID set. Since data protection is a high priority, a key question
becomes “How long will the reduction in data protection last?” Stated another way, how long will it take to rebuild the data
from the failed drive to a new location so data protection is restored, or simply, how long will the rebuild take?
There is a wide range of possibilities, depending on factors such as the size of the failed drive, how much data is on that drive, how busy the array is, and the configuration. A 600 GB FC disk in a configuration with 140 other FC disks may take 90 minutes to rebuild, for example, while a 900 GB FC disk configured in RAID 6 on a busy system may take considerably longer.
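As a back-of-the-envelope model only, rebuild time scales with the amount of data on the failed drive divided by the aggregate many-to-many rebuild rate. The rate below is an assumed figure for illustration, not a measurement:

def rebuild_hours(used_gb: float, aggregate_mb_per_s: float) -> float:
    """Rough rebuild-time estimate; the aggregate rate depends on array
    load, drive count, and RAID mode, so treat this as a sketch."""
    return used_gb * 1000 / aggregate_mb_per_s / 3600

# A fully used 600 GB drive at an assumed ~110 MB/s aggregate rate
# lands near the ~90-minute example above.
print(f"{rebuild_hours(600, 110):.1f} h")  # -> 1.5 h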
This window of reduced availability that follows the failure of a physical disk is important for several reasons. First, data
protection is reduced such that a second failure during this window could lead to data loss.
A second dynamic during this window is a change in the I/O behavior of the data that was on the failed disk. A write to the failed disk is cached as always and then de-staged to log space, which is space set aside to handle situations like this. Once the rebuild is complete, the log space is replayed to apply the latest changes to the volume. A read operation requires the data to be reconstructed from the available good data; this reconstruction may require multiple back-end reads, depending on the RAID mode.

System performance during sparing


When a drive fails, the array has a lot of work to do to restore the previous level of protection, and normal read and write operations to the affected data are also impacted. During a disk rebuild, array performance may suffer from the additional work required to restore the expected level of data protection.
Figures 1 and 2 show one example of the impact of a failed disk on performance. Figure 1 shows the host service times, which in this example remain constant: the read service times (blue lines on top) and write service times (green line on the bottom) seen by the host do not change. The host is unaware of the disk failure.
Figure 1. Host Response Times during Rebuild

Figure 2 shows what’s happening on the back-end of the array in the same example. You can clearly see a large increase in
the read rate (green line goes from ~300 Mbps to ~500 Mbps) and a smaller increase in the write rate (red line).
Figure 2. Disk Throughput during Rebuild

Although the host service times are not impacted by the failed disk in this case, there is an increase in the workload on the
back-end of the array. It is easy to imagine a workload where this increase in back-end I/O resulting from a failed disk could
cause increased host service times.


There are other potential impacts from rebuilding a failed disk. When space is constrained, some data may be moved between controllers, which can change the workload balance between nodes. In one example, a balanced system in which each of two controllers handled 20,000 IOPS before a disk failure became imbalanced after the disk rebuild: one controller was handling 17,500 IOPS (44 percent) and the other 22,500 IOPS (56 percent). The imbalance was caused by the need to rebuild some chunklets on a different controller than the one owning the failed disk.
HP 3PAR storage arrays are highly available and have many features to protect user data from single failures. When a failure does occur, however, a change in performance is possible. It will not be observed in all cases, but you should not be surprised if performance changes following a failure; there is no guarantee that performance will be maintained at the same level.

Sizing: NinjaSTARS
Sparing consumes resources and therefore must be considered when sizing a system. NinjaSTARS (version 2.6.0.5) includes a provision to specify either Default or Minimal as the spare space policy; recall that HP's recommended sparing option is the Default policy. The NinjaSTARS calculations are somewhat simpler than the actual algorithms, but they are very useful for understanding the impact of the two spare space policies on usable space, especially in small configurations.

Note
NinjaSTARS (STorage Assessment, Recommendation, and Sizing) is an HP 3PAR sizing tool used by account teams to size storage solutions. If you are sizing an HP 3PAR array, contact your local account team for more information about NinjaSTARS.

The main menu bar includes a pull-down, highlighted in the screenshot below, that allows you to choose a sparing policy of Default or Minimal.
Figure 3. NinjaSTARS Default Sparing Algorithm

Figure 3 shows NinjaSTARS estimating 8.2 TiB of usable capacity for the small configuration when using the Default sparing algorithm. The "Usable vs. Overhead" box on the right of the NinjaSTARS screen shows 1.60 TiB allocated for spare space in this configuration.
In figure 4, NinjaSTARS estimates a usable capacity of 8.8 TiB after the configuration is changed to the Minimal sparing algorithm. The source of this change can be seen in the "Usable vs. Overhead" box on the right, where spare space is now 0.80 TiB.


Figure 4. NinjaSTARS Minimal Sparing Algorithm

The usable capacity difference of about 600 GiB is the result of the sparing algorithm change between Default and Minimal. In small configurations (fewer than 48 physical devices), the difference is at most the space of one drive. In this example, using 900 GB drives configured in RAID 5 (3+1), the one-drive difference is about 600 GiB, as reflected in the NinjaSTARS output.
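The arithmetic behind the one-drive difference can be checked directly in Python:

# One 900 GB drive of spare space shows up as ~600 GiB of usable capacity:
# GB-to-GiB conversion shrinks the number, and RAID 5 (3+1) stores user
# data on only 3 of every 4 chunklets.
raw_gib = 900e9 / 2**30         # ~838 GiB of raw space
usable_gib = raw_gib * 3 / 4    # RAID 5 (3+1) data fraction
print(f"{usable_gib:.0f} GiB")  # -> 629 GiB, i.e., "about 600 GiB"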

Adaptive sparing
HP announced adaptive sparing on some SSDs in 2014 as a unique, patented way to take spare space to the next level. HP 3PAR has always addressed the need for spare space with distributed sparing, which is native to HP 3PAR and eliminates the idle resources that plague some vendors' dedicated spare disk implementations.
Adaptive sparing takes the solution further by matching the SSD vendor's need for extra space to manage endurance with HP 3PAR's spare space needs. Write operations wear current NAND flash technology, and the flash chips eventually wear out. Vendors use many techniques to extend the life of the flash, such as wear leveling and overprovisioning, and HP guarantees today's NAND flash endurance in HP 3PAR StoreServ arrays for five years.
NAND flash overprovisioning is key to managing endurance, but it is expensive and limits the stated capacity of the SSD to less than the quantity of flash chips in the device. Overprovisioning is in many ways the opposite of thin provisioning (TPVV). Thin provisioning allows HP 3PAR StoreServ arrays to tell a host that a particular VV has more capacity available than is currently written (the host believes there is more storage than is currently provisioned), while SSD overprovisioning reports less capacity than the sum of the flash chips. A 480 GB SSD, for example, might contain a total of 576 GB of flash chips, where the additional 96 GB (20 percent) is used for overprovisioning.
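A quick check of that arithmetic in Python:

stated_gb, flash_gb = 480, 576
op_gb = flash_gb - stated_gb  # flash held back for overprovisioning
print(f"{op_gb} GB = {op_gb / stated_gb:.0%} of stated capacity")  # 96 GB = 20%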
Overprovisioning is a design feature incorporated into SSDs by flash vendors. HP 3PAR Adaptive Sparing works with SSD vendors' flash devices to maximize SSD space and endurance by matching SSD overprovisioned space with HP 3PAR spare space, which is allocated but not put into service until needed. The following figure shows how this works.


Figure 5. HP 3PAR Adaptive Sparing

In figure 5, the traditional architecture shows the overprovisioned flash chips used to manage endurance as the blue space labeled Internal OP. Distributed sparing allocates spare chunklets, indicated by the gray space, further reducing the space available for user data.
The modern HP 3PAR architecture in figure 5 shows adaptive sparing merging the HP 3PAR spare space with part of the internal overprovisioning space. The result is significantly greater user space while preserving overprovisioned space during normal operations. Adaptive sparing allows a 1.6 TB SSD, for example, to present 1.92 TB of stated capacity, a 20 percent increase in usable capacity, while still allowing HP 3PAR to deliver a five-year lifespan for the drive.
In normal operations, adaptive sparing allows an SSD to operate with both increased space available to users and the full
complement of overprovisioned space designed by the HP partner flash device OEM.
When a failure occurs, however, the SSD will have to operate on a reduced amount of overprovisioned space for a time as
spare chunklets are called into use by the HP 3PAR sparing policy. When the failed component is returned to service, spare
space and overprovisioned space are once again merged.
Adaptive sparing adds a consideration to the sparing algorithm. With the Default setting, the sparing algorithm starts with a spare allocation equal to the size of the two largest drives. With adaptive sparing, a minimum of 10 percent spare space per drive is also allocated for drives that use adaptive sparing.

Adaptive sparing examples


Adaptive sparing requires that a minimum of 10 percent of each adaptive-sparing drive be reserved for sparing. The impact of this requirement depends on the configuration; a couple of examples will demonstrate it.
On a system with 24 1.92 TB SSDs, the default sparing policy would reserve two drives' worth of spare space. The 24 SSDs have 1,787 chunklets each, or 42,888 chunklets collectively, so the default policy would reserve 3,574 chunklets, about eight percent of the total. Since these drives implement adaptive sparing, a minimum of 10 percent must be allocated for spare space, so the 3,574 chunklets are increased to 10 percent, or 4,289 chunklets.
As a second example, consider an HP 3PAR 7400c with eight 480 GB cMLC SSDs and eight 1.92 TB cMLC SSDs using the Default sparing algorithm. In this configuration, the sparing policy will allocate space equal to two of the largest drives, which is 3,574 chunklets (1,787 chunklets on each 1.92 TB SSD). The 3,574 chunklets will be spread equally across all 16 drives, resulting in 223 chunklets per drive, which is more than 10 percent, so no additional adjustments are required.
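Both examples follow from applying a 10 percent floor on top of the scheme's own target. Here is a minimal Python sketch, with the floor computed over the total chunklets of the adaptive-sparing drives (a simplification of the per-drive rule described above):

import math

def adaptive_spare_target(scheme_target: int, total_chunklets: int) -> int:
    """Reserve at least 10 percent of the chunklets on adaptive-sparing
    drives, on top of the sparing scheme's own target."""
    return max(scheme_target, math.ceil(total_chunklets * 0.10))

# First example: 24 x 1.92 TB SSDs, 1,787 chunklets each
print(adaptive_spare_target(2 * 1787, 24 * 1787))  # -> 4289 chunklets
# Second example: the two-drive target (3,574 chunklets) already exceeds
# the 10 percent floor, so it stands unchanged.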
Adaptive sparing offers many benefits including larger usable space and reduced cost. As we have just seen, the HP 3PAR
sparing policy incorporates adaptive sparing into its policy.


Predictive drive failures


We have seen how HP 3PAR responds to drive failures with distributed sparing to restore redundancy, but so far we have
not considered the drive failures themselves. Drive failures have been happening since the first drive was built in the 1950s,
and they will continue, but what if we could detect a drive failure before it occurs?
A lot has been learned from projects like Predictive Failure Analysis (PFA) at IBM in 1992, the IntelliSafe project at HP/Compaq
in 1995, and the disk failure study at Google™ in 2007. The HP/Compaq IntelliSafe project was later supported by IBM,
Seagate, and others, and became an industry standard known as SMART (Self-Monitoring, Analysis and Reporting
Technology) in 2004. HP 3PAR leverages this knowledge and builds on these advances to identify failing drives before they
fail. When failing drives are identified and addressed before they fail, data protection is increased.
In addition to SMART technologies for detecting failing drives, HP 3PAR StoreServ monitors drive performance and uses slow drive performance as an additional indicator of impending drive failure. Slow drive monitoring is implemented as a regularly scheduled HP 3PAR task.
The task checks for indications of degraded drive performance that could signal an impending failure. Checks are made of device service times, IOPS, throughput, and more, with appropriate adjustments for current device loading, to determine whether the levels indicate a likelihood of device failure. If a drive does not pass all checks, it is marked as degraded; the sparing routines evacuate the drive and flag it for service.
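The following Python sketch illustrates the kind of slow-drive check described; the metrics, peer comparison, and threshold are invented for illustration and do not reflect the actual HP 3PAR implementation.

def drive_looks_degraded(service_ms: float, iops: float,
                         peer_service_ms: float, peer_iops: float,
                         slow_factor: float = 3.0) -> bool:
    """Flag a drive whose load-adjusted service time is far worse than
    that of its peers. All thresholds here are illustrative assumptions."""
    if iops < 1 or peer_iops < 1:
        return False  # too little load to judge fairly
    # Normalize for load: a drive serving more IOPS than its peers is
    # allowed a proportionally higher service time.
    load_adjusted_ms = service_ms * (peer_iops / iops)
    return load_adjusted_ms > slow_factor * peer_service_ms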
Predictive drive technologies like SMART and slow disk monitoring are examples of how HP 3PAR is protecting data.

Conclusion
HP 3PAR distributed sparing provides protection from drive failures while avoiding costly idle resources such as hot spares.
Many-to-many rebuilds quickly restore redundancy and minimize exposure to multiple failures. Sparing schemes give the storage administrator the choice to tailor the sparing policy to the needs of the environment. The CLI offers the
commands needed to monitor the sparing policy and implement changes when necessary. The NinjaSTARS sizing tool is
aware of spare space and includes this calculation in space estimates. Finally, adaptive sparing allows HP 3PAR spare space
to work with SSD overprovisioned space, to provide more usable space and lower the cost per GB.

Resources
HP 3PAR Command Line Interface Reference
HP 3PAR StoreServ Storage: optimized for flash
News Advisory: HP Delivers All-flash Arrays for the Mainstream
HP 3PAR StoreServ Storage best practices guide
An Introduction to HP 3PAR StoreServ for the EVA Administrator

Learn more at
hp.com/go/3PARStoreServ


© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for
HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as
constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Google is a registered trademark of Google Inc.

4AA6-0776ENW, August 2015
