

Understanding how pool LUN ownership and trespassed LUNs can impact performance as well as uptime
The following article applies to the VNX 05.31.x.x.x and 05.32.x.x.x families of code only.

The introduction of Pool LUNs provided users with many benefits not possible with traditional LUNs. Pools support Thin LUNs, which offer customers more flexibility when configuring and allocating storage because they allocate their space from the pool on demand and can be over-provisioned. Over-provisioning allows a system administrator to present more potential storage space to users than is actually available in the pool. Because the space is allocated and consumed on demand, users enjoy more flexibility in planning and allocating storage space. Pools can also be easily expanded on an as-needed basis.

Along with the increased flexibility and simplicity offered by Pools come some underlying changes in the architecture that have very real implications for design and performance, and not everyone is familiar with them. This article discusses the impact of trespassed Pool LUNs, one of the most common configuration problems associated with Pool LUNs, and provides tips on how to avoid unexpected performance problems.

Pools allow for the creation and deployment of Thin LUNs. Thin LUNs differ from traditional LUNs in several key ways. When a traditional LUN is created, all of its configured space is carved out and allocated up front. The traditional LUN is assigned a default owner and a current owner. Initially, these are both set to the same storage processor (SP). During the life of a traditional LUN, circumstances may cause the LUN to trespass over to the peer SP. When this happens, the LUN's current owner changes. When a traditional LUN resides on its non-default SP, there is no significant performance impact other than the increased load on the peer SP associated with the extra LUN or LUNs that it now owns.

When a Pool is created, a large number of private FLARE LUNs are bound on all the Pool drives, and these drives are divided up between SP-A and SP-B. When a Pool LUN is created, it uses a new category of LUN ownership called the allocation owner to determine which SP's private LUNs should be used to store the Pool LUN slices. For example, a Pool LUN created with SP-A as its allocation owner allocates its space from private FLARE LUNs owned by SP-A. When a Pool LUN is created, its allocation owner is the same as its default and current owner. A Pool LUN's default owner should never be changed from its allocation owner.

If a Pool LUN's current owner differs from its allocation owner, I/O to that LUN has to pass over the CMI bus between SPs in order to reach the underlying Pool private FLARE LUNs. This is inefficient and may introduce performance problems. When Pool LUN ownerships (default, allocation, and current) do not match, the result is a potentially sizeable performance bottleneck. If enough LUNs have inconsistent ownerships, performance can degrade to the point where host I/Os time out, and in extreme cases this can result in data unavailability. Every effort should be made to maintain consistent ownership of Pool LUNs. EMC recommends avoiding prolonged periods of having trespassed Pool LUNs.

For more information about Pool LUN ownership settings, see knowledge-base article 88169: Setting the Pool LUN default and allocation owner correctly. For information about several other performance-related concerns specific to Pool LUNs, see knowledge-base article 15782: How to fix Pool performance issues.
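
The ownership rule in this article (default, current, and allocation owner should all match) can be sketched as a small check. This is only an illustration: the `PoolLun` record, its field names, and the sample LUNs are hypothetical and are not part of any VNX tool or API. In practice the three owner values would be read from the array's own management interface.

```python
from dataclasses import dataclass


@dataclass
class PoolLun:
    # Hypothetical record for illustration; not a VNX data structure.
    name: str
    default_owner: str      # "SP-A" or "SP-B"
    current_owner: str      # changes when the LUN trespasses
    allocation_owner: str   # SP whose private FLARE LUNs hold the slices


def ownership_problems(lun):
    """Return a list of ownership inconsistencies for one Pool LUN."""
    problems = []
    if lun.default_owner != lun.allocation_owner:
        problems.append("default owner differs from allocation owner")
    if lun.current_owner != lun.allocation_owner:
        problems.append("current owner differs from allocation owner "
                        "(I/O must cross the CMI bus)")
    return problems


# Made-up sample data.
luns = [
    PoolLun("lun_10", "SP-A", "SP-A", "SP-A"),   # consistent
    PoolLun("lun_11", "SP-A", "SP-B", "SP-A"),   # trespassed
    PoolLun("lun_12", "SP-B", "SP-B", "SP-A"),   # default owner was changed
]

for lun in luns:
    for problem in ownership_problems(lun):
        print(f"{lun.name}: {problem}")
```

A LUN that reports no problems has all three owners on the same SP, which is the state the article recommends maintaining.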

September 2013
Pool LUN ownership concerns.
248 Day reboot issue reminder.
Vault drive replacement procedure tips.
Array power down tips.
Target code revisions and key fixes.
Storage pool reserved space requirements by software revision.
VNX Block OE 32.207 key fixes.

Follow us on the web! Search the VNX Series page for Uptime Bulletin here: ucts/12781


More VNX Storage Systems Approach 248 Day Reboot

The following article applies to VNX Block OE code family 05.31.x.x.x only.

More VNX Storage Systems running VNX Block OE versions earlier than are approaching 248 consecutive days of uptime; this makes them vulnerable to a known Storage Processor (SP) reboot issue that may occur on systems running iSCSI. VNX systems with iSCSI-attached hosts may have a single or dual storage processor (SP) reboot after 248 days of uptime. This is a concern for systems running VNX Block OE versions earlier than It is possible that these reboots could occur at the same time. If this happens, vulnerable storage systems are at risk of a brief data unavailability (DU) or even a cache dirty/data loss situation.

This known issue is one of the most important reasons to ensure that your storage system is running newer code. The current target VNX Block OE code is Any code that is or later contains a fix for this issue.

References
Refer to ETA emc291837: VNX: Single or dual storage processor (SP) reboots on VNX systems after 248 days of runtime for detailed information on this issue, or when contacting support.
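
As a planning aid, the 248-day figure above can be turned into a simple uptime classification. The sketch below is an assumption-laden illustration, not an EMC tool: the function name and the 30-day warning margin are hypothetical, and the check is only meaningful for iSCSI-attached systems running code that does not yet contain the fix.

```python
UPTIME_LIMIT_DAYS = 248  # uptime after which the article says a reboot may occur


def uptime_risk(uptime_days, warn_within_days=30):
    """Classify SP uptime relative to the 248-day mark.

    warn_within_days is a hypothetical planning margin, not an EMC value.
    """
    remaining = UPTIME_LIMIT_DAYS - uptime_days
    if remaining <= 0:
        return "at risk"
    if remaining <= warn_within_days:
        return "approaching"
    return "ok"


for days in (100, 230, 248):
    print(days, uptime_risk(days))
```

An "approaching" result is a prompt to schedule the code upgrade before the limit is reached rather than after.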

Vault drive replacement procedure tips and best practices.

On the VNX series storage platform, the first four drives in enclosure 0 (slots 0, 1, 2, and 3) are considered the vault drives. In addition to potentially holding user data, these drives also hold the VNX OE software. Replacing all of your vault drives should not be undertaken lightly, and the complete swapping of vault drives should only be performed by EMC personnel.

It is always a good idea to generate a full set of spcollects before and after the entire procedure. In the unlikely event that anything goes wrong, having full sets of spcollects from immediately before and after the attempted procedure can be the key to successfully recovering data.

Note that you cannot swap NL-SAS drives in the vault with SAS drives. Mixed drive types in the vault are not supported.

There are many other specific steps and restrictions spelled out by the official EMC procedures for swapping out vault drives. Among them is the order in which the drives are replaced. While this order is no longer as critical as it was in past storage system generations, it should still be followed. The amount of time you wait between removing a drive and inserting the replacement, however, continues to be a very important factor. Always wait at least 30 seconds before inserting the replacement disk. This allows the VNX OE enough time to fully recognize that the drive has been removed and replaced with a new disk. If a replacement drive is inserted too quickly after removing the old drive, DU/DL could occur. This delay between removing a drive and inserting a new one is important when proactively swapping out any drive that has not yet failed.

Common mistakes and serious ramifications of incorrect or unplanned power downs of VNX storage systems.
Past VNX Uptime Bulletins have shared details of many bug fixes and best practices that can help avoid outages and improve uptime statistics on your VNX storage system. It may surprise you to learn that the most common ongoing cause of outages on the VNX platform is the improper shutdown of the VNX storage system. Never power down your system simply by pulling cables. Pulling cables to power down your system can bypass the system's ability to protect its cache. Detailed VNX storage array power down procedures can be found here:

How do I subscribe to EMC Technical Advisories?

Within EMC Online Support, users can subscribe to advisories for individual products from the Support by Product pages. Under Advisories on the left side of a given product page, click Get Advisory Alerts to subscribe. You can also view and manage subscriptions for multiple products within Preferences > Subscriptions & Alerts > Product Advisories.

VNX/VNXe Target Revisions

EMC has established target revisions for each product to ensure stable and reliable environments. As a best practice, it is recommended that you operate at target code levels or above to benefit from the latest enhancements and fixes available. Search adoption rates in Powerlink for current VNX/VNXe target code adoption rates.

VNXe OS VERSION                           RELEASE DATE   STATUS
                                          05/29/13       Target
                                          05/29/13       Latest Release

UNIFIED VNX CODE VERSIONS (7.0 & R31)     RELEASE DATE   STATUS
(for File OE component)                   10/12/12       Target
(for File OE component)                   07/15/13       Latest Release
(for Block OE component)                  08/24/12       Target
(for Block OE component)                  07/15/13       Latest Release

UNIFIED VNX CODE VERSIONS (7.1 & R32)     RELEASE DATE   STATUS
(for File OE component)                   05/16/13       Target
(for File OE component)                   08/15/13       Latest Release
(for Block OE component)                  05/16/13       Target
(for Block OE component)                  08/15/13       Latest Release

Note: VNX OE or later requires PowerPath 5.7 or later in order to make full use of the ODX and Thin Provisioning features. It will otherwise function acceptably with older versions of PowerPath.

on low level correctable errors.

NDU adaptive timer feature.
Driver fix to prevent IPv6 issues noted.
New LCC firmware, CDES 1.33/7.05. Fixes various issues noted on LCCs.
Fix for AR 497033, coherency errors may be reported after drive probation.

File code enhancements in

Fixed a watchdog panic involving SMB2 durable file handles.
Fixed a loss of access issue involving disconnected SMB2 clients that have compression enabled.
Fixed an issue with high CPU and memory usage resulting from the use of the VNX Monitoring M&R Tool or the EMC DPA tool.
Fixed a possible rare deadlock condition when attempting to stop the virus checker.
Fixed a possible Data Mover deadlock on a delete/move operation on a file with an SMB2 lease granted.
Improved performance when many users are simultaneously connected to Kerberos.
Fixed an issue preventing the standby controller from taking over in dual controller configs when the primary controller was rebooted with the reboot f command.
Fixed an erroneous de-duplication error message related to deep compression.
Fixed an issue which could result in performance degradation from unnecessary virus scans when a file was opened for metadata access only.

File code enhancements in

Protects against instances of invalid block numbers in a block map seen during checkpoint merges which could cause DM panics.

Fixed an issue that could cause loss of CIFS access when an SMB2 client attempts to rename a directory containing files opened with the delete on close option.

Flare code enhancements in release

Persistent Ktrace logging feature added.
In-family conversion fix.
Fixed VNX7500 memory conversion panic.
Fixed multiple potential NDU issues.
Fixed a bug that could take Thin LUNs offline during online disk firmware upgrade.
Fixed panic on invalid request from Seagate drives with ES0E or ES0F firmware.
Support for RP Splitter 4.0 including iSCSI.
Fixed RecoverPoint Splitter / VAAI panic issue detailed in emc327099.
Fixes for are on page 4.

Fixed an issue which could cause various Control Station commands to crash when the storage reported the spindle list of the disk group as empty.

Flare code enhancements in release

Fix for ODFU/Thin LUNs going offline.
Rebuild Avoidance Fix.
Completion Timeout fix.
VMware auto-restore fix.
Reset SAS controller chip to avoid a panic

Why is it important to leave some free space in a storage pool, and how does the recommended amount of free space vary by software revision?
While a storage pool can operate at 100% consumed space, it is best practice to leave some space available as free overhead. The recommended amount of free space varies by VNX OE software revision. In VNX OE releases in the R31 family, we recommend leaving 10% free space, so that a storage pool is no more than 90% full. In VNX OE releases in the R32 family, changes were made to pre-allocate some space in the background, so best practice dictates that you only need to leave 5% free space.

Two specific areas of functionality can be negatively impacted if pools do not leave enough free space. The FAST VP feature requires free space in the pool in order to perform its slice relocations. When there are not enough free slices remaining in the pool, the number of relocations that can run at once may be limited. This can impact pool LUN performance, or can cause slice relocations and auto-tiering migrations to fail.

Although it is not common, LUNs in a storage pool with 0% free space can go offline to a host. If Pool LUNs are taken offline for any reason such that they require EMC to run a recovery on them using EMC's recovery tools, those recovery tools also use the free space remaining in a pool. If there is not enough free space for the recovery tools to run efficiently, a pool may need to be expanded in order to give it more free space. However, expanding a pool is a time-consuming option and can prolong the duration of an outage.

Pools can be configured to warn the user as they approach various thresholds of space consumption. For more information about the free space requirements for FAST VP, see knowledge base solution 00078223: Storage Pool does not have enough free space for FAST VP to relocate slices.
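
The free-space guidance above (10% for the R31 family, 5% for the R32 family) can be expressed as a short calculation. The sketch below is illustrative only: the function name, the capacity figures, and the use of gigabytes are assumptions, and the percentages are simply the recommendations quoted in this article.

```python
# Recommended minimum free space per code family, as stated in the article.
RECOMMENDED_FREE_PCT = {"R31": 10.0, "R32": 5.0}


def free_space_check(family, total_gb, consumed_gb):
    """Return (free_percent, meets_recommendation) for a storage pool."""
    free_pct = 100.0 * (total_gb - consumed_gb) / total_gb
    return free_pct, free_pct >= RECOMMENDED_FREE_PCT[family]


# Hypothetical 10,000 GB pool with 9,300 GB consumed (7% free).
for family in ("R31", "R32"):
    free_pct, ok = free_space_check(family, 10_000, 9_300)
    verdict = "meets" if ok else "is below"
    print(f"{family}: {free_pct:.1f}% free {verdict} the recommendation")
```

The same pool can satisfy the R32 guidance while falling short of the R31 guidance, which is why the code family matters when sizing free overhead.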

VNX Block OE contains 3 key fixes.

VNX Block OE, released on 8/15/2013, contains the following three fixes mentioned in the futures section of last quarter's Uptime Bulletin:

1. A fix for a performance degradation issue introduced in R32.201, related to running VDI applications on the VNX storage system.
2. A fix for a connectivity issue whereby Thin LUNs could disconnect from hosts when connected to the VNX using the FCoE protocol.
3. A fix for an extremely rare, hardware-induced, simultaneous dual storage processor reboot.

Upgrading to R31.727 as an interim step prior to upgrading from R31 to R32 can avoid a small risk of an outage:

The latest VNX OE release for the R31 family has a fix for an uncommon bug which could impact a very small percentage of NDU upgrades from the R31 family into the R32 family. The bug can cause a storage processor (SP) to panic at some point during the upgrade. It is estimated that the bug has impacted less than 1% of upgrades. This release fixes the issue and prevents it from causing a panic during any NDU.

Should this panic occur during the NDU, it may or may not cause a data unavailable event, depending on the timing of the panic. If the panic occurs on one SP while its peer is rebooting to install software, then it can cause a brief data unavailable situation to any attached hosts.

Sometimes the RCM team or other EMC personnel may recommend multi-step NDUs that involve passing through this release as an interim step when upgrading from R31 to R32. Note that EMC has not made this interim NDU step mandatory at this time because of the very low rate of occurrence of the problem.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. EMC², EMC, E-Lab, Powerlink, VNX, VNXe, Unisphere, RecoverPoint, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners. Copyright 2013 EMC Corporation. All rights reserved. Published in the USA, September, 2013.