
EMC Storage Pool Deep Dive: Design Considerations & Caveats

Posted by veverything on March 5, 2011

This has been a common topic of discussion with my customers and peers for some time. Proper design information has been scarce at best, and some of these details appear to not be well known or understood, so I thought I would conduct my own research and share.

Some time ago, EMC introduced the concept of Virtual Provisioning and Storage Pools in their Clariion line of arrays. The main idea behind this is to make management simple for the storage admin. The traditional method of managing storage is to take an array full of disks, create discrete RAID groups from sets of disks, and then carve LUNs out of those RAID groups and assign them to hosts. An array could have dozens to hundreds of RAID groups depending on its size, and often this would result in stranded islands of storage in those RAID groups. Some of this could be alleviated by properly planning the layout of the storage array to avoid the wasted space, but the problem is that most customers' storage requirements change, and they can very rarely plan how to lay out an entire array on day 1. There was a need for flexible and easy storage management, and hence the concept of Storage Pools was born.

Storage pools, as the name implies, allow the storage admin to create “pools” of storage. In some cases you could even create one big pool with all of the disks in the array, which could greatly simplify management: no more stranded space, no more deep architectural design into RAID group size, layout, etc. Along with this comes a complementary technology called FAST VP, which allows you to place multiple disk tiers into a storage pool and lets the array move data blocks to the appropriate tier as needed based on performance requirements. Simply assign storage from the pool as needed, in a dynamic, flexible fashion, and let the array handle the rest via auto-tiering. Sounds great, right? Well, that’s what the marketing says anyway. First let’s take a brief look at the difference between the traditional RAID group based architecture and Storage Pools.


On the left is the traditional RAID group based architecture. You assign disks to RAID groups, and then carve a LUN/LUNs out of each RAID group and assign them to hosts. You would have multiple RAID groups throughout the array based on protection level, capacity, performance and so on.

On the right is the pool based approach. This example is a homogeneous pool to keep things simple. You simply assign disks to the pool and assign LUNs from that pool. When you need more capacity, you just expand the pool. Contrast this with having to build another RAID group and assigning LUNs from that RAID group while trying to fill the existing ones with proper LUN sizes. Management complexity is greatly reduced. But what are the trade-offs and design considerations? Let’s take a deeper look at a storage pool…

Depicted in the above figure is what a storage pool looks like under the covers: a RAID5 protected storage pool created with 5 disks. What FLARE does under the covers when you create this 5 disk storage pool is to create a Private RAID5 4+1 raid group. From there it will create 10 Private LUNs of equal size. In my test case, I was using 143GB (133GB usable) disks, and the array created 10 Private LUNs of size 53.5GB, giving me a pool size of ~530GB. This is what you would expect from a RAID5 4+1 RG (133*4 = 532GB).

When you create a LUN from this pool and assign it to the host, the I/O is processed in a different manner than in a traditional FLARE LUN. In a traditional (ignoring Meta-LUNs for simplicity) FLARE LUN, the I/O is going to one LUN on the array which is then written directly to a set of disks in its RAID group, meaning the LBA (Logical Block Address) corresponding with the host write has a 1:1 relationship with the LUN. However, as new host writes come into a Pool LUN, space is allocated in 1GB slices. For Thick LUNs, this space is contiguous and completely pre-allocated. So, if one were to create a 10GB Thick Pool LUN, there would be 1GB slices allocated across each of the 10 Private LUNs for a total of 10x 1GB slices. As host writes come into the Pool LUN, the LBA corresponding to 0-1GB on the host would land on Private LUN0 since it contains the first 1GB slice, LBA 1-2GB writes land on Private LUN1, LBA 2-3GB writes land on Private LUN2, and so on as shown below:
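To make the slice-to-LBA mapping concrete, here is a minimal Python sketch of the behavior described above. It is an illustration only, with constants and function names of my own choosing (not anything FLARE actually exposes), assuming a single 4+1 Private RG with 10 Private LUNs and 1GB slices laid out round-robin:

```python
# Toy model of Thick Pool LUN slice layout as described above: a 10GB Thick
# LUN pre-allocates ten 1GB slices, one on each of the 10 Private LUNs, and
# host LBAs map to Private LUNs in 1GB chunks.
SLICE_GB = 1           # slice granularity described in the post
PRIVATE_LUNS = 10      # Private LUNs created for the 4+1 Private RG

def private_lun_for_lba(lba_gb):
    """Return the Private LUN index holding a given host LBA (in GB)."""
    slice_index = int(lba_gb // SLICE_GB)
    return slice_index % PRIVATE_LUNS   # slices are laid out round-robin

for lba in (0.5, 1.5, 2.5, 9.5):
    print(f"host LBA {lba}GB -> Private LUN {private_lun_for_lba(lba)}")
# host LBA 0.5GB -> Private LUN 0
# host LBA 1.5GB -> Private LUN 1
# host LBA 2.5GB -> Private LUN 2
# host LBA 9.5GB -> Private LUN 9
```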

These LUNs are all hitting the same Private RAID group underneath and hence the same disks. I assume EMC creates these multiple Private LUNs for device queuing/performance related reasons. With a pool composed of 5 disks, things are pretty simple to understand because there is 1x 4+1 Private RG underneath handling the I/O requests.

Caveat/Design Consideration #1: One very important aspect to understand is EMC’s recommendation to create R5 based pools in multiples of 5 disks. The pool algorithm in FLARE tries to create the Private RAID5 groups as 4+1 whenever possible. If you ignore the 5 disk multiple recommendation and proceed to create pools from non-5-disk multiples, you will NOT get the capacity you might expect. As an example, if you created a pool with 14 disks, FLARE will create 2x 4+1 R5 Private RGs and 1x 3+1 Private RG, NOT the single 13+1 Private RG that you may expect. So you would end up with capacity which is lower than you are expecting. In my case, using 143GB disks (133GB usable), with a 14 disk R5 pool I would get (4*133)+(4*133)+(3*133)=~1460GB, not the expected (13*133)=1730GB. A difference of almost 300GB, quite significant! The best option in this case is to add another drive and create a 15 disk R5 pool, achieving 3x 4+1 RGs under the covers. Again, this is a VERY important thing to note because it could lead to unexpected results if you don’t fully understand it, and you could end up with one irate customer if multiple 300GB slices go missing over the span of the array!
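A quick way to sanity-check the capacity math is to model the 4+1 carving behavior. The sketch below is my own approximation of what is described above (greedy 4+1 Private RGs, with any leftover disks becoming a smaller N+1 group), using 133GB usable per disk; it is not EMC's actual algorithm:

```python
# Approximate the Private RG carving described above: as many 4+1 R5 groups
# as possible, with leftover disks becoming a smaller (N-1)+1 group.
USABLE_PER_DISK_GB = 133   # 143GB drives, ~133GB usable

def private_rgs(disk_count):
    """Return the number of *data* disks in each Private RG."""
    rgs = []
    while disk_count >= 5:
        rgs.append(4)                  # full 4+1 group
        disk_count -= 5
    if disk_count >= 2:
        rgs.append(disk_count - 1)     # e.g. 4 leftover disks -> 3+1 group
    return rgs

def pool_capacity_gb(disk_count):
    return sum(d * USABLE_PER_DISK_GB for d in private_rgs(disk_count))

print(private_rgs(14), pool_capacity_gb(14))  # [4, 4, 3] 1463  (~1460GB, not 13*133=1729GB)
print(private_rgs(15), pool_capacity_gb(15))  # [4, 4, 4] 1596  (the better choice)
```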

Next, let’s take a look at some aspects of I/O performance and some things to consider when expanding the pool. What happens when we expand the pool? Keeping in mind we need to expand this pool by a multiple of 5, let’s add another 5 disks to it, bringing the total capacity to 530*2 = ~1060GB. Underneath the covers, after adding the 2nd set of 5 disks, the pool now looks like this: FLARE has created another 4+1 Private RAID group and 10 more Private LUNs from that RAID group. The new Private LUNs currently have no data on them.

Design Consideration / Caveat #2: Note that when the Storage Pool is expanded, the existing data is NOT re-striped across the new disks. Reads to the original Pool LUN will still happen only across the first 5 disks, and so will writes to the existing 10GB of LBAs that were previously written to. So do not expect a sudden increase in performance on an existing LUN by expanding the pool with additional disks.

In my testing, I brought my Pool LUN into VMware and put a single VM on it, and then expanded the pool. Before putting the 2nd VM on the LUN, the data layout looked exactly as depicted above: there was data spread across the Private LUNs associated with the first Private RAID group, and no data on the Private LUNs of the second RAID group. When I cloned another VM onto the LUN, this is what it looked like:

VM1’s data is still spread across the first Private RG and first 10 Private LUNs as expected, but VM2’s data is spread across BOTH Private RAID groups and all 20 Private LUNs! Think about that for a second: 2 VMs, on the SAME VMFS, on the SAME Storage Pool, one gets the I/O of 5 disk striping, and the other gets the I/O of 10 disk striping. Talk about non-deterministic performance! These are both 100GB VMs (in my testing), so all the slices aren’t depicted, but it still illustrates the point. The actual allocation would show 100 slices (1 slice = 1GB as previously mentioned) allocated across Private LUNs 0-9 for VM1, and 50 slices across Private LUNs 0-9 plus 50 slices across Private LUNs 10-19 for VM2 as the overall slice distribution.

Now this imbalance occurred because there was still free space in the first RG, so the algorithm allocated slices there for the 2nd VM as well, because it does so in a round robin fashion. That second VM will get awesome performance as it is wide striped across 10 disks, but the first VM is still using only the first 5 disks. If I keep placing VMs on this Pool LUN, they will continue to get 10 disk striping, UNTIL the first Private RG gets full, at which point any subsequent VMs will get only 5 disk striping.
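Here is a small, hypothetical simulation of that round-robin slice placement (my own toy model, not array code) that reproduces the distribution above: 100 slices on the first RG for VM1, and a roughly 50/50 split for VM2 when the second RG is added while the first still has free space. The allocate() helper is reused in the later examples as well.

```python
def allocate(vm_slices, rg_free):
    """Place vm_slices 1GB slices round-robin across Private RGs that still
    have free slices; returns per-RG counts and updates rg_free in place."""
    placed = [0] * len(rg_free)
    rg = 0
    while vm_slices > 0:
        assert any(rg_free), "pool out of space"
        if rg_free[rg] > 0:
            rg_free[rg] -= 1
            placed[rg] += 1
            vm_slices -= 1
        rg = (rg + 1) % len(rg_free)
    return placed

rg_free = [530]                 # one 4+1 RG, ~530 free 1GB slices
vm1 = allocate(100, rg_free)    # 100GB VM placed before the expansion
rg_free.append(530)             # pool expanded with a second 4+1 RG
vm2 = allocate(100, rg_free)    # 100GB VM placed after the expansion
print(vm1, vm2)                 # [100] [50, 50] -> VM1 on 5 disks, VM2 on 10
```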

If the pool was at capacity before being expanded, we would likely get something like this (not tested, extrapolating based on previous behavior): In this diagram, the blue simply represents “other” data filling the pool. If the pool was at capacity, and then expanded, and then my 2nd VM placed on it, the 2nd VM could not get slices from the first Private RAID Group (because it is full), so its slices would come ONLY from the 2nd Private RAID group, spreading its data across only 5 disks, instead of 10 like last time.

Design Consideration / Caveat #3: As illustrated above, if you expand a storage pool before it gets full or close to being full, you may get unpredictable I/O performance depending on under what condition you expand the pool. By this I mean you could get different levels of data striping on the data sets. Imagine a situation where a VM was created before the first Private RG filled up: some of the VM’s I/O could be striped across 10 disks, and the rest across 5 disks as the first Private RG fills. Things can get even more hairy if you decide to add disks outside the 5 disk multiple recommendation. If you just need enough space for 4 more disks, as an example, and expanded the pool by 4 disks, you would end up with 2x 4+1 RGs and 1x 3+1 RG underneath. At some point, some of the I/O could be restricted to just 3 disk striping, instead of 5 or 10.

From this, it seems the best way to utilize storage pools is to allocate as many disks as you can up front. As an example, if you have a tray of disks on a Clariion or VNX, allocate all 15 disks when creating the pool. This will give you 3x 4+1 RGs underneath, and any data placed in the pool will get striped across all 15 disks consistently. It would be good to avoid creating small disk count pools and expanding them frequently with 5 disks at a time, as you could run into issues like the above very easily and not realize it.
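Using the same hypothetical allocate() sketch from above, the at-capacity scenario from Caveat #3 looks like this: when the first Private RG is already full, the new VM's slices can only come from the newly added RG.

```python
# Pool was full before the expansion: the first 4+1 RG has no free slices,
# so the 2nd VM lands entirely on the new RG (5-disk striping only).
rg_free = [0, 530]
vm2 = allocate(100, rg_free)
print(vm2)   # [0, 100]
```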

There is one other issue to consider in pool expansion. Let’s say you create a pool with 15 disks and start placing data on it. All of your I/O is being wide-striped across the 15 disks and all is well, but now you need more space and need to expand the pool. Going by the 5 disk multiple rule, you should be safe adding 5 disks, right? While this is something you can do, and it will work, if the pool is expanded at this point while it is at capacity (imagine it is full), it may again give unexpected results. Before expansion, your 15 disk R5 pool looks like this: all data is spread across 15 disks. Here is what it would look like when any new VMs (or any data) are placed on it after the expansion:
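For a rough illustration (again with the toy allocate() helper from earlier, not real array behavior), treat the three original 4+1 RGs as full and the newly added 5 disks as one empty RG:

```python
# 15-disk pool (3x 4+1 RGs) completely full, then expanded by 5 disks.
rg_free = [0, 0, 0, 530]
new_vm = allocate(100, rg_free)
print(new_vm)   # [0, 0, 0, 100] -> all new data lands on the 5 new disks
```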

After the pool is expanded, the new data is only getting striped across 5 disks, instead of the original 15! So if you placed a new VM on this device, expecting very wide striping, you could be sorely disappointed as it is only getting 5 disks’ worth of data striping. Hopefully EMC will introduce a re-balance feature to the pool, like what exists in the latest VMAX code, to alleviate most of these issues. But until then, these are some things to be aware of when designing and deploying a Storage Pool based configuration.

Design Consideration / Caveat #4: From this, the recommendation would be to expand a storage pool by the number of disks it was initially created with. So if you have a 15 disk storage pool, expand it by another 15 disks so the new data can take advantage of the wide striping. I have also heard people recommend doubling the storage pool size, but this may be overkill. As an example, if you have a 15 disk storage pool and add another 15 disks to it, you could theoretically have some hosts’ I/O striping their data over 30 disks, so should you now expand this pool by 30 disks instead of 15? And then 60 disks the next time? As always, understand the impact of your design choices and performance requirements before making any decisions, as there is no blanket right/wrong approach here.

Another thing to watch out for is changing the default SP owner of a Pool LUN. Because the LUN is made up of Private LUNs underneath, that can introduce performance problems as it has to use the redirector driver to get to the LUNs on the other SP.

So make sure to balance the pool LUNs when they are first created.

Also, I did not even touch on some of the considerations when using RAID10 pools, and I will probably follow up with that later as well. Utilizing Thin LUNs introduces a whole new level of considerations, as a Thin LUN does not pre-allocate the 1GB slices but rather writes in 8K extents. This can cause even more unpredictable behavior under the circumstances outlined above. Then there comes the variable of utilizing Thin Provisioning on the host side, adding another level of complexity in how the data is allocated and written. I may write a follow up post illustrating some of these scenarios in a Thin provisioned environment on both the host and array sides, but that should be something to be aware of when using Thin Provisioning in general.

There is no question that using a storage pool based approach takes the management headache out of storage administration, but architects should be aware of the considerations and caveats of any design. Layering FAST VP on top of storage pools is an excellent solution for the majority of customers, and it is important to note that the ONLY way to get automated storage tiering is to use Pool based LUNs. Generally speaking, if ultra deterministic performance is required, it is still best to use traditional RAID groups. Customers may have certain workloads that simply need dedicated disks, and I see no reason not to use RAID groups for those use cases still. The good news is that the EMC arrays give you that flexibility. Again, as always, it’s about understanding the requirements and translating them into a proper design.

As always, comments/questions/corrections are always welcome!