
IDC VENDOR SPOTLIGHT

Taming the Storage I/O Monster in Virtualized Datacenters
July 2010
Adapted from Worldwide Virtual Machine Software 2010–2014 Forecast: A First Look by Gary Chen,
IDC #222414
Sponsored by Pancetera

Virtualization has changed the way enterprises meet their IT needs by increasing capabilities while
consolidating resources and meeting cost-cutting demands. One key challenge with virtualization is
the demands the technology places on storage and storage management. In particular, sharing
physical resources among multiple virtual machines (VMs) places an additional burden on storage
during backups, virus scans, and other ancillary operations. This Vendor Spotlight examines how the
increasing consolidation of servers using virtualization has placed greater demands on storage
management and discusses the role that Pancetera plays in an increasingly important virtual storage
management market.

Virtualizing the Datacenter


Since its emergence, VM technology has become one of the most disruptive technologies in IT
infrastructure. Over the years, the ability to virtualize servers and reclaim excess capacity has caught
the interest of datacenter managers who sought to reduce capital spending and faced power, cooling,
and space problems. IDC has seen spending on virtualization shift from consolidating software
development and testing environments toward trying to consolidate production applications within the
IT infrastructure.

IDC maps the history of virtualization as follows. The Virtualization 1.0 era, from 2005 to 2008, was
marked primarily by consolidation of test and development environments. The Virtualization 2.0 era
followed, beginning in 2008, and was marked by production usage. Today, as virtualization transitions
from the 2.0 era to the 3.0 era, IDC is starting to see it going well beyond consolidation to deliver new
datacenter architectures.

The Virtualization 3.0 period is a shift to a more dynamic infrastructure where resources are pooled
and allocated seamlessly to applications as needed, often referred to as the dynamic datacenter or
private cloud. Economies of scale and scope will drive cost down further by delivering greater
utilization, automation, and just-in-time provisioning, resulting in lower capex as well as operational
efficiencies that earlier generations of virtualization couldn't deliver. Achieving the benefits of
Virtualization 3.0 will require the resolution of challenges associated with systems management,
storage management, networking, and quality of service.

The virtual machine software (VMS) market is expected to continue to grow. IDC forecasts that the
VMS market will rise from over $1.5 billion in 2009 to nearly $2.7 billion by 2014, representing a
CAGR of 11.7%. A partial view of the overall hardware and software virtualization-related ecosystem
built by IDC currently forecasts the ecosystem to grow from nearly $30 billion in 2008 to over $46
billion in 2013. This represents a CAGR of 9.2%.
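
The growth rates above follow the standard compound annual growth rate formula. Because the dollar figures in the text are rounded ("over", "nearly"), the sketch below works backward from each quoted end value and CAGR to the starting value they imply, as a consistency check. The function names are illustrative and do not come from IDC.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def implied_start(end_value: float, rate: float, years: int) -> float:
    """Starting value implied by an end value and a CAGR."""
    return end_value / (1.0 + rate) ** years


if __name__ == "__main__":
    # Figures quoted in the text, in billions of US dollars (rounded).
    vms_start = implied_start(2.7, 0.117, 5)   # VMS market, 2009 to 2014
    eco_start = implied_start(46.0, 0.092, 5)  # ecosystem, 2008 to 2013
    print(f"Implied 2009 VMS market: ${vms_start:.2f} billion")
    print(f"Implied 2008 ecosystem:  ${eco_start:.1f} billion")
```

Under these rounded inputs, the implied starting values come out at roughly $1.55 billion and $29.6 billion, which is in line with the "over $1.5 billion" and "nearly $30 billion" quoted above.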

Many organizations will make new deployments aboard virtualized servers, not just because of the
benefits of consolidation but also because of other operational benefits, including, but not limited to,
resource optimization, legacy OS and application encapsulation, and physical server independence.
Virtualization 3.0 will include dynamic resource scheduling to ensure service levels by balancing
workloads; pooling of many smaller resources into one larger logical resource to achieve higher
utilization overall; and tiering based on service levels, functionality, and costs.

Virtualization's Impact on Storage


As the adoption of server virtualization expands, the corresponding need for storage to support these
virtualized environments is evolving. A number of challenges arise as organizations deploy virtual
machines in greater density, including:

• Contention for available I/O on the storage system and networks

• Increase in storage consumption as a result of virtual machine proliferation

• Increased complexity due to additional layers of abstraction making troubleshooting challenging

• Excessive demands on system resources during necessary operations such as backups, virus
scanning, and copying of VMs either through replication or a mechanism such as VMware's
Storage vMotion

While organizations are striving to do more with less by increasing utilization of assets and reducing
waste through consolidation, unexpected costs and challenges have arisen that have made the
benefits of virtualization difficult to realize.

One area in particular that needs attention is data and system protection. A virtual machine is a single
file that contains the virtual machine image (operating system, application) and often data associated
with the application or file system hosted inside the virtual machine. Protecting these files requires
backing up the state of the virtual machine host, the virtual machines themselves, and the data
associated with virtual machines. The traditional ways of doing backups often put excessive strain on
the systems, causing application performance to degrade to unacceptable levels. This drives
managers to skip backups of virtual machines that hold no data. While a failure or loss of this VM
may not cause data loss, it is time consuming to rebuild the entire OS and application and restore the
exact configurations. Virus scans, classification and indexing, Storage vMotion, and replication are
other operations that may also cause resource contention.

The I/O Challenge


Virtualized servers create some discrete I/O challenges for storage. A virtualized server may host
multiple virtual machines, each with its own I/O requirement. To optimize utilization, managers design
environments based on the demands of the applications running inside the virtual machines.
Administrators carefully balance the I/O load on the storage system; any unexpected changes in I/O
load may disrupt the environment, causing degradation of performance or even downtime for some
applications. A typical backup requires the backup agent to crawl the file system to identify changed
files and then to make calls to the disk drives to retrieve the changed blocks and copy them to a
secondary storage medium. This operation takes up a relatively small amount of I/O on a single
physical host, but when it runs on many virtual machines that share one host, it may create
significant I/O demand because the operations represent a much larger workload in aggregate than
individually. Similar
results occur during other ancillary operations such as search, antivirus scanning, or copying.

In virtual desktop deployments, the storage challenges are exacerbated by the increased
virtual machine–to–physical machine ratios. In a server virtualized environment, the overall industry
average consolidation ratio is approximately 6–7 VMs per server with advanced deployments getting
as high as 20–30 VMs per host; in virtualized desktop environments, the ratio may be as high as
100:1. As a result, the I/O convergence and concentration problem is greater.
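
The concentration effect can be sketched with simple arithmetic. The consolidation ratios below come from the text; the per-VM backup I/O and the host I/O budget are hypothetical numbers chosen only to illustrate the scaling, not measurements.

```python
# Hypothetical figures, chosen only for illustration.
BACKUP_IOPS_PER_VM = 50.0   # extra I/O one backup agent adds while running
HOST_IOPS_BUDGET = 2000.0   # I/O the host's shared storage path can sustain


def backup_load_fraction(
    vms_per_host: int,
    backup_iops_per_vm: float = BACKUP_IOPS_PER_VM,
    host_iops_budget: float = HOST_IOPS_BUDGET,
) -> float:
    """Fraction of the host I/O budget used if every VM backs up at once."""
    return vms_per_host * backup_iops_per_vm / host_iops_budget


if __name__ == "__main__":
    scenarios = [
        ("single physical server", 1),
        ("average server consolidation", 7),
        ("advanced server deployment", 30),
        ("virtual desktops", 100),
    ]
    for label, ratio in scenarios:
        share = backup_load_fraction(ratio)
        print(f"{label:30s} {ratio:4d} VMs -> {share:.1%} of I/O budget")
```

With these assumed numbers, a load that is negligible on one server consumes three-quarters of the budget at 30 VMs per host and exceeds it two and a half times over at a 100:1 desktop ratio.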

In both server and desktop virtualized environments, the probability of duplicate blocks is relatively
high. Because each virtual machine or desktop is likely to run the same operating system, it is
possible that over 90% of the operating system files are identical. This gives rise to the need for very
effective deduplication that must be augmented with very fast media, such as SSDs. But again, this
adds more pressure to find a way to lessen redundant I/O demands.
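
A minimal sketch of why identical operating system files make deduplication attractive appears below. It uses generic fixed-size block hashing with SHA-256, which is one common technique and not a description of any particular vendor's product; the 4KB block size and the toy disk images are assumptions.

```python
import hashlib
from typing import Iterable

BLOCK_SIZE = 4096  # bytes; an assumed fixed block size


def unique_block_ratio(images: Iterable[bytes], block_size: int = BLOCK_SIZE) -> float:
    """Return (unique blocks) / (total blocks) across all images."""
    seen = set()
    total = 0
    for image in images:
        for offset in range(0, len(image), block_size):
            block = image[offset:offset + block_size]
            # Identical blocks produce identical digests.
            seen.add(hashlib.sha256(block).digest())
            total += 1
    return len(seen) / total if total else 0.0


if __name__ == "__main__":
    # Toy images: nine shared "OS" blocks plus one block unique to each VM.
    os_part = b"".join(bytes([i]) * BLOCK_SIZE for i in range(9))
    images = [os_part + bytes([100 + vm]) * BLOCK_SIZE for vm in range(10)]
    ratio = unique_block_ratio(images)
    print(f"{ratio:.0%} of blocks are unique; the rest are duplicates")
```

In this toy setup, ten VMs that share 90% of their content need only 19 of their 100 blocks stored or transferred once duplicates are recognized.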

As firms place more important workloads on virtual machines, it is critical to ensure that applications
and virtual machines be given the resources needed and that the VM and storage administrators
have visibility into the infrastructure in order to effectively manage demands on resources. Lack of
resources and visibility into the end-to-end infrastructure can impact a manager's ability to ensure
performance, availability, and consistency in service levels.

Some organizations stick with the traditional approach to storage and backup of servers and
desktops, placing an agent on each host and guest system. The benefits include using existing
processes, no incremental costs, and the advantage of single file recovery. However, there are
potential I/O implications with placing multiple agents for virtual machines on a single physical host.

Another approach is to back up all the virtual machines as files. This method doesn't address the
need to manage backup windows. It also does not provide for image-level recovery of an entire virtual
machine and requires a service console or consolidated backup approach to achieve desired results.

An alternative approach is consolidation of storage and backup by using an agent at the service
console or using a virtualization vendor's custom consolidated backup method. The benefits of this
approach include the avoidance of many agents and the advantage of virtual machine–level recovery.
In using a backup proxy, enterprises can offload the physical host, reduce network traffic, and
minimize backup windows. The drawbacks of a consolidated approach include additional infrastructure
costs; performing a consolidated backup requires all VMDK files to be copied to another location. This
overhead may represent as much as 20% of the resources. Though file-level recovery may be
achieved using this method, it requires additional processes and resources and adds complexity and
time to recovery. Many organizations that have virtualized environments are reluctant to deploy
consolidated backups due to the cost and the operational complexity that is more difficult to manage
as the environment gets bigger.

In addition to the I/O challenges backups create in server virtualized environments, the
overprovisioning of virtual machines' storage and the need for traditional backup agents to read and
copy all blocks regardless of whether they represent actual data or not consume unnecessary
bandwidth, putting further strain on allocated infrastructure resources.

The bottom line is that the market needs a more efficient and effective way to run ancillary operations
that consume resources and put unacceptable strain on consolidated environments. These
operations are required to ensure availability, security, and integrity of applications and data.

Considering Pancetera
Pancetera is a Santa Clara, California–based provider of virtual storage management solutions. The
company, founded in 2008, has received venture capital funding from Hummer Winblad Venture
Partners and Onset Ventures. Pancetera's founders have extensive expertise in virtualization,
storage, and systems management, having held key positions with Thinstall (acquired by VMware) as
well as with Data Domain and Legato (both acquired by EMC Corp.). The company also has
partnered with CommVault, IBM Tivoli, Riverbed, Symantec, and VMware as it focuses on optimizing
VM backup, data migration, and disaster recovery types of workloads.

Pancetera's virtual appliance resides on a single host and is initially designed to reduce the I/O,
complexity, and cost of data protection for virtual environments. The virtual appliance aggregates all
virtual machines into a unified view that can serve as a mount point for backup and restore
applications or other applications requiring visibility and access to the virtual environment. As a result,
Pancetera eliminates the need for a consolidated backup; reduces the I/O and bandwidth
requirements associated with backups; and enables organizations to achieve consistent, efficient,
and effective management of their virtual machines. In a virtual environment where Pancetera is
deployed, organizations may see significant improvements in performance, reduced contention for
I/O and bandwidth during backups and other operations, lower cost per VM achieved by reducing the
required infrastructure to support backup and backup agent software, and a consolidated view of the
whole virtual environment.

Pancetera's virtual appliance consists of two key technologies, SmartRead and SmartView.
SmartRead reads file system metadata to determine which parts of a virtual machine disk file are
actually in use and does not copy unallocated or unused blocks. Using progressive optimization, the
software also determines which blocks have changed since the last time a disk file was read and
copies only the changed blocks. Finally, SmartRead avoids reading duplicate blocks that exist within
multiple VMs. This dramatically reduces unnecessary disk I/O, minimizes the storage and network load
associated with management tasks, and streamlines data management functions such as Storage
vMotion and VM replication.
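
The three filters described above (skip unused blocks, skip unchanged blocks, skip duplicates) can be sketched as follows. This is an illustration of the general idea only, not Pancetera's implementation; the `Block` fields and the `select_blocks` function are hypothetical, and the text does not describe how the product actually obtains the allocation, change, or duplicate information.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Block:
    index: int         # position within the virtual disk file
    allocated: bool    # file system metadata marks the block as in use
    changed: bool      # block differs from the previous read
    fingerprint: str   # content identifier used to spot duplicates


def select_blocks(disk: List[Block], already_copied: Set[str]) -> List[int]:
    """Return indexes of blocks worth reading, applying three filters in order."""
    selected = []
    for block in disk:
        if not block.allocated:
            continue  # unallocated or unused
        if not block.changed:
            continue  # unchanged since the last read
        if block.fingerprint in already_copied:
            continue  # same content already copied from another VM
        already_copied.add(block.fingerprint)
        selected.append(block.index)
    return selected


if __name__ == "__main__":
    vm_a = [Block(0, True, True, "os-1"), Block(1, True, False, "os-2"),
            Block(2, False, False, "zero"), Block(3, True, True, "data-a")]
    vm_b = [Block(0, True, True, "os-1"), Block(1, True, True, "data-b"),
            Block(2, False, True, "zero")]
    copied: Set[str] = set()
    print(select_blocks(vm_a, copied))
    print(select_blocks(vm_b, copied))
```

Sharing the `already_copied` set across VMs is what lets the second disk skip the operating system block it has in common with the first.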

The company's SmartView technology provides a unified view of all virtual machines, offloading
agent-based workloads from production hosts that might affect end-user performance. The unified
view can be seen and interacted with through a simple CIFS or NFS network drive mount. The ability
to export virtual machines via CIFS and NFS creates a unique capability. Significant improvement in
performance can be achieved when moving VMs across the WAN by deploying Pancetera
technology in concert with WAN optimization. Pancetera exports VMDK via CIFS or NFS, and WAN
optimization providers work well with these protocols. The combined solution can reduce the overall
cost of having to move virtual machines across the WAN.
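
Because the unified view is exposed as an ordinary network drive, any tool that can walk a directory tree could in principle enumerate the exported disk files. The sketch below assumes the CIFS or NFS export has already been mounted; the mount path `/mnt/vm_view` and the directory layout are hypothetical and are not taken from Pancetera documentation.

```python
from pathlib import Path
from typing import List


def find_vmdk_files(mount_point: str) -> List[Path]:
    """Return all .vmdk files under mount_point, sorted by path."""
    root = Path(mount_point)
    if not root.is_dir():
        return []  # export not mounted, or path does not exist
    return sorted(p for p in root.rglob("*.vmdk") if p.is_file())


if __name__ == "__main__":
    # "/mnt/vm_view" is a hypothetical mount point for the exported view.
    for vmdk in find_vmdk_files("/mnt/vm_view"):
        print(f"{vmdk}  {vmdk.stat().st_size} bytes")
```

A backup or WAN optimization tool would consume the same mount in the same way, which is why standard file protocols matter here.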

Because Pancetera is deployed using the Open Virtualization Format, the appliance is independent
of processor and host architectures, enabling the solution to run in a broad range of hypervisors.
Currently, the company's virtual appliance supports VMware ESX 3.5 and 4.x.

Challenges
Pancetera is entering a competitive and dynamic market space. Innovative solutions are continuously
being introduced by existing and new vendors in an effort to deliver the solution and establish a
dominant presence in the market. Pancetera, like all start-up technology companies, faces
challenges. While IDC sees organizations finally willing to make some technology investments as the
economy continues its slow turnaround, competition for those limited IT dollars is intense. Pancetera
must compete with incumbent vendors and other new entrants to the market while building
awareness, educating users, and establishing a customer footprint. Enterprises are typically reluctant
— especially in tight times — to purchase technology from smaller, non–"brand name" companies.
However, many enterprises are reaching a point where they need to address the problem, and if their
options are inadequate, they may be open to exploring solutions from smaller vendors. One way to
establish credibility with the market is to publicize actual customer profiles that tell the story of how
the technology is used and the benefits an organization has reaped as a result of the
implementation. Another way Pancetera can establish credibility while gaining a route to market is to
partner with existing vendors that may fill a hole in their product portfolio with SmartRead and
SmartView.

Conclusion
As virtualization continues to be "must-have" technology for enterprises looking to consolidate and
optimize their resources, IT will need a very intelligent management layer that will automate most
tasks through policy-driven, service-oriented approaches. A recent IDC survey shows that
management of virtualization is becoming a business priority for enterprises as their IT environments
become more complex. Tight integration between virtual resource management processes and
physical systems, such as storage, also is a priority.

As virtual infrastructure environments become more complex, IT organizations can no longer
effectively manage change, configuration, capacity planning, provisioning, and other day-to-day
management activities using highly manual or ad hoc processes. This is clear especially when it
comes to storing and backing up the huge volumes of data generated by business today,
exacerbated by the storage and processing demands associated with virtualization. Demand for
sophisticated virtual storage management tools, automation, and integration with physical system
storage management processes and tools will increase as more and more organizations scale out
production virtual machine use.

The shift to a new era of virtualization, Virtualization 3.0, will only further strain I/O as the datacenter
becomes more dynamic with a higher level of VM mobility. The rise of public clouds and the
integration between private and public clouds into a hybrid model further extend the VM mobility
problem to the WAN. The enablement of fluid movement of VMs between internal and external clouds
will be critical to enterprises in order to integrate external cloud resources seamlessly.

The challenge is for enterprises to find virtual storage management solutions that simplify an
environment growing in complexity. In particular, enterprises should seek out solutions that utilize
existing systems and storage management tools and processes as much as possible. These
solutions also should integrate multiple aspects of managing and reporting on storage and work with
more comprehensive datacenter solutions.

Most important, there is the potential for virtualized storage management solutions to add
unnecessary complexity and I/O overhead, which can ultimately slow end-user performance.
Enterprises need to find the right solution, or combination of solutions, that meets their business
objectives for storage in a virtualized environment without sacrificing I/O performance. And the
sooner companies address their virtualized storage management needs, the better.

To the extent that Pancetera can meet the challenges described in this paper, the company has a
significant opportunity for success in the virtual storage management market.

ABOUT THIS PUBLICATION

This publication was produced by IDC Go-to-Market Services. The opinion, analysis, and research results presented herein
are drawn from more detailed research and analysis independently conducted and published by IDC, unless specific vendor
sponsorship is noted. IDC Go-to-Market Services makes IDC content available in a wide range of formats for distribution by
various companies. A license to distribute IDC content does not imply endorsement of or opinion about the licensee.

COPYRIGHT AND RESTRICTIONS

Any IDC information or reference to IDC that is to be used in advertising, press releases, or promotional materials requires
prior written approval from IDC. For permission requests, contact the GMS information line at 508-988-7610 or gms@idc.com.
Translation and/or localization of this document requires an additional license from IDC.

For more information on IDC, visit www.idc.com. For more information on IDC GMS, visit www.idc.com/gms.

Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com

©2010 IDC
