Virtualizing Business-Critical Applications: Foundational Components
Separating “run-the-business” from other business applications and then identifying the IT infrastructure necessary to ensure their high availability, scalability and performance are a must for organizations that seek to reap the greatest operational benefits from emerging virtual computing architectures.
It should come as no surprise that the journey to the software defined data center (SDDC) requires fundamental shifts in how applications are deployed and managed. To fully realize the vision of SDDC, organizations must first embrace the fact that the journey includes moving not only 100% of their servers into the virtual world, but also 100% of the storage and network components that support them. As a practical matter, this journey is far from easy. Getting all applications migrated onto a virtual infrastructure platform alone requires new skills and new ways of managing capacity. In addition, licensing issues require special attention as vendors keep pace with the idea that compute workloads will no longer be directly tied to physical hardware components. But most important to this journey is understanding and successfully migrating the most business-critical applications onto virtual infrastructure such that they not only function well, but thrive. Put another way, failing to get the most complex and compute-intensive workloads to thrive on virtual infrastructure, and to be deployed as easily as any other application, is one of the greatest barriers to achieving the goal of the SDDC.
One Size Does Not Fit All
When most organizations first deploy virtual infrastructure environments, they do so with the goal of reducing their data center footprint by consolidating server workloads onto fewer hardware components. This results in immediate and tangible savings. Then, over time, they begin to realize that the average virtual infrastructure environment, when properly tuned and managed, will provide notably higher levels of availability for those applications running on them. When combined with the initial cost savings achieved, organizations are often drawn to virtualize as much as they can. … And then they hit the wall.
cognizant 20-20 insights | august 2013
The first time a business-critical application requires higher levels of availability – or far greater compute resources – than is traditionally made available on basic virtual infrastructure, problems quickly arise. At first, the business-critical application runs slowly and can become much more unstable. It is then moved back to its original physical infrastructure at least as quickly as it was moved onto the virtual infrastructure environment in the first place. Then the virtual environment is blamed.
To be fair, the virtual infrastructure environment actually is to blame when this happens. But that’s usually due to a combination of the way the virtual infrastructure environment was configured and how the business-critical application was then deployed on top of it. Generally speaking, it’s not because virtual infrastructure platforms are ill equipped to handle these applications.
What Makes an Application Business-Critical?
Of note, the qualities that make an application business-critical often have little to do with the technology or platform said application uses. In the end, business-criticality is best determined by answering a simple question: Can I run my business without this application? From there, a corollary question emerges: How long can I run my business without this application? If the answers to these questions are “no,” or “not for very long,” then that application is critical to the business. Nevertheless, most business-critical applications share key technological characteristics. They include:
• High compute loads – either with heavy threading or heavy math processing.
• High RAM utilization.
• High and specialized I/O – particularly storage.
• High availability configurations – often requiring OS or application clustering.
• Complex networking configurations – public and private networks, often to support clustering.
Applications with any of these qualities will need extra care and attention to configuration and resource management in order to virtualize them successfully. Moreover, the majority of applications that do fall into the business-critical category have more than one of these qualities in play.
Because every application has something unique about the way it runs in any given environment, it’s easy to conclude that every application needs its own explicitly defined set of best practices in order to thrive in a virtual infrastructure environment. In reality, this is not the case. The fact is that virtualization and virtual infrastructure environments do add a layer of abstraction of resources, and this abstraction layer changes the way in which applications can be run. But the way in which virtual infrastructure environments create this abstraction layer is exactly the same regardless of the applications running in that environment. Thus, there exists a set of common practices that must be accounted for and that will enable every business-critical application to run successfully on virtual infrastructure. What’s actually different is the way in which these common elements are expressed. This expression is indeed as unique as any application.
Virtualization software vendor VMware identifies the following six key applications that are considered business-critical:
• Oracle – and Oracle RAC.
• Microsoft SQL Server.
• Microsoft Exchange.
• Microsoft SharePoint.
• SAP.
• Custom Java on Linux.
Most organizations run at least one of these six applications; all exhibit at least some of the characteristics listed above. Again, while they are not the only business-critical applications in use at most organizations, the independent research commissioned by VMware shows they are the most common ones. In addition, there is a second, less frequently encountered set of applications that businesses will often identify as business-critical. These applications also share qualities that can make virtualization more difficult.
These “honorable mention” business-critical apps include:
• DB2.
• WebSphere.
• WebLogic.
• Hadoop/HBase.
• Cassandra.
• Tomcat.
• Message queue systems such as Tibco, RabbitMQ, MQ Series, etc.
• Custom, in-house built and maintained “homegrown” applications.
Again, each of these applications will have specific, individual ways in which they should be tuned to thrive on a virtual infrastructure platform. This is no different than how they are optimized when running on bare metal hardware. But compute resources themselves are very consistent. Therefore, if an organization properly accounts for how an application will make use of its compute resources, common themes begin to emerge.
The Four Food Groups of Computing
When planning a virtual infrastructure environment, architects are taught to consider the following four types of compute resources, which are sometimes referred to as the “four food groups” of computing:
• CPU.
• RAM.
• Disk – including both disk space and disk I/O.
• Network – including number of connections and bandwidth.
All applications (not just business-critical ones) consume different quantities of these compute resources at any given point in time depending on the tasks at hand. The difference is that most business-critical applications will consume disproportionate amounts of one or more of these resources compared with other applications. They also will have requirements for higher levels of redundancy, availability and recoverability compared with other applications. Remember, we answered “no” and “not for very long” to the questions about if, and for how long, we could run the business without these applications. The following set of general guidelines will help organizations assemble applications that thrive on virtual infrastructure:
Always follow the KISS principle: As the Star Trek character Mr. Scott – or “Scotty” – once put it, “The more complicated the plumbing, the easier it is to stop up the drain.” There is elegance in simplicity of design. But more than that, simple designs are generally more stable, more scalable and easier to maintain. Business-critical applications are already inherently more complex, so adding complexity when virtualizing them only makes things worse. Examples of mistakes in this area include:
» Needlessly adding disks and spreading them across multiple data stores. Just because your physical server splits out a separate drive letter for each class of data, logs, etc., isn’t necessarily a reason to do the same in a virtual world. More than one disk – and even more than one data store – is often necessary, but take an eyes-open approach that stresses less rather than more.
» Splitting out base files that are part of a virtual machine’s (VM’s) core components, including vswap and others. This is not an effective way of increasing efficiency, performance or storage management. Sadly, it is a good way of introducing complexity, loss of function and loss of portability into your environment.
» Using external or homegrown tools to duplicate high-availability or redundancy features that are already present in the base systems or architecture. This often leads to managing or implementing abstracted features that don’t actually do what they are intended to. They also make troubleshooting more difficult.
Architect hardware from a “total performance” perspective: Your virtual environment should always be optimized from bottom to top – not top to bottom or from the middle out. High school and college students seem to be the most willing to put $6,000 stereos into $3,000 cars. This doesn’t work nearly as well with high-compute, business-critical applications running on general class hardware with virtual infrastructure on top of it. Even though vSphere will support the so-called “monster VM” with 64 vCPUs, 1TB of RAM and a million IOPS, no VM can truly be bigger or faster than the host hardware on which it runs. Make sure all hardware components that are part of the virtual infrastructure environment are appropriately sized to handle the anticipated workloads placed on top of them.
Be sure to also optimize resources across all four of the computing food groups. It’s easy today to become distracted by CPU cores and GHz speeds of the newest generation processors, and then forget about RAM, the compute resource that is almost always exhausted first on a virtual infrastructure environment.
From a storage perspective, make sure to spread I/O appropriately across your storage area network (SAN). Take appropriate advantage of solid state drive (SSD) and cache capabilities to boost performance, and do so in a way that is easy to replicate. For IP SAN technologies – iSCSI and NFS – jumbo frames should be enabled as the norm.
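The sizing rule above (no VM can be bigger or faster than its host, in any of the four food groups) can be sketched as a simple check. This is a minimal illustration only: the `Resources` type, the `shortfalls` function, the units and the sample figures for the host are assumptions made for the example, not values from any vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """The four food groups, in illustrative units (hypothetical type)."""
    cpu_cores: int        # physical cores on a host, or vCPUs requested by a VM
    ram_gb: int           # RAM in GB
    disk_iops: int        # sustained storage I/O operations per second
    network_gbps: float   # usable network bandwidth in Gbit/s


def shortfalls(vm: Resources, host: Resources) -> list[str]:
    """Return the food groups in which the VM asks for more than the host has."""
    checks = {
        "CPU": vm.cpu_cores > host.cpu_cores,
        "RAM": vm.ram_gb > host.ram_gb,
        "Disk": vm.disk_iops > host.disk_iops,
        "Network": vm.network_gbps > host.network_gbps,
    }
    return [name for name, exceeded in checks.items() if exceeded]


# A "monster VM" request checked against a smaller, general class host.
# The host figures are made up for illustration.
monster_vm = Resources(cpu_cores=64, ram_gb=1024, disk_iops=1_000_000, network_gbps=10.0)
general_host = Resources(cpu_cores=32, ram_gb=512, disk_iops=200_000, network_gbps=10.0)

print(shortfalls(monster_vm, general_host))
```

Checking all four groups together, rather than CPU alone, mirrors the advice not to be distracted by core counts while RAM or storage I/O runs short.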
From a network perspective, Gig-E connections are no longer enough. With today’s price/performance advantages, 10GbE should be the minimum standard for all network connectivity in virtual infrastructure environments. Reserve Gig-E connectivity for out-of-band management of hardware only. As standards evolve and prices recede, plan your network investments wisely to be ready to take advantage of 40GbE and 100GbE. These standards will likely creep into your data center faster than anyone expects.
Understand specific compute needs: Remember, each application will use resources uniquely, but also predictably. The key is to translate how any application would use resources when running on native hardware to the way these would be used when abstracted into the virtual world.
For CPU utilization, assigning more CPU cores is not necessarily better. In fact, assigning too many vCPUs will slow performance. If an application has eight vCPUs but only four vCPUs worth of work to do, it will force the hypervisor to find a way to schedule four cores on the processor that are servicing those vCPUs to do nothing. Heavily threaded applications tend to use more cores while those which crunch numbers use fewer cores and more cycles.
Figure 1: Business Critical Application Optimization Methodology. The figure shows a stack of layers to be optimized from bottom to top, with the lower layers labeled “Virtual Infrastructure Oriented Optimization” and the upper layers labeled “Application Oriented Optimization.” From bottom to top:
• Physical Hardware: Server, Storage, Network.
• Hypervisor: Resource Pools, HA, DRS, Data Stores, Parameter Tuning.
• Virtual Machine Hardware: Optimize RAM, vCPU, Storage, Resource Limits & Reservations.
• Operating System: Paravirtual Drivers, Kernel Parameter Tuning (Linux).
• Application: Cache, SGA, RAM Commitment, App Specific Tunables.
• Java Virtual Machine: Heap Size, Threads, …
• Java Application: Resource Allocation, App Tunables.
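The eight-versus-four vCPU example can be worked through as simple arithmetic. The sketch below is an assumption-laden illustration: the function names are hypothetical, “busy vCPU equivalents” stands for whatever measure of actual CPU work your monitoring provides, and the 25% headroom default is an arbitrary placeholder rather than a recommendation from the text.

```python
import math


def idle_vcpus(assigned_vcpus: int, busy_vcpu_equivalents: float) -> int:
    """vCPUs the hypervisor must still schedule even though they have no work."""
    needed = math.ceil(busy_vcpu_equivalents)
    return max(assigned_vcpus - needed, 0)


def suggested_vcpus(peak_busy_vcpu_equivalents: float, headroom: float = 0.25) -> int:
    """Right-size to observed peak work plus some headroom (assumed policy)."""
    return max(1, math.ceil(peak_busy_vcpu_equivalents * (1 + headroom)))


# The case from the text: eight vCPUs assigned, four vCPUs worth of work.
print(idle_vcpus(assigned_vcpus=8, busy_vcpu_equivalents=4.0))
print(suggested_vcpus(peak_busy_vcpu_equivalents=4.0))
```

With these inputs, four of the eight vCPUs are scheduled only to sit idle, and the assumed headroom policy would suggest five vCPUs instead of eight.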
When it comes to RAM, allocate based on what the application will actually use. Also be sure to set memory reservations for the RAM that will be needed. For example, an Oracle database server should have a memory reservation that is equal to the size of the OS plus the SGA. For Java applications, an appropriate memory reservation would include the OS plus the Java heap size as well as a couple of other smaller items. If necessary, it is better to assign slightly more RAM for these than slightly less. However, it’s also good practice to keep memory reservations as small as practical. Making them too large will interfere with the ability to vMotion a VM from one host to another (by extension impeding the workload balancing capabilities of VMware Distributed Resource Scheduler), complicate HA admission control in the event of a host failure (interfering with HA recovery) or even prevent the VM from being able to start at all.
Storage is arguably the most complex of all of the resources to manage because it is the component in virtual infrastructure that itself is almost always abstracted in multiple layers and in widely varying ways depending on the make and model of storage system used. As a result, it is also the area where application performance problems tend to arise first and most frequently. As a general rule, storage capabilities should be pushed as low in the hardware stack as practical. That stated, if a given storage system doesn’t have a feature needed or desired, implement and integrate these features at other layers while taking care to not add undue complexity. Make sure that individual components are not easily overwhelmed, just as you would when architecting shared storage for high-capacity I/O systems and applications. Align these capabilities so they are easily identified and presented in standard data stores so your applications using them remain just as logically configured. Finally, use raw device mappings (RDMs) as a last resort only. With today’s virtual infrastructure systems, there is no performance advantage to using an RDM over a virtual disk located in a properly configured data store. Further, RDMs will add natural complexity to your virtual infrastructure environment from both a configuration and system management perspective. Where feasible, use the OS-level storage systems – such as ASM on Oracle – as recommended by respective application vendors, but layered on top of the optimized storage environment that is created.
Networks should be kept as simple as possible. In almost every conceivable situation, there is no need for vNIC teaming or bonding inside a VM. This is already handled by the hypervisor. Instead, use one virtual network interface controller (vNIC) for each distinct network to which you need to connect. For example, a typical Oracle RAC node will need two vNICs: one for the public network and one for the private network. The SCAN and associated virtual IPs do not need a vNIC.
Build VMs to be transparent and simple: When building virtual machines, less is definitely more. If you know you will never need a specific feature, you’re probably better off not installing it. Just as is the norm with any OS build, turn off unnecessary services and follow the best practices for hardening the OS in question. The goal here is to have a “squeaky-clean” OS on the VM that feels the same to the application as it would on any other optimized environment. Storage should appear as simple, local disks, and networks should appear as simple connections – because all of the optimization of these components has already been accomplished within the virtual infrastructure environment itself.
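The memory reservation guidance given earlier (OS plus SGA for an Oracle database server; OS plus Java heap plus a few smaller items for a Java application; keep reservations small enough that other hosts can still accept the VM) can be sketched as follows. All function names, host names and sizes here are hypothetical and chosen only for illustration; real sizing should come from measuring the actual workload.

```python
def oracle_reservation_gb(os_gb: int, sga_gb: int) -> int:
    """Reservation for an Oracle database server: OS footprint plus the SGA."""
    return os_gb + sga_gb


def java_reservation_gb(os_gb: int, heap_gb: int, other_gb: int = 0) -> int:
    """Reservation for a Java application: OS plus heap plus smaller items."""
    return os_gb + heap_gb + other_gb


def hosts_that_can_accept(reservation_gb: int,
                          unreserved_gb_by_host: dict[str, int]) -> list[str]:
    """Hosts with enough unreserved RAM to take the VM (vMotion or HA restart)."""
    return [host for host, free_gb in unreserved_gb_by_host.items()
            if free_gb >= reservation_gb]


# Illustrative numbers only.
oracle_gb = oracle_reservation_gb(os_gb=4, sga_gb=24)
java_gb = java_reservation_gb(os_gb=4, heap_gb=8, other_gb=1)
cluster = {"esx01": 64, "esx02": 16, "esx03": 32}

print(oracle_gb, hosts_that_can_accept(oracle_gb, cluster))
print(java_gb, hosts_that_can_accept(java_gb, cluster))
```

The second function illustrates why oversized reservations are a problem: the larger the reservation, the fewer hosts remain as valid targets for migration or restart.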
Then Take Advantage
Only after the virtual environment is optimized should your organization be truly concerned about taking full advantage of its unique benefits and features. At this point, your organization should be able to do so easily. But for business-critical applications, there is still more work to do.
High Availability: When (Not) to Cluster
Business-critical applications naturally have requirements for very high availability and recoverability. In many cases, the enhanced availability provided by a well-engineered virtual infrastructure platform will meet this need. When it does, certain high-availability configurations – system clustering in particular – that are a must for physical infrastructure deployments can be eliminated. Understanding when and when not to cluster, as well as how to best accomplish it, can depend greatly on the capabilities of the application in question, but there are some common guidelines.
First, a properly engineered vSphere HA/DRS cluster can be expected to reliably achieve somewhere between three nines and four nines of availability for all systems running on it. By comparison, traditional database clustering techniques used by the likes of Oracle RAC and Microsoft SQL Cluster Services are intended to provide only three nines of availability at best in and of themselves. Achieving higher levels of availability requires work at the application layer. What this means is that, unless something is explicitly done at the application layer to enhance availability (which is actually not all that common), a properly optimized vSphere HA/DRS cluster can provide equal or better levels of availability than clustering at the OS layer can. This is an excellent opportunity to consider simplifying some clustered systems.
But before running off to destroy every cluster in the data center, consider that systems are often clustered for reasons beyond availability. It’s not unusual for clustered systems to be active-active, or to be clustered to minimize downtime during patches – also known as rolling upgrades. If these kinds of operations are part of your organization’s regular maintenance, clustering is still required. Thus, the key to knowing when to cluster systems on virtual infrastructure is to fully understand the specific application requirements, and then validate if the requirements hold up when migrating to a virtual infrastructure environment.
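To make the “three nines” and “four nines” figures concrete, the allowed downtime per year can be computed directly. This is a worked arithmetic sketch, assuming a 365-day year and treating N nines as an availability of 1 − 10^−N; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum yearly downtime for an availability of N nines (e.g. 3 -> 99.9%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for nines in (3, 4):
    minutes = downtime_minutes_per_year(nines)
    print(f"{nines} nines: {minutes:.1f} minutes/year ({minutes / 60:.2f} hours)")
```

Three nines allows roughly 526 minutes (about 8.8 hours) of downtime per year, while four nines allows roughly 53 minutes, which is the gap an organization is weighing when deciding whether OS-level clustering adds anything beyond the HA/DRS cluster.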
When clustering on top of virtual infrastructure, the high-availability features of each layer should be optimized to complement one another. At the same time, your organization should avoid clustering techniques that might interfere with infrastructure layers above and below. Operating system clusters on virtual infrastructure will generally require that shared disk is used between the individual nodes (voting and quorum drives) and usually involve one of four methods:
• Shared via RDM.
• Shared via iSCSI or NFS on SAN/NAS.
• Shared via multi-write virtual disk.
• Shared via iSCSI or NFS target VM.
While all of these options can be made to work well, they have distinct advantages and disadvantages.
Sharing via RDM is the oldest and best-known method, but it offers the fewest advantages and the greatest limitations. With this option, VMs in a cluster use an RDM to share data. This option also introduces a condition of SCSI bus sharing into the cluster between the nodes. Migrating VMs via vMotion is not supported in this configuration, so VMs are fixed to whichever host they are running on unless and until restarted on another node. Data on the shared disk is also kept in a different format, using the native file system of the OS on a LUN, as compared to a virtual disk in a data store. This can impact how data is protected. Of all of the options, share via RDM provides the least amount of flexibility and should be used only when other methods are not available.
Share via iSCSI or NFS on SAN/NAS resolves the issue of SCSI bus sharing, thus enabling support for vMotion on cluster nodes. However, this option is not available when using FC SAN storage systems. Organizations with an investment in FC SAN may not wish to change the storage infrastructure just to enable this method. Finally, this option has the same issues of data protection differences that are present with share via RDM.
A very simple way to share a disk is to share via a multi-write virtual disk. This option allows all data to remain in virtual disk files on a data store. Here, the shared virtual disk is located in a folder where all cluster nodes can access it. It is formatted Eager Zeroed Thick and the multi-write flag is set, allowing all VMs to write to it at will. There are distinct advantages to this method. It is easy to set up, allows for vMotion and makes data protection consistent. Its primary drawbacks are that the shared virtual disk is associated with more than one virtual machine, so data protection systems must account for this, and that the host HA/DRS cluster where such a configuration is running can have no more than eight ESXi host systems. Also, disks that have multi-write flags set on them can have support issues with certain vStorage API based backup tools. While it is expected that future versions of vSphere should address this issue, be sure that your data protection systems take this into account.
The iSCSI/NFS Gateway VM method is growing in popularity because it resolves almost all of the limitations of the others. Here, an additional VM is configured as an iSCSI or NFS target to reshare the SAN storage over a private virtual network. This VM can be configured with a single vCPU, which means vSphere Fault Tolerance can be used to increase its availability. The nodes of the OS cluster then use the iSCSI or NFS share provided by the target VM for their shared storage (see Figure 2).
Figure 2: iSCSI Gateway VM Configuration Schematic. The figure shows guest-to-guest iSCSI disk sharing: database node VMs running on ESXi hosts connect to a gateway shared iSCSI disk presented by an iSCSI Gateway VM, which is protected by an FT clone through VMware Fault Tolerance. All VM disks, including the gateway shared disk, reside in a vSphere datastore on the SAN infrastructure. Highlights:
• All storage is VMDK on SAN.
• iSCSI Gateway virtualizes and re-shares disk over VM network (virtual SAN on SAN).
• HA, DRS and FT work together.
• All systems can be vMotioned.
• Portable to any vSphere architecture.
This configuration allows all nodes – and even the iSCSI Gateway – to be vMotioned, works with every supported vSphere storage system and can be used on HA/DRS clusters with more than eight nodes. It also clearly associates the shared disk with a specific VM. The primary drawback of this configuration is that it is arguably the most complex to both set up and maintain. Also, when the iSCSI/NFS target is made fault tolerant, its disk is marked Eager Zeroed Thick and the multi-write flag is set. If using a vStorage API based tool, organizations may need to add a script to temporarily disable vSphere Fault Tolerance when backing up this VM.
Regardless of the clustering methodology used, anti-affinity policies between the various cluster nodes are a must. They ensure that no two nodes will run on the same physical host at the same time, which would defeat one of the high-availability purposes of clustering. This is true even for share via RDM configurations because, in the event of a host failure, VMware HA will follow DRS rules for placement when deciding where to restart the failed cluster node.
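The intent of an anti-affinity policy can be expressed as a small validation over a node-to-host placement map. The sketch below is not how DRS implements its rules; it is a hypothetical helper, with made-up node and host names, that simply reports any physical host currently running more than one node of the same OS cluster.

```python
from collections import defaultdict


def anti_affinity_violations(placement: dict[str, str]) -> dict[str, list[str]]:
    """Map each host that runs more than one cluster node to those nodes.

    `placement` maps a cluster node (VM) name to the host it is running on.
    An empty result means no two nodes share a physical host.
    """
    nodes_by_host: dict[str, list[str]] = defaultdict(list)
    for node, host in placement.items():
        nodes_by_host[host].append(node)
    return {host: sorted(nodes)
            for host, nodes in nodes_by_host.items()
            if len(nodes) > 1}


# Illustrative placement: two nodes of a three-node cluster share esx01.
current = {"rac-node-1": "esx01", "rac-node-2": "esx02", "rac-node-3": "esx01"}
print(anti_affinity_violations(current))
```

In this example a single failure of esx01 would take down two of the three cluster nodes at once, which is exactly the situation the anti-affinity policy exists to prevent.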
Business-critical applications have special compute needs that go well beyond those of other systems usually found in virtual infrastructure. When not carefully attended to, this can cause these applications to perform poorly and deliver reduced functionality. Fortunately, while each application expresses how it consumes resources differently, the four food groups of computing are always involved. As a result, common methods and themes arise when abstracting infrastructure for these applications. Properly configured, mission-critical applications can thrive on virtual infrastructure, gaining the same benefits of performance, consistency, availability and recoverability as all other systems. Understanding how each application uses available compute resources is the key to successfully virtualizing business-critical applications, and accelerating the journey to both cloud computing and the software defined data center.
About the Author
Christopher (Chris) A. Williams is a Director of Cognizant Virtual Solutions, within CBC-ITIS Enterprise Computing’s Infrastructure Technology Management Services Practice. In this role, Chris is responsible for designing and developing innovative virtual infrastructure, private and hybrid cloud solutions, optimizing business critical applications and database systems including Oracle RAC, DB2, SQL Server clusters, MySQL and Sybase. Chris has an M.B.A., information systems emphasis, from the University of Colorado, and a bachelor of science degree, with aerospace science and management emphasis, from Metropolitan State University of Denver. He can be reached at Chris.Williams@cognizant.com.
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 164,300 employees as of June 30, 2013, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com for more information.
500 Frank W. Burr Blvd. Teaneck, NJ 07666 USA Phone: +1 201 801 0233 Fax: +1 201 801 0243 Toll Free: +1 888 937 3277 Email: email@example.com
1 Kingdom Street Paddington Central London W2 6BD Phone: +44 (0) 207 297 7600 Fax: +44 (0) 207 121 0102 Email: firstname.lastname@example.org
India Operations Headquarters
#5/535, Old Mahabalipuram Road Okkiyam Pettai, Thoraipakkam Chennai, 600 096 India Phone: +91 (0) 44 4209 6000 Fax: +91 (0) 44 4209 6060 Email: email@example.com
© Copyright 2013, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.