
VMware vSphere 4.1 HA and DRS Technical Deepdive

VMware vSphere 4.1, HA and DRS Technical Deepdive

Copyright © 2010 by Duncan Epping and Frank Denneman. All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information contained herein.

International Standard Book Number (ISBN): 9781456301446

All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Version: 1.1

About the Authors

Duncan Epping is a Principal Architect working for VMware as part of the Technical Marketing department. Duncan primarily focuses on vStorage initiatives and ESXi, and is specialized in vSphere, VMware HA and architecture. Duncan is a VMware Certified Professional and among the first VMware Certified Design Experts (VCDX 007). Duncan is the owner of Yellow-Bricks.com, one of the leading VMware/virtualization blogs worldwide (recently voted number 1 virtualization blog for the 4th consecutive time on vsphere-land.com), and lead-author of the "vSphere Quick Start Guide" and "Foundation for Cloud Computing with VMware vSphere 4" (#21 in the Short Topics Series), which has recently been published by Usenix/Sage. He can be followed on twitter at http://twitter.com/DuncanYB.

Frank Denneman is a Consulting Architect working for VMware as part of the Professional Services Organization. Frank works primarily with large Enterprise customers and Service Providers. He is focused on designing large vSphere Infrastructures and specializes in Resource Management, DRS in general and storage. Frank is a VMware Certified Professional and among the first VMware Certified Design Experts (VCDX 029). Frank is the owner of FrankDenneman.nl, which has recently been voted number 6 worldwide on vsphere-land.com. He can be followed on twitter at http://twitter.com/FrankDenneman.

Table of Contents

About the Authors
Acknowledgements
Foreword
Introduction to VMware High Availability
How Does High Availability Work?
Pre-requisites
Firewall Requirements
Configuring VMware High Availability
Components of High Availability
VPXA
VMAP Plug-In
AAM
Nodes
Promoting Nodes
Failover Coordinator
Preferred Primary
High Availability Constructs
Isolation Response
Split-Brain
Isolation Detection
Selecting an Additional Isolation Address
Failure Detection Time
Adding Resiliency to HA (Network Redundancy)
Single Service Console with vmnics in Active/Standby Configuration
Secondary Management Network

Admission Control
Admission Control Policy
Admission Control Mechanisms
Host Failures Cluster Tolerates
Unbalanced Configurations and Impact on Slot Calculation
Percentage of Cluster Resources Reserved
Failover Host
Impact of Admission Control Policy
Host Failures Cluster Tolerates
Percentage as Cluster Resources Reserved
Specify a Failover Host
Recommendations
VM Monitoring
Why Do You Need VM/Application Monitoring?
How Does VM/App Monitoring Work?
Is AAM enabling VM/App Monitoring?
Screenshots
vSphere 4.1 HA and DRS Integration
Affinity Rules
Resource Fragmentation
DPM
Flattened Shares
Summarizing
What is VMware DRS?
Cluster Level Resource Management
Requirements

Operation and Tasks of DRS
Load Balance Calculation
Events and Statistics
Migration and Info Requests
vCenter and Cluster sizing
DRS Cluster Settings
Automation Level
Initial Placement
Impact of Automation Levels on Procedures
Resource Management
Two-Layer Scheduler Architecture
Resource Entitlement
Resource Entitlement Calculation
Calculating DRS Recommendations
When is DRS Invoked?
Defragmenting cluster during Host failover
Recommendation Calculation
Constraints Correction
Imbalance Calculation
Impact of Migration Threshold on Selection Procedure
Selection of Virtual Machine Candidate
Cost-Benefit and Risk Analysis Criteria
The Biggest Bang for the Buck
Calculating the Migration Recommendation Priority Level
Influence DRS Recommendations
Migration Threshold Levels

Rules
VM-VM Affinity Rules
VM-Host Affinity Rules
Impact of Rules on Organization
Virtual Machine Automation Level
Impact of VM Automation Level on DRS Load Balancing Calculation
Resource Pools and Controls
Root Resource Pool
Resource Pools
Resource pools and simultaneous vMotions
Under Committed versus Over Committed
Resource Allocation Settings
Shares
Reservation
VM Level Scheduling: CPU vs Memory
Impact of Reservations on VMware HA Slot Sizes
Behavior of Resource Pool Level Memory Reservations
Setting a VM Level Reservation inside a Resource Pool
VMkernel CPU reservation for vMotion
Reservations Are Not Limits
Memory Overhead Reservation
Expandable Reservation
Limits
CPU Resource Scheduling
Memory Scheduler
Distributed Power Management

Enable DPM
Templates
DPM Threshold and the Recommendation Rankings
Evaluating Resource Utilization
Virtual Machine Demand and ESX Host Capacity Calculation
Evaluating Power-On and Power-Off Recommendations
Resource LowScore and HighScore
Host Power-On Recommendations
Host Power-Off Recommendations
DPM Power-Off Cost/Benefit Analysis
Integration with DRS and High Availability
Distributed Resource Scheduler
High Availability
DPM awareness of High Availability Primary Nodes
DPM Standby Mode
DPM WOL Magic Packet
Baseboard Management Controller
Protocol Selection Order
DPM and Host Failure Worst Case Scenario
DRS, DPM and VMware Fault Tolerance
DPM Scheduled Tasks
Summarizing
Appendix A – Basic Design Principles
VMware High Availability
VMware Distributed Resource Scheduler
Appendix B – HA Advanced Settings

Acknowledgements

The authors of this book work for VMware. The opinions expressed here are the authors' personal opinions. Content published was not read or approved in advance by VMware and does not necessarily reflect the views and opinions of VMware. This is the authors' book, not a VMware book.

First of all we would like to thank our VMware management team (Steve Beck, Director; Rob Jenkins, Director) for supporting us on this and other projects.

A special thanks goes out to our Technical Reviewers: fellow VCDX Panel Member Craig Risinger (VMware PSO), Marc Sevigny (VMware HA Engineering), Anne Holler (VMware DRS Engineering) and Bouke Groenescheij (Jume.nl) for their very valuable feedback and for keeping us honest. A very special thanks to our families and friends for supporting this project.

We would like to dedicate this book to the VMware Community. We highly appreciate all the effort everyone is putting in to take VMware, Virtualization and Cloud to the next level. Without your support we could not have done this. This is our gift to you.

Duncan Epping and Frank Denneman

Foreword

Since its inception, server virtualization has forever changed how we build and manage the traditional x86 datacenter. In its early days of providing an enterprise-ready hypervisor, VMware focused its initial virtualization efforts on meeting the need for server consolidation. Increased optimization of low-utilized systems and lowering datacenter costs of cooling, electricity, and floor space requirements was a surefire recipe for VMware's early success.

Shortly after introducing virtualization solutions, customers started to see the significant advantages introduced by the increased portability and recoverability that were all of a sudden available. Recovery capabilities and options that were once reserved for the most critical of workloads within the world's largest organizations became broadly available to the masses. Data protection enhancements, Replication, High-Availability, and Fault Tolerance were once synonymous with "Expensive Enterprise Solutions," but are now available to even the smallest of companies. It's this increased portability and recoverability that significantly drove VMware's adoption during its highest growth period and, when combined with intelligent resource management, placed VMware squarely at the top of the market leadership board.

Now, if you've read this far, you likely understand the significant benefits that virtualization can provide, and are probably well on your way to building out your virtual infrastructure and strategy. VMware's virtualization platform can provide near instant recovery time with increasingly more recent recovery points in a properly designed environment. The capabilities provided by VMware are not ultimately what dictates the success or failure of a virtualization project, though. It takes a well-designed virtual infrastructure and a full understanding of how the business requirements of the organization align to the capabilities of the platform.

This book is going to arm you with the information necessary to understand the in-depth details of what VMware can provide you when it comes to improving the availability of your systems. This will help you better prepare for, and align to, the requirements of your business as well as set the proper expectations with the key stakeholders within the IT organization. Duncan and Frank have poured their extensive field experience into this book to enable you to drive broader virtualization adoption across more complex and critical applications. This book will enable you to make the most educated decisions as you attempt to achieve the next level of maturity within your virtual environment, especially as increasingly more critical applications are introduced and require greater availability and recoverability service levels.

Scott Herold
Lead Architect, Virtualization Business, Quest Software

Part 1 VMware High Availability .

Chapter 1 Introduction to VMware High Availability

VMware High Availability (HA) provides a simple and cost effective clustering solution to increase uptime for virtual machines. HA uses a heartbeat mechanism to detect a host or virtual machine failure. In the event of a host failure, affected virtual machines are automatically restarted on other production hosts within the cluster with spare capacity. In the case of a failure caused by the Guest OS, HA restarts the failed virtual machine on the same host. This feature is called VM Monitoring, but is sometimes also referred to as VM HA.

Figure 1: High Availability in action

Unlike many other clustering solutions, HA is literally configured and enabled with 4 clicks. However, HA is not, and let's repeat it, is not a 1:1 replacement for solutions like Microsoft Clustering Services (MSCS). MSCS and for instance Linux Clustering are stateful clustering solutions where the state of the service or application is preserved when one of the nodes fails. The service is transitioned to one of the other nodes and it should resume with limited downtime or loss of data. With HA the virtual machine is literally restarted and this incurs downtime. HA is a form of stateless clustering.

One might ask why you would want to use HA when a virtual machine is restarted and service is temporarily lost. The answer is simple: not all virtual machines (or services) need 99.999% uptime. For many services the type of availability HA provides is more than sufficient. Besides that, stateful clustering does not guarantee 100% uptime either, can be complex and needs special skills and training. One example is managing patches and updates/upgrades in an MSCS environment; if not operated correctly this could even cause more downtime. Just like MSCS, with HA a service or application is restarted during a failover. It is important to note that HA, contrary to MSCS, does not require any changes to the guest, as HA is provided at the hypervisor level. VM Monitoring does not require any additional software or OS modifications except for VMware Tools, which should be installed anyway. HA reduces complexity, costs (associated with downtime and MSCS), resource overhead and unplanned downtime for minimal additional costs. We can't think of a single reason not to use it.

How Does High Availability Work?

Before we deep dive into the main constructs of HA and describe all the choices one has when configuring HA, we will first briefly touch on the requirements. Maybe, if this is the first time you are exposed to HA, you also want to know how to configure it. However, you might be more interested in knowing which components VMware uses and what is required in order for HA to function correctly. Now, the question of course is how does HA work? As briefly touched on in the introduction, HA triggers a response based on the loss of heartbeats.

Pre-requisites

For those who want to configure HA, the following items are the pre-requisites in order for HA to function correctly:

Minimum of two VMware ESX or ESXi hosts
Minimum of 2300MB memory to install the HA Agent
VMware vCenter Server
Redundant Service Console or Management Network (not a requirement, but highly recommended)
Shared Storage for VMs – NFS, SAN, iSCSI
Pingable gateway or other reliable address for testing isolation

We recommend against using a mixed cluster, by which we mean a single cluster containing both ESX and ESXi hosts. Differences in build numbers have led to serious issues in the past when using VMware FT. (KB article: 1013637)

Firewall Requirements

The following list contains the ports that are used by HA for communication. If your environment contains firewalls, ensure these ports are opened for HA to function correctly.

High Availability port settings:
8042 – UDP – Used by the AAM agent process to communicate with the backbone.
8042 – TCP – Used to locate a backbone at bootstrap time.
8043 – TCP – Used by AAM agents to communicate with a remote backbone.
8044 – UDP – Used by HA to send heartbeats.
2050 – 2250 – Used for host-to-host "backbone" (message bus) communication.

Configuring VMware High Availability

As described earlier, HA can be configured with the default settings within 4 clicks. The following steps will show you how to create a cluster and how to enable HA, including VM Monitoring. Each of the settings and the mechanisms associated with them will be described more in-depth in the following chapters.

1. Select the Hosts & Clusters view.
2. Right-click the Datacenter in the Inventory tree and click New Cluster.
3. Give the new cluster an appropriate name. We recommend at a minimum including the location of the cluster and a sequence number, ie. ams-hadrs-001.
4. In the Cluster Features section of the page, select Turn On VMware HA and click Next.
5. Ensure Host Monitoring Status and Admission Control are enabled and click Next.
6. Leave Cluster Default Settings as they are and click Next.
7. Enable VM Monitoring Status by selecting "VM Monitoring Only" and click Next.
8. Leave VMware EVC set to the default and click Next.
9. Leave the Swapfile Policy set to default and click Next.
10. Click Finish to complete the creation of the cluster.
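For those who prefer scripting over clicking, the same cluster can also be created with PowerCLI. The sketch below is a minimal example only; the vCenter server, datacenter and cluster names are placeholders and not taken from the book, and VM Monitoring would still be enabled afterwards via the cluster settings as in step 7.

PowerCLI code:
# Minimal sketch of the cluster creation steps above; names are examples only.
Connect-VIServer -Server "vcenter01.example.local"

New-Cluster -Name "ams-hadrs-001" `
    -Location (Get-Datacenter -Name "Amsterdam") `
    -HAEnabled `
    -HAAdmissionControlEnabled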

When the HA cluster has been created, ESX hosts can be added to the cluster simply by dragging them into the cluster. When an ESX host is added to the cluster, the HA agent will be loaded.
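Dragging hosts into the cluster can also be scripted. A small PowerCLI sketch, with placeholder hostnames:

PowerCLI code:
# Move existing hosts into the HA cluster; the HA agent is configured on each host as it joins.
$cluster = Get-Cluster -Name "ams-hadrs-001"
Get-VMHost -Name "esx01.example.local", "esx02.example.local" |
    Move-VMHost -Destination $cluster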

Chapter 2 Components of High Availability

Now that we know what the pre-requisites are and how to configure HA, the next step is describing the components that form HA. The following diagram depicts a two-host cluster and shows the key HA components. This is still a "high level" overview however; there is more under the covers that we will explain in the following chapters.

Figure 3: Components of High Availability

As you can clearly see, there are three major components that form the foundation for HA:

VPXA
VMAP
AAM

VPXA

The first and probably the most important component is VPXA. This is not an HA agent, but the vCenter agent: it allows your vCenter Server to interact with your ESX host. It also takes care of stopping and starting virtual machines if and when needed. Although HA is configured by vCenter Server, HA is loosely coupled with vCenter Server and does not need vCenter to manage an HA failover. It is comforting to know that in the case of a failure of the host containing the virtualized vCenter Server, HA takes care of the failure and restarts the vCenter Server on another host, including all other configured virtual machines from that failed host. When a virtual vCenter is used, we do however recommend setting the correct restart priorities within HA to avoid any dependency problems.

VMware vCenter supplies the name resolution information that HA needs to function. HA stores this locally in a file called "FT_HOSTS". It is highly recommended to register ESX hosts with their FQDN in vCenter. From an HA perspective there is no need to create local host files, and it is our recommendation to avoid using them: they are too static and will make troubleshooting more difficult. To stress this point even more, as of vSphere 4.0 Update 1 host files (i.e. /etc/hosts) are corrected automatically by HA. In other words, if you have made a typo or for example forgot to add the short name, HA will correct the host file to make sure nothing interferes with HA.

Basic design principle: Avoid using static host files as it leads to inconsistency, which makes troubleshooting difficult.

VMAP Plug-In

Next on the list is VMAP. Where vpxa is the process for vCenter to communicate with the host, VMAP is the translator for the HA agent (AAM) and vpxa. The VMAP plug-in acts as a proxy for communication to AAM: vpxa communicates with VMAP and VMAP communicates with AAM. When vpxa wants to communicate with the AAM agent, VMAP will translate this into understandable instructions for the AAM agent. A good example of what VMAP would translate is the state of a virtual machine: is it powered on or powered off? When AAM has received and flushed the info, it will tell VMAP, and VMAP in its turn will acknowledge to vpxa that the info has been processed. Pre-vSphere 4.0, VMAP was a separate process instead of a plugin linked into vpxa. VMAP is loaded into vpxa at runtime when a host is added to an HA cluster.

One thing you are probably wondering is why we need VMAP in the first place. Wouldn't this be something vpxa or AAM should be able to do? The answer is, we are assuming for now, yes: either vpxa or AAM should be able to carry this functionality. However, when HA was first introduced it was architecturally more prudent to create a separate process for dealing with this, which has now been turned into a plugin.

AAM

That brings us to our next and final component, the AAM agent. The AAM agent is the core of HA and actually stands for "Automated Availability Manager". AAM was originally developed by Legato. It is responsible for many tasks such as communicating host resource information, virtual machine states and HA properties to other hosts in the cluster. AAM stores all this info in a database and ensures consistency by replicating this database amongst all primary nodes. (Primary nodes are discussed in more detail in chapter 4.) One of the other tasks AAM is responsible for is the mechanism with which HA detects isolations and failures: heartbeats.

It is often mentioned that HA uses an in-memory database only. This is not the case! The data is stored in a database on local storage, or in FLASH memory on diskless ESXi hosts. The engineers recognized the importance of this component and added an extra level of resiliency to HA. The agent is multi-process and each process acts as a watchdog for the other. If one of the processes dies, the watchdog functionality will pick up on this and restart the process to ensure HA functionality remains, without anyone ever noticing it failed. The agent is also resilient to network interruptions and component failures: inter-host communication automatically uses another communication path (if the host is configured with redundant management networks) in the case of a network failure, and the underlying message framework guarantees exactly-once message delivery. All this makes the AAM agent one of the most important processes on an ESX host, when HA is enabled of course.

Chapter 3 Nodes

Now that you know what the components of HA are, it is time to start talking about one of the most crucial concepts when it comes to designing HA clusters: the concept of nodes. Before we discuss the various options one has during the configuration of HA, this important aspect needs to be discussed first, as how nodes work can and will influence your design. Everyone who has implemented VMware VI3 or vSphere knows that multiple hosts can form a cluster. A cluster can best be seen as a collection of resources. These resources can be carved up with the use of VMware Distributed Resource Scheduler (DRS) into separate pools of resources or used to increase availability by enabling HA. The following diagram depicts the concept of nodes:

Figure 4: Primary and secondary hosts

An HA cluster consists of hosts, or nodes as HA calls them. There are two types of nodes: a node is either a primary or a secondary node. This concept was introduced to enable scaling up to 32 hosts in a cluster, and each type of node has a different role. Primary nodes hold cluster settings and all "node states"; an example of node state data would be host resource usage. In case vCenter is not available, the primary nodes will always have a very recent calculation of the resource utilization and can take this into account when a failover needs to occur. The data a primary node holds is stored in a persistent database and synchronized between primaries, as depicted in the diagram above. Secondary nodes send their state info to primary nodes. This will be sent when changes occur, generally within seconds after a change.

As of vCenter 4.1, by default every host will send an update of its status every 10 seconds; pre-vSphere 4.1 this used to be every second. This interval can be controlled by an advanced setting called das.sensorPollingFreq. As stated, the default value of this advanced setting is 10 and the maximum value is 30. Although a smaller value will lead to a more up to date view of the status of the cluster overall, it will also increase the amount of traffic between nodes. It is not recommended to decrease this value as it might lead to decreased scalability due to the overhead of these status updates.

The first 5 hosts that join the HA cluster are automatically selected as primary nodes. All other nodes are automatically selected as secondary nodes. When you do a reconfigure for HA, the primary nodes and secondary nodes are selected again; this is virtually random. Except for the first host that is added to the cluster, any host that joins the cluster must communicate with an existing primary node to complete its configuration. At least one primary host must be available for HA to operate correctly. If all primary hosts are unavailable, you will not be able to add or remove a host from your cluster.

As discussed earlier, HA uses a heartbeat mechanism to detect possible outages or network isolation. Nodes send a heartbeat to each other: primary nodes send heartbeats to all primary nodes and all secondary nodes, while secondary nodes send their heartbeats to all primary nodes, but not to secondaries. Nodes send out these heartbeats every second by default. However, this is a configurable value through the use of the following cluster advanced setting: das.failuredetectioninterval. We do not recommend changing this interval as it was carefully selected by VMware. The heartbeat mechanism is used to detect a failed or isolated node: a node will recognize it is isolated by the fact that it isn't receiving heartbeats from any of the other nodes.

The vCenter client normally does not show which host is a primary node and which is a secondary node. However, as of vSphere 4.1 a new feature has been added which is called "Operational Status"; it can be found on the HA section of the Cluster's summary tab. It will give details around errors and will show the primary and secondary nodes. There is one gotcha however: it will only show which nodes are primary and secondary in case of an error.

Figure 5: Cluster operational status

The primary and secondary roles can also be revealed from the Service Console or via PowerCLI. The following are two examples of how to list the primary nodes via the Service Console (ESX 4.0):

Figure 6: List node command

Another method of showing the primary nodes is:

Figure 7: List nodes command

With PowerCLI the primary nodes can be listed with the following line of code:

PowerCLI code:
Get-Cluster <clustername> | Get-HAPrimaryVMHost

Now that you have seen that it is possible to list all nodes with the CLI, you probably wonder what else is possible… Let's start with a warning: this is not supported! Currently the supported limit of primaries is 5. This is a soft limit however; it is possible to manually add a 6th primary, but this is not supported nor encouraged. There should be no reason to increase the number of primaries beyond 5, and having more than 5 primaries in a cluster will significantly increase network and CPU overhead. For the purpose of education we will demonstrate how to promote a secondary node to primary and vice versa.

To promote a node:

Figure 8: Promote node command

To demote a node:

Figure 9: Demote node command

This method however is unsupported and there is no guarantee it will remain working in the future. On earlier versions of ESX, "ftcli" should be used. The command can't be run without setting the required environment variables first; you can execute …/config/agent_env.[platform] to set these.

Promoting Nodes

A common misunderstanding about HA with regards to primary and secondary nodes is the re-election process. When does a re-election, or promotion, occur? It is a common misconception that a promotion of a secondary occurs when a primary node fails. Let's stress that: this is not the case! The promotion of a secondary node to primary only occurs in one of the following scenarios:

When a primary node is placed in "Maintenance Mode"
When a primary node is disconnected from the cluster
When a primary node is removed from the cluster
When the user clicks "reconfigure for HA" on any ESX host

This is particularly important for the operational aspect of a virtualized environment. When a host fails, it is important to ensure its role is migrated to any of the other hosts in case it was an HA primary node. In other words, when a host fails we recommend placing it in maintenance mode, disconnecting it or removing it from the cluster to avoid any risks! If all primary hosts fail simultaneously, no HA initiated restart of the virtual machines can take place: HA needs at least one primary node to restart virtual machines. Let's stress that, you will need at least one primary to restart virtual machines. This is why you can configure HA to tolerate only up to 4 host failures when you have selected the "host failures" Admission Control Policy (remember, 5 primaries…). The number of primaries is definitely something to take into account when designing for uptime.

Failover Coordinator

The reason at least one primary is needed is that one of the primary nodes will hold the "failover coordinator" role. This role is also sometimes referred to as "active primary"; we will use "failover coordinator" for now. The role will be randomly assigned to a primary node. The failover coordinator coordinates the restart of virtual machines on the remaining primary and secondary hosts, taking restart priorities into account. Pre-vSphere 4.1, when multiple hosts failed at the same time, it would handle the restarts serially: restart the virtual machines of the first failed host (taking restart priorities into account) and then restart the virtual machines of the host that failed second (again taking restart priorities into account). As of vSphere 4.1 this mechanism has been severely improved: in the case of multiple near-simultaneous host failures, all the host failures that occur within 15 seconds will have their VMs aggregated and prioritized before the power-on operations occur. If the failover coordinator fails, one of the other primaries will take over; this node is again randomly selected from the pool of available primary nodes. As with any other process within the HA stack, the failover coordinator process is carefully watched by the watchdog functionality of HA.

As of vSphere 4. HA would rely on DRS. Setting a larger value will allow more VMs to be restarted concurrently and might reduce the overall VM recovery time. HA then relies on DRS to redistribute the load later if required. . When it is unavailable. This can even be extended in very large environments by having no more than 2 hosts of a cluster in a chassis. select the host with the highest percentage of unreserved memory and CPU and restart the virtual machine. HA does not coordinate with DRS when making the decision on where to place virtual machines. As stated the default value is 32. The number of concurrent failovers can be controlled by an advanced setting called das.1. As soon as the virtual machines were restarted. For the next virtual machine the same exercise would be done by HA. DRS would kick in and redistribute the load if and when needed. no restart will take place. This re-parent process however did already exist pre-vSphere 4. The failover coordinator can restart up to 32 VMs concurrently per host. The following diagram depicts the scenario where four 8 hosts clusters are spread across four chassis.1 the failover coordinator would decide where a virtual machine would be restarted. In blade environments it is particularly important to factor the primary nodes and failover coordinator concept into your design.Pre-vSphere 4. When all primary nodes reside in a single chassis and the chassis fails. no virtual machines will be restarted as the failover coordinator is the only one who initiates the restart of your virtual machines. This improvement results in faster restarts of the virtual machines and less stress on the ESX hosts. It is a best practice to have the primaries distributed amongst the chassis in case an entire chassis fails or a rack loses power.perHostConcurrentFailoversLimit. DRS also re-parents the virtual machine when it is booted up as virtual machines are failed over into the root resource pool by default. but the average latency to recover individual VMs might increase.1 virtual machines will be evenly distributed across hosts to lighten the load on the hostd service and to get quicker power-on results. there is still a running primary to coordinate the failover. When designing a multi chassis environment the impact of a single chassis failure needs to be taken into account. Basically it would check which host had the highest percentage of unreserved and available memory and CPU and select it to restart that particular virtual machine.

The “=” sign has been used as a divider between the setting and the value. it is currently considered unsupported. This new advanced setting is called das.Figure 10: Logical cluster layout on blade environment Basic design principle:In blade environments.1 a new advanced setting has been introduced. With this setting multiple hosts of a cluster can be manually designated as a preferred node during the primary node election process. The list of nodes can either be comma or space separated and both hostnames and IP addresses are allowed. This setting is not even experimental. We don't recommend anyone using it in a production environment. .preferredPrimaries. Preferred Primary With vSphere 4. divide hosts over all blade chassis and never exceed four hosts per chassis to avoid having all primary nodes in a single chassis. if you do want to play around with it use your test environment. Below you can find an example of what this would typically look like.

das.preferredPrimaries = hostname1,hostname2,hostname3
or
das.preferredPrimaries = 192.168.1.1 192.168.1.2 192.168.1.3

As shown, there is no need to specify 5 hosts; you can specify any number of hosts. If you specify 5 hosts or less, and all 5 hosts are available, they will become the primary nodes in your cluster. If you specify more than 5 hosts, the first 5 hosts of your list will become primary. Again, please be warned that this is considered unsupported at the time of writing; please verify in the VMware Availability Guide or online in the knowledge base (kb.vmware.com) what the support status of this feature is before even thinking about implementing it.

A workaround found by some pre-vSphere 4.1 was using the "promote/demote" option of HA's CLI as described earlier in this chapter. Although this solution could fairly simply be scripted, it is unsupported and, as opposed to "das.preferredPrimaries", a rather static solution.
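If you do experiment with this in a lab, the setting can also be pushed from PowerCLI. This is a sketch only, it assumes the New-AdvancedSetting cmdlet introduced with PowerCLI 4.1, and it remains just as unsupported as the manual method; the hostnames are placeholders.

PowerCLI code:
# Unsupported, lab use only: designate preferred primaries and verify the result.
$cluster = Get-Cluster -Name "ams-hadrs-001"

New-AdvancedSetting -Entity $cluster -Type ClusterHA `
    -Name "das.preferredPrimaries" `
    -Value "esx01.example.local,esx02.example.local,esx03.example.local" `
    -Confirm:$false

# Check which hosts actually hold the primary role after a reconfigure for HA
Get-Cluster -Name "ams-hadrs-001" | Get-HAPrimaryVMHost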

Chapter 4 High Availability Constructs
When configuring HA, two major decisions will need to be made:

Isolation Response
Admission Control

Both are important to how HA behaves and both will also have an impact on availability. It is really important to understand these concepts, as both have specific caveats. Without a good understanding of these it is very easy to increase downtime instead of decreasing it.

Isolation Response
One of the first decisions that will need to be made when HA is configured is the "isolation response". The isolation response refers to the action that HA takes for its VMs when the host has lost its connection with the network. This does not necessarily mean that the whole network is down; it could just be this host's network ports or just the ports that are used by HA for the heartbeat. Even if your virtual machine has a network connection and only your "heartbeat network" is isolated, the isolation response is triggered. Today there are three isolation responses: "Power off", "Leave powered on" and "Shut down". The isolation response answers the question of what a host should do when it has detected it is isolated from the network. Whichever of the three options is chosen, the remaining non-isolated hosts will always try to restart the virtual machines.

Power off – When network isolation occurs, all virtual machines are powered off. It is a hard stop, or to put it bluntly, the power cable of the VMs will be pulled out!

Shut down – When network isolation occurs, all virtual machines running on the host will be shut down using VMware Tools. If this is not successful within 5 minutes, a "power off" will be executed. This time out value can be adjusted by setting the advanced option das.isolationShutdownTimeout. If VMware Tools is not installed, a "power off" will be initiated immediately.

Leave powered on – When network isolation occurs on the host, the state of the virtual machines remains unchanged.

This setting can be changed in the cluster settings under virtual machine options.

Figure 11: Cluster default setting
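The cluster default shown in Figure 11 can also be set from PowerCLI. A hedged sketch: Set-Cluster exposes an -HAIsolationResponse parameter in the 4.x releases (with values such as PowerOff and DoNothing, the latter being "Leave powered on"); treat the exact value names as an assumption and check Get-Help Set-Cluster for your build.

PowerCLI code:
# Set the cluster-wide default isolation response (here: hard power off).
Get-Cluster -Name "ams-hadrs-001" |
    Set-Cluster -HAIsolationResponse PowerOff -Confirm:$false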

The default setting for the isolation response has changed multiple times over the last couple of years. Up to ESX 3.5 U2 / vCenter 2.5 U2 the default isolation response when creating a new cluster was "Power off". This changed to "Leave powered on" as of ESX 3.5 U3 / vCenter 2.5 U3. However, with vSphere 4.0 this has changed again: the default setting for newly created clusters, at the time of writing, is "Shut down", which might not be the desired response. When installing a new environment, you might want to change the default setting based on your customer's requirements or constraints.

The question remains, which setting should you use? The obvious answer applies here: it depends. We prefer "Shut down" because we do not want to use a degraded host to run our virtual machines on, and it will shut down your virtual machines in a clean manner. Many people however prefer to use "Leave powered on" because it eliminates the chance of a false positive and the associated downtime. A false positive in this case is an isolated heartbeat network but a non-isolated virtual machine network and a non-isolated iSCSI / NFS network.

That leaves the question of how the other HA nodes know whether the host is isolated or failed. HA actually does not know the difference. The other HA nodes will try to restart the affected virtual machines in either case. When the host is unavailable, a restart attempt will take place no matter which isolation response has been selected. If a host is merely isolated, the non-isolated hosts will not be able to restart the affected virtual machines. The reason for this is the fact that the host that is running the virtual machine has a lock on the VMDK and swap files. None of the hosts will be able to boot a virtual machine when the files are locked. For those who don't know, ESX locks files to prevent the possibility of multiple ESX hosts starting the same virtual machine. However, when a host fails, this lock expires and a restart can occur.

To reiterate, the remaining nodes will always try to restart the "failed" virtual machines. The possible lock on the VMDK files belonging to these virtual machines, in the case of an isolation event, prevents them from being started. This assumes that the isolated host can still reach the files, which might not be true if the files are accessed through the network on iSCSI, NFS, or FCoE based storage. HA will repeatedly try starting the "failed" virtual machines when a restart is unsuccessful.

The amount of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. The default value is 5. Pre-vCenter 2.5 U4 HA would keep retrying forever which could lead to serious problems as described in KB article 1009625 where multiple virtual machines would be registered on multiple hosts simultaneously leading to a confusing and inconsistent state. (http://kb.vmware.com/kb/1009625)
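If you do need a different retry count, the option can be set as a cluster advanced setting. A sketch, assuming the New-AdvancedSetting cmdlet is available in your PowerCLI version; most environments should simply keep the default of 5.

PowerCLI code:
# Explicitly set the maximum number of restart retries per virtual machine.
New-AdvancedSetting -Entity (Get-Cluster -Name "ams-hadrs-001") -Type ClusterHA `
    -Name "das.maxvmrestartcount" -Value 5 -Confirm:$false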

HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count will be increased by 1. The next restart attempt will then occur after two minutes. If that one fails, the next will occur after 4 minutes, and if that one fails the following will occur after 8 minutes, until the "das.maxvmrestartcount" has been reached.

To make it more clear, look at the following list:

T+0 – Restart
T+2 – Restart retry 1
T+4 – Restart retry 2
T+8 – Restart retry 3
T+8 – Restart retry 4
T+8 – Restart retry 5

As shown in the list above and clearly depicted in the diagram below, a successful power-on attempt could take up to 30 minutes in the case where multiple power-on attempts are unsuccessful. However, HA does not give a guarantee and a successful power-on attempt might not ever take place.
Figure 12: High Availability restart timeline

Split-Brain

When creating your design, make sure you understand the isolation response setting. For instance, when using an iSCSI array or NFS based storage, choosing "Leave powered on" as your default isolation response might lead to a split-brain situation. A split-brain situation can occur when the VMDK file lock times out. This could happen when the iSCSI, FCoE or NFS network is also unavailable. In this case the virtual machine is being restarted on a different host while it is not being powered off on the original host, because the selected isolation response is "Leave powered on". This could potentially leave vCenter in an inconsistent state, as two VMs with a similar UUID would be reported as running on both hosts, and it would cause a "ping-pong" effect where the VM would appear to live on ESX host 1 at one moment and on ESX host 2 soon after.

VMware's engineers have recognized this as a potential risk and developed a solution for this unwanted situation. (This is not well documented, but briefly explained by one of the engineers on the VMTN Community forums: http://communities.vmware.com/message/1488426#1488426.) In short, as of version 4.0 Update 2, ESX detects that the lock on the VMDK has been lost, issues a question whether the virtual machine should be powered off, and auto answers the question with yes. However, you will only see this question if you directly connect to the ESX host. HA will generate an event for this auto-answer though, which is viewable within vCenter. Below you can find a screenshot of this question.

Figure 13: Virtual machine message

The question still remains: with iSCSI or NFS, should you power off virtual machines or leave them powered on? As described above, in earlier versions "Leave powered on" could lead to a split-brain scenario: you would end up seeing multiple virtual machines ping-ponging between hosts, as vCenter would not know where they resided since they were active in memory on two hosts. As stated above, as of ESX 4.0 Update 2 the question will be auto-answered and the virtual machine will be powered off to recover from the split-brain scenario, so this is no longer the case and it should be safe to use "Leave powered on". Pre-vSphere 4.0 Update 2 we recommend avoiding the chances of a split-brain scenario: select either "Power off" or "Shut down" as the isolation response and configure a secondary Service Console on the same vSwitch and network as the iSCSI or NFS VMkernel portgroup. It is also recommended to have a secondary Service Console (ESX) or Management Network (ESXi) running on the same vSwitch as the storage network; by doing this you will be able to detect if there's an outage on the storage network and avoid false positives for isolation detection. We will discuss the options you have for Service Console / Management Network redundancy more extensively later on in this book.

Basic design principle: For network-based storage (iSCSI, NFS, FCoE) it is recommended (pre-vSphere 4.0 Update 2) to set the isolation response to "Shut down" or "Power off".

Isolation Detection

We have explained what the options are to respond to an isolation event, but we have not extensively discussed how isolation is detected. Isolation detection is a mechanism that takes place on the host that is isolated and is one of the key mechanisms of HA. The remaining, non-isolated, hosts don't know if that host has failed completely or if it is isolated from the network; they only know it is unavailable. The mechanism is fairly straightforward though and works with heartbeats, as explained earlier. Remember: primary nodes send heartbeats to primaries and secondaries, while secondary nodes send heartbeats only to primaries. When a node receives no heartbeats from any of the other nodes for 13 seconds (default setting), HA will ping the "isolation address". The isolation address is the gateway specified for the Service Console network (or management network on ESXi), but there is a possibility to specify one or multiple additional isolation addresses with an advanced setting.

This advanced setting is called "das.isolationaddress" and could be used to reduce the chances of having a false positive. We recommend setting at least one additional isolation address.

Figure 14: das.isolationaddress

If only one heartbeat is received or just a single isolation address can be pinged, the isolation response will not be triggered, which is exactly what you want. When isolation has been confirmed, meaning no heartbeats have been received and HA was unable to ping any of the isolation addresses, HA will execute the isolation response. This could be any of the above-described options: power off, shut down or leave powered on.

Selecting an Additional Isolation Address

A question asked by many people is which address should be specified for this additional isolation verification. We generally recommend an isolation address closest to the hosts to avoid too many network hops.

In many cases the most logical choice is the physical switch to which the host is directly connected; another usual suspect would be a router or any other reliable and pingable device. However, when you are using network-based shared storage like NFS or iSCSI, a good choice would be the IP address of the storage device; this way you would also verify whether the storage is still reachable or not.

Failure Detection Time

Failure Detection Time seems to be a concept that is often misunderstood, but it is critical when designing a virtual infrastructure. Failure Detection Time is basically the time it takes before the "isolation response" is triggered. There are two primary concepts when we are talking about failure detection time:

The time it will take the host to detect it is isolated
The time it will take the non-isolated hosts to mark the unavailable host as isolated and initiate the failover

The following diagram depicts the timeline for both concepts:

Figure 15: High Availability failure detection time

The default value for failure detection is 15 seconds (das.failuredetectiontime). In other words, the failed or isolated host will be declared failed by the other hosts in the HA cluster on the fifteenth second, and a restart will be initiated by the failover coordinator after one of the primaries has verified that the failed or isolated host is unavailable by pinging the host on its management network. It should be noted that in the case of a dual management network setup both addresses will be pinged and 1 second will need to be added to the timeline, meaning that the failover coordinator will initiate the restart on the 17th second.

Let's stress that again: a restart will be initiated after one of the primary nodes has tried to ping all of the management network addresses of the failed host. If an isolation validation address ("das.isolationaddress") has been added, add 5000 to the default "das.failuredetectiontime" (15000).

Let's assume the isolation response is "Power off". The isolation response "Power off" will be triggered by the isolated host 1 second before the das.failuredetectiontime elapses. In other words, a "Power off" will be initiated on the fourteenth second, and a restart will be initiated on the sixteenth second by the failover coordinator if the host has a single management network. However, when the heartbeat returns between the 14th and 16th second, the "Power off" might have already been initiated, while the restart will not be initiated because the received heartbeat indicates that the host is not isolated anymore. Does this mean that you can end up with your virtual machines being down and HA not restarting them? Yes. How can you avoid this? Selecting "Leave VM powered on" as an isolation response is one option. Increasing the das.failuredetectiontime will also decrease the chances of running into issues like these, and with ESX 3.5 it was a standard best practice to increase the failure detection time to 30 seconds. At the time of writing (vSphere) this is not a best practice anymore, as with any value the "2-second" gap exists and the likelihood of running into this issue is small. We recommend keeping das.failuredetectiontime low for fast responses to failures.

Basic design principle: Keep das.failuredetectiontime as low as possible to decrease the associated downtime.
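The isolation-related advanced settings discussed in this chapter can be combined as in the sketch below. It assumes the New-AdvancedSetting cmdlet, and the addresses are placeholders for your own gateway or storage array.

PowerCLI code:
$cluster = Get-Cluster -Name "ams-hadrs-001"

# Additional isolation address, e.g. the IP address of the NFS/iSCSI array
New-AdvancedSetting -Entity $cluster -Type ClusterHA `
    -Name "das.isolationaddress1" -Value "192.168.1.20" -Confirm:$false

# Failure detection time in milliseconds; 15000 is the default,
# add 5000 when an additional isolation address has been configured
New-AdvancedSetting -Entity $cluster -Type ClusterHA `
    -Name "das.failuredetectiontime" -Value 20000 -Confirm:$false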

Chapter 5 Adding Resiliency to HA (Network Redundancy)

In the previous chapter we have extensively covered Isolation Detection, which triggers the selected Isolation Response, and the impact of a false positive. The Isolation Response enables HA to restart virtual machines when "Power off" or "Shut down" has been selected and the host is isolated from the network. To increase resiliency of the "heartbeat" network (Service Console for ESX and Management Network for ESXi) VMware introduced the concept of NIC teaming. "NIC teaming is the process of grouping together several physical nics into one single logical nic," which can be used for network fault tolerance and load balancing. Using this mechanism it is possible to add redundancy to the Management Network or Service Console network to decrease the chances of a false positive. Another option is configuring a secondary Management Network or Service Console network. VMware supports both of these configurations and each has its own pros and cons, which are listed in the sections below.

To simplify the concepts we have used ESX as an example; however, these recommendations are also valid for ESXi. We have included the vMotion (VMkernel) network in our examples as combining the Service Console and the VMkernel is the most commonly used configuration and a VMware best practice. (This is of course also possible for other "Portgroups" but that is not the topic of this book.)

Single Service Console with vmnics in Active/Standby Configuration

Requirements:
2 physical NICs
VLAN trunking

Recommended:
2 physical switches

The vSwitch should be configured as follows:
vSwitch0: 2 Physical NICs (vmnic0 and vmnic2)
2 Portgroups (Service Console and VMkernel)

Service Console active on vmnic0 and standby on vmnic2
VMkernel active on vmnic2 and standby on vmnic0
Failback set to No (NIC Teaming Tab)

Each portgroup has a VLAN ID assigned and runs dedicated on its own physical NIC; only in the case of a failure is it switched over to the standby NIC. We highly recommend setting failback to "No" to avoid the chance of a false positive, which can occur when a physical switch routes no traffic during boot but the ports are reported as "up".

Pros: Only 2 NICs in total are needed for the Service Console and VMkernel, which is especially useful in blade environments. This setup is also less complex.
Cons: Just a single active path for heartbeats.

The following diagram depicts the active/standby scenario:

Figure 16: Active-standby Service Console network layout
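The active/standby order and the failback setting can be applied with PowerCLI as well. A sketch assuming the Get-/Set-NicTeamingPolicy cmdlets of PowerCLI 4.x; the host, portgroup and vmnic names follow the example layout above.

PowerCLI code:
# Service Console portgroup: vmnic0 active, vmnic2 standby, failback disabled
$vmhost = Get-VMHost -Name "esx01.example.local"
$scPg   = Get-VirtualPortGroup -VMHost $vmhost -Name "Service Console"

Get-NicTeamingPolicy -VirtualPortGroup $scPg |
    Set-NicTeamingPolicy -MakeNicActive vmnic0 -MakeNicStandby vmnic2 `
        -FailbackEnabled:$false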

Secondary Management Network

Requirements:
3 physical NICs
VLAN trunking

Recommended:
2 physical switches

The vSwitch should be configured as follows:
vSwitch0: 3 Physical NICs (vmnic0, vmnic1 and vmnic2)
3 Portgroups (Service Console, secondary Service Console and VMkernel)

The primary Service Console runs on vSwitch0 and is active on vmnic0, with a VLAN assigned on either the physical switch port or the portgroup, and is connected to the first physical switch. (We recommend using a VLAN trunk for all network connections for consistency and flexibility.) The secondary Service Console will be active on vmnic2 and connected to the second physical switch. The VMkernel is active on vmnic1 and standby on vmnic2. It is mandatory to set an additional isolation address (das.isolationaddress2) in order for the secondary Service Console to verify network isolation via a different route. Subsequently, both Service Consoles will be used for the heartbeat mechanism, which will increase resiliency.

Pros: Decreased chances of false alarms due to Spanning Tree "problems", as the setup contains two Service Consoles that are each connected to only one physical switch.
Cons: Need to set an additional advanced setting (das.isolationaddress2).

The following diagram depicts the secondary Service Console scenario:

Figure 17: Secondary management network

The question remains: which would we recommend? Both scenarios are fully supported and provide a highly redundant environment either way. We however recommend the first scenario. Redundant NICs for your Service Console add a sufficient level of resilience without leading to an overly complex environment. Redundancy for the Service Console or Management Network is important for HA to function correctly and to avoid false alarms about the host being isolated from the network.

Chapter 6 Admission Control

Admission Control is often misunderstood and disabled because of this. However, Admission Control is a must when availability needs to be guaranteed, and isn't that the reason for enabling HA in the first place?

What is HA Admission Control about? Why does HA contain Admission Control? The "Availability Guide", a.k.a. the HA bible, states the following:

"vCenter Server uses Admission Control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected."

Admission Control guarantees capacity is available for an HA initiated failover by reserving resources within a cluster. It calculates the capacity required for a failover based on available resources. In other words, if a host is placed into maintenance mode or disconnected, it is taken out of the equation. Available resources also means that the virtualization overhead has already been subtracted from the total. To give an example: Service Console memory and VMkernel memory are subtracted from the total amount of memory, which results in the available memory for the virtual machines.

There is one gotcha with Admission Control that we want to bring to your attention before drilling into the different policies. When Admission Control is set to strict, VMware Distributed Power Management will in no way violate availability constraints. This means that it will always ensure multiple hosts are up and running. (For more info on how DPM calculates this, read Chapter 18.) When Admission Control was disabled and DPM was enabled in a pre-vSphere 4.1 environment, you could have ended up with all but one ESX host placed in sleep mode, which could lead to potential issues when that particular host failed or resources were scarce, as there would be no host available to power on your virtual machines. (KB: http://kb.vmware.com/kb/1007006)

The impact of each policy is described in the following section including our recommendation. Admission Control Policy The Admission Control Policy dictates the mechanism that HA uses to guarantee enough resources are available for an HA initiated failover.With vSphere 4. Figure 18: Admission control policy .1 however. if there are not enough resources to power on all hosts. DPM will be asked to take hosts out of standby mode to make more resources available and the virtual machines can then get powered on by HA when those hosts are back online. HA has three mechanisms to guarantee enough capacity is available to respect virtual machine resource reservations. This section gives a general overview of the available Admission Control Policies.

Below we have listed all three options currently available as the Admission Control Policy. Each option has a different mechanism to ensure resources are available for a failover and each option has its caveats.

Admission Control Mechanisms

Each Admission Control Policy has its own Admission Control mechanism. Understanding these mechanisms is important to understand the impact of decisions on your cluster design. For instance, setting a reservation on a specific virtual machine can have an impact on the achieved consolidation ratio. This section will take you on a journey through the trenches of Admission Control mechanisms.

Keep in mind that Admission Control does not limit HA in restarting virtual machines; it ensures enough resources are available to power on all virtual machines in the cluster by preventing "over-commitment". For those wondering why HA initiated failovers are not prone to the Admission Control Policy, think back for a second: Admission Control is done by vCenter, while HA initiated restarts are executed directly on the ESX host without the use of vCenter. So even if resources were low and vCenter would complain, it couldn't stop the restart. If a failure has occurred and the host has been removed from the cluster, HA will recalculate all the values and start with an "N+x" cluster again from scratch. This could result in an over-committed cluster, as you can imagine.

Host Failures Cluster Tolerates

The Admission Control Policy that has been around the longest is the "Host Failures Cluster Tolerates" policy. It is also historically the least understood Admission Control Policy due to its complex admission control mechanism, which has changed several times in the past, and it is one of the most restrictive policies. The so-called "slots" mechanism is used when selecting "Host Failures Cluster Tolerates" as the Admission Control Policy.

"A slot is defined as a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster…"

In other words, a slot is the worst-case CPU and memory reservation scenario in a cluster. Slots dictate how many virtual machines can be powered on before vCenter starts yelling "Out Of Resources!" Normally a slot represents one virtual machine. This directly leads to the first "gotcha": HA uses the highest CPU reservation of any given virtual machine and the highest memory reservation of any given VM in the cluster. If no reservation higher than 256 MHz is set, HA will use a default of 256 MHz for CPU. If no memory reservation is set, HA will use a default of 0MB + memory overhead for memory. (See the VMware vSphere Resource Management Guide for more details on memory overhead per virtual machine configuration.)

Example: If virtual machine "VM1" has 2GHz of CPU and 1024MB of memory reserved, and virtual machine "VM2" has 1GHz of CPU and 2048MB of memory reserved, the slot size for memory will be 2048MB (plus memory overhead) and the slot size for CPU will be 2GHz. The slot size is a combination of the highest reservation of both virtual machines.

Basic design principle: Be really careful with reservations; if there's no need to have them on a per virtual machine basis, don't configure them, especially when using Host Failures Cluster Tolerates. If reservations are needed, resort to resource pool based reservations. Reservations defined at the Resource Pool level, however, will not affect HA slot size calculations.

The question we receive a lot is: how do I know what my slot size is? The details around slot sizes can be monitored in the HA section of the cluster's Summary tab by clicking the "Advanced Runtime Info" line.

Figure 19: High Availability cluster summary tab

This will show a screen that specifies the slot size and more useful details around the number of slots available.

Now that we know the worst case scenario is always taken into account when it comes to slot size calculations, we will describe what dictates the number of available slots per cluster. We first need to know the slot size for memory and CPU. Then we divide the total available CPU resources of a host by the CPU slot size, and the total available memory resources of a host by the memory slot size. This leaves us with a number of slots for both memory and CPU, and the most restrictive number, again the worst-case scenario, is the number of slots for this host. If you have 25 CPU slots but only 5 memory slots, the number of available slots for this host will be 5, as HA will always take the worst case scenario into account to "guarantee" all virtual machines can be powered on in case of a failure or isolation.
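To make the arithmetic explicit, here is a small Python sketch of the slot calculation described above. It is a simplified model: per-VM memory overhead is ignored and the host capacity numbers are made up for illustration.

def slot_size(vm_reservations_mhz_mb, cpu_default_mhz=256):
    # Worst case: the highest CPU reservation and the highest memory reservation
    # of any powered-on virtual machine in the cluster dictate the slot size.
    cpu_slot = max(max((cpu for cpu, _ in vm_reservations_mhz_mb), default=0), cpu_default_mhz)
    mem_slot = max((mem for _, mem in vm_reservations_mhz_mb), default=0)  # + memory overhead in reality
    return cpu_slot, mem_slot

def slots_per_host(host_cpu_mhz, host_mem_mb, cpu_slot, mem_slot):
    # A host provides the most restrictive of its CPU slots and its memory slots.
    return min(host_cpu_mhz // cpu_slot, host_mem_mb // mem_slot)

# VM1: 2GHz / 1024MB reserved, VM2: 1GHz / 2048MB reserved (the example above)
cpu_slot, mem_slot = slot_size([(2000, 1024), (1000, 2048)])
print(cpu_slot, mem_slot)                                   # 2000 MHz, 2048 MB
print(slots_per_host(50000, 10240, cpu_slot, mem_slot))     # min(25, 5) = 5 slots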

Figure 20: High Availability advanced runtime info

As you can see, using reservations on a per-VM basis can lead to very conservative consolidation ratios. With vSphere, however, this is configurable. If you have just one virtual machine with a really high reservation, you can set the following advanced settings to lower the slot size used for these calculations: "das.slotCpuInMHz" or "das.slotMemInMB". To avoid not being able to power on the virtual machine with the high reservation, that virtual machine will then take up multiple slots. When you are low on resources, this could mean that you are not able to power on this high-reservation virtual machine, as resources may be fragmented throughout the cluster instead of available on a single host. As of vSphere 4.1, HA will notify DRS that a power-on attempt was unsuccessful, and a request will be made to defragment the resources to accommodate the remaining virtual machines that need to be powered on. The following diagram depicts a scenario where a virtual machine spans multiple slots:

Figure 21: Virtual machine spanning multiple HA slots

Notice that because the memory slot size has been manually set to 1024MB, one of the virtual machines (grouped with dotted lines) spans multiple slots due to its 4GB memory reservation. As you might have noticed, none of the hosts has 4 slots left. Although in total there are enough slots available, they are fragmented and HA will not be able to power on this particular virtual machine directly, but will request DRS to defragment the resources to accommodate this virtual machine's resource requirements. Admission Control does not take fragmentation of slots into account when slot sizes are manually defined with advanced settings: it will subtract the number of slots this virtual machine consumes from the total number of available slots, but it will not verify the number of available slots per host to ensure failover. As stated earlier, as of vSphere 4.1 HA will request DRS to defragment the resources; however, this is no guarantee for a successful power-on attempt or slot availability.

Basic design principle: Avoid using advanced settings to decrease the slot size, as it could lead to more downtime and adds an extra layer of complexity. If there is a large discrepancy in size and reservations are set, it might help to put similar sized virtual machines into their own cluster.

Unbalanced Configurations and Impact on Slot Calculation

It is an industry best practice to create clusters with similar hardware configurations. However, many companies start out with a small VMware cluster when virtualization is introduced and plan on expanding once trust within the organization has been built. When the time has come to expand, chances are fairly large that the same hardware configuration is no longer available. The question is: will you add the newly bought hosts to the same cluster or create a new cluster? From a DRS perspective, large clusters are preferred as they increase the load balancing options; however, there is a caveat for DRS as well, which is described in the DRS section of this book. For HA there is a big caveat, and when you understand the internal workings of HA you probably already know what is coming up. Let's try to clarify that with an example.

Let's first define the term "unbalanced cluster". An unbalanced cluster would, for instance, be a cluster with 6 hosts of which one contains more memory than the other hosts in the cluster.

Example: What would happen to the total number of slots in a cluster with the following specifications?

Six host cluster
Five hosts have 16GB of available memory
One host has 32GB of available memory

The sixth host is a brand new host that has just been bought, and as memory prices dropped immensely, the decision was made to buy 32GB instead of 16GB. The cluster contains a virtual machine that has 1 vCPU and 4GB of memory, and a 1024MB memory reservation has been defined on this virtual machine. As explained earlier, a reservation dictates the slot size, which in this case leads to a memory slot size of 1024MB plus memory overhead. For the sake of simplicity we will calculate with 1024MB. The following diagram depicts this scenario:

Figure 22: High Availability memory slot size

When Admission Control is enabled and a number of host failures has been specified as the Admission Control Policy, the number of slots will be calculated per host and for the cluster in total. This will result in:

ESX01 - 16 slots
ESX02 - 16 slots
ESX03 - 16 slots
ESX04 - 16 slots
ESX05 - 16 slots
ESX06 - 32 slots

As Admission Control is enabled, a worst-case scenario is taken into account. When a single host failure has been specified, this means that the host with the largest number of slots will be taken out of the equation; the sketch below and the sum that follows show the result for our cluster.
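The worst-case arithmetic can be summarized in a few lines of Python; the slot counts are the ones from the example above.

def usable_cluster_slots(host_slots, host_failures_tolerated=1):
    # Worst case: discount the host(s) providing the most slots, then sum the rest.
    surviving = sorted(host_slots)[:len(host_slots) - host_failures_tolerated]
    return sum(surviving)

slots = [16, 16, 16, 16, 16, 32]          # esx01-esx05 with 16GB, esx06 with 32GB
print(usable_cluster_slots(slots, 1))     # 80 -> the 32-slot host is taken out
print(usable_cluster_slots(slots, 2))     # 64 -> esx06 plus one 16-slot host taken out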

In other words, for our cluster this would result in:

esx01 + esx02 + esx03 + esx04 + esx05 = 80 slots available

Although you have doubled the amount of memory in one of your hosts, you are still stuck with only 80 slots in total. In our example the memory slot size happened to be the most restrictive; the same principle applies when the CPU slot size is the most restrictive. As clearly demonstrated, there is absolutely no point in buying additional memory for a single host when your cluster is designed with Admission Control enabled and a number of host failures selected as the Admission Control Policy. Now what would happen in the scenario above when the number of allowed host failures is set to 2? In this case ESX06 is taken out of the equation and one of the remaining hosts in the cluster is also taken out, resulting in 64 slots. This makes sense, doesn't it?

Basic design principle: When using Admission Control, balance your clusters and be conservative with reservations, as unbalanced clusters and large reservations lead to decreased consolidation ratios.

Can you avoid large HA slot sizes due to reservations without resorting to advanced settings? That's a question we get almost daily. The answer used to be "no" if per virtual machine reservations were required: HA uses reservations to calculate the slot size, and pre-vSphere there was no way to tell HA to ignore them without using advanced settings. With vSphere, the new Percentage method is an alternative.

Percentage of Cluster Resources Reserved

With vSphere, VMware introduced the ability to specify a percentage next to a number of host failures and a designated failover host. The percentage avoids the slot size issue, as it does not use slots for Admission Control. So what does it use? When you specify a percentage, that percentage of the total amount of available resources will stay reserved for HA purposes. First of all, HA adds up all available resources to see how much it has available in total (virtualization overhead is subtracted). Then HA calculates how much of these resources are currently reserved by adding up all reservations for both memory and CPU for powered-on virtual machines. For those virtual machines that do not have a reservation larger than 256 MHz, a default of 256 MHz will be used for CPU, and a default of 0MB plus memory overhead will be used for memory. (The amount of overhead per configuration type can be found in the "Understanding Memory Overhead" section of the Resource Management Guide.) In other words:

Admission Control will disallow powering on additional virtual machines as soon as the following is true:

((total amount of available resources – total reserved virtual machine resources) / total amount of available resources) <= (percentage HA should reserve as spare capacity)

Here, total reserved virtual machine resources includes the default CPU reservation of 256MHz and the memory overhead of the virtual machine. Even if a reservation has been set, the amount of memory overhead is added to the reservation. Let's use a diagram to make it a bit clearer:

Figure 23: Percentage of cluster resources reserved

Total cluster resources are 24GHz (CPU) and 96GB (memory). This would lead to the following calculations:

((24GHz - (2GHz + 1GHz + 256MHz + 4GHz)) / 24GHz) = 69% available
((96GB - (1.1GB + 114MB + 626MB + 3.2GB)) / 96GB) = 85% available

As you can see, the amount of memory differs from the diagram; this is because the memory overhead is added to each reservation.
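A quick Python sketch of this check, using the CPU numbers from the example above; the 25% failover capacity is an assumed setting, not something stated in the example. The memory check works the same way, with the memory overhead added to each reservation.

def percentage_admission_check(total_capacity, reservations, reserve_pct):
    # Fraction of the cluster not claimed by reservations (incl. defaults and overhead).
    available = (total_capacity - sum(reservations)) / total_capacity
    return available, available > reserve_pct

# CPU: 24GHz total; reservations of 2GHz, 1GHz, the 256MHz default and 4GHz (values in MHz).
available, power_on_allowed = percentage_admission_check(24000, [2000, 1000, 256, 4000], 0.25)
print(f"{available:.0%} available, power-on allowed: {power_on_allowed}")   # ~70% available, True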

For both metrics, memory and CPU, HA Admission Control will constantly check whether the policy has been violated. When one of the two thresholds is reached, Admission Control will disallow powering on any additional virtual machines. These thresholds can be monitored in the HA section of the cluster's Summary tab.

Figure 24: High Availability summary

If you have an unbalanced cluster (hosts with different amounts of CPU or memory resources), your percentage should be equal to or preferably larger than the percentage of resources provided by the largest host. This way you ensure that all virtual machines residing on this host can be restarted in case of a host failure.

As explained earlier, this Admission Control Policy does not use slots. As such, resources might be fragmented throughout the cluster. We recommend ensuring you have at least one host with enough available capacity to boot the largest virtual machine (CPU/memory reservation). Also make sure you select the highest restart priority for this virtual machine (depending on the SLA, of course) to ensure it will be able to boot.

The following diagram and example make this more obvious. You have 5 hosts, each with roughly 80% memory usage, and you have configured HA to reserve 20% of resources. A host fails and all its virtual machines need to fail over. One of those virtual machines has a 4GB memory reservation. As you can imagine, the first power-on attempt for this particular virtual machine will fail, because none of the hosts has enough memory available to guarantee it. Although as of vSphere 4.1 DRS is notified to rebalance the cluster, if needed, to accommodate this virtual machine's resource requirements, a guarantee cannot be given.

Figure 25: Available resources

Basic design principle: Although vSphere 4.1 will use DRS to try to accommodate the resource requirements of this virtual machine, a guarantee cannot be given. Do the math: verify that any single host has enough resources to power on your largest virtual machine. Also take restart priority into account for this/these virtual machine(s).

Failover Host
The third option is to designate a specific failover host. This is commonly referred to as a hot standby. There is actually not much to tell about this mechanism, as it is "what you see is what you get". When you designate a host as a failover host, it will not participate in DRS and you will not be able to power on virtual machines on it. It is almost as if the host is in maintenance mode, and it will only be used when a failover needs to occur.

Chapter 7
Impact of Admission Control Policy
As with any decision made when architecting your environment, there is an impact, and this is especially true for the Admission Control Policy. The first decision to be made is whether Admission Control is enabled or not. We recommend enabling Admission Control, but carefully select the policy and ensure it fits your or your customer's needs.

Basic design principle:
Admission Control guarantees enough capacity is available for virtual machine failover. As such we recommend enabling it. We have explained all the mechanisms that are being used by each of the policies in Chapter 6. As this is one of the most crucial decisions that need to be made we have summarized all the pros and cons for each of the three policies below.

Host Failures Cluster Tolerates
This option is historically the most commonly used Admission Control Policy. Most environments are designed with N+1 redundancy, and N+2 is also not uncommon. This Admission Control Policy uses "slots" to ensure enough capacity is reserved for failover, which is a fairly complex mechanism. Slots are based on VM-level reservations.

Pros:
Fully automated. (When a host is added to a cluster, HA re-calculates how many slots are available.)
Ensures failover by calculating slot sizes.

Cons:
Can be very conservative and inflexible when reservations are used, as the largest reservation dictates slot sizes.
Unbalanced clusters lead to wasted resources.
Complex for the administrator from a calculation perspective.

Percentage as Cluster Resources Reserved

Percentage based Admission Control is the latest addition to the HA Admission Control Policies. It is based on per-VM reservation calculations instead of slots.

Pros:
Accurate, as it considers the actual reservation per virtual machine.
Cluster dynamically adjusts when resources are added.

Cons:
Manual calculations are needed when adding additional hosts to a cluster if the number of tolerated host failures needs to remain unchanged.
Unbalanced clusters can be a problem when the chosen percentage is too low and resources are fragmented; failover of a virtual machine can't be guaranteed, as its reservation might not be available as resources on a single host.

Specify a Failover Host
With the Specify a Failover Host Admission Control Policy, when a host fails, HA will attempt to restart all virtual machines on the designated failover host. The designated failover host is essentially a “hot standby”. In other words DRS will not migrate VMs to this host when resources are scarce or the cluster is imbalanced.

Pros:
What you see is what you get.
No fragmented resources.

Cons:
What you see is what you get: a maximum of one failover host, so N+2 redundancy is impossible.
The dedicated failover host is not utilized during normal operations.

Recommendations

We have been asked many times for our recommendation on Admission Control, and it is difficult to answer as each policy has its pros and cons. Generally, however, we recommend a Percentage based Admission Control Policy. It is the most flexible policy, as it uses the actual reservation per virtual machine instead of taking a "worst case" scenario approach like the Host Failures Cluster Tolerates policy does. Percentage based Admission Control is less restrictive, but offers a lower guarantee that in all scenarios HA will be able to restart all virtual machines; the Host Failures policy guarantees the failover level under all circumstances. With the added level of integration between HA and DRS, we believe a Percentage based Admission Control Policy will fit most environments.

Basic design principle: Do the math and take customer requirements into account. We recommend using a "Percentage" based Admission Control Policy, as it is the most flexible policy.

Chapter 8
VM Monitoring

VM Monitoring, or VM-level HA, is an often overlooked but really powerful feature of HA. The reason for this is most likely that it is disabled by default and relatively new compared to HA. With vSphere 4.1, VMware also introduced VM and Application Monitoring. Application Monitoring is a brand new feature that application developers can leverage to increase resiliency, as shown in the screenshot below. We have tried to gather all the info we could around VM Monitoring, but it is a pretty straightforward feature that does what you expect it to do.

Figure 26: VM and Application Monitoring

Why Do You Need VM/Application Monitoring?

VM and Application Monitoring acts on a different level than HA: VM/App Monitoring responds to a single virtual machine or application failure, as opposed to HA, which responds to a host failure. An example of a single virtual machine failure would be the infamous "blue screen of death".

As of writing there was little information around Application Monitoring, besides the fact that the Guest SDK is used by application developers or partners, like Symantec, to develop solutions against the SDK. In the case of Symantec, a simplified version of Veritas Cluster Server (VCS) is used to enable application availability monitoring, including of course responding to issues. Note that it is not a multi-node clustering solution like VCS itself, but a single node solution. Symantec ApplicationHA, as it is called, is triggered to get the application up and running again by restarting it. If, however, this fails for an "X" amount of times (a configurable option within ApplicationHA), HA will be asked to take action; this action will be a restart of the virtual machine.

Although Application Monitoring is relatively new and there are only a few partners currently exploring its capabilities, it adds a whole new level of resiliency in our opinion. We have tested ApplicationHA by Symantec and personally feel it is the missing link. It enables you as System Admin to integrate your virtualization layer with your application layer. It ensures that protected services are restarted in the correct order and avoids the common pitfalls associated with restarts and maintenance, as Symantec's ApplicationHA is aware of dependencies and knows in which order services should be started or stopped.

How Does VM/App Monitoring Work?

VM Monitoring restarts individual virtual machines when needed. VM/App Monitoring uses a similar concept as HA: heartbeats. If heartbeats, in this case VMware Tools heartbeats, are not received for a specific amount of time, the virtual machine will be rebooted. The heartbeats are communicated directly to VPXA by VMware Tools; these heartbeats are not sent over a network.

Figure 27: VM monitoring sensitivity

When enabling VM/App Monitoring, the level of sensitivity can be configured. When quick action is required in case of a possible failure, "high sensitivity" can be selected. Low sensitivity basically means that the number of allowed "missed" heartbeats is higher, and as such the chance of running into a false positive is lower; however, if a failure occurs and the sensitivity level is set to low, the experienced downtime will be higher. The default setting should fit most situations.

Table 1: VM monitoring sensitivity

It is important to remember that VM Monitoring does not infinitely reboot virtual machines. To avoid a problem from repeating, by default, when a virtual machine has been rebooted three times within an hour, no further attempts will be taken (unless, of course, the specified time has elapsed). The following advanced settings can be set to change this default behavior:

High Availability advanced settings:

das.maxFailures - Maximum number of virtual machine failures. If this number is reached within das.maxFailureWindow, VM Monitoring will not restart the machine automatically. The default value is 3 automatic reboots after a failure.

das.maxFailureWindow - Minimum number of seconds between failures. The default value is 3600 seconds; if a virtual machine fails more than das.maxFailures times within 3600 seconds, VM Monitoring will not restart the machine.

das.iostatsInterval - Number of seconds VM Monitoring will look back to see whether any storage or network I/O has taken place before deciding to reboot a virtual machine when no VMware Tools heartbeats are received. The default value is 120 seconds.

As stated before, VM/App Monitoring uses heartbeats just like host-level HA. Although the heartbeat produced by VMware Tools is reliable, VMware added a further verification mechanism to avoid false positives: VM Monitoring also monitors the I/O activity of the virtual machine. When heartbeats are not received AND no disk or network activity has occurred over the last 120 seconds (by default), the virtual machine will be reset. This 120-second interval can of course be modified by changing the advanced setting "das.iostatsInterval" as described above. VM/App Monitoring uses the "usage" counters for both disk and network, and vpxa requests these every 20 seconds through the "Performance Manager"; these requests are also logged in the vpxa log file. This verification has been added as of vCenter 4.0.

Is AAM enabling VM/App Monitoring?

Although VM/App Monitoring is configured within HA, the AAM agent has absolutely nothing to do with VM/App Monitoring; VM/App Monitoring is independent of host-level HA. The VM Monitoring mechanism sits in the vpxa agent, which is responsible for the restarts. The heartbeats are sent to VPXA, not over the VM network but via the Management Network. This is crucial to know, as it means that when a virtual machine network error occurs, the virtual machine heartbeat will still be received. If and when an error occurs, VPXA will request a restart of the virtual machine through hostd when all conditions are met. Again, just like host-level HA, it works independently of vCenter; this info is of course also rolled up into vCenter.

Screenshots

A cool thing about VM Monitoring is the fact that it takes screenshots of the VM console. They are taken right before a virtual machine is reset by VM Monitoring, if and when needed, and are stored in the virtual machine's working directory. The screenshot can be used to debug the virtual machine's operating system, which is very useful when a virtual machine "freezes" every once in a while for no apparent reason.

Basic design principle: VM Monitoring can substantially increase availability. It is part of the HA stack and we heavily recommend using it!
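Pulling the conditions described in this chapter together, the reset decision can be summarized in a deliberately simplified Python sketch. This is a model for illustration only, not the actual vpxa implementation; the heartbeat timeout parameter simply stands in for whatever the chosen sensitivity level implies.

def should_reset_vm(seconds_since_heartbeat, seconds_since_io, recent_resets,
                    heartbeat_timeout=30, iostats_interval=120,
                    max_failures=3, max_failure_window=3600, now=0):
    # Reset only when VMware Tools heartbeats are missing AND no disk or network I/O
    # was seen during das.iostatsInterval AND the das.maxFailures /
    # das.maxFailureWindow throttle still allows another automatic reset.
    heartbeat_lost = seconds_since_heartbeat > heartbeat_timeout
    io_idle = seconds_since_io > iostats_interval
    resets_in_window = [t for t in recent_resets if now - t < max_failure_window]
    return heartbeat_lost and io_idle and len(resets_in_window) < max_failures

print(should_reset_vm(45, 300, recent_resets=[], now=7200))                   # True: hung VM, no recent I/O
print(should_reset_vm(45, 10, recent_resets=[], now=7200))                    # False: I/O activity -> likely false positive
print(should_reset_vm(45, 300, recent_resets=[5000, 5900, 6800], now=7200))   # False: three resets in the last hour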

Chapter 9
vSphere 4.1 HA and DRS Integration

As of vSphere 4.1, HA integrates with DRS on multiple levels. It is a huge improvement and it is something we want to stress, as it has changed the behavior and the reliability of HA.

Affinity Rules

VMware introduced VM-Host affinity rules with vSphere 4.1. VM-Host affinity rules are specified within the DRS configuration and are typically used to bind a group of virtual machines to a group of hosts. There are two types of VM-Host affinity rules: "must" and "should". If a rule of the type "must" is created, HA will need to adhere to this rule when a failover occurs; if it is not possible to perform a failover without violating the rule, the failover will not be performed. Affinity rules are covered in-depth in the DRS section of this book.

Resource Fragmentation

As of vSphere 4.1, HA is closely integrated with DRS. When a failover occurs, HA will first check whether there are resources available on a host for the failover. If, for instance, a particular virtual machine has a very large reservation and the Admission Control Policy is based on a percentage, it could happen that resources are fragmented across multiple hosts. (For more details on this scenario see Chapter 7.) As of vSphere 4.1, HA will ask DRS to defragment the resources to accommodate this virtual machine's resource requirements. Although HA will request a defragmentation of resources, a guarantee cannot be given; even with this additional integration you should still be cautious when it comes to resource fragmentation.

DPM

In the past there was barely any integration between DRS/DPM and HA. Especially when DPM was enabled, this could lead to some weird behavior when resources were scarce and an HA failover needed to happen. With vSphere 4.1 this has changed: in such cases, HA will use DRS to try to adjust the cluster (for example, by bringing hosts out of standby mode or migrating virtual machines to defragment the cluster resources) so that HA can perform the failovers.

Flattened Shares

Pre-vSphere 4.1, an issue could arise when custom shares had been set on a virtual machine. When HA fails over a virtual machine, it powers the virtual machine on in the Root Resource Pool. However, the virtual machine's shares were scaled for its appropriate place in the resource pool hierarchy, not for the Root Resource Pool. This could cause the virtual machine to receive either too many or too few resources relative to its entitlement. A scenario where this can occur is the following: VM1 has 1000 shares and Resource Pool A has 2000 shares; Resource Pool A contains 2 VMs, and both will have 50% of those 2000 shares. The following diagram depicts this scenario:

Figure 28: Flatten shares starting point

When the host fails, both VM2 and VM3 will end up on the same level as VM1, under the Root Resource Pool. However, as a custom shares value of 10,000 was specified on both VM2 and VM3, they would completely blow away VM1 in times of contention. This is depicted in the following diagram:

Figure 29: Flatten shares host failure

This situation would persist until the next invocation of DRS re-parents the virtual machines to their original Resource Pool. To address this issue, as of vSphere 4.1 DRS will flatten the virtual machine's shares and limits before failover. This flattening process ensures that the virtual machine will get the resources it would have received if it had failed over to the correct Resource Pool. This scenario is depicted in the following diagram. Note that both VM2 and VM3 are placed under the Root Resource Pool with a shares value of 1000.

Figure 30: Flatten shares after host failure before DRS invocation

Of course, when DRS is invoked, both VM2 and VM3 will be re-parented under Resource Pool A and will receive the number of shares they originally had assigned again.
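The effect of flattening can be illustrated with a few lines of Python; the share values are the ones from the scenario above, and the contention split is of course a simplification of the real scheduler.

def contention_split(shares):
    # Divide a parent's resources between siblings proportionally to their shares.
    total = sum(shares.values())
    return {name: round(value / total, 2) for name, value in shares.items()}

# Pre-vSphere 4.1: VM2/VM3 arrive at the Root Resource Pool with their custom 10,000 shares.
print(contention_split({"VM1": 1000, "VM2": 10000, "VM3": 10000}))  # VM1 ends up with ~5%
# vSphere 4.1: shares are flattened to the pool-relative value of 1000 each.
print(contention_split({"VM1": 1000, "VM2": 1000, "VM3": 1000}))    # ~33% each until DRS re-parents them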

Chapter 10
Summarizing

The integration of HA with DRS has been vastly improved, and so has HA in general. We hope everyone sees the benefits of these improvements and of HA and VM and Application Monitoring in general. We have tried to simplify some of the concepts to make them easier to understand, but we acknowledge that some concepts are difficult to grasp. We hope that, after reading this section of the book, everyone is confident enough to make the changes to HA needed to increase the resiliency and uptime of their environment, because that is what it is all about. If there are any questions, please do not hesitate to reach out to either of the authors.

Part 2

VMware Distributed Resource Scheduler

Chapter 11

What is VMware DRS?

VMware Distributed Resource Scheduler (DRS) is an infrastructure service run by VMware vCenter Server (vCenter). DRS aggregates ESX host resources into clusters and automatically distributes these resources to the virtual machines. It monitors resource usage and continuously optimizes the virtual machine resource distribution across ESX hosts.

DRS computes the resource entitlement of each virtual machine based on static resource allocation settings and dynamic settings, such as active usage and the level of contention. It attempts to satisfy each virtual machine's resource entitlement with the resources available in the cluster by leveraging vMotion: vMotion is used either to migrate virtual machines to alternative ESX hosts with more available resources, or to migrate virtual machines away to free up resources. DRS compares cluster-level and host-level capacity to the demand of the virtual machines, including recent historical demand, and compares the results to the ideal resource distribution; it then performs or recommends virtual machine migrations to ensure workloads receive the resources to which they are entitled, with the goal of allocating resources to maximize workload performance. Because DRS is an automated solution and easy to configure, we recommend enabling it to achieve higher consolidation ratios at low cost. In vSphere 4.1, a DRS cluster can manage up to 32 hosts and 3000 virtual machines.

Cluster Level Resource Management

Clusters group the resources of the various ESX hosts together and treat them as a pool of resources; DRS presents the aggregated resources as one big host to the virtual machines. Probably unnecessary to point out, but a virtual machine cannot span hosts, even when resources are pooled by using DRS. Pooling resources does allow DRS to create resource pools spanning all hosts in the cluster and to apply cluster-level resource allocation policies. A DRS-enabled cluster is often referred to as a DRS cluster. In addition to resource pools and resource allocation policies, DRS offers the following resource management capabilities:

Initial placement – When a virtual machine is powered on in the cluster, DRS places the virtual machine on an appropriate host or generates a recommendation, depending on the automation level.

Load balancing – DRS distributes the virtual machine workload across the ESX hosts inside the cluster. DRS continuously monitors the active workload and the available resources, compares them to the ideal resource distribution and performs or recommends virtual machine migrations to solve any imbalance.

Power management – When Distributed Power Management (DPM) is enabled, DRS recommends placing ESX hosts in standby mode if excess capacity is detected, or DPM powers on hosts if more capacity is needed.

Constraint correction – DRS redistributes virtual machines across ESX hosts to evacuate hosts when a user requests that the hosts enter maintenance or standby mode, and it places and moves virtual machines as needed to adhere to user-defined affinity and anti-affinity rules.

Requirements

In order for DRS to function correctly, the environment must meet the following requirements:

VMware ESX or ESXi hosts in a cluster
VMware vCenter Server
VMware vSphere Enterprise or Enterprise Plus license
vMotion requirements met (not mandatory, but highly recommended):
o Shared VMFS volumes accessible by all ESX hosts inside the cluster
o Private migration network
o Gigabit Ethernet
o Processor compatibility

For DRS to allow automatic load balancing, vMotion is required. For initial placement, though, vMotion is not a requirement.

Basic design principle: Configure vMotion to fully benefit from DRS capabilities.

Operation and Tasks of DRS

Load Balance Calculation

vCenter creates and runs a DRS thread per cluster on the vCenter Server, which communicates with the management agent (VPXA) on every ESX host inside the cluster. This thread calculates the imbalance of the cluster, applies resource settings and, if needed, generates migration recommendations. By default, a DRS thread is invoked every five minutes. In practice the thread may be invoked more frequently due to changes made inside the cluster; for example, DRS will be invoked when virtual machine resource settings are changed or when hosts are added or removed.

Figure 31: DRS thread components

Events and Statistics

The vCenter agent (VPXA) runs inside each ESX host in the cluster and enables two-way communication. The VPXA sends information when a virtual machine's power state changes or when a virtual machine is migrated with vMotion. Periodically, the VPXA sends additional notifications and statistics to the vCenter server, keeping the status of both the ESX hosts and the virtual machines in sync with the status in vCenter.

Migration and Info Requests

In turn, DRS sends messages to the ESX hosts, such as proposed migrations and information requests.

vCenter and Cluster Sizing

The impact of the resource utilization of the DRS threads on vCenter must be taken into account when sizing the vCenter server and designing the cluster environment. The technical paper "VMware vCenter Server Performance and Best Practices" lists the minimum hardware recommendations for three deployment sizes, ranging from 50 hosts and 500 virtual machines to 1,000 hosts and 10,000 powered-on virtual machines. It is recommended to follow these hardware recommendations when sizing the vCenter virtual machine; to ensure the performance of vCenter, add sufficient vCPUs and memory.

The configuration of cluster sizes, the combination of workload types, virtual machine management and the number of virtual machines all impact the behavior and performance of vCenter and therefore influence the performance of the DRS threads. For example, vCenter servers in Virtual Desktop Infrastructure (VDI) environments experience more load due to the number of virtual machines and the higher frequency of virtual machine power state changes, which leads to invoking DRS threads more often. (Table 3 of Chapter 14 lists the events invoking DRS calculations.)

Separate workloads

In large environments it is recommended to separate VDI workloads and server workloads and assign a different cluster to each workload to reduce DRS invocations. By isolating server workloads from VDI workloads, only the VDI cluster experiences the increased DRS invocations, ensuring fast DRS performance for the server cluster.

Amount of clusters

A lower number of virtual machines inside a cluster reduces the number of load-balancing calculations. A lower number of virtual machines generally also results in a smaller number of hosts per cluster, reducing the complexity and the number of calculations performed by DRS per cluster. However, the potential danger is creating too many small clusters: for example, having 200 x 3-host clusters instead of 100 x 6-host clusters could drive up the CPU utilization of the vCenter server, as each cluster will invoke the periodic load-balancing calculation at least every 5 minutes. This in turn can impact the performance of the virtual machines due to slow or insufficient load-balancing migration recommendations and resource entitlement calculations. It is believed that the current "sweet spot" ranges between 16 and 24 hosts per cluster, offering sufficient options to load-balance the virtual machines across the hosts inside the cluster without introducing too many DRS threads in vCenter.

Chapter 12
DRS Cluster Settings

When DRS is enabled on a cluster, you need to select the automation level and set the migration threshold. DRS settings can be modified while the cluster is in use and without disruption of service. The following steps show how to create a cluster and enable DRS:

1. Select the Hosts & Clusters view.
2. Right-click your Datacenter in the Inventory tree and click New Cluster.
3. Give the new cluster an appropriate name. We recommend at a minimum including the location of the cluster and a sequence number, i.e. ams-hadrs-001.
4. In the Cluster Features section of the page, select Turn On VMware DRS and click Next.
5. Verify that the Automation Level is set to Fully Automated and select Next.
6. Leave the Swapfile Policy set to default and click Next.
7. Click Finish to complete the creation of the cluster.

Figure 32: Enable DRS
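For those who prefer to script this, the same result can be achieved through the vSphere API. Below is a minimal pyVmomi sketch; the vCenter address, the credentials and the datacenter name "DC01" are hypothetical placeholders, and depending on your pyVmomi version you may need to handle SSL certificate verification explicitly.

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="administrator", pwd="secret")
try:
    # Find the target datacenter under the root folder.
    datacenter = next(entity for entity in si.content.rootFolder.childEntity
                      if isinstance(entity, vim.Datacenter) and entity.name == "DC01")
    # Cluster specification with DRS enabled and the automation level set to fully automated.
    spec = vim.cluster.ConfigSpecEx()
    spec.drsConfig = vim.cluster.DrsConfigInfo(
        enabled=True,
        defaultVmBehavior="fullyAutomated",   # or "manual" / "partiallyAutomated"
    )
    datacenter.hostFolder.CreateClusterEx(name="ams-hadrs-001", spec=spec)
finally:
    Disconnect(si)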

Automation Level

The automation level determines the level of autonomy of DRS, ranging from only generating placement and load-balancing recommendations to automatically applying the generated recommendations. Three automation levels exist:

Manual - DRS generates recommendations for initial placement of the virtual machines and, if the cluster becomes unbalanced, suggests migration recommendations, which the user must apply manually.

Partially automated - DRS starts the virtual machine on the most suitable host. If the cluster is unbalanced, DRS only suggests migration recommendations; these will not be applied automatically, but will be applied if an administrator accepts each one.

Fully automated - DRS places the virtual machine on the most suitable host when it is powered on. If the cluster is unbalanced, DRS migrates virtual machines to more suitable hosts automatically.

Table 2: DRS automation level

Initial Placement

Initial placement occurs when a virtual machine is powered on or resumed. By default, DRS selects an ESX host based on the virtual machine's resource entitlement. The DRS policy settings, discussed in a later chapter, are also taken into account; DRS rules are explained in the "Rules" section of Chapter 16. If the cluster is configured with the manual automation level, DRS will create a prioritized list of recommended hosts for virtual machine placement. This list is presented to the user to help them select the appropriate host.

Impact of Automation Levels on Procedures

When the manual or partially automated automation level is selected, DRS reviews the state of the cluster at an interval of five minutes and publishes recommendations to solve any imbalance of the cluster. The user must manually apply the recommendations issued by DRS; consequently, the administrator should check the recommendations after each DRS invocation to solve the cluster imbalance. Besides being inefficient, it is possible that DRS rules are violated because the administrators apply the recommendations infrequently. The automation level of the cluster can be changed without disrupting virtual machines, and it is easy to change it back, so why not try fully automated for a while to get comfortable with it?

Basic design principle: Set the automation level to Automatic to fully benefit from DRS capabilities.

Chapter 13
Resource Management

As stated before, the primary goal of DRS is to ensure that each virtual machine receives its entitled resources; to do this, it rebalances the virtual machine workload across the hosts in the cluster. DRS examines the current demand and contention in the environment and uses the resource allocation settings of the virtual machine to determine its resource entitlement. To satisfy each virtual machine's resource entitlement, DRS computes the cluster imbalance and creates recommendations for migrating virtual machines to solve the imbalance between resource supply and resource demand.

Contrary to popular belief, DRS is not concerned with performance per se. Instead, DRS focuses on whether each virtual machine inside the cluster or resource pool gets its specified resource allocation. By trying to ensure that all virtual machines receive enough resources to satisfy their resource entitlement, DRS assumes that a virtual machine should not have any performance problems if it receives those resources; in other words, it assumes the entitlements are adequate to ensure the virtual machine's performance goals.

Two-Layer Scheduler Architecture

When enabling DRS on a cluster, a two-layer scheduler architecture is created. In addition to the local resource scheduler of each ESX host, DRS introduces a global scheduler. To properly interact with the local resource scheduler of each ESX host, DRS converts cluster-level resource pool settings into host-level settings. DRS dynamically moves virtual machines across the cluster to optimize the cluster load balance. Let us look at the scheduler architecture layers.

Figure 33: Global scheduler and local schedulers

The global scheduler supervises the entire cluster, while the local scheduler manages the resource allocation of the virtual machines on each host. DRS relies on host-level scheduling to allocate the resources: the global scheduler calculates the resource entitlement when virtual machines are placed inside a resource pool and sends these calculations to the host, and the host-level CPU and memory schedulers handle the resource entitlement of the virtual machine.

Resource Entitlement

Every virtual machine has a "resource entitlement" for CPU and memory. This entitlement is the allocation of resources that a virtual machine should receive; it is the target ESX defines for how much of the physical resources to give the virtual machine. By default this will be everything the virtual machine wants, unless there are too few resources to meet the aggregated demand of all virtual machines (in other words, "contention"), or if an artificial limit is imposed. A virtual machine's resource entitlement changes as the virtual machine runs.

A virtual machine's resource entitlement is based on a static entitlement and a dynamic entitlement, which consist of static settings and dynamic metrics respectively. The static entitlement consists of the resource allocation settings: shares, reservations and limits. The dynamic entitlement consists of dynamic metrics, such as the estimated active memory (also known as the working set size), the CPU demand (an estimate of the amount of CPU the virtual machine would consume if no contention existed) and the utilization, or degree of contention, of the host. Both DRS and the local ESX host scheduler use the virtual machine's resource allocation settings to compute its resource entitlement.

Resource allocation settings:

Reservation - Also referred to as "MIN". A reservation is the amount of physical resources (MHz or MB) guaranteed to be available for the virtual machine.

The reservation setting guarantees that these physical resources will be available to back the specified amount of resources.

Shares - Shares specify the relative importance of the virtual machine. Shares are always measured against the other powered-on sibling virtual machines and resource pools on the host.

Limit - Also referred to as "MAX". A limit specifies an upper bound for the resources that can be allocated to a virtual machine. The limit setting is the opposite of the reservation: it defines the upper limit of physical resource usage, even if there is a surplus beyond the aggregated demand. By default, if no limit is explicitly set, the limit will implicitly be the amount of virtual hardware configured in the virtual machine. When a virtual machine hits its memory limit, the amount of memory used above the limit is ballooned, compressed or swapped out to a hypervisor swap file. Even if the host has free memory available, ESX can still revert to swapping when a memory limit is set on the virtual machine, because the limit introduces an arbitrary cap that is not due to genuine contention. Similarly, if a CPU limit is set, any vCPU instructions will be placed in the pCPU scheduler queue once the virtual machine has consumed all of its assigned time; the instructions will be processed by the pCPU when CPU timeslots are available again for the virtual machine. (If the vCPU is not demanding any physical CPU resources, it is de-scheduled anyway.)

CPU demand - The estimated amount of CPU the virtual machine would consume if there were no contention. CPU demand is used to calculate the virtual machine's CPU entitlement.

Working set - The estimated amount of active memory of the virtual machine.

Idle memory tax - The mechanism by which ESX reallocates unreserved idle memory from virtual machines.

It is very important to know that if a cluster or host is under-committed, the virtual machine's resource entitlement will be the same as its resource demand. In other words, the virtual machine will be allocated whatever it wants to consume within its configured limit: it will receive its CPU cycles, and the memory pages issued by the virtual machine will be mapped onto machine pages (the physical memory of the ESX host). A limit is the only exception; in that case the entitlement is capped at the limit. When a cluster is overcommitted, the cluster might experience more resource demand than its current capacity; DRS and the VMkernel will then distribute and allocate resources based on the resource entitlement of each virtual machine.

Resource Entitlement Calculation

The distribution of resources is as follows. If a reservation is set on the virtual machine, it will have a static resource entitlement that is at least as large as the reservation. The rest of the resource entitlement is based on the number of shares and, in the case of memory, the working set size.

ESX applies the idle memory tax to the virtual machine so that inactive memory can be reclaimed and reallocated to other virtual machines during contention. If the virtual machine uses more than its entitlement during contention, the excess memory is ballooned, compressed or swapped, depending on the free memory state of the ESX host. ESX compares the shares value of each sibling virtual machine and selects "victims" to confiscate memory from; the number of shares determines which virtual machine has priority over the other virtual machines. The host will keep reclaiming memory until the virtual machine's resource usage is at or below its resource entitlement. This resulting target is the memory entitlement of the virtual machine.
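The paragraphs above can be condensed into a deliberately simplified model. The Python sketch below only covers reservation, shares and limit; it ignores the working set, the idle memory tax and the reclamation techniques, so it should be read as an illustration of the proportional-share idea rather than the actual ESX algorithm.

def memory_entitlements(vms, host_capacity_mb):
    # Every VM is guaranteed its reservation; the remaining capacity is divided
    # proportionally to shares and the result is capped by each VM's limit.
    entitlement = {name: vm["reservation"] for name, vm in vms.items()}
    spare = host_capacity_mb - sum(entitlement.values())
    total_shares = sum(vm["shares"] for vm in vms.values())
    for name, vm in vms.items():
        extra = spare * vm["shares"] / total_shares
        entitlement[name] = min(vm["limit"], entitlement[name] + extra)
    return entitlement

vms = {
    "VM1": {"reservation": 1024, "shares": 1000, "limit": 4096},
    "VM2": {"reservation": 0,    "shares": 2000, "limit": 4096},
}
print(memory_entitlements(vms, host_capacity_mb=4096))   # {'VM1': 2048.0, 'VM2': 2048.0}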

Chapter 14
Calculating DRS Recommendations

DRS takes several metrics into account when calculating migration recommendations to load balance the cluster: the current resource demand of the virtual machines, host resource availability and the applied high-level resource policies. The following sections explore how DRS uses these metrics to create a new and better placement of a virtual machine than its existing location, while still satisfying all the requirements and constraints.

When is DRS Invoked?

By default, DRS is invoked every 300 seconds. When the invocation interval expires, DRS will compute and generate recommendations to migrate virtual machines. Each recommendation that is not applied is retired at the next invocation of DRS; DRS might generate the exact same recommendation again if the imbalance is not solved. The DRS imbalance calculation is also triggered if the cluster detects changes in its resource pool tree, or by operations and events such as changes in resource supply or modification of resource settings.

Table 3: Events invoking DRS calculations

The interval at which the DRS algorithm is invoked can be controlled through the vpxd configuration file (vpxd.cfg) with the following option:

vpxd config file:

<config>
  <drm>
    <pollPeriodSec>300</pollPeriodSec>
  </drm>
</config>

The default frequency is 300 seconds, but it can be set to anything in the range of 60 seconds to 3600 seconds. It is strongly discouraged to change the default value: shortening the interval will likely generate extra overhead for little added benefit, while a less frequent interval might reduce the number of vMotions, and therefore overhead, but would risk leaving the cluster imbalanced for a longer period of time.

MaxMovesPerHost

Adjusting the interval also impacts the number of migrations DRS will recommend, as there is a limit to how many migrations DRS will recommend per interval, per ESX host. A limit is imposed because there is no advantage in recommending migrations that cannot be completed before the next DRS invocation: during the next re-evaluation cycle, virtual machine resource demand may have changed, rendering the previous recommendations obsolete. This limit can, but usually shouldn't, be changed by setting the DRS Advanced Option MaxMovesPerHost. The default value of this parameter is 8.

In vSphere 4.1 the limit on moves per host is dynamic, based on how many moves DRS estimates can be completed during one DRS evaluation interval. The limit adapts to the maximum number of concurrent vMotions per host and the average migration time observed from previous migrations; in other words, it adapts to the DRS invocation frequency and the average time per vMotion. The MaxMovesPerHost parameter still exists, but it can be exceeded by DRS, so there is no need to tweak the value. These improvements make DRS less conservative compared to vSphere 4.0 and allow DRS to reach a steady state more quickly when a significant load imbalance exists in the cluster. Please note that there is no limit on moves per host for a host entering maintenance or standby mode; the limit only applies to moves per host for load balancing.
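The reasoning behind the adaptive limit can be illustrated with a rough back-of-the-envelope calculation in Python. This is purely illustrative; the real heuristic is internal to DRS and the numbers below are assumptions.

import math

def estimated_moves_per_host(invocation_interval_s=300, avg_vmotion_s=60, concurrent_vmotions=4):
    # Roughly how many migrations fit into one invocation interval, given the observed
    # average vMotion duration and the number of concurrent vMotions a host supports.
    return concurrent_vmotions * math.floor(invocation_interval_s / avg_vmotion_s)

print(estimated_moves_per_host())                    # 20 with these assumed numbers
print(estimated_moves_per_host(avg_vmotion_s=150))   # slower vMotions -> fewer useful moves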

Defragmenting the Cluster During a Host Failover

As described in Chapter 9, "vSphere 4.1 HA and DRS Integration", HA can request DRS to defragment resources to accommodate a virtual machine's resource requirements and enable an HA initiated restart. In vSphere 4.1, if HA is unable to find sufficient available resources to restart the virtual machines after a host failure, HA will request DRS to provide these resources to accommodate the virtual machines that could not be restarted.

When strict Admission Control is enabled and the host failures policy is selected, DRS will keep spare resources defragmented and available permanently to support the configured HA host failover level. Selecting a different failover policy or disabling strict Admission Control can, however, result in a fragmented cluster. In a fragmented cluster enough resources are available overall, but not enough resources are available at the host level to restart a specific virtual machine.

To start the defragmentation, HA creates "ghost virtual machines" identical to the virtual machines that could not be restarted and places them on a random host in the cluster. The reason HA uses a random approach is that HA already knows that none of the available hosts can accommodate the virtual machine. The host on which a ghost virtual machine is placed becomes over-reserved, as the virtual machines now reserve more memory than is available on the host, and this violates the DRS rules. During an over-reservation, defragmentation of resources occurs across the cluster: DRS will move virtual machines around in the cluster and, if possible and necessary, leverage DPM by powering on ESX hosts to try to fix the over-reservation.

What distinguishes this scenario from the common DRS load-balancing scenario is that DRS can calculate migrations that involve multiple virtual machines and use multi-hop migrations, for example: VM A from host 1 to host 2, and VM B from host 2 to host 3. The normal load-balancing policy only calculates single-hop migrations. These moves are mandatory. This scenario will also not invoke a traditional cluster-wide DRS load-balance calculation (due to the duration of that process), but is started by the over-reservation on a particular host. This reduces the time, overhead and cost incurred compared to a traditional DRS load-balancing calculation.

Recommendation Calculation

To generate a migration recommendation, DRS executes a series of calculations and passes in which it determines the level of cluster imbalance and which virtual machines it needs to migrate to solve the imbalance.

Constraints Correction

Before DRS runs its load-balancing pass, it runs a pass to consider and correct constraints, including:

Evacuating hosts that the user requested enter maintenance or standby mode.
Correcting mandatory VM-Host affinity/anti-affinity rule violations.
Correcting VM/VM anti-affinity rule violations.
Correcting VM/VM affinity rule violations.
Correcting host resource overcommitment (rare, since DRS is controlling resources).

These constraints are respected during load-balancing. The constraints may cause imbalance, and that imbalance may not be fixable due to these constraints. The imbalance information on the cluster summary page informs the administrator if there is an unfixable imbalance.

VM-Host affinity rules are a special case. vSphere 4.1 introduces VM-Host affinity/anti-affinity rules in addition to the VM-VM affinity (and anti-affinity) rules. VM-Host affinity (or anti-affinity) rules specify which virtual machines must or should run on a group of ESX hosts. Two types of VM-Host affinity/anti-affinity rules exist: Must-rules and Should-rules. Must-rules are mandatory rules for HA, DRS and DPM. A Should-rule is a preferential rule for DRS and DPM, and both use their best effort to apply it. More details about VM-Host affinity rules can be found in Chapter 16, but a quick primer should help you understand the Constraints Correction pass better. The Should-rules are a special case in the DRS algorithm: the entire DRS algorithm (constraint correction plus load-balancing) is executed and DRS tries to place the virtual machines listed in the Should-rules, essentially treating the Should-rules as hard rules during this phase. If all virtual machines can be placed without introducing violations or over-utilized hosts, the results are output. If the Should-rules introduce constraints or over-utilization of hosts, the DRS algorithm is repeated with the Should-rules dropped (since they are best effort) and retried with only the Must-rules in place.

Imbalance Calculation

DRS needs to establish whether the cluster is imbalanced. It does this by comparing the "current host load standard deviation" (CHLSD) metric to the "target host load standard deviation" (THLSD). If the Current Host Load Standard Deviation exceeds the Target Host Load Standard Deviation, the cluster is considered imbalanced. To calculate the CHLSD and THLSD, DRS first needs to determine the load of each host. It does this by computing the resource entitlement of each active virtual machine on the host and summing the entire virtual machine load on that host. This sum is divided by the capacity of the host, and the resulting value is called the host's normalized entitlement:

Sum (VM entitlements) / (capacity of host)
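A small Python sketch of this metric; the entitlement and capacity numbers are made up, and the target value simply stands in for whatever THLSD the chosen migration threshold produces.

from statistics import pstdev

def normalized_entitlement(vm_entitlements_mhz, host_capacity_mhz):
    # Sum of the entitlements of the VMs running on a host, divided by the host's capacity.
    return sum(vm_entitlements_mhz) / host_capacity_mhz

hosts = {
    "esx01": ([2000, 1500, 3000], 18000),
    "esx02": ([4000, 4000], 18000),
    "esx03": ([1000], 18000),
}
loads = [normalized_entitlement(vms, capacity) for vms, capacity in hosts.values()]
chlsd = pstdev(loads)        # Current Host Load Standard Deviation
thlsd = 0.1                  # assumed target derived from the migration threshold
print(f"CHLSD = {chlsd:.3f} -> {'imbalanced' if chlsd > thlsd else 'balanced'}")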

The outcome of Sum (VM entitlements) / (capacity of host) becomes the load metric of the host. The standard deviation of this value across all the hosts in the cluster is the CHLSD.

VM entitlement

As mentioned in Chapter 13, "Resource Management", the virtual machine resource entitlement calculated by the local scheduler is based upon its static settings, various dynamic metrics and the virtual machine's memory overhead on the ESX host. DRS enforces an overhead memory limit that is respected by the VMkernel scheduler, so that growth in a virtual machine's use of overhead memory does not go out of control in between invocations of DRS; DRS will (if possible) keep increasing the limit on a virtual machine's overhead memory if the virtual machine needs more overhead memory.

Capacity of host

The memory capacity of a host is lower than the amount of installed physical memory. The capacity of the host is calculated by subtracting the VMkernel overhead, the Service Console overhead and a 6% reservation from the installed memory. In a cluster with VMware High Availability (HA) enabled and HA Admission Control enabled (the default), DRS also maintains excess powered-on capacity to meet the High Availability settings. This information is displayed on the Resource Allocation tab of the cluster in vCenter.

Given the CHLSD and the migration threshold, DRS computes the Target Host Load Standard Deviation (THLSD).

Impact of Migration Threshold on Selection Procedure

The migration threshold reflects the tolerance of cluster load imbalance. Five settings can be selected, ranging from conservative to aggressive, and the THLSD is derived from the migration threshold setting.

Figure 34: Migration threshold

Every migration recommendation from DRS has a priority level which indicates how beneficial the migration is expected to be. The conservative migration threshold setting creates a more tolerant environment that leads to fewer migrations; it generates only the "priority-one" recommendations, which are mandatory recommendations. Selecting the aggressive migration threshold setting causes DRS to calculate a more restrictive THLSD: the cluster will be less tolerant of imbalance and will also generate "priority-five" recommendations, which are expected to produce only very modest improvements, resulting in frequent migrations to keep the CHLSD beneath the THLSD threshold.

Selection of Virtual Machine Candidate
If an imbalance is detected, a procedure is triggered to decide which virtual machine(s) it will migrate to correct the imbalance. DRS will use the following procedure:

DRS procedure:
While (load imbalance metric > threshold) {
move = GetBestMove();
If no good migration is found: stop.
Else: Add move to the list of recommendations.
Update cluster to the state after the move is added.
}

While the cluster is imbalanced (Current Host Load Standard Deviation > Target Host Load Standard Deviation), DRS selects a virtual machine to migrate based on specific criteria and simulates the migration in the cluster. In this simulation, DRS computes the possible Current Host Load Standard Deviation after the migration. If the CHLSD is still above the threshold, it will repeat the procedure, but if this migration solves the imbalance, it will stop after adding it to the migration recommendation list.

The GetBestMove procedure aims to find the virtual machine that will give the best improvement in the cluster-wide imbalance. The GetBestMove procedure consists of the following instructions:

getbestmove procedure:
GetBestMove() {
For each virtual machine v:
For each host h that is not Source Host:
If h is lightly loaded compared to Source Host:
If Cost-Benefit and Risk Analysis accepted
simulate move v to h
measure new cluster-wide load imbalance metric as g
Return move v that gives least cluster-wide imbalance g.
}
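Purely as an illustration, here is a small Python sketch of the loop described above. The load model, the "lightly loaded" test and the cost-benefit check are simplified stand-ins, not the actual DRS implementation.

from statistics import pstdev

def imbalance(loads):
    # Cluster-wide imbalance: standard deviation of the normalized host loads.
    return pstdev(loads.values())

def get_best_move(loads, vms, cost_benefit_ok):
    # Try every VM on every other, less loaded host; return the move that
    # yields the lowest simulated imbalance, or None if nothing qualifies.
    best = None
    for vm, (src, load) in vms.items():
        for dst in loads:
            if dst == src or loads[dst] >= loads[src]:
                continue                      # only consider lightly loaded hosts
            if not cost_benefit_ok(vm, src, dst):
                continue
            simulated = dict(loads)
            simulated[src] -= load            # simulate the move
            simulated[dst] += load
            g = imbalance(simulated)
            if best is None or g < best[3]:
                best = (vm, src, dst, g)
    return best

def recommend(loads, vms, threshold, cost_benefit_ok=lambda *a: True):
    recommendations = []
    while imbalance(loads) > threshold:
        move = get_best_move(loads, vms, cost_benefit_ok)
        if move is None:                      # no good migration found: stop
            break
        vm, src, dst, _ = move
        recommendations.append((vm, src, dst))
        loads[src] -= vms[vm][1]              # update cluster state after the move
        loads[dst] += vms[vm][1]
        vms[vm] = (dst, vms[vm][1])
    return recommendations

# Hypothetical normalized host loads and VM loads.
hosts = {"esx01": 0.80, "esx02": 0.30}
vms = {"vm1": ("esx01", 0.25), "vm2": ("esx01", 0.10)}
print(recommend(hosts, vms, threshold=0.1))   # moves vm1 to esx02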

This procedure tries to find the migration that will offer the best improvement. DRS cycles through each DRS-enabled virtual machine and each host that is not the source host. Only hosts that are using fewer resources than the source host are considered. After the cost-benefit and risk analysis is completed and the results are accepted, a migration of the virtual machine to the host is simulated and DRS will measure the new cluster-wide load imbalance metric. DRS does this for all the virtual machines, compares the results of all tried combinations (VM<->Host) and returns the vMotion that results in the least cluster imbalance.

Basic design principle: The number of clusters and virtual machines managed by vCenter influences the number of calculations, which impacts the performance of vCenter. Take this into account when sizing the vCenter server.

Cost-Benefit and Risk Analysis Criteria
The purpose of the cost-benefit and risk analysis is to filter out expensive and unstable migrations. The analysis also prevents unstable workloads from affecting the recommendations. If the workload of the virtual machine changes after the migration due to an increase or decrease of demand, the migration is deemed unstable. If the virtual machine's workload changes directly after the recommendation, the recommendation becomes useless and creates a situation where the virtual machine is selected over and over again, resulting in "Ping-Pong" migrations. The vMotion process itself uses CPU and memory resources; constantly migrating virtual machines would nullify the benefit of migrating them. By doing a cost-benefit and risk analysis, DRS tries to throttle migrations and avoid a constant stream of vMotions, to avoid the high cost associated with unnecessary vMotions.

NOTE
One might argue that the used terminology can be alarming. The term "unstable migration" indicates the effect of the migration on the cluster load balance and examines the stability of the workload pattern of the virtual machine. It has nothing to do with the stability of the vMotion process itself!

The following diagram and section will go into the cost-benefit and risk analysis:

Figure 35: Cost benefit risk analysis

Cost - During migration, vMotion tries to reserve 30% of a physical CPU on both the source and destination host. A shadow virtual machine is created during the vMotion process on the destination host; the memory consumption by this shadow virtual machine is also factored in to the cost of the recommendation. At the end of the vMotion process the migrated virtual machine has a short period of downtime in which a snapshot is made of the virtual machine and it is resumed on the destination host. Downtime incurred during vMotion is usually measured in milliseconds, but because there is an interruption of service, although negligible, it needs to be factored in.

NOTE
The term "downtime" needs some clarification. Downtime indicates the interruption of service of the virtual machine. This brief downtime is approximately one second or less and is not disruptive to virtual machine connections.

. The net resource gain is calculated for each of the periods and weighted by the length of the period. it will predict the amount of time of this workload. Figure 36: Resource gain calculation The X-axis of the chart displays the progress of time and the Y-axis shows the absolute positive or negative gain of the virtual machine on both source and destination hosts. resources are freed up on the source host and the virtual machine itself receives more resource due to the availability of resources on its new host. VMware used the following chart to illustrate this resource gain. The migration of workload will result in a much more balanced cluster. DRS uses historical data for this calculation. Risk Risks accounts for the possibility of irregular loads. Imagine what impact adjusting the invocation interval has on the analysis. The cost-benefit and risk analysis results in a resource gain. this is called stable time.Benefit Due to the migration of the virtual machine. The resource gain is in term of the absolute units MHz or MB depending on the type being measured. whether positive or negative. by using the metrics Host CPU: active and Host Memory: active. DRS starts with determining how much resources the virtual machine is consuming. After stable time. After establishing the consumed resources. Irregular load indicates inconsistent and spiky demand workloads. DRS becomes conservative and it assumes that the virtual machine will run at the worst possible load listed in the history (up to 60 minutes) until the next DRS invocation time.

In the example shown, there is a period where the gain is positive and, when the stable period ends, the worst-case gain turns out to be negative. After migration, the gain is lower as there is an extra migration cost. The areas are added together and the sum is used to decide if the move should be rejected. DRS will only recommend a migration if it has an acceptable result of the cost-benefit and risk analysis.

The Biggest Bang for the Buck
After the cost-benefit risk analysis and the simulation, a recommendation is created. This recommendation should result in a migration which gives the most improvement in terms of cluster balance, in other words: biggest bang for the buck! This is the reason why usually the busier, most hungry virtual machines are moved, as they will most likely decrease the "Current host load standard deviation" the most. If it is not enough to balance the cluster within the given threshold, GetBestMove gets executed again by the procedure which is used to form a set of recommendations.

Basic design principle: Although DRS migrates busier virtual machines to gain the most improvement of cluster balance, it does not justify the use of big virtual machines. Virtual machines with larger memory sizes and/or more virtual CPUs add more constraints to the selection and migration process, while virtual machines with smaller memory sizes or fewer virtual CPUs provide more placement opportunities for DRS. This means it is recommended to configure the size of the virtual machine to what it actually needs, preventing oversizing.

Calculating the Migration Recommendation Priority Level
VMware published the algorithm in VMware knowledgebase article 1007485 "Calculating the priority level of a VMware DRS migration recommendation", which explains how DRS assigns the priority ratings to the migration recommendations:

"For each migration recommendation, the priority level is limited to the integral range priority 2 to priority 5 (inclusive) and is calculated according to the following formula:

6 - ceil(LoadImbalanceMetric / 0.1 * sqrt(NumberOfHostsInCluster))

LoadImbalanceMetric is the current host load standard deviation shown on the cluster's summary page of the vSphere Client. ceil(x) is the smallest integral value not less than x.

For each host, compute the load on the host as sum(expected VM loads) / (capacity of host). Then compute the standard deviation of the host load metric across all hosts to determine the LoadImbalanceMetric."

The LoadImbalanceMetric value used in the algorithm is the current host load standard deviation value, and ceil rounds the value up to an integer (a whole number, like 1, 2, 3 etc.).

Let us use this formula in an example. According to the screenshot, the 3-host cluster has a current host load standard deviation of 0.022.

Figure 37: DRS summary

According to the formula, the calculation would be:

6 – ceil(0.022 / 0.1 * sqrt(3))

This would result in a priority level of 5 for the migration recommendation if the cluster was imbalanced.

We created a workflow diagram to help visualize the flow of the DRS imbalance calculation process.
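A quick check of the arithmetic. We read the published expression left to right, which is an assumption on our part; with 0.022 and 3 hosts either reading of the formula yields the same result. The clamp to the range 2 to 5 follows the knowledgebase article quoted above.

from math import ceil, sqrt

def priority_level(load_imbalance_metric: float, hosts_in_cluster: int) -> int:
    # 6 - ceil(LoadImbalanceMetric / 0.1 * sqrt(NumberOfHostsInCluster)),
    # limited to the integral range 2..5 as described in KB 1007485.
    raw = 6 - ceil(load_imbalance_metric / 0.1 * sqrt(hosts_in_cluster))
    return max(2, min(5, raw))

# Example from the text: 3-host cluster, current host load standard deviation 0.022.
print(priority_level(0.022, 3))   # -> 5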

Figure 38: DRS workflow diagram .

Chapter 15
Influence DRS Recommendations

Some DRS settings and features can influence the DRS migration recommendations; this chapter takes a closer look at the various settings and the impact they can have on the DRS processes.

Migration Threshold Levels
As mentioned in the previous chapter, the migration threshold specifies the tolerance of imbalance of the Current Host Load Standard Deviation relative to the Target Host Load Standard Deviation. The migration threshold factor is configured at the DRS setting on cluster level. To make the migration threshold setting more understandable, priority levels were introduced to exemplify which level of tolerance is used to generate migration recommendations. During calculation, DRS assigns a priority level to a recommendation and this priority level is compared to the migration threshold. If the priority level is less than or equal to the migration threshold, the recommendation is displayed or applied, depending on the automation level of the cluster. If it is above the migration threshold, the recommendation is not displayed or is discarded.

Level 1 (conservative)
When selecting the conservative migration threshold level, only mandatory moves, priority-one recommendations, are executed. The DRS cluster will not invoke load-balancing migrations. Mandatory moves are issued when:

The ESX host enters maintenance mode
The ESX host enters standby mode
An (anti-)affinity rule is violated
The sum of the reservations of the virtual machines exceeds the capacity of the host

It is possible that a mandatory move will cause a violation on another host; if this happens, DRS will move virtual machines to fix the new violation. This scenario is possible when multiple rules exist on the cluster. It is not uncommon to see several migrations to satisfy the configured DRS rules.

Level 2 (moderately conservative)
The level 2 migration threshold only applies priority-one and priority-two recommendations; priority-two recommendations promise a very good improvement in the cluster's load balance.

Level 3 (moderate)
The level 3 migration threshold is the default migration threshold when creating DRS clusters. The moderate migration threshold applies priority-one, priority-two and priority-three recommendations, promising a good improvement in the cluster's load balance. It is typically aggressive enough to maintain workload balance across hosts without creating excessive overhead caused by too-frequent migrations.

Level 4 (moderately aggressive)
The level 4 migration threshold applies all recommendations up to priority level four. Priority-four recommendations promise a moderate improvement in the cluster's load balance.

Level 5 (aggressive)
Level 5 is the right-most setting on the migration threshold slider and applies all five priority level recommendations; every recommendation which promises even a slight improvement in the cluster's load balance is applied. A level 1 (five star) recommendation should always be applied, but a list of several priority level 5 recommendations could also collectively affect the cluster negatively if those recommendations are not applied.

Aggressive thresholds, level 4 and 5, are considered suitable for clusters with equal-sized hosts, relatively constant workload demands and little to few DRS rules. Although the cost-benefit risk analysis takes unstable workloads into account, selecting an aggressive migration threshold when hosting virtual machines with varying loads in a cluster can lead to a higher possibility of wasted migrations. A moderate migration threshold is more suitable in such a scenario. The default moderate migration threshold provides sufficient balance without excessive migration activity.

Basic design principle: Select a moderate migration threshold if the cluster hosts virtual machines with varying workloads.

Rules
VMware vSphere 4.1 contains two types of affinity rules: Virtual Machine to Host rules (VM-Host) and Virtual Machine to Virtual Machine (VM-VM) rules. A VM-Host affinity rule specifies the affinity between a group of virtual machines and a group of ESX hosts inside the cluster, whereas a VM-VM affinity rule only specifies the affinity between individual virtual machines. Affinity rules can specify if the virtual machines should stay together and run on specified hosts (affinity rules) or if they are not allowed to run on the same host (anti-affinity).
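The level-to-priority mapping above is simple enough to express directly. The sketch below is only our illustration of the rule "apply a recommendation when its priority level is less than or equal to the configured threshold"; the level names are taken from the text.

THRESHOLD_NAMES = {
    1: "conservative (mandatory moves only)",
    2: "moderately conservative",
    3: "moderate (default)",
    4: "moderately aggressive",
    5: "aggressive",
}

def is_applied(recommendation_priority: int, migration_threshold: int) -> bool:
    # A recommendation is shown or applied when its priority level is less
    # than or equal to the configured migration threshold level.
    return recommendation_priority <= migration_threshold

# A priority-three recommendation is applied from level 3 upwards only.
for level, name in THRESHOLD_NAMES.items():
    print(f"level {level} ({name}): priority-3 applied = {is_applied(3, level)}")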

VM-VM Affinity Rules
When an affinity rule is configured, DRS should try to keep the specified virtual machines together on the same ESX host. An example would be running both the front-end and back-end servers of an application on the same host; it can improve performance due to lower latencies of communication.

An anti-affinity rule is the opposite of an affinity rule: it tries to avoid that the specified virtual machines run on the same host. These rules are suitable for creating a highly available application, such as multiple web servers; by running them on different hosts, the application will still be running even if an ESX host fails. Another example of an anti-affinity rule could be the separation of two virtual machines with network-intensive workloads. DRS measures CPU and memory usage, not network usage, so it is possible to end up with both virtual machines running on the same ESX host, which could saturate the host networking capacity available for virtual machines.

VM-Host Affinity Rules
A VM-Host affinity rule determines the group of available ESX hosts on which a virtual machine can be powered on or to which it can be moved by DRS. VM-Host affinity rules are created to establish an association between a group of virtual machines and a group of ESX hosts. These associations enable the administrator to designate certain ESX hosts for virtual machines, to comply with ISV license regulations or to create availability zones. This can be to restrict Oracle database virtual machines to run only on ESX hosts which are licensed by Oracle, or to separate virtual machines across different blade chassis for availability reasons.

Components
A virtual machine to host affinity rule consists of three components:

Virtual machine DRS group
ESX host DRS group
Designation – "Must" affinity/anti-affinity or "Should" affinity/anti-affinity

Virtual machine DRS groups and ESX host DRS groups are quite self-explanatory, so let's dive into the designation component straight away.

Designations
Two different types of VM-Host rules are available: apart from the affinity and anti-affinity specification, a VM-Host rule can either be a mandatory ("must") rule or a preferential ("should") rule. The must-rule is a mandatory rule for DRS, DPM, HA and the user: it forces the virtual machines to run on the ESX hosts specified in the ESX host DRS Group. The "should" rule is a preferential rule for DRS and DPM: DRS and DPM use their best effort to confine the virtual machines to, or prevent them from running on, the ESX hosts they are affined to, but DRS and DPM can violate "should" rules if that compromises certain key operations. HA is not aware of preferential rules because DRS will not communicate these rules to HA.

HA, DRS and DPM must take the mandatory rules into account when generating or executing operations; they will never take any action that results in the violation of mandatory affinity rules. Because of this, HA and DPM operations are constrained as well. For example, mandatory rules will:

Limit DRS in selecting hosts to load-balance the cluster
Limit HA in selecting hosts to power up the virtual machines
Limit DPM in selecting hosts to power down

Due to their limiting behavior, mandatory rules place more constraints on VM mobility, making it more difficult for DRS to balance load and enforce resource allocation policies. As you can imagine, mandatory affinity rules can complicate troubleshooting in certain scenarios, for example when figuring out why a virtual machine is not migrated from a highly utilized host to an alternative lightly utilized host in the cluster. Because of this, it is recommended to use mandatory rules sparingly and only for specific cases, such as licensing requirements. Preferential rules can be used to meet availability requirements such as separating virtual machines between blade enclosures.

DRS and mandatory rules
DRS takes mandatory rules into account when generating load-balance recommendations. DRS will not generate recommendations that would violate a rule; if a proposed migration is in violation of a mandatory rule, DRS will not generate the recommendation. If a new rule is created and the current virtual machine placement is in violation of the rule, DRS will create a priority one recommendation (five stars) and, if DRS is set to fully automatic, execute the recommendation. DRS takes both reservations and mandatory affinity rules into account: if a reservation is set on the virtual machine, it can only migrate to a new host if the virtual machine memory reservation can be satisfied on the new host. Both requirements must be satisfied during placement or power-on; if DRS is unable to honor either one of the requirements, the virtual machine is not powered on or migrated to the proposed destination host.

When creating a new rule that conflicts with another active rule, a message indicating the conflicting rule will appear and the rule will be visibly disabled: the older rule overrules the newer rule and DRS will disable the new rule.

Similarly, vMotion will reject an operation if it detects that the operation is in violation of a mandatory rule, and placing an ESX host into maintenance mode is not allowed if this would violate a mandatory rule.

DPM
DPM does not place an ESX host into standby mode if the result would violate a mandatory rule, and it will power on ESX hosts if these are needed to meet the requirements of the mandatory rules.

High Availability
Due to the DRS-HA integration in vSphere 4.1, HA respects only mandatory (must) rules. During an ESX host failure event, HA uses an archived list of hosts provided by DRS and places the virtual machines only on a compatible host, i.e. one of the hosts that are allowed by the mandatory rules. HA is unaware of the preferential (should) rules, so HA might unknowingly violate such a rule during placement of virtual machines after an ESX failure, but the violation will be corrected by the next DRS invocation.

Let us take a look at a configuration which is very likely to be widely implemented soon: the Oracle Must affinity rule.

1. Place all Oracle virtual machines in a Cluster VM DRS group (VM01, VM03, VM11, VM20)
2. Place all Oracle licensed ESX hosts in a Cluster Host DRS Group (ESX01, ESX02, ESX09, ESX10)
3. Select "Must run on Host in Group"

Figure 39: Mandatory VM-Host affinity rule

In this scenario, DRS never places, migrates, or recommends placement of a host-affined virtual machine on a host that is not listed in the Cluster Host DRS Group (ESX01 – ESX06 & ESX09 – ESX14). This means that DRS will never place the virtual machine on an unlicensed host: not for maintenance mode, not for DPM power saving and not after an ESX host failure event.
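To illustrate how a mandatory rule constrains placement, the small filter below is conceptually what DRS, HA and vMotion admission all apply. The group membership and host names are the hypothetical values from the example above; this is our sketch, not a vCenter API.

# Hypothetical DRS groups from the Oracle example.
vm_drs_group = {"VM01", "VM03", "VM11", "VM20"}          # Oracle virtual machines
host_drs_group = {"ESX01", "ESX02", "ESX09", "ESX10"}    # Oracle-licensed hosts

all_hosts = {f"ESX{n:02d}" for n in range(1, 15)}

def allowed_hosts(vm: str) -> set[str]:
    # A "must run on hosts in group" rule restricts a grouped VM to the host
    # DRS group; any other VM may use every host in the cluster.
    return host_drs_group if vm in vm_drs_group else all_hosts

print(sorted(allowed_hosts("VM03")))   # only the licensed hosts
print(len(allowed_hosts("VM42")))      # an unaffined VM can use all 14 hosts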

This virtual-machine-to-host affinity rule makes it possible to run Oracle inside big clusters without having to license all the ESX hosts. Normally, separate smaller clusters were deployed for Oracle database virtual machines, increasing both OPEX and CAPEX of the environment; Oracle licenses can create a constraint on the design. These new rules allow the Oracle virtual machines to run inside the main cluster with other virtual machines, without having to license all the ESX hosts inside the cluster.

Mandatory VM-Host rule behavior
By design, mandatory rules are considered very important, and it is believed that the intended use case, licensing compliance, is so important that VMware decided to apply these restrictions to non-DRS operations in the cluster as well. Mandatory affinity rules apply even when DRS is disabled. If DRS is disabled while mandatory VM-Host rules still exist, mandatory rules are still in effect and the cluster continues to track, report and alert on mandatory rules. If a manually started vMotion would violate the mandatory VM-Host affinity rule, even after DRS is disabled, the cluster still rejects the vMotion. Mandatory rules can only be disabled if the administrator explicitly does so. If it is the administrator's intent to disable DRS, remove mandatory rules first before disabling DRS.

Impact of Rules on Organization
Many users create rules but seem to forget to create a backup or to document them. Anti-affinity rules can play an important role in meeting certain SLA or BC/DR requirements, so creating a backup or documenting the rules seems appropriate. By using PowerCLI the rules can easily be extracted from the vCenter database.

Basic design principle: Use VM-Host and VM-VM affinity rules sparingly, as rules can have an impact on the effectiveness of the load-balancing calculation. The DRS algorithm has less choice when rules are configured.

Virtual Machine Automation Level
You can customize the automation level for individual virtual machines in a DRS cluster to override the automation level set on the cluster. This might be necessary to meet certain availability or business requirements. There are five automation level modes:

Fully Automated

Partially Automated
Manual
Default (cluster automation level)
Disabled

If the automation level of a virtual machine is set to Disabled, DRS does not migrate that virtual machine or provide migration recommendations for it. Try to have virtual machines in DRS fully automated mode as much as possible, as DRS considers these virtual machines for cluster load balance migrations before the virtual machines not in fully automated mode.

Partially automated versus disabled automation level mode
As mentioned before, due to requirements or constraints it might be necessary to exclude a virtual machine from automatic migration and stop it from being moved around by DRS. Use the Partially automated setting instead of Disabled at the individual virtual machine automation level. Partially automated blocks automated migration by DRS, but keeps the initial placement function: during startup, DRS is still able to select the most optimal host for the virtual machine. By selecting the "Disabled" function, the virtual machine is started on the ESX server on which it is registered, and the chances of getting an optimal placement are low(er).

An exception to this recommendation might be a virtualized vCenter server. Most admins like to keep track of the vCenter server in case a disaster happens. After a disaster occurs, for example a datacenter-wide power outage, you only need to power up the ESX host on which the vCenter virtual machine is registered and manually power up the vCenter virtual machine. An alternative to this method is keeping track of the datastore vCenter is placed on, and registering and powering on the virtual machine on a (random) ESX host after a disaster. By setting the automation mode of the virtual machine to manual, maintenance mode is able to evacuate this virtual machine automatically. This is slightly more work than disabling DRS for vCenter, but it probably offers better performance of the vCenter virtual machine during normal operations. Due to expanding virtual infrastructures and new additional features, vCenter is becoming more and more important for day-to-day operational management. Assuring good performance outweighs any additional effort necessary after a (hopefully) rare occasion. Both methods have merits; select the automation level in accordance with your environment and level of comfort.

If DRS can choose between virtual machines set to the automatic automation level and the manual automation level, DRS chooses the virtual machines which are set to automatic, as it prefers them over virtual machines set to manual.

Basic design principle: Leave virtual machine automation mode set to Default to minimize administration overhead and the possibility of human error. Set the automation level to Manual instead of Partially Automated if more control over placement is required.

Impact of VM Automation Level on DRS Load Balancing Calculation
Contrary to popular belief, a virtual machine set to the Disabled automation level still has an impact on the calculation of the current host load standard deviation, as the sum of the active workload is divided by the capacity of the host; DRS does not need to be aware of virtual machine automation levels at that stage. During the recommendation calculation, DRS skips the virtual machines which are set to the Disabled automation level and selects other virtual machines on that host.
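A small sketch of that distinction, with hypothetical numbers: the disabled VM still contributes to the host's normalized load, but it is filtered out of the candidate list for migration.

# Hypothetical VMs on one host: (active load in MHz, DRS automation level).
vms_on_host = {
    "vm1": (2000.0, "fully automated"),
    "vm2": (1500.0, "manual"),
    "vm3": (3000.0, "disabled"),
}
host_capacity = 16000.0

# The disabled VM still counts toward the host load used for the CHLSD...
host_load = sum(load for load, _ in vms_on_host.values()) / host_capacity

# ...but it is skipped when DRS picks migration candidates.
candidates = [vm for vm, (_, level) in vms_on_host.items() if level != "disabled"]

print(f"normalized host load: {host_load:.3f}")   # 0.406, includes vm3
print(f"migration candidates: {candidates}")      # ['vm1', 'vm2']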

Chapter 16
Resource Pools and Controls

As we progress using virtualization, most administrators spend less time setting up environments and more time on resource management of the virtual infrastructure. VMware introduced DRS, clusters and resource pools to help simplify resource management. Clusters aggregate ESX host capacity into one large pool and create an independent layer between the resource providers (ESX hosts) and the resource consumers (virtual machines). This independent layer has several advantages; one of these advantages is subdividing the cluster capacity into smaller resource pools. Resource pools do not "carve up" physical resources of the cluster which can only be used exclusively by their member virtual machines, but instead guarantee, limit and prioritize their member virtual machines to a certain amount of cluster capacity. These resource allocation controls, reservations, limits and shares, are similar to virtual machine resource allocation settings. But how do these settings work at the resource pool level and what impact do they have on the virtual machine workloads? Let us explore the construct called resource pool a bit more.

Root Resource Pool
When a VMware cluster is created, a top-level resource pool called the root resource pool is created implicitly. The cluster aggregates all the CPU and memory resources made available by the ESX hosts inside the cluster, minus the resources used by the virtualization overhead. When the cluster is created, both reservation and limit parameters are set at the root resource pool. Both settings are set to the same value and indicate the total amount of resources the cluster has available to run the virtual machines. For example, when a cluster is created of 4 hosts and each host has 4GHz CPU and 4GB memory available, the root resource pool contains 16GB of memory and 16GHz of CPU resources. The root resource pool is not displayed by vCenter and its resource allocation settings cannot be changed by the user.

Figure 40: Root resource pool

The following legend applies to all diagrams in the following chapters. When calculating the root resource pool, vCenter will exclude resources reserved for the virtualization layer, such as the Service Console and VMkernel. The amount of resources required to satisfy HA failover (assuming HA Admission Control is enabled) will be shown in the root resource pool as reserved, whereas the amount of resources used by the Service Console and VMkernel will not even show up in the capacity of the root resource pool.

Resource Pools
As stated before, a VMware cluster allocates resources from hosts (resource providers) to virtual machines (resource consumers). Resource pools are in between and are both resource providers and consumers: a resource pool provides resources to virtual machines, but consumes resources from the cluster.

Figure 41: Resource pools

Apart from the root resource pool, each resource pool has a parent resource pool. A resource pool contains children, which can be other resource pools or virtual machines. In the example pictured above, the root resource pool is the parent of resource pool 1 and 2. Resource pool 1 is the child of the root resource pool, but also the parent of resource pool 3, vm3 and vm4.

NOTE
Placing virtual machines at the same level as resource pools is not a recommended configuration!

The maximum resource pool tree depth is 8, excluding the 4 resource pools created internally on each ESX host. These internal resource pools are independent of the DRS resource pools. To avoid complicated proportional share calculations and complex DRS resource entitlement calculations, we advise not to exceed a resource pool depth of maximum 2. The flatter the resource pool tree, the easier it is to manage.

Similar to virtual machines, resource pools have reservation, limit and shares parameters for CPU and memory resources. Expandable reservation is the only setting that exists on the resource pool level and not at the virtual machine level.

Some administrators have the bad habit of using resource pools to create a folder structure in the "host and cluster" view of vCenter: virtual machines are placed inside a resource pool to show some kind of relation or sorting order, like operating system or type of application. The problem with using resource pools as a folder structure is the limitation resource pools inflict on vMotion operations.

Resource pools and simultaneous vMotions
Depending on the network speed, vSphere 4.1 allows 8 simultaneous vMotion operations; however, vCenter Server 4.1 only allows simultaneous migrations with vMotion if the virtual machine is moving between hosts in the same cluster and is not changing its resource pool. Because clusters are actually implicit resource pools (the root resource pool), migrations between clusters are also limited to a single concurrent vMotion operation. Fortunately, simultaneous cross-resource-pool vMotions can occur if the virtual machines are migrating to different resource pools, but still one vMotion operation per target resource pool.

Figure 42: simultaneous migrations

Under Committed versus Over Committed
Resource allocation settings are used to guarantee a certain amount of resources when the cluster is overcommitted. When the active usage of resources does not exceed the available amount of physical resources of the host, the state of the system is called under committed. During this state every resource request done by a virtual machine is backed by physical resources. When the active usage of resources exceeds the available amount of physical resources, the system reaches a state called over committed. During overcommitment, the VMkernel uses several techniques and mechanisms to allocate resources according to the virtual machines' resource entitlement.

Resource Allocation Settings
By configuring the reservation, shares and limit parameters on the resource pool, a collective sum of resources is defined. This collective sum of resources can be used by the virtual machines inside the resource pool. For example, by setting a 10GB memory reservation on a resource pool, the virtual machines within the resource pool have guaranteed access to 10GB of machine memory (ESX physical memory) inside the cluster.

Because virtual machine workloads are executed at ESX host level, the question is: how are resource pool settings translated to ESX host level? DRS does this by a mechanism called resource pool mapping: DRS mirrors the resource pool hierarchy to each host and divides the parent resource pool resource allocation values across the mirrored local trees.

Figure 43: Local resource pool mapping

Dividing these values across the hosts is based on the amount of running active virtual machines, their VM resource allocation settings and their current utilization. Once the parent resource allocation settings are propagated to the host local RP tree, the local host CPU and memory scheduler takes care of the actual resource allocation.

Resource pool resource allocation settings

Shares - Shares specify the relative importance of the virtual machine or resource pool. Shares are always measured against other powered-up virtual machines or resource pools containing such virtual machines at the same hierarchical level.

Reservation – Also referred to as "MIN" (minimum). Reservation is the amount of physical resources (MHz or MB) guaranteed to be available for the virtual machine or resource pool.

Limit – Also referred to as "MAX" (maximum). Limit specifies an upper bound for resources that can be allocated to a virtual machine or resource pool.

Expandable reservation - This allows the resource pool, once it has already reserved as much capacity as defined in its own Reservation setting, to reserve even more. The reservation is taken from the unreserved capacity in the parent of this resource pool.

Shares
Shares specify the priority for the virtual machine or resource pool relative to other resource pools and/or virtual machines with the same parent in the resource hierarchy. Contrary to reservations and limits, which are specified in absolute numbers, shares are relative to the other virtual machines and resource pools. Because shares determine relative priority on the same hierarchical level, the absolute values do not matter: configuring resource pools 1 and 2 with share values of respectively 10 and 20 has the same effect as configuring the resource pools with share values 10000 and 20000. In consequence, when more virtual machines become active in the same hierarchical level, the relative share of the resources allocated to the virtual machine will change.

Figure 44: Parent-child relation

The key point is that share values can be compared directly only among siblings: the ratio of the shares of VM1:VM2 tells which VM has higher priority, but the shares of VM2:VM3 do not tell which VM has higher priority. The relative priority is calculated across all siblings in relation to their sibling share level; there is no official term (yet) for this, but let us use the term sibling share level.

When configuring shares you can select one of the three predefined settings, High, Normal or Low, which specify share values with a 4:2:1 ratio, or select the Custom setting to specify a more granular value. Use care when selecting the custom setting, as a virtual machine with a custom value can end up owning a large portion of the shares, in other words priority.

The default behavior of shares is that they scale with the size of the virtual machine: a virtual machine that is configured with more vCPUs and memory is entitled to a correspondingly larger amount of physical resources during contention, and vice versa. The pre-defined share settings have the following values:

Table 4: Share values

When a virtual machine is created, the amount of shares the virtual machine receives is based upon the amount of configured vCPUs and the default share level (Low, Normal or High); a virtual machine set to share level Normal receives 1000 shares per vCPU. For example, a virtual machine set to share level Normal and configured with 1 vCPU and 1024 MB of memory will receive 1000 CPU shares and 10240 memory shares.

NOTE
This method of assigning shares based on the amount of vCPUs implicitly indicates (and explicitly controls) that a virtual machine with multiple vCPUs is more important and has a higher priority than a virtual machine with a lower amount of vCPUs. The assumption is that a virtual machine with more vCPUs actually needs more CPU resources. In practice, this is not always true, and it does not always reflect the business side requirements: the virtual machine with fewer vCPUs might contain an application which is more important to the business than the resource-intensive application running on the virtual machine with multiple vCPUs. The level of importance to the business does not automatically equal bigger virtual machines.

Placing virtual machines at the same level as resource pools is something we see very often, sometimes by design and sometimes accidental (and not recommended). Caution must be taken when placing virtual machines on the same hierarchical level as resource pools, as virtual machines can end up with a higher priority than intended. Per default, a resource pool is configured similar to a virtual machine with 4 vCPUs and 16GB set at Normal level, i.e. 4000 CPU shares and 163840 memory shares. During a manual vMotion, the administrator needs to select a resource pool, and by default the root resource pool is selected. If this step is overlooked by the administrator, the virtual machine ends up in the root resource pool.
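As a concrete illustration of the default scaling: the per-vCPU and per-MB multipliers below follow from the 4:2:1 ratio and the examples in the text, but the helper itself is our sketch, not a VMware API.

# Share multipliers for the predefined levels, a 4:2:1 ratio (High:Normal:Low).
CPU_SHARES_PER_VCPU = {"high": 2000, "normal": 1000, "low": 500}
MEM_SHARES_PER_MB = {"high": 20, "normal": 10, "low": 5}

def default_shares(vcpus: int, memory_mb: int, level: str = "normal") -> tuple[int, int]:
    # Default VM shares scale with configured size: per vCPU and per MB of memory.
    return (CPU_SHARES_PER_VCPU[level] * vcpus, MEM_SHARES_PER_MB[level] * memory_mb)

# 1 vCPU / 1024 MB at Normal -> 1000 CPU shares and 10240 memory shares,
# matching the example in the text.
print(default_shares(1, 1024))
# A default resource pool behaves like a 4 vCPU / 16 GB VM at Normal level:
print(default_shares(4, 16 * 1024))   # (4000, 163840)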

Resource pool share settings do not influence the share settings on virtual machines. When creating a virtual machine inside a resource pool with a share value set to High, the share level on the virtual machine is still set to the default value of Normal. This is because these VM-level shares indicate the relative importance of the virtual machine within its hierarchical level, i.e. to its siblings, not to virtual machines in other resource pools. In this scenario a virtual machine can be denied resources by, or deny resources to, other sibling virtual machines or resource pools. vSphere 4.1 introduces a mechanism called flattened shares, explained in chapter 10 "vSphere 4.1 HA and DRS Integration"; please be aware that this only occurs when HA fails over the virtual machine.

So how do resource pool shares affect virtual machine workloads? As mentioned before, DRS mirrors the resource pool hierarchy to each host. DRS divides the resource pool cluster-level share amount across the mirrored local trees based on the amount of running active virtual machines, their VM-level share amounts and their current utilization. Once the resource allocation settings are propagated to the host local RP tree, the local host CPU and memory scheduler takes care of the actual resource allocation; the ESX host's local CPU and memory scheduler retrieves the resource allocation settings of each active virtual machine from this tree.

For the sake of simplicity, let's forget the previous example of nested resource pools and use a 2-host cluster. In this cluster a resource pool is created and the default resource allocation settings are used. Per default, a resource pool is configured similar to a 4 vCPU and 16GB virtual machine at Normal level, i.e. 4000 CPU shares and 163840 memory shares. (This example uses memory shares; the same applies to CPU shares.) Four virtual machines running inside the resource pool are configured as follows:

Table 5: Share configuration scenario

Let us assume that all the virtual machines are running equal and stable workloads. DRS will balance the virtual machines across both hosts and create the following resource pool mapping:

Figure 45: Resource pool mapping

The amount of shares specified on virtual machines vm1, vm2 and vm3 totals 20480, which equals half of the total amount of configured shares inside the resource pool. In this example DRS decides to place these virtual machines on ESX host ESX1 and therefore assigns half of the share value of the resource pool to the resource pool 1 mirrored in the host-level resource pool tree.

At this point resource pool 2 is created, but with a different share configuration: resource pool 2 is configured with double the amount of shares of resource pool 1, i.e. 327680. Inside the resource pool, the virtual machines are identical to the virtual machines in resource pool 1.

Figure 46: Share ratio result

The local resource pool tree of ESX1 is updated with resource pool 2; this resource pool is configured with twice the amount of shares of resource pool 1. By introducing 327680 shares, the total amount of shares active on the host is increased to 491520 (163840 + 327680). Resource pool 2 owns 327680 of the total of 491520, which equals roughly 66.6 percent. Due to this 67% / 33% ratio at resource pool level, the local resource scheduler will allocate more resources to resource pool 2. The resources allocated to resource pool 2 are subdivided between the virtual machines based on their hierarchical level (sibling share level). This means that virtual machine VM5 is entitled to get 50% of the resource pool's resources during contention, which is basically 33% of the ESX host's resources (1/2 of a pool which is 2/3 of the host = 2/6 = 1/3 of the host). The %Shares column in the resource allocation tab of the cluster displays the amount of shares each object gets per parent share level.

To emphasize, my example used virtual machines with equal and stable workloads. During normal conditions, some virtual machines have a higher utilization than others. As the working set is part of the resource entitlement calculation, active workload is accounted for when dividing the resource pool shares and resources between hosts and local resource pool trees. Because of the resource usage, this process of dividing occurs every time DRS is invoked, and therefore the distribution of resources will keep changing if appropriate.

Reservation
A reservation is a guaranteed lower bound of resources that is reserved for the resource pool or virtual machine to ensure availability of physical resources at all times. Reservations can be set at resource pool level and at virtual machine level. When setting a reservation at resource pool level, you guarantee a certain amount of resources for all its children collectively.

VM Level Scheduling: CPU vs Memory
The behavior of the VMkernel CPU scheduler differs from the VMkernel memory scheduler when it comes to claiming and releasing physical resources. If a CPU reservation is set on a virtual machine, the virtual machine is guaranteed to have these CPU cycles available, even during resource contention. If the virtual machine does not use all CPU cycles, they will flow back to the system and be available for other virtual machines until the virtual machine wants to use them again. But since CPU reservations are friendly, the "owner" sometimes has to wait for the interloper to leave when the owner wants the pCPU; normally this should be brief, though they may contend with one another. To exclusively guarantee resources for a specific virtual machine, a VM-level memory reservation has to be set.
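The arithmetic of this example is easy to verify. The share values below are the ones used in the scenario, and VM5's 50% sibling share within resource pool 2 is taken from the text; the script is only a check of the ratios.

rp1, rp2 = 163840, 327680          # memory shares mirrored on host ESX1
total = rp1 + rp2                  # 491520

rp2_fraction_of_host = rp2 / total             # ~0.667
vm5_fraction_of_rp2 = 0.5                      # VM5 owns half of RP2's sibling shares
vm5_fraction_of_host = rp2_fraction_of_host * vm5_fraction_of_rp2

print(f"resource pool 2: {rp2_fraction_of_host:.1%} of the host")   # 66.7%
print(f"VM5: {vm5_fraction_of_host:.1%} of the host")               # 33.3%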

Because of the "friendly" conduct of CPU scheduling and the "greedy" behavior of memory scheduling, we will focus more on memory reservations than on CPU reservations in the following paragraphs. The next section compares the behavior of memory reservation at resource pool level and at virtual machine level.

If a memory reservation is set on a virtual machine, the virtual machine is guaranteed to have this amount of physical memory available. If the virtual machine uses and therefore claims these machine pages, the VMkernel will not reclaim them. Even if the virtual machine idles, these pages remain allocated by the virtual machine and no other virtual machine can use them anymore; when reviewing the resource allocation tab, these pages are listed as reserved capacity. This claiming phenomenon has an enormous impact when using Microsoft Windows as guest OS, as Microsoft Windows touches every page during startup. These pages aren't really used, but this behavior can result in the ESX host starting to swap and balloon if no free memory is available for other virtual machines while the owning VMs aren't using their claimed reserved memory.

Impact of Reservations on VMware HA Slot Sizes
Both virtual machine level CPU and memory reservations are used by HA Admission Control as input for the calculation of slot sizes. Most of the time these "heavy hitters" run mission-critical applications, so it is not unusual to set memory reservations to guarantee the availability of memory resources. If such a virtual machine is placed in a HA cluster, these significant memory reservations can lead to a very conservative consolidation ratio. Chapter 7 contains more information about HA slot sizes.

Reservations set at resource pool level are ignored by HA. This can be useful to circumvent the HA slot size if virtual machines configured with large amounts of memory are active in the virtual infrastructure. By placing these virtual machines inside a resource pool and configuring the resource pool with a memory reservation equal to the configured memory, the virtual machine will be guaranteed physical resources without creating an over-conservative slot size. We recommend implementing this workaround very sparingly, as creating a resource pool for each VM creates a lot of administrative overhead and makes the host and cluster view a very unpleasant environment to work in.

Behavior of Resource Pool Level Memory Reservations
Fortunately, the memory reservation mechanism at resource pool level behaves differently than the mechanism of virtual machine level reservation. When a resource pool is created it instantly "annexes" the specified memory reservation, even when no virtual machine is running inside the resource pool, prohibiting other virtual machines or resource pools from annexing this memory. Basically, the amount of reserved memory is just subtracted from the pool of unreserved memory in the cluster; it does not have anything to do with the physical usage or claiming of the memory. DRS reduces this amount of memory from the available memory pool, and when reviewing the resource allocation tab of the cluster, the amount of memory reservation is added to the reserved capacity.

When we power up a virtual machine inside the resource pool, the virtual machine can start consuming resources. The VMkernel will back every request with physical resources and, as long as no resource contention occurs, the virtual machines can use them. (Except if a limit is set on the virtual machine level; in that scenario the virtual machine cannot allocate more than its limit.) In this example no limits are set at virtual machine or resource pool level. Once contention occurs, the resource pool must divide the reserved memory pool across its member virtual machines. How does it know which virtual machines are more "entitled" to physical memory and which are not really in need of resources? The resource pool will look at the virtual machine resource entitlement. Virtual machines that are actively using memory receive more access to memory resources than idle virtual machines.

Virtual machine memory reservations will not be reclaimed by the VMkernel: it does not matter how idle the virtual machine is, memory protected by a VM-level reservation is never reclaimed. The big difference between virtual machine level and resource pool level memory reservation is that resource pool memory reservations do not "hoard" memory; unused memory will flow back to the system. The memory is available and free to be used, even by virtual machines external to the resource pool. Resource pool memory reservations are more in line with the whole concept of consolidation and fairness, and they only use resources when needed. So basically the memory reservation parameter becomes a dynamic "metric". As stated before, resource pool reservations do not flow to virtual machines: resource pools do not explicitly set the calculated reservation per virtual machine, but use the reservation for the calculation of the resource entitlement of the virtual machine from a resource pool point of view. Because of the dynamic characteristics of the resource pool memory reservation and the fact that it does not flow to the virtual machine, shares on virtual machine level become much more important than is perceived by many. Besides shares, active memory usage and the configured memory size will impact the resource entitlement and thus the performance of the virtual machine.

NOTE
We must stop treating shares as the redheaded stepchild of the resource allocation settings family and realize how important share ratios really are.

Setting a memory reservation at resource pool level also has its own weaknesses. Using only resource pool reservations and not virtual machine reservations can lead to (temporary) performance loss if a host failover occurs. When virtual machines are restarted by HA, they are not restarted in the correct resource pool but in the root resource pool. vCenter 4.1 uses the flattened shares mechanism when restarting virtual machines in the root resource pool, but the virtual machine needs to do without any memory reservations, which can lead to (temporary) starvation. The movement of the virtual machine from the root resource pool to the correct resource pool gets corrected in the next DRS run. Resource pool level reservation settings do not "flow" to the virtual machine, so they will not influence HA slot sizes.

Setting a VM Level Reservation inside a Resource Pool
It is possible to set a reservation on a virtual machine inside a resource pool that is configured with a reservation as well. When this scenario occurs, the amount of resources specified at the virtual machine level reservation is subtracted from its parent. For example, if a memory reservation of 2GB is set on a virtual machine and a 10GB memory reservation is set on the resource pool, DRS respects the virtual machine reservation and will immediately pass this to the local scheduler, but subtracts this amount from the resource pool reservation, resulting in an 8GB pool of guaranteed memory which DRS divides across the remaining child virtual machines and resource pools inside the resource pool.

Reservations Are Not Limits
When setting a memory reservation on a resource pool, it does not imply that the virtual machine cannot use additional memory above the reservation. If a virtual machine wants to allocate more memory above the specified amount of the reservation, the memory is allocated based on shares in case of resource contention.

Basic design principle: Set per-VM memory reservations only if a virtual machine absolutely requires guaranteed memory.

VMkernel CPU reservation for vMotion
The VMkernel reserves a certain amount of CPU capacity for the vMotion task. In vSphere 4.0 and earlier, ESX reserved 30% of a CPU core at both the source and destination host. In vSphere 4.1, this CPU reservation is 30% of a processor core for a 1 Gb network interface and 100% of a CPU core for a 10 Gb network interface. Such an amount of CPU resources is needed to take full advantage of the network bandwidth in order to minimize migration time. By default, all vMotions initiated by DRS are of low priority. The distinction between High and Low priority vMotions is that a Low priority vMotion tries to reserve the percentage of a core but will proceed regardless of how much it actually received, whereas High priority vMotions are designed to fail if they cannot reserve sufficient resources.

Basic design principle: Leave some CPU capacity in a cluster as unreserved. This helps to ensure that vMotion tasks get the resources needed and are completed as quickly as possible.

Figure 47: Reservations and share based resources

In the example pictured above, resource pool 1 has a 10GB memory reservation configured, but the total amount of configured memory of its virtual machines (16GB) is greater than the reservation. During contention, the amount of reserved memory of resource pool 1 is divided between its virtual machines based on their resource entitlements. The 6GB of memory resources exceeding the reserved memory pool is allocated from the unreserved memory pool, based on the proportional share level of resource pool 1 and resource pool 2. The maximum amount of memory that resource pool 1 can allocate is the total of the combined configured memory of all virtual machines plus their memory overhead reservations.

Memory Overhead Reservation
Every virtual machine running on an ESX host consumes some memory overhead additional to the current usage of its configured memory. This extra space is needed by ESX for internal VMkernel data structures like the virtual machine frame buffer and the mapping table for memory translation (mapping physical virtual machine memory to machine memory). If you do not set a memory reservation at resource pool level, be sure to enable the Expandable reservation, otherwise Admission Control cannot allocate resources to satisfy the virtual machine memory overhead reservation. Two kinds of virtual machine overhead exist:

Static overhead
Static overhead is the minimum overhead that is required for the virtual machine to start up. DRS and the VMkernel use this metric for Admission Control and vMotion calculations. The destination ESX host must be able to back the virtual machine reservation and the static overhead, otherwise the vMotion will fail.

Dynamic overhead
Once the virtual machine has started up, the virtual machine monitor (VMM) can request additional memory space. The VMM will request the space, but the VMkernel is not required to supply it. If the VMM does not obtain the extra memory space, the virtual machine will continue to function, but this could lead to performance degradation. The VMkernel treats the virtual machine overhead reservation the same as a VM-level memory reservation and will not reclaim this memory once it has been used.

Table 3.2 of the vSphere Resource Management guide lists the overhead memory of virtual machines. The table listed below is an excerpt from the Resource Management guide and lists the most common ones.

Table 6: Virtual machine memory overhead (in MB)

Please be aware of the fact that memory overheads can grow with each new release of ESX, so keep this in mind when upgrading to a new version. Verify the documentation of the virtual machine memory overhead and check the specified memory reservation on the resource pool.

Admission control
As mentioned earlier, DRS and the VMkernel will not allow a virtual machine to be powered on if reservations cannot be guaranteed. This means that the effective memory reservation for a virtual machine is the user-configured memory reservation (VM-level reservation) plus the overhead reservation. The behavior of dynamic overhead must also be taken into account. This means that during the design phase of a resource pool, the memory overhead of a virtual machine must be included in the calculation of the memory reservation specified on the resource pool.
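A tiny sketch of the effective-reservation idea used by admission control. The overhead value below is a made-up placeholder; the real number depends on the vCPU count, configured memory and ESX release, as the excerpted table shows.

def effective_reservation_mb(vm_reservation_mb: float, overhead_mb: float) -> float:
    # Effective reservation = user-configured VM-level reservation + overhead reservation.
    return vm_reservation_mb + overhead_mb

def can_power_on(unreserved_capacity_mb: float, vm_reservation_mb: float,
                 overhead_mb: float) -> bool:
    # Admission control: the VM may only power on if its effective reservation
    # can be backed by unreserved capacity.
    return unreserved_capacity_mb >= effective_reservation_mb(vm_reservation_mb, overhead_mb)

# Hypothetical example: 2048 MB reservation plus roughly 180 MB overhead.
print(can_power_on(unreserved_capacity_mb=2200.0, vm_reservation_mb=2048.0,
                   overhead_mb=180.0))   # False: 2228 MB needed, 2200 MB free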

Expandable Reservation
The expandable reservation setting only exists at the resource pool level. Expandable reservation is used by Admission Control, although it is used to allocate resources for virtual machine level reservations, including the virtual machine memory overhead reservation. If the expandable reservation setting is selected, Admission Control considers the capacity in the ancestor resource pool tree as available for satisfying VM-level reservations. If the expandable reservation is not selected, Admission Control considers only the resources available in the resource pool itself to satisfy the reservation.

A simple way to think of it is this: add the VM-level reservations—plus implicit overhead reservations—of every VM running in the resource pool. That sum cannot be greater than the resource-pool level reservation, unless Expandable is checked. Note that this has nothing to do with how much memory can be configured in or used by the VMs in the resource pool; it's only about what they can reserve.

Figure 48: Expandable reservation workflow
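Conceptually, the admission check walks up the ancestor tree when Expandable is set. Below is a compact sketch of that traversal; the ResourcePool class and its fields are our own illustration, not vCenter's object model. The search stops at a pool without the expandable flag, mirroring the workflow in Figure 48 and the traversal rules described after it.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ResourcePool:
    name: str
    reservation_mb: float          # configured reservation
    reserved_mb: float             # already handed out to children
    expandable: bool
    parent: Optional["ResourcePool"] = None

    def unreserved_mb(self) -> float:
        return self.reservation_mb - self.reserved_mb

def admit(pool: ResourcePool, request_mb: float) -> bool:
    # Try to satisfy a VM-level reservation (plus overhead) from this pool;
    # if Expandable is set, borrow the remainder from ancestors, never siblings.
    available = pool.unreserved_mb()
    if available >= request_mb:
        return True
    if not pool.expandable or pool.parent is None:
        return False                       # search stops here: power-on rejected
    return admit(pool.parent, request_mb - max(available, 0.0))

root = ResourcePool("root", 16384, 6000, expandable=False)
child = ResourcePool("rp1", 4096, 4000, expandable=True, parent=root)
print(admit(child, 512))    # True: 96 MB locally, the rest borrowed from the root
print(admit(child, 11000))  # False: even the root cannot cover the remainder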

When a virtual machine is powered on and its reservation cannot be satisfied by the resource pool itself, Admission Control searches for unreserved capacity through the resource pool tree. It will only consider unreserved resources from the pool's ancestors, not from its siblings. Ancestors are the direct parent of the resource pool, the parents of that parent, and so on. The search for unreserved capacity stops when a resource pool is configured without the expandable reservation setting or when a limit is set. When the requested capacity would allocate more resources than the limit of the parent resource pool specifies, the request is rejected and the virtual machine will not be started.

Figure 49: Traversing expandable reservation

Basic design principle: Enable expandable reservation if a possibly inadequate memory reservation is set on the resource pool and VM-level reservations will be defined.
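The sketch below mimics the admission control walk just described: borrow unreserved capacity from ancestors until the request is satisfied, a non-expandable pool is reached, or a limit blocks the request. The class, attribute names and numbers are invented for illustration; the real admission control logic is internal to vCenter and the VMkernel.

```python
# Sketch of the expandable-reservation walk described above.
# Class and attribute names are invented for illustration only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pool:
    name: str
    unreserved_mb: int          # reservation capacity not yet handed out
    expandable: bool
    limit_mb: Optional[int]     # None = unlimited
    parent: Optional["Pool"] = None

def try_reserve(pool: Pool, request_mb: int) -> bool:
    """Walk up the ancestor tree borrowing unreserved capacity until the
    request is satisfied, a non-expandable pool is hit, or a limit blocks it."""
    remaining = request_mb
    node = pool
    while node is not None and remaining > 0:
        if node.limit_mb is not None and request_mb > node.limit_mb:
            return False                       # limit blocks the request
        take = min(node.unreserved_mb, remaining)
        remaining -= take
        if remaining == 0:
            return True
        if not node.expandable:
            return False                       # cannot borrow from ancestors
        node = node.parent                     # siblings are never considered
    return remaining == 0

root = Pool("root", unreserved_mb=8192, expandable=False, limit_mb=None)
rp1 = Pool("rp1", unreserved_mb=1024, expandable=True, limit_mb=None, parent=root)
print(try_reserve(rp1, 3072))   # True: 1024 MB locally + 2048 MB from the parent
```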

Limits
A limit is an artificial cap on the usage of a resource: a virtual machine or resource pool is prevented from using more physical resources than its configured limit. A limit is the complete opposite of a reservation. Where the reservation is a guaranteed lower bound on resources, the limit is an enforced upper bound. Even when there are plenty of resources available, the limit will prohibit the virtual machine or resource pool from making use of them.

Setting limits on resource pool or virtual machine level can affect the performance of the virtual machines, but limits can negatively affect the rest of the environment as well. If a limit is set on a virtual machine, the VMkernel will not allocate more physical resources than specified by the limit. The VMkernel CPU scheduler behaves differently from the VMkernel memory scheduler when it comes to limiting physical resources, so let us take a quick look at the differences between the two when setting limits.

CPU Resource Scheduling
If a CPU limit is set on a virtual machine, the VMkernel restricts the amount of time the virtual machine will be scheduled: the lower the amount of MHz specified in the limit setting, the less time the virtual machine receives from the CPU scheduler. Whether the virtual machine has 1 vCPU or 8 vCPUs, this limit specifies the upper boundary of CPU resources for the entire virtual machine; it is not used to specify a limit per vCPU. The CPU scheduler makes sure that the total CPU consumption of the virtual machine does not exceed the specified limit. For example, if a 2000 MHz limit is set on a 2 vCPU virtual machine and vCPU 1 uses 1600 MHz, the VMkernel restricts the second vCPU and allows it to consume a maximum of only 400 MHz.

If it is necessary to set a CPU limit on resource pool level, be aware that the number of vCPUs of each virtual machine can impact the assignment of resources within the resource pool. The resource pool divides its resources and assigns a limit to each virtual machine based on its resource entitlement, and DRS takes both virtual machine and resource pool shares into account when calculating that entitlement; the share level indicates the priority level of the virtual machine. A dual vCPU virtual machine receives more shares than a single vCPU virtual machine, so if both virtual machines are equally active and no per-VM reservations or limits are set, the physical resources are divided based on the resource entitlements, limiting the 1 vCPU virtual machine more than the virtual machine with 2 vCPUs. This might end up negatively affecting the performance of a business critical application.
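The following sketch illustrates how a resource pool CPU limit could be split across equally active virtual machines in proportion to their share values, where the default share allocation scales with the vCPU count. The numbers are illustrative only, and the proportional split is a simplification of the real entitlement calculation, which also factors in reservations, limits and actual demand.

```python
# Sketch: splitting a resource pool CPU limit across equally active VMs in
# proportion to their shares. Simplified; numbers are illustrative only.

POOL_CPU_LIMIT_MHZ = 4000
SHARES_PER_VCPU = 1000          # "Normal" CPU shares scale with vCPU count

vms = {"vm-1vcpu": 1, "vm-2vcpu": 2}            # name -> vCPU count
shares = {name: vcpus * SHARES_PER_VCPU for name, vcpus in vms.items()}
total_shares = sum(shares.values())

for name, s in shares.items():
    dynamic_limit = POOL_CPU_LIMIT_MHZ * s / total_shares
    print(f"{name}: {s} shares -> ~{dynamic_limit:.0f} MHz of the pool limit")
# Output: the 2 vCPU VM is entitled to twice the MHz of the 1 vCPU VM.
```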

Memory Scheduler
If a memory limit is set on a virtual machine, the virtual machine is not allowed to consume more physical memory than its configured limit, even if it never actually needs memory above the limit threshold. For example, if a virtual machine is configured with 4GB of memory but the administrator sets the memory limit to 1 GB, the virtual machine is able to consume 1 GB of physical memory; the other 3 GB of memory space will be supplied by ballooning or the swap file. A 4 GB swap file is created for the virtual machine to ensure availability of memory space for the virtual machine.

Figure 50: Virtual machine memory limit

The Guest OS inside the virtual machine is unaware of the specified limit, so setting limits can impact the performance of the application inside the virtual machine. When modern operating systems boot, one of the first things they do is check how much RAM is available and then tune their caching algorithms and memory management accordingly. Applications such as SQL, Oracle and JVMs do much the same thing. The funny thing about this is that although the application might request everything it can, it might not even need it. As stated before, the limit is not exposed to the operating system itself, so the application will suffer and so will the service provided to the user. In that case, which is more common than we think, it is better to decrease the provisioned memory than to apply a limit: the limit will impose an avoidable and unwanted performance impact in most cases, while lowering the configured memory most likely will not.

Basic design principle: Configure a virtual machine with the correct memory size instead of applying a memory limit.

Let us focus again on the resource pool limit setting. If a memory limit is set on the resource pool, the limit applies to all the virtual machines inside the resource pool: combined, the virtual machines cannot consume more physical resources than the specified memory limit. DRS divides the limit between the ESX hosts based on the number of active virtual machines inside the resource pool and the aggregated resource entitlement of those virtual machines. DRS calculates the maximum amount of allowed resources for each ESX host and pushes this resource allocation information to each ESX host inside the cluster; the VMkernel memory scheduler only knows of the part of the DRS resource tree that is relevant to its own local node.

Figure 51: Resource pool memory limit

Because the limit is set on the resource pool, the memory scheduler is free to allocate resources within the pool as required. The memory scheduler divides the resources between the virtual machines belonging to the same resource pool and uses the same mechanism as the CPU scheduler: it assigns a limit based on the resource entitlement of each virtual machine. This results in dynamic limiting of the availability of physical resources, as the limit for a virtual machine is related to the resource utilization of its sibling virtual machines in the resource pool.
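A rough sketch of the division just described: the cluster-level pool limit is apportioned to each host according to the aggregated entitlement of the pool's virtual machines running there, and each host's slice is then split among its local virtual machines. The entitlement values are stand-ins; the real DRS and VMkernel calculation is considerably more involved.

```python
# Sketch: apportioning a resource pool memory limit first per host, then per VM,
# proportional to entitlement. Entitlement numbers are made-up placeholders;
# the real DRS/VMkernel calculation is far more involved.

POOL_MEM_LIMIT_MB = 8192

# host -> {vm: entitlement_mb}
cluster = {
    "esx01": {"vm1": 2000, "vm2": 1000},
    "esx02": {"vm3": 1000},
}

total_entitlement = sum(sum(vms.values()) for vms in cluster.values())

for host, vms in cluster.items():
    host_entitlement = sum(vms.values())
    host_limit = POOL_MEM_LIMIT_MB * host_entitlement / total_entitlement
    print(f"{host}: local slice of pool limit ~ {host_limit:.0f} MB")
    for vm, ent in vms.items():
        vm_limit = host_limit * ent / host_entitlement
        print(f"  {vm}: dynamic limit ~ {vm_limit:.0f} MB")
```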

In the scenario where the other virtual machines in the pool are dormant, an active virtual machine can possibly allocate up to its configured memory; this is contrary to a per-VM limit, which is always active regardless of the resource utilization of the other virtual machines.

Figure 52: Dividing of resource pool limit

Basic design principle: If a limit is necessary, use resource pool limits instead of virtual machine limits if possible.

If the virtual machines want to consume more than the limit threshold, the VMkernel resorts to ballooning, compressing or swapping to provide the additional resources. It is therefore possible to see virtual machines balloon or swap while the ESX host itself does not experience any memory pressure. The same problem occurs when the virtual machine is using large memory pages: the large pages (2MB) will be broken into small pages (4KB) to allow reclamation.

Basic design principle: We recommend using memory limits sparingly, as they are invisible to the guest OS and can cause ballooning and swapping.

Ballooning, compressing and swapping virtual machine memory have an impact on the ESX host and possibly on the SAN infrastructure: the VMkernel needs to use resources to communicate with and run the balloon driver, and it needs to store reclaimed memory pages in the compression cache or a SAN-based swap file, consuming bandwidth and creating additional load on the storage processors. Some administrators might ignore the additional load created by swapping and ballooning, but if the virtual machine is sized properly to reflect its workload or SLA, these overhead situations will not occur. Because a limit obstructs the virtual machine from using physical resources above the specified value, we recommend sizing the virtual machine correctly instead.

Expandable Reservation and Limits
As explained in the previous section, expandable reservation is used to allocate unreserved memory for virtual machine reservations and virtual machine memory overhead reservations on behalf of the virtual machines inside the resource pool. If the resource pool is unable to provide enough unreserved resources and the virtual machine reaches the limit of the local resource pool tree, it will traverse the ancestor tree to allocate sufficient unreserved resources.

However, when a limit is set at the resource pool level, the resource pool cannot allocate more physical resources than defined by the limit setting. Although the expandable reservation setting allows the resource pool to allocate additional unreserved resources from its parent resource pools, it can only do so up to the configured memory limit; the limit parameter prohibits the resource pool from allocating more physical resources than the configured limit. If the resource pool is configured with a memory reservation less than the limit, the VMkernel will back the remaining memory requests by ballooning, the compression cache or the swap file. Keep in mind that virtual machines can never allocate more memory than their configured memory.

Chapter 17
Distributed Power Management

With ESX 3.5, VMware introduced Distributed Power Management (DPM). DPM provides power savings by dynamically sizing the cluster capacity to match the virtual machine resource demand. The goal of DPM is to keep the cluster utilization within a specific target range, while at the same time taking various cluster settings, virtual machine settings and requirements into account when generating DPM recommendations. DPM dynamically consolidates virtual machines onto fewer ESX hosts and powers down excess ESX hosts during periods of low resource utilization. If the resource demand increases, ESX hosts are powered back on and the virtual machines are redistributed among all available ESX hosts in the cluster.

After DPM has determined the maximum number of hosts needed to handle the resource demand of the virtual machines, it leverages the DRS algorithm to distribute the virtual machines across that number of hosts before placing the remaining ESX hosts into standby mode. Because DPM uses DRS to migrate the virtual machines off the ESX hosts, DRS must be enabled before DPM can be enabled on the cluster.

Enable DPM
DPM is disabled by default and can be enabled by selecting the power management mode Manual or Automatic.

Figure 53: DPM settings

DPM can be set to run in either Manual or Automatic mode for the cluster. All hosts inside the cluster inherit the default cluster setting, but a per-host setting can be configured as well, which overrides the cluster default. Per-host settings are only meaningful when DPM is enabled. A use case for overriding the default cluster setting is when VMware Fault Tolerance protected virtual machines are running inside the cluster; see the section "DRS, DPM and VMware Fault Tolerance" of chapter 18 for more information about the constraints Fault Tolerance introduces to DPM.

The power management mode setting, Manual or Automatic, can differ from the DRS automation setting, and even the thresholds can vary from each other. The combination of both mechanisms affects the role of the user and the automatic application of the recommendations generated by DRS and DPM. Each power management mode operates differently:

Power Management State and DPM Behavior
Disabled – No power recommendation will be issued.
Manual – A power recommendation will be generated; the recommendation must be manually confirmed by the user.
Automatic – A power recommendation will be generated and executed automatically; no user intervention is required.

Table 7: Effect of combining DPM and DRS

Basic design principle:
Configure DRS to automation level Automatic if DPM is set to automation level Automatic.

Templates
While DPM leverages DRS to migrate all active virtual machines on the host before powering down the host, the registered templates are not moved. This means that templates registered on the ESX host placed in standby mode will not be accessible as long as the host is in standby mode.

Basic design principle:
Register templates on a single host and disable DPM on this host.

DPM Threshold and the Recommendation Rankings
The DPM threshold slider works similarly to the DRS slider: move the slider to set the DPM threshold to be more conservative or more aggressive. DPM recommendation priority levels can be compared to the DRS priority levels. Setting the DPM threshold at the most conservative level generates only the most important (priority 1) recommendations; setting the DPM threshold at the most aggressive level generates all recommendations. Each level indicates the importance of the recommendation regarding the current utilization of the ESX hosts in the cluster and the possible constraints on the current capacity. DPM uses different ranges for its recommendations: host power-on recommendations range from priority level 1 to priority level 3, while power-off recommendations range from priority level 2 to priority level 5. The highest power-off priority level (2) indicates a larger amount of underutilized powered-on capacity in the cluster; recommendations with a higher priority level result in more power savings when they are applied. For the range of power-on recommendations, a priority level 1 recommendation is generated when a VMware High Availability requirement must be met. Priority level 1 recommendations are also generated to meet the powered-on capacity requirements set by the user. Power-on priority level 2 indicates a more urgent recommendation to solve higher host utilization saturation levels than priority level 3.
Table 8: Recommendations priority level

Setting the DPM threshold to the most conservative level will result in DPM generating only priority level 1 recommendations, according to the accompanying text below the threshold slider: "Apply only priority 1 recommendations. vCenter will apply power-on recommendations produced to meet HA requirements or user-specified capacity requirements." In other words, DPM will only automatically apply the power-on recommendations.

At this threshold level, DPM will not generate power-off recommendations; this effectively means that the automatic DPM power saving mode is disabled. The user is still able to place a server in standby mode manually, but DPM will only power on ESX hosts when the cluster fails to meet certain HA or custom capacity requirements or constraints.

Evaluating Resource Utilization
DPM generates power management recommendations based on the CPU and memory utilization of the ESX hosts and aims to keep each ESX host's resource utilization within the target utilization range. If the resource utilization is above the target utilization range, DPM evaluates host power-on operations; when the resource utilization is below the target utilization range, DPM evaluates host power-off operations. DPM calculates the target utilization range as follows:
Target resource utilization range = DemandCapacityRatioTarget ± DemandCapacityRatioToleranceHost

The DemandCapacityRatioTarget is the utilization target of the ESX host; by default this is set at 63%. The DemandCapacityRatioToleranceHost specifies the tolerance around the utilization target for each host; by default this is set at 18%. This means that DPM tries to keep the ESX host resource utilization centered at the 63% sweet spot, plus or minus 18 percent, resulting in a range between 45 and 81 percent. If the resource utilization of an ESX host falls below 45% for both CPU and memory, DPM evaluates power-off operations. If the resource utilization exceeds 81% for either CPU or memory, DPM evaluates powering on standby ESX hosts.
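The small sketch below reproduces the range check just described, using the default values. The advanced option names follow the text; treat the exact decision logic as an approximation for illustration.

```python
# Sketch of the DPM target utilization range check using the defaults
# described above (63% target, 18% tolerance). Approximation for illustration.

DEMAND_CAPACITY_RATIO_TARGET = 63      # percent
DEMAND_CAPACITY_RATIO_TOLERANCE = 18   # percent

low = DEMAND_CAPACITY_RATIO_TARGET - DEMAND_CAPACITY_RATIO_TOLERANCE   # 45
high = DEMAND_CAPACITY_RATIO_TARGET + DEMAND_CAPACITY_RATIO_TOLERANCE  # 81

def evaluate(cpu_util: float, mem_util: float) -> str:
    """Return which operation DPM would evaluate for a host, per the text above."""
    if cpu_util > high or mem_util > high:          # either resource too hot
        return "evaluate power-on of standby hosts"
    if cpu_util < low and mem_util < low:           # both resources idle
        return "evaluate power-off of this host"
    return "within target range; no action"

print(low, high)                            # 45 81
print(evaluate(cpu_util=85, mem_util=50))   # power-on evaluation
print(evaluate(cpu_util=30, mem_util=40))   # power-off evaluation
```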

Figure 54: Power operations in relation to host utilization levels

The sweet spot of 63 percent is based on in-house testing and feedback from customers. Both the DemandCapacityRatioTarget and the DemandCapacityRatioToleranceHost values can be modified by the user: the DemandCapacityRatioTarget can be set in the range of 40 to 90%, and the allowed input range for the DemandCapacityRatioToleranceHost is 10 to 40%. It is recommended to use the default values and to only modify them when you fully understand the impact.

Virtual Machine Demand and ESX Host Capacity Calculation

DPM calculates the resource utilization of an ESX host based on the virtual machine demand and the ESX host capacity. DPM calculates the ESX host resource demand as the sum of the demand of each active virtual machine over a historical period of interest, plus two standard deviations. The demand itself is a combination of the virtual machine's working set (active memory) and an estimation of unsatisfied demand during periods of contention. By using historical data over a longer period of time instead of only the current active demand, DPM ensures that the evaluated virtual machine demand is representative of the virtual machine's normal workload behavior; reacting to short-term demand spikes would unnecessarily generate power-on and power-off recommendations. The calculated host capacity equals the installed physical CPU and memory resources minus the overhead created by the Service Console and the VMkernel.

DPM must be absolutely sure that it will not negatively impact virtual machine performance, so performance receives a higher priority than saving power. Finding a proper balance between providing resources and resource demand can be quite difficult, as underestimating resource demand can result in lower performance while overestimating resource demand leads to less optimal power savings. DPM therefore uses two periods of interest when calculating the average demand. The period of interest used when evaluating virtual machine demand that can possibly lead to power-on operations is 300 seconds (5 minutes); by using this shorter period, DPM is able to respond to demand increases relatively quickly. When evaluating resource demand that may lead to power-off operations, DPM evaluates the virtual machine workload of the past 2400 seconds (40 minutes); this longer period makes DPM respond slowly to a decrease in workload demand, because providing adequate resources for workload demand is considered more important than a rapid response to decreasing workloads. In addition, a power-off recommendation is only applied when the ESX host is below the specified target utilization range AND there are no power-on recommendations active.
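The sketch below illustrates the kind of demand statistic described above: an average over the evaluation window plus two standard deviations, computed over a short window for power-on checks and a long window for power-off checks. It is a simplified stand-in; the real calculation also folds in unsatisfied demand during contention, and the sample data and sampling interval are fabricated.

```python
# Sketch: demand estimate as "mean over the window plus two standard deviations",
# using the 300 s / 2400 s windows mentioned above. Simplified illustration only.
from statistics import mean, stdev

POWER_ON_WINDOW_S = 300     # 5 minutes: react quickly to rising demand
POWER_OFF_WINDOW_S = 2400   # 40 minutes: react slowly to falling demand
SAMPLE_INTERVAL_S = 20      # assumed sampling interval for this example

def demand_estimate(samples_mb, window_s):
    """Conservative demand estimate over the most recent window."""
    n = max(2, window_s // SAMPLE_INTERVAL_S)
    window = samples_mb[-n:]
    return mean(window) + 2 * stdev(window)

# Fabricated active-memory samples (MB) for one VM, oldest first.
samples = [1500] * 100 + [2600, 2700, 2500, 2800, 2650, 2750]

print(f"power-on view  ~ {demand_estimate(samples, POWER_ON_WINDOW_S):.0f} MB")
print(f"power-off view ~ {demand_estimate(samples, POWER_OFF_WINDOW_S):.0f} MB")
```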

. DPM considers hosts in order of critical resource capacity. In addition. If the hosts are overcommitted on memory.hosts in automatic mode and manual mode.(target utilization – host utilization) DPM is aware of which resource is more critical and will use and process this in the evaluation. smaller capacity host are favored for power off recommendation first. DPM computes the resource HighScore called cpuHighScore and memHighScore. Hosts inside the automatic mode group with a lower amount of virtual machines or smaller virtual machines are considered first before heavy loaded hosts in the same group. If the cluster contains homogeneous sized hosts. If the cluster contains heterogeneous sized hosts. larger capacity host are favored first. for example the memLowScore is calculated as followed: memLowScore = Sum across all host below target utilization. DPM considers hosts in order of lower virtual machine evacuation costs. DPM determines that memory is the critical resource and will prioritize memory over CPU recommendations. Table 9: DPM preference If the sort process discovers equal hosts with respect to the capacity or evacuation cost. The formula used for each resource is similar and calculates the weighted distance below/above the target utilization. if a smaller capacity host that can adequately handle the demand is also available. done for a wear-leveling effect. Be aware that sorting of the hosts for power-on or power-off recommendations does not determine the actual order for the selection process to power-on or power-off hosts. it might be possible that DPM will not strictly adhere to its host sort order if doing so would lead to choosing a host with excessively larger capacity than needed. Hosts inside the automatic group are considered before the hosts inside the manual mode group. For power-on recommendations. they are placed in separate groups. DPM calculates the value for CPU and memory resources called cpuLowScore and memLowScore. But under normal circumstances DPM generates the power-off recommendation based on the resource LowScore and HighScore. Hosts with more critical resources (CPU or memory) are sorted before the other hosts in its group. To measure the amount of resource utilization above the target resource utilization range. DPM will randomize the order of hosts. Resource LowScore and HighScore To measure the amount of resource utilization under the target resource utilization.

Host Power-On Recommendations
If the resource utilization evaluation indicates that the cluster contains a host with high utilization, DPM considers generating host power-on recommendations. Before a host is selected, DPM needs to determine how much improvement a power-up operation offers on the distance of the resource utilization to the target utilization range, or how much it reduces the number of highly utilized hosts. DPM iterates through the standby hosts in the sort order described in the previous section and invokes DRS to run simulations; these simulate the distribution of the virtual machines across all hosts in the cluster, including the host currently in standby mode. DPM compares the HighScore value of the cluster in its current state (standby host still down) to the HighScore values of the simulations. If a simulation offers an improved HighScore value when a standby host is powered on, DPM will generate a power-on recommendation for that specific host.

In some cases constraints will limit the host selection, such as the inability to migrate virtual machines to the candidate host once it is powered on, or the fact that the virtual machines that would move to the candidate host are not expected to reduce the load on the highly utilized hosts in the cluster.

DPM is very efficient in homogeneously sized clusters, as DPM will skip every host that is identical, with respect to physical resources or vMotion compatibility, to a host that has already been rejected for a power-on operation during the simulation.

Basic design principle: We recommend using homogeneous clusters, as DPM will operate more efficiently.

DPM continues to run simulations as long as there are hosts in the cluster exceeding the target utilization range. Contrary to a power-off recommendation, redistribution of the virtual machines among the powered-on hosts is not included in the power-on recommendation; for this, DPM relies on future invocation rounds of DRS.

The advanced settings MinPoweredOnCpuCapacity and MinPoweredOnMemCapacity ensure that a minimum amount of capacity, and thereby at least one host, is kept powered on in the cluster. By default these settings have a value of 1 MHz and 1 MB respectively. If the user sets a custom value in these advanced settings, DPM needs to adjust its power-on recommendations to fulfill the requirements defined in these settings. If these settings are altered, it might happen that DPM and DRS do not need the additional physical resources to run the virtual machines at a proper level; the extra ESX host may then sit largely idle, leading to a less efficient power ratio.

Host Power-Off Recommendations
If the resource utilization evaluation indicates low utilization, DPM considers host power-off recommendations. DPM uses a similar approach for power-off recommendations as for power-on recommendations: it iterates through the active hosts in the sort order described in the previous section, evaluates the candidate hosts and uses DRS to run simulations in which the candidate hosts are powered off. These simulations are used by DPM to determine how much impact the power-off operations have on reducing the number of lightly loaded hosts, or on reducing the distance of the lightly utilized hosts to the target resource utilization range, while minimizing the increase of resource utilization on the remaining hosts. DPM compares the LowScore value of the cluster with all candidate hosts active to the LowScore values of the simulations, considering both CPU and memory resources. If a simulation offers an improvement of the LowScore and the HighScore value does not increase, DPM generates a power-off recommendation. This power-off recommendation also contains virtual machine migration recommendations for the virtual machines running on that particular host.

DRS will not mark these virtual machine migration recommendations as priority level 1 migrations; priority level 1 migrations are mandatory moves and only address constraint violations, such as anti-affinity rules or an ESX host entering maintenance mode. If DRS is set to the conservative migration threshold level, DRS will only generate priority level 1 migration recommendations; this threshold does not generate the non-mandatory recommendations needed to rebalance the virtual machine workload across the ESX hosts in the cluster. Setting the migration threshold of DRS to generate only priority level 1 recommendations therefore effectively disables DPM.

Basic design principle: When DPM is activated, make sure DRS is not set to the "conservative" threshold level.

There are several reasons why DPM may not select a specific candidate host for power-off. A host might be rejected if the virtual machines that need to be migrated can only be moved to hosts that would become too heavily utilized; this situation can occur when multiple DRS (anti-)affinity rules are active in the cluster. A second reason can be DRS constraints or objectives. A third factor is that DPM does not select a candidate host to power down if the power-off cost/benefit analysis run by DPM indicates a negative or non-existent benefit. In addition, DPM will not power down a host if doing so would violate the minimum powered-on capacity specified by the settings MinPoweredOnCpuCapacity and MinPoweredOnMemCapacity. Similar to power-on recommendations, DPM continues to run simulations as long as the cluster contains ESX hosts below the target utilization range.

DPM Power-Off Cost/Benefit Analysis
Before DPM generates a power-off recommendation, it calculates the cost associated with powering down the host. DPM takes the following costs into account:

Migrating virtual machines off the candidate host
The power consumed during the power-down period
Unavailable resources of the candidate host during power-down
Loss of performance if the candidate host resources are needed to meet workload demand while the candidate host is powered off
Unavailability of candidate host resources during the power-up period
The power consumed during the power-up period
Cost of migrating virtual machines back onto the candidate host

DPM runs the power-off cost/benefit analysis to compare the costs and risks associated with a power-off operation to the benefit of powering off the host. DPM will only accept a host power-off recommendation if the benefits meet or exceed the performance impact multiplied by the PowerPerformanceRatio setting. The default value of this setting is 40, but it can be modified to a value in the range between 0 and 500; as always, do not change this setting unless the impact of modifying it is known to you. Both cost and benefit calculations include both CPU and memory resources.

The power-off benefit analysis calculates the StableOffTime value, which indicates the amount of time the candidate host is expected to remain powered off until the cluster needs its resources because of an anticipated increase in virtual machine workload. The time that the virtual machine workload is stable and no power-up operations are required is called the ClusterStableTime; DPM uses the virtual machine stable time, calculated by the DRS cost-benefit-risk analysis, as input for the ClusterStableTime calculation. The time it takes to go from applying the power-off recommendation to the actual power-off state is taken into account as well; the analysis breaks this down into the time it takes to migrate all active virtual machines off the host (HostEvacuationTime) and the time it takes to power off the host (HostPowerOffTime). These values are combined in the formula:

StableOffTime = ClusterStableTime – (HostEvacuationTime + HostPowerOffTime)

The power-off cost is calculated as the sum of the following estimated resource costs:

Migration of the active virtual machines running on the candidate host to other ESX hosts
Unsatisfied virtual machine resource demand during the power-on of the candidate host at the end of the ClusterStableTime
Migration of the virtual machines back onto the candidate host

The last two bullet points can only be estimated by DPM. DPM calculates the number of hosts that need to be available at the end of the ClusterStableTime; this calculation is somewhat of a worst-case scenario, as DPM expects all the virtual machines to generate heavy workloads at the end of the ClusterStableTime, thereby producing a conservative value. As previously mentioned, DPM will only recommend a power-off operation if the benefit equals or exceeds the performance impact. It is possible that the ClusterStableTime is low, which can result in a StableOffTime equal to or even less than zero; in that scenario DPM stops evaluating the candidate host for a power-off operation recommendation because it will not offer any benefit.
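The sketch below strings the quantities above together: StableOffTime from the formula, and the acceptance test against the performance impact scaled by PowerPerformanceRatio. All numeric inputs are invented; in reality they come from the DRS cost-benefit-risk analysis and DPM's own estimates.

```python
# Sketch: StableOffTime and the cost/benefit acceptance test described above.
# All numeric inputs are invented placeholders.

POWER_PERFORMANCE_RATIO = 40        # default; allowed range 0-500

def stable_off_time(cluster_stable_time_s, host_evacuation_time_s, host_power_off_time_s):
    return cluster_stable_time_s - (host_evacuation_time_s + host_power_off_time_s)

def accept_power_off(benefit, performance_impact) -> bool:
    # Benefit must meet or exceed the impact scaled by PowerPerformanceRatio.
    return benefit >= performance_impact * POWER_PERFORMANCE_RATIO

off_time = stable_off_time(cluster_stable_time_s=7200,
                           host_evacuation_time_s=600,
                           host_power_off_time_s=120)
print(f"StableOffTime = {off_time} seconds")          # 6480

if off_time <= 0:
    print("No benefit: candidate host is dropped from evaluation")
else:
    # Treat benefit/impact as abstract scores here, purely for illustration.
    print("Power-off accepted:", accept_power_off(benefit=500, performance_impact=10))
```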

Chapter 18
Integration with DRS and High Availability

Distributed Resource Scheduler
DPM tries to match the availability of resources in the cluster to the virtual machine workload and resource demand. During the recommendation generation process, the DRS what-if mode is executed to ensure that the power operation recommendations do not violate the DRS constraints and objectives. In addition, DRS has the ability to bring ESX hosts out of standby mode to acquire the resources needed to match the resource demand created by an unexpected increase in virtual machine workloads. DRS does not distinguish between ESX hosts placed in standby mode by DPM and hosts placed in standby mode manually by the administrator, so DRS might undo the manually placed standby mode the next time DRS runs.

High Availability
If HA strict Admission Control is enabled (the default), DPM will maintain the necessary level of powered-on capacity to meet the configured HA failover capacity: HA places a constraint to prevent DPM from powering down too many ESX hosts if that would violate the Admission Control Policy. Contrary to disconnected hosts or hosts in maintenance mode, HA will consider the unreserved resources provided by a host in standby mode for Admission Control, and the ESX host can be brought out of standby mode if the resources are required.

If HA strict Admission Control is disabled, the failover constraints are not passed on to DPM, and DPM will generate power-off recommendations and place ESX hosts in standby mode regardless of the impact on the HA failover requirements. However, starting with vCenter 4.1, if a failure happens and HA cannot restart some virtual machines due to insufficient powered-on hosts, HA will ask DRS/DPM to power on hosts to accommodate the restart of those virtual machines.

vSphere 4.0 HA clusters have a soft limit on the maximum number of virtual machines per host when the number of ESX hosts in the cluster exceeds nine: in such clusters no more than 40 virtual machines per host are supported, while a cluster with a maximum of eight hosts allows far more than 40 virtual machines per host. DPM does not consider this soft limit and can create a scenario where the remaining nine hosts end up with more than 40 virtual machines each. Fortunately, in vSphere 4.1 this soft limit is removed.

DPM Awareness of High Availability Primary Nodes
DPM does not take the current HA node roles into account when selecting a host, or multiple hosts, for power-down recommendations, simply because DPM is not aware of the different types of HA nodes. To avoid DPM powering down all HA primary nodes, DPM explicitly disables HA on a host before placing it into standby mode. By disabling the HA agent on the host, HA will trigger a new primary node election, resulting in the recalculation of primary nodes for each former primary that is put into standby mode.

DPM Standby Mode
The term "standby mode" used by DPM specifies a powered-down ESX host; the term indicates that the ESX host is available to be powered on should the cluster require its resources. DPM requires the host to be able to awaken from an ACPI S5 state via Wake-On-LAN (WOL) packets or one of two out-of-band methods: Intelligent Platform Management Interface (IPMI) version 1.5 (or higher) or HP Integrated Lights-Out (iLO) technology. Both IPMI and iLO require the availability of a Baseboard Management Controller (BMC) providing access to hardware control functions, allowing the server hardware to be accessed from the vCenter server using a LAN connection. If the host does not offer the hardware support and configuration for any of these protocols, it cannot be placed into standby mode by DPM.

DPM WOL Magic Packet
If the ESX host is not an HP server offering iLO, does not support IPMI version 1.5 or higher, or if the appropriate credentials for using iLO or IPMI have not been configured and set up in vCenter, DPM uses Wake-On-LAN packets to bring the ESX host out of standby mode. The magic packet, the network packet used to bring the server back to life, is sent over the vMotion network by another currently powered-on ESX server in the cluster. For this reason, DPM keeps at least one host powered on in the cluster at all times, managed by the DPM advanced controls MinPoweredOnCpuCapacity and MinPoweredOnMemCapacity, both configured with a default value of 1 MHz and 1 MB respectively.

To use WOL, the ESX host must contain a Network Interface Card that supports the WOL protocol. DPM impacts the configuration of the vMotion network as well: because the magic packet is sent across the vMotion network to a powered-off server, and because most NICs support WOL only if they can switch to 100 Mb/s, the switch port used by the vMotion NIC must be set to auto-negotiate link speed instead of a fixed speed such as 1000 MB/s Full.

Industry best practices advise setting both NIC and switch port to identical settings, so ensure that the vmnic speed is set to auto-negotiate as well.

Protocol Selection Order
If the server is configured for IPMI or iLO, DPM will try to use the protocols in the order IPMI, iLO and finally Wake-On-LAN. If vCenter is unable to power on the ESX host with IPMI, it will try the second protocol, iLO; if this attempt fails too, DPM will try Wake-On-LAN and instructs a powered-on ESX host to send the magic packet. Placing an ESX host in standby mode does not use any power management protocols; vCenter initiates a graceful shutdown of the ESX host.

Baseboard Management Controller
If both IPMI and WOL are present and both are operational, DPM will attempt to use IPMI by default. DPM uses MD5- or plaintext-based authentication with IPMI: only if the BMC reports that it supports MD5 and has the operator role enabled will vCenter use MD5 authentication; vCenter will switch to plaintext authentication if none or only one of these requirements is met. If neither MD5 nor plaintext is enabled or supported, vCenter will not use IPMI and attempts to use Wake-On-LAN. A number of BMC boards require IPMI accounts to be set in the BIOS, and some BMC LAN channels require the ability to send operator-privileged commands; to ensure IPMI is operational, configure the BMC LAN channel to always be available.

DPM and Host Failure Worst Case Scenario
DPM relies on vCenter to power on a standby ESX host when using IPMI or iLO, while another ESX server uses the vMotion network to send the WOL magic packet to power on a standby ESX host. Both technologies therefore depend on another system being available: either vCenter or a powered-on ESX host. Now imagine the following scenario: DPM is enabled in a four-way cluster, HA strict Admission Control is disabled, and due to the minimal amount of required resources, DPM powers down three ESX hosts. The remaining active ESX host runs the vCenter server virtual machine. Due to unlucky circumstances this ESX host fails, thereby failing the vCenter server virtual machine as well and leaving three ESX hosts in standby mode and one ESX host down. Because both IPMI and iLO depend on vCenter, no ESX host can be brought out of standby mode using these methods; and because no ESX host is operational, no magic packet can be sent to the hosts in standby mode using Wake-On-LAN.

As the section title suggests, this is the utmost worst-case scenario that can happen to a DPM cluster, but it must be considered. To avoid this situation, several solutions can be used. Minor-impact solutions include running the vCenter server virtual machine in a non-DPM cluster or disabling DPM on an additional ESX host. A major-impact solution, such as running the vCenter server on a physical server, can also be implemented, but this is something we would not easily recommend. Another valuable option is to configure HA with Admission Control enabled: Admission Control ensures that enough resources are available and will not allow DPM to power down ESX hosts if it would violate the failover requirements.

DRS, DPM and VMware Fault Tolerance
vSphere 4.1 includes DRS-Fault Tolerance integration. When FT is enabled on a virtual machine, the existing virtual machine becomes the primary virtual machine and is powered on onto its registered host; the newly spawned virtual machine, called the secondary virtual machine, is automatically placed on another host. In vSphere 4.0, an anti-affinity rule prohibited the FT primary and secondary virtual machine from running on the same ESX host, and DRS was disabled on both the FT primary and secondary virtual machines: DRS refrained from generating load balancing recommendations for them. Because of this DRS-disable limitation, DPM needed to be disabled on at least two ESX hosts.

Basic design principle: In vSphere 4.0 clusters, disable DPM on two ESX hosts which can act as hosts for Fault Tolerance enabled virtual machines.

vSphere 4.1 allows DRS not only to perform the initial placement of the Fault Tolerance virtual machines, but also to migrate the primary and secondary virtual machines during DRS load balancing operations. DRS is able to select the best suitable host for initial placement and to generate migration recommendations for the FT virtual machines based on the current workload inside the cluster. The new DRS integration removes both the initial placement limitation and the load-balancing limitation, and DPM is now able to move FT virtual machines to other hosts if DPM decides to place the current ESX host in standby mode. This results in a more load-balanced cluster, which likely has a positive effect on the performance of the FT virtual machines. The DRS-FT integration not only has a positive impact on the performance of the FT-enabled virtual machines, and arguably all other virtual machines in the cluster, it also reduces the impact of FT-enabled virtual machines on the virtual infrastructure. In addition, vSphere 4.1 offers the possibility to create a VM-Host affinity rule ensuring that the FT primary and secondary virtual machines do not run on ESX hosts in the same blade chassis, if the design requires this. Be aware that there is a limit of 4 FT-enabled virtual machines per host.

DRS-FT integration requires EVC to be enabled on the cluster. If EVC is not enabled, vCenter reverts back to the vSphere 4.0 behavior and enables the DRS-disable setting on the FT virtual machines. Many companies do not enable EVC on their ESX clusters, based either on FUD (Fear, Uncertainty and Doubt) about performance loss or on the argument that they do not intend to expand their clusters with new types of hardware and thereby keep their clusters homogeneous. The advantages and improvements DRS-FT integration offers, on both performance and the reduction of complexity in cluster design and operational procedures, shed some new light on the discussion about enabling EVC in a homogeneous cluster.

Basic design principle: Enable EVC on vSphere 4.1 clusters to allow DRS to select an appropriate host for placement and to load balance the FT-enabled virtual machines.

Because DRS is able to migrate the FT-enabled virtual machines, DRS can evacuate all virtual machines automatically when an ESX host is placed into maintenance mode. The administrator does not need to manually select an appropriate ESX host and migrate the FT virtual machines to it. This reduces the need for manual operations and for creating "exciting" operational procedures on how to deal with FT-enabled virtual machines during a maintenance window.

DPM Scheduled Tasks
vSphere 4.1 offers the option to enable and disable DPM via scheduled tasks. The DPM "Change cluster power settings" scheduled task allows the administrator to enable or disable DPM via an automated task. If the administrator selects the option DPM off, vCenter will disable all DPM features on the selected cluster, and all hosts in standby mode will be powered on automatically when the scheduled task runs.

This option removes one of the biggest obstacles to implementing DPM. One of the main concerns administrators have is the (periodic) latency incurred when DPM is enabled; usually this latency occurs in the morning. It is common for DPM to place ESX hosts in standby mode during the night due to the decreased workloads. When the employees arrive in the morning, the workload increases and DPM needs to power on additional ESX hosts. If DPM has placed an ESX host in standby mode, it can take up to five minutes before DPM decides to power up the ESX host again, and during this (short) period of time the environment experiences latency or performance loss. The period between 7:30 and 10:00 is recognized as one of the busiest periods of the day, and during that period the IT department wants its computing power locked, stocked and ready to go. A scheduled task to disable DPM gives administrators the ability to power on all hosts before the employees arrive.

Because the ESX hosts remain powered on until the administrator or a DPM scheduled task enables DPM again, another schedule can be created to enable DPM after the period of high workload demand ends. For example, by scheduling a DPM disable task every weekday at 7:00, the administrator is ensured that all ESX hosts are powered on before the morning peak, rather than having to wait for DPM to react to the workload increase. By scheduling the DPM disable task more than one hour in advance of the morning peak, DRS will have the time to rebalance the virtual machines across all active hosts inside the cluster, and the Transparent Page Sharing process can collapse the memory pages shared by the virtual machines on the ESX hosts. By powering up all ESX hosts early, the ESX cluster will be ready to accommodate the load increase.

Chapter 19
Summarizing

Improvements were made to DRS in vSphere 4.1: better integration and more efficient algorithms allow DRS to reach a steady state more quickly when there is significant load imbalance in the cluster. We have tried to simplify some of the concepts to make them easier to understand; still, we acknowledge that some concepts are difficult to grasp. We hope that after reading this section of the book everyone is confident enough to create and configure DRS clusters to achieve higher consolidation ratios at lower costs. If there are any questions, please do not hesitate to reach out to either of the authors.

Appendix A – Basic Design Principles

VMware High Availability

Avoid using static host files as it leads to inconsistency, which makes troubleshooting difficult.

Keep das.failuredetectiontime low for fast responses to failures. If an additional isolation validation address ("das.isolationaddress") has been added, add 5000 to the default "das.failuredetectiontime" (15000).

It is recommended to have a secondary Service Console (ESX) or Management Network (ESXi) running on the same vSwitch as the storage network to detect a storage outage and avoid false positives for isolation detection.

For network-based storage (iSCSI, NFS, FCoE) it is recommended (pre-vSphere 4.0 Update 2) to set the isolation response to "Shut Down" or "Power off".

In blade environments, divide hosts over all blade chassis and never exceed four hosts per chassis to avoid having all primary nodes in a single chassis.

Admission Control guarantees that enough capacity is available for virtual machine failover. When using Admission Control, balance your clusters and be conservative with reservations, especially when using the Host Failures Cluster Tolerates policy, as reservations lead to decreased consolidation ratios.

Avoid using advanced settings to decrease the slot size, as it could lead to more downtime and adds an extra layer of complexity. If there is a large discrepancy in size and reservations are set, it might help to put similarly sized virtual machines into their own cluster.

Be really careful with reservations; if there is no need to have them on a per virtual machine basis, do not configure them. If reservations are needed, resort to resource pool based reservations.

Do the math: verify that any single host has enough resources to power on your largest virtual machine. Although vSphere 4.1 will utilize DRS to try to accommodate the resource requirements of this virtual machine, a guarantee cannot be given. Also take restart priority into account for this/these virtual machine(s).

Do the math and take customer requirements into account. We recommend using a "Percentage" based Admission Control Policy as it is the most flexible policy.

VM Monitoring can substantially increase availability. It is part of the HA stack and we heavily recommend using it!

VMware Distributed Resource Scheduler

Configure vMotion to fully benefit from DRS capabilities.

Leave some CPU capacity in a cluster as unreserved. This helps ensure that vMotion tasks get the resources needed and are completed as quickly as possible.

The number of clusters and virtual machines managed by vCenter influences the number of calculations that impact the performance of vCenter. Take this into account when sizing the vCenter server.

Set the automation level to Automatic to fully benefit from DRS capabilities, and set the automation level to Manual instead of Disabled if more control over placement is required. Leave the virtual machine automation mode set to Default to minimize administration overhead and the possibility of human error.

Select a moderate migration threshold if the cluster hosts virtual machines with varying workloads.

Use VM-Host and VM-VM affinity rules sparingly, as rules can have an impact on the effectiveness of the load balancing calculation; the DRS algorithm has less choice when rules are configured. Mandatory affinity rules apply even when DRS is disabled.

Virtual machines with a larger memory size and/or more virtual CPUs add more constraints to the selection and migration process, while virtual machines with smaller memory sizes or fewer virtual CPUs provide more placement opportunities for DRS. Although DRS migrates virtual machines to gain the most improvement in cluster balance, this does not justify the use of oversized virtual machines: it is recommended to configure the size of the VM to what the VM actually needs, preventing oversizing.

Set a per-VM memory reservation only if a virtual machine absolutely requires guaranteed memory. Enable expandable reservation if a possibly inadequate memory reservation is set on the resource pool and VM-level reservations will be defined.

Configure a virtual machine with a correct memory size instead of applying a memory limit. Because memory limits are invisible to the guest OS and can cause swapping, and because virtual machines cannot allocate more memory than their configured memory anyway, use limits sparingly. If a limit is necessary, use resource pool limits instead of virtual machine limits if possible.

Configure DRS to automation level Automatic if DPM is set to automation level Automatic.

Register templates on a single host and disable DPM on this host.

DPM operates more efficiently in homogeneous clusters.

When DPM is activated, make sure DRS is not set to the "conservative" threshold level.

. be aware there is a limit of 4 FT enabled virtual machines per host.In vSphere 4. Pre-vSphere 4.0 HA will verify if the isolation addresses are pingable by the host during configuration and will raise a configuration issue if this is not the case. it only gives you the option to specify multiple networks. Appendix B – HA Advanced Settings HA is probably the feature with the most advanced settings. das. As of vSphere 4. This however is no longer needed. if the default gateway is a non-pingable address.Number of milliseconds. However. so disable DPM on enough hosts.isolationaddress” to a pingable address and disable the usage of the default gateway by setting this to “false”. The most used and valuable advanced settings are described below: das.0 clusters disable DPM on two ESX hosts which can act as hosts for Fault Tolerant enabled virtual machines. set the “das. In other words.1 clusters to allow DRS to select an appropriate host for placement and allow DRS to load balance the FT-enabled virtual machines.allowNetwork[x] . Although many of them are rarely used some of them are needed in specific situations or included in best practices documents. These networks need to be compatible for HA to configure successful. The impact of this however is that the failover response will be delayed.Enables the use of port group names to control the networks used for HA.failuredetection-time . which is the default isolation address. You can set the value to be “Service Console 2” or ºManagement Networkº to use (only) the networks associated with those port group names in the networking configuration.usedefaultisolation-address . das. Please note that the number [x] has no relationship with the network. das.IP address the ESX host uses to check on isolation when no heartbeats are received. We recommend adding an isolation address when a secondary Service Console is being used for redundancy.isolationaddress[x] .0 it was a general best practice to increase the value to 60000 when an active/standby Service Console setup was used. HA will use the default gateway as an isolation address and the provided values as an additional check. for isolation response action (with a default of 15000 milliseconds). For a host with two Service Consoles or a secondary isolation address it still is a best practice to increase the value to at least 20000. should not or cannot be used for this purpose. where [x] = 110. Enable EVC on the vSphere 4. timeout time.Value can be true or false and needs to be set to false in case the default gateway. where [x] is a number between 0 and 10.

das.slotCpuInMHz - Sets the slot size for CPU to the specified value. This advanced setting can be used when a virtual machine with a large CPU reservation skews the slot size, as this will typically result in a conservative number of available slots. Do not confuse with das.vmCpuMinMHz.

das.slotMemInMB - Sets the slot size for memory to the specified value. This advanced setting can be used when a virtual machine with a large memory reservation skews the slot size, as this will typically result in a conservative number of available slots. Do not confuse with das.vmMemoryMinMB.

das.vmCpuMinMHz - The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with das.slotCpuInMHz.

das.vmMemoryMinMB - The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with das.slotMemInMB.

das.perHostConcurrentFailoversLimit - By default, HA can restart up to 32 VMs concurrently per host; this setting controls the maximum number of concurrent restarts on a single host. Setting a larger value will allow more VMs to be restarted concurrently and might reduce the overall VM recovery time, but the average latency to recover individual VMs might increase.

das.sensorPollingFreq - Sets the time interval for status updates. As of vSphere 4.1 the default value of this setting is 10. It can be configured between 1 and 30. It is not recommended to decrease this value, as it might lead to less scalability due to the overhead of the status updates.

das.bypassNetCompatCheck - Disables the "compatible network" check for HA that was introduced with ESX 3.5 Update 2. The default value is "false"; setting it to "true" disables the check. Disabling this check will enable HA to be configured in a cluster which contains hosts in different subnets, also called incompatible networks.

das.ignoreRedundantNetWarning - Removes the error icon/message from vCenter when you do not have a redundant Service Console connection. The default value is "false"; setting it to "true" will disable the warning. HA must be reconfigured to make the configuration issue go away.

das.failureInterval (VM Monitoring) - The polling interval for failures. The default value is 30 seconds.

das.minUptime (VM Monitoring) - The minimum uptime in seconds before VM Monitoring starts polling. The default value is 120 seconds.

das.maxFailures (VM Monitoring) - Maximum number of virtual machine failures within the window specified by "das.maxFailureWindow"; if this number is reached, VM Monitoring does not restart the machine automatically. The default value is 3.

das.maxFailureWindow (VM Monitoring) - Minimum number of seconds between failures. The default value is 3600 seconds; if a virtual machine fails more than das.maxFailures times within 3600 seconds, VM Monitoring does not restart the machine.

das.vmFailoverEnabled (VM Monitoring) - If set to true, VM Monitoring is enabled; when it is set to false, VM Monitoring is disabled.

Soon in a book store near you: VMware vSphere Clustering Technical Deepdive.
