Clusters and High Availability

Date: Jan 8, 2022

In this sample chapter from VCP-DCV for vSphere 7.x (Exam 2V0-21.20) Official Cert Guide, 4th Edition, you will explore cluster concepts, the Distributed Resource Scheduler (DRS), vSphere High Availability (HA), and other resource management and availability features.

This chapter covers the following topics: Cluster Concepts and Overview, Distributed Resource Scheduler (DRS), vSphere High Availability (HA), and Other Resource Management and Availability Features.

This chapter contains information related to Professional VMware vSphere 7.x (2V0-21.20) exam objectives 1.6, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.4.1, 4.5, 4.6, 5.1, 5.1.1, 5.2, 7.5, and 7.11.5.

This chapter provides details on clusters and high availability in vSphere 7.0.

“Do I Know This Already?” Quiz

The “Do I Know This Already?” quiz allows you to assess whether you should study this entire chapter or move quickly to the “Exam Preparation Tasks” section. In any case, the authors recommend that you read the entire chapter at least once. Table 4-1 outlines the major headings in this chapter and the corresponding “Do I Know This Already?” quiz questions. You can find the answers in Appendix A, “Answers to the ‘Do I Know This Already?’ Quizzes and Review Questions.”

Table 4-1 “Do I Know This Already?” Section-to-Question Mapping

Foundation Topics Section

Questions

Cluster Concepts and Overview

1

Distributed Resource Scheduler (DRS)

2–4

vSphere High Availability (HA)

5–7

Other Resource Management and Availability Features

8–10

  1. You are configuring EVC Mode in a vSphere cluster that uses Intel hardware. Which of the following values should you choose to set the EVC Mode to the lowest level that includes the SSE4.2 instruction set?

    1. Merom

    2. Penryn

    3. Nehalem

    4. Westmere

  2. In vSphere 7.0, you want to configure the DRS migration threshold such that it is at the minimum level at which the virtual machine happiness is considered. Which of the following values should you choose?

    1. Level 1

    2. Level 2

    3. Level 3

    4. Level 4

    5. Level 5

  3. Which of the following is not a good use for resource pools in DRS?

    1. To delegate control and management

    2. To impact the use of network resources

    3. To impact the use of CPU resources

    4. To impact the use of memory resources

  4. You need your resource pool to use a two-pass algorithm to allocate reservations. In the second pass, excess pool reservation is allocated proportionally to virtual machines (limited by virtual machine size). Which step should you take?

    1. Ensure that vSphere 6.7 or higher is used.

    2. Ensure that vSphere 7.0 or higher is used.

    3. Enable scalable shares.

    4. Enable expandable reservations.

  5. You are configuring vSphere HA in a cluster. You want to configure the cluster to use a specific host as a target for failovers. Which setting should you use?

    1. Host Failures Cluster Tolerates

    2. Define Host Failover Capacity By set to Cluster Resource Percentage

    3. Define Host Failover Capacity By set to Slot Policy (Powered-on VMs)

    4. Define Host Failover Capacity By set to Dedicated Failover Hosts

    5. Define Host Failover Capacity By set to Disabled

  6. You are enabling VM Monitoring in a vSphere HA cluster. You want to set the monitoring level such that its failure interval is 60 seconds. Which of the following options should you choose?

    1. High

    2. Medium

    3. Low

    4. Normal

  7. You are configuring Virtual Machine Component Protection (VMCP) in a vSphere HA cluster. Which of the following statements is true?

    1. For PDL and APD failures, you can control the restart policy for virtual machines by setting it to Conservative or Aggressive.

    2. For PDL failures, you can control the restart policy for virtual machines by setting it to Conservative or Aggressive.

    3. For APD failures, you can control the restart policy for virtual machines by setting it to Conservative or Aggressive.

    4. For PDL and APD failures, you cannot control the restart policy for virtual machines.

  8. You want to use Predictive DRS. What is the minimum vSphere version you need?

    1. vSphere 6.0

    2. vSphere 6.5

    3. vSphere 6.7

    4. vSphere 7.0

  9. You are configuring vSphere Fault Tolerance (FT) in a vSphere 7.0 environment. What is the maximum number of virtual CPUs you can use with an FT-protected virtual machine?

    1. One

    2. Two

    3. Four

    4. Eight

  10. You are concerned about service availability for your vCenter Server. Which of the following statements is true?

    1. If a vCenter service fails, VMware Service Lifecycle Manager restarts it.

    2. If a vCenter service fails, VMware Lifecycle Manager restarts it.

    3. If a vCenter service fails, vCenter Server HA restarts it.

    4. VMware Service Lifecycle Manager is a part of the PSC.

Foundation Topics

Cluster Concepts and Overview

A vSphere cluster is a set of ESXi hosts that are intended to work together as a unit. When you add a host to a cluster, the host’s resources become part of the cluster’s resources. vCenter Server manages the resources of all hosts in a cluster as one unit. In addition to creating a cluster, assigning a name, and adding ESXi objects, you can enable and configure features on a cluster, such as vSphere Distributed Resource Scheduler (DRS), VMware Enhanced vMotion Compatibility (EVC), Distributed Power Management (DPM), vSphere High Availability (HA), and vSAN.

In the vSphere Client, you can manage and monitor the resources in a cluster as a single object. You can easily monitor and manage the hosts and virtual machines in the DRS cluster.

If you enable VMware EVC on a cluster, you can ensure that migrations with vMotion do not fail due to CPU compatibility errors. If you enable vSphere DRS on a cluster, you can allow automatic resource balancing using the pooled host resources in the cluster. If you enable vSphere HA on a cluster, you can allow rapid virtual machine recovery from host hardware failures, using the cluster’s available host resource capacity. If you enable DPM on a cluster, you can provide automated power management in the cluster. If you enable vSAN on a cluster, you use a logical SAN that is built on a pool of drives attached locally to the ESXi hosts in the cluster.

You can use the Quickstart workflow in the vSphere Client to create and configure a cluster. The Quickstart page provides three cards: Cluster Basics, Add Hosts, and Configure Cluster. For an existing cluster, you can use Cluster Basics to change the cluster name and enable cluster services, such as DRS and vSphere HA. You can use the Add Hosts card to add hosts to the cluster. You can use the Configure Cluster card to configure networking and other settings on the hosts in the cluster.

In addition, in vSphere 7.0 you can configure a few general settings for a cluster. For example, when you create a cluster, even if you do not enable DRS, vSphere HA, or vSAN, you can choose to manage all hosts in the cluster with a single image. With this option, all hosts in a cluster inherit the same image, which reduces variability between hosts, improves your ability to ensure hardware compatibility, and simplifies upgrades. This feature requires hosts to already be ESXi 7.0 or later. It replaces baselines; once it is enabled, baselines cannot be used in this cluster.

Enhanced vMotion Compatibility (EVC)

EVC is a cluster setting that can improve CPU compatibility between hosts for supporting vMotion. vMotion migrations are live migrations that require compatible instruction sets for source and target processors used by the virtual machine. The source and target processors must come from the same vendor class (AMD or Intel) to be vMotion compatible. The clock speed, cache size, and number of cores can differ between source and target processors. When you start a vMotion migration or a migration of a suspended virtual machine, the wizard checks the destination host for compatibility; it displays an error message if problems exist. Using EVC, you can allow vMotion between some processors that would normally be incompatible.

The CPU instruction set that is available to a virtual machine guest OS is determined when the virtual machine is powered on. This CPU feature set is based on the following items:

EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if the actual CPUs on the hosts differ. If you enable the EVC cluster setting, you can configure the EVC Mode with a baseline CPU feature set. EVC ensures that hosts in a cluster use the baseline feature set when presenting an instruction set to a guest OS. EVC uses AMD-V Extended Migration technology for AMD hosts and Intel FlexMigration technology for Intel hosts to mask processor features; this allows hosts to present the feature set of an earlier generation of processor. You should configure EVC Mode to accommodate the host with the smallest feature set in the cluster.

The EVC requirements for hosts include the following.

You can configure the EVC settings by using the Quickstart > Configure Cluster workflow in the vSphere Client. You can also configure EVC directly in the cluster settings. The options for VMware EVC are Disable EVC, Enable EVC for AMD Hosts, and Enable EVC for Intel Hosts.

If you choose Enable EVC for Intel Hosts, you can set the EVC Mode to one of the options described in Table 4-2.

Table 4-2 EVC Modes for Intel

Level

EVC Mode

Description

L0

Intel Merom

Smallest Intel feature set for EVC mode.

L1

Intel Penryn

Includes the Intel Merom feature set and exposes additional CPU features, including SSE4.1.

L2

Intel Nehalem

Includes the Intel Penryn feature set and exposes additional CPU features, including SSE4.2 and POPCOUNT.

L3

Intel Westmere

Includes the Intel Nehalem feature set and exposes additional CPU features, including AES and PCLMULQDQ.

L4

Intel Sandy Bridge

Includes the Intel Westmere feature set and exposes additional CPU features, including AVX and XSAVE.

L5

Intel Ivy Bridge

Includes the Intel Sandy Bridge feature set and exposes additional CPU features, including RDRAND, ENFSTRG, FSGSBASE, SMEP, and F16C.

L6

Intel Haswell

Includes the Intel Ivy Bridge feature set and exposes additional CPU features, including ABMX2, AVX2, MOVBE, FMA, PERMD, RORX/MULX, INVPCID, and VMFUNC.

L7

Intel Broadwell

Includes the Intel Haswell feature set and exposes additional CPU features, including Transactional Synchronization Extensions, Supervisor Mode Access Prevention, Multi-Precision Add-Carry Instruction Extensions, PREFETCHW, and RDSEED.

L8

Intel Skylake

Includes the Intel Broadwell feature set and exposes additional CPU features, including Advanced Vector Extensions 512, Persistent Memory Support Instructions, Protection Key Rights, Save Processor Extended States with Compaction, and Save Processor Extended States Supervisor.

L9

Intel Cascade Lake

Includes the Intel Skylake feature set and exposes additional CPU features, including VNNI and XGETBV with ECX=1.
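
To make Table 4-2 concrete, the following Python sketch (illustrative only, not a VMware API) models the lower Intel EVC levels as an ordered list of cumulative feature sets and picks the lowest mode that exposes a required instruction set; for example, SSE4.2 first appears at L2 (Intel Nehalem). The data structure and function are invented for the example, and levels L6 through L9 are omitted for brevity.

```python
# Illustrative model of the lower Intel EVC levels from Table 4-2 (not a VMware API).
# Each level adds features on top of the previous one, so a mode's cumulative
# feature set is the union of its own additions and all lower levels.

INTEL_EVC_LEVELS = [
    ("L0", "Intel Merom", set()),
    ("L1", "Intel Penryn", {"SSE4.1"}),
    ("L2", "Intel Nehalem", {"SSE4.2", "POPCOUNT"}),
    ("L3", "Intel Westmere", {"AES", "PCLMULQDQ"}),
    ("L4", "Intel Sandy Bridge", {"AVX", "XSAVE"}),
    ("L5", "Intel Ivy Bridge", {"RDRAND", "ENFSTRG", "FSGSBASE", "SMEP", "F16C"}),
]

def lowest_evc_mode_with(feature: str) -> str:
    """Return the lowest EVC mode whose cumulative feature set includes `feature`."""
    cumulative = set()
    for level, mode, added in INTEL_EVC_LEVELS:
        cumulative |= added
        if feature in cumulative:
            return f"{level} ({mode})"
    raise ValueError(f"{feature} is not exposed by any modeled EVC level")

print(lowest_evc_mode_with("SSE4.2"))  # L2 (Intel Nehalem)
print(lowest_evc_mode_with("AVX"))     # L4 (Intel Sandy Bridge)
```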

If you choose Enable EVC for AMD Hosts, you can set EVC Mode to one of the options described in Table 4-3.

Table 4-3 EVC Modes for AMD

Level

EVC Mode

Description

A0

AMD Opteron Generation 1

Smallest AMD feature set for EVC mode.

A1

AMD Opteron Generation 2

Includes the AMD Generation 1 feature set and exposes additional CPU features, including CMPXCHG16B and RDTSCP.

A3

AMD Opteron Generation 3

Includes the AMD Generation 2 feature set and exposes additional CPU features, including SSE4A, MisAlignSSE, POPCOUNT, and ABM (LZCNT).

A2, B0

AMD Opteron Generation 3 (without 3DNow!)

Includes the AMD Generation 3 feature set without 3DNow support.

B1

AMD Opteron Generation 4

Includes the AMD Generation 3 (without 3DNow!) feature set and exposes additional CPU features, including SSSE3, SSE4.1, AES, AVX, XSAVE, XOP, and FMA4.

B2

AMD Opteron Piledriver

Includes the AMD Generation 4 feature set and exposes additional CPU features, including FMA, TBM, BMI1, and F16C.

B3

AMD Opteron Steamroller

Includes the AMD Piledriver feature set and exposes additional CPU features, including XSAVEOPT, RDFSBASE, RDGSBASE, WRFSBASE, WRGSBASE, and FSGSBASE.

B4

AMD Zen

Includes the AMD Steamroller feature set and exposes additional CPU features, including RDRAND, SMEP, AVX2, BMI2, MOVBE, ADX, RDSEED, SMAP, CLFLUSHOPT, XSAVES, XSAVEC, SHA, and CLZERO.

B5

AMD Zen 2

Includes the AMD Zen feature set and exposes additional CPU features, including CLWB, UMIP, RDPID, XGETBV with ECX = 1, WBNOINVD, and GMET.

vSAN Services

You can enable DRS, vSphere HA, and vSAN at the cluster level. The following sections provide details on DRS and vSphere HA. For details on vSAN, see Chapter 2.

Distributed Resource Scheduler (DRS)

DRS distributes compute workload in a cluster by strategically placing virtual machines during power-on operations and live migrating (vMotion) VMs when necessary. DRS provides many features and settings that enable you to control its behavior.

You can set DRS Automation Mode for a cluster to one of the following: Manual (DRS generates placement and migration recommendations, which you apply manually), Partially Automated (DRS automates initial placement and generates migration recommendations), or Fully Automated (DRS automates both initial placement and load-balancing migrations).

You can override Automation Mode at the virtual machine level.

Recent DRS Enhancements

VMware added many improvements to DRS beginning in vSphere 6.5. For example, in vSphere 7.0, DRS runs once every minute rather than every 5 minutes, as in older DRS versions. The newer DRS versions tend to recommend smaller (in terms of memory) virtual machines for migration to facilitate faster vMotion migrations, whereas older versions tend to recommend large virtual machines to minimize the number of migrations. Older DRS versions use an imbalance metric that is derived from the standard deviation of load across the hosts in the cluster. Newer DRS versions focus on virtual machine happiness. Newer DRS versions are much lighter and faster than the older versions.

Newer DRS versions recognize that vMotion is an expensive operation and account for it in their recommendations. In a cluster where virtual machines are frequently powered on and the workload is volatile, it is not necessary to continuously migrate virtual machines. DRS calculates the gain duration for live migrating a virtual machine and considers the gain duration when making recommendations.

The following sections provide details on other recent DRS enhancements.

Network-Aware DRS

In vSphere 6.5, DRS considers the utilization of host network adapters during initial placement and load balancing, but it does not balance the network load. Instead, its goal is to ensure that the target host has sufficient available network resources. It works by eliminating hosts with saturated networks from the list of possible migration hosts. The threshold used by DRS for network saturation is 80% by default. When DRS cannot migrate VMs due to network saturation, the result may be an imbalanced cluster.

In vSphere 7.0, DRS uses a new cost modeling algorithm that is flexible and balances network bandwidth along with CPU and memory usage.
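
The following sketch is a minimal, purely illustrative model of the vSphere 6.5 filtering behavior described above: hosts whose network utilization is at or above the saturation threshold (80% by default) are dropped from the list of candidate migration targets. The host list and field names are invented for the example.

```python
# Purely illustrative model of network-aware DRS host filtering (vSphere 6.5):
# hosts whose network utilization meets or exceeds the saturation threshold are
# removed from the list of candidate migration targets.

NETWORK_SATURATION_THRESHOLD = 0.80  # default threshold described in the text

def candidate_hosts(hosts):
    """Keep only hosts whose network utilization is below the saturation threshold."""
    return [h for h in hosts if h["net_utilization"] < NETWORK_SATURATION_THRESHOLD]

hosts = [
    {"name": "esx01", "net_utilization": 0.55},
    {"name": "esx02", "net_utilization": 0.85},  # saturated: excluded from placement
    {"name": "esx03", "net_utilization": 0.30},
]
print([h["name"] for h in candidate_hosts(hosts)])  # ['esx01', 'esx03']
```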

Virtual Machine Distribution

Starting in vSphere 6.5, you can enable an option to distribute virtual machines more evenly across hosts. The main use case for this is to improve availability. The primary goal of DRS—to ensure that all VMs are getting the resources they need and that the load is balanced in the cluster—remains unchanged. But with this option enabled, DRS also tries to keep the number of virtual machines per host balanced across the cluster.

Memory Metric for Load Balancing

Historically, vSphere has used the Active Memory metric for load-balancing decisions. In vSphere 6.5 and 6.7, you have the option to set DRS to balance the load based on the Consumed Memory metric. In vSphere 7.0, the Granted Memory metric is used for load balancing, and no cluster option is available to change the behavior.

Virtual Machine Initial Placement

Starting with vSphere 6.5, DRS uses a new initial placement algorithm that is faster, lighter, and more effective than the previous algorithm. In earlier versions, DRS takes a snapshot of the cluster state when making virtual machine placement recommendations. In the new algorithm, DRS does not snapshot the cluster state, which allows for faster and more accurate recommendations. With the new algorithm, DRS powers on virtual machines much more quickly. In vSphere 6.5, the new placement feature is not supported for the following configurations:

In vSphere 6.7, the new placement is available for all configurations.

Enhancements to the Evacuation Workflow

Prior to vSphere 6.5, when evacuating a host entering Maintenance Mode, DRS waited to migrate templates and powered-off virtual machines until after the completion of vMotion migrations, leaving those objects unavailable for use for a long time. Starting in vSphere 6.5, DRS prioritizes the migration of virtual machine templates and powered-off virtual machines over powered-on virtual machines, making those objects available for use without waiting on vMotion migrations.

Prior to vSphere 6.5, the evacuation of powered-off virtual machines was inefficient. Starting in vSphere 6.5, these evacuations occur in parallel, making use of up to 100 re-register threads per vCenter Server. This means that you may see only a small difference when evacuating up to 100 virtual machines.

Starting in vSphere 6.7, DRS is more efficient in evacuating powered-on virtual machines from a host that is entering Maintenance Mode. Instead of simultaneously initiating vMotion for all the powered-on VMs on the host, as in previous versions, DRS initiates vMotion migrations in batches of eight at a time. Each vMotion batch is issued after the previous batch completes. The vMotion batching makes the entire workflow more controlled and predictable.

DRS Support for NVM

Starting in vSphere 6.7, DRS supports virtual machines running on next-generation persistent memory devices, known as non-volatile memory (NVM) devices. NVM is exposed as a datastore that is local to the host. Virtual machines can use the datastore as an NVM device exposed to the guest (Virtual Persistent Memory [vPMem]) or as a location for a virtual machine disk (Virtual Persistent Memory Disk [vPMemDisk]). DRS is aware of the NVM devices used by virtual machines and guarantees that the destination ESXi host has enough free persistent memory to accommodate placements and migrations.

How DRS Scores VMs

Historically, DRS balanced the workload in a cluster based on host compute resource usage. In vSphere 7.0, DRS balances the workload based on virtual machine happiness. A virtual machine’s DRS score is a measure of its happiness, which, in turn, is a measure of the resources available for consumption by the virtual machine. The higher the DRS score for a VM, the better its resource availability. DRS moves virtual machines to improve their DRS scores. DRS also calculates a DRS score for a cluster, which is a weighted sum of the DRS scores of all the virtual machines in the cluster.

In vSphere 7.0, DRS calculates the score for each virtual machine on each ESXi host in the cluster every minute. Simply put, DRS logic computes an ideal throughput (demand) and an actual throughput (goodness) for each resource (CPU, memory, and network) for each virtual machine. The virtual machine’s efficiency for a particular resource is the ratio of the goodness to the demand. A virtual machine’s DRS score (total efficiency) is the product of its CPU, memory, and network efficiencies.

When calculating the efficiency, DRS applies resource costs. For CPU resources, DRS includes costs for CPU cache, CPU ready, and CPU tax. For memory resources, DRS includes costs for memory burstiness, memory reclamation, and memory tax. For network resources, DRS includes a network utilization cost.

DRS compares a virtual machine’s DRS score on the host where it currently runs with the score it would have on other hosts in the cluster. If another host can provide a better DRS score, DRS calculates the cost of migrating the virtual machine to that host and factors the cost into its load-balancing decision.
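
The arithmetic described above can be summarized in a short, purely illustrative sketch: per-resource efficiency is goodness divided by demand, and the VM’s DRS score is the product of the CPU, memory, and network efficiencies. The input numbers are invented, and the real DRS cost model applies the additional resource costs noted earlier.

```python
# Illustrative DRS-score arithmetic: efficiency per resource = goodness / demand,
# and the VM's overall DRS score is the product of the per-resource efficiencies.
# Values are made up; the real DRS model also applies per-resource costs.

def resource_efficiency(goodness: float, demand: float) -> float:
    """Actual throughput divided by ideal throughput for one resource."""
    return goodness / demand if demand else 1.0

def vm_drs_score(cpu, mem, net) -> float:
    """Each argument is a (goodness, demand) pair for that resource."""
    score = 1.0
    for goodness, demand in (cpu, mem, net):
        score *= resource_efficiency(goodness, demand)
    return score

# Example: a VM getting 90% of its CPU demand, 95% of memory, 100% of network.
print(round(vm_drs_score((900, 1000), (950, 1000), (100, 100)), 3))  # 0.855
```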

DRS Rules

You can configure rules to control the behavior of DRS.

A VM–host affinity rule specifies whether the members of a selected virtual machine DRS group can run on the members of a specific host DRS group. Unlike a virtual machine–to–virtual machine (VM–VM) affinity rule, which specifies affinity (or anti-affinity) between individual virtual machines, a VM–host affinity rule specifies an affinity relationship between a group of virtual machines and a group of hosts. There are required rules (designated by “must”) and preferential rules (designated by “should”).

A VM–host affinity rule includes the following components:

A VM–VM affinity rule specifies whether selected individual virtual machines should run on the same host or be kept on separate hosts. This type of rule is used to create affinity or anti-affinity between individual virtual machines. When an affinity rule is created, DRS tries to keep the specified virtual machines together on the same host. You might want to do this, for example, for performance reasons.

With an anti-affinity rule, DRS tries to keep the specified virtual machines apart. You can use such a rule if you want to guarantee that certain virtual machines are always on different physical hosts. In that case, if a problem occurs with one host, not all virtual machines are at risk. You can create VM–VM affinity rules to specify whether selected individual virtual machines should run on the same host or be kept on separate hosts.

VM–VM affinity rule conflicts can occur when you use multiple VM–VM affinity and VM–VM anti-affinity rules. If two VM–VM affinity rules are in conflict, you cannot enable both of them. For example, if one rule keeps two virtual machines together and another rule keeps the same two virtual machines apart, you cannot enable both rules. Select one of the rules to apply and disable or remove the conflicting rule. When two VM–VM affinity rules conflict, the older one takes precedence, and the newer rule is disabled. DRS tries to satisfy only enabled rules and ignores disabled rules. DRS gives higher precedence to preventing violations of anti-affinity rules than violations of affinity rules.

DRS Migration Sensitivity

Prior to vSphere 7.0, DRS used a migration threshold to determine when virtual machines should be migrated to balance the cluster workload. In vSphere 7.0, DRS does not consider cluster standard deviation for load balancing. Instead, it is designed to be more virtual machine centric and workload centric rather than cluster centric. You can set the DRS Migration Sensitivity parameter to one of the following values:

Resource Pools

Resource pools are container objects in the vSphere inventory that are used to compartmentalize the CPU and memory resources of a host, a cluster, or a parent resource pool. Virtual machines run in and draw resources from resource pools. You can create multiple resource pools as direct children of a standalone host or a DRS cluster. You cannot create child resource pools on a host that has been added to a cluster or on a cluster that is not enabled for DRS.

You can use resource pools to organize VMs. You can delegate control over each resource pool to specific individuals and groups. You can monitor resources and set alarms on resource pools. If you need a container just for organization and permission purposes, consider using a folder. If you also need resource management, then consider using a resource pool. You can assign resource settings such as shares, reservations, and limits to resource pools.

Use Cases

You can use resource pools to compartmentalize a cluster’s resources and then use the resource pools to delegate control to individuals or organizations. Table 4-4 provides some use cases for resource pools.

Table 4-4 Resource Pool Use Cases

Use Case

Details

Flexible hierarchical organization

Add, remove, modify, and reorganize resource pools, as needed.

Resource isolation

Use resource pools to allocate resources to separate departments, in such a manner that changes in a pool do not unfairly impact other departments.

Access control and delegation

Use permissions to delegate activities, such as virtual machine creation and management, to other administrators.

Separation of resources from hardware

In a DRS cluster, perform resource management independently of the actual hosts.

Managing multitier applications

Manage the resources for a group of virtual machines (in a specific resource pool), which is easier than managing resources per virtual machine.

Shares, Limits, and Reservations

You can configure CPU and memory shares, reservations, and limits on resource pools, as described in Table 4-5.

Table 4-5 Shares, Limits, and Reservations

Option

Description

Shares

Shares specify the relative importance of a virtual machine or a resource pool. If a virtual machine has twice as many shares of a resource as another virtual machine, it is entitled to consume twice as much of that resource when these two virtual machines are competing for resources. Shares can be thought of as priority under contention.

Shares are typically set to High, Normal, or Low, and these values specify share values with a 4:2:1 ratio. You can also select Custom and assign a specific number of shares (to express a proportional weight).

A resource pool uses its shares to compete for the parent’s resources and is allocated a portion based on the ratio of the pool’s shares compared with its siblings. Siblings share the parent’s resources according to their relative share values, bounded by the reservation and limit.

For example, consider a scenario where a cluster has two child resource pools with normal CPU shares, another child resource pool with high CPU shares, and no other child objects. During periods of contention, each of the pools with normal shares would get access to 25% of the cluster’s CPU resources, and the pool with high shares would get access to 50%.

Reservations

A reservation specifies the guaranteed minimum allocation for a virtual machine or a resource pool. A CPU reservation is expressed in megahertz, and a memory reservation is expressed in megabytes. You can power on a virtual machine only if there are enough unreserved resources to satisfy the reservation of the virtual machine. If the virtual machine starts, then it is guaranteed that amount, even when the physical server is heavily loaded.

For example, if you configure the CPU reservation for each virtual machine as 1 GHz, you can start eight VMs in a resource pool where the CPU reservation is set for 8 GHz and expandable reservations are disabled. But you cannot start additional virtual machines in the pool.

You can use reservations to guarantee a specific amount of resources for a resource pool. The default value for a resource pool’s CPU or memory reservation is 0. If you change this value, it is subtracted from the unreserved resources of the parent. The resources are considered reserved, regardless of whether virtual machines are associated with the resource pool.

Expandable reservations

You can enable expandable reservations to effectively allow a child resource pool to borrow from its parent. Expandable reservations, which are enabled by default, are considered during admission control. When powering on a virtual machine, if the resource pool does not have sufficient unreserved resources, the resource pool can use resources from its parent or ancestors.

For example, say that in a resource pool where 8 GHz is reserved and expandable reservations are disabled, you try to start nine virtual machines, each with a 1 GHz reservation; the ninth virtual machine does not start. If you enable expandable reservations in the resource pool, and its parent pool (or cluster) has sufficient unreserved CPU resources, you can start the ninth virtual machine, as illustrated in the sketch following Table 4-5.

Limits

A limit specifies an upper bound for CPU or memory resources that can be allocated to a virtual machine or a resource pool.

You can set a limit on the amount of CPU and memory allocated to a resource pool. The default is unlimited. For example, if you power on multiple CPU-intensive virtual machines in a resource pool, where the CPU limit is 10 GHz, then, collectively, the virtual machines cannot use more than 10 GHz CPU resources, regardless of the pool’s reservation settings, the pool’s share settings, or the amount of available resources in the parent.
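
To illustrate how the Reservations and Expandable reservations rows above interact during admission control (the 8 GHz pool and nine 1 GHz virtual machines example), here is a simplified Python sketch. The ResourcePool class and its methods are invented for illustration and ignore details such as subtracting a child pool's reservation from its parent.

```python
# Illustrative admission check for powering on a VM in a resource pool.
# With expandable reservations, a pool that lacks unreserved capacity may borrow
# from its parent (and ancestors). Simplified sketch; not vSphere's actual logic.

class ResourcePool:
    def __init__(self, name, reservation_mhz, expandable=False, parent=None):
        self.name = name
        self.reservation = reservation_mhz
        self.expandable = expandable
        self.parent = parent
        self.reserved_by_children = 0  # MHz already claimed by running VMs

    def unreserved(self):
        return self.reservation - self.reserved_by_children

    def try_reserve(self, mhz):
        """Reserve `mhz` locally, or borrow the shortfall from ancestors if expandable."""
        if self.unreserved() >= mhz:
            self.reserved_by_children += mhz
            return True
        shortfall = mhz - max(self.unreserved(), 0)
        if self.expandable and self.parent and self.parent.try_reserve(shortfall):
            self.reserved_by_children += mhz
            return True
        return False

cluster = ResourcePool("cluster", reservation_mhz=20000)
pool = ResourcePool("dev-pool", reservation_mhz=8000, expandable=False, parent=cluster)

results = [pool.try_reserve(1000) for _ in range(9)]
print(results.count(True))     # 8 -> the ninth 1 GHz VM is rejected

pool.expandable = True
print(pool.try_reserve(1000))  # True -> borrows from the cluster's unreserved capacity
```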

Table 4-6 provides the CPU and memory share values for virtual machines when using the High, Normal, and Low settings. The corresponding share values for a resource pool are equivalent to those of a virtual machine with four vCPUs and 16 GB memory.

Table 4-6 Virtual Machine Shares

Setting

CPU Share Value

Memory Share Value

High

2000 per vCPU

20 per MB

Normal

1000 per vCPU

10 per MB

Low

500 per vCPU

5 per MB

For example, the share values for a resource pool configured with normal CPU shares and high memory shares are 4000 (that is, 4 × 1000) CPU shares and 327,680 (that is, 16 × 1024 × 20) memory shares.
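
The share values in the preceding example follow directly from Table 4-6. The short sketch below reproduces that arithmetic; the per-unit share values come from the table, and the helper function is illustrative only.

```python
# Share values from Table 4-6 (per vCPU / per MB of configured memory).
CPU_SHARES_PER_VCPU = {"high": 2000, "normal": 1000, "low": 500}
MEM_SHARES_PER_MB   = {"high": 20,   "normal": 10,   "low": 5}

def pool_share_values(cpu_setting, mem_setting, vcpus=4, memory_gb=16):
    """A resource pool's shares equal those of a VM with 4 vCPUs and 16 GB RAM."""
    cpu_shares = vcpus * CPU_SHARES_PER_VCPU[cpu_setting]
    mem_shares = memory_gb * 1024 * MEM_SHARES_PER_MB[mem_setting]
    return cpu_shares, mem_shares

# Normal CPU shares and high memory shares, as in the example in the text.
print(pool_share_values("normal", "high"))  # (4000, 327680)
```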

Enhanced Resource Pool Reservation

Starting in vSphere 6.7, DRS uses a new two-pass algorithm to allocate resource reservations to children. The old allocation model does not reserve more resources than the current demand, even when the resource pool is configured with a higher reservation. When a spike in virtual machine demand occurs after resource allocation is complete, DRS does not make the remaining pool reservation available to the virtual machine until the next allocation operation occurs. As a result, a virtual machine’s performance may be temporarily impacted. In the new allocation model, each allocation operation uses two passes. In the first pass, the resource pool reservation is allocated based on virtual machine demand. In the second pass, excess pool reservation is allocated proportionally, limited by the virtual machine’s configured size, which reduces the performance impact due to virtual machine spikes.

Scalable Shares

Another new DRS feature in vSphere 7.0 is scalable shares. The main use case for scalable shares is a scenario in which you want to use shares to give high-priority resource access to a set of virtual machines in a resource pool, without concern for the relative number of objects in the pool compared to other pools. With standard shares, each pool in a cluster competes for resource allocation with its siblings, based on the share ratio. With scalable shares, the allocation for each pool factors in the number of objects in the pool.

For example, consider a scenario in which a cluster with 100 GHz CPU capacity has a high-priority resource pool with CPU Shares set to High and a low-priority resource pool with CPU Shares set to Normal, as shown in Figure 4-1. This means that the share ratio between the pools is 2:1, so the high-priority pool is effectively allocated twice the CPU resources as the low-priority pool whenever CPU contention exists in the cluster. The high-priority pool is allocated 66.7 GHz, and the low-priority pool is effectively allocated 33.3 GHz. In this cluster, 40 virtual machines of equal size are running, with 32 in the high-priority pool and 8 in the low-priority pool. The virtual machines are all demanding CPU resources, causing CPU contention in the cluster. In the high-priority pool, each virtual machine is allocated 2.1 GHz. In the low-priority pool, each virtual machine is allocated 4.2 GHz.

FIGURE 4-1 Scalable Shares Example

If you want to change the resource allocation such that each virtual machine in the high-priority pool is effectively allocated more resources than the virtual machines in the low-priority pool, you can use scalable shares. If you enable scalable shares in the cluster, DRS effectively allocates resources to the pools based on the Shares settings and the number of virtual machines in the pool. In this example, the CPU shares for the pools provide a 2:1 ratio. Factoring this with the number of virtual machines in each pool, the allocation ratio between the high-priority pool and the low-priority pool is 2 times 32 to 1 times 8, or simply 8:1. The high-priority pool is allocated 88.9 GHz, and the low-priority pool is allocated 11.1 GHz. Each virtual machine in the high-priority pool is allocated 2.8 GHz. Each virtual machine in the low-priority pool is allocated 1.4 GHz.
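
The following sketch reproduces the arithmetic of this scalable shares example: with standard shares, the two pools split the 100 GHz cluster 2:1; with scalable shares, each pool’s share value is first multiplied by its virtual machine count. The pool names and helper function are illustrative, not part of DRS.

```python
# Reproduces the scalable-shares example: a 100 GHz cluster with a high-priority
# pool (CPU shares High, 32 VMs) and a low-priority pool (CPU shares Normal, 8 VMs).
# Illustrative arithmetic only, not DRS's actual implementation.

CLUSTER_CAPACITY_GHZ = 100.0
pools = [
    {"name": "high-priority", "share_weight": 2, "vm_count": 32},  # High:Normal = 2:1
    {"name": "low-priority",  "share_weight": 1, "vm_count": 8},
]

def allocate(pools, scalable):
    # With scalable shares, each pool's effective shares scale with its VM count.
    weights = {p["name"]: p["share_weight"] * (p["vm_count"] if scalable else 1)
               for p in pools}
    total = sum(weights.values())
    for p in pools:
        pool_ghz = CLUSTER_CAPACITY_GHZ * weights[p["name"]] / total
        per_vm = pool_ghz / p["vm_count"]
        print(f'{p["name"]}: {pool_ghz:.1f} GHz total, {per_vm:.1f} GHz per VM')

allocate(pools, scalable=False)  # 66.7 / 33.3 GHz -> 2.1 and 4.2 GHz per VM
allocate(pools, scalable=True)   # 88.9 / 11.1 GHz -> 2.8 and 1.4 GHz per VM
```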

vSphere High Availability (HA)

vSphere HA is a cluster service that provides high availability for the virtual machines running in the cluster. You can enable vSphere High Availability (HA) on a vSphere cluster to provide rapid recovery from outages and cost-effective high availability for applications running in virtual machines. vSphere HA provides application availability in the following ways:

Benefits of vSphere HA over traditional failover solutions include the following:

vSphere HA can detect the following types of host issues:

When you enable vSphere HA on a cluster, the cluster elects one of the hosts to act as the primary host. The primary host communicates with vCenter Server to report cluster health. It monitors the state of all protected virtual machines and secondary hosts. It uses network and datastore heartbeating to detect failed hosts, isolation, and network partitions. vSphere HA takes appropriate actions to respond to host failures, host isolation, and network partitions. For host failures, the typical reaction is to restart the failed virtual machines on surviving hosts in the cluster. If a network partition occurs, a primary host is elected in each partition. If a specific host is isolated, vSphere HA takes the predefined host isolation action, which may be to shut down or power off the host’s virtual machines. If the primary host fails, the surviving hosts elect a new primary host. You can configure vSphere to monitor and respond to virtual machine failures, such as guest OS failures, by monitoring heartbeats from VMware Tools.

vSphere HA Requirements

When planning a vSphere HA cluster, you need to address the following requirements:

vSphere HA Response to Failures

You can configure how a vSphere HA cluster should respond to different types of failures, as described in Table 4-7.

Table 4-7 vSphere HA Response to Failure Settings

Option

Description

Host Failure Response > Failure Response

If Enabled, the cluster responds to host failures by restarting virtual machines. If Disabled, host monitoring is turned off, and the cluster does not respond to host failures.

Host Failure Response > Default VM Restart Priority

You can indicate the order in which virtual machines are restarted when the host fails (higher priority machines first).

Host Failure Response > VM Restart Priority Condition

This condition must be met before HA restarts the next priority group.

Response for Host Isolation

You can indicate the action that you want to occur if a host becomes isolated. You can choose Disabled, Shutdown and Restart VMs, or Power Off and Restart VMs.

VM Monitoring

You can indicate the sensitivity (Low, High, or Custom) with which vSphere HA responds to lost VMware Tools heartbeats.

Application Monitoring

You can indicate the sensitivity (Low, High, or Custom) with which vSphere HA responds to lost application heartbeats.

Heartbeats

The primary host and secondary hosts exchange network heartbeats every second. When the primary host stops receiving these heartbeats from a secondary host, it checks for ping responses or the presence of datastore heartbeats from the secondary host. If the primary host does not receive a response after checking for a secondary host’s network heartbeat, ping, or datastore heartbeats, it declares that the secondary host has failed. If the primary host detects datastore heartbeats for a secondary host but no network heartbeats or ping responses, it assumes that the secondary host is isolated or in a network partition.

If any host is running but no longer observes network heartbeats, it attempts to ping the set of cluster isolation addresses. If those pings also fail, the host declares itself to be isolated from the network.
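
The decision flow described in the previous two paragraphs can be summarized in a short illustrative sketch: the primary host classifies a silent secondary host as failed or as isolated/partitioned depending on ping responses and datastore heartbeats, and a running host that stops seeing network heartbeats pings its isolation addresses to decide whether it is isolated. The function names and boolean inputs are stand-ins, not the actual FDM agent logic.

```python
# Illustrative summary of the heartbeat decision flow described above.
# The booleans stand in for the real network, ping, and datastore probes.

def classify_secondary(network_heartbeat, ping_response, datastore_heartbeat):
    """How the primary host classifies a secondary host it no longer hears from."""
    if network_heartbeat:
        return "healthy"
    if ping_response or datastore_heartbeat:
        return "isolated or network partitioned"
    return "failed"  # vSphere HA restarts its protected VMs on surviving hosts

def self_check(sees_network_heartbeats, isolation_address_pings):
    """How a running host decides whether it is isolated from the network."""
    if sees_network_heartbeats or any(isolation_address_pings):
        return "connected"
    return "isolated"  # the host applies the configured host isolation response

print(classify_secondary(False, False, True))   # isolated or network partitioned
print(classify_secondary(False, False, False))  # failed
print(self_check(False, [False, False]))        # isolated
```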

vSphere HA Admission Control

vSphere uses admission control when you power on a virtual machine. It checks the amount of unreserved compute resources and determines whether it can guarantee any reservation configured for the virtual machine. If so, it allows the virtual machine to power on. Otherwise, it generates an “Insufficient Resources” warning.

vSphere HA Admission Control is a setting that you can use to specify whether virtual machines can be started if they violate availability constraints. The cluster reserves resources so that failover can occur for all running virtual machines on the specified number of hosts. When you configure vSphere HA admission control, you can set options described in Table 4-8.

Table 4-8 vSphere HA Admission Control Options

Option

Description

Host Failures Cluster Tolerates

Specifies the maximum number of host failures for which the cluster guarantees failover

Define Host Failover Capacity By set to Cluster Resource Percentage

Specifies the percentage of the cluster’s compute resources to reserve as spare capacity to support failovers

Define Host Failover Capacity By set to Slot Policy (powered-on VMs)

Specifies a slot size policy that covers all powered-on VMs

Define Host Failover Capacity By set to Dedicated Failover Hosts

Specifies the designated hosts to use for failover actions

Define Host Failover Capacity By set to Disabled

Disables admission control

Performance Degradation VMs Tolerate

Specifies the percentage of performance degradation the VMs in a cluster are allowed to tolerate during a failure

If you disable vSphere HA admission control, the cluster allows virtual machines to power on even if they violate availability constraints. In the event of a host failure, you may discover that vSphere HA cannot start some virtual machines.

In vSphere 6.5, the default Admission Control setting is Cluster Resource Percentage, which reserves a percentage of the total available CPU and memory resources in the cluster. For simplicity, the percentage is calculated automatically by defining the number of host failures to tolerate (FTT). The percentage is dynamically changed as hosts are added to or removed from the cluster. Another new enhancement is the Performance Degradation VMs Tolerate setting, which controls the amount of performance reduction that is tolerated after a failure. A value of 0% indicates that no performance degradation is tolerated.

With the Slot Policy option, vSphere HA admission control ensures that a specified number of hosts can fail, leaving sufficient resources in the cluster to accommodate the failover of the impacted virtual machines. Using the Slot Policy option, when you perform certain operations, such as powering on a virtual machine, vSphere HA applies admission control in the following manner: it calculates the slot size (based on the largest CPU and memory reservations among the powered-on virtual machines), determines how many slots each host in the cluster can hold, determines the current failover capacity (the number of hosts that can fail while still leaving enough slots for all powered-on virtual machines), and disallows the operation if the current failover capacity would drop below the configured failover capacity.

If a cluster has a few virtual machines that have much larger reservations than the others, they will distort slot size calculation. To remediate this, you can specify an upper bound for the CPU or memory component of the slot size by using advanced options. You can also set a specific slot size (CPU size and memory size). The next section describes the advanced options that affect the slot size.
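
Here is a simplified sketch of the slot-size arithmetic just described, assuming the slot size is taken from the largest CPU and memory reservations among powered-on virtual machines (with a 32 MHz CPU floor, the default noted later in Table 4-9). The host capacities and VM reservations are invented example data, and the calculation omits details such as memory overhead.

```python
# Illustrative slot-policy arithmetic (simplified; not HA's exact algorithm).
# Slot size = largest CPU and memory reservations among powered-on VMs.
# Failover capacity = how many of the largest hosts can fail while enough
# slots remain for all powered-on VMs.

DEFAULT_CPU_SLOT_MHZ = 32  # floor used when no VM has a CPU reservation

def slot_size(vms):
    cpu = max([vm["cpu_res_mhz"] for vm in vms] + [DEFAULT_CPU_SLOT_MHZ])
    mem = max(vm["mem_res_mb"] for vm in vms)
    return cpu, mem

def current_failover_capacity(hosts, vms):
    cpu_slot, mem_slot = slot_size(vms)
    slots_per_host = sorted(
        (min(h["cpu_mhz"] // cpu_slot, h["mem_mb"] // mem_slot) for h in hosts),
        reverse=True)
    failures = 0
    # Remove the biggest hosts first (worst case) while enough slots remain.
    while failures < len(hosts) - 1 and sum(slots_per_host[failures + 1:]) >= len(vms):
        failures += 1
    return failures

vms = [{"cpu_res_mhz": 500, "mem_res_mb": 1024}] * 10 + [{"cpu_res_mhz": 2000, "mem_res_mb": 4096}]
hosts = [{"cpu_mhz": 24000, "mem_mb": 98304}] * 4
print(slot_size(vms))                       # (2000, 4096) -- distorted by the one large VM
print(current_failover_capacity(hosts, vms))
```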

vSphere HA Advanced Options

You can set vSphere HA advanced options by using the vSphere Client or in the fdm.cfg file on the hosts. Table 4-9 provides some of the advanced vSphere HA options.

Table 4-9 Advanced vSphere HA Options

Option

Description

das.isolationaddressX

Provides the addresses to use to test for host isolation when no heartbeats are received from other hosts in the cluster. If this option is not specified (which is the default setting), the management network default gateway is used to test for isolation. To specify multiple addresses, you can set das.isolationaddressX, where X is a number between 0 and 9.

das.usedefaultisolationaddress

Specifies whether to use the default gateway IP address for isolation tests.

das.isolationshutdowntimeout

For scenarios where the host’s isolation response is to shut down, specifies the period of time that the virtual machine is permitted to shut down before the system powers it off.

das.slotmeminmb

Defines the maximum bound on the memory slot size.

das.slotcpuinmhz

Defines the maximum bound on the CPU slot size.

das.vmmemoryminmb

Defines the default memory resource value assigned to a virtual machine whose memory reservation is not specified or is zero. This is used for the Host Failures Cluster Tolerates admission control policy.

das.vmcpuminmhz

Defines the default CPU resource value assigned to a virtual machine whose CPU reservation is not specified or is zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default of 32 MHz is used.

das.heartbeatdsperhost

Specifies the number of heartbeat datastores required per host. The default is 2. The acceptable values are 2 to 5.

das.config.fdm.isolationPolicyDelaySec

Specifies the number of seconds the system delays before executing the isolation policy after determining that a host is isolated. The minimum is 30. A lower value results in a 30-second delay.

das.respectvmvmantiaffinityrules

Determines whether vSphere HA should enforce VM–VM anti-affinity rules even when DRS is not enabled.
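
As a concrete illustration of Table 4-9, the snippet below collects a few of these options into a plain dictionary, as you might feed them to an automation script. The option names come from the table; the values are example values only, and the snippet is not a VMware API.

```python
# Example values for a few of the vSphere HA advanced options in Table 4-9.
# Option names come from the table; the values here are illustrative only.
ha_advanced_options = {
    "das.isolationaddress0": "192.168.10.1",   # extra isolation-test address
    "das.isolationaddress1": "192.168.20.1",
    "das.usedefaultisolationaddress": "false", # skip the default gateway test
    "das.heartbeatdsperhost": "3",             # require 3 heartbeat datastores
    "das.respectvmvmantiaffinityrules": "true",
}

for name, value in ha_advanced_options.items():
    print(f"{name} = {value}")
```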

Virtual Machine Settings

To use the Host Isolation Response Shutdown and Restart VMs setting, you must install VMware Tools on the virtual machine. If a guest OS fails to shut down in 300 seconds (or a value specified by das.isolationshutdowntimeout), the virtual machine is powered off.

You can override the cluster’s settings for Restart Priority and Isolation Response for each virtual machine. For example, you might want to prioritize virtual machines providing infrastructure services such as DNS or DHCP.

At the cluster level, you can create dependencies between groups of virtual machines. You can create VM groups, host groups, and dependency rules between the groups. In the rules, you can specify that one VM group cannot be restarted until another specific VM group has started.

VM Component Protection (VMCP)

Virtual Machine Component Protection (VMCP) is a vSphere HA feature that can detect datastore accessibility issues and provide remediation for affected virtual machines. When a failure occurs such that a host can no longer access the storage path for a specific datastore, vSphere HA can respond by taking actions such as creating event alarms or restarting a virtual machine on other hosts. The main requirements are that vSphere HA is enabled in the cluster and that ESXi 6.0 or later is used on all hosts in the cluster.

The failures VMCP detects are permanent device loss (PDL) and all paths down (APD). PDL is an unrecoverable loss of accessibility to the storage device that cannot be fixed without powering down the virtual machines. APD is a transient accessibility loss or other issue that is recoverable.

For PDL and APD failures, you can set VMCP to either issue event alerts or to power off and restart virtual machines. For APD failures only, you can additionally control the restart policy for virtual machines by setting it to Conservative or Aggressive. With the Conservative setting, the virtual machine is powered off only if HA determines that it can be restarted on another host. With the Aggressive setting, HA powers off the virtual machine regardless of the state of other hosts.
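
A short, illustrative sketch of the VMCP response logic just described, assuming a boolean that indicates whether vSphere HA has confirmed restart capacity on another host; the function is a stand-in, not the actual VMCP implementation.

```python
# Illustrative VMCP response logic for storage failures, per the description above.
# PDL responses are not policy-controlled; APD responses honor Conservative/Aggressive.

def vmcp_action(failure, policy, restart_capacity_confirmed):
    if failure == "PDL":
        return "power off and restart VM"   # or issue events, per cluster setting
    if failure == "APD":
        if policy == "Conservative" and not restart_capacity_confirmed:
            return "keep VM running"        # only act when a restart target exists
        return "power off and restart VM"   # Aggressive acts regardless
    return "no VMCP action"

print(vmcp_action("APD", "Conservative", restart_capacity_confirmed=False))  # keep VM running
print(vmcp_action("APD", "Aggressive", restart_capacity_confirmed=False))    # power off and restart VM
```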

Virtual Machine and Application Monitoring

VM Monitoring restarts specific virtual machines if their VMware Tools heartbeats are not received within a specified time. Likewise, Application Monitoring can restart a virtual machine if the heartbeats from a specific application in the virtual machine are not received. If you enable these features, you can configure the monitoring settings to control the failure interval and reset period. Table 4-10 lists these settings.

Table 4-10 VM Monitoring Settings

Setting

Failure Interval

Reset Period

High

30 seconds

1 hour

Medium

60 seconds

24 hours

Low

120 seconds

7 days

The Maximum per-VM resets setting can be used to configure the maximum number of times vSphere HA attempts to restart a specific failing virtual machine within the reset period.
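
The sketch below encodes Table 4-10 together with the per-VM reset cap: a virtual machine is restarted once heartbeats have been missing for longer than the failure interval, unless it has already been reset the maximum number of times within the reset period. The data structures are illustrative, and the default cap of three resets is an assumption for the example.

```python
# Illustrative VM Monitoring policy based on Table 4-10. Times are in seconds.
MONITORING_PRESETS = {
    "high":   {"failure_interval": 30,  "reset_period": 3600},          # 1 hour
    "medium": {"failure_interval": 60,  "reset_period": 24 * 3600},     # 24 hours
    "low":    {"failure_interval": 120, "reset_period": 7 * 24 * 3600}, # 7 days
}

def should_restart(sensitivity, seconds_without_heartbeat,
                   resets_in_period, max_per_vm_resets=3):
    preset = MONITORING_PRESETS[sensitivity]
    if seconds_without_heartbeat < preset["failure_interval"]:
        return False  # heartbeat gap not long enough yet
    return resets_in_period < max_per_vm_resets

print(should_restart("medium", seconds_without_heartbeat=75, resets_in_period=1))  # True
print(should_restart("medium", seconds_without_heartbeat=75, resets_in_period=3))  # False
```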

vSphere HA Best Practices

You should provide network path redundancy between cluster nodes. To do so, you can use NIC teaming for the virtual switch. You can also create a second management network connection, using a separate virtual switch.

When performing disruptive network maintenance operations on the network used by clustered ESXi hosts, you should suspend the Host Monitoring feature to ensure that vSphere HA does not falsely detect network isolation or host failures. You can reenable host monitoring after completing the work.

To keep vSphere HA agent traffic on the specified network, you should ensure that the VMkernel virtual network adapters used for HA heartbeats (enabled for management traffic) do not share the same subnet as VMkernel adapters used for vMotion and other purposes.

Use the das.isolationaddressX advanced option to add an isolation address for each management network.

Proactive HA

Proactive High Availability (Proactive HA) integrates with select hardware partners to detect degraded components and evacuate VMs from affected vSphere hosts before an incident causes a service interruption. Hardware partners offer a vCenter Server plug-in to provide the health status of the system memory, local storage, power supplies, cooling fans, and network adapters. As hardware components become degraded, Proactive HA determines which hosts are at risk and places them into either Quarantine Mode or Maintenance Mode. When a host enters Maintenance Mode, DRS evacuates its virtual machines to healthy hosts, and the host is not used to run virtual machines. When a host enters Quarantine Mode, DRS leaves the current virtual machines running on the host but avoids placing or migrating virtual machines to the host. If you prefer that Proactive HA simply make evacuation recommendations rather than automatic migrations, you can set Automation Level to Manual.

The vendor-provided health providers read sensor data in the server and provide the health state to vCenter Server. The health states are Healthy, Moderate Degradation, Severe Degradation, and Unknown.

Other Resource Management and Availability Features

This section describes other vSphere features related to resource management and availability.

Predictive DRS

Predictive DRS is a feature in vSphere 6.5 and later that leverages the predictive analytics of vRealize Operations (vROps) Manager and vSphere DRS. Together, these two products can provide workload balancing prior to the occurrence of resource utilization spikes and resource contention. Every night, vROps calculates dynamic thresholds, which are used to create forecasted metrics for the future utilization of virtual machines. vROps passes the predictive metrics to vSphere DRS to determine the best placement and balance of virtual machines before resource utilization spikes occur. Predictive DRS helps prevent resource contention on hosts that run virtual machines with predictable utilization patterns.

The following prerequisites are needed to run Predictive DRS:

Distributed Power Management (DPM)

The vSphere Distributed Power Management (DPM) feature enables a DRS cluster to reduce its power consumption by powering hosts on and off, as needed, based on cluster resource utilization. DPM monitors the cumulative virtual machine demand for memory and CPU resources in the cluster and compares this to the available resources in the cluster. If sufficient excess capacity is found, vSphere DPM directs the host to enter Standby Mode. When DRS detects that a host is entering Standby Mode, it evacuates the virtual machines. Once the host is evacuated, DPM powers it off, and the host is in Standby Mode. When DPM determines that capacity is inadequate to meet the resource demand, DPM brings a host out of Standby Mode by powering it on. Once the host exits Standby Mode, DRS migrates virtual machines to it.

To power on a host, DPM can use one of three power management protocols: Intelligent Platform Management Interface (IPMI), Hewlett-Packard Integrated Lights-Out (iLO), or Wake-on-LAN (WoL). If a host supports multiple protocols, they are used in the following order: IPMI, iLO, WoL. If a host does not support any of these protocols, DPM cannot automatically bring it out of Standby Mode.
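
A small, illustrative sketch of the protocol preference just described: given the protocols a host supports, DPM uses IPMI first, then iLO, then WoL, and cannot automatically wake a host that supports none of them.

```python
# Illustrative DPM wake-up protocol selection: IPMI is preferred, then iLO, then WoL.
PROTOCOL_PREFERENCE = ["IPMI", "iLO", "WoL"]

def wake_protocol(supported):
    """Return the protocol DPM would use to power the host back on, or None."""
    for protocol in PROTOCOL_PREFERENCE:
        if protocol in supported:
            return protocol
    return None  # DPM cannot automatically bring this host out of Standby Mode

print(wake_protocol({"iLO", "WoL"}))  # iLO
print(wake_protocol(set()))           # None
```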

DPM is very configurable. As with DRS, you can set DPM’s automation to be manual or automatic.

To configure IPMI or iLO settings for a host, you can edit the host’s Power Management settings. You should provide credentials for the Baseboard Management Controller (BMC) account, the IP address of the appropriate NIC, and the MAC address of the NIC.

Using WOL with DPM requires that the following prerequisites be met:

Before enabling DPM, use the vSphere Client to request that the host enter Standby Mode. After the host powers down, right-click the host and attempt to power it on. If this is successful, you can allow the host to participate in DPM. Otherwise, you should disable power management for the host.

You can enable DPM in a DRS cluster’s settings. You can set Automation Level to Off, Manual, or Automatic. When this option is set to Off, DPM is disabled. When it is set to Manual, DPM makes recommendations only. When it is set to Automatic, DPM automatically performs host power operations as needed.

Much as with DRS, you can control the aggressiveness of DPM (that is, the DPM threshold) with a slider bar in the vSphere Client. The DRS threshold and the DPM threshold are independent of one another. You can override automation settings per host. For example, for a 16-host cluster, you might want to set DPM Automation to Automatic on only 8 of the hosts.

Fault Tolerance (FT)

If you have virtual machines that require continuous availability as opposed to high availability, you can consider protecting the virtual machines with vSphere Fault Tolerance (FT). FT provides continuous availability for a virtual machine (the primary VM) by ensuring that the state of a secondary VM is identical at any point in the instruction execution of the virtual machine.

If the host running the primary VM fails, an immediate and transparent failover occurs. The secondary VM becomes the primary VM without losing network connections or in-progress transactions. With transparent failover, there is no data loss, and network connections are maintained. The failover is fully automated and occurs even if vCenter Server is unavailable. Following the failover, FT spawns a new secondary VM and reestablishes redundancy and protection, assuming that a host with sufficient resources is available in the cluster. Likewise, if the host running the secondary VM fails, a new secondary VM is deployed. vSphere Fault Tolerance can accommodate symmetric multiprocessor (SMP) virtual machines with up to eight vCPUs.

Use cases for FT include the following:

Before implementing FT, consider the following requirements:

You should also consider the following VMware recommendations concerning vSphere FT:

The following vSphere features are not supported for FT-protected virtual machines:

You should apply the following best practices for FT:

In vSphere 6.5, FT is supported with DRS only when EVC is enabled. You can assign a DRS automation level to the primary VM and let the secondary VM assume the same setting. If you enable FT for a virtual machine in a cluster where EVC is disabled, the virtual machine’s DRS automation level is automatically disabled. Starting in vSphere 6.7, EVC is not required for FT to support DRS.

To enable FT, you first create a VMkernel virtual network adapter on each host and connect to the FT Logging network. You should enable vMotion on a separate VMkernel adapter and network.

When you enable FT protection for a virtual machine, the following events occur:

Legacy FT VMs can exist only on ESXi hosts running on vSphere versions earlier than 6.5. If you require legacy FT, you should configure a separate vSphere 6.0 cluster.

vCenter Server High Availability

vCenter Server High Availability (vCenter HA) is described in Chapter 1, “vSphere Overview, Components, and Requirements.” vCenter HA implementation is covered in Chapter 8, “vSphere Installation.” vCenter HA management is covered in Chapter 13, “Managing vSphere and vCenter Server.”

VMware Service Lifecycle Manager

If a vCenter Server service fails, VMware Service Lifecycle Manager (vmon) restarts it. VMware Service Lifecycle Manager is a service that runs in vCenter Server, monitors the health of vCenter services, and takes preconfigured remediation action when it detects a failure. If multiple attempts to restart a service fail, the service is considered failed.

Exam Preparation Tasks

As mentioned in the section “How to Use This Book” in the Introduction, you have some choices for exam preparation: the exercises here, Chapter 15, “Final Preparation,” and the exam simulation questions on the companion website.

Review All Key Topics

Review the most important topics in this chapter, noted with the Key Topics icon in the outer margin of the page. Table 4-11 lists these key topics and the page number on which each is found.

Table 4-11 Key Topics for Chapter 4

Key Topic Element

Description

Page Number

Section

Network-aware DRS

135

Section

How DRS scores VMs

136

List

DRS migration sensitivity

138

Section

Scalable shares

142

List

vSphere HA requirements

145

Table 4-7

vSphere HA response to failure settings

145

List

vSphere FT requirements

154

Complete Tables and Lists from Memory

Print a copy of Appendix B, “Memory Tables” (found on the companion website), or at least the section for this chapter, and complete the tables and lists from memory. Appendix C, “Memory Tables Answer Key” (also on the companion website), includes completed tables and lists to check your work.

Define Key Terms

Define the following key terms from this chapter and check your answers in the glossary:

VMware Service Lifecycle Manager

vSphere Fault Tolerance (FT)

Predictive DRS

Proactive High Availability (Proactive HA)

Virtual Machine Component Protection (VMCP)

Review Questions

  1. You are configuring EVC. Which of the following is not a requirement?

    1. A vSphere cluster

    2. A DRS cluster

    3. CPUs in the same family

    4. CPUs with the same base instruction set

  2. In vSphere 7.0, you want to configure the DRS Migration Threshold such that it is at the maximum level at which resource contention is considered, but virtual machine happiness is not. Which of the following values should you choose?

    1. Level 1

    2. Level 2

    3. Level 3

    4. Level 4

    5. Level 5

  3. In a vSphere cluster, which of the following statements is true if the primary host detects datastore heartbeats for a secondary host but no network heartbeats or ping responses?

    1. The primary host declares that the secondary host is isolated.

    2. The primary host assumes that the secondary host is isolated or in a network partition.

    3. The primary host takes the host isolation response action.

    4. The primary host restarts the virtual machines on the failed secondary host.

  4. You want to configure vSphere HA. Which of the following is a requirement?

    1. IPv4 must be used for all host management interfaces.

    2. vMotion must be enabled on each host.

    3. The Virtual Machine Startup and Shutdown (automatic startup) feature must be enabled on each virtual machine.

    4. Host IP addresses must persist across reboots.

  5. You are configuring vSphere Distributed Power Management (DPM) in your vSphere 7.0 environment. Which of the following is not a requirement for using Wake-on-LAN (WoL) in DPM?

    1. The management NIC must support WOL.

    2. vMotion is configured.

    3. The vMotion NIC must support WOL.

    4. The physical switch port must be set to auto negotiate the link speed.
