VCAP510-DCA - Blueprint 2.8 - Breakdown: Objective 4.1 – Implement and Maintain Complex VMware HA Solutions

Knowledge - Identify the three admission control policies for HA

vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.

Three admission control policies are available:

Host Failures The Cluster Tolerates - the default policy, configured by default to tolerate one host failure
Percentage of Cluster Resources Reserved As Failover Spare Capacity - reserves a configured percentage of the cluster's aggregate CPU and memory resources for failover
Specify Failover Hosts - manually designated hosts are reserved for failover
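To make the percentage policy more concrete, here is a minimal Python sketch of its admission check. It assumes separate CPU and memory percentages, a cluster total equal to the sum of host resources available to VMs, and the 32 MHz default CPU reservation for unreserved VMs that is described under slot size below. The `VM` class and the function names are hypothetical and are not a VMware API.

```python
from dataclasses import dataclass

DEFAULT_CPU_MHZ = 32  # assumed value charged to VMs with no CPU reservation


@dataclass
class VM:
    cpu_reservation_mhz: int  # 0 means no reservation set
    mem_reservation_mb: int   # memory reservation plus memory overhead


def failover_capacity(total_cpu_mhz, total_mem_mb, vms):
    """Percentage of cluster CPU and memory not claimed by VM reservations."""
    cpu_used = sum(vm.cpu_reservation_mhz or DEFAULT_CPU_MHZ for vm in vms)
    mem_used = sum(vm.mem_reservation_mb for vm in vms)
    cpu_pct = 100 * (total_cpu_mhz - cpu_used) / total_cpu_mhz
    mem_pct = 100 * (total_mem_mb - mem_used) / total_mem_mb
    return cpu_pct, mem_pct


def can_power_on(total_cpu_mhz, total_mem_mb, running, candidate,
                 reserved_cpu_pct, reserved_mem_pct):
    """Allow the power-on only if both capacities stay at or above the reserved %."""
    cpu_pct, mem_pct = failover_capacity(total_cpu_mhz, total_mem_mb,
                                         running + [candidate])
    return cpu_pct >= reserved_cpu_pct and mem_pct >= reserved_mem_pct


running = [VM(4000, 20000), VM(0, 10000)]
print(failover_capacity(20000, 100000, running))
print(can_power_on(20000, 100000, running, VM(2000, 20000), 25, 25))  # True
print(can_power_on(20000, 100000, running, VM(2000, 20000), 25, 60))  # False
```

The second call is refused because powering on the candidate would leave only 50% of memory unreserved, below the 60% configured in that example.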

Knowledge - Identify heartbeat options and dependencies
The HA master host must distinguish between a failed host and one that is in a network partition or that has become network isolated. The master host uses network and datastore heartbeating to determine the type of failure.

Network Heartbeating
Slave nodes send a heartbeat to the master node, and the master node sends a heartbeat to each slave node. The slaves do not send heartbeats to each other, but they do communicate during the master election process. Network heartbeating is performed over the management network of each host and by default occurs every second.

Datastore Heartbeating
When the master node stops receiving network heartbeats from a slave, it uses datastore heartbeats to determine whether that host is network partitioned, isolated, or has completely failed.

By default, HA selects two datastores per host to use for heartbeating. The selection criteria are:

A datastore connected to all hosts is preferred; if no datastore is connected to all hosts, the datastore with the highest number of connected hosts is selected
VMFS datastores are chosen in preference over NFS datastores
Where possible, the two datastores selected will be on different storage arrays
Datastore heartbeating creates a file for each host on the selected datastores, and the file is kept up to date as long as the host remains connected to the datastore. If the host is disconnected from the datastore, its file is no longer updated. Hosts write to the heartbeat file every 5 seconds.
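The three selection rules can be read as a ranking. The Python sketch below is a simplified, hypothetical model of those rules, not VMware's actual selection code: the `Datastore` class, its fields, and the greedy tie-breaking (connectivity first, then VMFS over NFS, then a different array for the second pick where one exists) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastore:
    name: str
    kind: str        # "VMFS" or "NFS"
    array: str       # hypothetical identifier for the backing storage array
    host_count: int  # number of cluster hosts connected to this datastore


def rank_key(ds):
    # Most connected hosts first, then VMFS ahead of NFS.
    return (-ds.host_count, 0 if ds.kind == "VMFS" else 1)


def pick_heartbeat_datastores(datastores, count=2):
    ranked = sorted(datastores, key=rank_key)
    chosen = []
    # First pass: take the best-ranked datastores that sit on distinct arrays.
    for ds in ranked:
        if len(chosen) == count:
            break
        if all(ds.array != c.array for c in chosen):
            chosen.append(ds)
    # Second pass: if there are not enough distinct arrays, fill from the ranking.
    for ds in ranked:
        if len(chosen) == count:
            break
        if ds not in chosen:
            chosen.append(ds)
    return [ds.name for ds in chosen]


datastores = [
    Datastore("A", "VMFS", "array1", 4),
    Datastore("B", "VMFS", "array1", 4),
    Datastore("C", "NFS", "array2", 4),
    Datastore("D", "VMFS", "array2", 3),
]
print(pick_heartbeat_datastores(datastores))
```

In this model B loses to C despite being VMFS, because C keeps the two heartbeat datastores on different arrays while matching A's connectivity.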

Skills and Abilities - Calculate Host Failure Requirements
HA slot size has two components: CPU and memory. HA calculates the CPU component by taking the CPU reservation of each powered-on virtual machine and selecting the largest value. If you have not specified a CPU reservation for a virtual machine, it is assigned a default value of 32 MHz; you can change this value with the das.vmcpuminmhz advanced attribute. HA calculates the memory component by taking the memory reservation plus memory overhead of each powered-on virtual machine and selecting the largest value. There is no default value for the memory reservation.

Once HA has calculated the slot size, it determines how many slots each host can hold and how many slots are in use (one per powered-on VM) across the cluster. The Host Failures The Cluster Tolerates admission control policy uses these figures to work out how many hosts can fail while still leaving enough slots for all powered-on VMs.
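The slot-size arithmetic described above can be sketched in a few lines of Python. This is a minimal model, assuming each host is given as the CPU (MHz) and memory (MB) usable by VMs, each powered-on VM takes exactly one slot, and the worst case is losing the hosts with the most slots first; the function names are hypothetical.

```python
DEFAULT_CPU_MHZ = 32  # default CPU value for VMs without a reservation (das.vmcpuminmhz)


def slot_size(vms):
    """vms: (cpu_reservation_mhz, mem_reservation_plus_overhead_mb) per powered-on VM."""
    cpu = max(cpu_res or DEFAULT_CPU_MHZ for cpu_res, _ in vms)
    mem = max(mem_res for _, mem_res in vms)
    return cpu, mem


def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """A host holds as many slots as its scarcer resource allows."""
    slot_cpu, slot_mem = slot
    return min(host_cpu_mhz // slot_cpu, host_mem_mb // slot_mem)


def host_failures_tolerated(hosts, vms):
    """Count hosts that can fail, biggest first, while free slots cover all VMs."""
    slot = slot_size(vms)
    per_host = sorted((slots_per_host(c, m, slot) for c, m in hosts), reverse=True)
    used_slots = len(vms)  # one slot per powered-on VM
    remaining = sum(per_host)
    failures = 0
    for slots in per_host:
        if remaining - slots < used_slots:
            break
        remaining -= slots
        failures += 1
    return failures


hosts = [(10000, 32768)] * 3  # three identical hosts: MHz, MB
vms = [(1000, 4096), (0, 2048), (500, 1024)]
print(slot_size(vms))                       # (1000, 4096)
print(host_failures_tolerated(hosts, vms))  # 2
```

Here memory is the limiting resource: each host fits 8 slots (32768 / 4096) even though its CPU would allow 10.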

Skills and Abilities - Configure Customized Isolation Response Settings

vSphere HA Host Isolation response is configured at the cluster level. The options are Leave powered on (the default in vSphere 5), Power off and Shut down.

If particular VMs have different requirements, the Host Isolation response can also be set per virtual machine, overriding the cluster default.

Skills and Abilities - Configure HA redundancy
Management Network
Because HA sends network heartbeats over the management network, it is best practice to make the management network redundant with multiple uplinks.

Datastore Heartbeat
When HA is enabled, it selects two datastores per host for datastore heartbeating. VMware states that two datastores are sufficient for all failure scenarios; however, if you need more than two heartbeat datastores per host, you can raise the number with the das.heartbeatDsPerHost advanced setting.

Network partitions
A network partition occurs when a host or a subset of hosts loses network communication with the master node but can still communicate with each other. When this happens, an election takes place within the partition and one of its hosts is elected master.

The criteria for a network partition are:

The host(s) cannot communicate with the master node using network heartbeats
The host(s) can communicate with the master using datastore heartbeats
The host(s) are receiving election traffic
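Combined with the heartbeat mechanisms described earlier, the partition criteria above can be read as a decision table. The Python sketch below is a deliberately simplified, hypothetical model: the function name and boolean inputs are illustrative only, and real HA also pings the host's management address and the isolation addresses before reaching a verdict.

```python
def classify_host(network_heartbeat, datastore_heartbeat, election_traffic):
    """Simplified classification of a host the master is monitoring.

    network_heartbeat:   master and host still exchange network heartbeats
    datastore_heartbeat: the host's heartbeat file on a heartbeat datastore is current
    election_traffic:    the host still receives HA election traffic from other hosts
    """
    if network_heartbeat:
        return "connected"
    if not datastore_heartbeat:
        # No sign of life on the network or on storage.
        return "failed"
    if election_traffic:
        # Alive and talking to peers, but cut off from the master.
        return "partitioned"
    # Alive on storage but cannot hear any other host on the network.
    return "isolated"


print(classify_host(False, True, True))   # partitioned
print(classify_host(False, True, False))  # isolated
```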

Network partitions should be avoided by configuring management network and datastore heartbeating in a resilient manner.

Skills and Abilities - Configure HA related alarms and monitor an HA cluster
There are seven default HA-related alarms that ship with vCenter:

Insufficient vSphere HA failover resources
vSphere HA failover in progress
Cannot find a vSphere HA master agent
vSphere HA host status
vSphere HA virtual machine failover failed
vSphere HA virtual machine monitoring action
vSphere HA virtual machine monitoring error

Creating an HA alarm works the same way as creating any other vCenter alarm.

Skills and Abilities - Create a custom slot size configuration
If your cluster contains any virtual machines that have much larger reservations than the others, they will distort slot size calculation. To avoid this, you can specify an upper bound for the CPU or memory component of the slot size by using the das.slotcpuinmhz or das.slotmeminmb advanced attributes, respectively.
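The effect of capping the slot size can be sketched as follows. This is a minimal model, assuming `(cpu_reservation_mhz, mem_reservation_plus_overhead_mb)` tuples per VM and that a VM whose reservation exceeds the capped slot consumes as many slots as its larger ratio requires; that multi-slot rule and the function names are assumptions for illustration.

```python
import math

DEFAULT_CPU_MHZ = 32  # default CPU value for VMs without a reservation


def capped_slot_size(vms, slot_cpu_cap_mhz=None, slot_mem_cap_mb=None):
    """Largest reservations, optionally capped like das.slotcpuinmhz / das.slotmeminmb."""
    cpu = max(cpu_res or DEFAULT_CPU_MHZ for cpu_res, _ in vms)
    mem = max(mem_res for _, mem_res in vms)
    if slot_cpu_cap_mhz is not None:
        cpu = min(cpu, slot_cpu_cap_mhz)
    if slot_mem_cap_mb is not None:
        mem = min(mem, slot_mem_cap_mb)
    return cpu, mem


def slots_needed(vm, slot):
    """A VM whose reservation exceeds the slot size consumes several slots."""
    cpu_res, mem_res = vm
    slot_cpu, slot_mem = slot
    return max(1, math.ceil(cpu_res / slot_cpu), math.ceil(mem_res / slot_mem))


vms = [(4000, 16384), (0, 1024), (500, 2048)]
print(capped_slot_size(vms))                      # (4000, 16384)
slot = capped_slot_size(vms, 1000, 4096)
print(slot)                                       # (1000, 4096)
print(sum(slots_needed(vm, slot) for vm in vms))  # 6
```

Without the caps, the one large VM forces every slot to 4000 MHz / 16384 MB; with them, it consumes four small slots and the other two VMs take one each.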

Skills and Abilities - Understand interactions between DRS and HA
After an HA failover event, DRS rebalances the cluster.

If the cluster uses DRS with DPM and some hosts are in standby when an HA event causes a host to fail, DRS brings the required hosts out of standby and back into the active cluster.

When a host enters maintenance mode, DRS evacuates its virtual machines to other hosts. DRS is HA-aware and will not migrate a virtual machine to a host if doing so would violate HA admission control; in that case you have to migrate the virtual machine manually.

If a VM needs to be powered on during failover and the cluster has enough resources in total, but they are fragmented across hosts so that no single host can accommodate the VM, HA asks DRS to defragment those resources so the VM(s) can be powered on.

Skills and Abilities - Analyze vSphere environment to determine appropriate HA admission control policy

Business availability requirements, and how many resources to dedicate to resilience
Cluster node count
Cluster node physical attribute disparity
Virtual Machine reservations, and the effect this has on slot size
Virtual Machine sizing, e.g. whether some VMs are much larger than average, and the effect this has on slot size

VMware recommends using the Percentage of Cluster Resources admission control policy for most environments.

Skills and Abilities - Analyze performance metrics to calculate host failure requirements
Establish baseline performance metrics for the VMs in your estate so you can verify that the surviving hosts can still meet the workload after a host failure.
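Once peak CPU and memory demand are known from the baseline, a rough check is whether the surviving hosts can absorb that demand. The Python sketch below assumes host capacities are the resources usable by VMs, treats the largest hosts (by CPU, then memory) as failing first, and uses a hypothetical `max_utilization` headroom target; none of these names come from vSphere.

```python
def failures_tolerated(hosts, peak_cpu_mhz, peak_mem_mb, max_utilization=1.0):
    """Count how many hosts can fail, biggest first, while survivors cover peak demand.

    hosts: list of (cpu_mhz, mem_mb) capacities usable by VMs
    max_utilization: headroom target, e.g. 0.8 keeps survivors at or below 80% busy
    """
    remaining = sorted(hosts, reverse=True)  # worst case: lose the largest hosts first
    failures = 0
    while len(remaining) > 1:
        survivors = remaining[1:]
        cpu = sum(h[0] for h in survivors) * max_utilization
        mem = sum(h[1] for h in survivors) * max_utilization
        if cpu < peak_cpu_mhz or mem < peak_mem_mb:
            break
        remaining = survivors
        failures += 1
    return failures


hosts = [(20000, 131072)] * 4
print(failures_tolerated(hosts, 45000, 300000, max_utilization=0.8))  # 1
print(failures_tolerated(hosts, 30000, 200000))                       # 2
```

Unlike the slot-based calculation, this works from measured demand rather than reservations, so it complements admission control rather than replacing it.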

Skills and Abilities - Analyze Virtual Machine workload to determine optimum slot size

Check how reservations and larger-than-average VMs affect the HA slot size; if necessary, adjust the slot size manually.

Skills and Abilities - Analyze HA cluster capacity to determine optimum cluster size
Balance the number of hosts in each cluster against your needs: a four-host cluster configured to tolerate one host failure commits 25% of its capacity to failover, while a ten-host cluster with the same one-host tolerance commits only 10%. The right cluster size is ultimately driven by business need and the cost of maintaining the redundancy.
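The percentages above follow from dividing the number of failover hosts by the cluster size. A minimal Python sketch, assuming identical hosts (the function name is hypothetical):

```python
def failover_percentage(cluster_hosts, host_failures=1):
    """Share of cluster capacity held back when hosts are identical."""
    return 100 * host_failures / cluster_hosts


for hosts in (4, 8, 10, 16):
    print(f"{hosts:>2} hosts, 1 failure: {failover_percentage(hosts):.1f}% reserved")
```

With identical hosts, this figure is also roughly what you would enter for the Percentage of Cluster Resources policy to approximate the same level of protection.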

Wednesday, 19 February 2014
