Tuesday, 18 February 2014

Objective 3.3 – Implement and Maintain Complex DRS Solutions

Knowledge - Explain DRS / storage DRS affinity and anti-affinity rules
Virtual Machine DRS affinity and anti-affinity
Say you run five ESX hosts in a DRS cluster and five of your VMs are web servers running a load-balanced application. For resilience you would probably want to ensure they did not all end up running on the same host, since if that host failed all of your VMs would be down until HA restarted them on alternate hosts. Affinity and anti-affinity rules can be used to ensure VM workloads are either kept apart or grouped together. Rules come in two classes, "must" and "should": must rules are never violated by DRS, DPM or HA, while should rules are honoured where possible but can be violated if necessary. It is possible to write rules which conflict with each other; these should obviously be avoided if possible. If you do have conflicting rules then the older rule wins and the newer rule is disabled.
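As a rough illustration of how the two rule classes behave, here is a minimal Python sketch. The function and rule format are invented for illustration only (this is not a VMware API): a "must" anti-affinity rule blocks a placement outright, while a "should" rule only raises a warning.

```python
# Hypothetical sketch (not a VMware API) of how DRS-style anti-affinity
# rules constrain placement. "must" rules are hard constraints; "should"
# rules are preferred but may be violated when necessary.

def placement_allowed(vm, host, placements, rules):
    """Check a proposed vm -> host placement against anti-affinity rules.

    placements: dict mapping already-placed VM name -> host name
    rules: list of dicts like {"vms": {...}, "class": "must" | "should"}
    Returns (allowed, warnings); "should" violations are warnings only.
    """
    warnings = []
    for rule in rules:
        if vm not in rule["vms"]:
            continue
        peers_on_host = [v for v in rule["vms"]
                         if v != vm and placements.get(v) == host]
        if peers_on_host:
            if rule["class"] == "must":
                return False, warnings   # hard constraint: never violated
            warnings.append(f"should-rule violated: {vm} with {peers_on_host}")
    return True, warnings

# Example: keep the three load-balanced web servers on separate hosts
rules = [{"vms": {"web1", "web2", "web3"}, "class": "must"}]
placements = {"web1": "esx01", "web2": "esx02"}
ok, _ = placement_allowed("web3", "esx01", placements, rules)
print(ok)  # False - web1 already runs on esx01
```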

Storage DRS affinity and anti-affinity
Storage DRS affinity and anti-affinity rules are similar to DRS affinity rules, but instead of applying to virtual machines and hosts they apply to virtual disks and virtual machines within datastore clusters. There are three storage DRS affinity/anti-affinity rule types:
  • Inter-VM anti-affinity allows you to specify which virtual machines should not be kept on the same datastore within a datastore cluster
  • Intra-VM anti-affinity lets you specify that the virtual disks belonging to a particular virtual machine are stored on separate datastores within a datastore cluster
  • Intra-VM affinity will store all of your virtual disks on the same datastore within the datastore cluster (this is the default)
Storage DRS affinity rules are invoked during initial placement of the virtual machine and when storage DRS makes its recommendations. A migration initiated by a user will not cause storage DRS to be invoked.
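The effect of the two intra-VM rule types on initial disk placement can be sketched as follows. The function and names are hypothetical (not the SDRS implementation): with the default intra-VM affinity every disk of the VM lands on one datastore, while intra-VM anti-affinity spreads the disks across the cluster's datastores.

```python
# Hypothetical sketch of intra-VM affinity vs anti-affinity during
# initial placement of one VM's disks in a datastore cluster.

def place_disks(disks, datastores, intra_vm_affinity=True):
    """Return a disk -> datastore mapping for a single VM."""
    if intra_vm_affinity:
        # default behaviour: keep all of the VM's disks together
        return {d: datastores[0] for d in disks}
    # anti-affinity: spread the disks round-robin across datastores
    return {d: datastores[i % len(datastores)]
            for i, d in enumerate(disks)}

disks = ["vm1.vmdk", "vm1_1.vmdk"]
stores = ["ds01", "ds02", "ds03"]
print(place_disks(disks, stores))                           # both on ds01
print(place_disks(disks, stores, intra_vm_affinity=False))  # ds01 and ds02
```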

Knowledge - Identify Required Hardware Components To Support DPM

DPM is used to power down ESX hosts when they are not required and to power them back on when demand rises. To do this, vCenter needs to be able to talk to the server out-of-band to initiate power-on. Servers have out-of-band management interfaces (Dell iDRAC, HP iLO, IBM RSA), on which you need to create an account with rights to power-manage the server. You then configure each managed host in vCenter with the IP \ MAC address of its out-of-band management card and the credentials you created on the management device.

Once this is configured for each host in the DRS/DPM cluster, and before you can enable DPM, you need to perform a test power-down and power-on using the IPMI/iLO credentials. You can do this by placing the host in maintenance mode and using the vSphere GUI to power the host down and back on.

Knowledge - Identify EVC Requirements, Baselines and Components
Enhanced vMotion Compatibility (EVC) is a cluster-level feature which facilitates virtual machine migrations (vMotion) between hosts containing CPUs with varying feature sets. EVC's functionality is based on Intel FlexMigration and AMD-V Extended Migration technologies. It permits a vSphere administrator to utilize a combination of new and legacy servers in a vMotion-enabled ESXi cluster. When enabled, EVC creates a baseline of the feature sets common to all of the cluster's processors. EVC does not disable generation-specific CPU feature sets; it masks them. EVC baselines are based on CPU generation feature sets.

Knowledge - Understand the DRS / Storage DRS Migration Algorithms, the Load Imbalance Metrics, and their impact on migration recommendations

DRS has three automation levels and a migration threshold.
  1. Manual – Makes recommendations but it is the administrator that makes the final call if something is relocated.
  2. Partially automated – Automatic initial placement of virtual machines and recommendations that the administrator must review before something is relocated.
  3. Fully automated – Automatic initial placement of virtual machines and all decisions are left up to DRS.
Migration Threshold – This is adjustable from the settings of a DRS-enabled cluster on the automation level screen. The migration threshold tells DRS to apply only recommendations of at least a given priority. At the conservative end only mandatory (priority 1) recommendations are applied; at the aggressive end even recommendations offering a slight improvement to the cluster's balance are applied.

Recommendation Considerations
  1. On initial placement – DRS assumes that when a virtual machine is powered on it will use 100% of its configured CPU and memory resources. It does this to ensure it selects a host with enough resources without adversely impacting current cluster demand. If DRS decides a particular host should run the virtual machine being powered on, it will make room by vMotioning other virtual machines to more suitable hosts.
  2. On automation levels – As an administrator, if your cluster is in partial or manual mode, you must be constantly watching the recommendations. If you haphazardly make decisions or stop watching for recommendations, it’s probable that you will invalidate the benefits of DRS. It is recommended that you properly setup affinity and anti-affinity rules and other cluster features first, and then set the cluster to fully automated.
  3. Cluster imbalances – By default, DRS checks for cluster imbalances every five minutes. An invocation is also triggered when a host is added to the cluster, a host is put into maintenance mode, a resource pool is altered in any way, or a virtual machine's power state changes.
DRS Metrics
Located on the summary screen of your DRS enabled cluster, examine the target and current load standard deviations. When the current host load standard deviation exceeds the target host load standard deviation, DRS will make recommendations and take action based on the automation level and migration threshold.

The Target Host Load Standard Deviation (THLSD) is derived from the migration threshold setting. The cluster is considered imbalanced as long as the current host load standard deviation exceeds the THLSD.

Each host has a host load metric based upon the CPU and memory resources in use, defined as the sum of the expected virtual machine loads divided by the capacity of the host. The LoadImbalanceMetric, also known as the current host load standard deviation (CHLSD), is the standard deviation of the host load metrics across all hosts in the cluster.
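A minimal sketch of these two metrics, following the formulas given in the text rather than VMware's exact implementation: each host's load is the sum of its VMs' expected loads divided by the host's capacity, and the CHLSD is the standard deviation of those per-host values.

```python
# Illustrative calculation of host load and CHLSD (not VMware's code).
from statistics import pstdev

def host_load(vm_loads, capacity):
    # sum of expected VM loads divided by host capacity
    return sum(vm_loads) / capacity

def chlsd(hosts):
    """hosts: list of (vm_loads, capacity) tuples -> CHLSD for the cluster."""
    return pstdev(host_load(vms, cap) for vms, cap in hosts)

# Hypothetical cluster: per-VM loads and host capacities in MHz
cluster = [
    ([2000, 3000], 10000),   # host A: load 0.5
    ([1000], 10000),         # host B: load 0.1
    ([3000, 3000], 10000),   # host C: load 0.6
]
imbalance = chlsd(cluster)
print(round(imbalance, 3))   # ~0.216; compare this against the THLSD
```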

DRS decides which virtual machines to migrate by simulating each candidate move, recalculating the current host load standard deviation, and turning the best moves into recommendations. As part of this simulation, a cost/benefit and risk analysis is performed to determine the best placement. DRS continues to run simulations and make recommendations for as long as the current host load standard deviation exceeds the target.
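The simulate-and-recommend loop above can be sketched as a greedy search. This is a hypothetical simplification (real DRS also weighs migration cost, benefit and risk, which this sketch omits): while the imbalance exceeds the target, try every single-VM move, keep the one that best reduces the imbalance, and record it as a recommendation.

```python
# Hypothetical greedy sketch of the DRS simulate-and-recommend loop.
from statistics import pstdev

def imbalance(hosts):
    # hosts: dict host -> list of VM loads; capacities normalized to 1.0
    return pstdev(sum(vms) for vms in hosts.values())

def recommend(hosts, thlsd):
    recs = []
    while imbalance(hosts) > thlsd:
        best = None
        for src, vms in hosts.items():
            for vm in vms:
                for dst in hosts:
                    if dst == src:
                        continue
                    # simulate moving this VM and re-measure the imbalance
                    trial = {h: list(v) for h, v in hosts.items()}
                    trial[src].remove(vm)
                    trial[dst].append(vm)
                    score = imbalance(trial)
                    if best is None or score < best[0]:
                        best = (score, vm, src, dst)
        if best is None or best[0] >= imbalance(hosts):
            break                      # no move improves the balance
        _, vm, src, dst = best
        hosts[src].remove(vm)
        hosts[dst].append(vm)
        recs.append((vm, src, dst))
    return recs

hosts = {"esx01": [0.5, 0.25], "esx02": [0.125]}
print(recommend(hosts, thlsd=0.1))   # [(0.25, 'esx01', 'esx02')]
```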

All recommendations are assigned a priority from 1 to 5. Priority 1 recommendations are mandatory (for example, evacuating a host entering maintenance mode); priority 2 recommendations promise the greatest improvement to cluster balance, while priority 5 recommendations provide only a slight benefit.


Storage DRS (SDRS)
See the VMware whitepaper Understanding VMware vSphere 5.1 Storage DRS.

SDRS clusters can be configured to operate in one of two modes.
  1. No Automation (Manual Mode) – Initial placement and migration recommendations are provided but it is the administrator’s responsibility to review and take action for each recommendation. This is the default mode after creating a datastore cluster.
  2. Fully Automated – Migration recommendations are executed automatically. Initial placement still requires administrator approval.
Datastores are grouped into datastore clusters, and a datastore cluster can be manually assigned a storage profile based on the type of underlying disk, or, if available, can pick this up from the array via the vSphere Storage APIs - Storage Awareness (VASA).


The Storage DRS load-balancing algorithm considers utilized space and I/O latency. Recommendations for Storage DRS migrations will not be made unless these thresholds are exceeded. You can set advanced options to specify your tolerance for I/O imbalance and the minimum percentage difference in space utilization between source and destination datastores. Storage DRS also performs a cost vs. benefit analysis (like DRS) prior to making a recommendation.
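The threshold behaviour can be sketched in a few lines. The numbers below are the vSphere 5.x defaults (80% space utilization, 15 ms I/O latency), but the function and data layout are hypothetical, not the SDRS implementation: a datastore is only a migration candidate once one of the two thresholds is crossed.

```python
# Simplified sketch of the two SDRS load-balancing thresholds.
SPACE_THRESHOLD_PCT = 80     # default space utilization threshold
LATENCY_THRESHOLD_MS = 15    # default I/O latency threshold

def needs_rebalance(ds):
    """ds: dict with used_gb, capacity_gb and latency_ms keys."""
    space_pct = 100 * ds["used_gb"] / ds["capacity_gb"]
    return (space_pct > SPACE_THRESHOLD_PCT
            or ds["latency_ms"] > LATENCY_THRESHOLD_MS)

ds01 = {"used_gb": 850, "capacity_gb": 1000, "latency_ms": 5}
ds02 = {"used_gb": 400, "capacity_gb": 1000, "latency_ms": 22}
ds03 = {"used_gb": 300, "capacity_gb": 1000, "latency_ms": 4}
print([name for name, ds in [("ds01", ds01), ("ds02", ds02), ("ds03", ds03)]
       if needs_rebalance(ds)])   # ds01 (space) and ds02 (latency)
```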

SDRS issues migration recommendations for the following events:
  • Space utilization thresholds have been exceeded on a datastore
  • I/O response time thresholds have been exceeded on a datastore
  • A significant imbalance of capacity among datastores
  • A significant imbalance of I/O among datastores
An SDRS assessment is triggered for the following events:
  • When SDRS is manually executed
  • During initial placement events
  • When a datastore is added to a datastore cluster
  • When a datastore is changed to maintenance mode
  • When the SDRS configuration is updated
  • When a threshold is exceeded
  • At the defined interval (Default is 8 hours, can be altered between 1 hour and 30 days)
Skills and Abilities - Properly configure BIOS and management settings to support DPM
For HP servers you need to create an iLO account with rights to power-manage the server; the equivalent applies to other vendors' out-of-band management cards. Where IPMI or Wake-on-LAN is used instead, it must be enabled in the server BIOS/firmware.

Skills and Abilities - Test DPM to verify proper configuration

In order to enable DPM, each host needs to show a valid "Last Time Exited Standby". This is achieved by entering the details of the iLO user account you created, along with the IP and MAC address of the iLO.


Once this is configured for your server, place it into maintenance mode, then use vCenter to put the server into standby mode and bring it back out of standby. This simulates DPM and, once completed, populates the "Last Time Exited Standby" record for the server. Once all hosts in the cluster are populated, DPM can be used.

Skills and Abilities - Configure appropriate DPM Threshold to meet business requirements
There are three different options you can choose; Off, Manual and Automatic
  • Off – power management is turned off
  • Manual – vCenter will give you recommendations during low resource utilization for hosts that can be put into standby mode
  • Automatic – vCenter will automatically place hosts in standby mode based on the DPM threshold that is set
The DPM threshold has five settings, ranging from Conservative to Aggressive.


DPM is useful for saving power and cooling costs; establish with the business how important these savings are when deciding which settings to configure.



Skills and Abilities - Configure EVC using appropriate baseline

To enable an EVC baseline the following requirements must be met:
  • Consistent host CPU manufacturer, either Intel or AMD.
  • All hosts must be running ESX/ESXi 3.5 Update 2 or later.
  • All hosts must be connected to vCenter Server.
  • Hardware virtualization support, AMD-V or Intel VT and AMD NX or Intel XD must be enabled in the BIOS of all cluster hosts.
  • Hosts must be running supported processors.
Any virtual machine that is running on a host with a higher CPU feature set than will be presented by the configured EVC baseline must be powered off prior to configuring EVC.
Select the most appropriate baseline using the VMware EVC Baseline KB article. Then edit the cluster settings and assign it from the drop-down box.

If any hosts don't meet the criteria, they are shown along with the issue; when a suitable EVC mode is selected it shows as validation succeeded.

Skills and Abilities - Change the EVC mode on an existing DRS cluster
Changing the EVC mode, or enabling EVC for the first time on an existing cluster, can potentially be disruptive. If you have virtual machines running on hosts that expose a higher level of advanced CPU features than are presented by the EVC baseline you want to configure, those virtual machines must be powered off.

The cluster EVC mode is not applied to a VM until a power-on operation is performed. A cluster's EVC mode can be modified while VMs are powered on, but the changes will not take effect until the respective VMs are completely power cycled. A guest OS restart is not sufficient; a full power-off and power-on of the VM is required for the EVC mode change to register. As such, if the EVC mode of the cluster is raised, powered-on VMs will retain the capabilities of the previous EVC baseline until a full power cycle is performed.

Skills and Abilities - Create DRS and DPM Alarms

A pre-configured alarm for DRS/DPM is the Exit Standby Error. This is an event-based alarm, so it will only trigger when the host/cluster reports an event of a host not being able to exit standby mode.
To create a new alarm, use the Alarms tab on the cluster or host object in vCenter and define it against the relevant DRS/DPM events.

Skills and Abilities - Configure Applicable Power Management Settings for ESXi hosts
ESXi 5 offers four different power policies that are based on using the processor’s ACPI performance states, also known as P-states, and the processor’s ACPI power states, also known as C-states. P-states can be used to save power when the workloads running on the system do not require full CPU capacity. C-states can help save energy only when CPUs have significant idle time; for example, when the CPU is waiting for an I/O to complete.
ESXi 5 offers the following power policy options:
  • High Performance: This power policy maximizes performance, using no power management features. It keeps CPUs in the highest P-state at all times. It uses only the top two C-states (running and halted), not any of the deep states (for example, C3 and C6 on the latest Intel processors). High performance is the default power policy for ESX/ESXi 4.0 and 4.1.
  • Balanced: This power policy is designed to reduce host power consumption while having little or no impact on performance. The balanced policy uses an algorithm that exploits the processor’s P-states. Balanced is the default power policy for ESXi 5.
  • Low Power: This power policy is designed to more aggressively reduce host power consumption, through the use of deep C-states, at the risk of reduced performance.
  • Custom: This power policy starts out the same as balanced, but it allows individual parameters to be modified. If the host hardware does not allow the operating system to manage power, only the Not Supported policy is available. (On some systems, only the High Performance policy is available.)
To configure this, set the power management option to "OS Control" (or your vendor's equivalent) within the BIOS.

This then allows the power management levels to be managed by vSphere.


Skills and Abilities - Properly size virtual machines and clusters for optimal DRS efficiency
You don’t want to size your virtual machines to the cluster; rather, you want to size your clusters based on the virtual machines. Suppose you size a VM with 4 vCPUs and 4GB vRAM but its current workload only needs 1 vCPU and 1GB vRAM. During initial placement, DRS considers the "worst case scenario" for the VM, so it will actively attempt to identify a host that can guarantee 4GB of RAM and 4 vCPUs to the VM. This is because historical resource utilization statistics for the VM are unavailable at power-on. If DRS cannot find a cluster host able to accommodate the VM, it will be forced to "defragment" the cluster by moving other VMs around to make room for the one being powered on. As such, VMs should be sized based on their actual workload.
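The worst-case initial-placement behaviour described above can be sketched as follows. The function and data layout are hypothetical (not a VMware API): DRS looks for a host that can accommodate the VM's full configured resources, not its current usage, because no utilization history exists at power-on.

```python
# Hypothetical sketch of worst-case initial placement.

def pick_host(vm, hosts):
    """vm: {"vcpu_mhz": ..., "mem_mb": ...} using configured (worst-case) size.
    hosts: dict name -> {"free_mhz": ..., "free_mb": ...}.
    Returns a host name, or None if the cluster must be defragmented first."""
    candidates = [name for name, h in hosts.items()
                  if h["free_mhz"] >= vm["vcpu_mhz"]
                  and h["free_mb"] >= vm["mem_mb"]]
    # prefer the host with the most free memory to keep the cluster balanced
    return max(candidates, key=lambda n: hosts[n]["free_mb"], default=None)

vm = {"vcpu_mhz": 4 * 2000, "mem_mb": 4096}   # 4 vCPU x 2 GHz, 4 GB configured
hosts = {
    "esx01": {"free_mhz": 6000,  "free_mb": 8192},
    "esx02": {"free_mhz": 12000, "free_mb": 16384},
}
print(pick_host(vm, hosts))   # esx02 - the only host with 8000 MHz free
```

A `None` result corresponds to the "defragment" case: other VMs would have to be moved before this one can be powered on.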

Limiting the number of virtual machines in a cluster reduces the number of load-balancing calculations that vCenter Server has to perform. There is a performance cost associated with vMotion operations, the main vehicle of DRS. However, from a DRS performance perspective, it is also not advisable to have an abundance of small (2 to 3 host) clusters. DRS clusters support a maximum of 32 hosts and 3000 virtual machines.

Skills and Abilities - Properly apply virtual machine automation levels based upon application requirements
When creating a DRS cluster you set a virtual machine automation level for the cluster. There may be use cases where a virtual machine, or a set of virtual machines, requires a different level of automation than the cluster default. You can set automation levels for virtual machines individually.

You may want to do this if, for example, you have an application that constantly changes its memory contents and you do not want it to move between hosts as often as other virtual machines.

Skills and Abilities - Create and administer ESXi host and Datastore Clusters
In order for HA, DRS and SDRS to function properly, ESXi host and datastore clusters must be configured.

Requirements
  • At least two ESXi 5 hosts.
  • All hosts should be configured with static IP addresses.
  • Identical vSwitch configuration among the participating hosts. Either dedicated vSS's or a shared vDS will work.
  • Consistent port group configuration between participating hosts. Note that port group naming is case sensitive, so ensure all related port groups are configured consistently.
  • CPU compatibility is required between participating hosts. At a minimum, CPUs must be from the same vendor (AMD or Intel) and family (for example Xeon or Opteron), must support the same features, and hardware virtualization must be enabled in order to run 64-bit guests.
  • A common vMotion VMkernel network between the hosts. Connections must be at least 1 Gb. Dedicated uplinks are recommended but not required.
  • Shared storage between the hosts. FC, FCoE, iSCSI and NFS supported.
  • Maximum of 32 hosts per cluster.
Skills and Abilities - Administer DRS / Storage DRS
All administration takes place within the GUI and almost all of it within the cluster settings.
  • Adding and removing hosts \ datastores
  • Cluster Validation
  • Create and maintain Anti Affinity/Affinity rules
  • Invoke DRS \ SDRS
  • Host Maintenance Mode
  • Datastore Maintenance Mode
  • Storage DRS scheduled tasks
  • Apply DRS recommendations if manual mode
  • SDRS scheduling
