Workload Resiliency and Management with Stretched Clusters in VMware Cloud on AWS

One of the unique features of VMware Cloud on AWS (VMC) is stretched clusters. They allow customers to run VMs with resiliency across AWS Availability Zones (AZs) without having to re-architect an application. At a basic level, a stretched cluster is a vSAN cluster spanning two AZs, with a third AZ hosting a witness node. With stretched clusters, VMware provides a 99.99% uptime SLA. Stretched clusters provide a highly available solution and protect against some failure scenarios, but not all; they should not be considered a disaster recovery solution. Stretched clusters make architecting infrastructure resiliency easier and give customers the ability to manage their workload placement and protection levels.

Stretched Clusters

In the example below, I will walk through several features of the VMC service and discuss design options to consider when deploying stretched clusters and building highly available applications efficiently. First, we will start with the VMware Cloud on AWS Sizer.

Example Workload:  250 VMs with Stretched Cluster Requirement

  • Virtual Machines: 250
    • vCPU:core ratio: 4:1
    • vCPU/VM: 2
    • vRAM/VM: 8 GB
    • Storage Utilized/VM: 200 GB
Figure 1 : Workload Profile from VMC Sizer
  • Per the output of the VMC Sizer:
    • ESXi i3 hosts: 14
    • 504 Cores
    • 7TB RAM
    • ~145 TiB usable storage
Figure 2 : Stretched Cluster Infrastructure Requirements from VMC Sizer

A common misconception is that a stretched cluster has to cost at least 2X what a standard cluster costs. A 14-host i3 stretched cluster has a total capacity of 504 physical cores, 7 TB of RAM, and ~145 TiB of usable storage. With vSAN and stretched clusters, the VMC Sizer assumes that all virtual machines will use the Dual Site Mirroring capability, making the effective usable storage ~72 TiB. For some workloads, this type of configuration makes sense; for others it may not. As you can see, there is a big discrepancy between the ~48 TiB of required storage and the ~145 TiB of provisioned storage. As I will show, the more we understand about the application architecture, the more effective we can be in managing the workload and infrastructure capacity. Before we dive into the application types, I want to discuss a couple of components and considerations of stretched clusters.
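The capacity math above can be sanity-checked with a rough sketch (values taken from the sizer output; the sizer's ~48 TiB required figure includes per-VM overhead beyond the raw 200 GB per VM, so this is an approximation, not a reproduction of the sizer):

```python
# Rough check of the capacity figures above (values assumed from the sizer output).
TIB = 2**40
GB = 10**9

vms = 250
storage_per_vm_gb = 200
provisioned_usable_tib = 145  # sizer output for the 14-host cluster

# Raw VM data; the sizer's ~48 TiB requirement adds per-VM overhead on top.
required_tib = vms * storage_per_vm_gb * GB / TIB

# Dual Site Mirroring keeps a full copy in each fault domain,
# halving the effective usable capacity.
effective_usable_tib = provisioned_usable_tib / 2

print(f"raw VM data:      ~{required_tib:.0f} TiB")       # ~45 TiB
print(f"effective usable: ~{effective_usable_tib:.0f} TiB")  # ~72 TiB
```

Even after halving for mirroring, the provisioned ~72 TiB comfortably exceeds the requirement, which is why the host count (not storage) drives this sizing.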

Stretched Cluster Protection – Network and Management Infrastructure

From a customer perspective, networking in a stretched cluster is not much different from a standard cluster, but there are differences.

Stretched Cluster Provisioning

Deploying a stretched cluster is straightforward. From the VMC UI, select a multi-host deployment and check the Stretched Cluster option. Note that the host count will now only show even values, since the stretched cluster must be deployed evenly across AZs.

Figure 3: Stretched Cluster SDDC Configuration

The connected VPC needs two subnets defined for ENI creation and inter-account routing. During SDDC creation, the ENIs will be deployed into the customer account across both subnets. At any given time, only one ENI in an SDDC is active, whether the SDDC is stretched or not.

Figure 4: Stretched Clusters and Cross ENI

These are the two differences in provisioning from the customer perspective. Once the SDDC is built, a customer does not have to worry about much related to SDDC networking other than provisioning their workload segments. The workload segments run on an overlay network that exists on both sides of a stretched cluster. In a VM migration or failover, the workloads keep their layer 2/3 configuration, associated NSX tags, and security policies.

Stretched Cluster Network Infrastructure

Under the hood, there are operational aspects of stretched clusters that customers may want to be aware of. VMC as a service takes care of failing over the T0 and T1 routers, the NSX Edge, and other management components. Since VPN is an overlay construct, nothing special is required in the event of a failover. Direct Connect and Transit Connect are regional AWS constructs, so again nothing special is required for a customer to configure or account for. VMC Techzone has a good write-up on networking in stretched clusters.

Figure 5: Networking in Stretched Cluster SDDCs

Other resources related to the SDDC may have to be evaluated, such as resources provisioned in the connected VPC, security/egress/transit VPCs, etc.

Stretched Cluster Protection – Storage and Compute

In VMC, vSAN is configured with two datastores.

  • vsanDatastore
    • The vsanDatastore contains the management components required for VMware to provide VMC as a service.   In the VMC shared responsibility model, VMware maintains responsibility to ensure the availability of the infrastructure management components.  These management components include vCenter, NSX Edges and Managers, HCX components, SRM components, and more. 
    • The default reservation is 1676 GiB (1800 GB) but may consume more space as various services are enabled and consumed. 
    • VMC customers have limited rights to this datastore.
  • WorkloadDatastore
    • The WorkloadDatastore is where customers deploy their workloads into. 
    • VMC customers have full read/write permissions to this datastore.
    • Customers apply storage policies to workloads as they are provisioned, or after migration.
    • Customers can use standard storage policies or define their own.

Note: Datastores have RAID considerations, slack space, and other overhead considerations that are incorporated into the VMC Sizer. I am not going to focus on those parts of the storage configuration, but rather on the storage available for end-user consumption.

In a stretched cluster, VMware-managed management components have a site disaster tolerance of “Dual site mirroring”. This is not configurable. For workload VMs, customers can choose the protection levels they desire. This is done by implementing vSAN policies and configuring site disaster tolerance and failures to tolerate. If all VMs are assigned a site disaster tolerance of Dual site mirroring, then the same amount of storage will be consumed from each fault domain.

Figure 6: Stretched Clusters with Dual Site Mirroring

Note that this can be fairly inefficient. Dual site mirroring is a broad tool to provide resiliency, but it comes at a cost. In Figure 6, the workload storage requirements are effectively doubled, because every VM in the SDDC is being mirrored to the secondary AZ. This is where understanding the application comes in very handy. Many customers already have redundancy built into their web, load balancing, and database servers. Many applications do not have a high availability requirement. Some applications already have regional redundancy or inter-regional redundancy, or are ephemeral. These are all reasons it might make sense not to use Dual Site Mirroring.

Features Relevant to Workload Storage and Placement

There are two features of VMC that can effectively force a workload to run in a preferred availability zone. Storage policies define in which fault domain a workload's data is stored. Compute policies define where a workload should run. Both need to be considered and utilized for targeted workload placement. It would not be efficient to force a VM's data onto fault domain 1 while it runs on the secondary site: all disk operations would traverse the AZs, and the failure of either AZ would cause the workload to incur an outage.

Storage Policies

When creating a storage policy in a stretched cluster, there are two user-controllable configurations relevant to this discussion: site disaster tolerance and failures to tolerate.

  • Site disaster tolerance
    • Dual site mirroring
    • None – keep data on Preferred
    • None – keep data on Non-preferred
    • None (vSAN makes the decision based on available storage capacity)
  • Failures to tolerate
    • 1 failure – RAID-1 (Mirroring) – 2 host minimum
    • 1 failure – RAID-5 (Erasure Coding) – 4 host minimum
    • 2 failures – RAID-1 (Mirroring) – 5 host minimum
    • 2 failures – RAID-6 (Erasure Coding) – 6 host minimum
    • 3 failures – RAID-1 (Mirroring) – 7 host minimum

Site disaster tolerance is only available for stretched clusters. Failures to tolerate (FTT) depends on the desired protection level and the number of hosts in the fault domain. A 14-host stretched cluster (7 hosts per fault domain) can leverage an FTT of 1, 2, or 3 with RAID-1, RAID-5, or RAID-6. When designing a stretched cluster with both highly available applications and applications without HA, three or more storage policies may be needed:

Policy Name | Site Disaster Tolerance | Failures to Tolerate
us-east-1-az1 | None – keep data on Preferred | 2 failures – RAID-6 (Erasure Coding)
us-east-1-az2 | None – keep data on Non-preferred | 2 failures – RAID-6 (Erasure Coding)
us-east-1-Mirrored | Dual site mirroring | 1 failure – RAID-5 (Erasure Coding)
Table 1: Stretched Cluster Storage Policy Example
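The raw-capacity cost of each option can be sketched with the standard vSAN overhead multipliers (a rough model, not sizer output; the policy keys in the dictionary are shorthand of my own, and actual consumption also depends on slack space and other overheads the sizer accounts for):

```python
# Approximate vSAN capacity multipliers (raw bytes consumed per usable byte)
# for the failures-to-tolerate options above. Illustrative values only.
FTT_MULTIPLIER = {
    "FTT1-RAID1": 2.0,    # mirror: 2 full copies
    "FTT1-RAID5": 4 / 3,  # erasure coding: 3 data + 1 parity
    "FTT2-RAID1": 3.0,    # mirror: 3 full copies
    "FTT2-RAID6": 1.5,    # erasure coding: 4 data + 2 parity
    "FTT3-RAID1": 4.0,    # mirror: 4 full copies
}

def raw_consumed_gib(usable_gib: float, ftt: str, dual_site_mirroring: bool) -> float:
    """Raw vSAN capacity consumed by a VM under a given storage policy."""
    factor = FTT_MULTIPLIER[ftt]
    if dual_site_mirroring:
        factor *= 2  # a full protected copy is kept in each fault domain
    return usable_gib * factor

# A 200 GiB VM on the us-east-1-Mirrored policy from Table 1
# (dual site mirroring + 1 failure / RAID-5):
print(round(raw_consumed_gib(200, "FTT1-RAID5", True)))  # ~533 GiB
```

The doubling applied by dual site mirroring is what turns the cluster's ~145 TiB of usable storage into ~72 TiB of effective capacity in the earlier sizing.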

Compute Policies

Compute policies are a method of keeping groups of VMs together, keeping VMs separate, keeping a group of VMs on a group of hosts, or keeping a group of VMs off of a set of hosts. Compute policies are especially relevant for applications with dependencies or redundancies. A common compute policy in any cluster, stretched or not, is to ensure that database replicas do not run on the same host. Compute policies operate in a “should” manner, expressing a desired state rather than a hard requirement.

Compute policies are well documented on VMC Techzone and in the product documentation, so I will not dive deep into them for this stretched cluster configuration. The recommended approach is to create a tag for each availability zone, assign those tags to the hosts, assign the tags to the virtual machines, then create a VM-Host affinity policy so that VMs prefer to run on the hosts with the same tag.
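The tag-based preference described above can be illustrated with plain data structures (host and VM names are hypothetical, and this is not the VMC compute policy API; it only models the "should run here" decision the affinity rule expresses):

```python
# Illustrative sketch of tag-based VM-Host affinity.
# Hosts in each AZ carry an AZ tag; VMs carry the tag of their intended AZ.
host_az_tag = {"host-01": "az1", "host-02": "az1",
               "host-03": "az2", "host-04": "az2"}
vm_az_tag = {"web-01": "az1", "web-02": "az2", "db-01": "az1"}

def preferred_hosts(vm: str) -> list[str]:
    """Hosts a VM-Host affinity ("should" rule) would prefer for this VM."""
    return [h for h, tag in host_az_tag.items() if tag == vm_az_tag[vm]]

print(preferred_hosts("web-02"))  # ['host-03', 'host-04']
```

Because the rule is a "should" rather than a "must", vSphere can still restart the VM on the other AZ's hosts during a failure, which is exactly the behavior a stretched cluster needs.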

Example 1:  Application with Resiliency Built-in

Many enterprises already have redundancy built into various aspects of their applications. Customers can deploy VMware or third-party load balancers to facilitate redundancy and availability. Customers can deploy redundant servers and containers. Customers can leverage database mirroring, replication, or availability groups. Customers can leverage application-level replication, such as the Active Directory replication service. There are many ways to architect redundancy into an application.

Figure 8:  Stretched Clusters with Highly Available Application

In Figure 8, the three-tier application has redundancy designed throughout, including the use of load balancers, duplicate web/app servers, and database replicas. There are many types of resiliency options, depending on software architecture and vendors. Building resiliency throughout the application stack has benefits beyond protecting from a site failure. Maintenance, patching, and upgrades can be performed in a rolling manner. An application can be protected from an operating system or service failure. An application can be protected from a misconfiguration. There are several distinct benefits to implementing redundant applications in a stretched cluster: the layer 2 segments are stretched across AZs, and the internet gateway/VPN/DX/DXGW connectivity will automatically fail over during an AZ failure.

Figure 9: Resource Requirements for Example Application 1

Leveraging a combination of application knowledge, storage policies, and compute policies, the application can be distributed across two availability zones. This configuration reduces the storage requirements, since the logical data is already being replicated from one fault domain to the other, albeit using application mechanisms rather than vSAN mirroring. In the event of a fault domain failure, half of the application will already be running. As soon as the network infrastructure is running on the non-failed side, the application should be able to resume service; it would not be dependent on operating system reboots or services starting. If the dual site mirroring storage policy were implemented for this application, the infrastructure requirements would look significantly different:

Figure 10: Stretched Cluster Work – Stretched Cluster with HA Application w Dual Site Mirror
Figure 11:  Infrastructure Requirements with Dual site mirroring For Example 1
  • Effectively, we now have four copies of the databases:
    • Primary-side primary replica
    • Primary-side vSAN mirror replica
    • Secondary-side secondary replica
    • Secondary-side vSAN mirror replica

This may or may not be a valid configuration. If the application is scaled out for throughput, for instance, it may be necessary to have all four web servers running concurrently whenever possible. Database servers may utilize the secondary replicas for read operations. Keep in mind that if an AZ completely fails, it may not be possible to restore the VMs at all, or they may have to be restored from backup, and restoring the application to full resiliency could require significant operational work. Ultimately, the efficiency and architecture of the infrastructure can be tuned as more about the application is understood.

Example 2:  Application with No Resiliency Built-in

While most enterprises have many applications with multiple layers of resiliency, most also have many applications with little or no resiliency. This may be due to vendor support, legacy products, the lack of importance of the application, or cost. Stretched clusters can provide a level of protection for these workloads without re-architecting the application.

Figure 12: Stretched Cluster with non HA Application

In this example, the application has no redundant components. It should be assumed that if any component of the application fails, the entire application may become unavailable. Dual site mirroring would be beneficial here, building better resiliency into the application without refactoring it.

Figure 13: Infrastructure Requirements for Example Application 2


Most organizations have a mix of redundant/resilient applications and applications without these features. The more that is understood about the application profile, the more intelligent the placement and design we can apply to the environment in VMware Cloud on AWS. Let's apply this logic back to the originally sized environment:

  • Virtual Machines: 250
    • vCPU:core ratio: 4:1
    • vCPU/VM: 2
    • vRAM/VM: 8 GB
    • Storage Utilized/VM: 200 GB

If, after analysis, we determine that only 40 of the workloads lack built-in resiliency and require multi-availability-zone redundancy, we may be able to reduce the host count required for the stretched cluster while still leveraging its benefits. If we work through the VMC Sizer for the two scenarios independently, we can right-size the SDDC based on the requirements of the workloads. Note that each sizing takes into account the management reservation overhead.

  • Virtual machines not requiring dual site mirroring: 210
    • ESXi i3 hosts: 5
    • 180 cores
    • 2.5 TB RAM
    • ~52 TB usable storage
  • Virtual machines requiring dual site mirroring: 40
    • ESXi i3 hosts: 4
    • 144 cores
    • 2 TB RAM
    • ~41 TB usable storage
Figure 14: Stretched Clusters with HA and non HA applications

A stretched cluster cannot have an odd number of hosts, so 10 hosts would be the new recommended requirement vs the original 14, an effective savings of roughly 28%. Both sets of workloads would still benefit from the additional resiliency built into stretched clusters, including the availability of the connectivity and management components.
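The host-count arithmetic behind that savings figure can be sketched as follows (host counts taken from the two sizer runs above):

```python
# Quick check of the host-count savings from the two independent sizings.
original_hosts = 14
# Sizer runs: 5 hosts for the non-mirrored workloads, 4 for the mirrored ones.
combined = 5 + 4
# A stretched cluster needs an even host count split across both AZs,
# so round up to the next even number.
new_hosts = combined + (combined % 2)
savings = 1 - new_hosts / original_hosts
print(f"{new_hosts} hosts, ~{savings:.1%} savings")  # 10 hosts, ~28.6% savings
```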

Additional Considerations

SLA Impact – For vSphere HA to restart a workload in the event of an AZ failure, there must be an accessible copy of the data inside the surviving AZ. This is why Dual Site Mirroring is a prerequisite for SLA eligibility: VMware Operations cannot guarantee 99.99% availability without a redundant copy of the data. Therefore, only consider disabling it when the complete loss of the workload and its data would be tolerable.

Cross Availability Zone Charges – Traffic that crosses between the two availability zones is metered, whether that traffic is disk reads/writes, network traffic between VMs, or storage replication. In general, keep VMs running on the hosts where their data resides, and keep chatty applications in the same AZ if possible.

Multiple SDDC Considerations – Many customers deploy both stretched and non-stretched clusters. These clusters must be in separate SDDCs. Leveraging the tools above may remove the need for multiple SDDCs, which may lead to additional infrastructure and operational efficiency.

Multiple Cluster Considerations – Customers may choose to have entire clusters leveraging dual site mirroring, and other clusters dedicated to VM/storage affinity.

Disaster Recovery Considerations – Stretched clusters are a solution for building highly available cloud infrastructure and applications; they are not a disaster recovery solution. Stretched clusters have no concept of recovery point objectives or retention. Stretched clusters are also limited to replication within a region, so out-of-region replication is not supported. VMware Cloud Disaster Recovery is a great solution for out-of-region replication, robust RPOs, and long-term retention.
