Demystifying VMware Cloud on AWS, Hybrid Cloud Extensions, and Supplemental Storage

5 minute read
Content level: Expert

Migrations to VMware Cloud on AWS can be challenging when a significant amount of storage is involved. This article provides guidance on how to migrate when the vSAN datastore does not have enough capacity to receive all of the data from an on-premises environment.

Introduction

I’ve been working with a customer who has chosen to use VMware Cloud on AWS with one of the supplemental storage solutions. VMware currently offers two such solutions: VMware Cloud Flex Storage (VMW-FS) and Amazon FSx for NetApp ONTAP (FSxN). Both allow a VMware Cloud on AWS SDDC to scale storage independently of compute.

A traditional VMware Cloud on AWS SDDC uses a hyper-converged architecture, in which compute and storage scale together based on the capacity each host (node) provides. If a workload requires significantly more of one resource than the other, adding hosts to satisfy that need leaves a large surplus of the other resource type, increasing TCO. Scaling resources independently can lower an SDDC's TCO because resource utilization is optimized.
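The sizing trade-off can be sketched with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the per-host capacities in the hypothetical `HostSpec` are placeholders rather than published instance specifications, and the model ignores minimum cluster sizes, vSAN storage policy overhead, and slack space. It compares the host count a hyper-converged cluster needs with the count needed when data beyond the hosts' vSAN capacity can be placed on supplemental storage.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    # Hypothetical per-host capacities (placeholders, not real instance specs).
    cores: int
    memory_gib: int
    usable_vsan_tib: float


def size_cluster(spec: HostSpec, cores: int, memory_gib: int,
                 storage_tib: float, supplemental_storage: bool = False) -> dict:
    """Estimate hosts, idle cores, and externally stored data for a workload."""
    # Hosts needed to satisfy CPU and memory demand.
    compute_hosts = max(math.ceil(cores / spec.cores),
                        math.ceil(memory_gib / spec.memory_gib))
    if supplemental_storage:
        # Storage scales separately, so the host count follows compute alone.
        hosts = compute_hosts
    else:
        # Hyper-converged: the cluster must also hold every byte on vSAN.
        storage_hosts = math.ceil(storage_tib / spec.usable_vsan_tib)
        hosts = max(compute_hosts, storage_hosts)
    return {
        "hosts": hosts,
        # Cores paid for but not needed by the workload.
        "idle_cores": hosts * spec.cores - cores,
        # Data that does not fit on the cluster's vSAN capacity.
        "external_tib": max(0.0, storage_tib - hosts * spec.usable_vsan_tib),
    }


if __name__ == "__main__":
    host = HostSpec(cores=48, memory_gib=768, usable_vsan_tib=15.0)
    print(size_cluster(host, 96, 1024, 200.0))
    print(size_cluster(host, 96, 1024, 200.0, supplemental_storage=True))
```

With these placeholder numbers, a storage-heavy workload (96 cores, 1024 GiB of memory, 200 TiB of data) needs 14 hosts in the hyper-converged case and leaves 576 cores idle, while the supplemental-storage case needs 2 hosts and places 170 TiB on external storage.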

Migrating Data

This customer chose VMware Cloud Flex Storage because their data is mostly file-based and does not require the performance of all-flash vSAN. During the pilot phase of the project, the customer set up the VMware Cloud on AWS SDDC and VMware Cloud Flex Storage with minimal effort. The solution was easy to implement, and initial testing showed that its performance easily met the application requirements.

In the next phase, a partner was engaged to begin migrations. To reduce risk, non-production workloads would be moved before production, and the Hybrid Cloud Extension (HCX) Bulk Migration method would be used for all workloads. During this phase the customer did not experience issues with the solution. One challenge the customer and partner discovered was the extent of the dependencies within their applications, which required extra precautions to account for them.

As the migration progressed, the team moved on to production workloads, including a large database server. The initial data synchronization and the 2-hour delta updates ran through the week without issues until the cutover was initiated during a scheduled maintenance window. The team waited after initiating the cutover, but the database server never completed it. After the team reached out to VMware Support, the decision was made to cancel this server's migration and recover the on-premises copy once 12 hours had passed. Support determined that a step within the HCX cutover process had stopped and was never going to complete.

What Happened?

I set up a call with the customer to understand their migration process. They walked through it and described the issues they had experienced with application dependencies. One interesting detail emerged: they were migrating directly to VMW-FS. That gave me pause. VMware does not support VMware Site Recovery replication directly to supplemental storage, and because HCX Bulk Migration uses the same host-based replication method, I suspected the same limitation might apply.

Searching through VMware’s documentation, I could not determine whether this configuration is supported. Because VMware Site Recovery uses the same data replication method, I wondered what the underlying reason for its lack of support was. Let’s dig into what we know:

  • VMware Site Recovery uses vSphere Replication, a host-based replication method
  • VMware HCX Bulk Migration uses vSphere Replication
    • Delta updates occur every 2 hours after the initial full synchronization. These can create additional snapshots that must later be consolidated.
  • VMware HCX Replication Assisted vMotion (RAV) also uses vSphere Replication
  • vSphere Replication creates snapshots at the destination to maintain versions and updates.
  • Bulk Migration cutover process
    • Source VM is powered off
    • Final offline synchronization
    • Data consolidation of the snapshots
    • VM instantiation at the target datacenter / SDDC
  • The ESXi NFS VAAI plugin is not installed and is therefore not available to VMW-FS or FSxN datastores
  • VAAI NAS primitives can offload copy, clone, and snapshot operations to the storage system.
  • VMware Site Recovery does not support replication to VMW-FS or FSxN as a destination.
  • vSphere Replication supports vSAN as both a source and a target datastore.
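To make the cutover sequence above concrete, here is a minimal Python model of a Bulk Migration's lifecycle. It is an illustration only, not an HCX API: `CutoverStep`, `find_stalled_step`, and the per-step time budgets are hypothetical names and values. It counts the 2-hour delta syncs that run before cutover and reports which cutover step, taken in order, has exceeded an operator-chosen time budget, the kind of check that would flag a stall like the one seen during snapshot consolidation.

```python
from enum import Enum
from typing import Dict, Optional

# Delta updates run every 2 hours after the initial full synchronization.
DELTA_INTERVAL_HOURS = 2


class CutoverStep(Enum):
    # Definition order matches the Bulk Migration cutover sequence.
    POWER_OFF_SOURCE = "Source VM is powered off"
    FINAL_SYNC = "Final offline synchronization"
    CONSOLIDATE_SNAPSHOTS = "Data consolidation of the snapshots"
    INSTANTIATE_TARGET = "VM instantiation at the target datacenter / SDDC"


def delta_syncs_before_cutover(hours_since_initial_sync: float) -> int:
    """Number of completed 2-hour delta syncs before cutover begins."""
    return int(hours_since_initial_sync // DELTA_INTERVAL_HOURS)


def find_stalled_step(elapsed_minutes: Dict[CutoverStep, float],
                      budget_minutes: Dict[CutoverStep, float]
                      ) -> Optional[CutoverStep]:
    """Return the first cutover step that has run past its time budget.

    elapsed_minutes holds time spent in each step that has started;
    budget_minutes holds hypothetical operator-chosen limits per step.
    """
    for step in CutoverStep:  # Enum iteration follows definition order.
        if step not in elapsed_minutes:
            return None  # Step not started; earlier steps were within budget.
        if elapsed_minutes[step] > budget_minutes[step]:
            return step
    return None


if __name__ == "__main__":
    print(delta_syncs_before_cutover(7 * 24))
    budgets = {CutoverStep.POWER_OFF_SOURCE: 10, CutoverStep.FINAL_SYNC: 120,
               CutoverStep.CONSOLIDATE_SNAPSHOTS: 240,
               CutoverStep.INSTANTIATE_TARGET: 30}
    # Hypothetical timings resembling the stalled database cutover.
    observed = {CutoverStep.POWER_OFF_SOURCE: 2, CutoverStep.FINAL_SYNC: 45,
                CutoverStep.CONSOLIDATE_SNAPSHOTS: 720}
    stalled = find_stalled_step(observed, budgets)
    print(stalled.value if stalled else "no stall detected")
```

With a week between the initial sync and cutover, `delta_syncs_before_cutover(168)` returns 84, so a busy VM goes through many delta cycles before the final consolidation runs.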

Conclusion

Based on this information, we can observe that vSphere Replication is fully supported with vSAN, and therefore HCX Bulk Migration and Replication Assisted vMotion (RAV) are fully supported when migrating to vSAN. We can also see that VMware Site Recovery does not support replication to either supplemental storage solution, from which we can infer that HCX Bulk Migration and RAV do not support migrations directly to either solution. Further investigation shows that the NFS VAAI plugin provides primitives that offload copy, clone, and snapshot operations, and snapshots are a core part of the vSphere Replication process. While the customer did have some success migrating directly to VMW-FS, this approach is not officially supported by VMware.

This information gives us a good understanding of what occurred. The database server in question has 5 TB of storage and serves more transactions, and therefore generates more I/O, than the previously migrated workloads. The migration halted at the point where snapshot consolidation should have been occurring, a step VAAI would have assisted with by offloading snapshot management to the storage.

Finally, the supported approach is a two-step process. First, migrate each workload from the on-premises environment or source SDDC to the vSAN workload datastore. Once a given workload's migration completes, use Storage vMotion to move it from the vSAN workload datastore to the chosen supplemental storage solution, VMW-FS or FSxN.
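When vSAN cannot hold every workload at once, the two-step process can be run in waves sized to the vSAN capacity available for staging. The following Python sketch is one way to plan those waves, assuming each wave is bulk-migrated to the vSAN workload datastore and then moved with Storage vMotion to VMW-FS or FSxN before the next wave begins. `plan_staged_waves` and `vsan_staging_tb` are hypothetical; the staging value should already exclude any free-space reserve vSAN needs, and the sketch ignores the extra space that replication snapshots consume during a migration. It uses first-fit decreasing bin packing, a common heuristic rather than an optimal packer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Wave:
    vms: List[str] = field(default_factory=list)
    size_tb: float = 0.0


def plan_staged_waves(vm_sizes_tb: Dict[str, float],
                      vsan_staging_tb: float) -> List[Wave]:
    """Group VMs into waves whose total size fits the vSAN staging capacity.

    Assumed per-wave flow (hypothetical orchestration):
      1. HCX Bulk Migration of the wave to the vSAN workload datastore.
      2. Storage vMotion of the wave to VMW-FS or FSxN, freeing vSAN space.
    """
    # A VM larger than the staging capacity can never land on vSAN first.
    oversized = sorted(vm for vm, size in vm_sizes_tb.items()
                       if size > vsan_staging_tb)
    if oversized:
        raise ValueError(f"VMs exceed vSAN staging capacity: {oversized}")

    waves: List[Wave] = []
    # First-fit decreasing: place the largest VMs first, each into the first
    # existing wave with enough remaining room, otherwise into a new wave.
    for vm, size in sorted(vm_sizes_tb.items(), key=lambda kv: kv[1],
                           reverse=True):
        for wave in waves:
            if wave.size_tb + size <= vsan_staging_tb:
                wave.vms.append(vm)
                wave.size_tb += size
                break
        else:
            waves.append(Wave(vms=[vm], size_tb=size))
    return waves


if __name__ == "__main__":
    plan = plan_staged_waves(
        {"db01": 5.0, "app01": 3.0, "app02": 2.0, "web01": 1.5, "web02": 0.5},
        vsan_staging_tb=6.0,
    )
    for number, wave in enumerate(plan, start=1):
        print(f"Wave {number}: {wave.vms} ({wave.size_tb} TB)")
```

For example, with a hypothetical 6 TB staging budget and the hypothetical VMs above, a 5 TB database (the size of the server in this case) shares its wave only with a small VM, and the plan comes out as three waves: `[db01, web02]`, `[app01, app02]`, and `[web01]`.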

References:

VAAI Primitives – Including NAS

vSphere Replication

HCX: Bulk Migration operations and best practices

Understanding VMware HCX Bulk Migration

NetApp VAAI reference

HCX Bulk Migration best practices

NetApp ONTAP capabilities for vSphere