Because virtualization offers such cost savings and agility, many organizations are going whole hog and virtualizing mission-critical systems they wouldn't have dreamed of virtualizing a little while ago. Who would have thought, for example, that we'd see enterprises deploy Oracle virtually rather than on physical hardware?
It's no surprise there's increasing interest in wringing every last drop of disaster recovery and disaster avoidance out of virtualization infrastructures. Most commonly, this manifests in a replicated, dual-site architecture where virtualization resources can be quickly failed over to a secondary data center in the event that catastrophic failure strikes the primary site.
For some organizations, however, this isn't enough. They want the capability to seamlessly migrate workloads from one site to another in order to utilize computing resources at both sites. Moreover, they want the infrastructure to heal itself automatically in the event of a site or hardware failure -- just as single-site virtualization clusters are able to do. Hence the new buzz phrase "stretched cluster."
The stretched cluster has its challenges. But advances in the capabilities of virtualization stacks and the underlying storage gear have made stretched clusters an increasingly attainable and attractive option.
In that scenario, an actual failure of an entire site could be handled fairly gracefully. The surviving site's storage and virtualization infrastructure could detect that the failed site isn't responding and automatically restart the virtual machines lost in the failure, perhaps resulting in only a few minutes of downtime. It sounds great, but it's not that simple.
For example, what happens if the first site doesn't fail and one or both of the intersite links fail instead? If the intersite storage replication link is lost, the two storage arrays could each assume that the other has failed and become active at the same time -- a true nightmare scenario in which the two replicas start to diverge. Likewise, failure of the intersite link used for virtual machine traffic might result in each half of the virtualization cluster assuming the other half is down and attempting to restart VMs the other site is still actively running.
Some storage vendors have attempted to deal with this by requiring the implementation of a software stack that runs at a third site to aid the two active storage systems in determining whether the other site has truly failed or merely lost connectivity. Even with those measures, however, some scenarios persist in which site partitioning can occur.
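The third-site arbitration logic boils down to a simple decision table. Here's a minimal sketch of how a witness might break the tie, assuming each storage site can reach the witness over a path independent of the intersite link. All names here are hypothetical for illustration, not any vendor's actual API:

```python
# Hypothetical third-site "witness" arbitration sketch. Each site reports
# whether it can reach the witness; the witness decides who survives a fault.

def decide_survivor(site_a_sees_witness: bool,
                    site_b_sees_witness: bool,
                    intersite_link_up: bool) -> str:
    """Return which site may keep serving I/O after a communication fault."""
    if intersite_link_up:
        return "both"  # no fault to arbitrate
    if site_a_sees_witness and not site_b_sees_witness:
        return "A"     # B is isolated or down; A proceeds alone
    if site_b_sees_witness and not site_a_sees_witness:
        return "B"     # mirror case: A is isolated or down
    if site_a_sees_witness and site_b_sees_witness:
        # Only the intersite link failed. The witness must pick exactly
        # one side (here, an assumed static preference) to avoid split-brain.
        return "A"
    return "none"      # total partition: suspend I/O rather than diverge

print(decide_survivor(True, False, False))
```

Note the last branch: even with a witness, a total partition leaves no safe automatic choice, which is why some failure scenarios still require manual intervention.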
To prevent the split-brain nightmare from playing out, one side of the storage cluster is generally defined as being primary in the event of a loss of site-to-site communication -- either on a per-array or per-volume basis. While this does mean there are scenarios in which the storage infrastructure will not recover automatically, it is necessary to avoid the corruption of data that could result if such measures were not taken.
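Per-volume primary designation amounts to a static preference table consulted only when site-to-site communication is lost. This sketch, with illustrative names rather than a vendor API, shows the shape of that logic:

```python
# Hypothetical per-volume preferred-site map, configured by the administrator
# in advance of any failure.
PREFERRED_SITE = {
    "vol-finance": "site-a",
    "vol-web":     "site-b",
}

def may_serve_io(volume: str, local_site: str, peer_reachable: bool) -> bool:
    """Decide whether this site may keep a replicated volume online.

    While the peer site is reachable, both sides coordinate normally. When
    site-to-site communication is lost, only the volume's preferred site
    continues serving I/O; the other side takes the volume offline to
    prevent divergent writes.
    """
    if peer_reachable:
        return True
    return PREFERRED_SITE.get(volume) == local_site

# On link loss, site-a keeps vol-finance and site-b keeps vol-web:
print(may_serve_io("vol-finance", "site-a", peer_reachable=False))
print(may_serve_io("vol-finance", "site-b", peer_reachable=False))
```

The trade-off described above is visible here: a volume whose preferred site is the one that actually failed will not come back automatically, but neither copy can corrupt the other.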
These split-brain scenarios are the worst challenge for virtualization and storage vendors alike, but they're not the only ones. Though the two storage infrastructures appear as a unified array to hosts in some stretched cluster storage implementations, only one storage array is responsible for accepting I/O for a given volume. This means that I/O generated by a virtual machine running on a host at the other site must first cross to the owning site to be written; only then can it be synchronously replicated back to the second site. The trouble here is that hosts (virtualization hypervisors in this case) don't have a good way of knowing whether the storage they have visibility to is actually local to them.
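The latency cost of that extra hop is easy to estimate with back-of-the-envelope arithmetic. This sketch assumes synchronous replication and a fixed one-way intersite delay; the figures are illustrative, not measurements from any real deployment:

```python
# Rough model of write latency in a stretched cluster with synchronous
# replication. A VM whose volume is "owned" by the far site pays an extra
# intersite round trip on every write.

def write_latency_ms(intersite_one_way_ms: float,
                     array_commit_ms: float,
                     vm_local_to_owner: bool) -> float:
    """Estimate latency for one write acknowledged back to the VM."""
    # Synchronous replication to the peer array always costs one round trip.
    latency = array_commit_ms + 2 * intersite_one_way_ms
    if not vm_local_to_owner:
        # The write must first traverse to the owning array, and the
        # acknowledgment must return: another full intersite round trip.
        latency += 2 * intersite_one_way_ms
    return latency

# With a 2.5 ms one-way intersite delay and 1 ms array commit time:
print(write_latency_ms(2.5, 1.0, vm_local_to_owner=True))   # 6.0
print(write_latency_ms(2.5, 1.0, vm_local_to_owner=False))  # 11.0
```

Nearly doubling write latency for VMs running at the "wrong" site is exactly why hypervisors being unable to tell local storage from remote storage matters in practice.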