VMware VSAN: Inside the revolutionary new approach to storage

VMware's new software-defined storage tech is more than just a new spin on SAN

A big part of the messaging delivered at EMC VMware's 10th annual VMworld trade show last week surrounded the software-defined data center. No longer content to simply deliver compute virtualization tech, VMware is now actively pushing into both network and storage virtualization with the announcement of the VMware NSX software-defined networking stack and VMware Virtual SAN (VSAN). While NSX is primarily an evolution of the products acquired through VMware's purchase of Nicira, VSAN appears to be, for the most part, internally developed.

This isn't the first time VMware has ventured into the storage space. The VMware vSphere Storage Appliance was VMware's first attempt at a storage technology that actually leveraged the benefits of virtualization. However, the new VMware VSAN should in no way be confused with the maddeningly similarly named VMware VSA. The two technologies couldn't be more dissimilar. Where the VSA is a relatively limited appliance-based SAN stand-in that might be used in a small business or branch office environment, VSAN is a complete reintegration of persistent storage into the hypervisor.

However, VSAN is just a tool, and like all tools, it won't be perfect for everyone. Moreover, it's a tool that doesn't actually exist yet. VSAN will be in public beta shortly and will be compatible with the newly released vSphere 5.5, but the finished product isn't due out for some time. Despite its beta status, it's worth taking a look at how VMware's Virtual SAN works -- it may be a glimpse into the future of software-defined storage.

What the VMware Virtual SAN isn't

It's easy to say that VMware VSAN is a way to leverage the direct-attached storage in VMware vSphere hosts to form a distributed, redundant shared storage infrastructure that vSphere can use to house VMs in place of a traditional SAN or NAS. However, that might lead you to believe it's similar to products such as VMware's own vSphere Storage Appliance or Hewlett-Packard's LeftHand P4000 VSA, which also fill that niche. That would be inaccurate.

Instead, it's easier to describe what the VSAN isn't.

First, it is not a virtual appliance. Unlike the vSphere Storage Appliance, the entirety of the VSAN control and data planes is built into the ESXi hypervisor rather than being implemented in general-purpose virtual appliances. Because it is not an appliance running on top of the hypervisor, it can directly address the underlying physical hardware. This not only cuts out the small amount of processing inefficiency introduced by the hypervisor but also gives the VSAN much greater control over and visibility of the hardware -- even allowing it to directly address specific disks.

Second, VSAN doesn't represent block-level storage in the same way a typical SAN does, nor, exactly, is it a file-level NAS. Instead, it's more of an object-based storage infrastructure designed with the single purpose of storing virtual machines and their associated data -- and nothing else.

By contrast, most SANs live their lives trying to make themselves look like a simple direct-attached disk: a collection of blocks ready to be used however the operating system they're attached to wants to use them. Similarly, a NAS offers up a general-purpose file system and allows the host OS to create and delete files and directories on top of that file system -- very similar to the SAN, but operating at a higher level and thus less flexible (and more purpose-built).

VMware's VSAN continues that progression by introducing the complexity necessary to divide virtual machines across multiple disks on multiple servers and enforce a range of storage policies on a per-VM-disk basis. Literally, all it can do is store virtual machines, and only for vSphere hosts.

This is where the really interesting bits of VSAN are. Few other virtualization-oriented storage technologies have this kind of visibility into what they're actually storing. Say I've built a VMFS file system on a Fibre Channel SAN and filled that VMFS volume with a variety of virtual machines. The SAN in this case has absolutely no idea what it's storing. It doesn't know how many virtual machines there are, how big the disks are, or which workloads are more critical to the organization.

The only way I can get the SAN to understand that is to create further VMFS volumes and split the virtual machines among them. However, unless I want to create an endless number of SAN volumes, I can't really enforce storage policies effectively using a SAN. That's one problem the VSAN works around by addressing its storage in the context of individual virtual machine disks rather than generic blocks of storage or meaningless lumps of files.

Fine, then -- so what is the VMware Virtual SAN?

Now that I've gotten some of the conceptual stuff out of the way, I can run through the nitty-gritty. A VSAN is constructed using the physical disks in the vSphere hosts that make up a vSphere cluster. It does not depend on RAID on the hosts; in fact, any RAID controller in the hosts must be disabled so that the disks are exposed directly to the vSphere OS (and thus to the VSAN software).

The minimum configuration for a VSAN is three hosts, each containing at least one SSD and one magnetic disk. The maximum configuration is likely to change as VSAN reaches its initial release, but is currently set at eight hosts per VSAN cluster with each host capable of fielding five disk groups -- each with one SSD and as many as five magnetic disks. At those limits, a fully populated cluster could field 40 SSDs and 200 spinning disks -- a far cry from the three-host, 16TB maximum of the VMware vSphere Storage Appliance.
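For anyone who wants to check the math on those maximums, here is a minimal sketch. The constants are simply the pre-release limits quoted above (which may change), and the function name is my own invention, not anything from VMware.

```python
# Pre-release limits as quoted in this article; they may change by release.
MAX_HOSTS = 8
DISK_GROUPS_PER_HOST = 5
SSDS_PER_DISK_GROUP = 1
MAX_HDDS_PER_DISK_GROUP = 5


def cluster_maximums(hosts=MAX_HOSTS):
    """Return (ssd_count, hdd_count) for a fully populated cluster."""
    disk_groups = hosts * DISK_GROUPS_PER_HOST
    ssds = disk_groups * SSDS_PER_DISK_GROUP
    hdds = disk_groups * MAX_HDDS_PER_DISK_GROUP
    return ssds, hdds


ssds, hdds = cluster_maximums()
print(ssds, hdds)  # 40 200
```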

Crucially, the SSDs are not used as storage in the way that a tiered SAN might use them. Instead, they are used only as a write buffer and read cache. Their only role is to accelerate the performance of the VSAN rather than to provide usable capacity. To this end, the capacity of the SSDs should be at least 10 percent that of the magnetic disks. That could limit the ability to scale, given the 1:5 ratio between SSDs and the maximum number of spinning disks in a disk group -- five 2TB SATA disks would require an expensive 1TB SSD to front-end them.
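The 10 percent guideline is easy to turn into a quick sizing check. The sketch below is only arithmetic on the rule of thumb stated above; the function name and the choice to work in whole gigabytes are my own assumptions.

```python
def min_ssd_capacity_gb(hdd_sizes_gb, percent=10):
    """Smallest SSD (in GB) that meets the rule of thumb for one disk group.

    hdd_sizes_gb: capacities of the magnetic disks in the group, in GB.
    percent: SSD capacity as a percentage of total magnetic capacity.
    """
    total_gb = sum(hdd_sizes_gb)
    # Integer ceiling division so a fractional requirement is rounded up.
    return (total_gb * percent + 99) // 100


# Five 2TB SATA disks behind a single SSD, as in the example above.
print(min_ssd_capacity_gb([2000] * 5))  # 1000
```

Run against a smaller disk group, say three 1TB disks, the same rule asks for a 300GB SSD, which shows how quickly the flash bill grows with the size of the spinning disks behind it.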

Turning on the VSAN software is mind-numbingly easy. After you've loaded vSphere on host hardware with the appropriate disk resources and either dedicated 1Gbps or shared 10Gbps Ethernet interfaces, you just check a box in the cluster configuration in vCenter. If you don't care to manually control the process, vCenter will program the hosts to lasso all their direct-attached disks into the VSAN and present the resulting storage space to the hosts in the cluster as one big data store. If you'd like to control how the disk grouping is done or leave some disks unused by the VSAN, you can do so manually.

After that, all you have left to do is create virtual machines and the storage policies to go along with them. The storage policies consist of a range of settings that affect the number of disks onto which a virtual machine is striped and the amount of data redundancy that's introduced (reflected in the number of failures permitted).

The striping question is one of performance. If you decide to create a virtual machine with four stripes, the VSAN splits the data up onto at least four disk groups (provided you actually have four disk groups). This will have almost no impact on write performance because the SSDs in those disk groups will bear the writes and are usually not a performance bottleneck. However, it will speed read performance if there are a lot of cache misses because there are more spinning disks to read from.
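To make the striping idea concrete, here is a toy model. VMware hasn't published how VSAN actually places data, so the round-robin layout, the chunk numbering, and every name below are assumptions for illustration only.

```python
def assign_stripes(chunk_count, stripe_width, disk_groups):
    """Spread numbered chunks of a VM disk across disk groups, round-robin.

    This is an illustrative stand-in, not VSAN's real placement logic.
    """
    if stripe_width < 1:
        raise ValueError("stripe width must be at least 1")
    if stripe_width > len(disk_groups):
        # The policy can only be honored if enough disk groups exist.
        raise ValueError("not enough disk groups for the requested stripes")
    targets = disk_groups[:stripe_width]
    layout = {group: [] for group in targets}
    for chunk in range(chunk_count):
        layout[targets[chunk % stripe_width]].append(chunk)
    return layout


layout = assign_stripes(8, 4, ["dg0", "dg1", "dg2", "dg3", "dg4"])
for group, chunks in layout.items():
    print(group, chunks)
```

With four stripes, a cache miss on any run of four consecutive chunks lands on four different sets of spindles rather than queuing up behind one, which is where the read benefit comes from.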

Data redundancy is where the concepts behind RAID are introduced. Effectively, this setting tells the VSAN how many copies of the data it needs to make; it is independent of striping. If you tell it to withstand two failures, it will copy the data enough times to ensure that two of the vSphere hosts making up the VSAN could fail without interrupting access to this VM.
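The relationship between failures tolerated and copies kept can be sketched in a few lines. I'm assuming plain mirroring here, with each copy on a different host, and ignoring any quorum or tie-breaking mechanism the real product may use; the function and host names are hypothetical.

```python
def copies_needed(failures_to_tolerate):
    """Copies required under simple mirroring (an assumption, see above)."""
    if failures_to_tolerate < 0:
        raise ValueError("failures to tolerate cannot be negative")
    return failures_to_tolerate + 1


def survives(copy_hosts, failed_hosts):
    """True if at least one copy lives on a host that has not failed."""
    return any(host not in failed_hosts for host in copy_hosts)


print(copies_needed(2))  # 3

placement = ["esx1", "esx2", "esx3"]  # one copy per host
print(survives(placement, {"esx1", "esx2"}))  # True
print(survives(placement, {"esx1", "esx2", "esx3"}))  # False
```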

Putting it all together

The really neat thing about all of this is that you can have two virtual machines side by side on the same data store but have wildly different performance and redundancy policies applied to them. That's simply not possible without a file system and underlying storage architecture that is aware of what it is storing. Although the first revision of VMware Virtual SAN could be a massive flop in terms of performance or reliability (we'll see when it comes out), these capabilities are what make it unique and why it is worth taking note of.
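Here is one way to picture two VMs sharing a data store under very different policies. This is a conceptual model, not VMware's API: the class, the field names, the file names, and the assumption that every mirrored copy is striped in full are all mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical per-VM-disk policy with the two settings discussed."""
    stripes: int
    failures_to_tolerate: int


# One data store, two VM disks, two very different policies.
datastore = {
    "db01.vmdk": StoragePolicy(stripes=4, failures_to_tolerate=2),
    "test01.vmdk": StoragePolicy(stripes=1, failures_to_tolerate=0),
}


def pieces_on_disk(policy):
    """Pieces stored, assuming each mirrored copy is striped in full."""
    copies = policy.failures_to_tolerate + 1
    return policy.stripes * copies


for name, policy in datastore.items():
    print(name, pieces_on_disk(policy))
```

Under those assumptions the database disk is spread across 12 pieces while the test disk sits in a single piece, yet both live side by side in the same data store.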

This article, "VMware VSAN: Inside the revolutionary new approach to storage," originally appeared at InfoWorld.com. Read more of Matt Prigge's Information Overload blog at InfoWorld.com.

Copyright © 2013 IDG Communications, Inc.