Why you should embrace memory overcommits for your VMs

Although available only in the costlier VMware ESX, this technology should be considered essential by IT

The independent analyst firm Burton Group recently released a proposed set of standards for server virtualization -- some required, some preferred, some optional -- to help IT see beyond the data sheet marketing check boxes for EMC VMware's ESX, Microsoft's Hyper-V, and Citrix's XenServer. VMware supports 27 of the requirements and most of the preferred and optional standards, while Hyper-V supports 24 of the required standards and fewer of the others (my previous post "A new wrinkle -- and possible conclusion -- for the hypervisor wars" discusses who would be affected by those three "required" omissions in Hyper-V).

Note: According to Chris Wolf (Citrix CTP, MVP for Microsoft Virtualization, vExpert for VMware), the author of the Burton Group criteria, Hyper-V actually meets 25 of the 27 requirements because he was able to successfully cluster SCVMM.

I recently examined the Burton Group criteria closely, and they are quite impressive. Weighing in at 70 pages, the document makes some debatable choices about where features are placed (required, preferred, or optional) but covers everything you can imagine that clients could ask for in a solution. As I've mulled over Burton Group's recommendations, I've come to believe that one of its preferred standards -- memory overcommit -- should be considered a requirement. Right now, only VMware ESX meets that condition.

On the surface, the term "memory overcommit" gives off the wrong vibe because overcommitting seems to suggest overextension of resources. In reality, the idea is that you can create VMs whose combined memory allocations exceed the physical RAM in the host that backs them. This allows for greater VM density on a physical host. Chris Wolf says, "A favorite analogy of mine is the pagefile. Windows OSs have been using a pagefile forever for the same purpose (to allow the OS to run more apps than physical resources would normally allow, and intelligently allocate physical memory as it's needed)."

As an example, if you have 20 VMs, each with 4GB assigned but infrequently requiring more than 2GB of memory, you would be throwing away 40GB of RAM in the host if you installed enough physical memory to back everything you've assigned. Some VMs will use more memory, others less, but you can gauge both the average and peak amount of memory used after a few weeks of monitoring, then provision enough memory for those real-world circumstances. Chances are high it won't be the full 80GB assigned. That's overcommitting in theory.
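The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration: the function name and the idea of treating 2GB as every VM's typical usage are simplifying assumptions, not figures from any monitoring tool.

```python
def overcommit_summary(vm_count, assigned_gb_per_vm, typical_gb_per_vm):
    """Return total assigned memory, typical demand, and the gap between them (GB)."""
    assigned_total = vm_count * assigned_gb_per_vm
    typical_total = vm_count * typical_gb_per_vm
    return {
        "assigned_gb": assigned_total,
        "typical_gb": typical_total,
        # Memory that would sit idle if the host backed every assigned GB
        "idle_gb": assigned_total - typical_total,
    }


# The scenario from the article: 20 VMs, 4GB assigned, about 2GB typically used
summary = overcommit_summary(vm_count=20, assigned_gb_per_vm=4, typical_gb_per_vm=2)
print(summary)  # {'assigned_gb': 80, 'typical_gb': 40, 'idle_gb': 40}
```

In practice the typical figure would come from weeks of monitoring and would differ per VM, so treat the single per-VM number here as a stand-in.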

What happens when all 20 VMs require their full memory allotment? Isn't that an unreasonable risk? No, actually. VMware uses a feature called transparent page sharing to run more VMs in less memory (it also uses a balloon driver and optimized algorithms in the hypervisor kernel). The idea is simple: Many memory pages, such as those holding EXEs and DLLs, are identical across multiple VMs and would waste physical memory if duplicated for each one. So VMware reduces the VM guests' memory needs by loading those shared pages just once. It's the same approach used in Windows whereby multiple applications make calls to a common set of DLLs without loading those DLLs into memory multiple times.
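To make the page-sharing idea concrete, here is a toy model in Python. It is not VMware's implementation: the page size, the function name, and the sample "guests" are all assumptions made for illustration, and a real hypervisor also has to deal with details this sketch ignores, such as hash collisions and what happens when a guest writes to a shared page.

```python
import hashlib

PAGE_SIZE = 4096  # bytes; assumed page size for this illustration


def shared_page_count(vms):
    """Count the physical pages needed when identical pages are stored only once.

    `vms` is a list of guests, each represented as a list of page contents (bytes).
    """
    unique_pages = set()
    for pages in vms:
        for page in pages:
            # Identical content yields an identical digest, so duplicates collapse
            unique_pages.add(hashlib.sha256(page).digest())
    return len(unique_pages)


# Hypothetical guests: two pages common to all three, one page unique to each
dll_page = b"\x01" * PAGE_SIZE
exe_page = b"\x02" * PAGE_SIZE
vms = [
    [dll_page, exe_page, b"A" * PAGE_SIZE],
    [dll_page, exe_page, b"B" * PAGE_SIZE],
    [dll_page, exe_page, b"C" * PAGE_SIZE],
]

total_pages = sum(len(pages) for pages in vms)
physical_pages = shared_page_count(vms)
print(total_pages, physical_pages)  # 9 5
```

Nine guest pages end up needing only five physical pages in this contrived case; how much real workloads save depends on how similar the guests actually are.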

Another consideration is what some people call good and bad memory overcommit. Take the scenario above, with 20 VMs allocated 4GB of memory each. If monitoring shows an average total use of 50GB and a maximum total use of 55GB, provisioning less than 55GB would overcommit in a bad way. If you provision less than the maximum observed or predicted usage, you will force the hosts to swap to disk too often, which is a performance killer. And even that max usage amount (in this case, 55GB) is too tight; 60GB is a better ("good") memory overcommit for this instance, providing what you need and some headroom, while still saving you 20GB of RAM compared with the 80GB assigned.
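That rule of thumb can be written down as a small check. The sketch below is one possible reading of it, not a sizing tool: the function name, the category labels, and the 5GB default headroom (taken from the gap between the 55GB peak and the 60GB suggestion) are assumptions.

```python
def classify_overcommit(provisioned_gb, assigned_gb, peak_gb, headroom_gb=5):
    """Label a host's physical memory against assigned and observed peak usage (GB)."""
    if provisioned_gb >= assigned_gb:
        return "no overcommit"      # every assigned GB is backed by physical RAM
    if provisioned_gb < peak_gb:
        return "bad overcommit"     # below observed peak: expect swapping to disk
    if provisioned_gb < peak_gb + headroom_gb:
        return "tight"              # covers the peak but leaves little margin
    return "good overcommit"        # covers peak plus headroom, still below assigned


# The article's numbers: 80GB assigned, 55GB observed peak
for provisioned in (50, 55, 60, 80):
    print(provisioned, classify_overcommit(provisioned, assigned_gb=80, peak_gb=55))
# 50 bad overcommit
# 55 tight
# 60 good overcommit
# 80 no overcommit
```

How much headroom is enough is a judgment call that depends on how spiky your workloads are; 5GB is simply what the example implies.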

VMware justifies its higher cost over Hyper-V by saying you'll easily make up the difference in the amount of memory you need to provision; lower memory costs pay for the pricier software and its overcommit capabilities. Maybe -- you can run (or research) scenarios that support this claim and others that contradict it.

Cost is not the main issue of this discussion. Fear of degraded performance or other problems is what causes many IT organizations to avoid overcommitting. I suggest you not fear overcommitting. Think of it instead as resource sharing. Most systems aren't maxing out their available memory anyway. It's a big reason that companies adopt virtualization in the first place: to reduce the resource waste across all those single-instance physical servers. So overcommitting is a natural concept for virtualization. It lets you use proven approaches such as transparent page sharing and other technologies to ensure memory is used where it's needed most.

Do you use memory overcommit in your environment? If you do, can you comment on the financial savings you've seen in your environment or the other benefits that make this worthwhile? If you choose not to implement memory overcommit or to use a hypervisor that doesn't include this feature, can you please explain your position?

This article, "Why you should embrace memory overcommits for your VMs," was originally published at InfoWorld.com. Follow the latest developments in virtualization at InfoWorld.com.