Product review: VMware pumps up VI3
ESX Server 3.5 and VirtualCenter 2.5 upgrades boost scalability and add handy new features; integrated tools for capacity planning, VM patching, and storage management fill important gaps, but wrinkles and loose ends remain
In the year or so since VMware released VMware Infrastructure 3.0 (VI3), the suite has come to be viewed as a watershed in virtualization. Building upon the reliability of the VMware ESX 2.5 hypervisor, VI3 and its sophisticated VM management tools brought virtualization firmly into the IT mainstream. The recent upgrade to ESX Server 3.5 and VirtualCenter 2.5 doesn't equal the leap to VI3 (see my review, "Deep dive into VMware's virtual infrastructure"), but it does add a few features that will definitely come in handy for any virtualization implementation.
[ View our screencast demos of VMware Infrastructure 3's live VM migration using VMotion and dynamic load balancing using Distributed Resource Scheduler ]
These features are generally focused on easing the maintenance burden imposed by a virtualized infrastructure. Virtualization promises to make server management intrinsically simpler, but as with many things in IT, it doesn't hit every target. Addressing some important pain points, VMware has added features such as patch management (Update Manager), live migration of VM disks (Storage VMotion), and a capacity planning wizard (Guided Consolidation) to the suite. Each of these new features fills a gap in the overall picture, and for the most part, does so quite well.
It's certainly true that many IT shops will build their first VMware implementation on ESX 3.5 and VirtualCenter 2.5, and will never have seen their predecessors, but by the same token, many current VI3 shops will be upgrading to the new revs as soon as possible to leverage the new features. To that end, my testing included not only brand-spanking-new ESX installations, but also -- the best possible test of any software release -- production upgrades.
The upgrade dance
Building VMware ESX hosts from scratch is as simple as it gets. Burn an ISO, insert into a server, boot from the CD, click OK a few times, and then add that host in the VMware Infrastructure Client. Configure the network, storage, and licensing, and that's essentially it. Upgrading a host from ESX 3.0 to ESX 3.5 is actually simpler than building a new host, and requires very little downtime for the host. If your existing infrastructure is built properly, it means zero downtime for production VMs. You can even reduce the from-scratch steps if you PXE boot the VMware installer.
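The PXE route is mostly a matter of serving the installer's kernel and initrd from the ESX media via PXELINUX and handing off to a kickstart file for the answers. A minimal menu entry might look like the following sketch -- the deploy server and the esx35-ks.cfg kickstart file are hypothetical names, and the append options are worth checking against VMware's scripted-installation documentation:

   label esx35
      kernel vmlinuz
      append initrd=initrd.img ksdevice=eth0 ks=http://deploy.example.com/esx35-ks.cfg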
The first overall step to upgrading the whole infrastructure is to upgrade VirtualCenter (VC). Previous versions of VirtualCenter used MSDE (Microsoft SQL Desktop Engine) as the default database, but VMware's recommendation was to use the full-blown Microsoft SQL Server or Oracle Database to handle the database tasks. VC2.5 does away with the legacy MSDE, instead bundling Microsoft SQL Server 2005 Express Edition. This is a better database platform than MSDE, but is still designed for smaller implementations. In many production VMware environments, this is all that's necessary -- a welcome change from the previous iteration.
The upgrade to VC2.5 on a production VC2.0 server went cleanly, with a simple installation wizard driving the process. At the end of the process, the VC2.5 server was running, and using the new VC2.5 client, I could log in and view the production farm -- except there wasn't one. The upgrade process didn't migrate the previous database to the new installation, and I had to redefine the clusters, hosts, and even templates that existed on the farm. In a small environment, this is simple. In a large environment, this could be a big problem. This gotcha, and many others in this upgrade, can be dodged by careful planning and research on the process, not to mention a thorough reading of the release notes. But VMware could have done more to smooth the way. I really would have liked to see a straightforward database migration process with validity checking during the upgrade to minimize problems in this area.
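Until that exists, the prudent move is to back up the VirtualCenter database before launching the installer. On a SQL Server back end, that's a single statement -- the VCDB database name and backup path below are placeholders, since both vary by site:

   -- run against the VirtualCenter database server before upgrading
   BACKUP DATABASE VCDB
   TO DISK = 'D:\backups\vcdb-pre-vc25.bak'
   WITH INIT;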
Following the upgrade, and the subsequent redefinition of some farm parameters, VC2.5 was running against an ESX 3.0 farm without issue. Next step: Upgrade the hosts.
The easiest way to upgrade an ESX host to ESX 3.5 is to download the ESX upgrade package from VMware. Customers with an existing support agreement can download the updates for free, and existing 3.0 licenses should work with 3.5 hosts. There are other methods of performing this upgrade, but using the upgrade packages is by far the simplest.
The ESX 3.5 upgrade package is essentially an archive containing RPM packages and some supporting scripts. Using SCP, I moved this archive to a folder on the central farm datastore, and began updating each host from that package. It's a relatively time-consuming process but still surprisingly simple. I first placed each host in Maintenance Mode, which forces the active VMs on that host to VMotion to other hosts in the farm, then ran esxupdate on that host, specifying the directory containing the ESX 3.5 upgrade packages. A few minutes and dozens of RPM updates later, the host was upgraded.
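Concretely, the sequence looked something like this from the service console, with the bundle name and datastore paths as placeholders (verify the exact esxupdate flags against the release notes; the -r switch simply points the tool at a repository directory or URL):

   # copy the upgrade bundle to a datastore all hosts can see
   scp esx-3.5.0-upgrade.tgz root@esx01:/vmfs/volumes/shared-ds/
   # on each host, after it has entered Maintenance Mode:
   cd /vmfs/volumes/shared-ds
   tar xzf esx-3.5.0-upgrade.tgz
   esxupdate -r file:///vmfs/volumes/shared-ds/esx-3.5.0-upgrade update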
I then rebooted the host and took it out of Maintenance Mode in VirtualCenter. It was then just a normal host in the farm, and VMs began to migrate to it in accordance with the DRS (Distributed Resource Scheduler) rules present on the farm. The whole process took about 15 to 20 minutes per host, with most of that time spent waiting for the host to enter Maintenance Mode, and waiting for the host to come back up following the reboot. After the last host was done, the whole farm was up to ESX 3.5 with no ill effects.
Many software packages offer the ability to be upgraded in place rather than re-installed, but most of the time, admins opt for the latter, because in-place upgrades can bring about problems that aren't seen with bare-metal re-installations. Anyone who upgraded to Windows XP from Windows 2000 knows that this is true -- but in the case of ESX 3.5, the upgrade procedure seems to be very thorough. Several weeks on, it hasn't caused any problems at all.
Fun with virtual servers
VMware admins will notice a few new things right off the bat with VirtualCenter 2.5. First on that list are the annoying splash screens that now adorn most of the elements in the VMware Infrastructure Client. They're designed to be large, friendly displays with links to the most common tasks, but for anyone who's used VI3, they're not useful. Thankfully, it's possible to turn these off.
Beyond that annoyance, the new VirtualCenter is almost identical to the previous iteration in form and function, with some new buttons linking to the new features. There is one significant departure from the previous version that's worth noting: a new plug-in architecture. By implementing this, VMware has broadened the scope of what VirtualCenter can accomplish, and potentially opened the door to integrating third-party tools into the overall management infrastructure.
The big Consolidation button at the top of the new VC client is an obvious starting point. VMware has integrated its Capacity Planner code into VC2.5, allowing admins to gauge the impact of virtualizing existing physical servers without leaving the console. Coupled with VMware's physical-to-virtual (P2V) conversion tools, this provides a built-in method of doing either piecemeal or wholesale migrations of an existing datacenter. Although plenty of third-party tools do P2V and migration planning, having these capabilities built into VC2.5 is handy for many smaller infrastructures. This feature, which requires administrator-level credentials for Windows systems, will discover servers on specified subnets and monitor their utilization and performance over time. Following this period, reports can be produced that provide guidance in selecting physical servers ripe for virtualization, along with a better picture of the overall utilization of an existing infrastructure. Although too much data can be a bad thing, it's generally wise to get as many viewpoints on actual server performance as possible when making these decisions. These new consolidation tools will be welcome in many IT departments.
Update Manager, descended from Shavlik's HFNetChkPro, is another big addition to VC2.5. Update Manager not only provides a control panel for applying updates and patches to ESX Server and groups of VMs, either ad hoc or on a scheduled basis, but can automate the entire process, taking a snapshot of the VM prior to patch application and retaining those snapshots for a configurable time period. So even if the patch makes a mess of your server infrastructure, you can quickly roll back to the snapshots and get things back up and running.
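That same snapshot-and-roll-back safety net is available by hand from the service console via vmware-cmd, for admins who want to script around Update Manager. A sketch of the general shape, with the .vmx path and snapshot name as placeholders:

   # snapshot first (args: name, description, quiesce, include-memory)
   vmware-cmd /vmfs/volumes/shared-ds/web01/web01.vmx createsnapshot pre-patch "before patching" 1 0
   # if the patch goes sideways, roll back
   vmware-cmd /vmfs/volumes/shared-ds/web01/web01.vmx revertsnapshot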
As with all patch management tools, Update Manager is subject to the vagaries of any automated system that attempts to make fundamental changes at the OS layer. Some patches will fail and some will succeed, but Update Manager's display and configuration options make that dicey reality tolerable to some extent. It's unlikely that we'll ever see a smooth and truly elegant multi-platform patch management solution in our lifetimes, but Update Manager is functional enough to be used on a regular basis, even when dealing with Linux patches -- a job that should be simpler than patching Windows, and perhaps for that reason doesn't get the attention it deserves.
The other major feature addition is Storage VMotion. Traditional VMotion required that the host servers be connected to the same shared storage, be that iSCSI, NFS, or Fibre Channel, and when a VM transitioned from one physical host server to another, its storage remained in the same location -- only the VM's RAM footprint and network connections moved. With Storage VMotion, the VM's disk files can now move as well, from one datastore to another. As with traditional VMotion, this happens live, without rebooting the VM.
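One wrinkle: in this release, Storage VMotion is driven from VMware's new Remote CLI rather than from buttons in the VI client. The non-interactive form looks roughly like the following, with the VirtualCenter URL, datacenter, and datastore names as placeholders (there's also an --interactive mode that prompts for each value):

   svmotion --url=https://vc.example.com/sdk --username=admin --password=... \
      --datacenter=Production \
      --vm="[old_ds] web01/web01.vmx:new_ds"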
Storage VMotion can be a slow process, especially if the storage isn't terribly speedy, but it does work. This functionality can be a lifesaver in a number of situations, such as during storage migrations and upgrades. It further reduces the number of management and maintenance tasks that require a VM reboot, which ultimately helps service uptime and extends the list of tricks that VMs can do and physical servers can't. Working in concert with Storage VMotion and DRS is the new DPM (Distributed Power Management) capability, which can power off dormant hosts when load drops. This nice green feature requires Wake On LAN (WOL) support on the physical servers.
The halfway point
VMware did some nice things in this combo dot-five release but could have taken it further. There are still plenty of issues with VI3 that haven't been addressed, not the least of which are the truly obtuse error reporting and logging mechanisms. In one instance, trying to create an RDM (Raw Device Mapping) for a VM to directly interface with an iSCSI LUN would continually fail at the last step with a "General Error" statement that was thoroughly unhelpful. It turns out that the nature of RDMs requires that pointer files be created on a VMFS file system to reference the LUN mapped to the host. Because the datastore in use happened to be NFS, the RDM pointers couldn't be created, and thus the RDM couldn't be used.
Seeing an error message with that information anywhere along the line would have been extremely helpful.
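For reference, the pointer file in question is what vmkfstools creates when you map a raw LUN, and it has to land on a VMFS volume. The command looks something like the following, where the LUN's device path is of course specific to the host:

   # create a virtual-compatibility RDM pointer on a VMFS datastore
   # (use -z instead of -r for physical compatibility mode)
   vmkfstools -r /vmfs/devices/disks/vmhba40:0:0:0 \
      /vmfs/volumes/vmfs-ds/web01/web01-rdm.vmdk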
A number of other common functions still need work. For instance, if you rename a VM in the client, it renames the VM in the display, but it doesn't rename the folders and files related to the VM. Thus, you can't create a VM using the old name. Further, if you migrate the "renamed" VM to another datastore while it's powered off, some of the files get renamed to the new VM name, but some don't -- notably snapshots. In this instance, you're left with a nonfunctional VM after the migration. This seemingly simple step can be highly frustrating, and there's really no excuse for it. Simply renaming a VM shouldn't cause so much trouble.
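Until that's fixed, the only complete rename is a manual one: power off the VM, remove it from inventory, rename the directory and files from the service console, fix the references inside the .vmx and disk descriptor files, and re-register the VM. A rough, unsupported sketch with hypothetical VM names -- back everything up first, and note that the binary -flat files are deliberately excluded from the sed pass:

   cd /vmfs/volumes/shared-ds && mv oldvm newvm && cd newvm
   for f in oldvm*; do mv "$f" "${f/oldvm/newvm}"; done
   # update internal references in the config and descriptor files
   sed -i 's/oldvm/newvm/g' newvm.vmx
   sed -i 's/oldvm/newvm/g' $(ls newvm*.vmdk | grep -v flat)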
Networking is still more complex than it perhaps needs to be, with service consoles, VMkernel interfaces, multiple default routes, and so on. It would be nice to see a consolidation of sorts there, with clearer definitions of networking functions and certainly clearer interpretations of commonly used networking terms. Configuring EtherChannel NIC bonding requires navigating a labyrinth of dialog boxes that becomes tiresome very quickly. The addition of CDP (Cisco Discovery Protocol) support in ESX 3.5 does untie some knots, making it far simpler to identify connected switchports on each ESX host, provided you're using CDP-capable switches -- although in some instances this new feature turned up blank even when the host was connected to a Cisco 6509 with CDP enabled. Speaking of networking, one of the more welcome hardware support updates is for selected 10 Gigabit Ethernet cards from Neterion and NetXen.
There are more new features to be found in ESX 3.5, such as IPv6 support for VMs, increased host logical CPU and RAM limits (32 CPUs and 256GB, respectively), and support for as much as 64GB of RAM per VM. VirtualCenter 2.5 is more scalable as well, able to manage as many as 200 ESX hosts and 2,000 VMs. Another welcome improvement to VirtualCenter: VMware Tools installations can now be automated for both Linux and Windows VMs, thank you.