In thinking about virtualized environments, we're at a point where a single physical server can offer abundant resources, yet we're still beholden to the age-old realities of hardware failure. That disconnect leads to overly optimistic planning and, ultimately, downtime.
It's not unreasonable to spec out a physical server with 128GB, 256GB, or even 1TB of RAM, 16 to 48 CPU cores, and a slew of 10G interfaces. Such a server could easily handle dozens, possibly hundreds, of VMs depending on the workloads. On the face of it, we could run the equivalent of three racks of 1U physical servers from 2004 on a single 1U server today. It truly is an amazing evolution in general computing. It's also dangerous, because when that server tanks for whatever reason, the problems generated by that failure are vast, far surpassing the failure of a 1U server from nine years ago. For some reason, this risk isn't factored into many virtualization builds.
The fact is, many small-to-medium-size businesses can run their entire server operations on a single modern server. If we're talking about 40 or 50 general-purpose VMs, it's completely doable. Most builds add a second server for load balancing and failover, so you have the entire business running on four CPUs, however much RAM, and four power supplies. We're back to the mainframe, but without the RAS (reliability, availability, serviceability) features. Internal system failures, power issues, upgrades, and the like can easily take one of those servers out of commission. At that point we're down to a single box, facing the prospect of restarting dozens of VMs that were lost when the other server failed.
It's a very tenuous situation at best and catastrophic at worst, yet I see many builds that try to pack as much as possible into a few physical servers and call it a day. A much better solution is to reduce the resources per server and add more physical systems to the mix.
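To make that trade-off concrete, here's a rough capacity sketch. All of the numbers and the helper function are illustrative assumptions, not figures from the article; the point is simply how N+1 math changes as you spread the same VM load across more, smaller hosts.

```python
# Rough N+1 capacity sketch: how many hosts does a cluster need so that
# losing one host still leaves enough RAM for every VM?
# All numbers below are illustrative assumptions, not from the article.

def hosts_needed(vm_count, vm_ram_gb, host_ram_gb, failures_tolerated=1):
    """Smallest host count such that losing `failures_tolerated` hosts
    still leaves enough aggregate RAM for the whole VM population."""
    total_demand = vm_count * vm_ram_gb
    hosts = failures_tolerated + 1
    while (hosts - failures_tolerated) * host_ram_gb < total_demand:
        hosts += 1
    return hosts

# Two big hosts: 50 VMs x 8 GB = 400 GB demand on 512 GB hosts.
# The lone survivor must absorb all 400 GB -- it fits, but at ~78%
# utilization with no headroom, and every VM restarts at once.
print(hosts_needed(50, 8, 512))   # 2

# Four smaller 192 GB hosts: losing one leaves 3 x 192 = 576 GB,
# and a single failure displaces only a quarter of the VMs.
print(hosts_needed(50, 8, 192))   # 4
```

The same total capacity spread across more chassis means each failure is smaller, both in displaced VMs and in the load the survivors must absorb.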