On June 14, the Amazon Web Services cloud computing platform experienced a serious outage in its Virginia (U.S.-East) data center. Apparently power-related, the outage took down portions of one of the four independent availability zones that operate in that data center. As a result, many popular websites and a slew of less popular ones disappeared from the Internet for several hours.
As in previous outages of megascale cloud implementations from the likes of Amazon and Microsoft, this incident triggered a round of hysteria about the future of cloud computing. Surprisingly, unlike the response to last April's AWS outage, many rushed to Amazon's defense. This could be a reflection of the fact that attitudes toward the cloud and its inevitable failings are becoming more realistic, or it could simply be that this month's outage was far less widespread. In either case, anti-public-cloud pundits and competitors alike wasted no time in using this failure to underline why the public cloud is an incredibly bad idea.
Fear, uncertainty, and doubt that helps no one
As I've said before, I am still relatively shocked by the wild reactions that always seem to follow these highly publicized events. One blog entry written by private cloud vendor Piston Computing particularly caught my eye. In it, Piston co-founder Gretchen Curtis opined that this most recent AWS outage was proof it's better to own than to rent. Although buying may indeed be better than renting in many cases, I lament the black-and-white nature of this post, and think it's a great example of the FUD from self-interested entities (Piston sells data center technology, whereas Amazon rents it) that always seems to trail similar events and in the end serves no one well.
I won't go point by point on Curtis' post because I happen to agree with much of it -- at least in the very large enterprise sphere that forms the sweet spot for Piston's implementation of OpenStack. But what irks me about it -- and much of the other editorial commentary -- is that the AWS outage doesn't back up the claims Curtis made. Her points were valid, but they were unrelated to the AWS outage.
What many -- both proponents and detractors of public cloud offerings -- seem to miss is that being in the cloud does not and will never free you from having your own disaster-recovery and high-availability measures in place to defend against the failures and outages that will inevitably occur.
In an on-premise or private cloud infrastructure, that means deploying redundant core infrastructure hardware and maintaining a testing regimen to ensure it's working. In the cloud, you may not be concerned with the hardware, but you need to diversify your workloads across multiple availability zones within a cloud provider or even across multiple cloud providers. Conceptually, it's no different than what you do on-premise, although it may bear little resemblance in execution.
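To make the idea of spreading workloads concrete, here is a minimal sketch in Python. It assumes a simple round-robin placement, and the zone names and function names are hypothetical; it calls no real cloud provider API and only models what survives when a single zone goes dark.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(replica_count, zones):
    """Assign replicas to zones round-robin so no single zone holds them all."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(replica_count)]


def surviving_replicas(placement, failed_zone):
    """Count the replicas still running if one zone fails entirely."""
    return sum(1 for zone in placement if zone != failed_zone)


# Hypothetical zone names, for illustration only.
ZONES = ["zone-a", "zone-b", "zone-c", "zone-d"]

placement = spread_across_zones(6, ZONES)
print(Counter(placement))
for zone in ZONES:
    print(zone, "fails:", surviving_replicas(placement, zone), "replicas left")
```

With six replicas over four zones, losing any one zone still leaves at least four replicas running. The same reasoning applies one level up if the "zones" are separate cloud providers.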
Of course, if you're large enough to have the correct economies of scale, you may find that delivering that kind of high availability, coupled with the elasticity the public cloud offers, is cheaper and easier in an on-premise private cloud -- and I believe that was the thrust of Curtis' blog post.
The real issues: Getting the right tool for the job, learning from experience
That decision, however, is an issue of selecting the right tool for the job. Just as no one screwdriver is appropriate for every screw in existence, any of the public cloud, private cloud, traditional on-premise infrastructure, or hybrids of the three may end up being the right tool for you. The key to making a good choice is truly understanding the pros and cons of each approach and being able to match them to your needs -- areas in which neither the breathless pro-cloud nor staunch anti-cloud narratives can really help.
That's not to say I don't appreciate a vigorous post-outage debate about what went wrong in a given failure and how (or whether) it will be avoided in the future. Though some public cloud providers are less than forthcoming with real details, at least we're aware of the general cause and what was done to fix it.