Terremark's luck turned sour on St. Patrick's Day, March 17, 2010. The company's vCloud Express service took a nosedive that day, with a Miami-based data center going offline for about seven hours. Users were unable to access data stored in the center for the entire period.
Not to get overly redundant, but this brings up the value of redundancy -- having your crucial data available on multiple servers in different data centers or, even better, different regions. You could also take the extra step of spreading it among different providers as a failsafe.
"You can pick a series of vendors to host a workload -- one as a backup or two as a backup, and then another as your primary," suggests Harold Moss, chief technology officer of IBM's Cloud Security Strategy program. "You can then implement your workload there in a secure manner, with the appropriate security, and start to introduce your resiliency capabilities."
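To make the primary-plus-backups idea concrete, here is a minimal Python sketch of the failover logic. It is an illustration under stated assumptions, not anything Moss or IBM describes: the provider names and the fetch functions are hypothetical stand-ins for whatever client code talks to each vendor.

```python
from typing import Callable, Sequence, Tuple

# Each provider is a (name, fetch_function) pair. Both are hypothetical
# placeholders for real vendor client code.
Provider = Tuple[str, Callable[[str], str]]


def fetch_with_failover(key: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order (primary first) and return (provider_name, value)
    from the first one that answers. Raise RuntimeError if every one fails."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch(key)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary_fetch(key: str) -> str:
    # Simulates a primary data center that is offline.
    raise ConnectionError("data center unreachable")


def backup_fetch(key: str) -> str:
    # Simulates a healthy backup at a second vendor.
    return f"data for {key}"


if __name__ == "__main__":
    providers = [("primary", primary_fetch), ("backup", backup_fetch)]
    print(fetch_with_failover("invoice-42", providers))
```

The sketch only works if the backup already holds a copy of the data, so the replication step has to happen long before the outage does.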
This is no hypothetical exercise: PayPal went down for real in the summer of 2009, leaving millions of merchants around the world with no way to sell their stuff. The service was completely unavailable for about an hour and remained spotty for several more. PayPal said hardware failure was to blame.
It's a rare kind of outage, no doubt -- but with all the sales lost, this unfortunate interruption easily earns a spot in cloud computing's hall of shame.
Colossal cloud outage No. 10: Rackspace's rough year
When you provide cloud services to Web presences like TechCrunch and Justin Timberlake, you'd better believe people are going to notice when your servers stop working.
Rackspace learned that lesson a few times in 2009. The cloud provider suffered four high-profile failures throughout the year, adding up to hours of offline time for the company's customers. One blip was bad enough that Rackspace had to pay out nearly $3 million in service credits to its users.
Rackspace called the incidents "painful and very disappointing" and promised to "execute at a high level for a long time" afterward. Today, the company continues to focus on uptime but also works to help users plan for the inevitable turbulence that comes with life in the cloud.
"If you want to cluster a server or build geographical redundancy, it's easier to do now than it ever was before, but you have to actually take those steps," says Rackspace's Lew Moorman. "The cloud doesn't bring inherent weaknesses that weren't present if you did things in-house before."
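Some back-of-the-envelope arithmetic shows why taking those steps pays off. The Python sketch below assumes each copy of a service fails independently of the others, which real outages often violate (shared software bugs, shared networks), and the 99 percent figure is purely illustrative, not a number from Rackspace or any other provider.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def combined_availability(single: float, copies: int) -> float:
    """Availability of a service that stays up as long as at least one of
    `copies` independent replicas is up, each available `single` of the time."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is down only when every replica is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for copies in (1, 2, 3):
        a = combined_availability(0.99, copies)
        print(copies, round(expected_downtime_hours(a), 3))
```

Under those assumptions, one 99 percent site is down roughly 87.6 hours a year, while two independent sites are both down for under an hour. The gain shrinks quickly if the two sites can fail together.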
All considered, the biggest lesson here may be that no single server, center, or service is 100 percent reliable. If you don't build your business with that in mind -- well, my friend, you're just walking around with your head in the cloud.
This article, "The 10 worst cloud outages (and what we can learn from them)," originally appeared at InfoWorld.com.