Don't just blame the cloud for the Amazon Web Services outage

No technology or vendor is perfect, but the promise of the cloud is too great to abandon

Amazon Web Services has once again found itself in the unenviable position of being the cloud computing world's poster boy turned whipping boy, thanks to another high-profile service disruption that severely slowed or knocked out a handful of heavily trafficked websites and services, including Netflix, Reddit, Airbnb, Imgur, Pinterest, Heroku, and Foursquare.

Like clockwork, the outage has generated a healthy debate around the blogosphere as to whether this most recent downtime spells doom for the cloud in general or for Amazon in particular, or whether affected AWS users should accept a share of the blame for taking the cheap route: signing up for bare-bones, single-region AWS services to host their mission-critical offerings.
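To put that debate in concrete terms, here is a minimal sketch of what the less-cheap route can look like: provisioning an RDS database with a Multi-AZ standby so that a failure in one Availability Zone triggers automatic failover to another. The example uses the boto3 Python SDK, and the instance identifier, credentials, and sizing are purely illustrative assumptions, not details drawn from any of the affected sites.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Illustrative only: MultiAZ=True asks RDS to keep a synchronous standby
# replica in a second Availability Zone and fail over to it automatically
# if the primary's zone goes down.
rds.create_db_instance(
    DBInstanceIdentifier="example-db",      # hypothetical name
    DBInstanceClass="db.m5.large",          # illustrative sizing
    Engine="mysql",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",
    AllocatedStorage=100,                   # GiB
    MultiAZ=True,                           # the resilience option at issue
)
```

A standby of this sort roughly doubles the database bill, which is exactly the kind of trade-off the "cheap route" argument is about.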

AWS confirmed on its status page at 11:11 a.m. PT yesterday that it had experienced "degraded performance for a small number of EBS (Elastic Block Store) volumes." The page said the issue was restricted to a single Availability Zone within the U.S.-East-1 Region, which is in Northern Virginia.

If that region rings a bell, it's because it was at the center of the significant AWS outages earlier this year: At the end of June, the facility suffered an outage, purportedly caused by a line of powerful thunderstorms and compounded by bugs in Amazon's software, that disrupted services including Elastic Compute Cloud, ElastiCache, Elastic MapReduce, and Relational Database Service. An earlier Amazon Web Services outage struck the same facility on June 14.

By this morning, AWS reported it had restored I/O for the majority of EBS volumes, though a small number of volumes would require customer action to restore I/O. The company also reported that the remaining affected ELB (Elastic Load Balancing) load balancers had been recovered and that those services were operating normally.

Amazon's Relational Database Service was also affected. At 11:03 yesterday morning, Amazon reported that it was experiencing "connectivity issues and degraded performance for a small number of RDS DB Instances" in a single Availability Zone in the Northern Virginia data center. By around 2 p.m. Pacific, the company stated that the "recovery process to bring remaining RDS instances back online was continuing at a steady pace."

"Customers can launch new database instances," according to the status report. "Customers with impaired DB instances do have the option of initiating a Point in Time Restore operation."

Also affected was the CloudSearch service, which experienced elevated error rates for its search and document services. The company said those problems had been fixed just after midnight Pacific.

Amazon has been mum on the cause of these outages, but that hasn't stopped the predictable deluge of criticism and speculation from onlookers as to where the blame ultimately lies.

The easiest target is -- and will continue to be -- the cloud itself. You don't have to go far in comment threads and select blogs to see chiding comments along the lines of, "But I thought the cloud was supposed to solve all our problems! Boy, am I glad we host our own site or services."

No surprises there: The cloud has been the target of slings and arrows since day one. Sure, cloud computing can bring new complexities and, thanks to its porous nature, new vulnerabilities to an IT environment. Then again, the same could be said when laptop computers, mobile devices, and network-connected remote offices came along. Heck, even the Internet itself is a huge security threat. But the cloud provides levels of flexibility and affordable, easy access to powerful computing resources that have undeniably helped organizations young and old thrive.
