Gmail Gfails, Internet survives again

Yet another Google e-mail outage casts more doubt on the viability of cloud computing. But Cringely says there's always a trade-off.

Maybe you didn't notice, but Gmail went down for about 100 minutes Tuesday, depriving millions of the Gfaithful of their beloved e-mail service.


The gnashing of teeth and the rending of garments could be heard clear across Twitterville. Like these representative twits -- err, tweets:

Words and phrases like “apocalypse” and “digital terrorism” really did come to mind when I realized the #gfail.

gmail is down? im having an anxiety attack. i cant function without it.

No Gmail, Facebook is twitchy and Twitter is slow. My life may end just now…

Actually, as several Tweeters pointed out, Google's POP and IMAP servers were working just fine. It was the Web interface that did a face plant. But to the Twitterati who couldn't quite grasp that concept, it seemed the world had ended, at least temporarily.

Turns out the problem was fairly prosaic, though at the scale Google operates, even a hangnail can look like a fatal condition. The Google team took a few Gmail Web servers offline for routine maintenance, the servers that remained online got too slow and shut themselves down, and the resulting traffic overwhelmed the machines still left standing.
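For the curious, that kind of domino effect is easy to sketch in a few lines. What follows is a toy model, not Google's architecture: the function name, the capacity figures, and the rule that traffic is split evenly and an overloaded machine simply switches itself off are all assumptions made for illustration.

```python
def surviving_servers(total_load, capacities):
    """Return the capacities of the servers still up once the cascade settles.

    Assumptions (illustrative only): load is split evenly across the live
    servers, and any server handed more than its capacity shuts itself down.
    """
    alive = list(capacities)
    while alive:
        share = total_load / len(alive)
        # Keep only the servers that can carry their share of the load.
        still_up = [c for c in alive if c >= share]
        if len(still_up) == len(alive):
            break  # nobody dropped out this round, so the system is stable
        alive = still_up  # the same load now lands on fewer machines
    return alive


# Ten hypothetical servers with headroom absorb the load comfortably.
healthy = surviving_servers(850, [100] * 10)

# Pull two offline for maintenance and every remaining server is overloaded.
after_maintenance = surviving_servers(850, [100] * 8)

# With uneven capacities the weakest drops first and takes the rest with it.
uneven = surviving_servers(380, [90, 100, 110, 120])
```

In this sketch the first call leaves all ten servers running, while the other two end with nothing left standing. The maintenance itself does no damage; the trouble starts when the same traffic lands on fewer machines.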

In a post to The Official Gmail Blog, "Site Reliability Czar" Ben Treynor writes:

The Gmail engineering team was alerted to the failures within seconds (we take monitoring very seriously). After establishing that the core problem was insufficient available capacity, the team brought a LOT of additional request routers online (flexible capacity is one of the advantages of Google's architecture), distributed the traffic across the request routers, and the Gmail web interface came back online.

He makes it sound so simple. If it was really that easy, why did it take them nearly two hours to fix it? (And if he's truly a czar, does he walk around all day swilling vodka and wearing a big furry hat? That could explain a lot.)

Treynor concludes: "Gmail remains more than 99.9% available to all users, and we're committed to keeping events like today's notable for their rarity."
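It's worth doing the arithmetic on that 99.9 percent. The sketch below is back-of-the-envelope only: Treynor doesn't say what window Google measures over, so the 30-day and 365-day periods are my assumptions, as are the helper names.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability, days):
    """Minutes of downtime allowed over `days` at the given availability."""
    return (1 - availability) * days * MINUTES_PER_DAY


def availability_after_outage(outage_minutes, days):
    """Availability over `days` if the only downtime is a single outage."""
    return 1 - outage_minutes / (days * MINUTES_PER_DAY)


# Assumed measurement windows; the blog post does not specify one.
yearly_budget = downtime_budget_minutes(0.999, 365)    # roughly 525.6 minutes
monthly_budget = downtime_budget_minutes(0.999, 30)    # roughly 43.2 minutes

# A 100-minute outage measured against each window.
over_a_year = availability_after_outage(100, 365)      # still above 0.999
over_a_month = availability_after_outage(100, 30)      # falls below 0.999
```

Measured over a full year, one 100-minute outage eats about a fifth of the downtime that 99.9 percent allows. Measured over a single month, it blows through the allowance more than twice over. Which window you pick matters a great deal.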

By my reckoning, this would mark the ninth notable Gmail outage in little more than a year, including incidents this year in May, March, and February, plus October, August (three separate events), and July 2008. All enterprises eventually have problems with their e-mail systems (InfoWorld's had a few lately itself). But if any Fortune 500 firm had that many problems with its internal e-mail system over that span, some poor geek would be looking for a new job.

You can bet Microsoft is all grins this morning. Paid Content's Joe Tartakoff says a Redmond employee gleefully pointed him to all the Twitterati taking Google's "Gone Google" marketing campaign and flipping it to "Google Gone."

The inevitable "is cloud computing ready for prime time?" posts have resurfaced yet again, like mushrooms after the rain. For example: PC World's Ian Paul asks, "How viable is this Utopian computing future when the accessibility of your files is dependent on forces beyond your control?"

My take: When you get something you want, you generally have to give up something you also want. Want to call anyone from virtually anywhere on the planet? Get a cell phone. But don't expect that crystal-clear call quality and reliability that spoiled us in the era of the Ma Bell monopoly. I doubt that's ever coming back. Is that enough to make you give up your handset and go back to a landline? Not bloody likely.

You want the enormous cost savings and convenience of cloud computing? Plan to give up at least some of the stability and reliability of your enterprise iron -- as well as the accountability of the geeks paid to maintain it. It's a trade-off. Whether the trade-off is worth it only you can decide. Most of the time it seems like it is. Yesterday, for about 100 minutes, not so much.

Did Gmail's Gfail make you less confident in the cloud? Post your lofty thoughts here or e-mail me: (yes, I think it's working again, knock on wood).

Copyright © 2009 IDG Communications, Inc.
