Update: Google fixes lengthy, widespread Gmail malfunction

A 10-hour disruption to email delivery and attachment downloads affected close to 50 percent of Gmail users

A Gmail glitch that hit close to 50 percent of the webmail service's users has been fixed after about 10 hours, ending one of the longest, most widespread Gmail disruptions in years.

Affected users endured email delivery delays and difficulties downloading attachments due to a bug first acknowledged by Google at around 10:30 a.m. U.S. Eastern Time Monday. The company declared it patched at 10 p.m.

On its Google Apps Status site, the company pegged the start of the problem at close to 9 a.m. and its resolution at 6:30 p.m.

The issue affected individuals who use the free version of Gmail as well as businesses, schools and government agencies that pay for it as part of the Google Apps cloud collaboration and email suite.

In the U.S., the disruption covered most of the workday on both coasts, which heightened the impact of the bug for millions.

People who depend on Gmail for critical tasks took to Twitter, discussion groups and other online forums to express their frustration.

The last time Google gave an official figure for active Gmail users was more than a year ago, when it said there were more than 425 million.

Assuming conservatively that the service now has about 450 million active users, Monday's disruption likely affected more than 200 million users, plus senders on other email platforms whose messages weren't received in a timely fashion.
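The arithmetic behind that estimate can be checked with a rough sketch. The 450 million figure is the article's own conservative guess, and reading "close to 50 percent" as a range of 45 to 50 percent is an assumption made here for illustration, not a number Google reported.

```python
# Back-of-the-envelope check of the estimate above.
# Assumption: 450 million active users (the article's conservative guess).
# Assumption: "close to 50 percent" is read as between 45% and 50%.
ACTIVE_USERS = 450_000_000


def affected_users(active_users: int, share_percent: int) -> int:
    """Return the number of users affected at an integer percent share."""
    return active_users * share_percent // 100


low = affected_users(ACTIVE_USERS, 45)
high = affected_users(ACTIVE_USERS, 50)
print(f"{low:,} to {high:,} users")
```

Under those assumptions, even the low end of the range stays above 200 million users.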

Google said that the severity and length of the impact varied among users. About 29 percent of messages received were delayed by an average of 2.6 seconds, but some mail was "severely delayed."

"We apologize for the duration of today's event; we're aware that prompt email delivery is an important part of the Gmail experience, and today's experience fell far short of our standards," the company wrote on the status site.

The incident is a big deal for both Google and those affected, but it shouldn't on its own dissuade CIOs from using the suite, said Forrester Research analyst TJ Keitt.

"Data centers hosting multi-tenant collaboration services aren't immune to disruptions. So, when they happen, the way to judge the vendor is on how well they identify and resolve the problem, and then inform the public to how they resolved the issue," Keitt said.

By that criterion, Google's updates during the incident could have been more transparent and detailed about the nature of the problem and the strength of the fix put in place, he said via email.

"They have clearly not communicated this publicly, so I hope they've been forthcoming with this information with their clients," Keitt said on Monday night.

Meanwhile, Matthew Cain, a Gartner analyst, said the incident raises fundamental questions about what is considered downtime, especially as it relates to service-level agreements from cloud application vendors.

"If message delivery is delayed 15 minutes, is that considered downtime? What about 2 hours?" he said via email. "The move to cloud email puts a spotlight on these essential questions about how to meter and compensate for subpar messaging performance that is not traditionally classified as 'downtime.'"

On Tuesday, Google offered more details about the cause of the problem and the steps it's taking to prevent it from happening again.

The cause was a "very rare" dual network failure, which brought down two separate, redundant network paths, according to a blog post from Sabrina Farmer, senior site reliability engineering manager for Gmail.

"The two network failures were unrelated, but in combination they reduced Gmail's capacity to deliver messages to users," she wrote.

Over the next few weeks, Google staffers will work on bulking up network and backup capacity for Gmail, as well as on making Gmail's message delivery more resilient in the event of a network crash, according to Farmer.

"Finally, we're updating our internal practices so that we can more quickly and effectively respond to network issues," she wrote.

Juan Carlos Perez covers enterprise communication/collaboration suites, operating systems, browsers and general technology breaking news for The IDG News Service. Follow Juan on Twitter at @JuanCPerezIDG.
