August 21, 2007

Skype offers more details of 'perfect storm' outage

Exceptionally high traffic coincided with the Windows Update process, leading to a shortage of supernodes in Skype's peer-to-peer network

The situation that prevented millions of people from accessing Skype's Internet telephony service late last week was a "perfect storm" and should not recur, the company said Tuesday.

The company initially attributed the problem, which began on Aug. 16, to the near-simultaneous rebooting of millions of computers, as Skype users running the Windows operating system attempted to reconnect to the service after downloading a series of routine software patches from Microsoft's Windows Update service.

Skype's service relies on some of its users' computers to act as "supernodes," routing traffic for other, less well-connected users. But as Skype customers tried to reconnect, many of those supernodes were themselves in the process of rebooting. The remaining supernodes were soon overwhelmed because a bug in the company's software kept it from efficiently allocating the network resources that were still available.
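To make the arithmetic of that squeeze concrete, here is a minimal sketch of how the average load on each remaining supernode grows as more of them reboot. It is a toy model, not Skype's actual allocation logic, which the company has not published; the function name `load_per_supernode` and every number in it are hypothetical and chosen only for illustration.

```python
def load_per_supernode(clients: int, supernodes: int, rebooting_fraction: float) -> float:
    """Average number of clients each still-running supernode must serve.

    Toy model (an assumption, not Skype's algorithm): clients are spread
    evenly across whichever supernodes are not currently rebooting.
    """
    if not 0.0 <= rebooting_fraction <= 1.0:
        raise ValueError("rebooting_fraction must be between 0 and 1")
    remaining = supernodes - round(supernodes * rebooting_fraction)
    if remaining <= 0:
        # No supernodes left: nobody can be served.
        return float("inf")
    return clients / remaining


if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 clients and 10,000 supernodes.
    for fraction in (0.0, 0.5, 0.9):
        load = load_per_supernode(1_000_000, 10_000, fraction)
        print(f"{fraction:.0%} rebooting: about {load:.0f} clients per supernode")
```

Even with perfectly even distribution, the per-supernode load in this model climbs steeply as the rebooting fraction approaches 100 percent; a flaw in resource allocation, as Skype described, would make matters worse than this idealized case.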

Users were skeptical of this explanation, because Microsoft regularly issues patches that may cause Windows computers to reboot, and this has not caused problems for Skype before. Microsoft releases software updates on the second Tuesday of each month, a day known to systems administrators as "patch Tuesday."

Skype spokesman Villu Arak offered a more detailed explanation of Skype's outage on Tuesday: Last week's problems were the result of a "perfect storm" in which exceptionally high traffic through the service coincided with the Windows Update process, leading to a shortage of supernodes in the service's peer-to-peer network.

The company did not offer an explanation for the high traffic, but accepted full responsibility for the software problem.

"Skype and Microsoft engineers went through the list of patches that had been pushed out," Arak wrote. "We ruled each one out as a possible cause for Skype’s problems. We also walked through the standard Windows Update process to understand it better and to ensure that nothing in the process had changed from the past (and nothing had)."

The catastrophic effect on Skype's service was entirely Skype's fault, a result of its software being unable to deal with simultaneous high load and supernode rebooting, according to Arak.

On Aug. 17, the day after the problems began, Skype released a new version of its software client for Windows to correct the problem. That update should behave better the next time high traffic coincides with a scarcity of supernodes, he said.

Skype had updated versions of its software client for Windows, Mac, and Linux since July's patch Tuesday and before last week's outage, but the changes made in those updates were not responsible for the problem, according to company spokeswoman Imogen Bailey.

©1994-2009 Infoworld, Inc.