Microsoft's Azure cloud infrastructure and development service experienced a serious outage on Wednesday, with the system's service management component going down worldwide starting at 1:45 a.m. GMT.
"We are experiencing an issue with Windows Azure service management. Customers will not be able to carry out service management operations," Microsoft said in an initial message on the outage on its Azure service dashboard.
The issue has been "mitigated and service management is restored for the majority of customers," Microsoft said in a message posted at 1:30 p.m. GMT. "We still need to work through some issues before we can completely restore service management."
The incident's root cause "has been traced back to a cert issue triggered on 2/29/2012 GMT," Microsoft said in a previous update.
At 5 a.m. GMT, Microsoft said less than 3.8 percent of hosted services had been affected, and measures had been taken to stop the problem "from spreading across the production environment."
In addition, Azure customers in the north and south central U.S. as well as northern Europe may be experiencing some performance problems, according to a message on the dashboard posted at 10:55 a.m. GMT.
"Incoming traffic may not go through for a subset of hosted services in this sub-region," it stated. "Deployed applications will continue to run. There is no impact to storage accounts either."
At 1:30 p.m. GMT, Microsoft said it was "still troubleshooting" the issues affecting these regions.
As of 9 p.m. GMT, the service management function was still experiencing a worldwide outage, according to the dashboard.
But in an update posted at 7:30 p.m. GMT, Microsoft said it was "actively recovering Windows Azure hosted services in the North Central US, South Central US and North Europe sub-regions," and that "more and more customers applications should be back up-and-running even if service management functionality is not yet restored."
Previously, as Wednesday wore on, the dashboard reported other outages affecting different aspects of the platform.
The SQL Azure Data Sync service was unavailable in six regions around the U.S., Europe and Asia, and various problems were also listed for some regions regarding Access Control 2.0, Azure Reporting, Azure Marketplace and Azure Service Bus.
The notifications promised regular updates on the work being done to fix the issues, but no concrete timetables.
Azure users posted a stream of critical comments about the outages to the service's official forums on Wednesday.
"The dashboard shows it's being worked on," one commenter said. "Since we rely heavily on Windows Azure, we've been monitoring the dashboard closely the entire day. What I've noticed is a complete lack of estimates on (when issues will be resolved). For the last 4 hours, the status has essentially been 'The restoration steps to mitigate the issue are underway.'"