Is it possible that Microsoft's doing something right?
If you're even slightly interested in Office 365, by now you know that Microsoft had a major service disruption on Wednesday. At about 12 p.m., Redmond time, one of Microsoft's data centers in North America went belly up, taking out the Office 365 Exchange servers. Those of you who have migrated to Office 365 had no email at all for about three hours, and then messages trickled in slowly for an hour or so after that.
At about the same time, Microsoft's CRM Online system went down, and there are many reports that SkyDrive went down as well.
The silver lining? Office 365's Lync and SharePoint kept working. But there's one more thing that seems to get lost amid all the angst: Microsoft kept its customers reasonably well informed. It fessed up.
No, there was no explanation for why the outage happened -- even now, we don't have a backstory -- and no estimate of the time to repair. There was no talk about CRM or SkyDrive. But the worst problem, Office 365's Exchange Online sleeping with the fishes, was reported on the Service Health Page as Incident Ex440.
The Office 365 support 800 number went dead. The Office 365 Service Request link on the Admin page didn't work. The Office 365 Community Forum went ballistic, with very little response from Microsoft. Based primarily on the thread on the Office 365 forum, the timeline went something like this (converted to Pacific time):
12:07 p.m. -- First post that Outlook and OWA are dead; the Support Request link on the Admin page is dead.
12:14 p.m. -- Exchange is dead. The Service Health Page shows all green; everything's OK. No response on the forum from Microsoft.
12:32 p.m. -- Incident Ex440 appears on the Exchange Online Service Health Dashboard. "We are investigating a service issue and will provide updated information when it becomes available." Incident Start Time stamped at noon.
12:33 p.m. -- Nobody seems to be able to get through to Microsoft phone support.
12:37 p.m. -- Phone support says it's a nationwide outage. No estimated repair time.
12:42 p.m. -- Incident Ex440 updated, with the same "We are investigating" message posted.