A "unique" breakdown coupled with a previously unknown flaw in Exchange Online caused Tuesday's extensive outage, and to make matters worse, the service disruption alert system also malfunctioned, leaving some affected customers in the dark.
So said Rajesh Jha, corporate vice president of Office 365 engineering, in an incident report posted to the Office 365 support forum, in which he also addressed a separate, prolonged Lync Online outage from Monday.
"I want to apologize on behalf of the Office 365 team for the impact and inconvenience this has caused. Email and real-time communications are critical to your business, and my team and I fully recognize our accountability and responsibility as your partner and service provider," he wrote.
For customers on U.S. Eastern time, the Exchange Online outage covered virtually the entire workday.
The main selling point from Microsoft, Google, Amazon, and other providers of cloud software and computing services is that their customers don't need to worry about maintaining on-premises servers, patching applications, and rebooting systems that crash.
While no one expects even these mighty technology companies to be perfect, an email outage that lasts almost nine hours during a workday is sure to plant seeds of doubt in business managers' minds about the wisdom of turning off their on-premises email servers and trusting this essential communications service to a cloud provider.
The second-guessing is bound to be even more intense when the email breakdown happens the day after a significant outage affecting Lync Online, which Office 365 customers use for instant messaging, presence, audio communications, videoconferencing, Web meetings, and in some cases, IP telephony.
Many of those affected were IT professionals fielding complaints from their frazzled users while having no control over the problem and little information from Microsoft about its cause or estimated time of resolution.
Jha addressed this breakdown in communications, saying that during the Exchange Online incident "we also experienced a problem with our Service Health Dashboard (SHD) publishing process, meaning not all impacted customers were notified in a timely way which we realize was frustrating and this has since been addressed."
For Microsoft, back-to-back outages of this magnitude are poisonous, embroiled as it is in a vicious fight with Google in the cloud email and collaboration suite market.