Users complained about not being able to access various Microsoft services yesterday, including Azure and Microsoft 365 products such as Office and Outlook.
The incident—which lasted nearly 10 hours—took place less than two weeks after a CrowdStrike update caused Microsoft Windows machines to crash. Companies affected by the new outage include U.K. bank NatWest, according to the BBC.
What Happened At Microsoft?
The incident started at approximately 11:45 UTC and was resolved at 19:43 UTC, according to Microsoft’s Azure status history page. According to Microsoft, a “subset of customers may have experienced issues connecting to a subset of Microsoft services globally.”
Impacted services included Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts, Azure Policy, as well as the Azure portal itself and “a subset of Microsoft 365 and Microsoft Purview services.”
Microsoft says the “initial trigger event” was a DDoS attack, which sees adversaries flood services with traffic in order to bring them to a standstill.
Microsoft describes an “unexpected usage spike” which resulted in Azure Front Door and Azure Content Delivery Network components “performing below acceptable thresholds, leading to intermittent errors, timeout and latency spikes.”
Most firms have protection in place to prevent DDoS from having an impact. The initial DDoS attack had activated the firm’s DDoS protection mechanisms, but an error in the implementation of defenses “amplified the impact of the attack rather than mitigating it,” Microsoft admits.
It appears that the outage was caused by a DDoS attack, despite the fact that Microsoft had protections in place, says Sean Wright, head of application security at Featurespace. “Similarly to the CrowdStrike issue a few weeks ago, it appears that an error occurred in the software that was used to protect against DDoS attacks,” Wright says.
This highlights the importance of testing software thoroughly, he says.
What’s Next?
The CrowdStrike incident had already—and unfairly—created bad optics for Microsoft, so the timing of this new outage is unlucky. Microsoft knows this and has communicated clearly throughout the outage, saying it will publish a Preliminary Post Incident Review within approximately 72 hours, to share more details on what happened and how it responded.