[Datto SaaS Protection] 500 Internal Server Error - Backup Failures
Incident Report for Datto
Postmortem

On 8/17/22, SaaS Protection M365 backups began to fail intermittently across a subset of SaaS customers, primarily for Microsoft Teams and Exchange. On the morning of 8/23/22, the frequency of these backup failures increased dramatically, impacting a larger cross-section of SaaS Protection customers.

The root cause of these failures was a bug introduced by Microsoft in the Exchange Web Services API.

SaaS Protection Engineering escalated this issue to Microsoft on the afternoon of 8/18/22, while in parallel exploring options to address the issue internally. Our Engineering team deployed a change that mitigated the problem on 8/24/22 at 16:00 UTC, after which Backup Success Rates quickly returned to normal levels. Microsoft deployed a fix for this issue to their service on 8/26/22.

Posted Sep 28, 2022 - 14:12 UTC

Resolved
This incident has been resolved.
Posted Sep 12, 2022 - 17:38 UTC
Update
We are continuing to monitor for any further issues.
Posted Aug 25, 2022 - 14:49 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Aug 24, 2022 - 20:00 UTC
Update
Engineering is still investigating this issue. We have determined that the 500 errors and intermittent backup failures for Exchange and Teams are related to an API issue on the Microsoft side. Based on logging, Engineering has confirmed that this issue intermittently impacts all Exchange and Teams services across the fleet. Our team is working with Microsoft Premier Support to identify the root cause so that a permanent fix can be put in place.

Engineering has identified a potential workaround that is currently in QA. Upon successful testing, we will deploy the workaround to production. We will provide another update within the next 24 hours.

You can monitor the current status of this issue at https://status.datto.com/
Posted Aug 23, 2022 - 22:34 UTC
Update
We are continuing to investigate this issue.
Posted Aug 19, 2022 - 18:13 UTC
Update
We are continuing to investigate this issue.
Posted Aug 18, 2022 - 20:14 UTC
Update
We are continuing to investigate this issue.
Posted Aug 18, 2022 - 16:48 UTC
Investigating
We are aware of a problem where backups for our SaaS applications are failing with an internal 500 error.
Our Engineering team is investigating this issue.
We do not yet have an ETA for a fix.
You can monitor the current status of this issue at https://status.datto.com/
Posted Aug 18, 2022 - 15:58 UTC
This incident affected: Datto SaaS Protection (SaaS Protection Backups).