RMM - Concord, Zinfandel, Syrah, Merlot - Delayed Audits
Incident Report for Datto
Postmortem

On 14-October-2021 at 21:09 UTC, Datto RMM Partners on the Concord, Zinfandel, Syrah and Merlot platforms experienced a service interruption which caused audits to be delayed on new and existing devices.

The root cause of this service interruption was identified as a code change deployed in the 10.0 release that changed how audit data sent by agents is validated.

The change caused an unforeseen increase in the size of delta audits, which in turn resulted in increased processing time on the platform.

Our Engineering team increased platform resources to absorb the additional load while working on a hotfix for the issue. This resolved the symptoms of the issue by 15-October-2021, 04:45 UTC.

The Agent code change causing the issue was reverted; the hotfix was created, tested and released on platforms already on the 10.0 version by 15-October-2021, 13:47 UTC. Agents were not forced to update; instead, they were left to update through their regular update procedure so that day-to-day operations were not disrupted.

The issue was considered fully resolved on 19-October-2021, once the 10.0 release, with the fix already included, had also been deployed to the Pinotage platform.

To prevent a similar issue from happening in the future, pre-release code review and QA processes have been updated to cover the scenarios that caused this interruption.

Posted Oct 26, 2021 - 10:17 UTC

Resolved
This incident has been resolved.
Posted Oct 19, 2021 - 12:34 UTC
Monitoring
The Engineering team has deployed a new agent version that fixes the root cause of the issue. RMM Agents on endpoints will update with their regular update process.
We will continue to monitor the health of the service.
Posted Oct 15, 2021 - 13:47 UTC
Update
We are continuing to work on a fix for this issue.
Posted Oct 15, 2021 - 10:06 UTC
Identified
Our engineers have identified the issue and are currently working on a fix.

In the interim, we have implemented changes that should mitigate the issue, and audits should now be processing in the usual timeframe.

Note that some devices may still have a short delay until their next audit is submitted.
Posted Oct 15, 2021 - 04:45 UTC
Update
Our team is still investigating this issue. Partners may continue to experience delays with device audits.

Thank you for your continued patience.
Posted Oct 15, 2021 - 01:42 UTC
Update
We are still investigating this issue and will provide an update as soon as we are able.
Posted Oct 14, 2021 - 22:27 UTC
Investigating
Our teams are currently investigating delayed audits on new and existing devices for Datto RMM on Concord, Zinfandel, and Syrah. This may also cause a delay in policies being applied to devices. An update will be posted here within 30 minutes with the status of this investigation.

Thank you for your patience!
Posted Oct 14, 2021 - 21:09 UTC
This incident affected: Datto RMM (Merlot (EU2), Zinfandel (US West), Concord (US East), Syrah (APAC)).