Datto RMM - Zinfandel Platform - Agents Not Connecting To Platform
Incident Report for Datto
Postmortem

On Tuesday, May 14th, at 7:17 AM UTC, Datto RMM Partners experienced a service interruption in which devices encountered latency issues while connecting to the Zinfandel platform.

The root cause of this service interruption was identified as memory saturation in the load balancer service. As a result, the load balancers could not connect to the backend service that handles agent connections to the platform.

Various mitigation steps were taken to reduce the impact of the issue while the R&D team worked on a resolution. The team deployed a permanent fix on May 17th at 17:42 UTC by scaling up the size of the instances that make up the load balancer service. An alerting mechanism for instance memory will be incorporated into our infrastructure so that issues of this kind are detected much earlier going forward, helping to avoid a similar incident.
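
For illustration only, the kind of instance-memory alerting described above could be sketched as follows. This is a minimal sketch, not the implementation used on the platform: the `MemorySample` type, the `should_alert` function, the 85% threshold, and the three-sample window are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MemorySample:
    """One memory reading from an instance (hypothetical shape).

    total_bytes is assumed to be greater than zero.
    """
    used_bytes: int
    total_bytes: int

    @property
    def utilization(self) -> float:
        # Fraction of memory in use, between 0.0 and 1.0.
        return self.used_bytes / self.total_bytes


def should_alert(samples: Iterable[MemorySample],
                 threshold: float = 0.85,
                 sustained: int = 3) -> bool:
    """Return True once `sustained` consecutive samples exceed `threshold`.

    Requiring consecutive breaches avoids alerting on a single short spike.
    Both defaults are assumptions for this sketch.
    """
    streak = 0
    for sample in samples:
        if sample.utilization > threshold:
            streak += 1
            if streak >= sustained:
                return True
        else:
            # A healthy reading resets the run of breaches.
            streak = 0
    return False
```

Requiring several consecutive breaches rather than a single one is a common way to trade a little detection delay for fewer false alarms. Suitable values depend on how quickly memory grows on the instances in question.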

Posted May 22, 2024 - 08:00 UTC

Resolved
This incident has been resolved.
Posted May 16, 2024 - 15:52 UTC
Monitoring
A fix has been put in place, and the Zinfandel Platform has been taken out of Maintenance Mode. We are monitoring the fix to ensure resolution of this issue.
Posted May 15, 2024 - 18:31 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted May 15, 2024 - 17:00 UTC
Update
We have updated the title of this incident to "Agents Not Connecting To Platform". We have placed the Zinfandel Platform into Maintenance Mode to help mitigate false offline alerts that might otherwise be generated. Devices may cycle between online and offline as we investigate this issue.

Thank you for your patience.
Posted May 15, 2024 - 16:09 UTC
Investigating
Our teams are currently investigating reports of False Offline Alerts for Datto RMM on our Zinfandel Platform.

Thank you for your patience!
Posted May 15, 2024 - 15:08 UTC
This incident affected: Datto RMM (Zinfandel (US West)).