RMM - Concord - Agents going offline / False offline alerts - under investigation
Incident Report for Datto
Postmortem

User Impact: Devices reconnecting and general connection instability, causing dropped sessions and false offline alerts.

Root Cause Analysis: A buildup of job-related "action flags" slowed the processing of other messages, because existing flags were not being cleared out as expected. As a result of the slowdown, some devices did not receive a response before their timeout expired. The Ping response message therefore failed to reach the platform, which triggered a reconnect and offline alerts.

We have cleared out the backlog of flags, and an investigation is underway to determine the best method of avoiding this behavior in the future.
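
For readers who want a concrete picture of the failure mode described above, here is a minimal toy model. It is not Datto's implementation. The single FIFO queue, the per-message cost, the timeout value, and all function names are assumptions made purely for illustration. It shows how a backlog of uncleared flags ahead of a ping response can push the response past an agent's timeout, and how clearing the backlog restores normal behavior.

```python
from collections import deque

# Hypothetical values, chosen only for illustration.
COST_PER_MESSAGE_MS = 2   # assumed time to process one queued message
PING_TIMEOUT_MS = 1000    # assumed agent-side ping timeout


def ping_wait_ms(stale_flags):
    """Return the time until a ping response queued behind
    `stale_flags` uncleared action flags gets processed (FIFO order)."""
    queue = deque(["action_flag"] * stale_flags + ["ping_response"])
    elapsed_ms = 0
    while queue:
        message = queue.popleft()
        elapsed_ms += COST_PER_MESSAGE_MS
        if message == "ping_response":
            return elapsed_ms
    raise RuntimeError("no ping response in queue")


def agent_status(stale_flags):
    """Return 'online' if the ping is answered before the timeout,
    otherwise 'reconnect' (which would also raise an offline alert)."""
    if ping_wait_ms(stale_flags) <= PING_TIMEOUT_MS:
        return "online"
    return "reconnect"


if __name__ == "__main__":
    print(agent_status(10))     # small backlog
    print(agent_status(5000))   # large backlog of uncleared flags
    print(agent_status(0))      # backlog cleared
```

In this sketch the agent itself is healthy throughout. Only the queueing delay changes, which is why the resulting offline alerts are false.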

Posted 3 months ago. May 14, 2019 - 13:49 UTC

Resolved
This incident has been resolved.
Posted 3 months ago. May 08, 2019 - 22:10 UTC
Update
We are continuing to monitor for any further issues.
Posted 3 months ago. May 08, 2019 - 20:48 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted 3 months ago. May 08, 2019 - 19:09 UTC
Update
We are actively taking steps to address this concern. As a result, you may see jobs temporarily not running. We apologize for the inconvenience. We will update this page within 30 minutes with an update as to the status.
Posted 3 months ago. May 08, 2019 - 18:48 UTC
Investigating
We are currently investigating a number of reports of agents going offline / false offline alerts being generated on the Concord platform. We will update this page within 30 minutes with an update as to the status.

Thank you for your patience!
Posted 3 months ago. May 08, 2019 - 18:30 UTC
This incident affected: Datto RMM (Concord (US East)).