RMM - Concord - Agents going offline / False offline alerts - under investigation
Incident Report for Datto
Postmortem

User Impact: Devices reconnecting and general connection instability, causing dropped sessions and false offline alerts.

Root Cause Analysis: A buildup of job-related "action flags" slowed the processing of other messages, because existing flags were not being cleared as expected. As a result of the slowdown, some devices did not receive a response before their timeout. The Ping response message therefore failed to reach the platform, which triggered reconnects and offline alerts.

We have cleared the backlog of flags, and an investigation is underway to determine the best method of avoiding this behavior in the future.
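
To illustrate the mechanism described in the root cause analysis, here is a minimal toy model, not the platform's actual implementation. It assumes a single FIFO message processor whose per-message cost grows with the number of uncleared flags. The function names, costs, and timeout value are all hypothetical and chosen only to show how a flag backlog can push a ping response past an agent's timeout.

```python
# Toy model only: every name and number here is a hypothetical assumption,
# not taken from the Datto RMM platform.

def ping_response_delay_ms(queued_messages, stale_flags,
                           base_cost_ms=1.0, per_flag_cost_ms=0.01):
    """Estimate how long a ping at the back of the queue waits for a response.

    Assumes each message costs a fixed base time plus a small amount of
    extra work for every flag that has not been cleared.
    """
    per_message_ms = base_cost_ms + stale_flags * per_flag_cost_ms
    # The ping is processed after everything already queued ahead of it.
    return (queued_messages + 1) * per_message_ms


def agent_reconnects(queued_messages, stale_flags, timeout_ms=5000.0):
    """Return True if the response would arrive after the agent's timeout."""
    return ping_response_delay_ms(queued_messages, stale_flags) > timeout_ms


# Same queue depth, with and without a flag backlog.
print(agent_reconnects(queued_messages=1000, stale_flags=0))     # False
print(agent_reconnects(queued_messages=1000, stale_flags=1000))  # True
```

In this model the queue depth is identical in both calls. Only the uncleared flags change the outcome, which matches the report's description of a slowdown that caused timeouts, reconnects, and false offline alerts while devices were otherwise healthy.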

Posted May 14, 2019 - 13:49 UTC

Resolved
This incident has been resolved.
Posted May 08, 2019 - 22:10 UTC
Update
We are continuing to monitor for any further issues.
Posted May 08, 2019 - 20:48 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 08, 2019 - 19:09 UTC
Update
We are actively taking steps to address this concern. As a result, you may see jobs temporarily not running. We apologize for the inconvenience. We will post a status update on this page within 30 minutes.
Posted May 08, 2019 - 18:48 UTC
Investigating
We are currently investigating a number of reports of agents going offline and false offline alerts being generated on the Concord platform. We will post a status update on this page within 30 minutes.

Thank you for your patience!
Posted May 08, 2019 - 18:30 UTC
This incident affected: Datto RMM (Concord (US East)).