RMM - Merlot - Unable to create Jobs
Incident Report for Datto
Postmortem

What happened and why?

The 11.9.0 version upgrade introduced new logic to remove jobs older than 6 months. This function did not affect product performance during QA testing. Unfortunately, problems appeared in the production environment once a large number of records began to be pruned.

The pruning action caused high resource usage in the database and resulted in database locks. These locks in turn caused attempts to create new jobs on the Syrah platform to time out or fail outright.

How did we respond?

The database clean-up logic was disabled to resolve the issue for users on Syrah. This also prevented the same issue from occurring on platforms where the 11.9.0 upgrade was scheduled for deployment in the following days.

Work on a follow-up code change to the clean-up logic started immediately. It applies the lessons learnt from the incident to prevent the problem from resurfacing once the logic is re-enabled on 11.9 platforms.

How are we making incidents like this less likely or less impactful?

More rigorous risk assessment procedures for code changes have been introduced into the development and release review processes.

Posted Jun 14, 2023 - 14:45 UTC

Resolved
This incident has been resolved.
Posted May 30, 2023 - 18:50 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 30, 2023 - 14:43 UTC
Investigating
Our teams are currently investigating the inability to create Jobs in Datto RMM on the Merlot platform.

Thank you for your patience!
Posted May 30, 2023 - 14:41 UTC
This incident affected: Datto RMM (Merlot (EU2)).