Datto SaaS Protection - Emergency Node Maintenance - Node des1-bfyii-2638
Incident Report for Datto
Postmortem

On July 25th at 9:30am EDT, emergency maintenance began for node des1-bfyii-2638 because its CPU overheated and failed.

The root cause of the backup issues on this node was that the CPU overheated, possibly because the CPU fan failed or because the portion of the motherboard that controls the CPU fan failed.

Engineering teams took the following steps to remediate the problem: 

  1. Performed a RAM swap for the node
  2. Inspected the Power Supply Unit for the node
  3. Completed a chassis swap for the node

The following corrective actions have been identified to reduce the likelihood of this issue recurring:

  • We need to introduce CPU thermal monitoring that alerts On-Call personnel when a CPU exceeds a high-temperature threshold.

    • This feature is not currently supported by our alerting and monitoring infrastructure.
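
As an illustration only, the kind of threshold check described above could look roughly like the Python sketch below. It does not describe our alerting and monitoring infrastructure. The 85 °C threshold, the three-sample rule, and the names `ThermalAlerter` and `read_sysfs_temps_c` are all assumptions made for the example. The reader function assumes a Linux host that exposes temperatures under /sys/class/thermal in millidegrees Celsius.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumed values for illustration; real thresholds depend on the CPU model.
HIGH_TEMP_THRESHOLD_C = 85.0
CONSECUTIVE_SAMPLES = 3


def read_sysfs_temps_c(root="/sys/class/thermal"):
    """Read thermal zone temperatures in Celsius from Linux sysfs.

    Returns an empty list when no thermal zones are readable.
    """
    temps = []
    for path in sorted(Path(root).glob("thermal_zone*/temp")):
        try:
            # sysfs reports millidegrees Celsius as an integer string.
            temps.append(int(path.read_text().strip()) / 1000.0)
        except (OSError, ValueError):
            continue
    return temps


@dataclass
class ThermalAlerter:
    """Hypothetical helper: signals once when the threshold is sustained."""

    threshold_c: float = HIGH_TEMP_THRESHOLD_C
    required_samples: int = CONSECUTIVE_SAMPLES
    streak: int = 0
    alerting: bool = False

    def observe(self, temp_c: float) -> bool:
        """Return True only on the sample that first triggers an alert."""
        if temp_c >= self.threshold_c:
            self.streak += 1
        else:
            # A reading below the threshold clears the streak and the alert.
            self.streak = 0
            self.alerting = False
        if self.streak >= self.required_samples and not self.alerting:
            self.alerting = True
            return True
        return False
```

Requiring several consecutive high readings before notifying is one way to avoid paging On-Call personnel for a single brief spike. A production check would also need to handle a sensor that stops reporting, since a failed fan controller may fail silently.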
Posted Jul 29, 2022 - 15:09 UTC

Resolved
This incident has been resolved.
Posted Jul 28, 2022 - 20:02 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 28, 2022 - 16:37 UTC
Identified
Engineering has identified the root cause of the issue and is working on a fix. We will provide additional updates once available.
Posted Jul 27, 2022 - 16:45 UTC
Update
We are continuing to investigate this issue.
Posted Jul 25, 2022 - 18:41 UTC
Investigating
As of 9:30am EDT, emergency maintenance is underway for node des1-bfyii-2638. The expected duration of the downtime is unknown at this time.

Maintenance Type: Emergency Node Maintenance
Affected Node: des1-bfyii-2638
Impact: Backups are paused, and the UI is unavailable to users during this time.

Unsure of which node or pod you are on? We have a Knowledge Base article to walk you through determining your node or pod: https://help.datto.com/s/article/KB3600004035831
Posted Jul 25, 2022 - 14:45 UTC
This incident affected: Datto SaaS Protection (SaaS Protection Backups, SaaS Protection Console Login, SaaS Protection Seat Management, SaaS Protection Client Onboarding).