PSA - AE2 - Long loading times
Incident Report for Datto
Postmortem

On June 20th, 2022 at 13:21 UTC, Autotask PSA Partners in the AE2 zone experienced a service interruption that caused latency.

The root cause of this service interruption was traced to the Microsoft Windows Critical and Security updates that were installed across our server estate on June 18th, as per our normal release schedule. During the patch distribution, several hosts serving the AE2 zone failed to resume normal operations but did not raise an alarm in our monitoring system. This caused an increased load on the remaining hosts serving the AE2 zone. Once engineers noticed the abnormal increase in load, they identified the hosts that were not servicing the zone. Those hosts were brought back online, which reduced the overall load on the zone and restored normal operations.

Our Engineering team deployed a fix to correct the problem on June 20th, 2022 at 14:08 UTC.

We are investigating enhancements to our monitoring of the monthly Windows Critical and Security updates so that we are alerted if hosts do not resume normal operations after the updates are applied.

Posted Aug 03, 2022 - 15:49 UTC

Resolved
This incident has been resolved.
Posted Jun 20, 2022 - 15:17 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 20, 2022 - 14:29 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Jun 20, 2022 - 14:08 UTC
Update
We are continuing to investigate this issue.
Posted Jun 20, 2022 - 14:07 UTC
Investigating
Our teams are currently investigating long loading times for PSA on the AE2 Zone.

Thank you for your patience!
Posted Jun 20, 2022 - 13:41 UTC
This incident affected: Autotask PSA (America East 2).