Datto RMM - Syrah (APAC) - Webportal Login Failures
Incident Report for Datto
Postmortem

User Impact:

Platform functionality was unavailable to partners, except for alerts and monitor metrics, which continued to be submitted.

Root Cause Analysis:

A failure of the Amazon Aurora service in the AWS AP-SOUTHEAST-2 region, where Syrah is hosted, impacted both the master and slave databases. As a result, the primary back-end data store failed.

We continue to invest in projects that improve platform resilience and stability, with a focus on providing a reduced level of service (e.g. monitor metrics and alerts) during events such as this one.

We apologize for any inconvenience this may have caused.

Posted Feb 14, 2019 - 16:31 UTC

Resolved
This incident is now resolved.

The Syrah platform is operational again.

If you are still experiencing any issues with the Syrah platform, please reach out to our Support team.

For the root cause analysis (RCA) of this incident, please contact our Support team via our support portal.

Thank you for your patience and we apologize for the impact on your business.
Posted Feb 14, 2019 - 03:39 UTC
Update
We are continuing to monitor the Syrah platform.

The platform is still working through the backlog, so some reduced performance is still expected.

Thank you for your patience.
Posted Feb 14, 2019 - 02:11 UTC
Update
We have verified that the login and connection issues with the Syrah (APAC) platform should now be resolved.

We will, however, continue to monitor the situation.

We expect the platform to work through some backlogs, which may result in some reduced performance.

We will provide the next update to this incident within the hour.

Thank you for your continued patience.
Posted Feb 14, 2019 - 01:32 UTC
Monitoring
We have verified that the login and connection issues with the Syrah (APAC) platform should now be resolved.

We will, however, continue to monitor the situation.

We expect the platform to work through some backlogs, which may result in some reduced performance.

We will provide the next update to this incident within the hour.

Thank you for your continued patience.
Posted Feb 14, 2019 - 01:31 UTC
Update
We have identified the source of the problem as AWS platform issues, as suggested in our last update.

We are still monitoring the situation with AWS.

The latest update from AWS is that they are beginning to see recovery for some Amazon Aurora clusters and are continuing to work toward full resolution.

We will post another update as soon as AWS provides new information.

Status updates on the AWS issue can be found on the AWS status page (https://status.aws.amazon.com/) under "Asia Pacific", or directly via the AWS RSS feed (https://status.aws.amazon.com/rss/rds-ap-southeast-2.rss).
Posted Feb 14, 2019 - 00:24 UTC
Identified
We have identified the source of the problem: it is related to AWS platform issues.

No further updates will be posted here until AWS has resolved the issue. Status updates on the AWS issue can be found on the AWS status page (https://status.aws.amazon.com/) under "Asia Pacific", or directly via the AWS RSS feed (https://status.aws.amazon.com/rss/rds-ap-southeast-2.rss).

We thank you for your continued patience.
Posted Feb 13, 2019 - 22:00 UTC
Update
We are continuing to investigate reports of login and connection failures to the Syrah (APAC) platform.

We will continue to provide updates every half hour. Thank you for your continued patience.
Posted Feb 13, 2019 - 21:17 UTC
Investigating
We are actively investigating reports of login and connection failures to the Syrah (APAC) platform.

We will provide an update to this incident within 30 minutes. We apologize for the inconvenience.
Posted Feb 13, 2019 - 20:46 UTC
This incident affected: Datto RMM (Syrah (APAC)).