Between November 4th and November 9th, a release was rolled out to SaaS Protection. Following that release, SaaS Protection partners experienced a service interruption that caused slow ingest in the UK. The interruption initially affected all partners and end customers but was later localized to the new v3 infrastructure.
The root cause of this service interruption was identified as a fix, included in the release, for a long-standing Microsoft API defect. The fix unexpectedly added a large number of SharePoint and Teams sites, which saturated our Azure peering links. In addition, the newly added SharePoint and Teams sites created a large backlog of services on the v3 platform.
Our Engineering team is taking the following steps to remediate the problem and prevent a recurrence:
We’re in the process of expanding our data center footprint in the UK and bringing new v3 infrastructure online in Q1. This will allow us to manage the growth of the current v3 infrastructure and balance the load more effectively.
New procedures have been included in our regular infrastructure reviews to identify the signs of increasing stress on the database.
We are further enhancing our phased rollout of releases to protect the platform against unexpected load in the future.
A more detailed RCA is available. To request access, please contact your PSM directly.