On Thursday, 30 September 2021, at 6am UTC, Autotask PSA Partners hosted in our Philadelphia datacenter experienced a service interruption. During the interruption, incoming emails failed to create tickets or add notes to existing tickets and, in some cases, created duplicate tickets or duplicate notes.
Our Infrastructure team worked to correct the problem, and it was resolved by 9:30am UTC on the same day.
The service interruption occurred because the procedure that deletes emails marked as already processed failed to delete them. As a result, those emails were queued for processing again, which produced a steadily growing queue and repeated processing of the same email.
Autotask uses a clustered environment to handle incoming email processing. Incoming emails are parsed, and the extracted information is used to create or update the corresponding tickets. Once an email is processed, it is deleted and the next email is processed.
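The following is a minimal sketch of the process-then-delete flow described above, and of how a delete step that silently fails leads to the same email being processed more than once. It is an illustration only, not Autotask's actual code: the Mailbox class, the delete_works flag, and run_polling_cycles are hypothetical names, and the mailbox is a simple in-memory stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Hypothetical in-memory stand-in for the incoming mailbox."""
    messages: dict = field(default_factory=dict)  # message id -> subject
    delete_works: bool = True  # assumption: models the healthy or bad state

    def delete(self, message_id):
        # A healthy mailbox removes the message; a broken one silently keeps it.
        if self.delete_works:
            self.messages.pop(message_id, None)


def run_polling_cycles(mailbox, cycles):
    """Process every message found in the mailbox, once per polling cycle."""
    tickets = []
    for _ in range(cycles):
        for message_id, subject in list(mailbox.messages.items()):
            tickets.append(subject)     # create or update a ticket
            mailbox.delete(message_id)  # then remove the processed email
    return tickets


healthy = Mailbox({1: "Printer down"})
broken = Mailbox({1: "Printer down"}, delete_works=False)

healthy_tickets = run_polling_cycles(healthy, cycles=3)
broken_tickets = run_polling_cycles(broken, cycles=3)

print(len(healthy_tickets), len(broken_tickets))  # prints "1 3"
```

In the healthy case the email is handled once and removed. In the broken case it is still present on every cycle, so it is handled again each time, which corresponds to the duplicate tickets and notes described in this report.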
Analysis revealed that the mailbox database had entered a bad state, which led to the issue. We created a new mailbox and switched Autotask to use it. Incoming emails were immediately processed and deleted properly. We then migrated all unprocessed emails to the new mailbox so that the backlog could be cleared.
We have added logic to verify that emails are properly deleted upon processing. If an email is not deleted, the service is paused and we are immediately alerted.
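A check of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the logic we deployed: the Mailbox class, process_with_guard, and the alert callback are hypothetical names, and the mailbox is an in-memory stand-in.

```python
class Mailbox:
    """Hypothetical in-memory mailbox; delete_works=False models the bad state."""

    def __init__(self, messages, delete_works=True):
        self.messages = dict(messages)  # message id -> subject
        self.delete_works = delete_works

    def delete(self, message_id):
        if self.delete_works:
            self.messages.pop(message_id, None)

    def exists(self, message_id):
        return message_id in self.messages


def process_with_guard(mailbox, alert):
    """Process emails in order; pause and alert if a delete is not confirmed.

    Returns (processed_subjects, paused).
    """
    processed = []
    for message_id, subject in list(mailbox.messages.items()):
        processed.append(subject)       # create or update a ticket
        mailbox.delete(message_id)
        if mailbox.exists(message_id):  # verify that the delete took effect
            alert(f"email {message_id} was not deleted; processing paused")
            return processed, True      # stop before touching further emails
    return processed, False


alerts = []
broken = Mailbox({1: "Printer down", 2: "VPN issue"}, delete_works=False)
processed, paused = process_with_guard(broken, alerts.append)
print(processed, paused, len(alerts))
```

Pausing at the first unconfirmed delete limits the impact to a single email that may be reprocessed, instead of letting the queue grow and every email in it be handled repeatedly.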
In the coming weeks, we will migrate to a new email processing technology to further harden the incoming email processing flows.