Delayed scheduled shares

Incident Report for Funnel

Resolved

The last 12 hours have shown no errors, and scheduling is back to normal.

Posted Aug 26, 2022 - 08:20 CEST

Update

We are continuing to monitor for any further issues.

Posted Aug 25, 2022 - 19:01 CEST

Update

We are still monitoring this. There is currently a short delay.

AWS has fully recovered and we no longer see any increased error rates. However, there was a large backlog of work that took time to process.

Funnel is prioritising newer scheduled work. Some older runs have been skipped due to their age but will be or have already been run again based on their normal schedule.
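As an illustration only, the backlog policy described above (newest work first, runs past a certain age skipped) could be sketched as follows. This is not Funnel's actual implementation: `ScheduledRun`, `plan_backlog`, the export names, and the 12-hour cutoff are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ScheduledRun:
    # Hypothetical record of one queued scheduled export.
    export_id: str
    scheduled_at: datetime


def plan_backlog(runs, now, max_age):
    """Split a backlog into runs to execute (newest first) and runs to skip."""
    fresh = [r for r in runs if now - r.scheduled_at <= max_age]
    skipped = [r for r in runs if now - r.scheduled_at > max_age]
    # Newer scheduled work is handled before older work.
    fresh.sort(key=lambda r: r.scheduled_at, reverse=True)
    return fresh, skipped


# Example backlog; the cutoff of 12 hours is an assumption, not a documented value.
now = datetime(2022, 8, 25, 15, 21, tzinfo=timezone.utc)  # 17:21 CEST
backlog = [
    ScheduledRun("sheets-daily", now - timedelta(hours=18)),
    ScheduledRun("warehouse-hourly", now - timedelta(hours=1)),
    ScheduledRun("ga-upload", now - timedelta(hours=5)),
]
to_run, skipped = plan_backlog(backlog, now, max_age=timedelta(hours=12))
```

In this sketch a skipped run is simply dropped from the backlog; it is picked up again the next time its normal schedule fires.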

Posted Aug 25, 2022 - 17:21 CEST

Update

AWS is making progress recovering.

Our error rates have now gone down and we expect normal scheduling freshness within a few hours.

Posted Aug 25, 2022 - 14:38 CEST

Update

We are still seeing high error rates.

There is no firm ETA from our infrastructure provider, AWS, but they expect it to take hours before they are fully recovered.

Posted Aug 25, 2022 - 09:26 CEST

Monitoring

Since Aug 25 06:00 CEST / Aug 24 9:00 pm PDT / Aug 25 12:00 am EDT there have been clear signs of recovery.

We have more workers running and are working through the queues, but we are still seeing high error rates when scaling. Funnel is automatically scaling and adapting as more capacity becomes available from AWS.

There are still long queues and errors caused by this 9-hour drop in capacity. Data freshness could be affected if the export you use has been impacted, especially for European customers starting their day. Typical symptoms are:
- Data Warehouse - exports have a 'Pending' status
- Google Sheets - exports have an 'In progress' status
- Google Analytics Upload - exports have a 'failed' status

Posted Aug 25, 2022 - 08:14 CEST

Identified

From Aug 24 21:30 CEST / 12:30 pm PDT / 3:30 pm EDT we started seeing increased latencies and errors when scaling workers to handle scheduled exports. This will cause delays to scheduled exports, and possibly errors due to timeouts. Scheduled exports will be retried and then resumed based on their schedule.

This is related to an AWS "Operational issue - Amazon Elastic Container Service (N. Virginia)", which reports capacity issues and a temporary stop on creating new instances. See also https://health.aws.amazon.com/health/status

Posted Aug 25, 2022 - 07:58 CEST