AWS metrics are up to date and alerts on them have been re-enabled.
Posted Mar 30, 2016 - 18:18 EDT
Update
AWS has resolved their API issue and our CloudWatch metrics pipeline is processing the backlog at normal throughput again. The corresponding alerts remain disabled until the backlog has been cleared.
Posted Mar 30, 2016 - 17:33 EDT
Update
AWS endpoints are still recovering: AWS CloudWatch metrics remain delayed and the corresponding alerts remain disabled. We're testing a workaround to mitigate the impact of the AWS API errors and accelerate recovery.
Posted Mar 30, 2016 - 16:46 EDT
Monitoring
AWS endpoints are coming back online and we're progressively resuming collection of AWS CloudWatch metrics. The corresponding alerts remain disabled until collection has fully caught up.
Posted Mar 30, 2016 - 16:03 EDT
Identified
AWS is investigating elevated error rates in their authentication service, which we rely on to collect AWS CloudWatch metrics. We're waiting for a resolution from Amazon.
Posted Mar 30, 2016 - 15:24 EDT
Investigating
We’re seeing a potential outage with some AWS API endpoints we’re using to retrieve metrics. As a result, AWS metrics are delayed and corresponding alerts have been suspended. We’re investigating further. All other metrics and integrations are unaffected.