Elevated error rate with metric queries
Incident Report for Datadog
Resolved
This incident has been resolved.
Posted Feb 16, 2021 - 23:56 EST
Monitoring
All systems are recovered and stable. We will continue to monitor to ensure no further impact occurs.
Posted Feb 16, 2021 - 23:53 EST
Update
We are continuing to mitigate monitor-related issues and are close to full recovery.
Posted Feb 16, 2021 - 23:44 EST
Identified
We have identified the issue and rolled out a fix. Dashboard and API queries are returning to normal but metric and composite monitors are still impacted and potentially being skipped. We are continuing to work on addressing this impact.
Posted Feb 16, 2021 - 22:10 EST
Update
Mitigations have been put in place to reduce the impact and we are continuing to investigate the issue. Errors and slow loading graphs are still possible at this time.
Posted Feb 16, 2021 - 21:14 EST
Investigating
We are seeing elevated response times and error rates for metrics queries. This results in slowness or errors in loading graphs and potentially skipped or delayed monitor notifications. The problem is actively being investigated and we will update as we learn more.
Posted Feb 16, 2021 - 20:36 EST
This incident affected: Alerting Engine, API, Metrics Pipeline, and Web Application.