All live data, including APM metrics and the corresponding APM alerts, is now current. Note that a subset of historical APM metric data may still show gaps; it will be recalculated, along with SLOs, over the next 24 hours. We apologize again for the inconvenience this outage has caused.
Posted 5 years ago. Sep 24, 2020 - 21:40 EDT
Update
Events data is now current. We are continuing to backfill delayed data for APM metrics.
Posted 5 years ago. Sep 24, 2020 - 20:55 EDT
Update
Processes and NPM data are now current. We are processing the remaining data backlogs and continuing to backfill delayed data for events and APM metrics.
Posted 5 years ago. Sep 24, 2020 - 19:49 EDT
Monitoring
We are currently processing the remaining data backlogs. Metric data and alerts are now current, and we are backfilling delayed data for events, APM metrics, processes, and NPM.
Posted 5 years ago. Sep 24, 2020 - 18:31 EDT
Update
We are making further progress in the recovery of customer-facing systems. The web application and APIs are operational, as are logs and their corresponding alerts and live APM traces. A subset of metric data is still delayed and is being caught up. However, we are still working on processing backlogged APM metrics and other types of alerts.
Posted 5 years ago. Sep 24, 2020 - 17:19 EDT
Update
We are making further progress in the recovery of customer-facing systems. The web application and APIs are operational, as are logs and their corresponding alerts and live APM traces. A subset of metric data is still delayed and is being caught up. However, we are still working on processing backlogged APM metrics and other types of alerts.
Posted 5 years ago. Sep 24, 2020 - 17:17 EDT
Update
We are making further progress in the recovery of customer-facing systems. The web application and APIs are operational, as are logs and their corresponding alerts and live APM traces. A subset of metric data is still delayed and is being caught up. However, we are still working on processing backlogged APM metrics and other types of alerts.
Posted 5 years ago. Sep 24, 2020 - 17:03 EDT
Update
We are making further progress in the recovery of customer-facing systems. The web application and APIs are operational, as are logs and their corresponding alerts and live APM traces. A subset of metric data is still delayed and is being caught up. However, we are still working on processing backlogged APM metrics and other types of alerts.
Posted 5 years ago. Sep 24, 2020 - 16:59 EDT
Update
We are making progress in the recovery of customer-facing systems. The web application error rate is down and metrics data is available, although we are still catching up on some of the delayed data. Log data is available and timely. We are still working on re-enabling all functionality and catching up our alerting systems.
Posted 5 years ago. Sep 24, 2020 - 16:28 EDT
Update
We are making progress in the recovery of customer-facing systems. The web application error rate is down and metrics data is available, although we are still catching up on some of the delayed data. Log data is available and timely. We are still working on re-enabling all functionality and catching up our alerting systems.
Posted 5 years ago. Sep 24, 2020 - 16:26 EDT
Update
We are still working to resolve this outage. We are working to divert traffic away from the affected components and to restore our customer-facing services. Our mitigations are showing progress, but we are still observing high error rates in our web application and API, and delays in metrics processing and alerting.
Posted 5 years ago. Sep 24, 2020 - 15:36 EDT
Update
We are currently experiencing a widespread outage in our US-1 data center, and all hands are on deck to resolve it. We are truly sorry for the inconvenience and are working towards a timely resolution. The infrastructure that allows the configuration and resolution of our services is currently severely degraded, causing a number of customer-facing services to be disrupted. This is resulting in high error rates in our web application and API, delays in metrics processing, and disrupted alerting.
Posted 5 years ago. Sep 24, 2020 - 14:32 EDT
Update
We are continuing to actively work to mitigate the internal infrastructure connectivity issue impacting multiple systems.
Posted 5 years ago. Sep 24, 2020 - 14:19 EDT
Update
We are continuing to actively work to mitigate the internal infrastructure connectivity issue impacting multiple systems.
Posted 5 years ago. Sep 24, 2020 - 13:16 EDT
Identified
We are actively working on an issue that affects internal infrastructure connectivity and is impacting multiple systems.
Posted 5 years ago. Sep 24, 2020 - 12:35 EDT
Update
We are continuing to investigate this issue.
Posted 5 years ago. Sep 24, 2020 - 12:31 EDT
Update
We are continuing to investigate the elevated error rate on the web application.
Posted 5 years ago. Sep 24, 2020 - 12:19 EDT
Update
We are continuing to investigate the elevated error rate on the web application.
Posted 5 years ago. Sep 24, 2020 - 11:42 EDT
Update
We are continuing to investigate this issue.
Posted 5 years ago. Sep 24, 2020 - 11:07 EDT
Update
We are continuing to investigate the elevated error rate on the web application.
Posted 5 years ago. Sep 24, 2020 - 11:06 EDT
Investigating
We are seeing an elevated error rate on the web application and are currently investigating the issue. Note that monitoring data is being processed properly and that no data has been lost.
Posted 5 years ago. Sep 24, 2020 - 10:27 EDT
This incident affected: APM, Metrics and Infra Monitoring, Monitors, and Web Application.