Connectivity issue for agents 3.6 to 5.32.6
Incident Report for Datadog US1
Resolved
The incident has been resolved. For additional information on upgrading agents 3.6 to 5.32.6, see https://docs.datadoghq.com/agent/faq/certificate_verify_failed-error/.
Posted May 30, 2020 - 16:22 EDT
Update
We've released Datadog agent 5.32.7 which fixes the certificate problem and restores connectivity to Datadog.

To upgrade from a previous version of agent 5, use the install command for your platform:

Centos/Red Hat: sudo yum check-update && sudo yum install datadog-agent
Debian/Ubuntu: sudo apt-get update && sudo apt-get install datadog-agent
Windows: Download the Datadog Agent installer (https://s3.amazonaws.com/ddagent-windows-stable/ddagent-cli-latest.msi). start /wait msiexec /qn /i ddagent-cli-latest.msi
More platforms and configuration management options detailed at https://app.datadoghq.com/account/settings?agent_version=5#agent
Posted May 30, 2020 - 15:54 EDT
Update
We are continuing to monitor for any further issues.
Posted May 30, 2020 - 13:20 EDT
Update
We are continuing to monitor for any further issues.
Posted May 30, 2020 - 13:17 EDT
Update
What happened?

On Saturday May 30th, 2020, at 10:48 UTC, an SSL root certificate belonging to a certificate authority and used to sign some of the Datadog certificates expired, and caused some agents to lose connectivity with Datadog endpoints. Because this root certificate is embedded in agents from version 3.6 to version 5.32.6, some users will need to take action to restore connectivity.

What versions of the agent are affected?

Agent versions spanning 3.6.x to 5.32.6 embed the expired root certificate and are affected. Agent versions 6.x and 7.x are unaffected; if you are using these agents, no action is required on your part.

Fixing without updating the agent

We’re actively working on a new version of agent 5 but if you’d like to address this without an update, the following is the quickest path to resolution.

On Linux:

sudo rm /opt/datadog-agent/agent/datadog-cert.pem && sudo service datadog-agent restart

On Windows:

rm "C:\Program Files (x86)\Datadog\Datadog Agent\files\datadog-cert.pem"
net stop /y datadogagent ; net start /y datadogagent
Posted May 30, 2020 - 12:49 EDT
Monitoring
A fix has been implemented and we are monitoring the results. We are contacting customers directly who are running versions of the agent which may still have connectivity issues.
Posted May 30, 2020 - 10:54 EDT
Update
We've identified connectivity issues affecting certain versions of the agents - most notably agent 5, and are working to mitigate.
Posted May 30, 2020 - 10:17 EDT
Update
The API is back to normal and alerts have been re-enabled. We're still investigating connectivity issues between some agents and Datadog endpoints.
Posted May 30, 2020 - 09:26 EDT
Identified
The issue has been identified and a fix is being implemented. To prevent spurious alerts, we have temporarily disabled metric, event, service check, and composite monitors.
Posted May 30, 2020 - 08:17 EDT
Investigating
We're investigating increased errors receiving payloads from agents and delivering notifications, and snapshots are failing.
Posted May 30, 2020 - 07:50 EDT
This incident affected: Monitors.