We are continuing to monitor for any further issues.
Posted May 30, 2020 - 13:20 EDT
Update
We are continuing to monitor for any further issues.
Posted May 30, 2020 - 13:17 EDT
Update
What happened?
On Saturday, May 30, 2020, at 10:48 UTC, an SSL root certificate belonging to a certificate authority, and used to sign some of Datadog's certificates, expired. As a result, some agents lost connectivity with Datadog endpoints. Because this root certificate is embedded in agent versions 3.6 through 5.32.6, some users will need to take action to restore connectivity.
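To make "expired" concrete: a certificate's notAfter field marks the last moment it is valid, and TLS clients that validate against that certificate typically reject the chain once the current time passes it. The Python sketch below checks a notAfter string, in the format returned by ssl.SSLSocket.getpeercert(), against a given time using the standard library's ssl.cert_time_to_seconds. The ROOT_NOT_AFTER value is an illustrative assumption built from the 10:48 UTC time in this update (the seconds are not given), and is_expired is a hypothetical helper, not part of the Datadog agent.

```python
import ssl
import time
from typing import Optional

# Illustrative notAfter value built from the 10:48 UTC expiry time in this
# update; the seconds field (":00") is an assumption.
ROOT_NOT_AFTER = "May 30 10:48:00 2020 GMT"


def is_expired(not_after: str, now: Optional[float] = None) -> bool:
    """Return True once `now` is strictly past the certificate's notAfter.

    `not_after` uses the "%b %d %H:%M:%S %Y GMT" format that
    ssl.SSLSocket.getpeercert() reports. The notAfter instant itself is
    still inside the validity period, hence the strict comparison.
    """
    if now is None:
        now = time.time()  # current time, seconds since the Unix epoch
    # cert_time_to_seconds interprets the string as UTC (Python 3.5+).
    return now > ssl.cert_time_to_seconds(not_after)


if __name__ == "__main__":
    expiry = ssl.cert_time_to_seconds(ROOT_NOT_AFTER)
    print(is_expired(ROOT_NOT_AFTER, now=expiry - 60))  # False (one minute before)
    print(is_expired(ROOT_NOT_AFTER, now=expiry + 60))  # True (one minute after)
```

Called without now, is_expired(ROOT_NOT_AFTER) compares against the current clock, so it returns True on any date after the incident.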
What versions of the agent are affected?
Agent versions 3.6.x through 5.32.6 embed the expired root certificate and are affected. Agent versions 6.x and 7.x are unaffected; if you are using these agents, no action is required on your part.
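As a quick way to apply this range to an inventory of hosts, the following Python sketch tests whether a version string falls between 3.6 and 5.32.6 inclusive. It assumes plain dotted numeric versions such as "5.32.6" (a suffix like "-1" would raise ValueError), and in_affected_range is a hypothetical helper name. It only encodes the range stated in this update; a False result for a version older than 3.6 should not be read as a statement that such a version is unaffected.

```python
from typing import Tuple

# Bounds of the affected range as stated in this update (inclusive).
AFFECTED_MIN: Tuple[int, ...] = (3, 6)
AFFECTED_MAX: Tuple[int, ...] = (5, 32, 6)


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a dotted version such as "5.32.6" into (5, 32, 6).

    Assumes purely numeric components; anything else raises ValueError.
    """
    return tuple(int(part) for part in version.strip().split("."))


def in_affected_range(version: str) -> bool:
    """Return True if `version` lies between 3.6 and 5.32.6 inclusive.

    Tuple comparison is lexicographic, so "3.6.0" -> (3, 6, 0) sorts after
    (3, 6) and "5.33.0" -> (5, 33, 0) sorts after (5, 32, 6).
    """
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX


if __name__ == "__main__":
    for v in ["3.6.0", "5.32.6", "5.32.7", "6.19.0", "7.20.0"]:
        print(v, in_affected_range(v))
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where "5.4" would sort after "5.32" as text.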
Fixing without updating the agent
We’re actively working on a new version of agent 5, but if you’d like to address this without updating the agent, the following is the quickest path to resolution.
On Linux:
sudo rm /opt/datadog-agent/agent/datadog-cert.pem && sudo service datadog-agent restart
On Windows (PowerShell, run as Administrator):
rm "C:\Program Files (x86)\Datadog\Datadog Agent\files\datadog-cert.pem" ; net stop /y datadogagent ; net start /y datadogagent
Posted May 30, 2020 - 12:49 EDT
Monitoring
A fix has been implemented and we are monitoring the results. We are directly contacting customers who are running versions of the agent that may still have connectivity issues.
Posted May 30, 2020 - 10:54 EDT
Update
We've identified connectivity issues affecting certain versions of the agent, most notably agent 5, and are working to mitigate them.
Posted May 30, 2020 - 10:17 EDT
Update
The API is back to normal and alerts have been re-enabled. We're still investigating connectivity issues between some agents and Datadog endpoints.
Posted May 30, 2020 - 09:26 EDT
Identified
The issue has been identified and a fix is being implemented. To prevent spurious alerts, we have temporarily disabled metric, event, service check, and composite monitors.
Posted May 30, 2020 - 08:17 EDT
Investigating
We're investigating increased errors in receiving payloads from agents and in delivering notifications. Snapshots are also failing.