A customer who uses Microsoft Advanced Threat Analytics (ATA) recently had severe issues with their ATA implementation. At first, the portal started to behave strangely, not showing all information in alerts and some configuration settings were missing. After a restart of the ATA servers, the services failed to start at all.
The Microsoft.Tri.Center-Errors.log file contained many errors like this:
2020-01-09 12:34:27.9920 1140 98 Error [CertificateExtension] Microsoft.Tri.Infrastructure.Utils.ExtendedException: There are no matching certificates [StoreLocation=LocalMachine StoreName=My thumbprint=89E1C9790B175D2E6B716CFDDABA3D9F444829F6]
It turned out that their internal PKI had automatically renewed the certificate that ATA was configured to use. In general, this is what you want from a PKI (auto-renewed certificates), but unfortunately, ATA does not support renewing an existing certificate.
The reason is that some ATA data is encrypted using the configured certificate, and during certificate renewal, the old certificate is removed, so you lose the ability to decrypt that data.
So you need to create a new certificate before the old one expires and manually configure ATA to use the new certificate.
This requirement is clearly stated in the Microsoft ATA-documentation:
You will even get alerts in ATA Health Center about upcoming certificate expiration:
Replacing the certificate is not really that difficult or time-consuming. But if you do not replace the certificate before it expires you will get this alert:
You can see that that when this happens, the only resolution is to redeploy your ATA, and you will lose all your configuration, alerts, and behavior analysis history.
Other services that use certificates can usually be recovered really easy from issues caused by expired certificates by simply getting a new certificate and pointing the service to the new certificate, but since the “certificate pointing” in ATA is done in the ATA Configuration, which is encrypted by the previous certificate, there is a catch 22 situation here.
Some people have tried to manually add the thumbprint of a new certificate in the SystemProfile_date.json configuration file, and they have gotten the ATA up and running again. However, they could not edit all ATA settings after that, so they eventually ended up redeploying from scratch.
Restoring from backup after redeployment will not work either since the backup still points to the old removed certificate. You can still use that backup configuration file as a manual reference on how to configure ATA again , since it is in cleartext.
So go ahead and make sure that your ATA implementation does not use a certificate that will be automatically renewed, and/or put a reminder in your calendar to renew it before it expires. And monitor those health alerts!