Microsoft Sentinel & Azure Arc – Troubleshooting Windows Event Logs
Last Updated: 25/11/2024
This post covers troubleshooting Windows event log ingestion from Azure Arc-enabled machines into Microsoft Sentinel. Recently, I had an issue with two on-premises virtual machines, Azure Arc-enabled via a public endpoint, where Windows event logs stopped being ingested into Microsoft Sentinel. The virtual machines run Windows Server 2019, and the problem appeared after a recent set of updates.
This post doesn’t detail the root cause, as I am still unsure what caused the issue in the first place. Instead, I have listed some of the troubleshooting steps I took, which might be handy for those of you having similar issues.
It’s worth noting that this is only the rough order in which I took the steps; it isn’t necessarily the correct order for troubleshooting Windows event logs.
Check the status of the Azure Arc Machine
Navigate to your Azure Arc resource in Azure. On the Overview page you can quickly see the ‘Status’ of the machine. Ensure that this is ‘Connected’.
Check the Data Collection Rule
Navigate to your Sentinel instance, then to ‘Data Connectors’ and select ‘Windows Security Events via AMA’. In the connector page, select your data collection rule and inspect the configuration.
Note: Hit the ‘Show Selected’ button to show the resources that have been selected for this DCR.
Validate that the AMA agent extension is installed and updated
Head back to your Azure Arc resource, and this time navigate to Settings, then Extensions. You will see a list of extensions for your Azure Arc machine. Ensure that ‘AzureMonitorWindowsAgent’ has a status of ‘Succeeded’.
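Validate the agent is running on the server
Admittedly, this should have been one of my first steps, and whilst everything in Azure was telling me that the agent was running and deployed, it’s always worth double checking. Note that azcmagent is the command-line tool for the Azure Connected Machine (Arc) agent, so this checks the Arc agent and its extension service rather than the AMA process itself. Log onto your server and run the following command in an elevated PowerShell session: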
azcmagent.exe show
In the output, the following fields should show as ‘running’:
- Agent Service (himds)
- GC Service (gcarcservice)
- Extension Service (extensionservice)
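As a rough illustration, the check above can be scripted. The Python sketch below parses the text printed by azcmagent.exe show and lists any of the three services that are not reported as ‘running’. The function names and the sample output are my own assumptions for illustration; the exact layout of the real output may differ between agent versions, so treat the parsing as a starting point rather than a definitive implementation.

```python
import subprocess

# The three dependent services named in the list above.
EXPECTED_SERVICES = ("himds", "gcarcservice", "extensionservice")


def services_not_running(show_output: str) -> list[str]:
    """Return the expected services that are not reported as 'running'.

    Assumes each service appears on its own line as 'Label (name) : state'.
    A service that is missing from the output is also reported.
    """
    states = {}
    for line in show_output.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # no 'key : value' pair on this line
        for name in EXPECTED_SERVICES:
            if f"({name})" in label.lower():
                states[name] = value.strip().lower()
    return [n for n in EXPECTED_SERVICES if states.get(n) != "running"]


def read_show_output() -> str:
    """Run 'azcmagent.exe show' on the server (needs an elevated prompt)."""
    result = subprocess.run(
        ["azcmagent.exe", "show"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Illustrative sample only, not captured from a real machine.
SAMPLE = """\
Agent Status                            : Connected
Dependent Service Status
  Agent Service (himds)                 : running
  GC Service (gcarcservice)             : stopped
  Extension Service (extensionservice)  : running
"""

print(services_not_running(SAMPLE))  # ['gcarcservice']
```

On the server itself, services_not_running(read_show_output()) would run the same check against the live output; an empty list means all three services reported ‘running’.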
Check that the agent was able to download the DCR config to the server
The DCR configuration is also stored on the server, so it is worth checking whether the latest version has been downloaded. You can find it in: C:\resources\directory\AMADataStore\mcs\configchunks
Whilst you’re in this directory tree, check whether C:\resources\directory\AMADataStore\mcs\mcsconfig.lkg.xml exists. Open the file to see if it contains the ‘subscription’ section.
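If you have several servers to check, the two file checks above can be sketched in Python. This is a minimal sketch under some assumptions: the function name and result keys are hypothetical, the ‘mcs’ folder and the last-known-good (LKG) file name are passed in as parameters rather than hard-coded, and the file is assumed to be readable as UTF-8 text. It only looks for the word ‘subscription’ and does not validate the XML.

```python
from pathlib import Path


def check_dcr_config(mcs_dir: Path, lkg_name: str) -> dict[str, bool]:
    """Check that DCR config chunks exist and the LKG file mentions 'subscription'."""
    chunks_dir = mcs_dir / "configchunks"
    # At least one downloaded chunk file should be present.
    has_chunks = chunks_dir.is_dir() and any(
        p.is_file() for p in chunks_dir.iterdir()
    )

    lkg_file = mcs_dir / lkg_name
    has_subscription = False
    if lkg_file.is_file():
        # Assumption: the file decodes as UTF-8; undecodable bytes are skipped.
        text = lkg_file.read_text(encoding="utf-8", errors="ignore")
        has_subscription = "subscription" in text.lower()

    return {
        "config_chunks_present": has_chunks,
        "lkg_file_present": lkg_file.is_file(),
        "subscription_section_found": has_subscription,
    }
```

To use it, pass the ‘mcs’ folder from the path above as mcs_dir and the LKG file name as lkg_name. Any False value in the result points at the step worth investigating first.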
For me, everything seemed to check out. The agent was showing as connected on both servers and in Azure. The DCR looked correct (and hadn’t been changed), and I was even getting an agent heartbeat. The DCR configuration was present on the servers and everything looked like it should be working… but still no event logs in Azure and Sentinel.
Steps to Fix
I decided to take a few steps to start ruling things out. I recreated the DCR just to be on the safe side, then reinstalled the Azure Monitor Agent (AMA) extension from Azure on one of the machines.
Next, I removed the Microsoft Monitoring Agent (MMA) extension from Azure, as this was the legacy way of collecting logs.
From the looks of it, reinstalling the agent is what kicked it back to life, as the second machine had the legacy extension removed and still wasn’t working until I redeployed the Azure Monitor Agent. It’s possible that the legacy agent was causing a conflict; however, this setup was fully working a few weeks ago, so it’s hard to pinpoint the exact cause and fix.
Lessons Learnt
The biggest worry was that the heartbeat of the agent was coming in fine, but the event logs were not. Simply looking at the health of the data source via the heartbeat wouldn’t have indicated anything wrong. You would only notice once you needed to look through events and found them missing, which speaks to the importance of ensuring that all your data sources are healthy.
Seeing as most of the Workbooks for data collection health focus on the Heartbeat table, I have created a quick query to show the state of the machines based on the SecurityEvent table (not the Heartbeat table) and when events were last received. Anything over 10 minutes will change the state to ‘Unhealthy’.
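To make the threshold logic concrete, here is a small Python sketch of the same idea. It is not the KQL query itself, only an illustration of the rule described above: take the time each machine last sent a security event and flag anything older than 10 minutes. The function name, machine names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UNHEALTHY_AFTER = timedelta(minutes=10)  # threshold taken from the text above


def classify_health(
    last_event_times: dict[str, datetime], now: datetime
) -> dict[str, str]:
    """Label each machine by how long ago its last security event arrived."""
    return {
        computer: "Unhealthy" if now - last_seen > UNHEALTHY_AFTER else "Healthy"
        for computer, last_seen in last_event_times.items()
    }


# Hypothetical machines and timestamps, for illustration only.
now = datetime(2024, 11, 25, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "SRV01": now - timedelta(minutes=3),
    "SRV02": now - timedelta(minutes=45),
}
print(classify_health(last_seen, now))
# {'SRV01': 'Healthy', 'SRV02': 'Unhealthy'}
```

The point of keying on the last security event rather than the last heartbeat is that a machine like SRV02 here could still be sending heartbeats while its event logs have silently stopped.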
With a bit of work, this query can be used in a workbook. For now, I am happy that my machines are reporting in with the event logs and have a quick query to see the health of the Event Logs from the machine, not just the heartbeat.
KQL Query used above: https://github.com/gennaromigliaccio/Sentinel/blob/main/KQL/WindowsSecurityEventHealth
Hopefully this article helps those who are also troubleshooting Windows event logs in Microsoft Sentinel.
If you’re interested in ingesting Windows event logs into Sentinel, please visit my previous posts on how to ingest logs from Azure VMs and non-Azure VMs.