Incident Response Part 1: IR on Microsoft Security Incidents (KQL edition)

It always happens on Friday afternoon: a high-severity incident is created just before you want to start your weekend. After you have triaged the incident, you suspect that a threat actor gained access to your environment. From that moment, questions start to pop up in your head. What happened on this device? Are more devices impacted? What do I need to do to contain the incident? Will I be in time for dinner with my wife? To answer all those questions I have created an Incident Response blog series that consists of three parts:

  1. Incident Response Part 1: IR on Microsoft Security Incidents (KQL edition)
  2. Incident Response Part 2: What about the other logs?
  3. Incident Response Part 3: Leveraging Live Response

/images/kql-incident-response/Incident.png

This week we start with the first part: IR on Microsoft Security Incidents (KQL edition). The next parts will follow in the upcoming weeks! In this blog, we discuss how KQL can help answer the questions you have once an incident has been created. The starting point of the blog is a Defender For Endpoint incident that has been triggered. Based on that incident, we will discuss the following topics:

  1. Mayday Mayday Mayday
  2. KQL DFIR
  3. Summarize Malicious Activities
  4. Listing Inbound Connections
  5. File Analysis
  6. Registry Run Key Forensics
  7. More IR Queries
  8. MDE Settings

Before we dive into the technical aspects of this blog, it is important to share some theoretical background on incident response, based on the NIST Computer Security Incident Handling Guide. Incident response is often thought of as something we just do, but it consists of four pillars:

  • Preparation
  • Detection and Analysis
  • Containment, Eradication, and Recovery
  • Post-Incident Activity

Those pillars are the phases of your incident response lifecycle. This blog focuses on the Detection and Analysis phase, but if you start implementing some of the items from this blog, you are already working in the Preparation phase! The Preparation phase is crucial, as it lays the foundation for an effective and efficient response to security incidents. Preparation includes selecting tools, performing training, planning response actions and much more. For incident response best practices, the incident response overview page by Microsoft is worth bookmarking!

Mayday Mayday Mayday

The image below shows the incident that we are dealing with today: one incident consisting of 19 alerts. Based on the information provided by Defender For Endpoint, we can quickly collect the Evidence and Response to further investigate those activities. This includes the indicators of compromise that are related to this incident, such as IPs, hashes and commands. A timeline search could be one of our next steps to investigate what has happened on the system, but if the actions are spread over a longer period of time it quickly becomes infeasible to investigate them manually. From here on we move over to KQL to investigate these activities. In the case of a real incident, hit that isolate button before you move on!

/images/kql-incident-response/alert.png
Defender For Endpoint Incident

KQL DFIR

The DFIR section of the GitHub repository is the source of most of the KQL queries in this blog. Before we turn to those queries, we first highlight the variables that they use. Each query uses let variables as input to scope the results. The variables are most often the name of the compromised device (or account), in our case testmachine3, and the SearchWindow, which is how far to look back in the logs.

let CompromisedDevice = "testmachine3";
let SearchWindow = 48h; //Customizable h = hours, d = days

next()

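IR tip! Use the KQL next() function to get the difference between the current row and the next row. This can be very useful to detect subsequent actions: a small timeDiffInMilliseconds value means that two actions followed each other closely. Note that next() requires a serialized (sorted) row set, which is why the query sorts by Timestamp first. This example uses milliseconds, but any other unit supported by datetime_diff() can also be used.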

DeviceProcessEvents
| where DeviceName == "testmachine3"
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', next(Timestamp, 1), Timestamp)
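
As a follow-up to the tip above, the sketch below keeps only the process events that were followed by another event within a short gap, which can help surface scripted activity. It is a minimal sketch and not a query from the repository: the MaxGapInMilliseconds variable and its value of 1000 are assumptions chosen for illustration and should be tuned to your environment.

```kql
let CompromisedDevice = "testmachine3";
let SearchWindow = 48h; //Customizable h = hours, d = days
// Hypothetical threshold (assumption): maximum gap to the next event
let MaxGapInMilliseconds = 1000;
DeviceProcessEvents
| where Timestamp > ago(SearchWindow)
| where DeviceName == CompromisedDevice
// next() needs a serialized row set, sorting provides that
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', next(Timestamp, 1), Timestamp)
// The last row has no next row, so its difference is null and it is filtered out here
| where timeDiffInMilliseconds <= MaxGapInMilliseconds
| project Timestamp, timeDiffInMilliseconds, FileName, ProcessCommandLine, InitiatingProcessFileName
```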

Summarize Malicious Activities

We start off with a KQL query that summarizes all the malicious activities on a device. It combines multiple types of malicious activity that originate from a compromised device into one query. Those activities include:

  • ASR Triggers
  • SmartScreen Events
  • Antivirus Detections
  • Tampering Detections
  • Exploit Guard Triggers
  • AMSI Events

A sample of the query is shown in the code block below. The query stacks multiple individual queries and combines their results using the union operator. This first query already shows that preparation is key, because you do not want to build such queries after an incident has been triggered. Take the time to prepare them beforehand; your time to respond to incidents will go down with good preparation.

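// Sample only: ASREvents, SmartScreenEvents, AntivirusDetections and TamperingAttempts
// are defined in the full query and omitted here, as are the
// CompromisedDevice and SearchWindow variables
// List all Exploit Guard events
let ExploitGuardEvents = DeviceEvents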
     | where Timestamp > ago(SearchWindow)
     | where DeviceName == CompromisedDevice
     | where ActionType startswith "ExploitGuard"
     | project Timestamp, DeviceName, ActionType, FileName, FolderPath, RemoteUrl;
// List all amsi events
let AMSIEvents = DeviceEvents
     | where Timestamp > ago(SearchWindow)
     | where DeviceName == CompromisedDevice
     | where ActionType contains "Amsi"
     | extend Description = tostring(parse_json(AdditionalFields).Description)
     | project Timestamp, DeviceName, ActionType, Description, FolderPath;
// Combine all results into one output
(union isfuzzy=true
     (ASREvents),
     (SmartScreenEvents),
     (AntivirusDetections),
     (TamperingAttempts),
     (ExploitGuardEvents),
     (AMSIEvents)
     | sort by Timestamp
)

The results show that 5 different malicious activities have been performed. Just running a ‘simple’ query is quite effective isn’t it? Now it’s up to you again, are those activities malicious? Do we need to run some more queries? (Always a good answer :).)

/images/kql-incident-response/ListMaliciousActivities.png
List Malicious Activities

The queries that are used in each screenshot can be found in the GitHub repository. Just follow the link below, copy the query and enjoy!

KQL Query List Malicious Activities
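
The sample above leaves out four of the six building blocks. The sketch below shows one way those could look and how the combined result can be condensed per activity type. It is not the repository's query: the ActionType filters ("Asr" and "SmartScreen" prefixes, "AntivirusDetection", "TamperingAttempt") and the projected columns are assumptions based on common DeviceEvents values, so verify them against your own data before relying on them.

```kql
let CompromisedDevice = "testmachine3";
let SearchWindow = 48h; //Customizable h = hours, d = days
// Assumption: ASR events use ActionTypes that start with "Asr"
let ASREvents = DeviceEvents
     | where Timestamp > ago(SearchWindow)
     | where DeviceName == CompromisedDevice
     | where ActionType startswith "Asr"
     | project Timestamp, DeviceName, ActionType, FileName, FolderPath;
// Assumption: SmartScreen events use ActionTypes that start with "SmartScreen"
let SmartScreenEvents = DeviceEvents
     | where Timestamp > ago(SearchWindow)
     | where DeviceName == CompromisedDevice
     | where ActionType startswith "SmartScreen"
     | project Timestamp, DeviceName, ActionType, FileName, RemoteUrl;
// Assumption: antivirus detections are logged as "AntivirusDetection"
let AntivirusDetections = DeviceEvents
     | where Timestamp > ago(SearchWindow)
     | where DeviceName == CompromisedDevice
     | where ActionType == "AntivirusDetection"
     | project Timestamp, DeviceName, ActionType, FileName, FolderPath;
// Assumption: tampering attempts are logged as "TamperingAttempt"
let TamperingAttempts = DeviceEvents
     | where Timestamp > ago(SearchWindow)
     | where DeviceName == CompromisedDevice
     | where ActionType == "TamperingAttempt"
     | project Timestamp, DeviceName, ActionType, FileName, FolderPath;
// Combine the results and count the events per activity type
union isfuzzy=true
     (ASREvents),
     (SmartScreenEvents),
     (AntivirusDetections),
     (TamperingAttempts)
| summarize Events = count(), FirstSeen = min(Timestamp), LastSeen = max(Timestamp) by ActionType
| sort by FirstSeen asc
```

Summarizing by ActionType gives a quick overview before you drill into the individual rows.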

Listing Inbound Connections

Based on the summarized results, we want to get more information from this device. How did the adversaries gain access to the system? The query for this section lists all network events with an InboundConnectionAccepted ActionType. An accepted inbound connection does not necessarily mean that an adversary got access to your device, because in most cases you first make a connection to a system and only then authenticate. In our results, we see that a lot of different IPs from a variety of countries have connected on port 3389 (RDP). Is that a reason for concern? It depends. The RDP port is probably open to the internet, but if no successful login attempt is made within that same period we do not have to fear too much yet.

/images/kql-incident-response/InboundConnections.png
Inbound Connections Accepted
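
A listing like the one above could be produced with a query along the following lines. This is a minimal sketch and not necessarily the repository's query. The country lookup uses the geo_info_from_ip_address() function, and it is an assumption that this function is available in your hunting environment; drop the Country column if it is not.

```kql
let CompromisedDevice = "testmachine3";
let SearchWindow = 48h; //Customizable h = hours, d = days
DeviceNetworkEvents
| where Timestamp > ago(SearchWindow)
| where DeviceName == CompromisedDevice
// Only list accepted inbound connections
| where ActionType == "InboundConnectionAccepted"
// Assumption: geo_info_from_ip_address() is available to resolve the country
| extend Country = tostring(geo_info_from_ip_address(RemoteIP).country)
// One row per source IP and local port, with the number of connections
| summarize Connections = count(), FirstSeen = min(Timestamp), LastSeen = max(Timestamp) by RemoteIP, Country, LocalPort
| sort by Connections desc
```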

When combining the DeviceNetworkEvents and the DeviceLogonEvents we get some interesting results. We created a query that joins all logon actions from the compromised device with the network events (query after image). Furthermore, we calculate the time difference between the InboundConnectionAccepted and the login action.

| extend TimeDifference = datetime_diff('second', LogonTime, Timestamp)

If a successful login with a RemoteInteractive logon type is performed shortly after the InboundConnectionAccepted event, it is very likely that the logon originated from the IPs we saw in the network traffic. Based on this analysis we have now tracked the source of our alerts!

/images/kql-incident-response/LoginActions.png
Successful login actions combined with

// Add the device you are investigating in the CompromisedDevice variable
let CompromisedDevice = "testmachine3";
let SearchWindow = 10d; //Customizable h = hours, d = days
DeviceNetworkEvents
| where Timestamp > ago(SearchWindow)
| where DeviceName == CompromisedDevice
// Only list accepted inbound connections
| where ActionType == "InboundConnectionAccepted"
| join kind=inner (
    DeviceLogonEvents
    | where DeviceName == CompromisedDevice
    | where Timestamp > ago(SearchWindow)
    | where ActionType == "LogonSuccess"
    | project-rename LogonTime = Timestamp)
    on $left.DeviceId == $right.DeviceId
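// Seconds between the accepted connection and the successful logon
| extend TimeDifference = datetime_diff('second', LogonTime, Timestamp)
// Only keep logons within 150 seconds after the accepted connection
| where TimeDifference between (0 .. 150)
// ActionType1 is the ActionType of the joined DeviceLogonEvents row
| project Timestamp, LogonTime, TimeDifference, DeviceName, RemoteIP, LogonType,
          AccountDomain, AccountName, ActionType1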

KQL Query (Public) Inbound connections to a compromised device

File Analysis

Another alert that was related to this incident is shown below. The actions seem to be triggered by the following command-line execution:

cmd.exe /c ""C:\Users\administrator1\Desktop\VeryCoolScript.cmd" "

/images/kql-incident-response/PasswordStealing.png
Alert: Password stealing from files

From the alert it seems that only the find string process was started as a subprocess of the VeryCoolScript.cmd file, but based on previous experience we want to confirm this by running a query that lists all subprocesses of a file. The query is too long to share here, but it is linked at the end of this section. It takes either a SHA1 hash or a filename as input and lists all the associated subprocesses, as well as the FileNames, FileLocations and SHA1 hashes that the file has had.

In our case the hash of the batch script VeryCoolScript.cmd is used as input. The first part of the result lists the names, locations and hashes of the file (multiple hashes can only appear when a filename is used as input). In this case, the file has two names: VeryCoolScript.cmd and VeryCoolScript.txt. More interesting are all the subprocesses that this file has created, which are far more than the alert suggested. It seems that the adversary has automated various discovery activities within this script. This shows that running KQL to enrich your alerts can be very useful! The query returns the subprocesses of any file. In this case we used a cmd file, but any file extension will work.

/images/kql-incident-response/FileProcesses2.png
Processes created by file

KQL Query Find all the processes a file has created and the associated FileNames, FileLocations and SHA1 hashes that the file has had.
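
To give an idea of the subprocess part, the sketch below takes a filename as input and lists the processes whose parent command line references that file. It is a simplified sketch under stated assumptions, not the linked query: the TargetFileName variable is hypothetical, the match is a plain substring match on InitiatingProcessCommandLine, and the SHA1 input and the name and location history are left out.

```kql
// Hypothetical input variable (assumption): the file to investigate
let TargetFileName = "VeryCoolScript.cmd";
let SearchWindow = 48h; //Customizable h = hours, d = days
DeviceProcessEvents
| where Timestamp > ago(SearchWindow)
// Processes started by a parent whose command line references the file
| where InitiatingProcessCommandLine contains TargetFileName
// One row per child process, with a sample of the command lines that were used
| summarize Executions = count(), FirstSeen = min(Timestamp), Devices = make_set(DeviceName), CommandLines = make_set(ProcessCommandLine, 25) by FileName, FolderPath, SHA1
| sort by FirstSeen asc
```

A substring match can return unrelated processes when the filename is very generic, so review the returned command lines before drawing conclusions.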

KQL CAFE
Last year I was invited to do a session at KQL Cafe, where I discussed some of the queries mentioned in this blog. If you are interested, you can use this link to view it (my part starts around minute 13).

Registry Run Key Forensics

Registry Run Keys can be abused by adversaries to gain persistence, as described in T1547.001. The incident that was triggered also generated an Anomaly detected in ASEP registry alert. The Defender For Endpoint description of this alert is: A process registered a suspicious command or file in ASEP registry key, where it will be run after a reboot. An attacker may place a malicious piece of software in such a location to prevent losing access if a machine is turned off.

/images/kql-incident-response/ASEPRegistry.png
Anomaly detected in ASEP registry alert

To investigate this, a query is used that first defines all the registry run keys to search on. To save time, the search can be performed on multiple devices at once. Once we hit Shift + Enter we get the results, which include all run key changes from the compromised devices, as seen below. One row is returned for each device, showing the count of run key changes. Clicking on the row reveals all the details. Now it is up to you again to investigate whether those new values are suspicious or benign.

/images/kql-incident-response/ASEPResult.png
KQL Registry Run Key Forensics

KQL Query Registry Run Key Forensics
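
The approach described above can be sketched as follows. This is a minimal sketch, not the linked query: the CompromisedDevices list is a hypothetical input, and the regular expression only covers the Run and RunOnce keys, while the full query defines a longer list of run keys.

```kql
// Hypothetical input (assumption): one or more devices to investigate
let CompromisedDevices = dynamic(["testmachine3"]);
let SearchWindow = 48h; //Customizable h = hours, d = days
DeviceRegistryEvents
| where Timestamp > ago(SearchWindow)
| where DeviceName in~ (CompromisedDevices)
| where ActionType == "RegistryValueSet"
// Assumption: only the Run and RunOnce keys are matched in this sketch
| where RegistryKey matches regex @"(?i)\\Microsoft\\Windows\\CurrentVersion\\(Run|RunOnce)$"
// Pack the details of every change so they can be expanded per device
| extend RunKeyChange = bag_pack("Timestamp", Timestamp, "RegistryKey", RegistryKey, "ValueName", RegistryValueName, "ValueData", RegistryValueData, "InitiatingProcess", InitiatingProcessCommandLine)
| summarize TotalRunKeyChanges = count(), RunKeyChanges = make_list(RunKeyChange) by DeviceName
```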

More IR Queries

Only a subsection of the available KQL queries has been shared in this blog, but there are many more available for a variety of different incidents. All the other queries are shared in the GitHub repository in the DFIR category: DFIR Queries. Some ideas of queries that are listed:

I hope that this blog has shown the importance of preparation in order to respond to security incidents effectively and efficiently. During a critical incident, you will most often not have the time to write queries that search for or list malicious activities, so my recommendation is to prepare KQL queries for common response steps in advance. From my perspective, these queries add quick value to an incident by adding context and details and by combining data.

Automation?

Yes! You can automate the DFIR queries with playbooks (Logic Apps) and automation triggers in Sentinel. I recommend the following blog if you are interested in automating (a part of) this:

Defender For Endpoint Automation: Link

If you want to query data in Sentinel you can replace the Advanced Hunting action with the Run query and list results from the Azure Monitor connector.

MDE Settings

If you have Defender For Endpoint, the Automated investigation and remediation (AIR) capabilities are configurable (in Microsoft Defender for Business they are not configurable). Those settings can save you from a nasty breach if they are properly configured. Automatic attack disruption in Microsoft 365 Defender does similar work for identities and devices and is also configurable.

What settings do I recommend? Aim for full automation if possible. If that is not possible, put all your DeviceGroups that contain cattle in full automation and the ones that contain pets in semi-automation. Full automation means that remediation actions are taken automatically on artefacts determined to be malicious; this is also the default in Defender For Business. Semi-automation requires manual approval before action is taken.

Especially if you do not have the manpower to respond to cyber attacks 24/7, it is recommended to fully automate the remediation process. If you cannot do this for every device, build a group that is excluded and automate all other devices. A similar approach should be taken for identities: put most accounts in automation, but exclude some if needed (for example, a break glass account).

Documentation on Automation levels in automated investigation and remediation capabilities: Link

Documentation on Automatic attack disruption in Microsoft 365 Defender: Link

Questions? Feel free to reach out to me on any of my socials.