The expression "assume breach" has become common in the information security industry. Far too often, intrusions go undetected for extended periods of time or until an external party discovers a breach and notifies the organization. Given the increasingly targeted and even personalized nature of attacks, network defenders must move beyond a reactive posture and instead hunt for unknown breaches. This systematic pursuit of unknown adversaries is known as threat hunting.
Hunting is not without its challenges. Defenders must be able to sift through mountains of data to rapidly detect and address a compromise. How is this done? You can get a taste of hunting threats on the cheap by making use of free and open source tools to analyze host and network data. This can demonstrate the power of hunting and perhaps whet your appetite for a full-featured threat hunting platform.
Hunting on networks
Security at the network level has traditionally been about searching for IoCs (indicators of compromise), such as blacklisted domains or IPs. However, malicious tradecraft is rapidly evolving. Adversary infrastructure is becoming harder to distinguish from legitimate services, and malicious actors routinely cycle through new and never-before-seen elements of their attack infrastructure. These techniques render most network IoCs quickly obsolete.
Collecting and analyzing DNS data is a great way to begin hunting on networks. There are various open source sensors -- such as PassiveDNS and sie-dns-sensor -- that can be placed at any point in the network (ideally on a local recursive DNS server) to passively capture DNS transactions. This data can then be transferred into a message queue like Kafka, which can feed it to any number of consumers to perform the necessary analysis for threat hunting. Network defenders can conduct a wide range of analyses on this passive DNS data to hunt for unknown intrusions in networks.
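The sensor-to-queue-to-consumers flow can be sketched without any infrastructure. This minimal sketch assumes the sensor writes "||"-delimited lines in the nine-field layout that PassiveDNS uses by default (timestamp, client, server, class, query, type, answer, TTL, count). Verify this against your own sensor's output, because other sensors such as sie-dns-sensor use their own formats. A plain list of callbacks stands in for Kafka here; in a real deployment each consumer would be a separate Kafka consumer group reading the same topic. The names DnsRecord, parse_passivedns_line, and fan_out are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class DnsRecord:
    timestamp: float
    client: str
    server: str
    rr_class: str
    query: str
    qtype: str
    answer: str
    ttl: int
    count: int


def parse_passivedns_line(line: str) -> DnsRecord:
    """Parse one '||'-delimited line (assumed PassiveDNS-style layout)."""
    fields = line.rstrip("\n").split("||")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    ts, client, server, rr_class, query, qtype, answer, ttl, count = fields
    # Normalize the query name: drop the trailing root dot and lowercase it.
    return DnsRecord(float(ts), client, server, rr_class,
                     query.rstrip(".").lower(), qtype, answer,
                     int(ttl), int(count))


Consumer = Callable[[DnsRecord], None]


def fan_out(lines: Iterable[str], consumers: List[Consumer]) -> int:
    """Deliver every parsed record to every consumer, the way independent
    Kafka consumer groups each see the whole topic. Returns the number
    of records delivered."""
    delivered = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        record = parse_passivedns_line(line)
        for consume in consumers:
            consume(record)
        delivered += 1
    return delivered
```

Each analysis (a DGA scorer, a rare-domain counter, an archiver) then becomes one more callback in the list, which mirrors adding another consumer group to the topic.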
Example: Hunting DGA malware
With that foundation in place, the next step is to examine the collected data for patterns and signals of malicious behavior that, with a relatively low false positive rate, give the hunter starting points for digging deeper into unknown threats. A number of such signals can be applied to passive DNS data to hunt for unknown adversaries in your network; domains produced by a domain generation algorithm are one example.
DGA (domain generation algorithm) malware uses an algorithm to pseudo-randomly generate thousands of domains daily and attempts to connect to them to receive communications from a controller. To block DGA command-and-control traffic, security engineers must reverse-engineer the malware to predict all possible domains, and then either block or sinkhole those domains. This is tedious work, and the resulting blocklists are difficult to keep up to date.
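To make the reverse-engineering point concrete, here is a deliberately toy generator. It is not any real malware family's algorithm; the seed, the SHA-256-based derivation, and the function name toy_dga are all hypothetical. What it does show is the property defenders rely on: the output is fully determined by the seed and the date, so once both are recovered, upcoming domains can be computed in advance.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".com") -> List[str]:
    """Generate `count` pseudo-random domains for `day` from `seed`.

    Deterministic: the same seed and day always yield the same list.
    This is an illustration only, not a real DGA family.
    """
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 (SHA-256 digest size)")
    letters = "abcdefghijklmnopqrstuvwxyz"
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).digest()
        # Map each digest byte to a letter (slightly biased; fine for a toy).
        label = "".join(letters[b % 26] for b in digest[:length])
        domains.append(label + tld)
    return domains
```

A defender who has recovered the seed can call toy_dga for the next several dates and push the results to a blocklist or sinkhole. Real families change seeds and algorithms, which is why this approach is hard to keep current.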
Fortunately, algorithmically generated domains have structural properties that differ from those of benign domains. Benign domains are generally chosen because they are easy to remember or reflect common words across a variety of languages. One fairly accurate approach to detecting DGA domains is to extract features such as consonant-to-vowel ratio, longest consonant sequence, entropy, and n-grams shared with dictionary words, and then classify them with a random forest (an ensemble of decision trees). Given the sophistication of this approach, we have provided code that can be used for detection. This specific classifier flags domains whose lexical structure deviates from that of common English words.
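The feature-extraction step can be sketched with only the standard library. This is not the classifier referred to above; it only computes the kinds of features the paragraph lists. The tiny WORDS list, the 3-gram size, treating "y" as a consonant, and the function names are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, Set

VOWELS = set("aeiou")
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]+")

# Hypothetical stand-in for a real dictionary; a production model would
# use a large list of common words.
WORDS = ["mail", "search", "cloud", "news", "shop", "bank", "google"]


def ngrams(text: str, n: int = 3) -> Set[str]:
    """All contiguous character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


DICT_NGRAMS = set().union(*(ngrams(w) for w in WORDS))


def registered_label(domain: str) -> str:
    """Second-level label, e.g. 'example' from 'www.example.com'.
    Simplification: ignores multi-part suffixes such as '.co.uk'."""
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dga_features(domain: str) -> Dict[str, float]:
    label = registered_label(domain)
    letters = [ch for ch in label if ch.isalpha()]
    vowels = sum(ch in VOWELS for ch in letters)
    consonants = len(letters) - vowels
    runs = CONSONANT_RUN.findall(label)
    return {
        "length": float(len(label)),
        # max(..., 1) avoids dividing by zero for vowel-free labels.
        "consonant_vowel_ratio": consonants / max(vowels, 1),
        "longest_consonant_run": float(max((len(r) for r in runs), default=0)),
        "entropy": shannon_entropy(label) if label else 0.0,
        "dictionary_ngrams": float(len(ngrams(label) & DICT_NGRAMS)),
    }
```

For "www.google.com" this yields a ratio of 1.0, a longest consonant run of 2, and about 1.92 bits of entropy, while a label like "xjwqkzrtplm" scores 11.0, 11, and about 3.46. Vectors built from these dictionaries would then be used to train a random forest on labeled benign and DGA domains, for example with scikit-learn's RandomForestClassifier.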
Hunting on hosts
The network is not the only place to hunt. Desktop computers and servers provide a wealth of data, including running processes, active network connections, listening ports, artifacts in the file system, user logs, and autoruns.
Autoruns are auto-start locations where a malicious executable can persist across reboots on modern Windows machines. They're a good place to look for outliers and suspicious entries because the programs that start automatically when a computer boots tend to be relatively consistent across a network, making pure outlier analysis feasible. An autorun that shows up on only a handful of machines may indicate trouble.
Example: Hunting via Autoruns
There are more than 100 possible autorun locations in Windows, including startup registry keys, services, drivers, browser extensions, and Office add-ons. Beyond the sheer number of locations, collecting the necessary data for analysis is nontrivial because of the way the operating system formats it. The Windows Sysinternals Suite (maintained by Microsoft) includes a free tool called Autoruns to tackle this problem. While not perfect, this tool pulls in the right data for most autorun items on a Windows system, hashes them, and allows for some basic enrichment.
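Autoruns also ships with a command-line version, autorunsc, whose CSV output can be gathered from each host. The sketch below turns one host's export into the HOST:HASH lines used for stacking by hash. It assumes the CSV has already been decoded to text (autorunsc may write UTF-16) and that hashes sit in a column named "SHA-256"; check the header row of your own export, since the columns present depend on the options you run it with. The function name host_hash_lines is hypothetical.

```python
import csv
import io
from typing import List


def host_hash_lines(host: str, csv_text: str,
                    hash_column: str = "SHA-256") -> List[str]:
    """Turn one host's Autoruns CSV export into HOST:HASH lines.

    Rows with an empty hash (for example, entries whose file is missing)
    are skipped. Hashes are lowercased so identical files stack together.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or hash_column not in reader.fieldnames:
        raise ValueError(f"column {hash_column!r} not found in CSV header")
    lines = []
    for row in reader:
        digest = (row.get(hash_column) or "").strip().lower()
        if digest:
            lines.append(f"{host}:{digest}")
    return lines
```

Concatenating the output for every host into one file gives the combined list that the stacking steps work from.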
After you've collected all the autoruns, they must be analyzed. Start by submitting all their hashes to VirusTotal. It will quickly tell you whether any are known to be malicious and should be prioritized for additional investigation. This can be done inline within Autoruns, or you can easily build something to automate the process using the VirusTotal API.
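If you automate the lookups, the sketch below shows one way to do it with the standard library against the VirusTotal API v3 file-report endpoint. It assumes you have an API key and reads only the malicious count from last_analysis_stats. The helper names, the fetch hook (which lets the code be exercised without network access), and the 15-second delay (which assumes the free public API's limit of four requests per minute) are assumptions; check the current VirusTotal API documentation and your own quota before relying on it.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"
Fetch = Callable[[urllib.request.Request], bytes]


def build_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Build a file-report request for one MD5, SHA-1, or SHA-256 hash."""
    return urllib.request.Request(
        VT_FILE_URL.format(file_hash),
        headers={"x-apikey": api_key, "Accept": "application/json"},
    )


def default_fetch(req: urllib.request.Request) -> bytes:
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def lookup_hash(file_hash: str, api_key: str,
                fetch: Fetch = default_fetch) -> Optional[int]:
    """Number of engines that flagged the file as malicious,
    or None if VirusTotal has no report for this hash."""
    try:
        body = fetch(build_request(file_hash, api_key))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # hash not known to VirusTotal
            return None
        raise
    stats = json.loads(body)["data"]["attributes"]["last_analysis_stats"]
    return int(stats.get("malicious", 0))


def triage(hashes: Iterable[str], api_key: str, fetch: Fetch = default_fetch,
           delay: float = 15.0) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Split hashes into (flagged, unknown); hashes with zero detections
    are dropped. `delay` spaces out requests to respect a rate limit."""
    flagged, unknown = [], []
    for i, file_hash in enumerate(hashes):
        if i and delay:
            time.sleep(delay)
        count = lookup_hash(file_hash, api_key, fetch)
        if count is None:
            unknown.append(file_hash)
        elif count > 0:
            flagged.append((file_hash, count))
    return flagged, unknown
```

Unknown hashes are kept separately rather than treated as clean, since a file VirusTotal has never seen is not evidence that it is benign.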
You shouldn't stop after scanning for known malware. It's now time to hunt for unknown malicious behavior and look for anomalies in the data. There are many ways to do this, but we'd recommend first stacking by hash (counting how often each hash occurs across your systems) and looking for outliers that don't match the general population of the data.
To do this, pull hashes of all autorun items as described earlier, and then list them out as HOST:HASH. The figure below provides a concrete example of how this might look. Note that you will have many more autoruns for each machine in a real environment.
An easy next step is to split each line on the colon (:) and keep only the hash field.
# cat hash-map.txt | cut -d':' -f2 > hashes.txt
Then count how many times each hash occurs across your systems and sort by that count to quickly identify the anomalies.
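The split, count, and sort steps can also be written as a short script that remembers which hosts reported each hash, which is what you need when you go back to a host to inspect an outlier. It assumes input in the HOST:HASH format; stack_by_hash, rarest, and the default threshold of one host are illustrative choices, not a prescribed method. Counting distinct hosts rather than raw lines keeps duplicate reports from a single machine from masking an outlier.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def stack_by_hash(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each hash to the set of hosts that reported it.
    Lines look like HOST:HASH; blank or malformed lines are skipped."""
    hosts_by_hash: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        host, sep, digest = line.strip().partition(":")
        if not sep or not digest:
            continue
        hosts_by_hash[digest.lower()].add(host)
    return dict(hosts_by_hash)


def rarest(hosts_by_hash: Dict[str, Set[str]],
           max_hosts: int = 1) -> List[Tuple[str, List[str]]]:
    """Hashes seen on at most `max_hosts` machines, rarest first,
    each paired with the hosts to revisit for full Autoruns output."""
    rare = [(h, sorted(hosts)) for h, hosts in hosts_by_hash.items()
            if len(hosts) <= max_hosts]
    return sorted(rare, key=lambda item: (len(item[1]), item[0]))
```

For example, reading hash-map.txt line by line into stack_by_hash and passing the result to rarest lists every hash that appeared on a single machine, together with that machine's name.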
In this example, there were 42 systems. Many autoruns appeared on every system, while a couple appeared on only one. These outliers could be suspicious. A reasonable first step is to review the detailed Autoruns output from the hosts where each outlier was seen. You might notice a strange description, file name, or autostart location, or some other oddity.
These are not the only suspicious things you might find in autoruns data, and there are many more approaches. You could take the exploration much further, for example, by indexing all of the data in a tool like Elasticsearch. This would enable fast searches across your data and make it practical to collect autoruns from your endpoints regularly and look for changes over time. And, of course, there are many more endpoint artifacts that are prime locations for hunting. A true hunting effort should cover user logs, processes, network information, and more.
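Even before setting up Elasticsearch, the change-over-time idea can be tried by comparing two HOST:HASH collections taken on different days. This is a minimal stand-in for an indexed search, not an Elasticsearch client, and the function names are hypothetical. Note that a host missing from the newer collection (for example, one that was offline) will show all of its entries as removed.

```python
from typing import Dict, Iterable, Set, Tuple

Snapshot = Set[Tuple[str, str]]


def snapshot(lines: Iterable[str]) -> Snapshot:
    """Parse HOST:HASH lines into a set of (host, hash) pairs."""
    pairs = set()
    for line in lines:
        host, sep, digest = line.strip().partition(":")
        if sep and digest:
            pairs.add((host, digest.lower()))
    return pairs


def autorun_changes(old: Snapshot, new: Snapshot) -> Dict[str, Dict[str, Set[str]]]:
    """Per host, which autorun hashes appeared and which disappeared."""
    changes: Dict[str, Dict[str, Set[str]]] = {}
    for label, delta in (("added", new - old), ("removed", old - new)):
        for host, digest in delta:
            entry = changes.setdefault(host, {"added": set(), "removed": set()})
            entry[label].add(digest)
    return changes
```

Newly added hashes are natural candidates to feed back into the VirusTotal lookup and stacking steps.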
Today's adversaries are shrewd and sophisticated, pursuing theft and disruption with technology and techniques that are unique and never-before-seen. To counter these adversaries, proactive hunting techniques are necessary. Fortunately, it's possible to explore some basic hunting capabilities on the cheap to discover how a more proactive security posture can help detect unknown intrusions.
Mark Dufresne is the director of malware research and threat intelligence at Endgame. Mark worked previously in various aspects of cyber security as an operations chief and manager at the Department of Defense. Mark is a graduate of Johns Hopkins University, where he earned his master's degree in Security Informatics.
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to email@example.com.