One of These Things Is Not Like the Other
Cyber hunt teams look to machine learning to sort true security alerts from false positives.
Facing mounting threats, cyber hunt teams—also known as security operations teams—are turning to machine learning technologies to sift through heaps of data and detect malicious activity faster than ever. People excel at making decisions when they have the right information, and machines excel at extracting actionable intelligence from large amounts of data. The two are far more effective together than apart. Consider Tony Stark paired with his Iron Man suit versus HAL 9000, the autonomous computer of 2001: A Space Odyssey.
This collaboration between man and machine is exactly what cyber hunt teams need as they comb through large datasets to identify nefarious actors inside networks. When teams receive thousands of security alerts daily, they cannot waste time chasing false positives. Machine learning can help detect behavioral anomalies associated with all phases of the cyber kill chain. It improves team effectiveness by reducing the alert fatigue tied to traditional detection methods and by automatically tagging suspicious behaviors for follow-up.
Machine learning analyzes behavior across multiple dataset features, which enables it to detect continually changing attacks, including zero-day exploits. These attacks are both evolving and multiplying. During the first quarter of last year, Kaspersky Lab tracked 2,900 new malware modifications; by the third quarter, the company tracked 32,091. The problem is that malware authors can easily test their code against the latest anti-virus and firewall products and modify it to slip past safeguards. That is why rule-based systems simply cannot keep up. While ransomware itself is not a new type of malware, many of the propagation models behind last year’s attacks are new. These improved models allow ransomware creators to offer their malicious products more widely, Kaspersky reports, and criminals who lack the skills or resources to develop their own malware rely on them.
Granted, machine learning is no silver bullet that will banish hacking attacks and cyber worries. When applying machine learning to threat detection, two serious challenges must be overcome: the scarcity of labeled datasets and the problem of class imbalance.
The former can make detecting anomalies difficult. Here is why: Training an algorithm to distinguish between benign and malicious behavior typically requires examples of both kinds of behavior. For example, training an algorithm to identify photos of cats requires photos of cats labeled “cats” as well as photos of other animals labeled “not cats.” But such labeled training data is hard to come by. In one month of network observations, there might be no examples of malicious behavior—good news for a secure network, but bad news for a behavior-labeling endeavor. This would be akin to training the cat photo identifier without any photos of cats. On the other hand, if analysts have a few examples of malicious behavior, such as ransomware and phishing emails, they can train an algorithm on that dataset. But the algorithm would identify only those specific phishing emails and ransomware variants, not novel attacks.
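That limitation can be sketched in a few lines. The following toy "classifier" is fit to a handful of labeled phishing examples and recognizes only what it has already seen; the subject lines and matching rule are invented for illustration, not a real detection technique.

```python
# Toy illustration: a model trained only on known-bad examples
# memorizes those examples rather than generalizing to new attacks.
# Sample subjects are invented for illustration.

known_phishing = {"verify your account", "wire transfer request"}

def is_phishing(subject):
    """Flag a subject line only if it matches a known-bad example."""
    return subject.lower() in known_phishing

print(is_phishing("Verify your account"))    # True: seen in training
print(is_phishing("password reset urgent"))  # False: novel attack slips by
```

A real supervised model generalizes better than exact matching, but the underlying constraint is the same: it can only learn patterns represented in its labeled training set.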
The second challenge when applying machine learning to threat detection, class imbalance, occurs when one class of data far outnumbers another. Perhaps only 1 percent of 10,000 network incidents might be considered malicious. Because machine learning algorithms generally maximize accuracy, an algorithm in this case could achieve 99 percent accuracy simply by labeling every incident benign. Even though it never catches malicious activity, the algorithm is almost always correct because malicious activity is so rare. This problem is not unique to security, but class imbalance sets another trap that a security-oriented learning algorithm must avoid.
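The accuracy trap is easy to demonstrate. This minimal sketch mirrors the hypothetical above (10,000 incidents, 1 percent malicious) with a do-nothing classifier that labels everything benign:

```python
# The class-imbalance trap: labeling every incident benign scores
# 99 percent accuracy while catching zero malicious events.
# Numbers mirror the article's hypothetical and are illustrative.

incidents = ["malicious"] * 100 + ["benign"] * 9_900  # 1% malicious

predictions = ["benign"] * len(incidents)  # the do-nothing classifier

correct = sum(p == t for p, t in zip(predictions, incidents))
accuracy = correct / len(incidents)

caught = sum(p == t == "malicious" for p, t in zip(predictions, incidents))

print(f"accuracy: {accuracy:.0%}")    # 99% accurate...
print(f"malicious caught: {caught}")  # ...yet catches nothing
```

This is why practitioners evaluate imbalanced detectors with metrics such as precision and recall on the rare class, not raw accuracy.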
Rather than label data benign or malicious, many machine learning solutions assume that the majority of data is benign and train algorithms to look for anomalous patterns. Continuing with the cat identification analogy, if a stack of photos featured at least 99 percent dogs, an outlier-detection algorithm would identify the photos of cats by essentially asking, “Which of these is not like the others?” In essence, the algorithm exploits class imbalance to function without data labels.
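A minimal unlabeled outlier detector works in exactly this spirit: model what "normal" looks like and flag whatever deviates. The sketch below scores a single toy feature (say, bytes sent per session) by its distance from the median; the feature choice and threshold are illustrative assumptions, not a production design.

```python
# Outlier detection without labels: assume most data is benign,
# estimate "normal," and flag large deviations.
import statistics

def flag_outliers(values, threshold=3.0):
    """Return indices whose robust z-score exceeds the threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that a few extreme
    # points cannot distort the way a standard deviation can.
    mad = statistics.median(abs(v - med) for v in values) or 1.0
    return [i for i, v in enumerate(values)
            if abs(v - med) / mad > threshold]

# 99 "dogs" and one "cat": ordinary sessions plus one huge transfer.
sessions = [48, 52, 50, 47, 51] * 19 + [49, 53, 50, 46, 5_000]
print(flag_outliers(sessions))  # → [99], the anomalous session's index
```

Production systems use richer multivariate models, but the principle is the same: the rarity of malicious behavior, a liability for supervised learning, becomes the signal itself.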
An important point is that data outliers are not necessarily malicious. A single device on a network might behave differently from everything else simply because it is the only device of its kind on the network. Approving such devices, known as white-listing, remedies this issue. So does a hybrid approach that pairs outlier detection, for automated identification of potentially malicious behavior, with classification, which incorporates manual feedback about what the outlier detector got wrong.
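The hybrid idea can be sketched as a triage step, under assumed names: an outlier detector proposes suspects, a white list suppresses known one-of-a-kind devices, and analyst feedback folds back in so the same false positive is not raised twice. Device names and the feedback mechanism here are hypothetical.

```python
# Hybrid triage sketch: suppress white-listed devices and prior
# analyst-confirmed false positives before alerting.

def triage(outlier_ids, white_list, analyst_benign):
    """Return flagged outliers still worth an analyst's attention."""
    suppressed = white_list | analyst_benign
    return [d for d in outlier_ids if d not in suppressed]

outliers = ["printer-01", "db-07", "camera-03"]   # from the detector
white_list = {"printer-01"}     # the only device of its kind
feedback = {"camera-03"}        # analyst marked it benign last week

print(triage(outliers, white_list, feedback))  # → ['db-07']
```

In a fuller system the feedback set would train a classifier rather than act as a lookup table, but the division of labor is the same: automation proposes, humans correct, and corrections persist.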
Nonetheless, using machine learning for security does not require any particularly novel algorithm, as the necessary theoretical math has existed for decades. The best machine learning solutions owe their success not to special algorithms but to the quality of data they analyze—the ingredients. Simply put, it is not the sausage grinder that makes the sausage special.
Today, most machine learning threat-detection solutions analyze server and firewall log files. The logs serve security teams well but are formatted inconsistently and must be normalized for analysis. Unstructured data will not build sound machine learning models. Logs also can be tampered with or deleted and are often batch processed rather than analyzed in real time.
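The normalization step looks like this in miniature: two sources emit related events in different shapes, and a small adapter maps each to one common schema before any model sees the data. The log formats, field names, and sample lines below are invented for illustration.

```python
# Log normalization sketch: map inconsistently formatted log lines
# from different sources onto one common schema for analysis.
import re

def parse_web(line):
    # Common-log-style line: "IP - - [timestamp] \"METHOD path ...\""
    m = re.match(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)', line)
    return {"src": m.group(1), "when": m.group(2),
            "action": m.group(3), "target": m.group(4)}

def parse_firewall(line):
    # key=value pairs, e.g. "time=... src=... action=DROP dst=..."
    fields = dict(kv.split("=", 1) for kv in line.split())
    return {"src": fields["src"], "when": fields.get("time", ""),
            "action": fields["action"], "target": fields["dst"]}

web = '10.0.0.5 - - [12/Jan/2018:06:25:24 +0000] "GET /login HTTP/1.1" 200'
fw = 'time=06:25:30 src=10.0.0.5 action=DROP dst=10.0.0.9'

rows = [parse_web(web), parse_firewall(fw)]
print([r["src"] for r in rows])  # → ['10.0.0.5', '10.0.0.5']
```

Only after this flattening can a model correlate the two events as activity from the same source, which is exactly the structure that raw, inconsistent logs lack.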
Network communications, unlike logs, are the foundation of network behavioral analysis. They provide a factual record of what occurs during an incident and are inherent to multiple stages of the cyber kill chain—reconnaissance, payload delivery, command and control, lateral movement, staging, and exfiltration. In other words, attackers cannot hide their tracks as they move through the network perimeter to an endpoint and to the interior. Because network behavioral analysis centers on real-time observation of traffic, it also presents an opportunity for much more proactive machine learning that automatically surfaces anomalous patterns.
Machine learning is proven technology with demonstrable results in the security space. With it, cyber hunt teams can focus on real issues, and agencies can increase analyst productivity and identify breaches much faster.
Jesse Rothstein is chief technology officer and chairman of the board at ExtraHop Networks, which he co-founded in 2007 and led as CEO until last July. Edward Wu is a senior software engineer at ExtraHop Networks, specializing in applied machine learning for advanced persistent threat detection, network traffic analysis and behavior anomaly detection. The views expressed are their own.