AI Reinvents Complex Cyber Attack Replication for Critical Infrastructure Protection
Researchers at the Department of Energy’s Pacific Northwest National Laboratory (PNNL) say they hope to eventually open-source a new tool for rapidly and inexpensively replicating complex cyber attacks—a key process for developing effective defense techniques for critical infrastructure.
A collaboration between PNNL and Anthropic uses the company’s large language model (LLM) known as Claude to dramatically automate cyber attack emulation, reducing both time and costs. PNNL’s new system, called ALOHA—Agentic LLMs for Offensive Heuristic Automation—reduces the cyber attack replication process from weeks to hours.
The technology works in concert with MITRE’s open-source Caldera software, which helps prepare for and defend against cyber attacks, according to a PNNL article. When an attack occurs, a human enters a text description of the attack into ALOHA and instructs the program to recreate the steps necessary to emulate that attack. A complex attack chain might include 20 different tactics encompassing 100 different steps—all of which need to be reconstructed.
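To make the scale concrete, here is a minimal sketch of how a reconstructed attack chain could be represented as data. It is an illustration only: the `Step`, `Tactic` and `AttackChain` classes and the `build_placeholder_chain` helper are hypothetical names, not ALOHA's or Caldera's actual data model, and the steps are empty placeholders rather than real actions.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One concrete action in an emulation plan (hypothetical structure)."""
    description: str


@dataclass
class Tactic:
    """A named group of steps, i.e. one stage of the attack chain."""
    name: str
    steps: list[Step] = field(default_factory=list)


@dataclass
class AttackChain:
    """An ordered list of tactics reconstructed from a text description."""
    source_text: str
    tactics: list[Tactic] = field(default_factory=list)

    def total_steps(self) -> int:
        # Sum the steps across every tactic in the chain.
        return sum(len(t.steps) for t in self.tactics)


def build_placeholder_chain(n_tactics: int, steps_per_tactic: int) -> AttackChain:
    """Build a chain of placeholder tactics and steps, purely for illustration."""
    chain = AttackChain(source_text="(text description of the attack)")
    for i in range(n_tactics):
        tactic = Tactic(name=f"tactic-{i + 1}")
        for j in range(steps_per_tactic):
            tactic.steps.append(Step(description=f"placeholder step {j + 1}"))
        chain.tactics.append(tactic)
    return chain


chain = build_placeholder_chain(n_tactics=20, steps_per_tactic=5)
print(len(chain.tactics), chain.total_steps())
```

With 20 tactics of five steps each, the placeholder chain reaches the 100 steps cited above, each of which an analyst would otherwise have to rebuild by hand.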
During a recent Teams interview with SIGNAL Media, Loc Truong, a PNNL data scientist who leads the ALOHA research effort, and Kristopher Willis, a PNNL cyber researcher, explained that attack replication is critical but expensive and time-consuming, with some companies charging tens of thousands of dollars to do so.
As one example, Truong described a common ransomware technique in which massive numbers of files are aggregated and encrypted by a binary, moved to a particular location and deleted. “Usually, the process is very costly for people to reproduce the attack and can take a team of experts, in the past, a few weeks to months and a lot of money,” he said. “We hope to create a tool and techniques to bring down the cost of attack replication so that we can protect critical infrastructure faster when these exploits are discovered.”
Willis recalled that a few years ago, someone leaked the playbook for Conti ransomware, which was first used by the Russia-based criminal group known as Wizard Spider. Conti encrypts victim data, proliferates across the network, covers its presence and provides criminal hackers with full control. It has evolved into ransomware-as-a-service.
But with ALOHA, Willis suggested, the entire playbook can be fed into the system, which will then develop counters for every attack. “This book was about 30 pages, 40 pages long, that someone had leaked to GitHub. You can take the Conti playbook, feed it into ALOHA and be able to build all of the tactics, techniques and procedures.”
ALOHA can even cope with the playbook’s English translation, which, in some cases, improperly describes some attacks. “They had wrong commands in there, and so these are things that can be picked up as a signature for that particular adversary. For a person who just gathered that a few years ago, that took about 20 to 30 days to go through that, build all the tactics, techniques and procedures and make sure that they would work. This does it in a matter of an hour.”
During simulated attacks on a water treatment plant at PNNL’s Control Environment Laboratory Resource, ALOHA completed attack sequences involving more than 100 steps in just three hours. This speed enables defenders to identify vulnerabilities and reinforce systems far more quickly than before, according to the PNNL article.
“PNNL’s use of large language models to simulate attacks on critical infrastructure is crucial for understanding the national security implications of increasingly capable AI,” Marina Favaro, Anthropic’s national security policy lead, said in the article. “This kind of collaboration helps us better understand the national security landscape and feeds directly into our safety processes and how we build Claude.”
An Anthropic article explains that PNNL developed a “scaffold” for Claude to automate and accelerate this process of adversary emulation. This scaffold allowed natural language prompts to be quickly translated into complex attack chains, in part by predefining some code-based “tools” that allow the model to more easily take actions on computer networks.
During one of the runs of this test, Claude proved resourceful. One of the predefined tools built by the researchers as part of the scaffold was a mechanism for bypassing a security feature in Windows called User Account Control. However, this mechanism was not always reliable and sometimes failed. Sensing one of these failures through reports of an unsuccessful attempt to use its tool, Claude identified and used a different, known user account control bypass technique to accomplish its goal.
Truong explained that the attack being replicated in that simulation allowed a regular user to become a privileged user. One step required a particular payload to be downloaded from the system; however, the network connection was disrupted, preventing the download.
ALOHA, which includes a feature to automatically retry, used the text description of the attack to keep trying. “It explored that issue itself without using the special payload we crafted ahead of time. Now it has gone ahead and done something outside of what we originally planned, but the objective was reached. This iteration process allows for something powerful,” Truong said.
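The retry-then-fall-back behavior described above can be sketched as a generic control loop. This is a simplified illustration under stated assumptions, not ALOHA's implementation: a "tool" is assumed to be any callable that reports success or failure, and `run_step`, `flaky_predefined_tool` and `alternative_technique` are hypothetical stand-ins that perform no real actions. In the actual system, the choice of an alternative is made by the language model rather than taken from a fixed list.

```python
from typing import Callable, Sequence

# Assumption: a "tool" is any callable that returns True on success, False on failure.
Tool = Callable[[], bool]


def run_step(tools: Sequence[Tool], max_retries: int = 2) -> tuple[int, int]:
    """Try each tool in order, retrying each one up to max_retries times.

    Returns (index of the tool that succeeded, total attempts made).
    Raises RuntimeError if every tool fails.
    """
    attempts = 0
    for index, tool in enumerate(tools):
        for _ in range(max_retries):
            attempts += 1
            if tool():
                return index, attempts
    raise RuntimeError(f"all {len(tools)} tools failed after {attempts} attempts")


def flaky_predefined_tool() -> bool:
    """Stand-in for a predefined tool that fails in this run."""
    return False


def alternative_technique() -> bool:
    """Stand-in for a different approach that reaches the same objective."""
    return True


result = run_step([flaky_predefined_tool, alternative_technique])
print(result)  # (1, 3): the second tool succeeded on the third attempt overall
```

The objective is tracked separately from any single tool, so a failed tool triggers another attempt or a different approach instead of ending the run.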
The researchers said they might ultimately commercialize ALOHA as an open-source technology. “When I created this project, I’ve always [wanted] to help defenders however I can, so if at all possible, we would love to be able to open source it to the community,” Truong shared.
The Federal Source Code Policy program requires agencies to release at least 20% of new custom-developed code each year as open-source software, but the approval process to do so can present challenges, leading the researchers to express a degree of caution. “For ALOHA itself, I’m reluctant to say it will be open source. We may go down that pathway, but there are a few roadblocks that would have to occur to make it open source. But that’s certainly a possibility,” Willis said. “You go through the commercialization office, and they go through all of the steps to do that. It’s just a process that you have to go through.”
Part of the process includes proving the software is ready for public release, Truong noted. “It takes a long time to prove a piece of software is safe and secure before you put it out there. And sometimes you don’t get permission at all, so we’ll see.”
The team is already working with Anthropic and is talking to others in government and industry to explore commercialization possibilities. “We are in the process of talking to various partners to see how we can best share the techniques and what we have developed,” Truong reported.
The next step, according to the PNNL article, is to have people test ALOHA in different types of systems and expose it to more and more use cases. The researchers say they are integrating the technology into other products. So far, the focus has been on adversary emulation and continuous loop integration, but now the team aims to apply ALOHA to cyber reasoning systems: automated systems that identify, analyze and patch software vulnerabilities.
When a vendor applies a software patch, the question remains whether the software is still vulnerable. ALOHA may answer that question.
“What we want to do is take a proof of vulnerability ... and develop that into an actionable proof of concept. We can develop tactics, techniques and procedures around that patch to understand if the remediation worked. But we could also go even further and make a remediation ourselves. There’s a few different pathways there, but we’re going to focus a bit on vulnerability, or at least software vulnerability remediation,” Willis reported.
Truong pointed out that the new effort will change the focus. “Previously, we looked at infrastructure, and now we want to see if we can apply it to software patching at the lower level. Can we confirm that the remediation works? Is there a way to get around it that we have not accounted for, a blind spot that we didn’t consider?”
The team also plans to improve ALOHA as capabilities advance. “Technology is constantly improving, and you can see model capabilities are constantly improving, and so we are observing what are the state of the art out there in capabilities, and then making sure that ALOHA matches all of the latest capabilities,” Truong said. “We are making sure that as this latest model gets released, that it has the capability to use more and more complex, sophisticated tool sets, that ALOHA is armed with those tool sets to do things at the state of the art.”