
AI Cyber Challenge Prepares for Final Battle

The research and development competition aims to develop artificial intelligence for cybersecurity and will host its final challenge this year.

In August, the Defense Advanced Research Projects Agency (DARPA) and the Advanced Research Projects Agency for Health (ARPA-H) will conduct the final competition under the Artificial Intelligence Cyber Challenge (AIxCC) at DEF CON 33, the annual hacker convention in Las Vegas. The challenge aims to develop AI-enabled cybersecurity systems to safeguard critical infrastructure software.

The AIxCC is a two-year effort to develop novel AI systems for protecting critical infrastructure software that “enables modern life,” according to a DARPA article. “In an increasingly interconnected world, software undergirds everything from financial systems to public utilities. As software enables modern life and drives productivity, it also creates an expanding attack surface for malicious actors,” the article states.

In a recent SIGNAL Media interview, Andrew Carney, the AIxCC program manager for both DARPA and ARPA-H, elaborated on what that means. “What we’re really talking about is everything from critical infrastructure that people think about as critical infrastructure—our water systems, our power grid, our transportation systems—to the services that are perhaps more software-based built on top of them, like the financial sector, or services like healthcare that involve a lot of complex technology,” Carney said. “All of these sectors are critical to our way of life in one way or another, and they are increasingly dependent upon layers and layers of complex software, and this software is pervasively vulnerable to cyber attacks. Given our position in the world, we are extremely vulnerable, and that’s a vulnerability that we’d like to eliminate.”

Much of that software is open source, meaning it is written and maintained by volunteers and made publicly accessible, so anyone can modify it. “Open-source software is a major component of these software systems. We’ve seen over 80% of critical infrastructure applications involve some component of open-source software,” Carney added.

And those volunteers may not have the resources to ensure the software is secured. “In some cases, highly critical projects are maintained by a single person who does this as a side gig. They’re doing this in addition to their normal job. They’re not getting paid for this, and so the resources available for them to iterate, improve, refine and secure their software are far outweighed by the value in finding the vulnerability and supporting it.” 

The challenge is focused on so-called cyber reasoning systems (CRS), a term DARPA researchers may have coined and that certainly became prominent during the agency’s Cyber Grand Challenge that ended in 2016. With a CRS, the AI system essentially takes on the role of a cybersecurity analyst, automatically identifying, analyzing and responding to threats. The systems are designed to reason about complex cyber environments, use advanced algorithms to detect and patch vulnerabilities, and can even simulate attacks to strengthen defenses.

The challenge “closes the feedback loop for software security analysis and automation,” and “could mean a future where we don’t read about ransomware attacks every day in the paper,” and “a safer, more secure world for all of us,” Carney offered. “Finding vulnerabilities fully automatically is all well and good. That helps. But if we can’t effectively remediate them, then we can’t close that loop and go through that refinement process seriously.”

Carney, who “spent a good chunk” of his career performing reverse engineering and vulnerability research, said the level of automation 15 years ago was very process-focused. “You would automate the task, and then you, as the human practitioner, would be responsible for ingesting and administrating all the results of these automated subtasks. The idea with a cyber reasoning system is to make that cyber reasoning system the brain, make that the administrator, make that the decider, make that the strategist. It’s important to note here that the CRSs in our competition are, as it were, fully automated. There’s no human in the loop.”

DARPA and ARPA-H are partners in the challenge. DARPA originated the idea and was discussing how to move forward with agencies outside the Defense Department. As those discussions progressed, it made more sense to involve civilian infrastructure, specifically health care. The intelligence community, Homeland Security Department and others also could benefit. “The defensive mission focused on patching is beneficial to pretty much anyone who relies on software, which, I realize cheekily, is all of us. We’re having conversations across the federal government, within the Defense Department and across civilian federal agencies,” Carney reported.

In fact, the solutions developed under AIxCC also will be open source and available to all. “They will be made available to the public at large, so there will be a huge opportunity to transition this technology into all of the use cases, and we want to make sure everyone’s prepared for it. It’s still a lot of work to just ensure that you prepare to take advantage of this sort of opportunity,” the program manager offered.

He dismissed the possibility that adversaries also will take advantage of the tools once they are publicly available. He cited a study from last year that found nearly 1,000 vulnerabilities in hardware and software. Of those, only four were exploited by the most active ransomware threat actors. “It’s a situation that we see in other sectors as well. The availability of vulnerabilities is not the limiting factor for our adversaries. Our ability to analyze our own software and patch it, however, that is something that requires a tremendous amount of effort today.”

The AIxCC effort uncovered evidence to that effect during the semifinal competition at last year’s DEF CON conference. For that stage of the competition, program officials made synthetic but realistic versions of five actual open-source projects, including the Linux kernel, and inserted synthetic vulnerabilities into those projects for the competitors to discover. The synthetic code was built on top of real-world code, and the program officials had measures in place for the teams to also count real-world vulnerabilities if detected. 

Each team was given four hours per challenge and a $100 budget and was told to use commercial large language models to find the vulnerabilities. “Collectively, the teams were able to find 22 synthetic vulnerabilities and patch 15 of them, which was fantastic. But even more interesting was one of the teams found a real vulnerability. They found a zero day in one of our challenge projects in SQLite, which we then reported to the SQLite maintainers and had it patched.”

It was Team Atlantis, which includes experts from Georgia Tech, Samsung Research, the Korea Advanced Institute of Science and Technology, and Pohang University of Science and Technology, that found the zero-day vulnerability. The team’s success earned a $2 million prize and a place in the upcoming August finals.

The team used its system known as Atlantis, an end-to-end, large language model-based bug-finding and fixing system designed to function entirely without human intervention, team member Hanqing Zhao explained in a blog post. It is capable of handling complex systems like the Linux kernel and supports a range of modern programming languages, including C/C++, Java, and others. It is specifically designed to replicate the behavior of human researchers, Zhao wrote. “Our design philosophy is simple: to emulate the mindset of experienced security researchers and hackers through [large language model] agents, enhanced with advanced program analysis techniques.”  

Keeping up with technology as it advances has posed a challenge for the competition, which already is challenging enough, Carney indicated. “We are effectively tackling a problem that evolves week over week, day over day. And getting the speed of our own infrastructure, of the challenge organizers, just getting all those pieces moving at that speed together, we’re doing it, but I think the whole team would agree that it’s a lot of work.”


As of January, DARPA and ARPA-H were finalizing the schedule for next steps after the culminating competition. “We’re working with our partners on transition pathways both inside and in the federal government at large,” Carney noted. 

Four companies—Anthropic, Google, Microsoft and OpenAI—have signed up to be collaborators on the AIxCC program. “These sort of elite commercial cybersecurity folks get excited about this competition to push the edge of the art in this automated vulnerability discovery and patching space,” Carney asserted. 

In addition to Team Atlantis, the teams heading into the final competition are:

  • Shellphish, which includes members from Arizona State University, the University of California, Santa Barbara and Purdue University.
  • Theori, a cybersecurity firm headquartered in Austin, Texas.
  • Trail of Bits, a New York cybersecurity firm.
  • All_you_need_is_a_fuzzing_brain, a team led by Texas A&M with a participating member from City University of Hong Kong. 
  • Smart Information Flow Technologies (SIFT), a research and development consulting company with offices in Minneapolis and Boston.
  • 42 b3yond 6ug, a Northwestern University team that includes collaborators from Johns Hopkins University, University of Colorado Boulder, University of New Hampshire, University of Utah, and University of Waterloo.

In the coming months, the teams will likely focus on incorporating rapidly advancing AI and machine learning capabilities in preparation for the final competition, which will be the toughest yet, the program manager intimated.

“A friend of mine described this to me, and I just love this idea, that the reward for winning a pie-eating contest is often more pie. As our competitors did very well in the semifinals, we are looking forward to seeing how much pie they can handle this time.”