Securing the Artificial Intelligence Supply Chain May Require an Abundance of AI
Those delivering AI security are rewriting the artificial intelligence (AI) safety rulebook. Application developers are employing novel methods to protect the three potential weak spots in the production chain: training data, algorithms and hardware.
The approaches to protecting an AI supply chain are flexible and depend on the mission. Still, everything starts with data, and sometimes the integrity of critical information is a matter of life and death.
Malicious actors targeting a capability that relies on AI can damage it by poisoning the inputs the model will run on.
“When I mobilized to go to Iraq, one of the challenges that was brought to me—and I was [Department of the Navy] CIO [chief information officer] at the time—was if somebody got into the blood supply database for my unit and modified our blood on record,” said Rob Carey, now president of Cloudera Government Solutions.
In this instance, altered records could have severely harmed or even killed warfighters under treatment. Carey addressed the potential risk once it was brought to his attention, and the episode became a lesson: as AI models become widespread, the data behind them must receive special attention.
“The idea of the integrity of the data is absolutely paramount to the successful implementation of any decision-making that comes off of data-based decision-making,” said Carey, also a former principal deputy CIO for the Department of Defense (DoD).
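One common way to make tampering like the blood-record scenario detectable is to attach a keyed hash (an HMAC) to each record when it is written and check that hash before the record is used. The Python sketch below illustrates the idea. It is a hypothetical example, not a description of any DoD system. The record fields, the helper names and the hard-coded key are all assumptions; a real deployment would keep the key in a key management service.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration only; a real system would fetch it from a key manager.
SECRET_KEY = b"example-key-not-for-production"


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record deterministically so identical data always yields identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict) -> str:
    """Return a hex HMAC-SHA256 tag computed over the record's canonical form."""
    return hmac.new(SECRET_KEY, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify_record(record: dict, tag: str) -> bool:
    """Recompute the tag and compare it in constant time; False means the record was altered."""
    return hmac.compare_digest(sign_record(record), tag)


if __name__ == "__main__":
    # Hypothetical record, tagged when it is first entered.
    record = {"service_member_id": "A123", "blood_type": "O-"}
    tag = sign_record(record)
    print(verify_record(record, tag))  # True

    # An intruder edits the stored value but cannot forge a matching tag without the key.
    tampered = dict(record, blood_type="AB+")
    print(verify_record(tampered, tag))  # False
```

A check like this does not prevent an intruder from editing the database. It does ensure that an edit made without the key is caught before the data feeds a decision or an AI model.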
Unvetted data sources present one of the risks.
There are also two other, less explored avenues for attacking models. The first affects large language models (LLMs) and other chatbots that continue their training through interactions with humans. The DoD has repeatedly mentioned leveraging these technologies for various military uses, which means those interactions could be deliberately poisoned to undermine the reliability of future outputs.
The other arises when actors craft prompts to obtain results that the LLM itself should refuse to provide.
These are known as prompt injection vulnerabilities, which “involve crafty inputs leading to undetected manipulations. The impact ranges from data exposure to unauthorized actions, serving attacker’s goals,” according to a document by the Open Worldwide Application Security Project (OWASP), a nongovernmental organization that works to improve software security.
An alternative approach is to avoid the training data problem altogether when the mission allows. For AI used to prevent breaches, this may be the way forward.
“We don’t have an external training data set. I don’t learn attacks to predict attacks. I learn your environment,” said Marcus Fowler, CEO of Darktrace Federal, an AI company for cybersecurity.
For Fowler, unsupervised learning, where the context permits, is the way to stop malicious interference. This kind of AI protects AI applications, and the systems linked to them, without the risks that come from starting with an external data set.
In line with Fowler’s view, federal agencies, working with the Information Technology Sector Coordinating Council, call for adopting artificial intelligence for system security, including using it to detect anomalous behavior, according to a recent document.
In tool development, AI has proven to be a performance booster.
DevSecOps (development, security and operations) benefits from AI at this key stage of any software production chain, including the production of other AI tools. Paradoxically, leveraging AI also entails risks.
If language models used to simplify programmers’ jobs are tainted with malicious or weak inputs, the code they produce will poison every later stage of the production chain that uses it.
One option is to assist programmers with AI tools that do not draw training data from large data sets, which could introduce new risks, but instead use data sets from controlled enterprise environments that add an extra layer of safety.
Even where companies adopt these safeguards, however, production entails more than coding, and technologies must be implemented safely across the full production chain.
“I can crank out code really fast and then throw it over the wall to a test team that can’t test it that fast. It doesn’t matter,” said Joel Krooswyk, federal chief technology officer at GitLab.
Beyond the technical specifications needed to run AI models, the federal government sets criteria that hardware must meet. A key one, according to Fowler, is compliance with rules on the U.S. origin of critical components and on supply chain traceability. Cloud services add another layer of security requirements.
But vigilance doesn’t end once compliant hardware is plugged in and playing.
“A compromised piece of hardware isn’t going to act strange when you plug it in. If the attacker is really thinking about it, they’re going to have a time delay; they’re going to have it when you’re no longer vetting and validating it,” Fowler told SIGNAL Media in an interview.
Therefore, keeping hardware safe partly depends on the same AI that runs on it.
“The AI always watches it, so the second it deviates from normal, or it deviates from its peer group, we’re going to learn of that, so I think it is about hardening that supply chain of AI—or with AI—from origin and delivery through operational use,” Fowler said.
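Fowler describes checking a device against two references: its own normal behavior and its peer group. A simple form of unsupervised anomaly detection captures this idea. The Python sketch below is an illustration only. It is not Darktrace’s method, which the article does not describe. It scores one metric (hypothetically, a device’s outbound megabytes per hour) with a z-score against the device’s own history and against readings from similar devices. It flags anything beyond an assumed threshold of three standard deviations. The function names, metric and threshold are all hypothetical.

```python
from __future__ import annotations

from statistics import mean, stdev


def z_score(value: float, samples: list[float]) -> float:
    """Return how many standard deviations `value` lies from the mean of `samples`."""
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation; needs at least two samples
    if sigma == 0:
        # No variation seen so far: any change from the mean counts as maximally unusual.
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


def check_device(history: list[float], current: float,
                 peer_readings: list[float], threshold: float = 3.0) -> dict:
    """Flag a reading that deviates from the device's own baseline or from its peer group."""
    return {
        "deviates_from_self": z_score(current, history) > threshold,
        "deviates_from_peers": z_score(current, peer_readings) > threshold,
    }


if __name__ == "__main__":
    # Hypothetical hourly outbound traffic (MB) learned from the device itself.
    history = [10, 12, 11, 9, 10, 11, 12, 10]
    # Hypothetical current readings from comparable devices.
    peers = [11, 10, 12, 10, 11]

    print(check_device(history, 11, peers))
    # {'deviates_from_self': False, 'deviates_from_peers': False}

    # A component that stayed quiet during vetting and later starts sending data.
    print(check_device(history, 45, peers))
    # {'deviates_from_self': True, 'deviates_from_peers': True}
```

No labeled examples of attacks are needed, because the baseline comes from the environment itself. The check runs continuously, so the delayed activation Fowler warns about would still be measured against what the device normally does.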
AI is already being used to fix AI, and where applicable, this creates a virtuous loop.
Additionally, building AI that works and is trusted by operators and entire command structures requires adhering to the perennial principles of the technology supply chain.
“I’m trying to encapsulate that cybersecurity and integrity of the network involves hardware, software and people—obviously—and data, so those four major elements you have to have all the proper protections in place,” Carey told SIGNAL Media in an interview.
It is crucial to look at the whole process and learn how different suppliers treat their inputs and outputs.
“If you can get transparency from the vendors that you’re working with, that’s huge,” Krooswyk said.
Integrity also includes aggregation, which provides full traceability of every component used. For Krooswyk, several practices assure everyone involved that an AI output can be trusted: sharing sources and methods with the end user, using trusted infrastructure, and relying on resources from well-established companies that have demonstrated compliance and apply the same principles in their own work.
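Traceability becomes concrete when a supplier ships a manifest listing each component and its cryptographic digest. The buyer can then check what actually arrived against that manifest. The Python sketch below shows that check under stated assumptions. The manifest layout, the component names and the supplier names are hypothetical, and the function is an illustration rather than any vendor’s tooling.

```python
from __future__ import annotations

import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_components(manifest: dict[str, dict], delivered: dict[str, bytes]) -> list[str]:
    """Compare delivered artifacts with a supplier manifest and list every discrepancy.

    `manifest` maps component name -> {"supplier": str, "sha256": str} (hypothetical layout).
    `delivered` maps component name -> the artifact bytes actually received.
    """
    problems = []
    for name, entry in manifest.items():
        if name not in delivered:
            problems.append(f"missing: {name}")
        elif sha256_hex(delivered[name]) != entry["sha256"]:
            problems.append(f"hash mismatch: {name} (supplier {entry['supplier']})")
    for name in delivered:
        if name not in manifest:
            # Anything the supplier did not declare breaks traceability.
            problems.append(f"undeclared: {name}")
    return problems


if __name__ == "__main__":
    model, tokenizer = b"model-weights-v1", b"tokenizer-config"
    manifest = {
        "model.bin": {"supplier": "VendorA", "sha256": sha256_hex(model)},
        "tokenizer.json": {"supplier": "VendorB", "sha256": sha256_hex(tokenizer)},
    }
    # A delivery with altered weights and an extra, undeclared file.
    delivery = {"model.bin": b"tampered", "tokenizer.json": tokenizer, "extra.py": b""}
    print(verify_components(manifest, delivery))
    # ['hash mismatch: model.bin (supplier VendorA)', 'undeclared: extra.py']
```

Real programs typically record this information in standard software bill of materials formats such as SPDX or CycloneDX rather than an ad hoc dictionary. The underlying comparison is the same.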