Quality Data and Validation Are Critical for AI
Making meaningful strides in the world of artificial intelligence (AI) and machine learning requires a focus on quality data and a mindset shift to data as a product and not an asset. Craig Martell, Ph.D., chief digital and artificial intelligence officer, U.S. Department of Defense (DoD), discussed the department’s AI priorities and challenges during a fireside chat at TechNet Cyber in Baltimore on Wednesday. He also shared his fears about generative AI models, like ChatGPT, that have captured public attention.
Martell is the inaugural chief digital and artificial intelligence officer for the department and has been in his post for 10 months. His organization, born from the merging of four other organizations, is charged with building tools that can scale data analytics and AI across the DoD.
When it comes to his digital hierarchy of needs—where he feels the department should expend energy and resources—Martell’s focus is firmly set on quality data. “The foundation of AI is quality data,” said Martell. Without it, analytics and AI are meaningless.
“What do I mean by quality data? It’s correct and has ownership, but it’s easily accessible and updated on a sufficiently regular basis,” explained Martell. “Right now, we’re very stovepiped. Right now, we treat data as an asset, and that’s problematic.” An asset implies something that needs to be protected or safeguarded. Data should instead be viewed as a product, argued Martell. Data has customers, those customers have varying and sometimes contradictory needs, and someone has to own the product and help its customers be successful. “Our biggest mind shift is from ‘Data that I protect and you don’t get,’ to, ‘Data only has value to the degree that the data’s customers are successful.’”
One major goal for Martell’s organization is making sure they set Joint All-Domain Command and Control (JADC2) on the right path. “JADC2 is not a product or a destination or a collection of tools; it is simply a way that we need to do business. As a way to do business, it needs the appropriate infrastructure to allow data to flow to the right places,” said Martell. The push from the top down is to make sure data is available and high quality, which requires both technology and policy changes.
But Martell emphasized that quality data isn’t enough. The real value comes from knowing what that data means to its users, and that requires proper labeling: “Telling the system, this is A and this is B, and I need you to discern between them.” That, he reflected, is the hardest part of the AI/ML pipeline. It’s all about getting the data right and getting the analytics right. “If we got those two right, we would solve so many problems in the department.”
When asked for his opinion on generative AI models like ChatGPT, Martell responded: “I’m scared to death. That’s my opinion.” The fear stems from concern that users will trust large language models too much without service providers building in the right safeguards and the ability to validate the information.
A service like ChatGPT doesn’t understand context, said Martell. It speaks fluently and authoritatively, giving people a false sense of confidence in the results. “If you don’t validate it, then you’re being irresponsible,” he noted. Martell emphasized that users should only use it for fun unless they’re willing to follow the rabbit trail and validate the sources.
He called on industry to help address these issues. “Don’t just sell us the generation. Work hard on the detection; change the way we consume the generation so we have the tools to decide when it’s right and when it’s wrong.”
“Everyone wants the magic solution,” said Martell. “There’s no magic, folks.”