Smashing the Brittleness of Machine Learning
Emerging computer capabilities are often unreliable, or brittle, at first: what works well in one instance may fail miserably when applied to another. At the moment, machine learning is no different, experts say, and the government and private industry are working to get past those limitations and improve its use.
“Brittleness would be any system that, when applied to a specific problem, appears very competent, delivers great results, but when moved to another instance of that problem or a slightly different domain, tends to collapse wholly and be unusable,” says Eric Davis, principal scientist at Galois. “It’s like a house of cards, and when you have to rebuild it for somewhere else, it’s this idea that we’ve invested a lot into one specific problem, and we get very little that’s generalizable.”
Davis explains that the experience of brittleness is not unique to machine learning, and he expects the industry to progress through that phase. “There’s nothing really new to the brittleness of machine learning,” he states. “It mirrors a lot of other developments that we’ve had in technology. We had brittleness originally in software engineering and development. We had brittleness in cybersecurity. It’s a very common component of new research areas.”
What is different for machine learning, however, is its dependency on big data. For machine learning to become more replicable, big data has to be available, and the data has to be formatted or standardized to enable that replicability, Davis notes.
Having access to regional data troves—such as from the National Science Foundation’s National Network of Big Data Regional Innovation Hubs—will work only if the associated machine learning capabilities are formalized and generalized, he stresses.
At first, as machine learning models developed, the pressure to build solutions quickly was so great that “it was more like the Wild West,” and certain models “won” in an area, Davis shares. The solutions were limited, however. For machine learning models used by scientists to research lead poisoning in Chicago, for example, it was “very hard to move those initial systems to apply to Cincinnati, Washington, D.C., or to Pittsburgh,” he states. “They are informal and bespoke models built specially for that problem and are going to be very specific to the circumstances and the data.”
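A minimal sketch of that kind of mismatch, using hypothetical column names and toy data rather than any real city’s records: before a model trained in one place can be reused elsewhere, the new data would at least have to match the format the model expects, and a bespoke dataset often does not.

```python
# Hypothetical sketch: checking whether another region's dataset matches the
# schema a model was trained on, before attempting to reuse that model.
import pandas as pd

# Assumed standard schema (illustrative column names only): name -> dtype.
STANDARD_SCHEMA = {
    "housing_age_years": "float64",
    "soil_lead_ppm": "float64",
    "child_blood_lead_ugdl": "float64",
}

def validate(df, schema):
    """Return a list of problems that would make a trained model unusable here."""
    problems = []
    for column, dtype in schema.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif str(df[column].dtype) != dtype:
            problems.append(f"{column}: expected {dtype}, got {df[column].dtype}")
    return problems

# A toy dataset from another city that drifted from the standard layout.
other_city = pd.DataFrame({
    "housing_age_years": [54.0, 87.0],
    "soil_lead_ppm": ["high", "low"],   # categorical instead of numeric
})

print(validate(other_city, STANDARD_SCHEMA))
# ['soil_lead_ppm: expected float64, got object', 'missing column: child_blood_lead_ugdl']
```

Agreeing on and enforcing such shared formats across regions is one small piece of the standardization Davis describes.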
Beyond making results hard to replicate, brittleness means users can’t always tell when a machine learning application will work and when it will fail. “Sometimes we’ll apply it to a new problem or the same problems with new data, and it’ll work fine,” Davis says. “Sometimes we’ll do it and we won’t get the right answer, and our system will fail or our system won’t even run on the data. And this is hard to predict ahead of time. So now we’re in the stage where we’re looking to make these solutions less brittle. We’re trying to find ways to make it more general and more formal and also prove things about our systems.”
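That unpredictability can be illustrated with purely synthetic data; the toy model below is not any system Davis describes, but it shows how a model that scores well under its original conditions can degrade sharply once the underlying relationships in the data shift, as they might in a new city.

```python
# Illustrative sketch with synthetic data: a model applied to new data from
# shifted conditions degrades in a way its original test score does not predict.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    # Both the features and the feature/label relationship move with `shift`.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > shift).astype(int)
    return X, y

X_train, y_train = make_data(2000)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

X_same, y_same = make_data(500)             # new data, same conditions
X_new, y_new = make_data(500, shift=1.5)    # new data, shifted conditions

print("accuracy, same conditions:   ", model.score(X_same, y_same))
print("accuracy, shifted conditions:", model.score(X_new, y_new))
```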
That means focusing on the machine learning development environment and identifying best practices, as well as increasing the ability to verify when machine learning models are correct—“which becomes necessary when analyzing mission-critical systems,” Davis adds.
He cautions that generalizing machine learning approaches will take longer upfront but is a necessary step in improving the ability to deploy machine learning more broadly.
To build verifiable systems that are not only more general but also more formal, Davis expects researchers will need a formal understanding of the full machine learning process. “Building that formal intuition is something we’ve done with programming languages,” he suggests. NASA’s Java PathFinder project, for example, developed tools to verify that Java code does exactly what it was intended to do, and the Haskell programming language’s pure, strongly typed design lends itself to formal verification.
Machine learning development needs to mirror that verifiability, Davis asserts.
“It just comes down to doing the same sort of formalization within machine learning,” he states. “It means building formal end-to-end processes where we can say here’s what happens in each step. We need to be able to say here are the invariants at each step, the things that shouldn’t change, and when things do change, show the way in which they changed, in order to create systems for which we can build expectations.”
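A minimal sketch of what such step-by-step invariants might look like in practice, using hypothetical checks on a toy preprocessing-and-scoring pipeline; a real mission-critical system would demand far stronger, formally proven guarantees than runtime assertions.

```python
# Sketch of per-step invariants in a toy pipeline: each stage states what
# should hold afterward, so violations surface at the step that caused them.
import numpy as np

def clean(X):
    """Drop rows with missing values."""
    return X[~np.isnan(X).any(axis=1)]

def normalize(X):
    """Scale each column to zero mean, unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def score(X, weights):
    """Toy model: probabilities from a fixed linear scorer."""
    return 1.0 / (1.0 + np.exp(-X @ weights))

X_raw = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [2.0, 1.0]])
weights = np.array([0.5, -0.25])

X_clean = clean(X_raw)
assert not np.isnan(X_clean).any(), "invariant: no missing values after cleaning"

X_norm = normalize(X_clean)
assert X_norm.shape == X_clean.shape, "invariant: normalization keeps every row"
assert np.allclose(X_norm.mean(axis=0), 0.0), "invariant: columns are centered"

probs = score(X_norm, weights)
assert probs.shape[0] == X_norm.shape[0], "invariant: one score per row"
assert ((probs >= 0) & (probs <= 1)).all(), "invariant: scores are probabilities"

print(probs)
```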
Citing Berkeley researchers, Davis warns that “without formal understanding of the system, the system can never be trusted. It can only be surprising, and we want to avoid surprises to our systems. We want our systems to do what we expect and nothing else.”