Improving Battle Buddy Bots
Army researchers are advancing the robot-human partnership for warfighting environments.
Scientists at the U.S. Army Combat Capabilities Development Command Army Research Laboratory in Adelphi, Maryland, are preparing robots that can talk with soldiers, navigate in a “socially compliant” manner and learn from demonstration. The effort to enable robots to take verbal instruction, complete a series of complex tasks and maneuver in the same environments as soldiers is all part of the Army’s long-term endeavor to create fully skilled battlefield operators that work with warfighters, say Ethan Stump and John Rogers, roboticists at the Army Research Lab (ARL).
“It’s really focusing on the idea of robots being teammates rather than just tools,” Stump says (SIGNAL Magazine, December 2018, page 10, “Army Robots Prepare ...”). “Learning from demonstration, and how we actually teach the robots the concepts that are not necessarily easy to break down, is a way to make robots act more as teammates. But the main effort is talking with robots. This is essentially what underpins the entire intelligence architecture.”
Stump and Rogers, who received their doctorates from the University of Pennsylvania and the Georgia Institute of Technology, respectively, both worked on the ARL’s Micro Autonomous Systems and Technology (MAST) program as students. MAST is a research consortium that the ARL has led for the last 10 years to advance unmanned air and ground vehicles; sensing, perception and processing; and communication, networking and coordination, among other robotics-related areas for joint warfighters. More recently, Stump, Rogers and another ARL roboticist, Jon Fink, served as scientific advisers for the DARPA Subterranean Challenge.
Initially, to conduct their research on robot-human teaming capabilities, the ARL roboticists used the Husky unmanned ground vehicle platform from Clearpath Robotics Inc.
To support the Army’s Next Generation Combat Vehicle Cross Functional Team (CFT), however, the researchers are changing to a larger robotic platform, Clearpath’s Warthog. “We are moving to that as a surrogate platform for the Robotic Combat Vehicle (RCV), and that’s going to start to let us have a system that is looking more like what we can expect out of the RCV,” Stump says.
This past summer, the ARL roboticists conducted a pilot test at Camp Lejeune, the Marine Corps base in Jacksonville, North Carolina. The base has a military operations in urban terrain facility, known as a MOUT, which makes for a perfect robot-human testing ground, Rogers offers. “It’s set up to train Marines as they are preparing to deploy,” he says.
In conducting their testing, the researchers were positioned away from the robot and could see only what the robot saw, which is similar to a future military scenario. They started with the interaction of a single robot with one warfighter. “That’s really just a building block to getting to larger teams where one person is controlling multiple robots, as compared to the current state of the art, where multiple people are interacting with one robot,” Stump emphasizes.
In addition to specialized arms from the Jet Propulsion Laboratory, the robot is equipped with advanced sensors, a robust processing system, and a complex intelligence architecture made up of many components that allow the robot to maneuver and interact with soldiers using natural language.
“For the robot, it is not just trying to understand the words that were said and then connect them to the meanings,” Stump clarifies. “It is trying to figure out when there is missing information or an ambiguous meaning, and then using that to query back to the human and start a dialogue to clarify what they meant or ask for more information. We have a group that has been working on that topic. And so, what we’re trying to do is … connect their work on sophisticated dialogue systems and bring that in. We’re kind of at a nice point of being able to incorporate that so we can actually have meaningful back and forth interactions with the robot about what the person is saying and how it connects to the environment.”
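The clarification loop Stump describes can be sketched in miniature. The snippet below is a hypothetical illustration, not ARL’s dialogue system: it grounds a referent from a spoken command against objects the robot has mapped and generates a clarifying question back to the human when the reference is missing or ambiguous. The object labels and fields are invented for the example.

```python
# Minimal sketch of a clarification dialogue step (hypothetical, not ARL's
# system): ground a referent against mapped objects, and ask a clarifying
# question when the reference is missing or ambiguous.

def ground_referent(referent, known_objects):
    """Return (object, question). Exactly one of the two is None."""
    matches = [o for o in known_objects if o["label"] == referent]
    if not matches:
        return None, f"I don't see a {referent}. Can you describe where it is?"
    if len(matches) > 1:
        places = ", ".join(o["place"] for o in matches)
        return None, f"I see {len(matches)} {referent}s ({places}). Which one do you mean?"
    return matches[0], None

objects = [
    {"label": "door", "place": "north wall"},
    {"label": "door", "place": "east wall"},
    {"label": "cone", "place": "courtyard"},
]

obj, question = ground_referent("door", objects)   # ambiguous: robot asks back
obj2, q2 = ground_referent("cone", objects)        # unique: grounded directly
```

A real system would ground far richer language, but the shape of the back-and-forth, detecting the gap and querying the human, is the same.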
In addition to the language aspect of the robot, the researchers are working on basic robot mobility in dynamic environments. Here, robots need to navigate in a “socially compliant” manner. If the robot is moving through a crowd, it should behave a certain way—not simply barrel straight through a group of people. The ARL’s solution is based on a deep-learning technique that runs on a neural network and stems from an academic partner’s thesis work on developing controllers. “We have trained a controller that can take into account the positions and trajectories of the people around the robot, and it can plan a route through that will be more socially compliant,” Stump notes.
To create such a system, the researchers observed what humans do when they are trying to move through a crowd of people. They looked at camera data taken from a busy shopping mall where scores of people were moving in and out of stores and identified passageways. “Humans move with a definite purpose,” Rogers says. “And our model has picked up on some of this determined motion, and we can take into account these other semantic entities, and the robot can think about, ‘A person came out of this door, and are they going to cross my path? Should I slow down a little bit just to make sure that I don’t come into contact or get too close to them?’”
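The idea behind socially compliant planning can be illustrated with a toy cost-based planner. This is not ARL’s learned neural controller; it is a hand-written stand-in that captures the same trade-off: predict each pedestrian forward with a constant-velocity model, then choose the robot heading that balances progress toward the goal against predicted closeness to people. The weights and horizon are arbitrary assumptions.

```python
import math

# Toy sketch of socially compliant local planning (illustrative, not ARL's
# learned controller): predict pedestrians forward with a constant-velocity
# model, then pick the heading balancing goal progress against proximity.

def predict(people, horizon=2.0):
    """Constant-velocity prediction of (x, y, vx, vy) pedestrians `horizon` s ahead."""
    return [(px + vx * horizon, py + vy * horizon) for px, py, vx, vy in people]

def choose_heading(robot, goal, people, speed=1.0, horizon=2.0):
    goal_dir = math.atan2(goal[1] - robot[1], goal[0] - robot[0])
    future = predict(people, horizon)
    best, best_cost = None, float("inf")
    for k in range(36):                      # candidate headings every 10 degrees
        theta = -math.pi + k * math.pi / 18
        rx = robot[0] + speed * horizon * math.cos(theta)
        ry = robot[1] + speed * horizon * math.sin(theta)
        # Social cost grows as the robot's future position nears a person's.
        social = sum(1.0 / max(0.1, math.hypot(rx - fx, ry - fy)) for fx, fy in future)
        deviation = abs(math.atan2(math.sin(theta - goal_dir), math.cos(theta - goal_dir)))
        cost = deviation + 2.0 * social       # weight is a tunable assumption
        if cost < best_cost:
            best, best_cost = theta, cost
    return best

# A pedestrian ahead and to the left, drifting into the robot's path:
# the chosen heading bends away rather than barreling straight through.
heading = choose_heading(robot=(0, 0), goal=(10, 0),
                         people=[(2.0, 1.5, 0.0, -0.5)])
```

The learned controller replaces the hand-tuned cost with behavior distilled from real crowd data, but the output is the same kind of decision: a route through, not over, the people.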
Another aspect the researchers are tackling is advanced simultaneous localization and mapping (SLAM). Long used in autonomous platforms, SLAM helps a robotic system map its surroundings. But for the next generation of cooperative robots, SLAM techniques have to be more sophisticated than in the past to not only identify a robot’s surroundings, but also have an understanding of the objects or people around it.
“Our approach to robot mapping uses a kind of modern approach to performing SLAM,” Rogers says. “We estimate the entire trajectory of the robot, instead of just building a map and keeping one map over the entire operation of the robot. We are constantly keeping track of the robot’s trajectory through space. And then we smooth that trajectory as we get new measurements.”
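The trajectory-smoothing idea Rogers describes can be reduced to a toy one-dimensional example. The sketch below is illustrative only, not ARL’s implementation: the whole trajectory is kept as variables, odometry supplies relative-motion constraints, absolute fixes (such as GPS) supply position constraints, and the trajectory is re-optimized as a whole so disagreement between the two is spread over every pose rather than dumped into the latest one.

```python
# A toy 1-D pose-graph smoother (illustrative of the idea, not ARL's code):
# keep the whole trajectory x0..xN and re-optimize it against all constraints.

def smooth(odometry, absolute, iters=500, step=0.1):
    """Minimize the sum of squared constraint residuals by gradient descent.

    odometry: list of relative motions u_i (constraint x[i+1] - x[i] = u_i)
    absolute: list of (index, value, weight) absolute fixes (e.g. GPS)
    """
    n = len(odometry) + 1
    x = [0.0]                      # initialize by dead reckoning
    for u in odometry:
        x.append(x[-1] + u)
    for _ in range(iters):
        grad = [0.0] * n
        for i, u in enumerate(odometry):
            r = (x[i + 1] - x[i]) - u
            grad[i + 1] += 2 * r
            grad[i] -= 2 * r
        for idx, z, w in absolute:
            grad[idx] += 2 * w * (x[idx] - z)
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x

# Noisy odometry sums to 3.05, but an absolute fix says pose 3 sits at 3.0;
# smoothing spreads the disagreement over the whole trajectory.
traj = smooth([1.1, 0.9, 1.05], absolute=[(0, 0.0, 1.0), (3, 3.0, 1.0)])
```

Production smoothers solve the same least-squares problem over full 3D poses with sparse matrix factorization instead of gradient descent, but the structure, a trajectory of variables tied together by measurements, is as described.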
This technique, called semantic SLAM, builds a 3D model of the robot’s environment, but as it puts that together, it also estimates what the things are in the environment. The technique goes well beyond the characteristics of light detection and ranging, or LIDAR, which only tells users the distance to objects, not what the objects are, Rogers notes.
To achieve this understanding for the robot, the scientists are working to incorporate object classification technology into the mapping process. Actually joining the object detection process, which recognizes objects or pieces of objects, with the mapping process is not so easy, the researchers say.
“So if you want to start to understand what the environment looks like, and this is really important, if you want to start being able to give instructions to a robot and have it try to interpret those instructions in the context of the environment, it needs to understand what things are in the environment,” he says. “Now, rather than talking about the shapes that we’re seeing, which is what you get out of the LIDAR, we can talk about specific landmarks like the street sign or the cone that the robot saw.”
The semantic SLAM method allows observations of the detailed components of objects in a robot’s environment. “What’s cool about this is that you can actually put objects into the map and track them at lots of different levels of abstraction,” Stump explains. “The way that we’re doing it now actually, we’re not even recognizing full objects. We’re recognizing pieces of objects. For example, if you want to track a window, we can detect the corners of the window because those are very distinct.”
An inference piece sits on top of the mapping process and helps fill in more details. “Once you put them in the map, at some point, we have a process that sits on top of the map and does an inference that says, ‘I’ve got three or four things that look like window corners, and they’re spaced kind of how a window should be spaced according to some model or distribution,’” he offers. “And so, we can actually make an inference that that must be a window and then we can find the bounds of that. We can assert it to the map, and now we’re tracking an object without having to do this sort of all-in-one-shot object detection that people typically do these days.”
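The part-to-object inference Stump describes can be sketched in two dimensions. The snippet is a simplified, hypothetical version of the idea: given corner detections placed in the map, it hypothesizes a window wherever four corners form a roughly axis-aligned rectangle whose size fits a prior model of window dimensions. The size limits and tolerance are invented for the example.

```python
from itertools import combinations

# Sketch of part-to-object inference (simplified, hypothetical geometry):
# four corner detections forming a window-sized rectangle imply a window.

def infer_windows(corners, min_side=0.4, max_side=2.5, tol=0.1):
    """Return bounding boxes (xmin, ymin, xmax, ymax) of inferred windows."""
    windows = []
    for quad in combinations(corners, 4):
        xs = sorted(p[0] for p in quad)
        ys = sorted(p[1] for p in quad)
        # Axis-aligned rectangle: two shared x values and two shared y values.
        if xs[1] - xs[0] > tol or xs[3] - xs[2] > tol:
            continue
        if ys[1] - ys[0] > tol or ys[3] - ys[2] > tol:
            continue
        width, height = xs[2] - xs[1], ys[2] - ys[1]
        if min_side <= width <= max_side and min_side <= height <= max_side:
            windows.append((xs[0], ys[0], xs[3], ys[3]))
    return windows

# Four detections consistent with a ~1.0 x 1.2 m window, plus one stray corner.
detections = [(0.0, 0.0), (1.0, 0.02), (0.01, 1.2), (1.02, 1.21), (3.0, 0.5)]
found = infer_windows(detections)
```

The stray detection matches no rectangle and is simply left in the map as a corner, which is the point of the approach: parts accumulate until a model-consistent object can be asserted.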
They have to adjust certain measurements for accuracy, Rogers continues. “External measurements from GPS give us an absolute position, but that might be noisy,” he says. “As we get near a building, for example, we will get a bias measurement that is incompatible with kind of the underlying Gaussian assumptions in smoothing-based approaches. And so we need to filter those kind of measurements out.”
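The filtering Rogers mentions amounts to residual gating. The sketch below uses hypothetical thresholds: a GPS fix that disagrees with the current estimate by much more than its nominal noise, as a multipath bias near a building would, is rejected before it reaches the smoother, because a biased measurement violates the zero-mean Gaussian noise assumption the smoother relies on.

```python
# Minimal residual-gating sketch (hypothetical thresholds): reject GPS fixes
# whose disagreement with the current estimate exceeds k standard deviations.

def gate_gps(estimate, fixes, sigma=2.0, k=3.0):
    """Split fixes into (accepted, rejected) by residual against the estimate."""
    accepted, rejected = [], []
    for fix in fixes:
        residual = abs(fix - estimate)
        (accepted if residual <= k * sigma else rejected).append(fix)
    return accepted, rejected

# The estimate says the robot is near x = 100 m; the 123.5 m fix is a
# likely building-induced bias and gets filtered out.
ok, bad = gate_gps(100.0, [99.1, 101.8, 123.5, 98.7])
```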
Other mapping observations also are incorporated into the process, such as when the robot revisits a previously seen location. “As the robot comes back to somewhere it’s been before, it actually can match the 3D laser scan data that it has taken in the past with its current view of that place,” Rogers clarifies. “All of these different measurements together come up with a good estimate of what the actual environment looks like after it takes all these measurements, and it moves throughout the entire environment.”
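The scan-matching step behind such a loop closure can be shown in a simplified form. The sketch below assumes point correspondences between the current and stored scans are already known, which real scan matchers must estimate iteratively; given correspondences, the best rigid 2-D transform has a closed-form least-squares solution.

```python
import math

# Toy scan alignment for loop closure (simplified: correspondences known).
# Finds the rigid 2-D rotation + translation mapping `current` onto `stored`.

def align_2d(current, stored):
    n = len(current)
    cx = (sum(p[0] for p in current) / n, sum(p[1] for p in current) / n)
    sx = (sum(p[0] for p in stored) / n, sum(p[1] for p in stored) / n)
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(current, stored):
        ax, ay = ax - cx[0], ay - cx[1]      # center both point sets
        bx, by = bx - sx[0], by - sx[1]
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)           # closed-form optimal rotation
    c, s = math.cos(theta), math.sin(theta)
    tx = sx[0] - (c * cx[0] - s * cx[1])
    ty = sx[1] - (s * cx[0] + c * cx[1])
    return theta, tx, ty

# The current scan is the stored scan rotated 30 degrees and shifted; the
# recovered transform undoes exactly that drift.
stored = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (2.0, 2.0)]
th = math.radians(30)
current = [(math.cos(th) * x - math.sin(th) * y + 0.5,
            math.sin(th) * x + math.cos(th) * y - 0.2) for x, y in stored]
theta, tx, ty = align_2d(current, stored)
```

The recovered transform is exactly the kind of loop-closure constraint that gets added to the pose graph and smoothed against everything else.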
In addition, the researchers are incorporating techniques to be used with multiple robots in a team for multi-robot mapping. “The robots can actually share measurements across the team of robots and use this system to get a common frame of reference between them,” he notes.
Rogers also is focusing on enabling humans to train robots to navigate on the types of terrain on which humans prefer the robot to operate. This learning-by-demonstration piece is the key to having the robot handle whatever type of battlefield it might encounter.
The roboticists are using a deep-learning-based terrain classification mechanism that was built by academic partner Carnegie Mellon University. The mechanism generates semantic labels for terrain for robot understanding. “Basically, we’re trying to find out where is the road or different types of surfaces, vegetation or buildings—any sort of thing that might exist in the field of view of the robot,” Rogers shares. “We try and label it at every pixel. Each pixel in the camera image will pick up a label from this classifier. And then we take the output from that classifier and project it into the ground representation, so the robot can think about what the appearance is. And it can take into account what it can see and generate a sort of ground-based map of where these different types of terrain exist in the environment.”
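The projection step Rogers describes can be sketched simply. In the snippet below the pixel-to-ground mapping is a stand-in; a real system projects through the camera model and the robot’s pose. The shape of the operation is the same: every labeled pixel votes into the ground cell it projects to, and each cell takes the majority label.

```python
from collections import Counter, defaultdict

# Sketch of projecting per-pixel terrain labels into a ground grid (the
# pixel-to-ground mapping is a stand-in for a real camera projection).

def ground_map(pixel_labels, pixel_to_ground):
    """pixel_labels[r][c] is a class name; pixel_to_ground(r, c) -> cell id."""
    votes = defaultdict(Counter)
    for r, row in enumerate(pixel_labels):
        for c, label in enumerate(row):
            votes[pixel_to_ground(r, c)][label] += 1
    # Majority vote per ground cell.
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}

labels = [
    ["road", "road", "grass", "grass"],
    ["road", "road", "road",  "grass"],
]
# Stand-in projection: each 2x2 pixel block falls into one ground cell.
cells = ground_map(labels, lambda r, c: (r // 2, c // 2))
```

The resulting ground-based map of terrain classes is what the planner reasons over when deciding where the robot is allowed to drive.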
When they tested the platform at Camp Lejeune this summer, the humans interacting with the robot gave it training examples on how to drive in some desired manner. “We can teach the robot certain behaviors that can be used in theater for example,” Stump says, noting that at Camp Lejeune they trained the robot to follow certain roadways. The humans started by driving the robot on short training examples. From that little piece of training data, the robot can generalize that it should drive on the roads and not drive onto the grass or through the trees.
The learning from demonstration piece is “fantastic,” Stump states, because a soldier may have a specific way they want the robot to operate, and while they cannot necessarily explain it to the robot through equations or by programming it, they can show the robot.
“And we’re thinking very operationally,” Rogers adds. “If the robot is fully trained, and we think it’s ready to go, and it’s working alongside soldiers in the field, and the soldier sees the robot do something that he doesn’t want it to do, the soldier can seize control of the robot through his teleoperation interface, then correct its behavior and move it back to where the soldier thinks the robot should be driving. And then the robot will actually learn from that.”
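One simple way to picture folding such a correction back into the robot is a terrain-cost update. The rule below is a hypothetical illustration, not ARL’s learner: when a teleoperated correction moves the robot off one terrain type and onto another, the traversal cost of the abandoned terrain rises and the cost of the preferred terrain falls, so future plans favor where the soldier steered.

```python
# Hypothetical correction-update rule (not ARL's learner): adjust terrain
# traversal costs from a single operator correction.

def apply_correction(costs, left_terrain, corrected_to, rate=0.5):
    """Return an updated copy of the terrain-cost table after one correction."""
    updated = dict(costs)
    updated[left_terrain] = updated.get(left_terrain, 1.0) + rate
    updated[corrected_to] = max(0.1, updated.get(corrected_to, 1.0) - rate)
    return updated

costs = {"road": 1.0, "grass": 1.0, "trees": 5.0}
# The soldier steers the robot off the grass and back onto the road.
costs = apply_correction(costs, left_terrain="grass", corrected_to="road")
```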