Soldiers Search Syntax
Adept integrated transcription, translation, distillation system quickly finds buried foreign language nuggets.
Civilian Defense Advanced Research Projects Agency (DARPA) employees use the agency’s Global Autonomous Language Exploitation (GALE) system at an operations site. This BBN Technologies version of GALE extracts, transcribes, translates and distills Arabic news in near real time from all media sources, including blogs.
Critical actionable military data obscured by foreign languages and often masked in large volumes of different types of media are both highly important and perishable. The global deployment of a dozen monitoring systems is enabling software applications to transcribe and translate both text and speech and distill large volumes of information in multiple languages, including Arabic and Chinese.
The Global Autonomous Language Exploitation (GALE) program, underway at the Defense Advanced Research Projects Agency (DARPA),
The five-year GALE program continues toward aggressive targets to improve today’s translation techniques, which are about 75 percent accurate. The goal is to provide the military with translations that are accurate 85 percent to 95 percent of the time and consistent in 90 percent to 95 percent of the documents translated.
According to Dr. Joseph Olive, the program’s software technologies absorb, translate, analyze and interpret huge volumes of speech and text, easing the load on linguists and analysts. Olive is the GALE program manager in DARPA’s information processing technology office. He earned a bachelor’s degree in physics, a master’s degree and a doctorate from the
Automatic processing engines convert and distill data, delivering pertinent, consolidated information in easy-to-understand forms to military personnel and English-speaking analysts in response to direct or implicit instructions.
A predefined template is used to interface with the machine, instructing the system to look for a specific person’s movements, for example, or specifying topics, dates or locations. There are numerous ways to identify and obtain needed information successfully from the system, Olive reveals. Vigorous, multidisciplinary research brings together previously separate efforts and develops tightly coupled solutions. Topic identification can help transcription and translation. Accurate name tagging helps avoid translating names, and accurate parsing prevents the computer from producing sentences that confuse the subject and the object.
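The template idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the article does not describe GALE's actual template format, so the class name `DistillationQuery`, its fields and the simple substring matching are all hypothetical stand-ins for a query that names a person, topics, locations and a date range.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DistillationQuery:
    """Hypothetical query template; not GALE's real interface."""
    person: Optional[str] = None
    topics: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)
    start: Optional[date] = None
    end: Optional[date] = None

    def matches(self, doc: dict) -> bool:
        """Return True if a translated document satisfies every filled-in slot.

        `doc` is assumed to be a dict with a "text" string and a "date" value.
        """
        text = doc["text"].lower()
        if self.person and self.person.lower() not in text:
            return False
        if self.topics and not any(t.lower() in text for t in self.topics):
            return False
        if self.locations and not any(p.lower() in text for p in self.locations):
            return False
        if self.start and doc["date"] < self.start:
            return False
        if self.end and doc["date"] > self.end:
            return False
        return True


# Illustrative use: look for convoy reports near Mosul in early 2007.
query = DistillationQuery(
    topics=["convoy"],
    locations=["Mosul"],
    start=date(2007, 2, 1),
    end=date(2007, 3, 31),
)
report = {"text": "Convoy seen near Mosul", "date": date(2007, 3, 1)}
is_relevant = query.matches(report)
```

Leaving a slot empty means that slot places no constraint on the search, which mirrors the way the article describes a user specifying only the person, topic, date or location of interest.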
GALE machine translation is not like a dictionary vocabulary lookup or a simple grammatical transfer, and it is not limited to specific topics. Indeed, it is statistical machine translation with integration of grammar and other components applicable to a broad range of topics, Olive emphasizes. “The GALE distillation process is not a Google-like search engine with a bag of words, cross-document summarization and information extraction. GALE’s distillation capability provides targeted information delivery and relevant information without redundancy and a combination of natural language processing technologies to produce utility-centric systems,” Olive says. “Distillation is on target, with relevant, concise and nonredundant information.”
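The "relevant, concise and nonredundant" goal can be made concrete with a toy sketch. It is deliberately simplistic and is not GALE's method: it relies on plain word overlap, which is the kind of bag-of-words matching Olive says GALE goes beyond. The function names and the `max_overlap` threshold are assumptions introduced only to show how a relevance filter and a redundancy filter can be combined.

```python
def tokens(passage):
    """Lowercased set of whitespace-separated words in a passage."""
    return set(passage.lower().split())


def jaccard(a, b):
    """Word-set overlap between 0.0 (disjoint) and 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def distill(passages, query_terms, max_overlap=0.5):
    """Keep passages that mention a query term and do not repeat one already kept."""
    wanted = {t.lower() for t in query_terms}
    # Score each passage by how many query terms it contains.
    scored = [(len(tokens(p) & wanted), p) for p in passages]
    # Most relevant first; sorted() is stable, so ties keep their input order.
    relevant = [p for score, p in sorted(scored, key=lambda sp: -sp[0]) if score > 0]
    kept = []
    for p in relevant:
        # Drop a passage that overlaps too heavily with anything already kept.
        if all(jaccard(tokens(p), tokens(k)) <= max_overlap for k in kept):
            kept.append(p)
    return kept


passages = [
    "The minister visited Basra on Monday",
    "The minister visited Basra on Monday morning",
    "Oil prices rose sharply",
    "Minister announces new port in Basra",
]
result = distill(passages, ["minister", "Basra"])
# result keeps the first and fourth passages: the second nearly duplicates
# the first, and the third mentions neither query term.
```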
Olive explains why the translation capability is so important in today’s military missions. “Soldiers must be able to communicate with people in foreign lands and understand what is going on around them. However, it is not possible to send every soldier to a 63-week course at the Defense Language Institute. And just as impossible to predict is where our soldiers will be deployed next and what language skills they will need,” Olive observes.
GALE allows soldiers to communicate with allies, enemies and local citizens. In addition, it enables the military to use the huge amounts of open source data available in foreign languages and, perhaps just as important, to decipher documents obtained as a result of capturing enemy combatants. Fortunately, DARPA has long understood the importance of human-computer communication, working hard to meet the military’s needs, Olive asserts.
“In the Arabic-speaking world, there are scores of radio and television stations, newswires and Web sites producing hundreds of hours of audio and megabytes of text every day. These information streams must be transcribed, translated and distilled to find tiny scraps of relevant information, and it must be done in almost real time,” Olive points out.
A U.S. Army soldier operates DARPA’s GALE system, which automatically translates Arabic into English. The system also can translate Chinese into English. A dozen of the systems are deployed globally, including in Iraq and the western Pacific.
Both BBN and IBM are successfully deploying GALE at far-flung sites. The BBN system, as an example, delivers an automatically translated transcript of an Arabic television station’s programming within 5 minutes of a broadcast, Olive confirms. A single machine can monitor four television channels simultaneously and be programmed to switch to other stations and continue at a later time. The IBM system is fairly similar and provides closed captions in real time without human intervention.
A BBN team includes speech and language scientists as well as researchers from academic institutions in the
IBM, with industry and academia, is using its unstructured information management architecture as the underlying integrating design. The company is building large-scale multimodal unstructured information management applications with source code that already is available to the open source community.
GALE technology makes it possible for English speakers to do much of what interpreters and analysts do now: quickly translate large quantities of undecipherable foreign language data into English, producing actionable intelligence that can be used for operational planning and force protection, Olive states.
DARPA Global Autonomous Language Exploitation (GALE): www.darpa.mil/ipto/programs/gale/index.htm
BBN Technologies Speech Recognition: www.bbn.com/Solutions_and_Technologies/Speech_Recognition
IBM Unstructured Information Management Architecture: www.research.ibm.com/UIMA