Soldiers Search Syntax

June 2007
By Clarence A. Robinson Jr.

Civilian Defense Advanced Research Projects Agency (DARPA) employees use the agency’s Global Autonomous Language Exploitation (GALE) system at an operations site. This BBN Technologies version of GALE extracts, transcribes, translates and distills Arabic news in near real time from all media sources, including blogs.

Adept integrated transcription, translation, distillation system quickly finds buried foreign language nuggets.

Critical actionable military data obscured by foreign languages and often masked in large volumes of different types of media are both highly important and perishable. The global deployment of a dozen monitoring systems is enabling software applications to transcribe and translate both text and speech and distill large volumes of information in multiple languages, including Arabic and Chinese.

The Global Autonomous Language Exploitation (GALE) program, underway at the Defense Advanced Research Projects Agency (DARPA), Arlington, Virginia, provides an integrated product. Already fielded at installations in far-flung areas such as Baghdad and the Pacific Rim, this automated system for transcribing, translating and distilling speech and text supports military operations and tactical situational awareness.

The five-year GALE program continues toward aggressive targets to improve today’s translation techniques, which are about 75 percent accurate. The goal is to provide the military with translations that are accurate 85 percent to 95 percent of the time and consistent in 90 percent to 95 percent of the documents translated.

According to Dr. Joseph Olive, the program’s software technologies absorb, translate, analyze and interpret huge volumes of speech and text, easing the load on linguists and analysts. Olive is the GALE program manager in DARPA’s information processing technology office. He earned a bachelor’s degree in physics, a master’s degree and a doctorate from the University of Chicago. With more than 30 years of research, development and management experience at Bell Laboratories in human dialogue systems and human-computer communications, Olive also served as director of research and chief technology officer at Lucent’s Speech Solutions.

Automatic processing engines convert and distill data, delivering pertinent, consolidated information in easy-to-understand forms to military personnel and English-speaking analysts in response to direct or implicit instructions.

A predefined template is used to interface with the machine, instructing the system to look for a specific person’s movements, for example, or specifying topics, dates or locations. There are numerous ways to identify and obtain needed information successfully from the system, Olive reveals. Vigorous, multidisciplinary research brings together previously separate efforts and develops tightly coupled solutions. Topic identification can help transcription and translation. Accurate name tagging helps avoid translating names, and accurate parsing prevents the computer from producing sentences that confuse the subject and the object.
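The idea of a predefined template can be illustrated with a minimal Python sketch. It is not GALE's template format or distillation method: the names `Document`, `QueryTemplate` and `distill` and all of their fields are hypothetical, and the plain substring matching used here is exactly the kind of keyword approach that GALE's natural language processing is meant to go beyond. The sketch only shows how a template might carry a person, topics, locations and a date range, and how exact duplicates might be dropped from the results.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Document:
    """A toy translated item; every field here is a hypothetical stand-in."""
    text: str
    published: date
    location: str


@dataclass
class QueryTemplate:
    """Hypothetical template: any field left empty places no constraint."""
    person: Optional[str] = None
    topics: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)
    start: Optional[date] = None
    end: Optional[date] = None

    def matches(self, doc: Document) -> bool:
        # Naive substring checks, used only to keep the sketch short.
        text = doc.text.lower()
        if self.person and self.person.lower() not in text:
            return False
        if self.topics and not any(t.lower() in text for t in self.topics):
            return False
        if self.locations and doc.location not in self.locations:
            return False
        if self.start and doc.published < self.start:
            return False
        if self.end and doc.published > self.end:
            return False
        return True


def distill(template: QueryTemplate, docs: List[Document]) -> List[Document]:
    """Return documents that satisfy the template, skipping exact duplicate texts."""
    seen = set()
    results = []
    for doc in docs:
        if template.matches(doc) and doc.text not in seen:
            seen.add(doc.text)
            results.append(doc)
    return results
```

A template such as `QueryTemplate(person="Minister X", locations=["Basra"], start=date(2007, 4, 15))` would then select only items that mention that person, come from that location and fall on or after that date.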

GALE machine translation is not like a dictionary vocabulary lookup or a simple grammatical transfer, and it is not limited to specific topics. Indeed, it is statistical machine translation with integration of grammar and other components applicable to a broad range of topics, Olive emphasizes. “The GALE distillation process is not a Google-like search engine with a bag of words, cross-document summarization and information extraction. GALE’s distillation capability provides targeted information delivery and relevant information without redundancy and a combination of natural language processing technologies to produce utility-centric systems,” Olive says. “Distillation is on target, with relevant, concise and nonredundant information.”

Olive explains why the translation capability is so important in today’s military missions. “Soldiers must be able to communicate with people in foreign lands and understand what is going on around them. However, it is not possible to send every soldier to a 63-week course at the Defense Language Institute. And just as impossible to predict is where our soldiers will be deployed next and what language skills they will need,” Olive observes.

GALE allows soldiers to communicate with allies, enemies and local citizens. In addition, it enables the military to use the huge amounts of open source data available in foreign languages and, perhaps just as important, to decipher documents obtained as a result of capturing enemy combatants. Fortunately, DARPA has long understood the importance of human-computer communication, working hard to meet the military’s needs, Olive asserts.

What U.S. forces see or hear on the English language version of Al Jazeera is often very different from the Arabic language version, he offers. GALE extracts material from all media sources, including blogs, and transcribes and translates Arabic news in near real time. The system extracts basic information and allows analysts to decide whether they would like a staff member to translate the material more thoroughly. For example, the U.S. Central Command receives about 5,000 Arabic language articles each week. After GALE automatically translates and filters them, analysts reduce the amount to about 300 that are important enough to be forwarded for human translation, a ratio of roughly 17 to 1.

“In the Arabic speaking world, there are scores of radio and television stations, newswires and Web sites producing hundreds of hours of audio and megabytes of text every day. These information streams must be transcribed, translated and distilled to find tiny scraps of relevant information, and it must be done in almost real time,” Olive points out.

A U.S. Army soldier operates DARPA’s GALE system, which automatically translates Arabic into English. The system also can translate Chinese into English. A dozen of the systems are deployed globally, including in Iraq and the western Pacific.

For three decades DARPA has been the military’s primary supporter of research in advancing language translation technologies. GALE contractors include IBM, Armonk, New York, with its Translingual Automated Language Exploitation System (TALES), and BBN Technologies, Cambridge, Massachusetts. BBN is continuing its work in the area of language translation with DARPA under an additional $16 million contract that the agency awarded to the company in January. BBN’s Enhanced Text and Audio Processing, or eTAP, system provides automatic adaptation and contextual awareness processing at all system levels.

Both companies are successfully deploying GALE at far-flung sites. The BBN system, as an example, delivers an automatically translated transcript of an Arabic television station’s programming within 5 minutes of a broadcast, Olive confirms. A single machine can monitor four television channels simultaneously and be programmed to switch to other stations and continue at a later time. The IBM system is fairly similar and provides closed captions in real time without human intervention.

The BBN team includes speech and language scientists as well as researchers from academic institutions in the United States and abroad. The company integrates the transcription, translation and distillation components into a single process instead of the more traditional linked approach, in which speech-to-text is followed by machine translation and then distillation. Because language, both oral and written, is fluid, the firm uses automatic adaptation and contextually aware processing at all levels to optimize performance. This approach adapts to different languages, dialects, topics, speakers and semantic nuances.

IBM, with industry and academia, is using its unstructured information management architecture as the underlying integrating design. The company is building large-scale multimodal unstructured information management applications with source code that already is available to the open source community.

GALE technology makes it possible for English speakers to do much of what interpreters and analysts do now: quickly translate large quantities of otherwise undecipherable foreign language data into actionable English-language intelligence that can be used for operational planning and force protection, Olive states.


