A technology undergoing investigation provides communications capabilities without human translators.
Speech-to-speech translation software, such as SRI International’s IraqComm system, allows people speaking different languages to communicate without a human interpreter. IraqComm translates between spoken English and spoken Iraqi Arabic and has undergone an investigative fielding in Iraq.
The current search for a process to translate from English to Iraqi Arabic and vice versa began when Multinational Security Transition Command–Iraq (MNSTC-I) presented the U.S. Joint Forces Command (JFCOM),
Officials at JFCOM queried personnel at DARPA to determine whether the translation technology was mature enough for evaluation in an operational environment. DARPA provided JFCOM staff with the results of the studies the agency had conducted so command personnel could decide which products best suited their needs.
Wayne Richards, branch chief, JFCOM Capabilities Division, says he specifies Iraqi Arabic for the translations because most citizens in
Based on DARPA’s evaluations of the translation products in its laboratories and limited field utility assessments in the
JFCOM provided the systems to the Civilian Police Assistance Training Team, a branch of MNSTC-I, to test during the training they provide to Iraqi civil security forces. Performing the evaluations in this kind of environment has advantages. The evaluations are conducted in a benign environment with a targeted group of users who can study the device under controls that limit the chance of harm to soldiers. “We couldn’t think of a better place to put them,” Richards notes. In addition to training, JFCOM has an interest in using these translators in the medical field.
Of the systems they tested, training team members selected IraqComm, developed by SRI International,
Richards explains that IraqComm’s limited translation library is a good reason for testing the product in a training environment. The system is programmed to communicate specifically about training needs, and it is not yet robust enough for the same program to be used in all situations. “The translations engines are not that mature yet where you can do free-flow conversational speech,” Richards says.
JFCOM personnel returned to
Richards emphasizes that the choice to explore these systems further does not mean that they will be selected for acquisition or that they will prove the best products in the future. In addition, field experiments do not constitute an endorsement by the command. JFCOM provides all the feedback it receives from the evaluations to DARPA so that agency can continue to steer improvements of the speech-to-speech translators. Results from the IraqComm fielding will be available to all companies, not just SRI International, because they can affect the development of all the systems. The U.S. Army Test and Evaluation Command is preparing a report on the fielding of IraqComm. “It’s a continual feedback process,” Richards states.
In addition to the 32 IraqComm systems brought to
Components of the IraqComm technology have undergone several decades of development, according to Kristin Precoda, director of the speech technology and research laboratory at SRI International. The company worked on a similar product to perform translation between English and Pashtu for medical personnel in
IraqComm incorporates three software technologies: automatic speech recognition, machine translation and text-to-speech synthesis. A user speaks into the microphone, and the system records the voice. The automatic speech recognition module processes the recording and displays the speech onscreen. The machine translation component translates the phrase into the target language, and the text-to-speech component produces an audible rendition. In addition, the translation can be viewed on the computer screen.
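The three-stage flow described above can be sketched in a few lines of Python. This is only an illustration of the general recognize, translate, synthesize sequence, not SRI's implementation: the function names, the `TranslationResult` type and the one-entry phrase table are all hypothetical stand-ins, and the "translation" is a bracketed placeholder rather than real Iraqi Arabic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationResult:
    recognized_text: str   # what the recognizer heard, shown onscreen
    translated_text: str   # target-language text, also shown onscreen
    audio: bytes           # synthesized speech to play back


def run_pipeline(
    recording: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> TranslationResult:
    """Chain the three stages in the order the article describes."""
    text = recognize(recording)          # automatic speech recognition
    translated = translate(text)         # machine translation
    audio = synthesize(translated)       # text-to-speech synthesis
    return TranslationResult(text, translated, audio)


# Toy stand-ins (assumptions for illustration only).
def fake_recognize(recording: bytes) -> str:
    # Pretend the recording is already text; a real recognizer decodes audio.
    return recording.decode("utf-8").strip().lower()


# Hypothetical one-entry phrase table with a placeholder translation.
PHRASES = {"where is the clinic": "[Iraqi Arabic for: where is the clinic]"}


def fake_translate(text: str) -> str:
    return PHRASES.get(text, "[no translation available]")


def fake_synthesize(text: str) -> bytes:
    # A real synthesizer would return audio samples, not encoded text.
    return text.encode("utf-8")


result = run_pipeline(
    b"Where is the clinic ", fake_recognize, fake_translate, fake_synthesize
)
print(result.recognized_text)
print(result.translated_text)
```

Passing the stages in as functions keeps each component replaceable, which mirrors the way the article treats recognition, translation and synthesis as separate technologies.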
The U.S. Joint Forces Command has chosen two speech-to-speech translators for investigative fielding in Iraq. To use the technology, personnel speak into a microphone. The software translates from English to Iraqi Arabic and vice versa. The translations are created audibly and graphically on a computer screen.
SRI designed the system for tactical use, and Precoda states it can translate speech about topics in the areas of force protection and civil affairs and about some very basic medical issues. The software runs on various laptops, including the Panasonic Toughbook CF-18, which weighs about four pounds. Precoda’s vision is to install the software on personal digital assistants or other devices smaller than a laptop and better suited for nontraining scenarios. “It’s not a one-size-fits-all solution,” Precoda says. Other future considerations include making the program hands- and eyes-free and increasing battery life.
SRI also is looking at ways to make the product more accurate and more robust in different conditions and is researching the system’s capacity to translate other languages.
At IBM, where the Mastor program was developed, researchers have been studying speech translation for more than 30 years. Five years ago they directed their efforts toward speech-to-speech translation. The company focused on domain-specific translation, which has applications in many industries, including the military, health care and tourism. According to David Nahamoo, speech chief technology officer, IBM Research, “We built our technology using meaning as the interlingual.”
The company’s technology is equipped to try to understand the meaning of words and to reconstruct sentences based on that meaning. By using domain- or application-specific expressions, users can avoid problems with idiomatic phrases that accompany many translations. Nahamoo explains that by using meaning to translate, the system does not need a perfect understanding of a language to produce an effective, accurate translation. This facet of the system would assist troops using the software to train or to communicate with the average Iraqi citizen on the street or at an entry gate.
Mastor operates in much the same way as IraqComm. A user speaks in English or Iraqi Arabic; the spoken word becomes text; the text is translated into the target language; and the software speaks the translated words. Mastor offers an additional user interface that provides alternative translation choices to the speaker in the speaker’s language. Users can select one of the other phrases if it better conveys their meaning, and the device will speak that option.
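The alternative-translation interface described above can be pictured as a short list of candidates, each pairing a phrase in the speaker's language with the target-language output the device would speak. The sketch below is a hypothetical illustration of that selection step only; it is not IBM's code, and the `Candidate` type, the `choose_candidate` function and the sample phrases are assumptions made for the example, with bracketed placeholders standing in for real translations.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    source_phrase: str   # shown to the speaker in the speaker's own language
    target_phrase: str   # what the device would speak if this one is chosen


def choose_candidate(
    candidates: Sequence[Candidate], selection: Optional[int] = None
) -> Candidate:
    """Return the candidate to speak: the top one unless the user picks another."""
    if not candidates:
        raise ValueError("no translation candidates available")
    if selection is None:
        return candidates[0]             # default to the first-ranked phrase
    if not 0 <= selection < len(candidates):
        raise IndexError("selection is outside the candidate list")
    return candidates[selection]


# Hypothetical candidates; target phrases are placeholders, not real Arabic.
candidates = [
    Candidate("please open the trunk", "[target phrase A]"),
    Candidate("please open the back of the car", "[target phrase B]"),
]

default_choice = choose_candidate(candidates)
user_choice = choose_candidate(candidates, selection=1)
print(default_choice.source_phrase)
print(user_choice.source_phrase)
```

Showing the alternatives in the speaker's own language lets a user who cannot read the target language still judge which rendering is closest to the intended meaning.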
IBM is looking ahead to the possibility of selling this technology to the general public for purposes such as travel. Nahamoo describes Mastor in its current state as somewhere between a prototype and a deployable system ready for use by several thousand people.
For now, the company offers it as a tailored product to interested parties. For JFCOM or other organizations within the U.S. Defense Department, Mastor comes as a hardware and software package that has been ruggedized. When IBM fields Mastor with JFCOM, the command’s personnel receive training on the product from IBM so they can support it in theater.
Nahamoo also mentions another possible delivery mode for the translator system—remote access. The technology would reside on a server, which people could access through their computers. In the future, telephone companies could offer it. A person in the
Both Precoda and Nahamoo stress that the thrust behind their products is a desire to improve communications among different people. They believe the results of improved communication will have a social impact and lead to better collaboration and understanding.
The automatic translation systems are not intended to replace human translators but instead to augment the small pool of interpreters available. Richards explains that because interpreters cannot be in all places at all times, machine translation tools give personnel with no foreign language training the capability to communicate with someone who does not speak their language.
Richards says JFCOM has demonstrated speech-to-speech translation to U.S. Secretary of State Dr. Condoleezza Rice and to the commander and deputy commander of the U.S. Central Command. Despite this, Richards believes much remains to be accomplished in terms of improving speech-to-speech translators. “A lot of that work is happening because we have the opportunity to put these prototypes in the field in these controlled environments,” he states. Some of this work involves capturing conversations between
Richards says that data in the library is government property, so when the technology is completely ready and provided to