Understanding the Written Foreign Language
Something as simple as matching names in an intelligence database can help fight terrorism.
A transliteration tool developed jointly by the intelligence community and a commercial firm is helping eliminate the problem of misidentified foreign names and places in databases. These types of errors can allow a potential terrorist or plot to slip through security if analysts cannot identify common proper nouns and establish valuable links.
The new system helps avoid misidentification arising from different interpretive spellings of names from a language that does not use Western-style Roman lettering. The problem surfaces when terrorists’ names are not matched across databases because their spelling is interpreted differently. Analysts then cannot put the pieces of the puzzle together to develop an accurate picture of a potential threat.
The Highlight Language Analysis Suite standardizes the transliteration of foreign language names so that analysts can match proper nouns across databases. A simple plug-in for Microsoft Word or Excel, the tool helps eliminate unintentional ambiguities in identifying individuals who might be terror suspects.
Known as the Highlight Language Analysis Suite, the system was developed jointly by the Office of the Director of National Intelligence, the Defense Intelligence Agency (DIA) and Basis Technology of Cambridge, Massachusetts. Nicholas Bemish, senior expert for human language technology at the DIA, says that the most important element of Highlight is that it provides a standardized translation and translated content in all reporting. With naming references standardized, no conflicts arise in transliteration among the many intelligence agencies using it.
Highlight’s transliteration assistant accepts approximate spellings of foreign names and conforms them to a standardized intelligence community spelling. Users can select from a list of choices when facing multiple possibilities. The system uses batch processing, which allows automatic transliteration of large databases ranging from cell phone contact lists to telephone directories.
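The behavior described above, accepting an approximate spelling, offering a short list of standardized candidates and running over a whole list in batch, can be sketched in a few lines of Python. This is only an illustration and not Highlight’s actual method: the name list, the function names and the use of fuzzy string matching from Python’s standard `difflib` module are all assumptions made for the example.

```python
import difflib

# Hypothetical standard spellings. The real intelligence community
# standard is not given in the article; these are placeholders.
STANDARD_NAMES = ["Muhammad", "Husayn", "Abd al-Rahman", "Yusuf"]


def _key(name):
    """Lowercase and keep only letters so spacing and hyphens do not matter."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def suggest(name, limit=3, cutoff=0.6):
    """Return up to `limit` standard spellings that resemble `name`, best first."""
    keys = {_key(s): s for s in STANDARD_NAMES}
    matches = difflib.get_close_matches(_key(name), list(keys), n=limit, cutoff=cutoff)
    return [keys[m] for m in matches]


def batch_standardize(names):
    """Map each input spelling to its best standard spelling, or None if no match."""
    result = []
    for name in names:
        candidates = suggest(name)
        result.append((name, candidates[0] if candidates else None))
    return result
```

Calling `suggest("Mohammed")` returns the candidate list a user would choose from, while `batch_standardize` mimics the unattended pass over something like a contact list. Inputs that resemble nothing in the list come back as `None` so that a linguist can review them by hand.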
The Highlight program is based on the development of a system that supports the intelligence community’s transliteration standards for proper nouns, names and places. Bemish offers that the community needed a capability providing consistency and readability that would work for nonlinguist consumers.
Highlight provides the standard within the document itself, he explains. In addition to standardizing transliteration, it increases speed and improves translation accuracy. Linguists no longer need rely on rote memorization for correct spellings of commonly referenced items, he notes.
“We get a lot of content in a foreign language, and we don’t have the language human capacity to be able to do a lot of this work,” Bemish points out. “Where we can take a lot of that content and push it through the system in an automated fashion, it allows us to then get that information out to the people who need it a lot quicker—with some minor cleanup—to make sure that we have correct spellings according to the current standards.”
This capability is especially vital for languages that do not convert easily to Roman lettering. Many languages do not have direct equivalents in the Roman alphabet, so their conversion is done phonetically. Government organizations have different standards for this transliteration, which can lead to confusion or a loss of intelligence value.
More than one foreign-born terrorist was not identified as a serious threat because the individual’s name was not spelled the same way among the different government databases that are searched for cumulative threat information. “We had individuals who either were going towards prosecution litigation, trying to get visas to enter the United States or actually having gotten onto the plane, where officials looked [at records] and said, ‘well his name was spelled in the Department of Homeland Security database this way and spelled in the State Department visa registry this way … so how would we have known?’
“This way, since we use this system across the 16 intelligence community partnerships worldwide … we’re actually solving that problem today,” he emphasizes.
With Highlight, those names are standardized and spelled the same way in all the databases belonging to groups that have implemented the transliteration tool. Analysts know they have the right person and the correct information about that person.
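A small Python sketch shows why a shared standard makes cross-database matching possible. It is a simplified illustration, not the tool’s implementation: the variant table, the record identifiers and the function names are hypothetical, and a real system would rely on far richer transliteration rules than a lookup table.

```python
# Hypothetical variant table: each known spelling maps to one standard form.
VARIANT_TO_STANDARD = {
    "mohammed": "Muhammad",
    "mohamed": "Muhammad",
    "muhammad": "Muhammad",
    "hussein": "Husayn",
    "hussain": "Husayn",
    "husayn": "Husayn",
}


def standardize(full_name):
    """Rewrite each word of a name in its standard spelling when one is known."""
    parts = []
    for token in full_name.split():
        parts.append(VARIANT_TO_STANDARD.get(token.lower(), token.title()))
    return " ".join(parts)


def link_records(db_a, db_b):
    """Pair record IDs from two databases whose names agree after standardization."""
    index = {}
    for rec_id, name in db_a.items():
        index.setdefault(standardize(name), []).append(rec_id)
    links = []
    for rec_id, name in db_b.items():
        for a_id in index.get(standardize(name), []):
            links.append((a_id, rec_id))
    return links


# Two made-up databases that spell the same person differently.
visa_registry = {"A1": "Mohammed Hussein", "A2": "Yusuf Karim"}
watch_list = {"B7": "Muhammad Hussain", "B9": "Omar Khalid"}
links = link_records(visa_registry, watch_list)
```

A plain string comparison would miss the pair A1 and B7 because the raw spellings differ. Once both sides are standardized, the two records share one spelling and can be linked.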
Bemish notes that Highlight currently transliterates names in Arabic, Farsi, Pashto and Dari. An Arabic editor function allows a user to input Arabic text using a standard keyboard. A geoscope map viewer lets a user verify the spelling of geographic features in the Middle East or see where those features are located. This month, Highlight will begin transliterating names in Mandarin.
Bemish explains the intelligence community had been seeking translation-related service capabilities rolled into a single suite of tools that could be installed on the desktop of a linguist or an analyst. This would enable them to perform their functions more smoothly. The traditional tool used for this purpose was a commercial off-the-shelf transliteration tool.
The community wanted an intelligence community-licensing model under a common operating system that supported all of its members. “This way, you don’t have 16 different agencies paying for this item 16 different times,” Bemish says.
Three years ago, intelligence experts chose these capabilities from components in the Basis product line, and this consolidated suite of tools was named Highlight. “It was highlighting the information, highlighting the content that was of importance to the intelligence community; and then getting that information to be referenced, translated and identified correctly for the people in the community who needed access to that information,” Bemish notes.
The intelligence community adopted many of the tools available in this product line, and then it adapted some of them to meet its own requirements and needs at the desktop level. The result is a commercial/government off-the-shelf type of capability, Bemish offers. “Now, those agencies do not necessarily have a cost burden at this point. Whereas they used to have to program it into their budget and then pay for it, now it’s paid to a single licensing model that everyone then can take advantage of.
“We have saved money across the intelligence community by reducing those cost burdens to each agency, and we’re all getting the same product and benefit,” he declares. Under the existing contract, the system is licensed for 800 seats and a combination of individual seats and server licenses. Besides speeding information to the correct customer, the system reduces human capital requirements, which is another way it lessens cost, Bemish offers.
Highlight is available as a plug-in for Microsoft Word and Excel. When a user opens up Microsoft Office, he or she sees an icon in the ribbon bar at the top of the screen. That user then can select whichever language or community standard is needed for transliteration. Other items, such as reference terms and specific names, can be accessed from lookup tools through Highlight, Bemish reports. He emphasizes that, while this is not a machine translation tool, it has features that can provide assistance to a linguist in the translation process.
“When [users] hit names, places or other information of interest that have been tagged and identified, that information automatically will be produced and identified in their document,” Bemish allows. “That’s translation work and typing keystrokes that they don’t have to do.” He continues that if 10 linguists are translating the same document, they all are going to translate several words and phrases the same way.
The community already is working on future iterations of Highlight. New features that may be incorporated into the system include upgrades to the graphical user interface. Users would be able to set profiles based on their current needs, Bemish notes. Another improvement would be to widgetize the system so it can be placed on the cloud. It would be available on an app store for authorized individuals to obtain on demand as needed.
Highlight also might be integrated into computer-assisted translation tools such as Translation Memory. Bemish relates that pilot work on this approach already has been conducted. Success would enable the reuse of some of the content being created from completed translations. “We’re looking at integration efforts with other tools that currently are on people’s desktops to take advantage of those other solutions that are out there, so that you’re not constantly loading something new on somebody’s desktop,” he explains.
“We’re constantly working with the vendor—who works very hard at this—to make sure that the software is not going to be disruptive to any other key components of our system,” Bemish imparts. “We want to make sure that we have solid software solutions that have limited breakage and do not require a lot of maintenance downtime.”
Training is another key focus, and Bemish lauds the system as user-friendly. In most cases, users are trained in less than an hour to be able to use the system effectively. The system’s help functions also can walk a user through its performance, he notes.