A U.S. Defense Department program combines technologies into a network to identify, predict and prevent terrorist attacks.
A variety of technologies under development by U.S. government researchers soon may help security organizations to track, anticipate and preclude terrorist activity. Part of an overarching program, these applications will allow analysts and decision makers to assess and act quickly upon patterns and trends in terrorist activity.
The Defense Advanced Research Projects Agency (DARPA), Arlington, Virginia, has initiated a homeland security research program that incorporates a number of technologies such as data fusion, database searches, biometrics and pattern recognition. Called Total Information Awareness (TIA), the effort seeks to develop a network of technologies that will help officials to predict and pre-empt terrorist activity.
TIA is part of DARPA’s Information Awareness Office (IAO). Both entities were created in the wake of September 11, 2001, says IAO Deputy Director Robert Popp. Before the attacks, a number of DARPA programs already focused on counterterrorism, national security and asymmetric threats, but they were spread across several of the agency’s technical offices. After the attacks, these programs were consolidated under one office, which officially opened in January 2002.
Charged with countering asymmetric threats and terrorism, the IAO seeks to develop technologies to prevent terrorism at the national and international levels. TIA is envisioned to support senior decision makers such as the president, directors of the Federal Bureau of Investigation and Central Intelligence Agency, and major combatant commanders.
To combat terrorism at the individual cell level, TIA seeks to track individuals as they make transactions in the global information space. “What we are trying to do is develop and utilize information technology to detect, classify and understand what those different signatures mean relative to terrorist activity. Once we can do this, actionable intelligence and options can be provided to senior decision makers,” Popp explains.
He describes locating and interpreting terrorist activities as a signal processing issue. To find the data signatures that correspond to different kinds of terrorist-related events, analysts must sift the relevant information from the general noise of the world’s information systems. But several challenges must be overcome before this becomes reality. “The signal-to-noise ratio is incredibly low. The [number of] potential targets that we need to identify and collect signatures for is enormous, and the time between events can be rather long if you think about the initial World Trade Center attack in 1993 and 9/11,” he says.
The TIA systems program supports a variety of technology efforts covering areas such as data mining, link analysis, human language translation and biometrics. Other related research areas seek to develop systems and tools to help analysts sift through large volumes of data for meaningful patterns and clues. The results of these various research programs will feed the TIA systems effort, which is a transition point for new technologies. Popp notes that any shortfalls found within the program will be addressed using appropriate commercial and government technology solutions.
The IAO’s focus is not on collecting and disseminating conventional intelligence data, Popp emphasizes. While the office is interested in providing analysts with different tools to process and exploit traditional information, it also seeks to collect other types of data, such as unstructured text on Web sites, and to track and identify patterns in communications, financial, travel and housing transactions. However, much of this data resides in the private sector, which makes it difficult to collect. Privacy laws restrict the federal government’s ability to access private commercial data without a specific court order, so DARPA researchers are developing technologies that exploit a combination of traditional intelligence sources and other types of transactional information to locate potential terrorist activity.
Popp envisions two types of analytical environments under TIA. The first is a community of experts spanning intelligence, counterintelligence and law enforcement agencies. In this environment, analysts would use a variety of applications such as discovery tools, hypothesis tools, patterning tools and models. This community’s output consists of hypothesis-supporting models that can predict potential terrorist activities. A key point, he notes, is not to provide a single solution but rather a range of possibilities.
These tools permit this community to provide more timely solutions that can be fed into the second analytical environment. More policy- and operations-oriented, this second group will take the first group’s output and estimate possible outcomes of terrorist activity by identifying overlapping patterns in the range of predicted events. This information then is presented to senior leadership for action.
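One simple way to picture the second group’s task is to treat each hypothesis from the first group as a set of predicted events and to rank events by how many hypotheses share them. The sketch below does only that; the hypothesis names, event labels and the plain counting rule are illustrative assumptions, not a description of how TIA actually combined predictions.

```python
from collections import Counter


def rank_overlapping_events(hypotheses):
    """Rank predicted events by how many hypotheses include them.

    `hypotheses` maps a (hypothetical) hypothesis name to the set of events
    it predicts. Returns (event, support) pairs, most-supported first.
    """
    support = Counter()
    for events in hypotheses.values():
        support.update(set(events))  # count each event once per hypothesis
    # Highest support first; ties broken alphabetically for a stable order.
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


hypotheses = {
    "model_a": {"event_x", "event_y"},
    "model_b": {"event_y", "event_z"},
    "model_c": {"event_y"},
}
print(rank_overlapping_events(hypotheses))
```

Events predicted by many independent models rise to the top, which matches the article’s point that the output is a range of possibilities rather than a single answer.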
TIA serves as the overarching program for a number of related technology efforts. One of these programs is human identification from a distance, which seeks to develop multimodal biometric technologies to identify terrorists passively at distances of up to 500 feet. Research is focusing on fusing different types of biometric data, such as gait recognition and facial identification. Efforts also are underway to develop multispectral identification systems using visible and infrared light, radar and millimeter-wave bands.
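The article does not say how the biometric data would be fused. A common approach is score-level fusion: each matcher’s score is normalized to a shared range and the results are combined in a weighted sum. The minimal sketch below assumes min-max normalization; the matcher names, score ranges and weights are hypothetical.

```python
def min_max_normalize(score, lo, hi):
    """Map a raw matcher score onto [0, 1] given that matcher's score range."""
    if hi <= lo:
        raise ValueError("score range must satisfy hi > lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse_scores(raw_scores, ranges, weights):
    """Weighted-sum (score-level) fusion of several biometric matchers.

    raw_scores: matcher name -> raw similarity score
    ranges:     matcher name -> (min, max) score range for that matcher
    weights:    matcher name -> relative weight (hypothetical values)
    Only matchers present in raw_scores contribute, and the weights of
    those matchers are renormalized so the result stays in [0, 1].
    """
    fused = 0.0
    total_weight = 0.0
    for name, raw in raw_scores.items():
        lo, hi = ranges[name]
        fused += weights[name] * min_max_normalize(raw, lo, hi)
        total_weight += weights[name]
    if total_weight == 0:
        raise ValueError("no usable matcher scores")
    return fused / total_weight


ranges = {"face": (0.0, 1.0), "gait": (0, 100)}
weights = {"face": 0.7, "gait": 0.3}
print(fuse_scores({"face": 0.8, "gait": 60}, ranges, weights))
print(fuse_scores({"gait": 60}, ranges, weights))  # face not visible
```

Renormalizing over the matchers that actually produced a score lets the system still decide when, for example, a face is turned away but gait is observable.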
The Genysis program focuses on massive database searches and privacy protection. Because the federal government is prohibited from accessing certain types of data, researchers are developing methods to make some information less specific for analysis. The process lets analysts access the data while protecting personal information. For example, if a suspect were being studied, personal data such as name, address and other identifying information would be withheld. All the analyst would know is that this person, who might be given a generic “John Doe” or numeric identifier, bought 500 pounds of fertilizer in November and made numerous calls to the Middle East and Afghanistan. “They [analysts] don’t necessarily need to know who the person is. They are really interested in patterns indicative of terrorist activity,” Popp says.
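The “John Doe” approach amounts to pseudonymization: identifying fields are stripped and replaced with a stable identifier, while behavioral fields remain available for pattern analysis. The sketch below is one way to do this, not Genysis’s actual method. It assumes records are Python dicts with hypothetical field names and uses a keyed hash (HMAC-SHA256), so the same person always receives the same pseudonym but the pseudonym cannot be linked back to a name without the key.

```python
import hashlib
import hmac

# Hypothetical field names; the real Genysis record layout was not published.
IDENTIFYING_FIELDS = ("name", "address", "phone")


def pseudonymize(record, secret_key):
    """Return a copy of `record` with identifying fields replaced by a pseudonym.

    The pseudonym is a truncated HMAC-SHA256 of the identifying fields, so
    records about the same person still line up across datasets, while
    reversing it requires the key (assumed to be held by a separate custodian).
    """
    identity = "|".join(str(record.get(f, "")) for f in IDENTIFYING_FIELDS)
    digest = hmac.new(secret_key, identity.encode("utf-8"), hashlib.sha256)
    masked = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    masked["subject_id"] = "subject-" + digest.hexdigest()[:12]
    return masked


record = {
    "name": "John Smith",
    "address": "1 Main St",
    "phone": "555-0100",
    "purchase": "fertilizer",
    "quantity_lb": 500,
    "month": "November",
}
print(pseudonymize(record, b"demo-key"))
```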
Another technique involves selective revelation: providing only the most essential information to police and law enforcement authorities. Popp notes that if a local police department detains someone for a traffic violation and officers suspect the individual may be involved in terrorism or some other crime, they can query a database to determine whether that person is on any government watch list. The response may be a simple yes or no, or more detail may be provided. “You don’t reveal any privacy information on the person. You just say this guy is suspect; he is on some major watch list. You don’t need to know why; you don’t need to know which list. Just hold him until the appropriate official comes,” he maintains.
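Selective revelation can be pictured as a query interface whose answer depends on the caller’s authorization. In this minimal sketch, a default caller learns only yes or no; a caller granted a higher (hypothetical) detail level also learns which lists matched. The watch-list structure and the two-level scheme are assumptions for illustration.

```python
def watchlist_query(person_id, watchlists, detail_level=0):
    """Answer a watch-list query while revealing as little as possible.

    watchlists:   list name -> set of person IDs (hypothetical structure)
    detail_level: 0 returns only True/False; 1 also returns which lists
                  matched, for callers authorized to see more.
    """
    matches = sorted(name for name, ids in watchlists.items() if person_id in ids)
    if detail_level == 0:
        return bool(matches)  # yes/no only: no reason, no list name
    return {"on_watchlist": bool(matches), "lists": matches}


watchlists = {"list_a": {"p1"}, "list_b": {"p1", "p2"}}
print(watchlist_query("p1", watchlists))     # traffic stop: yes/no only
print(watchlist_query("p2", watchlists, 1))  # authorized official: more detail
```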
In addition to creating methods that make personal data more opaque to human analysts, researchers also are building audit trails into the system. These trails permit a third-party oversight organization, such as the U.S. Congress, to examine records should an investigation be needed.
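An audit trail is useful to an outside overseer only if it cannot be quietly edited. The sketch below assumes a hash-chained, append-only log in which each entry includes the SHA-256 hash of the entry before it. This design choice is an illustration, not the documented TIA mechanism. Altering any recorded entry, or removing one from the middle of the chain, causes `verify()` to fail.

```python
import hashlib
import json
import time


def _hash(body):
    """Deterministic SHA-256 of an entry body (keys sorted before hashing)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only audit trail in which each entry commits to the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def record(self, analyst, query):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"analyst": analyst, "query": query, "time": time.time(), "prev": prev_hash}
        self.entries.append({**body, "hash": _hash(body)})

    def verify(self):
        """Recompute the chain; False if any entry was altered or removed mid-chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev_hash or entry["hash"] != _hash(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.record("analyst_1", "pattern query #1")
log.record("analyst_2", "pattern query #2")
print(log.verify())
```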
The other aspect of Genysis focuses on raw database searches. “Theoretically, any piece of information out in the infospace could have value to process, exploit or touch. There are potentially huge volumes of information that could have value. If an analyst submits a pattern-based query that is indicative of some kind of terrorist activity, that query can span many different information domains,” Popp says.
The challenge for those extracting this data lies in the many different types of legacy databases. The differences extend beyond equipment and software to encompass schemas, formats, business processes and maintenance issues. The technology must be able to incorporate new information sources and database technology seamlessly and cleanly while accommodating legacy systems and making the data transparent to the user community. A further complication is that new databases come online every year, adding to the overall complexity.
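One conventional way to run a single pattern-based query across heterogeneous legacy sources is to write one adapter per source that maps its native schema onto a common record type, and then to apply the query to the combined stream. In the sketch below, the `Transaction` type, both source layouts and the predicate are hypothetical; it shows the adapter pattern, not TIA’s actual architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Common record type that every source is mapped onto (hypothetical schema)."""
    subject_id: str
    domain: str
    detail: str


def from_travel_rows(rows):
    """Adapter for a hypothetical legacy travel system storing positional tuples."""
    for subject_id, _carrier, route in rows:
        yield Transaction(subject_id, "travel", route)


def from_finance_records(records):
    """Adapter for a hypothetical finance system storing keyed records."""
    for rec in records:
        yield Transaction(rec["acct_holder"], "finance", rec["memo"])


def federated_query(sources, predicate):
    """Apply one pattern predicate across every adapted source."""
    return [t for source in sources for t in source if predicate(t)]


travel = from_travel_rows([("S1", "AIRLINE", "ORD-DCA"), ("S2", "RAIL", "NYP-WAS")])
finance = from_finance_records([{"acct_holder": "S1", "memo": "wire transfer"}])
hits = federated_query([travel, finance], lambda t: t.subject_id == "S1")
print([(t.domain, t.detail) for t in hits])
```

With this structure, adding a new database means writing one more adapter; the query code and the analyst’s view of the data stay unchanged.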
To take advantage of the global information pool, analysts must be able to translate foreign-language texts. The Translingual Information Detection, Extraction and Summarization, or TIDES, program is designed to allow English-speaking analysts to search and query larger quantities of foreign-language text than would be possible with human translators alone.
This capability is important because only a handful of analysts and translators are fluent in the languages that currently are relevant to the war on terrorism. With a vast amount of material available in foreign-language text, researchers are developing systems that will permit an analyst to write a query in English that then is translated into one or several languages so the system can scan foreign-language speech and text databases. The results of that query then are translated back into English for the analyst to review. The program currently is working with two foreign languages, Arabic and Chinese.
Another language-based effort is the Effective, Affordable, Reusable Speech-to-Text (EARS) program. It seeks to develop robust speech-to-text transcription technology with an error rate of five to 10 percent for broadcast and conversational speech. This is a challenging goal because state-of-the-art technology currently has an error rate of 40 to 50 percent. Popp notes that at current error levels, most transcribed documents are meaningless for intelligence purposes.
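The article does not say how the error rate is measured. The conventional metric for speech-to-text is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the machine transcript into a correct reference transcript, divided by the number of words in the reference. At 40 to 50 percent, roughly every second word is wrong; at the five to 10 percent target, about one word in 10 to 20 is wrong. The sketch below computes WER with a standard word-level edit distance.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference.

    Uses word-level Levenshtein (edit) distance via dynamic programming.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("the meeting is moved to friday", "the meeting is moved friday"))
```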
Data mining and information-gathering techniques also are under development. An important program within TIA is Evidence Extraction and Link Discovery (EELD), which develops systems that automatically extract information from unstructured and semi-structured data sources, discover relationships and learn patterns of activity.
Popp explains that information found in databases is structured in a tabular format, with columns and rows of data corresponding to different records. However, much of the data found on the World Wide Web is semi-structured. Usually it is written in Hypertext Markup Language and takes the form of text divided by tabs or of labels identifying different fields within a document. Unstructured data often consist of long strings of text with few or no divisions or identifying marks.
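The difference between the two semi-structured forms Popp describes can be seen in how each is parsed. In tab-delimited text, a field’s meaning comes from its position. In labeled text, a label names each field. Unstructured text has neither, so no comparably simple parser exists. The functions and sample data below are hypothetical illustrations.

```python
def parse_tab_delimited(line, field_names):
    """Semi-structured: tab-separated values whose meaning comes from position."""
    return dict(zip(field_names, line.rstrip("\n").split("\t")))


def parse_labeled_fields(text):
    """Semi-structured: 'Label: value' lines, where the label names the field."""
    record = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # lines without a label carry no field name, so skip them
            record[label.strip().lower()] = value.strip()
    return record


print(parse_tab_delimited("Acme Corp\tBoston\n", ["organization", "city"]))
print(parse_labeled_fields("Name: Acme Corp\nCity: Boston\nSee attached notes"))
```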
EELD initially seeks to extract information from semi-structured and unstructured text. Researchers are focusing on search parameters that allow the system to look for entities: key words such as the names of individuals, organizations or affiliations. Once these entities have been discovered, the links between them must be uncovered; these links are derived from the different types of relationships between entities. After the entities and their relationships have been established, the program attempts to develop a pattern-learning mode that will study these relationships and, from them, determine whether other patterns of terrorist behavior exist. “The patterns can be useful because they may be indicative of known terrorist activity used by past or existing groups,” Popp says.
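The first two EELD steps, finding entities and then linking them, can be sketched in simplified form. This version substitutes a fixed list of known names (a hypothetical gazetteer) for learned entity recognition. It also treats two entities as linked when they appear in the same document, with the link weighted by the number of shared documents. EELD aimed at richer, typed relationships, and the pattern-learning step is not shown.

```python
import re
from collections import Counter
from itertools import combinations


def extract_entities(text, gazetteer):
    """Find known entity names (from a hypothetical gazetteer) in free text."""
    found = set()
    for name in gazetteer:
        # Word boundaries keep "Acme Corp" from matching inside a longer word.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.add(name)
    return found


def discover_links(documents, gazetteer):
    """Link entities that co-occur in a document; weight = shared-document count."""
    links = Counter()
    for text in documents:
        entities = sorted(extract_entities(text, gazetteer))
        for pair in combinations(entities, 2):  # every unordered pair, once
            links[pair] += 1
    return links


gazetteer = {"Acme Corp", "J. Smith", "Harbor Trading"}
documents = [
    "J. Smith met with Acme Corp officials.",
    "Acme Corp wired funds to Harbor Trading.",
    "J. Smith later called Acme Corp.",
]
print(discover_links(documents, gazetteer).most_common())
```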
TIA also incorporates biosurveillance programs designed to provide early warning and detection of bioterrorism incidents. This effort is focused largely on mining and studying unconventional data sources, such as over-the-counter sales of medicines and of foods associated with illness, and levels of school absenteeism. Researchers are examining a range of these data sources, coupled with the more conventional information used in epidemiology, to provide early warning and detection of bioterrorism events.
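A basic early-warning signal from a data stream such as daily over-the-counter medicine sales is a spike relative to the recent baseline. The minimal sketch below flags any day whose count sits more than a set number of standard deviations above the mean of the preceding week. The seven-day window, the threshold of 3 and the sample counts are illustrative assumptions, not parameters from the TIA program.

```python
from statistics import mean, stdev


def flag_spikes(daily_counts, baseline_days=7, threshold=3.0):
    """Return indices of days that are unusually high versus a trailing baseline.

    A day is flagged when (count - baseline mean) / baseline stdev exceeds
    `threshold`, where the baseline is the preceding `baseline_days` days.
    """
    flagged = []
    for day in range(baseline_days, len(daily_counts)):
        window = daily_counts[day - baseline_days:day]
        mu = mean(window)
        sigma = max(stdev(window), 1.0)  # floor so a flat baseline can't divide by zero
        if (daily_counts[day] - mu) / sigma > threshold:
            flagged.append(day)
    return flagged


# Hypothetical daily sales counts for one product category in one region.
counts = [20, 22, 19, 21, 20, 23, 21, 45, 22]
print(flag_spikes(counts))
```

In practice, several such signals (sales, absenteeism, clinical reports) would be monitored together, because a spike in any one of them alone is weak evidence.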
Modeling terrorist behavior is another part of the TIA effort. The Wargaming in the Asymmetric Environment (WAE) program focuses on using applied behavioral analysis tools and techniques to better understand, anticipate and predict terrorist attacks. Popp is sanguine about this technology, noting that it shows great promise and already has been used with some success. He explains that WAE draws on techniques from social sciences such as psychology and sociology. By studying which events in a given social, political and economic environment serve as triggers for terrorist behavior, the technology can provide indications and warning of potential future threats and events, he says.
Helping analysts quickly identify and sort information about potential terrorist threats is the goal of the Genoa II program. Like its predecessor, Genoa, the effort is developing ways for intelligence teams to work smarter, faster and more jointly in day-to-day operations. However, Genoa II goes a step further by applying additional automation to team processes to exploit more information and to generate and examine additional hypotheses. These capabilities will permit teams to deal with multiple crises simultaneously. Some of this aid will take the form of software agents that can sort and queue documents by importance before an analyst reads them. The system also will alert the analyst when a document contains a piece of information of interest.
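A very simple version of a document-sorting agent scores each document against weighted terms of interest, orders the queue by score and marks documents above a threshold for an alert. The term weights, threshold and documents below are hypothetical, and keyword weighting stands in for whatever relevance model Genoa II actually used.

```python
import re


def score_document(text, term_weights):
    """Sum the weights of watched terms that appear in a document (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(term_weights.get(w, 0.0) for w in words)


def triage(documents, term_weights, alert_threshold):
    """Order documents by score, highest first, flagging those that merit an alert."""
    scored = sorted(((score_document(d, term_weights), d) for d in documents), reverse=True)
    return [(score, doc, score >= alert_threshold) for score, doc in scored]


term_weights = {"shipment": 2.0, "urgent": 1.0, "fertilizer": 3.0}
documents = [
    "Weekly cafeteria menu",
    "Urgent: fertilizer shipment delayed",
    "Shipment schedule attached",
]
for score, doc, alert in triage(documents, term_weights, alert_threshold=5.0):
    print(f"{score:4.1f}  {'ALERT' if alert else '     '}  {doc}")
```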
Genoa II also will develop cognitive amplifiers to help intelligence teams comprehend complicated and uncertain situations rapidly and fully. The program seeks methods to overlay existing stove-piped, hierarchical organizational structures seamlessly with dynamic, adaptable, peer-to-peer collaborative networks that cut across and complement them. A final aspect of Genoa II is visualization technology that converts the complex analytical structures used by experts into an intelligible story for senior decision makers.