Computer Language Seeks Deeper Meaning
Semantic web technology offers smarter, more efficient data searches, information sharing.
U.S. Defense Department researchers are developing software that may be capable of accurately understanding the nuances of human language. The technology promises to greatly enhance a spectrum of computer-based systems—from commercial Web browsers and personal virtual assistants to advanced intelligence gathering and command and control systems.
Machines are good at reading text, but current systems can only track individual key words and are unable to determine the information’s context, experts say. This often leads to information overload as vast amounts of only slightly related data are presented to the user. A system that understands the shades of meaning and context in written text would be a highly efficient information-gathering and decision-making tool.
Using an approach called semantic web, researchers seek to create networks where information can be easily understood by machines and humans. According to Dr. Geoffrey P. Malafsky, president of Technology Intelligence International LLC, a Burke, Virginia-based technology consulting and intelligence firm, this requires a general purpose representation and markup language that conveys information about machine accessible semantics. This code can be organized into taxonomies and expressed as abbreviated ontologies—the knowledge of the ways and means of dealing with specific terminology and facts.
Semantic web-based applications could touch everything encompassed by information technology, Malafsky explains, because developers are attempting to create an underlying level to permit information technology to process information intelligently. “It really means that the machines become more capable of understanding the language within information rather than treating it as a simple ensemble of words, which they mostly do right now,” he says.
The effort seeks to solve the language understanding issues of the World Wide Web. This undertaking includes all large-scale information systems such as enterprise portals, knowledge management and content management systems. These principles also can be applied to military operations in advanced command, control, communications, computers, intelligence, surveillance and reconnaissance (C4ISR) systems and information fusion. “We want the system to be able to analyze and make sense out of large volumes of information in a way similar to how humans can. Right now, they [machines] do a poor job of it,” Malafsky maintains.
A major difficulty encountered when searching for data on the World Wide Web or other networks, for example, is the sheer volume of information and the fluidity of human language, where words have multiple meanings depending on their context. “We teach children to read and understand by context. And that context could be in the next sentence, or it could be four paragraphs down the page. But we spend a lot of time teaching people to understand the meaning, based upon the greater context. Even then, there are nuances where people have to get together and spend hours debating its meaning,” he says.
The most sophisticated information technology systems currently treat words as single entities and perhaps conduct searches based on sentence structure. But this is still a rudimentary approach. “What we really want to do now, in a machine-readable way, is put in some of what we know occurs in language. What that means is, you want to find a machine-readable way to include context and relationships, and to have it defined explicitly,” he explains.
For example, instead of having a Hypertext Markup Language (HTML) tag on a name that might say “author,” a metadata tag written in Extensible Markup Language (XML) can describe that particular writer’s specialty and publication. XML provides more accurate and unambiguous data than does HTML. Metadata tagging explicitly indicates the relationship of a particular word. Malafsky notes that this is a major difference between simply providing key words or even having a simple taxonomy because the tags help refine the overall context of a search.
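The difference can be made concrete with a minimal sketch in Python. The element names (article, author, specialty, publication) and the sample records are hypothetical, invented here for illustration and not taken from any real schema. The sketch shows how explicit metadata lets a program select articles by an author's specialty instead of by a bare name match.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata records; tag names and values are illustrative only.
SAMPLE = """
<articles>
  <article title="Quiet Propulsion Trends">
    <author name="J. Smith">
      <specialty>naval engineering</specialty>
      <publication>Example Naval Review</publication>
    </author>
  </article>
  <article title="Garden Planning Basics">
    <author name="J. Smith">
      <specialty>horticulture</specialty>
      <publication>Example Garden Weekly</publication>
    </author>
  </article>
</articles>
"""


def titles_by_specialty(xml_text, specialty):
    """Return titles whose author metadata lists the given specialty."""
    root = ET.fromstring(xml_text)
    matches = []
    for article in root.findall("article"):
        # The specialty tag states the relationship explicitly,
        # so no guessing from the author's name is needed.
        if article.findtext("author/specialty") == specialty:
            matches.append(article.get("title"))
    return matches


print(titles_by_specialty(SAMPLE, "naval engineering"))
```

A key word search for "J. Smith" would return both records; the tagged version separates them because the context is stated in the data itself.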
Semantic web technology is still primarily in the research and development phase. But commercial and government communities have launched major efforts to move it into practical applications. Malafsky observes that the Defense Department and intelligence communities are pushing for systems based on semantic web principles.
The military is especially interested in tactical C4ISR systems with this capability because future warfighting visions involve an increased operational tempo that leaves less time for data collection and analysis, resulting in greater uncertainty during an operation. “We don’t have the luxury now of having a lot of analysts making sense of the information and calling back somewhere to get data. You want the systems to automate it, do it in real time, and come up with a good answer—even though we don’t know what a good answer might be,” he says.
For example, a commander conducting antisubmarine operations against a Russian-made diesel submarine is unsure of its location. Instead of conducting a random search, the commander accesses data collected by satellite. A semantic web system offers a more clue-based approach that automatically processes, sifts and filters incoming data, and its search parameters are not limited to simple key words. Instead, the system hunts for information based on context. “Did the satellite show something on the surface of the ocean? Can it then automatically correlate that to weather patterns and to a historical database of known operational patterns for this type of submarine, and then link to what is known about the submarine’s commander? That’s an awful lot of uncertainty and context that goes into trying to pull those pieces together,” he observes.
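The kind of correlation described in this scenario can be sketched very roughly in Python. Everything in the sketch is an assumption made for illustration: the Contact fields, the sea-state threshold and the equal weighting of clues are hypothetical and do not describe any fielded system. It simply ranks satellite contacts by how many independent pieces of context support them.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """A hypothetical surface contact reported by satellite imagery."""
    contact_id: str
    sea_state: int                  # illustrative 0-9 scale for local weather
    in_known_patrol_area: bool      # agrees with historical operating patterns
    matches_commander_habits: bool  # agrees with what is known of the commander


def context_score(contact, max_sea_state=4):
    """Count how many independent context clues support the contact."""
    clues = [
        contact.sea_state <= max_sea_state,
        contact.in_known_patrol_area,
        contact.matches_commander_habits,
    ]
    return sum(clues)


def rank_contacts(contacts):
    """Order contacts from most to least supported by context."""
    return sorted(contacts, key=context_score, reverse=True)


# Hypothetical sample data.
contacts = [
    Contact("A", sea_state=6, in_known_patrol_area=False, matches_commander_habits=False),
    Contact("B", sea_state=2, in_known_patrol_area=True, matches_commander_habits=True),
    Contact("C", sea_state=3, in_known_patrol_area=True, matches_commander_habits=False),
]
print([c.contact_id for c in rank_contacts(contacts)])
```

A real system would have to weigh uncertain and conflicting evidence instead of counting yes-or-no clues, which is the hard part Malafsky describes.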
Malafsky adds that a major push is underway to embed semantic web-based inference technology into command and control and C4ISR systems and across the World Wide Web for the intelligence community. But implementation is currently limited by two factors—the exact nature of language and how to break language rules down so that they are consistently reproducible and can be taught to a machine.
Another difficulty is the volume of information available. A general Web search about diesel submarines may produce a million documents, of which 10,000 are actually about the desired subject area. Although a highly trained and experienced person could differentiate between these texts on the same topic, this difference is too subtle for existing technology, so it becomes noise, Malafsky says.
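The scale of the problem is easy to quantify from the figures in this example. The short Python sketch below computes the share of returned documents that are on topic, a measure usually called precision; the function name is our own.

```python
def precision(relevant_returned, total_returned):
    """Fraction of returned documents that are actually on topic."""
    if total_returned <= 0:
        raise ValueError("total_returned must be positive")
    return relevant_returned / total_returned


# Figures from the example: a million results, 10,000 of them on topic.
share = precision(10_000, 1_000_000)
print(f"{share:.0%} relevant, {1 - share:.0%} noise")
```

In other words, roughly 99 of every 100 documents returned would be off topic before any finer distinction among the remaining 10,000 is even attempted.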
He notes that a number of programs at the Defense Advanced Research Projects Agency (DARPA) are developing automated systems and ontologies that can break down sentences into knowledge bases. Although they work well in the laboratory, they still cannot function in a real battlefield scenario or conduct a single day’s worth of open-source intelligence gathering. “So you really cannot apply any of these advanced tools to this situation and come out with anything meaningful because we have not determined the rules and algorithms by which we can truly differentiate [language and context] and then automate that function,” he says.
DARPA research currently involves inference and data-sifting efforts such as Evidence Extraction and Link Discovery, Genoa and the Total Information Awareness program (SIGNAL, February, page 43). The DARPA Agent Markup Language (DAML) program seeks new ways to develop language to explain and visualize information on the Web. Researchers indicate that HTML does not efficiently facilitate software programs’ ability to find or interpret information. Metadata tags are currently written in XML, which permits more accurate and unambiguous information on a tag. But XML is limited in its ability to describe relationships between objects. DAML is being developed as an extension of XML to create ontologies and to mark up information so that it is machine readable and understandable.
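What it means to describe relationships between objects can be illustrated with a small Python sketch. This is not DAML syntax; the class names, the relationship names and the vessel identifier are hypothetical. Each fact is stored as a subject, relationship and object, and a simple routine infers class memberships that were never stated directly.

```python
# Each fact is a (subject, relationship, object) triple.
# All names below are hypothetical and are not DAML vocabulary.
FACTS = {
    ("DieselSubmarine", "subclass_of", "Submarine"),
    ("Submarine", "subclass_of", "NavalVessel"),
    ("vessel-17", "instance_of", "DieselSubmarine"),
}


def superclasses(cls, facts):
    """Return every class reachable from cls by following subclass_of links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in facts:
            if subject == current and relation == "subclass_of" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def classes_of(instance, facts):
    """Return the stated classes of an instance plus all inferred superclasses."""
    direct = {o for s, r, o in facts if s == instance and r == "instance_of"}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, facts)
    return direct | inferred


print(sorted(classes_of("vessel-17", FACTS)))
```

Only one fact mentions vessel-17, yet the program can conclude that it is also a submarine and a naval vessel. An ontology language aims to let software draw inferences of this kind from explicitly declared relationships.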
Semantic web-based technologies still have a long way to go before they can become reliable intelligence and command and control tools, warns Malafsky. It is not simply a matter of applying a computer to solve a problem, but determining the context of a situation based on existing knowledge and inferences. While this can be easily understood in retrospect, detecting an event before it occurs with a high degree of certainty is extremely difficult. “So it’s adding that understanding into what we’re trying to do. But it’s hard when you scale it up to really high volumes,” he says.
Additional information on the semantic web concept and DARPA Agent Markup Language is available on the World Wide Web at http://www.w3.org and http://www.daml.org.
Navy Plugs Into Semantic Web
The U.S. Navy is developing technologies for its new enterprise information technology systems based on the semantic web concept. The service is undertaking a major program to Web-enable its commands. Among these efforts is Task Force Web, which is designed to create a single point of access for users afloat and ashore.
According to Anh Ta, principal scientist with The MITRE Corporation, Tysons Corner, Virginia, Task Force Web is developing enterprise architectures for the Navy with the goal of Web-enabling many commands by 2004 (SIGNAL, June 2002, page 23). An important component of this work is persuading commands to embed metadata tags in their content as they establish their systems, he says. As they go online, business processes can be modified or new ones created to suit the technology.
Ta cites the example of duty station assignments as a business process that can be enhanced by semantic web techniques. Toward the end of a rotation, a sailor can select a new post based on criteria such as service history, training and quality of life issues. Detailers then access databases that list available assignments and their requirements; however, they currently must access several different systems to collect all of the necessary information.
The Task Force Web team created a prototype assignment matching system by using an ontology that captured all of an entrant’s related job qualifications. Existing databases were linked to create classifications covering issues such as quality of life. “If I’m married, and I get reassigned, does my wife need a job? Do I have any children? Do they belong in day care or require special medical facilities?” he asks.
The prototype system includes all of the career data on a single screen and could be operated by individual officers or sailors without assistance from a detailer. After a user logs into the system, a software agent pulls requested information and places it on the screen.
Two categories of data are shown. The first consists of jobs available to individuals based on their qualifications, career requirements and quality of life preferences. The second category lists additional positions that are available with training and a list of training courses that can be accessed by clicking on a link. Although the system worked well, Ta cautions that it is just a prototype and that the Task Force Web effort is still in its very early stages. “What we’re doing within this system is trying to look at the business process and where to use semantic web technology to horizontally integrate data to make it all seamless,” Ta says.
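The two categories of results can be sketched in a few lines of Python. The sketch is only a guess at the general idea, not the Task Force Web prototype: the job titles, skill names and function are hypothetical, and it leaves out career requirements and quality of life preferences. It sorts jobs into those a sailor qualifies for now and those that would open up with further training.

```python
def match_assignments(sailor_skills, jobs):
    """Split jobs into those open now and those open after training.

    jobs maps a job title to the set of skills it requires.
    Returns (qualified_now, needs_training), where needs_training maps
    each remaining job title to the sorted list of missing skills.
    """
    qualified_now = []
    needs_training = {}
    for title, required in jobs.items():
        missing = required - sailor_skills
        if missing:
            needs_training[title] = sorted(missing)
        else:
            qualified_now.append(title)
    return sorted(qualified_now), needs_training


# Hypothetical data for illustration only.
JOBS = {
    "Sonar Technician": {"sonar", "electronics"},
    "Network Administrator": {"networking", "electronics"},
    "Instructor": {"sonar", "teaching"},
}
now, later = match_assignments({"sonar", "electronics"}, JOBS)
print(now)
print(later)
```

The list of missing skills for each job in the second category corresponds to the training courses the prototype links to.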