Speaking Up On the Internet

June 2002
By Henry S. Kenyon

Voice XML standard ready for mainstream use.

A voice-recognition protocol may be on the verge of widespread market acceptance. Developed by a consortium of major telecommunications and technology firms, the standard creates a set of programming rules that can be easily incorporated into existing telephone and wireless networks.

The ubiquity of cellular telephones, personal digital assistants and other communication devices is creating a quandary for engineers—what is the best way to access the glut of information available on the Internet? Previous attempts to develop wireless browsers have had lackluster results in North America and Europe, experts say. However, they predict that mobile communications will become more versatile if users can easily access data or messages through natural speech.

In the late 1990s, IBM, Motorola, Lucent Technologies and AT&T pooled their resources to develop a speech-recognition technology and ensure interoperability. The results of this effort are the voice extensible markup language (XML) and the formation of a consortium of companies known as the Voice XML Forum to develop standards. According to William A. Dykas, chairman of the board of the Voice XML Forum, the protocol was designed around an open standard to emulate the success of the World Wide Web’s extensible markup language. It allows users to avoid the cost and limitations of proprietary systems by permitting them to modify the software to meet their own needs.

By extending the Web model, the forum taps into a large pool of experienced programmers. “You can help bring them in and not have to train them from the ground up in a new programming language,” Dykas explains. This flexibility goes beyond software because it also theoretically enables applications to be portable between different platforms.

According to Dykas, Voice XML is beginning to seed the marketplace. He cites a growing demand for the protocol in applications such as voice-activated dialing, voice-access to e-mail and, more broadly, in services like voice-enabled unified messaging and unified communications applications. “That’s where the technology is. Systems like text-to-speech and speech recognition for command and control, and text-to-speech to read e-mail are starting to be implemented,” he says.

Although it is poised to enter the wider consumer marketplace, voice recognition took many years to mature, Bernard Elliot, a research director with Gartner Incorporated, Philadelphia, explains. He notes that it is now becoming less of a niche technology as more products and applications enter the market. Because voice XML creates a standard interface and could increase information access portability, it may prod more developers to speech-enable their applications. Elliot notes that organizations using proprietary call-center technologies had difficulty justifying modifications such as speech access. The standard now allows users to modify their systems in a flexible and cost-effective manner, he says.

Voice XML functions like an Internet browser. It does not directly access the data at a dial-in site but connects to a server that gathers the data and presents it, explains Louis Abbruzzesi, chief technology officer with Cambridge Voice Technology, Reston, Virginia. Instead of a browser, sites use gateways. “It is these gateways that receive these documents. As people call in, documents are served up, just as in the Web,” he says.
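The gateway model Abbruzzesi describes can be illustrated with a minimal sketch in Python. It is not drawn from any vendor quoted in this article. It assumes the VoiceXML 2.0 document syntax, and the greeting text, port number and the names build_welcome_document, VoiceDocumentHandler and make_server are hypothetical. An ordinary Web server hands a small voice document to whatever gateway requests it, much as it would hand a page to a browser.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.saxutils import escape

# Namespace used by VoiceXML 2.0 documents (assumed version for this sketch).
VXML_NS = "http://www.w3.org/2001/vxml"


def build_welcome_document(greeting: str) -> str:
    """Return a one-prompt voice document that a gateway could read to a caller."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<vxml version="2.0" xmlns="{VXML_NS}">\n'
        "  <form>\n"
        "    <block>\n"
        # escape() keeps characters such as & and < from breaking the XML.
        f"      <prompt>{escape(greeting)}</prompt>\n"
        "    </block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


class VoiceDocumentHandler(BaseHTTPRequestHandler):
    """Serves the same voice document for every GET request (illustration only)."""

    def do_GET(self):
        body = build_welcome_document(
            "Welcome to the company directory."
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/voicexml+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    """Create, but do not start, a local server a gateway could be pointed at."""
    return HTTPServer(("127.0.0.1", port), VoiceDocumentHandler)

# To try it locally: make_server().serve_forever()
```

In this arrangement the speech recognition and text-to-speech work stays on the gateway. The server only produces documents, which is why existing Web infrastructure can be reused.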

Abbruzzesi predicts that in the near future Voice XML gateways will be available with a variety of options in the same way Web servers are sold today. “I think the only thing standing between that reality and the present is just the adoption of the standard and the existence of all these new platforms,” he says. This abundance of systems and patchy compliance resembles difficulties faced by early versions of Navigator and Internet Explorer, which would not perform uniformly on different Web pages. “That’s where you are with Voice XML. Everybody’s trying to catch up to the standard, and in doing so, they may support parts of it or extensions before they’re fully compliant,” he says.

Competition is increasing as more firms develop gateways. Abbruzzesi explains that the field is growing with incumbent interactive voice response (IVR) companies scrambling to offer the voice-recognition component of Voice XML. Many firms with call centers also are considering the choice of moving to the standard or continuing to invest in proprietary IVR systems.

Voice XML can perform all of the functions of IVR-based systems, including touch-tone entry for some applications. However, the real power lies in the speech-recognition component, Abbruzzesi explains. While a keypad only provides a vocabulary of 12 buttons, speech recognition offers a potential vocabulary of 50,000 entries drawn from a database and compiled in real time. “In other words, you call up your company directory and ask for Bob. Tomorrow you get a new employee. All you should really have to do is add the entry in the database without having to write any additional code. That’s pretty nifty stuff,” he says.
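The company directory scenario lends itself to a short sketch, again in Python and again an illustration under stated assumptions, not a description of any product mentioned here. It assumes the recognizer accepts grammars in the W3C speech recognition grammar XML format, which the article does not name, and it uses a hypothetical employees table in an in-memory SQLite database. The point is that the list of recognizable names is rebuilt from the database each time, so adding a row is the only change needed.

```python
import sqlite3
from xml.sax.saxutils import escape

# Namespace of the W3C speech recognition grammar format (an assumption here).
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def load_names(conn: sqlite3.Connection) -> list:
    """Read the current directory entries from the hypothetical employees table."""
    rows = conn.execute("SELECT name FROM employees ORDER BY name")
    return [row[0] for row in rows]


def build_directory_grammar(names: list) -> str:
    """Build a grammar that accepts exactly one of the given names."""
    if not names:
        # A <one-of> list needs at least one alternative.
        raise ValueError("the directory must contain at least one name")
    items = "\n".join(f"      <item>{escape(name)}</item>" for name in names)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<grammar xmlns="{SRGS_NS}" version="1.0" xml:lang="en-US"\n'
        '         mode="voice" root="employee">\n'
        '  <rule id="employee" scope="public">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>\n"
    )


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway in-memory directory holding a single employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT NOT NULL)")
    conn.execute("INSERT INTO employees (name) VALUES ('Bob')")
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(build_directory_grammar(load_names(conn)))
    # A new hire is only a new database row; the grammar code is unchanged.
    conn.execute("INSERT INTO employees (name) VALUES ('Alice')")
    print(build_directory_grammar(load_names(conn)))
```

A production system with tens of thousands of entries would likely cache or precompile the grammar instead of regenerating it on every call, but the separation between data and code stays the same.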

Voice-recognition techniques also are gaining government and commercial clients. David Tso, director of business development at Avaya Unified Communications, Milpitas, California, notes that applications such as collaboration, conferencing, message management, directory services and interaction management can be voice-enabled. He sees a growing market among Avaya’s Fortune 500 customers who have very distinct and specific needs for employees working in virtual offices, mobile services or at their desks. “We think there is a significant percentage of mobile and virtual workers in a large enterprise. That’s where the excitement is being generated for speech access,” he says.

Unified communications is another example of voice-recognition technology. “We see unified communications as building connections to those different applications and bringing them together with the Avaya communication application services. That’s a very powerful solution because a person not only gets functions such as messages, conferencing and calendars, but they are also able to receive information that’s directly relevant to customer needs,” Tso says.

Another potential growth area is using voice browsers to access data. But Tso is cautious. Though the possibilities are great, he notes that the market is still nascent in terms of large technology deployments.

Voice XML permits an Avaya communication gate’s user to access data with natural spoken language. For example, a traveling salesperson can receive notification about urgent messages. Professionals can arrange conferencing and create business contact lists. With Avaya’s Speech Access application and Unified Communications Center, employees can check their calendars with spoken commands to arrange meetings. “They can be in virtually any situation where they can have cellular telephone access, and they can accomplish that task,” Tso explains.

Speech recognition also is useful in government applications requiring high levels of customization or security. Tso notes that voice identification systems are more responsive than asynchronous communications. Speech access provides increasing capabilities to get to data and information securely in real time, he explains.

These applications can be used in an emergency response situation. For example, they could be linked to an emergency management database containing key personnel contacts, Tso says. By using telephone or wireless systems, users could access services such as assistance, conferencing and information residing on proprietary databases.

Like Avaya, Verascape Incorporated provides VeraServe platforms and turnkey solutions for its customers. The Oakbrook Terrace, Illinois-based firm’s products make it easy for customers to deploy Voice XML applications. The company’s main product provides speech recognition, text-to-speech, Voice XML interpreters and line cards to connect to telephony networks in an integrated, fully redundant package, explains James Seidman, Verascape’s vice president of engineering. One component connects to telephone lines and translates the signals into voice over Internet protocol. A call director function, which he describes as the brains of the system, provides all of the call routing, manageability and configuration functions. A speech server houses the speech-recognition engine, the Voice XML interpreter and a text-to-speech server.

A typical installation consists of many components, and Seidman explains that this is due to processing requirements. The more telephone lines that an organization supports, the more processing support it needs. The system also is redundant—if any given component fails, the remaining units will keep functioning, he says.

Multimodal communications are another area where Voice XML and speech-recognition technologies are making progress. This approach lets users access data in more than one format. For example, a user could request flight information over a cellular telephone by voice command and have itineraries listed on the device’s screen in text. Another variation is keying in a request for driving directions on a personal digital assistant (PDA) and having an automated system read the directions to the user via the PDA or cellular telephone.

Seidman believes that Voice XML is becoming a mainstream application. “Voice XML has two huge advantages. First, you can actually go to a lot of different people who can program for you. The other advantage is for things like customer self-service applications. Integrating IVR systems with your legacy databases can be a major project in itself, let alone creating stuff that’s going to interact with the customer. The nice thing about Voice XML is that it’s served up from exactly the same types of Web and application servers that you use for your legacy databases,” he says.

Another key player in the Voice XML market is Nuance. The Menlo Park, California-based company provides speech-recognition software that companies like Avaya embed in their platforms, says John Shea, Nuance’s vice president for marketing and management. He notes that all Nuance products feature a Voice XML interpreter that permits clients to write their own applications using the standard.

Local and state governments are beginning to use automated voice-recognition technology for their departments of motor vehicle systems and information lines, Shea observes. At the federal level, major agencies such as the Internal Revenue Service are replacing agents with automated voice systems in areas such as tax advice. A major benefit is reducing the number of personnel staffing telephones because automated systems can process the bulk of the calls. Customer satisfaction also increases because callers can speak naturally without the button pushing required by a traditional IVR system, he says.

Voice recognition also has security aspects that have caught the attention of the Federal Aviation Administration (FAA), Federal Bureau of Investigation and National Security Agency. The application in question is voice printing, which can be used as an access-control system in restricted areas.

Shea notes that the FAA is developing a global profiling system for all U.S. travelers. Voice authentication could be used to identify people before they board an aircraft, he speculates. This type of system could help eliminate a requirement for personal identification numbers and other codes in many applications, he says.


Additional information on Voice XML is available on the World Wide Web at www.voicexml.org, www.nuance.com, www.verascape.com and www.avaya.com.
