VClarity Voice Platform
VClarity L.L.C.
Voice Platform Snap-in Functional Overview
White Paper
Technical Pre-release Version 2.0 for VClarity Voice Platform
Updated February 12, 2007
Table of Contents

EXECUTIVE SUMMARY
  INTRODUCING...
FUNCTIONAL OVERVIEW
  AND VOICE FOUNDATION CLASS
  VCLARITY VOICEXML 2.0 BROWSER
SPEECH AND TELEPHONY STANDARDS
EXECUTIVE SUMMARY

Many corporations implement speech recognition solutions and telephony systems such as the VClarity Voice Platform to automate their back-office business processes. Often, though, getting information into those systems is tedious: callers may have to repeat several similar-sounding words, which increases the opportunity for errors.

What if customers could speak more naturally by phone to the interactive speech applications running on the VClarity Voice Platform? VClarity speech applications do just that. They allow customers to interact easily with the VClarity Voice Platform, increasing their productivity and enabling them to get timely and accurate information into and out of the system from the convenience of any traditional phone, the device with which they are most familiar.

The VClarity Voice Platform, whether premise-based or hosted, helps companies maximize their investment in telephony technology by delivering the power of the platform to every customer.
FUNCTIONAL OVERVIEW

The VClarity Voice Platform is a programmable environment that encapsulates highly accurate speech recognition engines, natural-sounding Text-to-Speech engines, and telephony interfaces. It provides a vendor-independent interface to the underlying software components. It is primarily structured as a set of COM components, which expose various properties and methods to application developers, and a powerful VoiceXML browser.

The Voice Platform consists of the following components:

1. Voice Server
2. VXML 2.0 Browser
3. Automated Speech Recognition Engine (ASR)
4. Text To Speech Engine (TTS)
5. Telephony (TDM/SIP)

VOICE FOUNDATION CLASS

The Voice Foundation Classes are a set of COM objects and libraries that can be used to develop custom speech applications. Developers can use these libraries from VB and VC++ to deliver their own speech solutions quickly and so meet their time-to-market requirements.

[Figure: Web Technologies; Application Layer; Speech Recognition Interface; Text to Speech Interface; Telephony Interface]
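The idea of a vendor-independent interface over interchangeable engines can be illustrated with a minimal sketch. It is written in Python for brevity rather than as COM components, and every class and method name in it is hypothetical; it does not reproduce the actual Voice Foundation Class API.

```python
from abc import ABC, abstractmethod


class SpeechRecognizer(ABC):
    """Hypothetical ASR interface; a real adapter would wrap a vendor engine."""

    @abstractmethod
    def recognize(self, audio: bytes) -> str:
        """Return the text recognized in the caller's audio."""


class SpeechSynthesizer(ABC):
    """Hypothetical TTS interface; a real adapter would wrap a vendor engine."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return audio for the given text."""


class EchoRecognizer(SpeechRecognizer):
    """Stand-in engine for this sketch: treats the audio bytes as UTF-8 text."""

    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoSynthesizer(SpeechSynthesizer):
    """Stand-in engine for this sketch: returns the text as UTF-8 bytes."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class VoiceServer:
    """Application code depends only on the interfaces, not on a vendor."""

    def __init__(self, asr: SpeechRecognizer, tts: SpeechSynthesizer) -> None:
        self.asr = asr
        self.tts = tts

    def handle_turn(self, caller_audio: bytes) -> bytes:
        # Recognize what the caller said, then speak a confirmation back.
        heard = self.asr.recognize(caller_audio)
        return self.tts.synthesize(f"You said: {heard}")


server = VoiceServer(EchoRecognizer(), EchoSynthesizer())
reply = server.handle_turn(b"balance")
```

Swapping in a different engine would mean writing another adapter class; under this assumption the application code in VoiceServer would not need to change.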
VCLARITY VOICEXML 2.0 BROWSER

VoiceXML is a markup language that simplifies the implementation of speech user interfaces, much as HTML does for GUIs. VoiceXML uses a web model of application development: a web server provides VoiceXML documents to a voice browser, which renders the speech user interface. User input can be submitted back to the web server, just as a web browser submits form data. All of the benefits of traditional web-based development apply to VoiceXML (scalability, ease of use, location transparency, and so on).

The fact that VoiceXML is based on XML is an added advantage. Many web sites currently use XML and XSLT to render content targeted at different browsers such as IE4, IE5, and WAP. This approach can be extended to include voice browsers.

VClarity Voice Platform is a high-performance, carrier-grade telephony server that is responsible for call control and speech recognition. A call is processed as follows:

1. When a call is received, it is routed to one of the Gateway servers.
2. The Voice Platform, running on the Gateway Server, receives the call via telephony hardware (digital Dialogic cards) or software (SIP) installed on the box and transfers it to the VXML browser.
3. The browser retrieves the appropriate VXML page from a remotely located application server over the Internet or an intranet.
4. The browser parses the VXML page and executes it by calling various functions exposed by the Voice Platform.

The underlying ASR engine performs all speech recognition, and the underlying TTS engine performs all Text-to-Speech conversion.
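To make the parse-and-execute step more concrete, the following minimal sketch parses a small VoiceXML 2.0 page and lists the actions a browser would carry out. The sample page, the example.com URL, the grammar file name, and the summarize function are all assumptions made for illustration; the sketch does not show how the VClarity browser is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 specification.
VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical page that an application server might return.
SAMPLE_PAGE = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="balance">
    <field name="account">
      <prompt>Please say your account number.</prompt>
      <grammar src="digits.grxml" type="application/srgs+xml"/>
    </field>
    <block>
      <submit next="http://example.com/balance" namelist="account"/>
    </block>
  </form>
</vxml>
"""


def summarize(page: str) -> list[tuple[str, str]]:
    """Return the (action, detail) steps implied by each form, in order."""
    root = ET.fromstring(page.strip())
    steps = []
    for form in root.iter(f"{{{VXML_NS}}}form"):
        for item in form:
            tag = item.tag.split("}", 1)[-1]  # drop the namespace prefix
            if tag == "field":
                # A field plays its prompt (TTS) and then listens (ASR).
                prompt = item.find(f"{{{VXML_NS}}}prompt")
                text = (prompt.text or "").strip() if prompt is not None else ""
                steps.append(("prompt-and-listen", text))
            elif tag == "block":
                # A block holds executable content such as <submit>.
                for submit in item.iter(f"{{{VXML_NS}}}submit"):
                    steps.append(("submit", submit.get("next", "")))
    return steps


steps = summarize(SAMPLE_PAGE)
```

In this sketch the field corresponds to a TTS prompt followed by an ASR recognition, and the submit sends the collected value back to the web server, mirroring the web model described above.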
VoiceXML Browser is compliant with the working draft of the v2.0 Specification.

W3C Tag Compliance

The following table summarizes W3C compliance of VoiceXML Browser. For some elements, the level of support depends on the capabilities of the TTS engine.

Element
<assign>      <audio>       <block>       <break>
<catch>       <choice>      <clear>       <disconnect>
<div>         <else>        <elseif>      <emphasis>
<enumerate>   <error>       <exit>        <field>
<filled>      <form>        <goto>        <grammar>
<help>        <if>          <initial>     <link>
<log>         <menu>        <meta>        <noinput>
<nomatch>     <object>      <option>      <paragraph> & <p>
<param>       <phoneme>     <prompt>      <prosody>
<record>      <reprompt>    <return>      <say-as>
<script>      <sentence> & <s>            <subdialog>
<submit>      <throw>       <transfer>    <value>
<var>         <voice>       <vxml>
VoiceXML Property     Initial Setting Specified by
completetimeout       VoiceXML Complete Timeout parameter
incompletetimeout     VoiceXML Incomplete Timeout parameter
interdigittimeout     VoiceXML Inter-Digit Timeout parameter
bargeintype           VoiceXML Bargein Type parameter
timeout               VoiceXML Timeout parameter
audiomaxage           VoiceXML Audio Max Age parameter
audiomaxstale         VoiceXML Audio Max Stale parameter
documentmaxage        VoiceXML Document Max Age parameter
documentmaxstale      VoiceXML Document Max Stale parameter
grammarmaxage         VoiceXML Grammar Max Age parameter
grammarmaxstale       VoiceXML Grammar Max Stale parameter
objectmaxage          VoiceXML Object Max Age parameter
objectmaxstale        VoiceXML Object Max Stale parameter
scriptmaxage          VoiceXML Script Max Age parameter
scriptmaxstale        VoiceXML Script Max Stale parameter
fetchaudiodelay       VoiceXML Fetch Audio Delay parameter
fetchaudiominimum     VoiceXML Fetch Audio Minimum parameter
fetchtimeout          VoiceXML Fetch Timeout parameter
inputmodes            dtmf voice
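Timing properties such as timeout, completetimeout, and interdigittimeout take VoiceXML time designations: a non-negative number followed by "s" or "ms". The sketch below shows one way such values could be normalized to milliseconds. The function name and the example settings are hypothetical and are not the platform's actual initial values.

```python
import re

# A time designation: a non-negative number followed by "ms" or "s".
_TIME = re.compile(r"^(\d+(?:\.\d+)?|\.\d+)(ms|s)$")


def to_milliseconds(value: str) -> int:
    """Convert a time designation such as '5s' or '250ms' to milliseconds."""
    match = _TIME.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid time designation: {value!r}")
    number, unit = match.groups()
    factor = 1000 if unit == "s" else 1
    return round(float(number) * factor)


# Hypothetical settings, chosen only to exercise the parser.
example_settings = {
    "timeout": "5s",
    "completetimeout": "1.5s",
    "interdigittimeout": "250ms",
}
in_ms = {name: to_milliseconds(v) for name, v in example_settings.items()}
```

Normalizing to a single unit up front keeps later comparisons and timer setup free of unit handling.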
Information Element                        Standard VoiceXML Session Variable
Local URI                                  session.connection.local.uri
Remote URI                                 session.connection.remote.uri
Connection Protocol                        session.connection.protocol.name
Connection Protocol Version                session.connection.protocol.version
Connection Originator                      session.connection.originator
Application-to-Application Information     session.connection.aai
Redirecting Paths                          session.connection.redirect
ANI Information Digits                     session.connection.protocol.<name>
Version                                    session.connection.protocol.version
ANI                                        session.telephone.ani
DNIS                                       session.telephone.dnis
IIDigits                                   session.telephone.iidigits
UUI                                        session.telephone.uui
CallerId (VClarity custom property)        session.callid
Channel Identifier (VClarity custom property)  session.channelid
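The dotted session variable names form a hierarchy. As a minimal sketch, the call details known to a gateway can be held as a nested structure and flattened into those names. The flatten helper and all of the sample values (URIs, numbers, channel) are hypothetical and serve only to show the naming scheme.

```python
def flatten(prefix: str, tree: dict) -> dict:
    """Flatten a nested dict into dotted variable names under prefix."""
    flat = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            flat.update(flatten(name, value))  # descend into sub-objects
        else:
            flat[name] = value
    return flat


# Hypothetical details for one inbound SIP call.
call = {
    "connection": {
        "local": {"uri": "sip:5551000@gateway.example.com"},
        "remote": {"uri": "sip:5552000@carrier.example.net"},
        "protocol": {"name": "sip", "version": "2.0"},
    },
    "telephone": {"ani": "5552000", "dnis": "5551000"},
    "channelid": 3,
}

session = flatten("session", call)
```

A VoiceXML page would read these values through expressions such as session.telephone.ani; the flat dictionary here only illustrates which names would be populated.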
TEXT TO SPEECH AND SPEECH RECOGNITION

Text To Speech

Our TTS software provides natural-sounding voices that can read any kind of dynamic data and prompts in your server-based speech applications. Developed in partnership with Cepstral, the new, high-quality voices support VClarity's market leadership in quality and efficiency as well as in pronunciation accuracy and natural timbre.

VClarity supports TTS and application development in the following languages:

- Arabic (Traditional, Egyptian)
- English (US, UK)
- Spanish (Americas Spanish)
- French (Canadian)
- Italian
- German
- Other languages available upon request

Automatic Speech Recognition

Our ASR is next-generation speech recognition technology for speech-enabled applications. It is speaker-independent and reliably recognizes large-vocabulary continuous speech, even in the noisiest environments, such as wireless calls. Provided in partnership with LumenVox, the ASR currently powers services that handle millions of calls every day, such as fully automated directory assistance services, voice portals, and other speech applications.

VClarity supports ASR and application development in the following languages:

- English (US, UK, AUS, NZ)
- French (Canadian)
- Spanish (Americas Spanish, Mexican)
TELEPHONY

VClarity uses two telephony interfaces:

Dialogic Cards

Dialogic offers a wide range of cards that integrate telephones and computers. These cards are typically used in automated telephone platforms to provide products and services including Interactive Voice Response, predictive dialing, PC and Windows-based PBX systems, conferencing services, and VOIP gateways.

VClarity SIP Stack

The VClarity SIP Stack is a small foundation class written in C++ that provides an embedded telephony interface to the VClarity Voice Platform. It offers scalability, robustness, ease of use, and an affordable replacement for telephony hardware. The VClarity SIP Stack works with most VOIP providers.

Audio codecs supported:

- Speex
- iLBC 30ms
- iLBC 20ms
- GSM
- G726-32
- G711-Mulaw
- G711-Alaw
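When a call arrives over SIP, both sides must agree on a codec they have in common. The sketch below shows the basic idea of picking the first offered codec that appears in the supported list. The function name and the name-based matching are simplifying assumptions; a real SIP stack negotiates codecs through SDP payload descriptions, and this is not a description of the VClarity SIP Stack's internals.

```python
# Codec names as listed in this document.
SUPPORTED_CODECS = [
    "Speex",
    "iLBC 30ms",
    "iLBC 20ms",
    "GSM",
    "G726-32",
    "G711-Mulaw",
    "G711-Alaw",
]


def choose_codec(offered, supported=SUPPORTED_CODECS):
    """Return the first offered codec that is supported, or None.

    Matching is by case-insensitive name, which is an assumption of this
    sketch; the caller's preference order is respected.
    """
    known = {name.lower(): name for name in supported}
    for name in offered:
        match = known.get(name.lower())
        if match is not None:
            return match
    return None


# Hypothetical offer from a VOIP provider, in the provider's preference order.
selected = choose_codec(["G729", "g711-mulaw", "GSM"])
```

If no offered codec is supported the function returns None, and in that case a gateway would typically reject the call rather than guess.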
VClarity Voice Platform is a speech recognition and text-to-speech framework and operating system of integrated, adaptable, and scalable business solutions that enables you and your people to make business transactions with greater confidence. VClarity Voice Platform works seamlessly, automating and streamlining financial, customer relationship, and supply chain processes in a way that allows you to drive business success.

U.S. and Worldwide: 1-714-333-9465
www.vclarity.com

The information contained in this document represents the current view of VClarity L.L.C. on the issues discussed as of the date of publication. Because VClarity must respond to changing market conditions, this document should not be interpreted to be a commitment on the part of VClarity, and VClarity cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. VCLARITY MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of VClarity L.L.C.

VClarity may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2007 VClarity L.L.C. All rights reserved.

PART NO. 2007-0001 (02/07)