Special Lecture (406) Spoken Language Dialog Systems Introduction to VoiceXML

Similar documents
Special Lecture (406) Spoken Language Dialog Systems VoiceXML: Dialogs, Forms and Fields

INTRODUCTION TO VOICEXML FOR DISTRIBUTED WEB-BASED APPLICATIONS

Speech Applications. How do they work?

Authors Martin Eckert Ingmar Kliche Deutsche Telekom Laboratories.

Speaker Verification in BeVocal VoiceXML

EVALITA 2009: Loquendo Spoken Dialog System

VClarity Voice Platform

VoiceXML Reference Version 1.6 June 2001

LABORATORY 117. Intorduction to VoiceXML

Introducing the VoiceXML Server

VoiceXML Application Development Recommendations

Implementation of ASR4CRM : An Automated Speech- Enabled Customer Care Service System

BeVocal VoiceXML Tutorial

Voice Foundation Classes

Voice Extensible Markup Language (VoiceXML)

Position Statement for Multi-Modal Access

SurVo. Stepping Through the Basics. Version 2.0

Genesys App Automation Platform Deployment Guide. Hardware and Software Specifications

A Technical Overview: Voiyager Dynamic Application Discovery

Version 2.7. Audio File Maintenance Advanced User s Guide

White Paper Subcategory. Overview of XML Communication Technologies

Auto Attendant. Blue Platform. Administration. User Guide

MRCP Version 1. A.1 Overview

About Unified IP IVR. Product names. Summary description of Unified IP IVR. This chapter contains the following:

Application Notes for InfoTalk-Vbrowser 3.0 with Avaya Aura Communication Manager and Avaya Aura Session Manager 6.3 Issue 1.0

SR Telephony Applications. Designing Speech-Only User Interfaces

Avaya Media Processing Server VXML Browser User Guide

Delivering Superior Self Service with Open Standards

Application Notes for Nuance OpenSpeech Attendant with Avaya Voice Portal Issue 1.0

Holly5 VoiceXML Developer Guide Holly Voice Platform 5.1. Document number: hvp-vxml-0009 Version: 1-0 Issue date: December

Record_With_Confirm. Settings

Dynamic Aural Browsing of MathML Documents via VoiceXML

HMI ARCHITECTURE SUMMARY ARCHITECTURE DESCRIPTION

VoiceXML. Installation and Configuration Guide. Interactive Intelligence Customer Interaction Center (CIC) Version 2016 R4

Version 2.6. SurVo Advanced User s Guide

Hosted Fax Mail. Blue Platform. User Guide

Media Resource Control Protocol v2

Web Architectures. Goal of Architecture Design. Architecture Design. Single Server Configuration. All basic components installed on same machine

Internationalizing Speech Synthesis

Form. Settings, page 2 Element Data, page 7 Exit States, page 8 Audio Groups, page 9 Folder and Class Information, page 9 Events, page 10

Phonologies The Voice of Technology

Grandstream Networks, Inc. UCM6xxx Series Follow Me Guide

Avaya Dialog Designer Dialog Designer Developer s Guide

Dialog Designer Call Flow Elements

Integrate Speech Technology for Hands-free Operation

An overview of interactive voice response applications

Speech Tuner. and Chief Scientist at EIG

Menu Support for 2_Option_Menu Through 10_Option_Menu

Record. Settings. Settings, page 1 Element Data, page 5 Exit States, page 5 Audio Groups, page 6 Folder and Class Information, page 6 Events, page 6

e-business on demand Competitive Technical Briefing Enterprise Portals

A Convedia White Paper. Controlling Media Servers with SIP

Unified CVP Migration

Introduction. VoiceXML overview. Traditional IVR technologies limitations

Unified Meeting User Guide

Deploying SALT Telephony Call Control on an e-business Site

Multimodality with XHTML+Voice

VoiceXML for Business Applications: A Survey

Title page. Nortel Mobile Communication Web User Interface User Guide

An Approach to VoiceXML Application Modeling

Loquendo TTS Director: The next generation prompt-authoring suite for creating, editing and checking prompts

NEAXMail AD-64 VOICE/UNIFIED MESSAGING SYSTEM User Guide

Application Notes for Configuring Nuance Speech Attendant with Avaya Aura Session Manager R6.3 and Avaya Communication Server 1000 R7.6 Issue 1.

Chapter 1 Introduction to HTML, XHTML, and CSS

Getting Started with Exchange Unified Messaging

LABORATORY 117. Intorduction to VoiceXML (3)

Niusha, the first Persian speech-enabled IVR platform

PSS Contact Center Capabilities for Genesys PSS Contact Center Capabilities For Genesys

Genesys Meeting Center User Guide v4.11.7

SOFTWARE AND MULTIMEDIA. Chapter 6 Created by S. Cox

Exam Express Exam EE0-411 voice xml application developer exam Version: 5.0 [ Total Questions: 118 ]

Interaction Desktop framework Printed help. PureConnect powered by Customer Interaction Center (CIC) 2018 R1. Abstract

Back-end Avaya Aura Experience Portal and SIP-enabled Avaya Contact Center Select using a Play and Collect sample application

Rev

Rev

VoIP Basics Guide A COMPREHENSIVE GUIDE FOR VOIP BEGINNERS. Call Us at

Configuration Guide. Index. 1. Admin Menu 2. VoiceXML editor 3. System Reports 4. System Settings. About us

InterCall Unified Meeting SM User Guide v4.4

Unified CVP Architecture Overview

HMS Developer Guide HVP Release 5.1. Document number: hvp-hms-0014 Version: 1-0 Issue date: 22 December 2009

Unified Meeting User Guide

A NOVEL MECHANISM FOR MEDIA RESOURCE CONTROL IN SIP MOBILE NETWORKS

Deliverable D ASFE-DL: Semantics, Syntaxes and Stylistics (R3)

Coral Messaging Center for Windows

Abstract. 1. Conformance. 2. Introduction. 3. Abstract User Interface

Version 2.6. Smart Click-to-Call Advanced User s Guide

UCCE Solutions Hands on Cisco Virtualized Voice Browser (CVVB) and Customer Voice Portal (CVP) Features

Rev oneclicktelecom.net

Management of Prompts, Grammars, Documents, and Custom Files

PRACTICAL SPEECH USER INTERFACE DESIGN

Plug-in 3457 User Guide

AGENT VIEW Taking calls

DRAGON FOR AMBULATORY CARE PROVIDERS

GENESYS MEETING CENTER ENTERPRISE EDITION. User Guide

Application Notes for InfoTalk-Recognizer 9.0 with Avaya Aura Experience Portal 6.0 and Avaya Aura Communication Manager 6.2 Issue 1.

EE Voice xml application developer exam.

Appendix A. A Quick Reference to VoiceXML 1.0 Syntax

Telephony Integration

New System Setup Guide

Languages in WEB. E-Business Technologies. Summer Semester Submitted to. Prof. Dr. Eduard Heindl. Prepared by

Transcription:

Special Lecture (406) Spoken Language Dialog Systems Introduction to VoiceXML Rolf Schwitter schwitt@ics.mq.edu.au Macquarie University 2004 1

Today s Program Developing speech interfaces Brief history of VoiceXML VoiceXML GUI versus VUI VoiceXML implementations Grammars Macquarie University 2004 2

Developing Speech Interfaces Speech interfaces can be developed using general-purpose programming languages special-purpose purpose (programming) languages. A special-purpose purpose language such as VoiceXML can simplify application development reduce network traffic separate interaction code from application logic code provide portability and simplicity support prototyping and refinement. Macquarie University 2004 3

Brief History of VoiceXML In 1999, AT&T, IBM, Lucent Technology and Motorola formed the VoiceXML Forum. The goal was to establish and promote VoiceXML for making Internet content available by phone and voice. Each company had previously developed its own markup language. Customers were reluctant to invest in proprietary technology. VoiceXML 1.0 was released in March 2000. VoiceXML 2.0 is a candidate recommendation (March 2004). Macquarie University 2004 4

W3C Voice Browser Working Group Organisations participating in the Voice Browser Working Group: BeVocal,, Canon, Comverse,, France Telecom, Genesys, HeyAnita,, Hitachi, HP, IBM, Intel, IWA/HWG, Loquendo, Microsoft, MITRE, Mitsubishi Electric, Motorola, Nokia, Nortel Networks, Nuance, PipeBeach,, SAP, Scansoft, Snowshore Networks, SpeechWorks,, Sun Microsystems, Syntellect, Tellme Networks, Unisys, Verascape, Vocalocity, VoiceGenie, Voxeo,, and Voxpilot Macquarie University 2004 5

A VoiceXML Example <?xml version="1.0"?> <vxml version="2.0"> <form> <block> <prompt bargein="false">welcome to Ajax Travel. <audio src="http://www.prerecorded.audiofile..."/> </prompt> </block> </form> </vxml> Macquarie University 2004 6

VoiceXML VoiceXML is designed to describe the speech user interface reduces the amount of speech expertise (but not design expertise). VoiceXML documents can be static or dynamically generated by server side code use the same business logic and databases as the visual Web. Note: The form interpretation algorithm of the voice browser drives the interaction between the VoiceXML documents and the user. Macquarie University 2004 7

Missing Features VoiceXML 2.0 does not support learn-to to-speak applications speaker identification and verification. Macquarie University 2004 8

VoiceXML Architecture Phone PSTN Internet Gateway & Voice Server Internet HTTP/VoiceXML Web Server SIP regular phone wireless phone soft phone telephony interface voice browser automated speech recognition text-to to-speech synthesis touchtone audio play/record VoiceXML documents audio files service logic (CGI) transaction processing database interface Macquarie University 2004 9

A VoiceXML Scenario A customer dials the phone number of a travel agent. The VoiceXML gateway receives the call along with information about the dialed number. The VoiceXML gateway searches a database. If successful, it maps the dialed number to a URL. This URL is the location of the agent s main page (ajax.vxml). ( The gateway retrieves the ajax.vxml page together with associated files such as grammars and recorded audio from the HTTP server. These associated files may be cached on the VoiceXML gateway. Macquarie University 2004 10

A VoiceXML Scenario The VoiceXML interpreter parses and executes the VoiceXML document. The interpreter steps through ajax.vxml playing prompts, hearing responses and passing them on to a speech recognition engine. If necessary, additional VoiceXML documents and associated files are retrieved from the HTTP server. Recorded audio is served by specifying the URL of the WAV file. Communications between the voice gateway and the HTTP server follow standard HTTP protocols. Macquarie University 2004 11

More On VoiceXML: An Example Dialog Computer: Computer: Caller: Computer: Caller: Computer: Welcome to Ajax Travel. Please say your name. Sam. Do you want to travel by air, rail, or boat? Rail. You have selected to travel by rail. Macquarie University 2004 12

A VoiceXML Code Fragment <?xml version = "1.0"?> <vxml version = "2.0"> <form> <block> <prompt> Welcome to Ajax Travel. </prompt> </block> <field name = "UserName"> <prompt> Please say your name. </prompt> Macquarie University 2004 13

A VoiceXML Code Fragment <grammar type = "application/srgs+xml" version = "1.0"> <rule id = "auser"> <one-of> <item>fred</item> <item>sam</item> </one-of> </rule> </grammar> Macquarie University 2004 14

A VoiceXML Code Fragment </field> <filled> <goto next = "#travel"/> </filled> </form> <! - transition to another dialog in the current document --> Macquarie University 2004 15

A VoiceXML Code Fragment <menu id = "travel"> <prompt> Do you want to travel by air, rail, or boat? </prompt> <! - choices follow on the next slides --> Macquarie University 2004 16

A VoiceXML Code Fragment <choice next = "#plane"> <grammar type = "application/srgs+xml" version = "1.0"> <rule id = "by_plane"> <item> air </item> </rule> </grammar> </choice> Macquarie University 2004 17

A VoiceXML Code Fragment <choice next = "#train"> <grammar type = "application/srgs+xml" version = "1.0"> <rule id = "by_train"> <item> rail </item> </rule> </grammar> </choice> Macquarie University 2004 18

A VoiceXML Code Fragment <choice next = "#boat"> <grammar type = "application/srgs+xml" version = "1.0"> <rule id = "by_boat"> <item> boat </item> </rule> </grammar> </choice> </menu> Macquarie University 2004 19

A VoiceXML Code Fragment <form id = "train"> <block> <prompt> You have selected to travel by rail. Details for making travel arrangement would be here in a real application </prompt> </block> </form> </vxml> Macquarie University 2004 20

VoiceXML Elements <vxml vxml> top-level element in each VoiceXML document <form> a dialog for presenting information and collecting data <block> a container of (non-interactive) executable code <prompt> queue speech synthesis and audio output to the user <field> declares an input field in a form <filled> an action executed when fields are filled Macquarie University 2004 21

VoiceXML Elements <menu> a dialog for choosing amongst alternative destinations <choice> define a menu item <grammar> specify a speech recognition or DTMF grammar <goto goto> go to another dialog in the same or different document Macquarie University 2004 22

GUI versus VUI Macquarie University 2004 23

GUI versus VUI Characteristic Fonts Pictures Backgrounds Menus Forms Prompting the user User s response GUI (HTML) Choice of multiple sizes, colours, and types Present the picture to the user Choice of multiple colours and patterns Large number of choices Users may enter values in any order Large menu of information and links Click a field and enter a value VUI (VoiceXML ( VoiceXML) Choice of multiple voices Describe the picture with words (caption) Choice of background music or sounds Usually 7 ± 2 choices Users enter values usually in predefined order Limited number of prompts and options Speak the menu choice or field value Macquarie University 2004 24

GUI versus VUI Characteristic Navigation Human memory Dialog style Input control Accuracy of input Event handling Global commands Dates GUI (HTML) Click the link or use the keyboard to enter URL Screen acts as an extension to human memory Mostly user-directed Apply the type constraints Input is always recognized Display the event message Available as a menu in a separate frame Absolute dates are used (28.7.2003) VUI (VoiceXML ( VoiceXML) Speak the option choice Caller is forced to use own mental memory Mostly application-directed Apply the grammar rules Input may not be recognised correctly Present help as a prompt Announced at the beginning of a session Relative dates are used ( next Monday ) Macquarie University 2004 25

VoiceXML Implementations Web-based based VoiceXML development tools: Tellme at http://studio.tellme.com BeVocal at http://café.bevocal.com HeyAnita at http://www.heyanita.com VoiceXML platforms (and graphical development tools): Nuance at http://www.nuance.com OptimTalk at http://www.optimtalk.cz Motorola s Mobile ADK for Voice (old beta version only) Macquarie University 2004 26

Tellme Studio Tellme studio is a suite of Web-based based VoiceXML development tools. Tellme studio enables you to build and test, and publish VoiceXML applications without buying or installing any hardware or software. By registering, you can develop your application for free. But check out first the VoiceXML elements supported by the Tellme voice interpreter. Macquarie University 2004 27

MyStudio VoiceXML scratchpad You can write a phone application using the VoiceXML scratchpad. Application URL Alternatively, you can write a phone application using a text editor and store the result on a Web server. The application URL points to the initial VoiceXML document. VoixeXML terminal You can test the application logic and flow using the VoiceXML terminal. Macquarie University 2004 28

MyStudio Macquarie University 2004 29

VoiceXML Scratchpad Macquarie University 2004 30

Application URL Macquarie University 2004 31

VoiceXML Terminal Macquarie University 2004 32

Grammar Scratchpad The Tellme platform provides two choices when writing grammars: use a built-in in grammar define your own grammar Supported grammar languages are: Nuance Grammar Specification Language (GSL) Speech Recognition Grammar Specification (SRGS) You can execute GSL + SRGS in the VoiceXML scratchpad. But the Tellme grammar tools support GSL grammars only. Macquarie University 2004 33

Grammar Scratchpad: SRGS Macquarie University 2004 34

Grammar Scratchpad: GSL Macquarie University 2004 35

Grammar Phrase Checker Macquarie University 2004 36

Grammar Phrase Checker: Returned Value Macquarie University 2004 37

Grammar Phrase Generator Macquarie University 2004 38

Grammar Phrase Generator: Generated Phrase Macquarie University 2004 39

Connecting to Tellme Studio To preview your application, you can use a phone and call (408)-678 678-4465 or you can use a soft phone and call sip:8005558965@sip.studio.tellme.com Macquarie University 2004 40

Nuance Grammar Specification Language (GSL) Key to understand speech are well-designed grammars. A grammar describes the words and phrases that the recogniser can understand at a specific point in a VoiceXML document. GSL provides a rich set of functionality for writing grammars. But GSL syntax uses characters that are reserved by XML. Therefore, in-line grammars must be protected (CDATA section). Macquarie University 2004 41

GSL Syntax by Example Here is an in-line grammar in GSL format: <grammar type = "application/x-gsl" mode = "voice"> <![CDATA[ [ [(new york) (big apple)] {<destination "new york">} [washington (the capital)] {<destination "washington">} ] ]]> </grammar> Macquarie University 2004 42

GSL Syntax by Example All words within a GSL grammar are lowercase 1. Square brackets define an "or" condition. Parentheses define an "and" condition. [ ] [(new york) (big apple)] {<destination "new york">} [washington (the capital)] {<destination "washington">} 1 Uppercase letters are reserved to reference subgrammars. Macquarie University 2004 43

GSL Syntax by Example This grammar fragment [ ] [(new york) (big apple)] {<destination "new york">} [washington (the capital)] {<destination "washington">} recognises either "New York" or the synonym "Big Apple" or "Washington" or the synonym "The Captial" as destination. Macquarie University 2004 44

GSL Syntax by Example The value of the name attribute of a field in an active form is set to the value returned by the grammar. <form> <field name = "destination"> <prompt>do you want to fly to New York or Washington?</prompt> <grammar type = "application/x-gsl" mode = "voice"> <![CDATA[ [[(new york) (big apple)] {<destination "new york">} [washington (the capital)] {<destination "washington">}] ]]> </grammar> Macquarie University 2004 45

What Happens if the caller does not respond with an predefined expression or the caller does not respond at all? In these cases events are thrown by the platform. Events are caught by the catch element: <catch event = "nomatch noinput"> <reprompt/> </catch> Macquarie University 2004 46

GSL Grammar at Work <?xml version = "1.0"?> <vxml version = "2.0"> <form> <field name = "destination"> <prompt>do you want to fly to New York or Washington?</prompt> <grammar type = "application/x-gsl" mode = "voice"> <![CDATA[ [[(new york) (big apple)] {<destination "new york">} [washington (the capital)] {<destination "washington">}] ]]> </grammar> <catch event = "nomatch noinput"> <reprompt/> </catch> Macquarie University 2004 47

GSL Grammar at Work <filled> <prompt>you said <value expr = "destination"/></prompt> </filled> </field> </form> </vxml> Macquarie University 2004 48

More on GSL Syntax The following grammar fragment [([enroll add] me) (sign me up)] recognises the three sentences: enroll me add me sign me up because an "or" condition is nested within an "and" condition. Macquarie University 2004 49

More on GSL Syntax Variations in input are supported through additional operators. Operators are attached as prefixes to individual words or phrases. s. Operators can be used to indicate that a word or phrase may occur zero or one time (?) zero or more times (*) one or more times (+) Macquarie University 2004 50

Using Operators Use "?" before an expression when it is completely optional. In the following example, the word "me" is optional: [(enroll?me) (sign?me up)] The utterance should not be rejected because the caller neglects to say the word "me". Macquarie University 2004 51

Using Operators Use "*" before an expression when it is optional, but the caller may say it multiple times. In the following example, the word "please" is optional [(*please sign?me up)] but the caller may say this word more than once. Macquarie University 2004 52

Using Operators Use "+" before an expression when it must occur at least one time but may occur more than once. This operator is provided for completeness only. It is not typically used. Macquarie University 2004 53

GSL Grammar for Voice Input <grammar type="application/x-gsl" mode="voice"> <![CDATA[ [ [sales] {<dept "010">} [marketing] {<dept "020">} [engineering] {<dept "030">} [(public relations) (p r)] {<dept "040">} ] ]]> </grammar> Macquarie University 2004 54

GSL Grammar for Touchtone Input <grammar type="application/x-gsl" mode="dtmf"> <![CDATA[ [ [dtmf-1] {<dept "010">} [dtmf-2] {<dept "020">} [dtmf-3] {<dept "030">} [dtmf-4] {<dept "040">} ] ]]> </grammar> Macquarie University 2004 55

Take-Home Message You can develop a VoiceXML application using a Web-based based development environment a VoiceXML platform on a desktop computer. Tellme Studio is a suite of Web-based based VoiceXML development tools enables you to build, test and publish Voice XML applications supports the Nuance Grammar Specification Language (GSL). Macquarie University 2004 56