DIALOGUE MANAGEMENT & VOICEXML, PART I

Similar documents
Special Lecture (406) Spoken Language Dialog Systems VoiceXML: Dialogs, Forms and Fields

Voice Extensible Markup Language (VoiceXML)

VClarity Voice Platform

Voice Foundation Classes

VoiceXML Application Development Recommendations

Speaker Verification in BeVocal VoiceXML

Special Lecture (406) Spoken Language Dialog Systems Introduction to VoiceXML

INTRODUCTION TO VOICEXML FOR DISTRIBUTED WEB-BASED APPLICATIONS

LABORATORY 117. Intorduction to VoiceXML (3)

LABORATORY 117. Intorduction to VoiceXML

VoiceXML Reference Version 1.6 June 2001

EVALITA 2009: Loquendo Spoken Dialog System

Niusha, the first Persian speech-enabled IVR platform

Avaya Media Processing Server VXML Browser User Guide

SR Telephony Applications. Designing Speech-Only User Interfaces

BeVocal VoiceXML Tutorial

An Approach to VoiceXML Application Modeling

Exam Express Exam EE0-411 voice xml application developer exam Version: 5.0 [ Total Questions: 118 ]

Cisco CVP VoiceXML 3.0. Element Specifications

Web Architectures. Goal of Architecture Design. Architecture Design. Single Server Configuration. All basic components installed on same machine

Voice Browser Working Group (VBWG) Input on application backplane topics. Scott McGlashan (HP) Rafah Hosn (IBM)

Menu Support for 2_Option_Menu Through 10_Option_Menu

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

Authors Martin Eckert Ingmar Kliche Deutsche Telekom Laboratories.

VoiceXML Tutorial. Felix Burkhardt, written around 2002

Form. Settings, page 2 Element Data, page 7 Exit States, page 8 Audio Groups, page 9 Folder and Class Information, page 9 Events, page 10

Deliverable D ASFE-DL: Semantics, Syntaxes and Stylistics (R3)

Version 2.6. SurVo Advanced User s Guide

Multimodality with XHTML+Voice

SurVo. Stepping Through the Basics. Version 2.0

Speech Applications. How do they work?

SCXML. Michael Bodell.

Script Step Reference Information

MRCP Version 1. A.1 Overview

Back-end Avaya Aura Experience Portal and SIP-enabled Avaya Contact Center Select using a Play and Collect sample application

Position Statement for Multi-Modal Access

Application Notes for InfoTalk-Vbrowser 3.0 with Avaya Aura Communication Manager and Avaya Aura Session Manager 6.3 Issue 1.0

EE Voice xml application developer exam.

SEER AKADEMI LINUX PROGRAMMING AND SCRIPTINGPERL 7

PRACTICAL SPEECH USER INTERFACE DESIGN

A NOVEL MECHANISM FOR MEDIA RESOURCE CONTROL IN SIP MOBILE NETWORKS

VoiceXML. Installation and Configuration Guide. Interactive Intelligence Customer Interaction Center (CIC) Version 2016 R4

ATTENDANT USER GUIDE

Introducing the VoiceXML Server

SALT. Speech Application Language Tags 0.9 Specification (Draft)

Dialogue systems. Volha Petukhova Saarland University

11/26/2008. CSG 170 Round 10. Prof. Timothy Bickmore. Quiz

AT&T Connect Participant Application User Guide Integrated Edition Version 8.9 January 2010

NEAXMail AD-64 VOICE/UNIFIED MESSAGING SYSTEM User Guide

A Technical Overview: Voiyager Dynamic Application Discovery

Genesys Meeting Center User Guide v4.11.7

How to create dialog system

Media Resource Control Protocol v2

Application Notes for Nuance OpenSpeech Attendant with Avaya Voice Portal Issue 1.0

NEAXMail AD-40 User Guide

From video conversation 2. This is a gap fill exercise and can be used as either a quiz/test of

Starting-Up Fast with Speech-Over Professional

Getting Started with Exchange Unified Messaging

Hosted Fax Mail. Blue Platform. User Guide

Component Model for Multimodal Web Applications

ESD.051 / Engineering Innovation & Design

IHF 1500 Bluetooth Handsfree Kit. Troubleshooting. and. Frequently Asked Questions Guide (FAQ) Version 1.0 Dec. 1st, 2006 All Rights Reserved

CallPilot Multimedia Messaging

Extracting domain knowledge for dialogue systems from unstructured Web pages

Integrated Conference Bridge Professional

Intro to XML. Borrowed, with author s permission, from:

Unified Meeting User Guide

A Scripting Language for Multimodal Presentation on Mobile Phones

Tools and Toolkits for Voice and Animated Character based Interventions. Overview

Auto Attendant Guide - Proprietary Nextera Communications. Auto Attendant Guide

MPML: A Multimodal Presentation Markup Language with Character Agent Control Functions

Back-end Avaya Aura Experience Portal and SIP-enabled Avaya Aura Contact Center using Context Creation

Holly5 VoiceXML Developer Guide Holly Voice Platform 5.1. Document number: hvp-vxml-0009 Version: 1-0 Issue date: December

MRCP. Google SR Plugin. Usage Guide. Powered by Universal Speech Solutions LLC

Installing And Programming The Digital Voice Announce Equipment On The DXP, DXP Plus, And FX Series Systems

Formatting documents for NVivo, in Word 2007

MRCP. Kaldi SR Plugin. Usage Guide. Powered by Universal Speech Solutions LLC

Application Notes for Beijing InfoQuick SinoVoice Speech Technology (SinoVoice) jtts with Avaya Interactive Response Issue 1.0

Evaluation of a Mixed-Initiative Dialogue Multimodal Interface

IP Office Voic Pro

Grid4 s SmartCOMM Hosted IP Complete. Auto Attendant User Guide

Using control sequences

Application Notes for Deploying a VoiceXML Application Using Avaya Interactive Response and Audium Studio - Issue 1.0

Application Notes for Beijing InfoQuick SinoVoice Speech Technology (SinoVoice) jtts with Avaya Voice Portal Issue 1.0

CCXML in Action: A CCXML Auto Attendant

Unified Meeting User Guide

CallPilot Mini Reference Guide

Hands-Free Internet using Speech Recognition

Multimodal Dialog Description Language for Rapid System Development

Multilingual Aspects in Speech and Multimodal Interfaces. Paolo Baggia Director of International Standards

4s Instead Of Voice Control

Element Specifications for Cisco Unified CVP VXML Server and Cisco Unified Call Studio Release 11.6(1)

Auto Attendant. Blue Platform. Administration. User Guide

USER REFERENCE MANUAL

MRCP for State Tables

A POLICY-BASED DIALOGUE SYSTEM FOR PHYSICAL ACCESS CONTROL

Aspire Basic Operation (Quick Reference)

VOICEXML-APPLICATIONS FOR E-COMMERCE AND E-LEARNING

Blackboard Collaborate WCAG 2.0 Support Statement August 2016

Media Resource Control Protocol v2 A Tutorial

Transcription:

DIALOGUE MANAGEMENT & VOICEXML, PART I Staffan Larsson Slides adapted from Jurafsky & Martin and Gabriel Skantze, with additions from VoiceXML2.0 document

LINGUISTICS OF HUMAN CONVERSATION (RECAP)

Linguistics of Human Conversation Speech Acts Turn-taking Grounding Conversational Structure Implicature 2017-02-21 3 Speech and Language Processing -- Jurafsky and Martin

Speech Acts Austin (1962): An utterance is a kind of action Clear case: performatives I name this ship the Titanic I second that motion I bet you five dollars it will snow tomorrow Performative verbs (name, second) Austin s idea: not just these verbs 2017-02-21 4 Speech and Language Processing -- Jurafsky and Martin

Each utterance is 3 acts Locutionary act: the utterance of a sentence with a particular meaning Illocutionary act: the act of asking, answering, promising, etc., in uttering a sentence. Perlocutionary act: the (often intentional) production of certain effects upon the thoughts, feelings, or actions of addressee in uttering a sentence. 2017-02-21 5 Speech and Language Processing -- Jurafsky and Martin

5 classes of speech acts: Searle (1975) Assertives: committing the speaker to something s being the case suggesting, putting forward, swearing, boasting, concluding Directives: attempts by the speaker to get the addressee to do something asking, ordering, requesting, inviting, advising, begging Commissives:Committing the speaker to some future course of action promising, planning, vowing, betting, opposing Expressives: expressing the psychological state of the speaker about a state of affairs thanking, apologizing, welcoming, deploring Declarations: bringing about a different state of the world via the utterance I resign; You re fired) 2017-02-21 6 Speech and Language Processing -- Jurafsky and Martin

Turn-taking Dialogue is characterized by turn-taking. A: B: A: B: Resource allocation problem: How do speakers know when to take the floor? Total amount of overlap relatively small (5% - Levinson 1983) Don t pause either Must be a way to know who should talk and when. 2017-02-21 7 Speech and Language Processing -- Jurafsky and Martin

Turn-taking rules At each transition-relevance place of each turn: a. If during this turn the current speaker has selected B as the next speaker then B must speak next. b. If the current speaker does not select the next speaker, any other speaker may take the next turn. c. If no one else takes the next turn, the current speaker may take the next turn. 2017-02-21 8 Speech and Language Processing -- Jurafsky and Martin

Implications of subrule (a) For some utterances the current speaker selects the next speaker Adjacency pairs Question/answer Greeting/greeting Compliment/downplayer Request/grant Silence within adjacency pair is different than silence after A: Is there something bothering you or not? (1.0) A: Yes or no? (1.5) A: Eh B: No. 2017-02-21 9 Speech and Language Processing -- Jurafsky and Martin

Grounding Dialogue is a collective act performed by speaker and hearer Common ground: set of things mutually believed by both speaker and hearer Need to achieve common ground, so hearer must ground or acknowledge speakers utterance. Clark (1996): Principle of closure. Agents performing an action require evidence, sufficient for current purposes, that they have succeeded in performing it Need to know whether an action succeeded or failed 2017-02-21 10 Speech and Language Processing -- Jurafsky and Martin

Clark and Schaefer: Grounding Continued attention: B continues attending to A Relevant next contribution: B starts in on next relevant contribution Acknowledgement: B nods or says continuer like uh-huh, yeah, assessment (great!) Demonstration: B demonstrates understanding A by paraphrasing or reformulating A s contribution, or by collaboratively completing A s utterance Display: B displays verbatim all or part of A s presentation 2017-02-21 11 Speech and Language Processing -- Jurafsky and Martin

A human-human conversation 2017-02-21 12 Speech and Language Processing -- Jurafsky and Martin

Grounding examples Display: C: I need to travel in May A: And, what day in May did you want to travel? Acknowledgement C: He wants to fly from Boston A: mm-hmm C: to Baltimore Washington International [Mm-hmm (usually transcribed uh-huh ) is a backchannel, continuer, or acknowledgement token] 2017-02-21 13 Speech and Language Processing -- Jurafsky and Martin

Grounding Examples (2) Acknowledgement + next relevant contribution And, what day in May did you want to travel? And you re flying into what city? And what time would you like to leave? The and indicates to the client that agent has successfully understood answer to the last question. 2017-02-21 14 Speech and Language Processing -- Jurafsky and Martin

Grounding and Dialogue Systems Grounding is not just a tidbit about humans Is key to design of conversational agent Why? 2017-02-21 15 Speech and Language Processing -- Jurafsky and Martin

Grounding and Dialogue Systems Grounding is not just a tidbit about humans Is key to design of conversational agent Why? HCI researchers find users of speech-based interfaces are confused when system doesn t give them an explicit acknowledgement signal Stifelman et al. (1993), Yankelovich et al. (1995) 2017-02-21 16 Speech and Language Processing -- Jurafsky and Martin

Conversational Structure Telephone conversations Stage 1: Enter a conversation Stage 2: Identification Stage 3: Establish joint willingness to converse Stage 4: First topic is raised, usually by caller 2017-02-21 17 Speech and Language Processing -- Jurafsky and Martin

Conversational Implicature A: And, what day in May did you want to travel? C: OK, uh, I need to be there for a meeting that s from the 12th to the 15th. Note that client did not answer question. Meaning of client s sentence: Meeting Start-of-meeting: 12th End-of-meeting: 15th Doesn t say anything about flying!!!!! What is it that licenses agent to infer that client is mentioning this meeting so as to inform the agent of the travel dates? The assumption that C s utterance is relevant to A s question 2017-02-21 18 Speech and Language Processing -- Jurafsky and Martin

Grice: conversational implicature Implicature means a particular class of licensed inferences. Grice (1975) proposed that what enables hearers to draw correct inferences is: Cooperative Principle This is a tacit agreement by speakers and listeners to cooperate in communication 4 maxims: Relevance: Be relevant Quantity: Do not make your contribution more or less informative than required Quality: try to make your contribution one that is true (don t say things that are false or for which you lack adequate evidence) Manner: Avoid ambiguity and obscurity; be brief and orderly 2017-02-21 19 Speech and Language Processing -- Jurafsky and Martin

DIALOGUE MANAGEMENT

Dialogue System Architecture 2017-02-21 Speech and Language Processing -- Jurafsky and Martin 21

Dialogue Manager Controls the architecture and structure of dialogue Takes input from ASR/NLU components Maintains some sort of state Interfaces with Task Manager Passes output to NLG/TTS modules 2017-02-21 22 Speech and Language Processing -- Jurafsky and Martin

Five architectures for dialogue management Finite State: this lecture (VoiceXML) Frame-based: next lecture (VoiceXML) AI Planning & Plan Recognition Information State Dialogue Systems II course Reinforcement learning / Hidden Information State Dialogue Systems II course 2017-02-21 23 Speech and Language Processing -- Jurafsky and Martin

Brief history of dialogue systems ELIZA (Weizenbaum 1966): text, psychoanalyst SHRDLU (Winograd 1972): text, blocks world TRAINS (Allen et al 1991): speech, virtual world CLSU toolkit (McTear 1994): speech, general platform, finite state Philips Railway Information System (Aust et al 1994): speech over phone, form-filling VoiceXML (W3C 2000): speech, general platform, form-filling Siri (Apple 2009): multimodal, smartphonebased, form-filling (more or less)

VOICE XML: BACKGROUND

Why VoiceXML? A W3C non-proprietary standard Will not change overnight Provides a good introduction to dialogue systems design Clear and explicit functionality specification Widely used in call center applications There are free and open platforms The dialogue capabilities of VoiceXML are comparable to the capabilities of state of the art commercial dialogue management in Siri, Cortana, etc. Deals well with simple, frame-based mixed initiative dialogue. Drawbacks: Not clear if it still being developed Not available in smartphone app format Getting a bit old

Slide adapted from Kenneth G. Rehor The Evolu*on of Early Voice Markup Languages 2000 B. D. Lucas L. Boyer Speech Markup Language J. Ferrans G. Karam N. Klarlund P. Danielsen TM VoxML PML PML J. C. Ramming K. G. Rehor D. A. Ladd 2/96 C. D. Tuckey 11/98 1995 Bell Labs MAWL/PML/PhoneWeb

Slide adapted from Gabriel Skantze What is VoiceXML? Voice extensible Markup Language An XML-based script-language for developing telephonebased dialogue systems A standard developed by W3C Most recent version is 2.0 Working draft of 3.0 A VoiceXML-application is a web application The application resides on a web server and is available over the Internet The dialogue is executed by a voice browser Distributed architecture

Slide adapted from Gabriel Skantze VoiceXML manages: Prompts with synthetic speech Prompts with pre-recorded speech and sounds Recognition of spoken words and phrases Recognition of DTMF (buttons) Voice recordings Control of dialogue flow (menus, error handling, sub dialogue, etc) Control of telephony (call routing, hang-up, etc)

Slide adapted from Gabriel Skantze Distributed architecture HTML HTML browser Web servers Voice XML VoiceXML browser (ASR, TTS)

Slide adapted from Gabriel Skantze Related standards SRGS - Speech Recognition Grammar Specification SISR - Semantic Interpretation for Speech Recognition SSML - Speech Synthesis Markup Language ECMAScript (JavaScript) - the scripting language used in VoiceXML VoiceXML 3.0 (draft): https://www.w3.org/tr/voicexml30/ CCXML - Call Control extensible Markup Language https://www.w3.org/tr/ccxml/ SCXML - State Chart XML https://www.w3.org/tr/scxml/

Slide adapted from Gabriel Skantze XML recap XML = Extensible Markup Language Defined by W3C Often used in web applications Used to encode arbitrary data structures Tree-structured data Domain Specific Languages (such as VoiceXML) Human-readable, Machine-readable Related standards Validation: DTD, XML Schema (XSD) Transformation: XSLT

Slide adapted from Gabriel Skantze XML example: a record collection XML declaration Element node <?xml version="1.0" encoding="utf-8"?> <RecordCollection> <Record id="72930"> <Title>Blonde on Blonde</Title> <Artist>Bob Dylan</Artist> </Record> <Record id="81729"> <Title>Let it bleed</title> <Artist>Rolling Stones</Artist> </Record> </RecordCollection> End tag Start tag Attribute Text node

Slide adapted from Gabriel Skantze XML: Things to remember Each start tag must have a corresponding end tag. <Tag>...</Tag> The attributes must be on the form: attribute= value An empty tag may be formatted as: <Tag/> which is exactly the same thing as: <Tag></Tag> Comments are written as: <!-- this is a comment --> The symbols < and > may not be used in any place except for formatting tags Use < and > instead

Slide adapted from Gabriel Skantze XML applications XML defines how the mark-up should be formatted (but not which tags may be used). For an XML document to be wellformed, it must follow this standard. An XML schema or a DTD (document type definition) describes an XML application. This regulates which tags and attributes may be used and how they may be structured. If an XML document follows the XML schema or DTD, it is valid.

Slide adapted from Gabriel Skantze Non-wellformed XML <?xml version="1.0" encoding="iso-8859-1?> <RecordCollection xmlns= http://www.myrecordcollection.com/ > <Record id="72930"> <Title>Blonde on Blonde</Title> <Artist>Bob Dylan</Artist> </Record> <Record id=81729> <Title>Let it bleed</title> <Artist>Rolling Stones </Record> </RecordCollection> Attribute-values must be stated within quotations End tag is missing for <Artist>

Slide adapted from Gabriel Skantze Non-valid XML <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE RecordCollection SYSTEM "RecordCollection.dtd"> <RecordCollection> <Record id="72930"> <Title>Blonde on Blonde</Title> </Record> <Record id="81729"> <Title>Let it bleed</title> <Artist>Rolling Stones</Artist> </Record> </RecordCollection> This document is well-formed, but not valid! A DTD (Document Type Definition) is specified for this document Artist is missing (required by the DTD)

Slide adapted from Gabriel Skantze Different XML-applications (X)HTML is an XML-application for describing graphical presentation of text and interaction through links and forms. XHTML (fragment) <p> <b>choose an option</b> <br/> <a href="book.html">book</a> <br/> <a href="help.html">help</a> <br/> <a href="end.html">cancel</a> </p> Choose an option Book Help Cancel

Slide adapted from Gabriel Skantze VoiceXML In a similar fashion, VoiceXML describes the dialogue, using grammars, mark-up for speech synthesis, etc. VoiceXML (fragment) <menu> <prompt>what do you want to do? Book, help or cancel?</prompt> <choice next="book.vxml">book</choice> <choice next="help.vxml">help</choice> <choice next="end.vxml">cancel</choice> </menu> What do you want to do? Book... Book

Slide adapted from Gabriel Skantze Hello World <?xml version="1.0" encoding="iso-8859-1"?> <vxml version="2.0" lang="en"> <form> <block> <prompt>hello world</prompt> </block> </form> </vxml>

VOICE XML: ARCHITECTURE AND BASIC CONCEPTS

VoiceXML Architecture Model The implementation platform generates events in response to user actions (e.g. spoken input received, disconnect) and system events (e.g. timer expiration) The VoiceXML interpreter Handles events from the implementation platform Sends requests to, and processes documents from, the Document Server The Document Server processes requests from the VoiceXML Interpreter through the VoiceXML Interpreter Context. produces VoiceXML documents in reply

VoiceXML Interpreter Context The VoiceXML Interpreter Context may detect incoming calls acquire the initial VoiceXML document answer the call while the VoiceXML Interpreter conducts the dialog after the answer The VoiceXML Interpreter Context may also monitor user inputs in parallel with the VoiceXML interpreter. For example, One VoiceXML Interpreter context may always listen for a special escape phrase that takes the user to a high-level personal assistant Another may listen for escape phrases that alter user preferences like volume or text-to-speech characteristics

Basic Concepts Document Transition Session Prompt Dialog Menu Form Link Field grammar Event Error handling Form Interpretation Algorithm (FIA) Form grammar Link Subdialog (later)

Documents and transitions A VoiceXML document (or a set of related documents called an application) forms a conversational finite state machine The user is always in one conversational state, or dialog, at a time. Each dialog determines the next dialog to transition to Transitions are specified using URIs, which define the next document and dialog to use If a URI does not refer to a document, the current document is assumed If it does not refer to a dialog, the first dialog in the document is assumed Execution is terminated when a dialog does not specify a successor, or if it has an element that explicitly exits the conversation

Sessions A session begins when the user starts to interact with a VoiceXML interpreter context; Continues as documents are loaded and processed; Ends when requested by the user, a document, or the interpreter context.

Slide adapted from Gabriel Skantze Prompts: <prompt> Text is read by a speech synthesiser that may be controlled: Emphasis <emphasis>welcome</emphasis> to the travel booking system. Say-as <say-as type= telephone >08663995</say-as> Choice of synthesis <voice gender= female >This is a female voice</voice> Speech Synthesis Markup Language (SSML) W3C standard

Slide adapted from Gabriel Skantze Prompts, cont Recorded audio may be mixed with speech synthesis: <prompt> A black bird sounds like this: <audio src= blackbird.wav /> </prompt> Barge-in may be controlled: <prompt bargein= false > Now you cannot interrupt me </prompt>

Tapered prompts: the count attribute Each field can have one or more prompts. If there is one, it is repeatedly used to prompt the user for the value until one is provided. If there are many, the count attribute can be used to determine which prompts to use on each attempt. Tapered prompts may change with each attempt. Information-requesting prompts may become more terse under the assumption that the user is becoming more familiar with the task. Help messages become more detailed perhaps, under the assumption that the user needs more help. Or, prompts can change just to make the interaction more interesting.

Example: tapered prompts <?xml version="1.0" encoding="utf--8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml > <form id="tapered"> <block> <prompt bargein="false >Welcome to the ice cream survey.</prompt> </block> <field name="flavor"> <grammar mode="voice" version="1.0" root="root"> <rule id="root" scope="public"> <one--of> <item>vanilla </item> <item>chocolate</item> <item>strawberry</item> </one--of> </rule> </grammar> <prompt count="1">what is your favorite flavor?</prompt> <prompt count="3">say chocolate, vanilla, or strawberry.</prompt> <help>sorry, no help is available.</help> </field> </form> </vxml>

Slide adapted from Gabriel Skantze Menu, Forms and Links S: Welcome to the travel booking system. What do you want to do? Book a trip, Speak to an operator, End the call. A: Book a trip S: From where do you want to go? A: From Paris S: Where do you want to go? A: London S: Which day? A: Abort S: Aborting. Welcome to the travel Menu Form Link

Menu Presents the user with a choice of options Transitions to another dialog based on that choice <choice> specifies a choice next attribute specifies where to go if the choice is selected <enumerate/> can be used in prompts to read out all choices

Slide adapted from Gabriel Skantze Menu: <menu> <?xml version="1.0" encoding="iso-8859-1"?> <vxml version="2.0" lang="en"> <menu> <prompt> Welcome to the travel booking system. What do you want to do? <enumerate/> </prompt> <choice next="booking.vxml">book a trip</choice> <choice next="operator.vxml">speak to an operator</choice> <choice next="end.vxml">end the call</choice> </menu> </vxml>

Slide adapted from Gabriel Skantze Menu: <menu> <?xml version="1.0" encoding="iso-8859-1"?> <vxml version="2.0" lang="en"> <menu> <prompt> Welcome to the travel booking system. What do you want to do? <enumerate/> </prompt> <choice next="#booking">book a trip</choice> <choice next="operator.vxml">speak to an operator</choice> <choice next="end.vxml">end the call</choice> </menu> <form name="booking"> </vxml>

Forms and fields Forms define an interaction that collects values for a set of form item variables A form consists of a number of fields For each field, a form item variable is specified by the <name> attribute Example: <field name="origin > Each field may specify a grammar that defines the allowable inputs for that field If a form-level grammar is present, it can be used to fill several fields from one utterance (next lecture) <filled> specifies what to do when a form or field has been filled Variable values are referred to using the value attribute Example: <value expr="origin"/>

Slide adapted from Gabriel Skantze Form: <form> <?xml version="1.0"?> <vxml xmlns="http://www.w3.org/2001/vxml" version="2.1" xml:lang="en-us"> <form> <field name="origin"> <grammar src="grammar.xml#origin"/> <prompt>from where do you want to go?</prompt> </field> <field name="destination"> <grammar src= grammar.xml#destination /> <prompt>where do you want to go?</prompt> </field>... <filled> <prompt> Ok, you want to go from <value expr="origin"/> to <value expr="destination"/> </prompt> <submit next="booking_to.vxml"/> </filled> </form> </vxml>

Slide adapted from Gabriel Skantze Grammar in a <field> A grammar specifies what the user can say Option-list <field name="color"> <prompt>which color?</prompt> <option>red</option> <option>blue</option> <option>green</option> </field> Field type <field name="confirm" type="boolean"> <prompt>please say yes or no</prompt> </field> date, time, boolean, number, currency, etc. Grammar external internal <field name="origin"> <grammar src="grammar.xml#origin"/> <prompt>from where do you want to go?</prompt> </field> <grammar>[hit hold split (double down)]</grammar>

Speech acts and adjacency pairs in VoiceXML? Fields and menus can implement acts of asking Menus are alternative-questions Fields are often wh-questions Possible acts of answering are specified By list of choices in menus By grammar for fields Asking a questions sets up an expectation for an answer = playing a field or menu prompt causes system to listen for possible answers

Slide adapted from Gabriel Skantze FIA: Form Interpretation Algorithm Go through each field in the form, in sequential order. Stop at the first field that is not filled. Play the prompt, start recognition. Fill the fields in the form that the matching grammar specifies. If all fields are filled: Leave the form. Otherwise: reiterate.

Slide adapted from Gabriel Skantze FIA finite state example S: Welcome to the travel booking system. You can now make your reserva=on S: Where are you going? U: I want to go to London S: From where do you want to go? U: From Paris S: Which date do you want to go? U: On the 13th of January S: At what *me do you want to go? U: Three o clock S: Thanks for your reserva*on Dest Origin Date Time London Paris 13th of January Three

FIA and conversational structure The FIA, toghether with a VoiceXML document, specifies possible conversational structures Possible topics and the order in which they may occur

Slide adapted from Gabriel Skantze Form: <filled> Contains instructions to be executed when a <field> or <form> is filled, such as: <prompt> <goto> <if> <throw> <clear> <assign> <form> <field name="origin"> <grammar src="grammar.xml#origin"/> <prompt>from where do you want to go?</prompt> <filled> <prompt> Ok, you want to go from <value expr="origin"/> </prompt> </filled> </field>... <filled> <prompt> Ok, you want to go from <value expr="origin"/> to <value expr="destination"/> </prompt> </filled> </form>

Blocks: <block> A sequence of procedural statements used for prompting and computation, but not for gathering input. <form> <field name="transporttype"> <prompt> Please choose airline, hotel, or rental car. </prompt> <grammar type="application/x=nuance-gsl"> [airline hotel "rental car"] </grammar> </field> <block> <prompt> You have chosen <value expr="transporttype">. </prompt> </block> </form>

Events VoiceXML defines a mechanism for handling events not covered by the basic form mechanism Events are thrown by the platform under a variety of circumstances when the user does not respond: <noinput> doesn't respond intelligibly: <nomatch> requests help: <help> And more (The interpreter also throws events if it finds a semantic error in a VoiceXML document)

Catching events Events are caught by catch elements Each element in which an event can occur may specify catch elements. Catch elements are inherited from enclosing elements In this way, common event handling behavior can be specified at any level, and it applies to all lower levels. Example: <catch event= help >...</catch> Built-in abbrevia*ons for help, noinput, nomatch: <help>...</help>

Slide adapted from Gabriel Skantze Error handling (one aspect of grounding) S: From where do you want to go? A: I want to go from Paris S: Sorry, I didn t understand. From where do you want to go? A: I said Paris! S: Please just say the name of the city. From where do you want to go? A: Paris S: You want to go from Paris.

Error handling: <noinput>, <nomatch> <noinput> I'm sorry, I didn't hear you. <reprompt/> </noinput> <nomatch> I'm sorry, I didn't understand that.<reprompt/> </nomatch> - <noinput> means silence exceeds a timeout threshold - <nomatch> means that the ASR confidence value for utterance is too low - <reprompt/> makes VoiceXML proceed to the next prompt after catching the event (e.g. repeating the previous prompt, or reformulating it) - Error handing can make use of tapered prompting using the count attribute 2017-02-21 67 Speech and Language Processing -- Jurafsky and Martin

Slide adapted from Gabriel Skantze Error handling: <noinput>, <nomatch> <form> <field name="origin"> <grammar src="grammar.xml#top"/> <prompt>from where do you want to go?</prompt> <nomatch count="1"> Sorry, I didn t understand.<reprompt/> </nomatch> <nomatch count="2"> Please just say the name of the city.<reprompt/> </nomatch> <nomatch count="3"> There seems to be a problem, I will connect you to an operator. <goto next="operator.vxml"/> </nomatch> <noinput count="1"> You must say the name of a city.<reprompt/> </noinput> <noinput count="3"> There seems to be a problem, I will connect you to an operator. <goto next="operator.vxml"/> </noinput> </field> </form>

Resulting dialogue C (computer): From where do you want to go? H: Oh right what was the name of that place again C: Sorry I didn t understand. From where do you want to go? H: It s in Norway, um C: Please just say the name of the city. From where do you want to go? H: C: You must say the name of a city. From where do you want to go? H: Um C: There seems to be a problem, I will connect you to an operator.

Grounding in VoiceXML Error handling is part of grounding VoiceXML thus provides some support for dealing with grounding Still, many parts missing, e.g. positive feedback

Help Help events can be included in fields to provide fieldspecific help

Example: directed dialogue with help <form id="weather_info"> <block>welcome to the weather information service.</block> <field name="state"> <prompt>what state?</prompt> <grammar src="state.grxml" type="application/srgs+xml"/> <catch event="help >Please speak the state for which you want the weather. </catch> </form> </field> <field name="city"> <prompt>what city?</prompt> <grammar src="city.grxml" type="application/srgs+xml"/> <catch event="help"> Please speak the city for which you want the weather. </catch> </field> <block> <submit next="/servlet/weather" namelist="city state"/> </block>

Resulting dialogue C (computer): Welcome to the weather information service. What state? H (human): Help C: Please speak the state for which you want the weather. H: Georgia C: What city? H: Macon C: The conditions in Macon Georgia are sunny and clear at 11 AM...

Slide adapted from Gabriel Skantze Variables Variables : Definition (and assignment) <var name="city" expr="'paris'"/> Assignment only <assign name="city" expr="'london'"/> Clearing: <clear namelist="origin time"/> Fields (in a form) are also variables

Control: Jumping to next state using <goto> Useful for building finite state systems <form id="ask_to"> <field name="to"> <prompt> Where are you going? </prompt> <option>london</option> <option>paris</option> <option>new york</option> </field> <filled> <goto next="#ask_day"/> </filled> </form>

Slide adapted from Gabriel Skantze Control: If-then-else Condi*onal branching; also useful for finite state systems <if cond="price > 1000"> <prompt>that was very expensive!</prompt> <elseif cond="price > 700"/> <prompt>that was expensive!</prompt> <elseif cond="price > 500"/> <prompt>that was fairly expensive!</prompt> <else/> <prompt>that was not so expensive!</prompt> </if>