The SEMAINE API: An open-source research platform for multimodal, emotion-oriented interactive systems. Marc Schröder, DFKI. enterface 2010, Amsterdam

Similar documents
The W3C Emotion Incubator Group

Developing Interactive Embodied Characters using the Thalamus Framework: A Collaborative Approach

Tutorial on Virtual Human System

Virtual Human Creation Pipeline

EMMA: Extensible MultiModal Annotation

Tools and Toolkits for Voice and Animated Character based Interventions. Overview

Authors Martin Eckert Ingmar Kliche Deutsche Telekom Laboratories.

DTask & LiteBody: Open Source, Standards-based Tools for Building Web-deployed Embodied Conversational Agents

Distributed Systems. Messaging and JMS Distributed Systems 1. Master of Information System Management

MPML: A Multimodal Presentation Markup Language with Character Agent Control Functions

Interaction Design and Implementation for Multimodal Mobile Semantic Web Interfaces

Affective Embodied Conversational Agents. Summary of programme Affect and Personality in Interaction with Ubiquitous Systems.

EVENT VERIFICATION THROUGH VOICE PROCESS USING ANDROID. Kodela Divya* 1, J.Pratibha 2

Miraculous-Life Miraculous-Life for Elderly Independent Living

How to create dialog system

WFSTDM Builder Network-based Spoken Dialogue System Builder for Easy Prototyping

Java TM. Message-Driven Beans. Jaroslav Porubän 2007

Building Large Scale Distributed Systems with AMQP. Ted Ross

SmartWeb Handheld Multimodal Interaction with Ontological Knowledge Bases and Semantic Web Services

The Basics MAC. The Main Skype for Business Window. ! Name! Presence indicator. ! Conversations. ! Meetings. ! Calls. ! Contacts

Speech Applications. How do they work?

Web & Automotive. Paris, April Dave Raggett

Introduction to Messaging using JMS

Using control sequences

Workflow, Planning and Performance Information, information, information Dr Andrew Stephen M c Gough

White Paper Subcategory. Overview of XML Communication Technologies

Instant Messaging Interface for Data Distribution Service

groupware chapter 19 Groupware What is groupware? The Time/Space Matrix Classification by Function Time/Space Matrix (ctd)

Agenda. A. Paschke 1, A. Kozlenkov 2 1. RuleResponder Approach Reaction RuleML Prova Semantic Web Rule Engine Use Cases Summary

Integrate Speech Technology for Hands-free Operation

Dialogue systems. Volha Petukhova Saarland University

Introducing the VoiceXML Server

Topics for thesis. Automatic Speech-based Emotion Recognition

VLE architecture and design: Report on design and specifications for the tutors architecture

Loquendo TTS Director: The next generation prompt-authoring suite for creating, editing and checking prompts

The NEPOMUK project. Dr. Ansgar Bernardi DFKI GmbH Kaiserslautern, Germany

Development of AniDB and AniXML Schema for Semi-Autonomous Animation Contents Generation using Korean Fairy Tale Text

A Scripting Language for Multimodal Presentation on Mobile Phones

TestingofScout Application. Ludwigsburg,

A Language-based Approach to Interoperability of IoT Platforms

AUTOMATED CHARACTER ANIMATION TOOL

Blackboard Learn 9.1 Reference Terminology elearning Blackboard Learn 9.1 for Faculty

CS Human Computer Interaction

ICT Dialogue Manager Tutorial: Session 5: Implementation in Soar (II)

EVALITA 2009: Loquendo Spoken Dialog System

Real-time Talking Head Driven by Voice and its Application to Communication and Entertainment

Modularization of Multimodal Interaction Specifications

Exploiting Conversation Structure in Unsupervised Topic Segmentation for s

1 Community Grid Labs, Indiana University. 2 Department of Electrical Engineering & Computer Science, Syracuse University

Voice Foundation Classes

SOFTWARE DESIGN AND DEVELOPMENT OF MUTIMODAL INTERACTION

Virtual Collaboration:

A Novel Unity-based Realizer for the Realization of Conversational Behavior on Embodied Conversational Agents

Vinnie Saini Cloud Solution Architect Big Data & AI

From Discourse Plans to Believable Behavior Generation

Connect Applications and Services Together with the Enterprise Service Bus

Position Statement for Multi-Modal Access

Example Purchase request JMS & MDB. Example Purchase request. Agenda. Purpose. Solution. Enterprise Application Development using J2EE

The paper shows how to realize write-once-run-anywhere for such apps, and what are important lessons learned from our experience.

ACE Chapter 4 review. Name: Class: Date: True/False Indicate whether the statement is true or false.

MOM MESSAGE ORIENTED MIDDLEWARE OVERVIEW OF MESSAGE ORIENTED MIDDLEWARE TECHNOLOGIES AND CONCEPTS. MOM Message Oriented Middleware

Web Services & Axis2. Architecture & Tutorial. Ing. Buda Claudio 2nd Engineering Faculty University of Bologna

W3C Workshop on Multimodal Interaction

Pervasive Computing offers Adaptable Interfaces

Chapter 7: Communication. Organizational Behaviour 5 th Canadian Edition 7-1. Langton / Robbins / Judge Copyright 2010 Pearson Education Canada

1Z Oracle. Java Platform Enterprise Edition 6 Enterprise JavaBeans Developer Certified Expert

Information Systems Software

Web and Automotive W3C Workshop. Renault - DREAM Nov 2012 RENAULT PROPERTY

JAVA COURSES. Empowering Innovation. DN InfoTech Pvt. Ltd. H-151, Sector 63, Noida, UP

Software Guide. Version 3 1

Genesys App Automation Platform Deployment Guide. Hardware and Software Specifications

VClarity Voice Platform

Visual Authoring Tool for Presentation Agent based on Multimodal Presentation Markup Language

M I RA Lab. Speech Animation. Where do we stand today? Speech Animation : Hierarchy. What are the technologies?

A Novel Realizer of Conversational Behavior for Affective and Personalized Human Machine Interaction - EVA U-Realizer -

Narrative Editing of Web Contexts on Online Community System with Avatar-like Agents

Human Interaction Container Paradigm

C15: JavaFX: Styling, FXML, and MVC

Design of a Speech Interface for Augmenting Desktop Accessibility

Concurrent Computing CSCI 201 Principles of Software Development

Media Resource Control Protocol v2

3GPP TS V ( )

W3C on mobile, CSS, multimodal and more

The Embodied Conversational Agent Toolkit: A new modularization approach.

Communication Skills

TE Teacher s Edition PE Pupil Edition Page 1

DYNAMIC CONFIGURATION OF COLLABORATION IN NETWORKED ORGANISATIONS

Naming & Design Requirements (NDR)

Multimodal Applications from Mobile to Kiosk. Michael Johnston AT&T Labs Research W3C Sophia-Antipolis July 2004

SmartKom: Towards Multimodal Dialogues with Anthropomorphic Interface Agents

ETSI TS V ( )

Reusability and Adaptability of Interactive Resources in Web-Based Educational Systems. 01/06/2003

C16a: Model-View-Controller and JavaFX Styling

Voice Control becomes Natural

Automated Integration Testing in Agile Environments

Build, Deploy & Operate Intelligent Chatbots with Amazon Lex

Community TV: An Approach to Interaction for Groups and Single Users on Internet Video

Niusha, the first Persian speech-enabled IVR platform

Voice-controlled Home Automation Using Watson, Raspberry Pi, and Openwhisk

D7.2: Initial functional prototype

Transcription:

The SEMAINE API: An open-source research platform for multimodal, emotion-oriented interactive systems enterface 2010, Amsterdam

Outline The SEMAINE project The SEMAINE API Motivation A component integration framework API architecture Markup interfaces Building new emotion-oriented systems with the SEMAINE API combining existing and new modules Overview of Tutorial 2

The SEMAINE project Sustained Emotionally coloured Machinehuman Interaction using Nonverbal Expression http://www.semaine-project.eu FP7 project, Jan. 2008 Dec. 2010 Aim: build an autonomous, responsive Sensitive Artificial Listener (SAL) system real-time multimodal interaction strong on non-verbal skills cut down on verbal skills for now conversation: chat, not task-oriented 3

The SEMAINE project: Partners Coordinator: DFKI Saarbrücken (M. Schröder) system integration, speech synthesis Queen's University Belfast (R. Cowie) data collection, evaluation Imperial College, London (M. Pantic) facial analysis University of Twente (D. Heylen) dialogue modelling CNRS (C. Pelachaud) ECA system TU München (B. Schuller) ASR, vocal emotion recognition 4

The SEMAINE system 5

The SEMAINE system 6

The SEMAINE system: Components Facial analysis (Windows / Mac, C++) face detector, nod/shake detector Voice analysis (Linux / Mac, C++) feature extraction, keyword spotting, emotion rec. Verbal utterance planning (Java) interpret user behaviour, plan agent behaviour Listener behaviour (Windows, C++) generate backchannels + feedback Speech synthesis (Java) Visual behaviour realisation (Windows, C++) 7

Emotion-oriented interactive systems Emotion-enabled interactive characters MAX FearNot: learning anti-bullying strategies IDEAS4Games Other emotion-related functionality AffectiveDiary emoto: Pen for affective text messages... 8

Current state: island solutions Research systems tend to create ad hoc systems with domain-specific representations Difficult to reuse Difficult to combine parts of existing systems to implement new functionality 9

The benefit of standards Standards are taken for granted in many parts of our life electric voltage (220 V throughout Europe) fuel grade (95 Octane) screw threads... more recently: Document formats (ODF) Users can rely on guaranteed properties without worrying who provides the service and how it is implemented supports interoperability and reuse 10

The SEMAINE API Integrate components across languages and operating systems: Middleware! Message-oriented middleware (MOM) asynchronous sender doesn't need to wait for receiver publish-subscribe to Topics flexible n-to-m message passing currently using Apache ActiveMQ, a JMS server open source, Java/C++, reasonably fast SEMAINE API is a component integration layer on top of the MOM 11

Reasonably fast : ActiveMQ vs. Psyclone Round-trip message routing times on localhost ActiveMQ 100 0.3 55 ms 80 Psyclone 16 408 ms Milliseconds 60 40 Psyclone ActiveMQ 20 0 10 0 10 1, 0 00 10 0 0,0 String length 00 00 0 0 0, 0, 0 0 1 0 1, ActiveMQ between 10 and 50 times faster than Psyclone 12

The SEMAINE API: System architecture message-oriented middleware component meta messenger semaine.callback.* component meta messenger semaine.log.*... message logger semaine.data.* semaine.meta log reader system manager system systemmonitor monitor GUI GUI 13

The SEMAINE API: System monitor 14

Representation formats in the SEMAINE API EMMA w/ EmotionML or BML Interpreters Action proposers Action proposers user state dialog state SemaineML agent state Action Action proposers Actionproposers proposers candidate action FML Action selection action BML w/ SSML and MaryXML Analysers features Feature extractors feature vectors audio data, FAP, BAP Behaviour planner behaviour plan Behaviour realiser behaviour data Player 15

Representation formats in the SEMAINE API EMMA: W3C Extensible Multimodal Annotation markup language container format for user behaviour analysis EMMA transporting EmotionML: <emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0"> <emma:interpretation emma:start="123456789"> <emotion xmlns="http://www.w3.org/2005/incubator/emotion"> <dimensions set="valencearousalpotency"> <arousal value="-0.29"/> <valence value="-0.22"/> </dimensions> </emotion> </emma:interpretation> </emma:emma> 16

Representation formats in the SEMAINE API EmotionML W3C working draft Representation of emotions informed by affective sciences categories, dimensions, appraisals, action tendencies links to triggers, objects, expressive behaviour customizable: choose vocabulary to use 17

Representation formats in the SEMAINE API EMMA: W3C Extensible Multimodal Annotation markup language container format for user behaviour analysis EMMA transporting BML: <emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0"> <emma:interpretation emma:start="123456789"> <bml:head type= NOD xmlns:bml= http://www.mindmakers.org/projects/bml /> </emma:interpretation> </emma:emma> 18

Representation formats in the SEMAINE API SemaineML custom format for domain-specific annotations SemaineML representing dialogue state: <dialog-state xmlns="http://www.semaine-project.eu/semaineml" version="0.0.1"> <speaker who="agent"/> <listener who="user"/> </dialog-state> 19

Representation formats in the SEMAINE API BML: Behaviour Markup Language drive ECA behaviour speech, face, and gesture first, symbolic representation of AV synchronization TTS adds detailed timings 20

Representation formats in the SEMAINE API BML with symbolic time markers: <bml xmlns="http://www.mindmakers.org/projects/bml" id="bml1"> <speech id="s1" language="en-us" text="hi, I'm Poppy." ssml:xmlns="http://www.w3.org/2001/10/synthesis"> <ssml:mark name="s1:tm1"/> Hi, <ssml:mark name="s1:tm2"/> I'm <ssml:mark name="s1:tm3"/> Poppy. <ssml:mark name="s1:tm4"/> <pitchaccent id="xpa1" start="s1:tm1" end="s1:tm2"/> <pitchaccent id="xpa2" start="s1:tm3" end="s1:tm4"/> <boundary id="b1" time="s1:tm4"/> </speech> <gaze id="g1" start="s1:tm1" end="s1:tm4">... </gaze> <head id="h1" start="s1:tm3" end="s1:tm4" type="nod">... </head> </bml> 21

Representation formats in the SEMAINE API BML with symbolic time markers: <bml xmlns="http://www.mindmakers.org/projects/bml" id="bml1"> <speech id="s1" language="en-us" text="hi, I'm Poppy." ssml:xmlns="http://www.w3.org/2001/10/synthesis"> <ssml:mark name="s1:tm1"/> issue with XML namespaces Hi, in current BML draft <ssml:mark name="s1:tm2"/> I'm <ssml:mark name="s1:tm3"/> Poppy. <ssml:mark name="s1:tm4"/> <pitchaccent id="xpa1" start="s1:tm1" end="s1:tm2"/> <pitchaccent id="xpa2" start="s1:tm3" end="s1:tm4"/> <boundary id="b1" time="s1:tm4"/> </speech> <gaze id="g1" start="s1:tm1" end="s1:tm4">... </gaze> <head id="h1" start="s1:tm3" end="s1:tm4" type="nod">... </head> </bml> 22

Representation formats in the SEMAINE API BML with symbolic time markers: <bml xmlns="http://www.mindmakers.org/projects/bml" id="bml1"> <speech id="s1" language="en-us" text="hi, I'm Poppy." ssml:xmlns="http://www.w3.org/2001/10/synthesis"> <ssml:mark name="s1:tm1"/> Hi, <ssml:mark name="s1:tm2"/> I'm <ssml:mark name="s1:tm3"/> non-standard extension Poppy. <ssml:mark name="s1:tm4"/> <pitchaccent id="xpa1" start="s1:tm1" end="s1:tm2"/> <pitchaccent id="xpa2" start="s1:tm3" end="s1:tm4"/> <boundary id="b1" time="s1:tm4"/> </speech> <gaze id="g1" start="s1:tm1" end="s1:tm4">... </gaze> <head id="h1" start="s1:tm3" end="s1:tm4" type="nod">... </head> </bml> 23

Representation formats in the SEMAINE API BML with phone timing for lip synchronization: <bml xmlns="http://www.mindmakers.org/projects/bml" id="bml1"> <speech id="s1" language="en_us" text="hi, I'm Poppy." ssml:xmlns="http://www.w3.org/2001/10/synthesis" mary:xmlns="http://mary.dfki.de/2002/maryxml">... <ssml:mark name="s1:tm3"/> Poppy. <mary:syllable stress="1"> <mary:ph d="0.092" end="1.011" p="p"/> <mary:ph d="0.112" end="1.123" p="a"/> <mary:ph d="0.093" end="1.216" p="p"/> </mary:syllable> <mary:syllable> <mary:ph d="0.141" end="1.357" p="i"/> </mary:syllable> <ssml:mark name="s1:tm4"/>... </bml> 24

Representation formats in the SEMAINE API Some representations available from standards or standards-in-the-making EMMA, EmotionML, BML, SSML Others still missing FML, dialogue state, Implementation provides reality check for a specification feedback from implementation can improve spec 25

Building new emotion-oriented systems SEMAINE API simplifies the integration task abstract away from operating system, programming language and communication issues use standard representation formats where possible provide suitable support for XML handling Three example systems: Hello world Emotion mirror The swimmer's game 26

1. Emotional Hello world Minimal system: text input component dummy text analysis to infer emotional state emoticon output Code is short about 20 lines each 27

1. Emotional Hello world Arousal EmotionML Valence - 0 + + 8-( 8-8-) 0 :-( :- :-) - *-( *- *-) 28

1. Emotional Hello world 1 public class HelloAnalyser extends Component { 2 3 private XMLSender emotionsender = new XMLSender("semaine.data.hello.emotion", "EmotionML", getname()); 4 5 public HelloAnalyser() throws JMSException { 6 super("helloanalyser"); 7 receivers.add(new Receiver("semaine.data.hello.text")); 8 senders.add(emotionsender); 9 } 10 11 @Override protected void react(semainemessage m) throws JMSException { 12 int arousalvalue = 0, valencevalue = 0; 13 String input = m.gettext(); 14 if (input.contains("very")) arousalvalue = 1; 15 else if (input.contains("a bit")) arousalvalue = -1; 16 if (input.contains("happy")) valencevalue = 1; 17 else if (input.contains("sad")) valencevalue = -1; 18 Document emotionml = createemotionml(arousalvalue, valencevalue); 19 emotionsender.sendxml(emotionml, meta.gettime()); 20 } 29

2. Emotion mirror Mimick user emotion Infer user emotion from user speech Represent inferred emotion as an emoticon Benefit of reuse: need to implement only one new component Reuse opensmile emotion recognition from SEMAINE system Reuse EmoticonOutput from Hello World example Implement simple decision component that extracts emotion judgements from recognition output 30

2. Emotion mirror 31

2. Emotion mirror 1 public class EmotionExtractor extends Component { 2 private XMLSender emotionsender = new XMLSender("semaine.data.hello.emotion", "EmotionML", getname()); 3 4 public EmotionExtractor() throws JMSException { 5 super("emotionextractor"); 6 receivers.add(new EmmaReceiver("semaine.data.state.user.emma")); 7 senders.add(emotionsender); 8 } 9 10 @Override protected void react(semainemessage m) throws JMSException { 11 SEMAINEEmmaMessage emmamessage = (SEMAINEEmmaMessage) m; 12 Element interpretation = emmamessage.gettoplevelinterpretation(); 13 List<Element> emotionelements = emmamessage.getemotionelements(interpretation); 14 if (emotionelements.size() > 0) { 15 Element emotion = emotionelements.get(0); 16 Document emotionml = XMLTool.newDocument(EmotionML.ROOT_ELEMENT, EmotionML.namespaceURI); 17 emotionml.adoptnode(emotion); 18 emotionml.getdocumentelement().appendchild(emotion); 19 emotionsender.sendxml(emotionml, meta.gettime()); 20 } 21 } 22 } 32

3. The Swimmer's Game Simple emotional speech-driven game a swimmer needs to reach the river bank, but is pulled by the water towards the waterfall user can cheer up the swimmer with aroused speech Components: emotion detection from speech position computer for swimmer GUI display of swimmer position commentator using TTS output 33

3. The Swimmer's Game 34

3. The Swimmer's Game 35

SEMAINE API: Summary SEMAINE API makes it easy to write new components in Java / C++ Integration simplified by standard representation formats 36

What you will learn in the Tutorial Install the SEMAINE 2.0 system Understand parts involved: MOM ActiveMQ Java components native components distributed system Write a SEMAINE component in Java/C++ Create a new emotion-oriented system from new and existing components 37