Situated Conversational Interaction

Similar documents
Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing. Interspeech 2013

Conversational Knowledge Graphs. Larry Heck Microsoft Research

USING A KNOWLEDGE GRAPH AND QUERY CLICK LOGS FOR UNSUPERVISED LEARNING OF RELATION DETECTION. Microsoft Research Mountain View, CA, USA

Larry Heck Contributors

Rapidly Building Domain-Specific Entity-Centric Language Models Using Semantic Web Knowledge Sources

Re-contextualization and contextual Entity exploration. Sebastian Holzki

Learning Semantic Entity Representations with Knowledge Graph and Deep Neural Networks and its Application to Named Entity Disambiguation

Empowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia

Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform

KNOWLEDGE GRAPH INFERENCE FOR SPOKEN DIALOG SYSTEMS

Prediction and Selection of Sequence of Actions Related to Voice Activated Computing Systems

Human Robot Interaction

(12) Patent Application Publication (10) Pub. No.: US 2014/ A1

Key differentiating technologies for mobile search

Ontology driven voice-based interaction in mobile environment

Using context information to generate dynamic user interfaces

Multimodal Applications from Mobile to Kiosk. Michael Johnston AT&T Labs Research W3C Sophia-Antipolis July 2004

From HyperTEXT to HyperTEC

Available online at ScienceDirect. Procedia Computer Science 87 (2016 ) 12 17

Information Retrieval

Digital Newsletter. Editorial. Second Review Meeting in Brussels

Dragon TV Overview. TIF Workshop 24. Sept Reimund Schmald mob:

TABLE OF CONTENTS. SECTION 01 How do i access my Yulio account? 03. SECTION 02 How do i start using Yulio? 04

Towards Efficient and Effective Semantic Table Interpretation Ziqi Zhang

WebEx Participant Guide

The Connected World and Speech Technologies: It s About Effortless

Dolby Voice Room Admin Guide

Gesture Recognition and Voice Synthesis using Intel Real Sense

Real-time Talking Head Driven by Voice and its Application to Communication and Entertainment

Wikipedia and the Web of Confusable Entities: Experience from Entity Linking Query Creation for TAC 2009 Knowledge Base Population

Columbia University (office) Computer Science Department (mobile) Amsterdam Avenue

Columbia University High-Level Feature Detection: Parts-based Concept Detectors

Enhancing applications with Cognitive APIs IBM Corporation

NUS-I2R: Learning a Combined System for Entity Linking

A Hierarchical Domain Model-Based Multi-Domain Selection Framework for Multi-Domain Dialog Systems

A Few Things to Know about Machine Learning for Web Search

VOICE ENABLING THE AUTOSCOUT24 CAR SEARCH APP. 1 Introduction. 2 Previous work

Beyond Ten Blue Links Seven Challenges

LW2730 & LW2930 LIVE SD+ Series - Frequently Asked Questions

Voice. Voice. Patterson EagleSoft Overview Voice 629

Semantics Isn t Easy Thoughts on the Way Forward

Development of Mobile Search Applications over Structured Web Data through Domain-Specific Modeling Languages. M.Sc. Thesis Atakan ARAL June 2012

Introduction to Text Mining. Hongning Wang

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

On-line glossary compilation

Microsoft Cloud Workshop. Intelligent Analytics Hackathon Learner Guide

Making an on-device personal assistant a reality

DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TORONTO CSC318S THE DESIGN OF INTERACTIVE COMPUTATIONAL MEDIA. Lecture Feb.

Getting Started for Moderators Quick Reference Guide

Windows Movie Maker / Microsoft Photo Story Digital Video

Multimedia Information Retrieval at ORL

Three Key Engines of A Conversational Bot. Dr. Ming Zhou Principal Researcher Microsoft Research Asia Aug. 26, 2016

Chapter 19: Multimedia

Text, Knowledge, and Information Extraction. Lizhen Qu

Voice activated spell-check

Semantic Video Indexing

A Kinect Sensor based Windows Control Interface

(P6) "Everyone will benefit from that," cofounder and CEO Alex Lebrun says.

Joining Collaborative and Content-based Filtering

Towards Summarizing the Web of Entities

Alberto Messina, Maurizio Montagnuolo

Papers for comprehensive viva-voce

Maximum Likelihood Beamforming for Robust Automatic Speech Recognition

Also, it will give you an idea of how easily students can use video to create their own engaging multimedia projects:

Cisco WebEx Training Center on the Mac OS Getting Started. Join a Session. Schedule a Session. Start a Session. Connect to the Audio Conference

Implementation of Kinetic Typography by Motion Recognition Sensor

MIKE: a Multimodal Cinematographic Editor for Virtual Worlds

International Journal of Combined Research & Development (IJCRD) eissn: x;pissn: Volume: 1; Issue: 2; June 2013

Overview of Web Mining Techniques and its Application towards Web

COLLABORATE INTERFACE QUICK START GUIDE

Enhanced retrieval using semantic technologies:

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017

Action Items SYSTEM REQUIREMENTS

WebEx Network Recording Player User Guide

Disconnecting the application from the interaction model

End-User Evaluations of Semantic Web Technologies

Welcome to the topic of SAP HANA modeling views.

TRENTINOMEDIA: Exploiting NLP and Background Knowledge to Browse a Large Multimedia News Store

How to Build Optimized ML Applications with Arm Software

User Configurable Semantic Natural Language Processing

CIS 408 Internet Computing. Dr. Sunnie Chung Dept. of Electrical Engineering and Computer Science Cleveland State University

August 2012 Daejeon, South Korea

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig

What is this Song About?: Identification of Keywords in Bollywood Lyrics

Data Mining Concepts & Tasks

Exploiting the Social and Semantic Web for guided Web Archiving

ABSTRACT 1. INTRODUCTION

Progress Report: Smart Mirror 1

Leader Guide. Sysco e-meeting

Towards a basis for human-computer dialogue. Bengt Nordström Chalmers, Göteborg

Actions on Landing Pages

Experience Security, Risk, and Governance

CIW: Site Development Associate. Course Outline. CIW: Site Development Associate. ( Add-On ) 26 Aug 2018

A personal digital assistant as an advanced remote control for audio/video equipment

Making always-on vision a reality. Dr. Evgeni Gousev Sr. Director, Engineering Qualcomm Technologies, Inc. September 22,

PRELIMINARY MEETING PREPARATION For the best quality and experience during your WebEx Meeting, you should have the following:

Blackboard Collaborate Ultra 2018 UT DALLAS USER MANUAL

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

ARTutor & Moodle. Athens, 1 2 December 2017

Transcription:

Situated Conversational Interaction LARRY HECK Global SIP 2013 C O N T RIBUTORS : D I L E K H A K K A N I- T UR, PAT R I C K PA N T E L D E C E M B E R 2 0 1 3 T W E E T C O M ME N T S / Q U E S T I O N S : # I E E E G LO BALSIP

Situated Conversational Interaction Leverage the situation or context of the user to create a conversational natural user interfaces to the world's knowledge

Modes of Situated Conversations Human computer interaction Augmented multi-party interaction Show me funny movies OK, funny movies Which ones have Tina Fey? Got it, with Tina Fey that are funny Kinect Skeletal Tracking Play Date Night Personal Assistant for Phones Send e-mail to Larry about the demo Wake me up tomorrow at 6 Other Screens: Look for Portlandia clips on hulu How about places with outdoor dining?

Conversational Browser SITUATED INTERACTION WITH THE WEB OF DOCUMENTS AND APPS

Creating a Web Scale Conversational Browser Situated Interactions: Dynamically Adapt to the Page Content Browsing to a new web page or App affects ASR and SLU ASR: Dynamic adaptation of LMs to ngrams of page content 16% ERR in WER SLU: Add 100s of click intent actions to static SLU Multi-tiered logic determines final intent

Creating a Web Scale Conversational Browser Situated Interactions: Multimodal Processing Multimodal Sensor for Study: Kinect o RGB camera o Depth sensor o Multi-array microphone running proprietary software Kinect enables full-body 3D motion capture, facial recognition and voice recognition

Creating a Web Scale Conversational Browser Situated Interactions: Multimodal Processing Lexical Click Intent Gesture Click Intent Combining Intents

Experimental Setup Living Room: users seated 5-6 feet away from TV Two Data Collections Set #1: 8 speakers over 25 sessions o 2,868 user turns o 917 (31.9%) with click intent Set #2: 7 speakers over 14 sessions o 1,101 user turns o 284 (25.8%) with click intent

Gesture Intent Model Results Studied errors in gesture intent model o False Accepts (random arm movement triggers intent) o Miss (system misses pointing intent) Removed all display control utts (e.g., scroll up ) 558 remaining turns (ones with/without hand gesture) False Accepts Miss

Multimodal Click Intent Detection Simulated vs Real Gestures Real Gesture ~10% ERR Simulated (always pointing) ~50% ERR What is the gesture accuracy of humans? [16.4 28.6] pixels Do humans point every time they speak? Humans only user gesture 30.0% of the time

To dig deeper Larry Heck, et al, Multimodal Conversational Search and Browse IEEE Workshop on Speech, Language and Audio in Multimedia, August 2013

Conversational Insights SITUATED INTERACTION WITH DOCUMENTS

Entity Linking Problem: Annotate mentions of an entity with the knowledge base entry for that entity (linking text to knowledge). Solution: Build sequence models and contextual disambiguation models trained over selfdisambiguated data such as Wikipedia pages or query-document click data. Features range from topical information, linguistic patterns, morpho-syntactic clues, web usage logs, anchor texts, etc. Silviu-Petru Cucerzan (EMNLP 07); Rakesh Agrawal, Ariel Fuxman, Anitha Kannan, John Shafer (WWW 12) Larry Heck, Dilek Hakkani-Tur, and Gokhan Tur (Interspeech 13) Ming-Wei Chang, Emre Kiciman (NAACL 2013) Chin-Yew Lin, Xiaojiang Huang, Yunbo Cao ATL-Cairo

Knowledge: Foundation of Situated Conversations A vast majority of user interactions are with people, locations, things. Knowledge refers to these entities/concepts and to how they are interrelated. The dual-role of knowledge People seek to find information about things, to transact on them, and to browse for recommendations. Knowledge joined with world situated conversation.

Semantic Depth Conversational Systems Challenge Scaling Through Situated Knowledge Pivot on Knowledge Head Strategy: Deep & Narrow Scale Strategy: Shallow & Broad Domain Breadth

Anchor position Destination page topic Aaron Paul Bryan Cranston Source page topic.. Crime drama Albuquerque.Bryan Cranston Aaron Paul. Crime drama protagonist Albuquerque...Vince Gillian Protagonist Vince Gillian

Larry Heck, Dilek Hakkani-Tur, and Gokhan Tur; 2012-2013 Wikipedia & other document sources Semantic Graph Web Search release_date awarded awarded directed_by nationality Movie-Director search queries: Life is beautiful and Roberto Benigni Titanic and James Cameron... Search Results Italy's rubber-faced funnyman Roberto Benigni accomplishes... Life Is Beautiful is a 1997 Italian film which tells the story of a... Titanic is a 1997 American film directed by James Cameron... James Cameron directed Titanic and he did the best job you... NL Patterns Movie-name directed by Director-name Director-name s Movie-name Director-name directed Movie-name... release_date directed_by starring Search Queries Query Click Logs clicks URLs November 2012 release contains: ~ 37M entities ~ 683M relations and growing... Sample user utterances: Corresponding relation on the knowledge graph Show me movies by Roberto Benigni?movie directed_by Roberto Benigni Who directed Life is Beautiful? Life is beautiful directed_by?director Data & features for training statistical CU models User request in query language SELECT?movie {?movie directed_by Roberto Benigni. } SELECT?director { Life is Beautiful directed_by?director. } Who directed the movie Life is beautiful Director of Life is beautiful... User request in logical form λy.ǝx.x= Roberto Benigni Λ directed_by(x,y) λx.ǝy.y= Life is beautiful Λ directed_by(x,y)

Conversational Systems Approach Scaling Through Situated Knowledge Processing Flow of an Interaction with Knowledge Scoping Linking the focus entity(ies) to the knowledge base E.g., linking a touch on marathon to the Wikipedia ID for NYC Marathon ; subselecting restaurants on a map after circling a region. Intent detection Interpret the intent (what the user wants) in the context of the knowledge (and other context) Execution Farm out the request to an appropriate execution engine Informational Browse Ones good for kids. Key Point: There are very few intent categories that govern most interactions.

Intent Class NUI Context: Scope Entity Collection Page App Navigate Browse Informational Transactional

Entity Linking Problem: Annotate mentions of an entity with the knowledge base entry for that entity (linking text to knowledge). Solution: Build sequence models and contextual disambiguation models trained over selfdisambiguated data such as Wikipedia pages or query-document click data. Features range from topical information, linguistic patterns, morpho-syntactic clues, web usage logs, anchor texts, etc. Silviu-Petru Cucerzan (EMNLP 07); Rakesh Agrawal, Ariel Fuxman, Anitha Kannan, John Shafer (WWW 12) Larry Heck, Dilek Hakkani-Tur, and Gokhan Tur (Interspeech 13) Ming-Wei Chang, Emre Kiciman (NAACL 2013) Chin-Yew Lin, Xiaojiang Huang, Yunbo Cao ATL-Cairo N E C P A Scoping B I Execution T

Contextual Search Problem: Recommend contextually relevant related content, pivoting on user selection. Solution: Leverage generic search engines by formulating queries with contextually targeted terms; train context aware re-ranker models on web usage sessions to encourage diversity. Leibniz: Ashok Chandra, Ariel Fuxman, Michael Gamon, Yuanhua Lv, Patrick Pantel, Bo Zhao Eric Brill, Silviu-Petru Cucerzan Dilek Hakkani-Tur, Gokhan Tur, Rukmini Iyer, and Larry Heck Rakesh Agrawal, Sreenivas Gollapudi, Anitha Kannan, Krishnaram Kenthapadi N E C P A Scoping B I Execution T

Interestingness Problem: Predict the k things on the page of most interest to the user. Solution: Train prediction models on web browsing sessions. Model topic semantics of source and target content. Michael Gamon, Patrick Pantel, Johnson Apacible, Xinying Song (Microsoft Office) Scoping N B I E C P A Execution T

Conversational Search and Browse Problem: Enable search and browse of content with voice. Solution: Dynamically adapt (at run-time) the speech language models and semantic parsers based on the content of the page; identify actionable elements on the page and add to intent list. Larry Heck, Dilek Hakkani-Tur, Madhu Chinthakunta, Gokhan Tur, Rukmini Iyer, Partha Parthasarathy, Lisa Stifelman, Elizabeth Shriberg, and Ashley Fidler (IEEE SLAM 13) Scoping N B I E C P A Execution T

Semantic Graphs for Conversational Interaction Experimental Setup Scenario: conversational search over movies (Netflix) Training Freebase film (movies) domain, 56 relations with linked Wikipedia articles Focused on 4 Netflix properties: movies names, actors, genres, directors Drama Task: Entity spotting (F-Measures of precision/recall) Entity modeling only Entity plus relation modeling Kate Winslet Starring Genre Titanic Director James Cameron Award Release Year Oscar, Best director 1997

Semantic Graphs for Conversational Interaction Summary Approach and Results Knowledge as Priors: Leverage large KGs (Freebase) to bootstrap web-scale semantic parsers ~50M entities ~700M relations Unsupervised Machine Learning No semantic schema design No data collection No manual annotations Graph Crawling Algorithm for Unsupervised Data Mining Entity and Relation Modeling with Mined Data Netflix Search Entity Spotting Entity modeling: 61.0% and 55.4% F-measure (Manual/ASR transcriptions) Entity modeling plus relation: 84.6% and 80.6% F-measure Within 5.5% of supervised training Larry Heck, Dilek Hakkani-Tur, and Gokhan Tur, Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing, in Proceedings of Interspeech, International Speech Communication Association, August 2013

Situated Conversational Interaction Leverage the situation or context of the user to create a conversational natural user interfaces to the world's knowledge