Prediction and Selection of Sequence of Actions Related to Voice Activated Computing Systems

Technical Disclosure Commons
Defensive Publications Series
July 03, 2017

Prediction and Selection of Sequence of Actions Related to Voice Activated Computing Systems
John D. Lanza
Foley & Lardner LLP

Follow this and additional works at: http://www.tdcommons.org/dpubs_series

Recommended Citation
Lanza, John D., "Prediction and Selection of Sequence of Actions Related to Voice Activated Computing Systems", Technical Disclosure Commons, (July 03, 2017)
http://www.tdcommons.org/dpubs_series/569

This work is licensed under a Creative Commons Attribution 4.0 License. This Article is brought to you for free and open access by Technical Disclosure Commons. It has been accepted for inclusion in Defensive Publications Series by an authorized administrator of Technical Disclosure Commons.

Lanza: Prediction and Selection of Sequence of Actions Related to Voice

PREDICTION AND SELECTION OF SEQUENCE OF ACTIONS RELATED TO VOICE ACTIVATED COMPUTING SYSTEMS

Voice activated computing systems provide a user with content or services in response to voice commands spoken by the user. Such systems can capture voice commands from a user, process the voice commands to determine requests and keywords in the voice commands, and provide the user with content or services related to the determined requests and keywords.

As discussed herein, a voice activated computing system processes the voice commands not only to determine explicitly requested content and services, but also to predict content and services that are likely to be useful to the user in the context of the explicit request. The computing system can generate actions related to the predicted content and services, and execute these actions prior to the execution of the actions related to the explicitly requested content and services.

For example, a voice activated computing system receiving the voice command "OK, I would like to go to dinner and then a movie tonight" can determine actions related to dinner and to the movie. In addition, the system can predict an action related to transportation from the movie to home, even though this action was not explicitly requested in the voice command. The system may predict that the most likely order of actions would be actions related to dinner first, actions related to the movie next, followed by actions related to transportation from the movie to home. However, instead of executing actions in this order, the system can bypass the actions related to dinner and the movie and first execute an action related to the transportation from the movie to home. For example, the system can send the user an audio file stating "Would you like a taxi waiting for you after the movie?" prior to the execution of the actions related to dinner and the actions related to the movie.
In this manner, the system not only can predict content and services that the user may need, but also can alter the execution sequence of the actions related to the content and services.

Published by Technical Disclosure Commons, 2017
Defensive Publications Series, Art. 569 [2017]

[Figure 1: block diagram of a voice assistant device, a service provider, and a data processing system (natural language processor, audio signal generator, task predictor, content selector) connected over a network.]

Figure 1 shows an example voice activated computing system. The system includes a voice assistant device, a service provider, and a data processing system communicating over a network.

The voice assistant device can be a device that accepts voice commands and provides audio or visual output. The voice assistant device can include one or more microphones and cameras, such that voice commands received from the user are converted into corresponding audio signals. The voice assistant device can send the audio signals to the data processing system and the service provider. The voice assistant device also can receive data such as audio signals or video signals from the data processing system or the service provider. The voice assistant device also can include audio speakers that can convert the audio signals received from the data processing system or the service provider into sound.

The data processing system can process voice commands received from the voice assistant device. The data processing system includes a natural language processor, an audio signal generator, a task predictor, and a content selector.

The natural language processor is capable of processing voice commands included in the audio signals received from the voice assistant device. The natural language processor can convert the audio signals into recognized text by comparing the audio signals against a stored, representative set of audio waveforms, and choosing the closest matches. The representative waveforms are generated across a large set of users, and can be augmented with speech samples. After the audio signals are converted into recognized text, the natural language processor can match the text to words that are associated, for example via training across users or through manual specification, with actions that the data processing system can serve. In short, the natural language processor identifies requests and trigger words in the converted text, and uses them to determine the content and actions to be carried out.

The task predictor can predict tasks or actions based on the converted text, in particular by identifying requests and trigger keywords in the converted text. The task predictor also can predict the most likely sequence in which the tasks would be executed.

The content selector can select content, such as services to be offered to the user, based on the actions identified by the task predictor. In addition, the content selector can alter the sequence or order in which the actions related to the services offered to the user are executed.
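The closest-match step described above for the natural language processor can be illustrated with a minimal sketch. The source says only that audio signals are compared against stored representative waveforms and the closest matches are chosen. The three-number feature vectors, the template values, the use of Euclidean distance, and the names TEMPLATES, closest_word, and recognize are all assumptions made for illustration, not details from the disclosure.

```python
import math

# Hypothetical stored templates: word -> representative feature vector.
# The numbers are invented; a real system would use far richer acoustic features.
TEMPLATES = {
    "dinner": [0.9, 0.1, 0.3],
    "movie": [0.2, 0.8, 0.5],
    "go": [0.4, 0.4, 0.9],
}


def closest_word(segment, templates=TEMPLATES):
    """Return the template word nearest to the segment (Euclidean distance, an assumed measure)."""
    return min(templates, key=lambda word: math.dist(segment, templates[word]))


def recognize(segments):
    """Map each audio segment (as a feature vector) to its closest template word."""
    return [closest_word(segment) for segment in segments]


# Three made-up segments, each lying near one of the templates.
recognized = recognize([[0.38, 0.45, 0.85], [0.88, 0.15, 0.25], [0.25, 0.75, 0.5]])
print(recognized)
```

Any other distance or similarity measure could be substituted in closest_word without changing the overall shape of the step.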
The audio signal generator can generate audio signals based on the services selected by the content selector. The audio signals can be representative of voice responses or voice instructions provided to the user in response to the voice commands.

The service provider can provide one or more services to the user. For example, the service provider can be a taxi or car sharing service provider, a dining or reservation service provider, and the like. The service provider can communicate with the voice assistant device independently of the data processing system and give the user the ability to request a ride, make a dinner reservation, or use other services offered by the service provider. The service provider can also include a natural language processor, similar to the one discussed above in relation to the data processing system, to convert user voice commands into text and identify requests and keywords to determine the services requested by the user.

Referring again to the voice command example mentioned above, the user can speak the voice command "OK, I would like to go to dinner and then a movie tonight" to the voice assistant device. The microphones at the voice assistant device can convert the voice command into corresponding audio signals, which are transmitted by the voice assistant device to the data processing system over the network. At the data processing system, the natural language processor processes the audio signal received from the voice assistant device and identifies requests for dinner and a movie. The natural language processor also can identify a trigger keyword "go" or "to go to", which can indicate a need for transportation. Even though the user's voice command does not directly express an intent for transportation, the trigger keyword indicates that transportation may be needed.

The task predictor, based on the requests for dinner and a movie and on the trigger keywords, can determine a most likely sequence of actions related to the voice command. For example, one sequence of actions is shown below in Figure 2.
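The identification of requests and trigger keywords, and the prediction of a sequence from them, can be sketched as follows. This is a simplified illustration under stated assumptions: the vocabularies REQUEST_WORDS and TRIGGER_WORDS, the function names, and the rule that a ride home is appended after the last explicit request are all hypothetical. The source specifies only the example command, the requests for dinner and a movie, and the trigger keyword "go".

```python
import re

# Hypothetical vocabularies. The source names only "dinner", "movie", and the trigger "go".
REQUEST_WORDS = {"dinner": "dinner", "movie": "movie"}
TRIGGER_WORDS = {"go": "transportation"}


def parse_command(text):
    """Return (explicit requests in spoken order, needs implied by trigger keywords)."""
    words = re.findall(r"[a-z']+", text.lower())
    requests, implied = [], []
    for word in words:
        if word in REQUEST_WORDS and REQUEST_WORDS[word] not in requests:
            requests.append(REQUEST_WORDS[word])
        if word in TRIGGER_WORDS and TRIGGER_WORDS[word] not in implied:
            implied.append(TRIGGER_WORDS[word])
    return requests, implied


def predict_sequence(requests, implied):
    """Assumed rule: explicit requests in spoken order, then a ride home if transportation is implied."""
    sequence = list(requests)
    if "transportation" in implied and requests:
        sequence.append(f"transportation from {requests[-1]} to home")
    return sequence


requests, implied = parse_command("OK, I would like to go to dinner and then a movie tonight")
predicted = predict_sequence(requests, implied)
print(predicted)
```

A trained model would normally replace the fixed dictionaries; they are used here only to keep the example self-contained.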

[Figure 2: sequence of actions on a timeline: dinner, movie, transportation from movie to home.]

Figure 2 shows a predicted sequence of actions from time T-1 to time T-7. The task predictor determines that the most likely sequence of actions is actions related to dinner, followed by actions related to the movie, and then actions related to transportation from the movie to home. The actions related to dinner can include making reservations at a restaurant, while the actions related to the movie can include purchasing tickets to a movie. The action of transportation from the movie to home, in turn, can include requesting a taxi from the movie to the user's home.

While the task predictor predicts the above sequence of actions as the most likely sequence, the content selector can alter this sequence of actions. In particular, the content selector can execute actions related to transportation from the movie to home prior to the actions related to dinner. For example, the content selector can change the sequence of actions such that the user is sent a voice request "Would you like a taxi waiting for you after the movie?" prior to sending voice requests regarding making a dinner reservation or purchasing movie tickets.

While Figure 2 shows that the actions related to transportation from the movie to home are executed prior to the actions related to dinner, it is understood that the actions related to transportation from the movie to home can be executed prior to any other action in the sequence of actions predicted by the task predictor. For example, the actions related to transportation from the movie to home may be executed after actions related to making dinner reservations but before the actions related to purchasing movie tickets.

In the predicted sequence of actions, the system would send voice requests to the user regarding transportation from the movie to home only after the movie has ended. The user would then potentially spend time after the movie interacting with the service provider and waiting for the taxi to arrive. By executing the actions related to transportation from the movie to home earlier in the sequence of actions, the taxi can arrive at the movie before or as soon as the movie ends, saving the user valuable time.
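The reordering performed by the content selector can be sketched as moving one action to an earlier position while the remaining actions keep their predicted order. The function name promote and the string labels for the actions are hypothetical and are used only for illustration. The two orderings shown correspond to the two cases described above.

```python
def promote(sequence, action, position=0):
    """Return a new sequence with `action` moved to `position`.

    The other actions keep their relative order. Raises ValueError if the
    action is not part of the predicted sequence.
    """
    if action not in sequence:
        raise ValueError(f"{action!r} is not in the predicted sequence")
    rest = [a for a in sequence if a != action]
    return rest[:position] + [action] + rest[position:]


# Predicted order from the example; the labels are illustrative strings.
predicted = ["dinner", "movie", "transportation from movie to home"]
ride = "transportation from movie to home"

first = promote(predicted, ride)       # ride prompt before dinner and movie
middle = promote(predicted, ride, 1)   # after dinner, before movie
print(first)
print(middle)
```

Returning a new list rather than modifying the prediction in place keeps the task predictor's original order available for comparison.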

Abstract

This document describes a technique for processing voice commands in a voice activated computing system. In particular, the system processes the voice commands to not only determine explicitly requested content and services, but to also predict likely content and services that may be useful to the user in the context of the explicitly requested content and services. The computing system can generate actions related to the predicted content and services, and execute these actions prior to the execution of the actions related to the explicitly requested content and services.