Real-time large-scale analysis of audiovisual data


1 Real-time large-scale analysis of audiovisual data
Finnish Center of Excellence in Computational Inference
Department of Signal Processing and Acoustics, Aalto University School of Electrical Engineering
Thanks to: Jorma Laaksonen, Department of Computer Science, Aalto University School of Science
Thanks also to research groups at both departments

2 About Mikko
- Associate professor in speech and language processing at Aalto
- Background in machine learning algorithms and pattern recognition systems
- PhD 1997 at TKK on speech recognition training algorithms
- Research experience in several top speech and language groups:
  - Research centers: IDIAP (CH), SRI (USA), ICSI (USA)
  - Universities: Edinburgh, Cambridge, Colorado, Nagoya
- Head of the Aalto speech recognition research group; several national and European speech projects
- Research topics: speech recognition, language modeling, speaker adaptation, speech translation, information retrieval from audio and video data

3 Goals of today
1. Know why video data are so important today
2. Learn how large-scale video data are used
3. Learn about related research topics at Aalto
4. Learn how to study speech and video processing at Aalto

4 Most mobile data are video
- Global mobile data traffic grew 69 percent in
- Mobile video traffic exceeded 50 percent of total mobile data traffic for the first time in

5 National Audiovisual Institute KAVI
- Archives Finnish television and radio streams
- 32 main channels full time, every day
- 100 other channels by samples
- Available for studio viewing for researchers and the public since 2009 (no mobile viewing)

6 Digital archives of Yle
- Television and radio broadcasts of the Finnish Broadcasting Company (Yle) archived since 1935
- Full digital archive available for Yle, selected parts also for the public:
  - Elävä Arkisto
  - Areena

7 What do people watch?
- Every day people watch hundreds of millions of hours on YouTube.
- Over 100 hours of video are uploaded every minute.
- More than half of YouTube views come from mobile devices.
( ress/statistics.html)

8 How to use large-scale video data?
Give a few examples!

9 Research at COIN
- Speech recognition: Turn the speech in videos to text
- Content-based video retrieval: Analyse the visual content

10 Research at COIN
- Speech recognition: Turn the speech in videos to text
  - Index, summarize, search, browse, and play the video based on what was spoken
  - Add captions, translations, and links to support understanding
  - Recognize speakers and provide training data for improving speech recognition and speech synthesis systems
- Content-based video retrieval: Analyse the visual content
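As a rough illustration of how recognized speech can make a video searchable, the sketch below builds a small inverted index from time-stamped transcript segments. The segment data, the function names `build_index` and `search`, and the whitespace tokenization are all assumptions made for this example; they are not taken from the systems described in the slides.

```python
from collections import defaultdict


def build_index(segments):
    """Map each lowercased word to the start times (seconds) of the segments containing it."""
    index = defaultdict(list)
    for start, text in segments:
        # set() avoids listing the same segment twice for a repeated word
        for word in set(text.lower().split()):
            index[word].append(start)
    return index


def search(index, query):
    """Return sorted start times of segments that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical recognizer output: (start time in seconds, recognized text)
segments = [
    (0.0, "welcome to the evening news"),
    (12.5, "the weather in helsinki is cold"),
    (47.0, "sports news from helsinki"),
]
index = build_index(segments)
```

A player could then jump to the returned start times, for example `search(index, "helsinki news")` to find the segment where both words were spoken.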

11 Research at COIN
- Speech recognition: Turn the speech in videos to text
  - Index, summarize, search, browse, and play the video based on what was spoken
  - Add closed captions, translations, and links to support understanding
  - Recognize speakers and provide data for improving speech recognition and speech synthesis systems
- Content-based video retrieval: Analyse the visual content
  - Segment the video into shots, find visual objects and concepts, describe the video by natural language sentences
  - Recognize people by faces etc.
  - Detect non-speech sounds: explosions, clapping hands, laughing etc.
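One common way to segment a video into shots is to compare intensity histograms of consecutive frames and mark a boundary where they differ strongly. The sketch below shows that idea on toy data; it is a generic technique chosen for illustration, not necessarily the method used in the research above. The frame representation (flat lists of grayscale values 0..255), the bin count, and the threshold are assumptions.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a frame given as a flat list of grayscale pixels."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def shot_boundaries(frames, threshold=0.5):
    """Return indices of frames that start a new shot.

    A boundary is reported when half the L1 distance between consecutive
    histograms (a value between 0 and 1) exceeds the threshold.
    """
    if not frames:
        return []
    boundaries = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / 2
        if diff > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


# Toy frames (hypothetical): two dark frames, two bright frames, one dark frame
dark = [10, 20, 30, 40]
bright = [200, 220, 240, 250]
frames = [dark, dark, bright, bright, dark]
```

In a real pipeline the frames would come from a video decoder, and gradual transitions such as fades would need a more careful rule than a single threshold.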

12 Real-time analysis
- In speech recognition optimize between:
  - Acoustic and language model complexity
  - Search accuracy in decoding
- In visual concept detection optimize between:
  - Number of concepts detected
  - Number and type of features extracted
  - Time-complexity of the classifier(s)
  - Number of classifiers used in post fusion
  - Number of detections made per second
  - Obtainable accuracy
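The trade-off listed above can be made concrete with the real-time factor, that is, processing time divided by media duration: a system keeps up with a live stream only if this factor stays at or below 1. The sketch below picks the most accurate configuration that fits such a budget. The configuration names, timings, and accuracy figures are made-up placeholders for illustration, not measurements from any system.

```python
def real_time_factor(processing_seconds, media_seconds):
    """Processing time divided by media duration; 1.0 or less keeps up with the stream."""
    return processing_seconds / media_seconds


def pick_configuration(configs, max_rtf=1.0):
    """Return the most accurate configuration within the real-time budget, or None."""
    feasible = [c for c in configs if c["rtf"] <= max_rtf]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["accuracy"])


# Placeholder values only: seconds needed to process 60 s of media, and accuracy
configs = [
    {"name": "small", "rtf": real_time_factor(12.0, 60.0), "accuracy": 0.70},
    {"name": "medium", "rtf": real_time_factor(45.0, 60.0), "accuracy": 0.80},
    {"name": "large", "rtf": real_time_factor(150.0, 60.0), "accuracy": 0.85},
]
best = pick_configuration(configs)
```

Tightening `max_rtf` forces a simpler model, which mirrors the choice between model complexity and obtainable accuracy on the slide.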

13 Video content annotation demo
- Character recognition for name tags
- Visual concept detection
- Face recognition
- Speaker recognition
- Speech recognition

14 Match voice and face when appearing together
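A simple way to think about matching voices to faces is co-occurrence counting: whenever a voice is heard, note which faces are on screen, and link each voice to the face it appears with most often. The sketch below assumes that speaker and face labels per segment are already available from earlier recognition steps; the identifiers and the function name `match_voices_to_faces` are hypothetical.

```python
from collections import Counter, defaultdict


def match_voices_to_faces(observations):
    """Link each voice to the face it co-occurs with most often.

    observations: iterable of (voice_id, set of face_ids visible in the same segment).
    Voices that never co-occur with any face are left out of the result.
    """
    counts = defaultdict(Counter)
    for voice, faces in observations:
        for face in faces:
            counts[voice][face] += 1
    return {voice: c.most_common(1)[0][0] for voice, c in counts.items()}


# Hypothetical per-segment labels from speaker and face recognition
observations = [
    ("v1", {"f1", "f2"}),
    ("v1", {"f1"}),
    ("v2", {"f2"}),
    ("v2", {"f1", "f2"}),
    ("v3", set()),
]
matches = match_voices_to_faces(observations)
```

Counting alone cannot separate two people who always appear together, which is where cues such as lip movement become useful.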

15 Speaker spotting: who is moving her lips?
- Detect faces and identify the rhythm of moving lips, eye blinks and eyebrows
- Results from Jorma Laaksonen
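One possible way to decide which visible face is speaking is to correlate a per-frame mouth-openness signal for each face with the audio energy over the same frames, and pick the face with the highest correlation. This is an illustrative assumption, not the method behind the results credited above; the signals, face names, and the function `spot_speaker` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spot_speaker(audio_energy, lip_tracks):
    """Return the face whose mouth-openness track correlates best with the audio energy."""
    scores = {face: pearson(audio_energy, track) for face, track in lip_tracks.items()}
    return max(scores, key=scores.get)


# Hypothetical per-frame signals: audio energy and mouth openness for two faces
audio_energy = [0.1, 0.8, 0.2, 0.9, 0.1, 0.7]
lip_tracks = {
    "face_a": [0.0, 0.6, 0.1, 0.7, 0.0, 0.5],  # moves with the audio
    "face_b": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # mouth does not move
}
speaker = spot_speaker(audio_energy, lip_tracks)
```

The slide also mentions eye blinks and eyebrows; further signals of that kind could be scored in the same way and combined.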

16 Information for a second screen
- Use audiovisual analysis to provide additional information.
- Show it on another screen.
- Can be links to Wikipedia, maps, search results

17 Research at COIN
- Speech recognition: Turn the speech in videos to text
  - Index, summarize, search, browse, and play the video based on what was spoken
  - Add closed captions, translations, and links to support understanding
  - Recognize speakers and provide new data for improving and personalizing speech recognition and synthesis
- Content-based video retrieval: Analyse the visual content
  - Segment the video into shots, find visual objects and concepts, describe the video by natural language sentences
  - Recognize people by faces etc.
  - Detect non-speech sounds: explosions, clapping hands, laughing etc.

18 Personalization requires adaptation of the computational speech models to speaker, language, speaking style, and recording conditions.
- Speech recognition:
  - Dictation
  - Translation: input
  - Interfaces: input
  - Retrieval of A/V content
- Speech synthesis:
  - Reading text aloud
  - Translation: output
  - Interfaces: output
  - Storing your personal voice

19 How to study the topic at Aalto?
COURSES
- ELEC-E5500 Speech processing
- ELEC-E5510 Speech recognition
- ELEC-E5520 Speech and Language processing methods
- ELEC-E5530 Speech and Language processing seminar
- ELEC-E5550 Statistical natural language processing
- CS-E4850 Computer vision
- CS-E3210 Machine learning
MASTER'S PROGRAMME
- Computer, Communication and Information Sciences
MAJORS
- Signal, Speech and Language Processing
- Machine Learning and Data Mining (Macadamia)

20 More demos, results etc. Contact: ELEC SCI

Voice. Voice. Patterson EagleSoft Overview Voice 629

Voice. Voice. Patterson EagleSoft Overview Voice 629 Voice Voice Using the Microsoft voice engine, Patterson EagleSoft's Voice module is now faster, easier and more efficient than ever. Please refer to your Voice Installation guide prior to installing the

More information

Entering the World of Ubiquitous Media. Mikko Rusama, Chief Digital Yle February 15th, 2018

Entering the World of Ubiquitous Media. Mikko Rusama, Chief Digital Yle February 15th, 2018 Entering the World of Ubiquitous Media Mikko Rusama, Chief Digital Officer @ Yle February 15th, 2018 Yle milestones 1926 Radio 1958 TV 2004 2007 Revolution of user interfaces Over 35m smart speakers

More information

Lesson 11. Media Retrieval. Information Retrieval. Image Retrieval. Video Retrieval. Audio Retrieval

Lesson 11. Media Retrieval. Information Retrieval. Image Retrieval. Video Retrieval. Audio Retrieval Lesson 11 Media Retrieval Information Retrieval Image Retrieval Video Retrieval Audio Retrieval Information Retrieval Retrieval = Query + Search Informational Retrieval: Get required information from database/web

More information

Accessibility Guidelines

Accessibility Guidelines Accessibility s Table 1: Accessibility s The guidelines in this section should be followed throughout the course, including in word processing documents, spreadsheets, presentations, (portable document

More information

Quick Start Guide MAC Operating System Built-In Accessibility

Quick Start Guide MAC Operating System Built-In Accessibility Quick Start Guide MAC Operating System Built-In Accessibility Overview The MAC Operating System X has many helpful universal access built-in options for users of varying abilities. In this quickstart,

More information

Part II: Universally-Designed Course Materials

Part II: Universally-Designed Course Materials Part II: Universally-Designed Course Materials Applying the UDL principles Two sides of the UDL coin Diverse Learning Needs Disabilities Mainstream Assistive Usability Accessibility Mandates vs. UDL Legal

More information

Echo360 is collaborating with Amazon to deliver native close captioning. This feature should be available in the next few months.

Echo360 is collaborating with Amazon to deliver native close captioning. This feature should be available in the next few months. Echo360 is collaborating with Amazon to deliver native close captioning. This feature should be available in the next few months. Until that time, here are instructions to use YouTube and Echo360 to generate

More information

Windows VISTA Built-In Accessibility. Quick Start Guide

Windows VISTA Built-In Accessibility. Quick Start Guide Windows VISTA Built-In Accessibility Quick Start Guide Overview Vista Built-In Accessibility Options Vista Ease of Access Center Magnifier Narrator On-Screen Keyboard Voice Recognition To Use How it is

More information

Integrate Speech Technology for Hands-free Operation

Integrate Speech Technology for Hands-free Operation Integrate Speech Technology for Hands-free Operation Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks

More information

A GET YOU GOING GUIDE

A GET YOU GOING GUIDE A GET YOU GOING GUIDE To Your copy here Audio Notetaker 4.0 April 2015 1 Learning Support Getting Started with Audio Notetaker Audio Notetaker is highly recommended for those of you who use a Digital Voice

More information

BUILDING CORPORA OF TRANSCRIBED SPEECH FROM OPEN ACCESS SOURCES

BUILDING CORPORA OF TRANSCRIBED SPEECH FROM OPEN ACCESS SOURCES BUILDING CORPORA OF TRANSCRIBED SPEECH FROM OPEN ACCESS SOURCES O.O. Iakushkin a, G.A. Fedoseev, A.S. Shaleva, O.S. Sedova Saint Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg,

More information

APPLYING THE POWER OF AI TO YOUR VIDEO PRODUCTION STORAGE

APPLYING THE POWER OF AI TO YOUR VIDEO PRODUCTION STORAGE APPLYING THE POWER OF AI TO YOUR VIDEO PRODUCTION STORAGE FINDING WHAT YOU NEED IN YOUR IN-HOUSE VIDEO STORAGE SECTION 1 You need ways to generate metadata for stored videos without time-consuming manual

More information

Accessibility: Building Products Everyone Can Use

Accessibility: Building Products Everyone Can Use Accessibility: Building Products Everyone Can Use Brad Green & Erin Rosenthal May 10, 2011 Twitter hash tags: #io2011, #TechTalk Feedback: goo.gl/n9bbr How many of you Accessibility awareness? Responsible

More information

Students are placed in System 44 based on their performance in the Scholastic Phonics Inventory. System 44 Placement and Scholastic Phonics Inventory

Students are placed in System 44 based on their performance in the Scholastic Phonics Inventory. System 44 Placement and Scholastic Phonics Inventory System 44 Overview The System 44 student application leads students through a predetermined path to learn each of the 44 sounds and the letters or letter combinations that create those sounds. In doing

More information

y texthelp Read&Write for Google Chrome Quick Reference Guide Docs, Slides and Web read&write - j & Google Docs

y texthelp Read&Write for Google Chrome Quick Reference Guide Docs, Slides and Web read&write - j & Google Docs y texthelp Read&Write for Chrome Quick Reference Guide 12.17 f m El 11 s, Slides and i >» := n i* - j Tool Symbol Where it works How it works Text to Speech Reads text aloud with dual color highlighting

More information

Nielsen List of Top 10 ios Mobile Apps

Nielsen List of Top 10 ios Mobile Apps Nielsen List of Top 10 ios Mobile Apps Nielsen's list of the most popular 10 mobile apps for ios in 2016 was dominated by just four technology giants: Google, Facebook, Apple and Amazon. The Nielsen organization

More information

TypeIt ReadIt. Windows v 1.7

TypeIt ReadIt. Windows v 1.7 TypeIt ReadIt Windows v 1.7 1 Table of Contents Page Topic 3 TypeIt ReadIt 4 What s New With Version 1.7 5 System Requirements 6 User Interface 11 Keyboard Shortcuts 12 Printing 2 TypeIt ReadIt TypeIt

More information

ABSTRACT 1. INTRODUCTION

ABSTRACT 1. INTRODUCTION ABSTRACT A Framework for Multi-Agent Multimedia Indexing Bernard Merialdo Multimedia Communications Department Institut Eurecom BP 193, 06904 Sophia-Antipolis, France merialdo@eurecom.fr March 31st, 1995

More information

The Leading Monitoring and Intelligence Platform for Post Broadcast Media

The Leading Monitoring and Intelligence Platform for Post Broadcast Media The Leading Monitoring and Intelligence Platform for Post Broadcast Media Actus View Web-based Broadcast and Monitoring Platform Records any TV, radio or internet media from any input and any format View

More information

Balancing Usability and Security in a Video CAPTCHA

Balancing Usability and Security in a Video CAPTCHA Balancing Usability and Security in a Video CAPTCHA Google, Inc. kak@google.com Rochester Institute of Technology rlaz@cs.rit.edu Symposium on Usable Privacy and Security (SOUPS) 2009 July 15th-17th, 2009,

More information

User guide. Parrot SK4000. English. Parrot SK4000 User Guide 1

User guide. Parrot SK4000. English. Parrot SK4000 User Guide 1 User guide Parrot SK4000 English Parrot SK4000 User Guide 1 Table of contents Introduction... 4 Kit contents... 4 Using the Parrot SK4000 for the first time... 5 Installing the Parrot SK4000... 5 Description

More information

System 44 Next Generation Software Manual

System 44 Next Generation Software Manual System 44 Next Generation Software Manual For use with System 44 Next Generation version 3.x or later and Student Achievement Manager version 3.x or later Table of Contents Overview... 5 Instructional

More information

Guide to creating a PowerPoint presentation with audio (Mac) and uploading to Moodle

Guide to creating a PowerPoint presentation with audio (Mac) and uploading to Moodle Guide to creating a PowerPoint presentation with audio (Mac) and uploading to Moodle This is a guide to creating an audio enhanced PowerPoint presentation using the Mac version. The PowerPoint programme

More information

Native Reporting for CARESTREAM Vue PACS

Native Reporting for CARESTREAM Vue PACS Native Reporting for CARESTREAM Vue PACS Part # 6K5150 2012-11-29 PAGE 1 of 29 Table of Contents Before You Begin... 3 Using the Speech Microphone Buttons... 3 Audio Wizard... 4 Running the Audio Wizard...

More information

CMU Sphinx: the recognizer library

CMU Sphinx: the recognizer library CMU Sphinx: the recognizer library Authors: Massimo Basile Mario Fabrizi Supervisor: Prof. Paola Velardi 01/02/2013 Contents 1 Introduction 2 2 Sphinx download and installation 4 2.1 Download..........................................

More information

Automatic Transcription of Speech From Applied Research to the Market

Automatic Transcription of Speech From Applied Research to the Market Think beyond the limits! Automatic Transcription of Speech From Applied Research to the Market Contact: Jimmy Kunzmann kunzmann@eml.org European Media Laboratory European Media Laboratory (founded 1997)

More information

New Features. Importing Resources

New Features. Importing Resources CyberLink StreamAuthor 4 is a powerful tool for creating compelling media-rich presentations using video, audio, PowerPoint slides, and other supplementary documents. It allows users to capture live videos

More information

Good afternoon and thank you for being at the webinar on accessible PowerPoint presentations. This is Dr. Zayira Jordan web accessibility coordinator

Good afternoon and thank you for being at the webinar on accessible PowerPoint presentations. This is Dr. Zayira Jordan web accessibility coordinator Good afternoon and thank you for being at the webinar on accessible PowerPoint presentations. This is Dr. Zayira Jordan web accessibility coordinator at Iowa State and this is the topic for this week s

More information

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Previous Lecture Audio Retrieval - Query by Humming

More information

Speech Recognition Systems for Automatic Transcription, Voice Command & Dialog applications. Frédéric Beaugendre

Speech Recognition Systems for Automatic Transcription, Voice Command & Dialog applications. Frédéric Beaugendre Speech Recognition Systems for Automatic Transcription, Voice Command & Dialog applications Frédéric Beaugendre www.seekiotech.com SeekioTech Start-up hosted at the multimedia incubator «la Belle de Mai»,

More information

Hands-Free Internet using Speech Recognition

Hands-Free Internet using Speech Recognition Introduction Trevor Donnell December 7, 2001 6.191 Preliminary Thesis Proposal Hands-Free Internet using Speech Recognition The hands-free Internet will be a system whereby a user has the ability to access

More information

TypeIt ReadIt. Macintosh v 1.7

TypeIt ReadIt. Macintosh v 1.7 TypeIt ReadIt Macintosh v 1.7 1 Table of Contents Page Topic 3 TypeIt ReadIt 4 What s New With Version 1.7 5 System Requirements 6 User Interface 11 Keyboard Shortcuts 12 Printing 2 TypeIt ReadIt TypeIt

More information

8.5 Application Examples

8.5 Application Examples 8.5 Application Examples 8.5.1 Genre Recognition Goal Assign a genre to a given video, e.g., movie, newscast, commercial, music clip, etc.) Technology Combine many parameters of the physical level to compute

More information

Eye and Mouth Openness Estimation in Sign Language and News Broadcast Videos

Eye and Mouth Openness Estimation in Sign Language and News Broadcast Videos Aalto University School of Science Master s Programme in Machine Learning and Data Mining Marcos Luzardo Eye and Mouth Openness Estimation in Sign Language and News Broadcast Videos Master s Thesis Espoo,

More information

LIP ACTIVITY DETECTION FOR TALKING FACES CLASSIFICATION IN TV-CONTENT

LIP ACTIVITY DETECTION FOR TALKING FACES CLASSIFICATION IN TV-CONTENT LIP ACTIVITY DETECTION FOR TALKING FACES CLASSIFICATION IN TV-CONTENT Meriem Bendris 1,2, Delphine Charlet 1, Gérard Chollet 2 1 France Télécom R&D - Orange Labs, France 2 CNRS LTCI, TELECOM-ParisTech,

More information

D6.4: Report on Integration into Community Translation Platforms

D6.4: Report on Integration into Community Translation Platforms D6.4: Report on Integration into Community Translation Platforms Philipp Koehn Distribution: Public CasMaCat Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation ICT Project

More information

Q.bo Webi User s Guide

Q.bo Webi User s Guide Contents Q.bo Webi reference guide... 2 1.1. Login... 3 1.2. System Check... 3 1.3. Config Wizard... 6 1.4. Teleoperation... 7 1.5. Training... 9 1.6. Questions & Answers... 10 1.7. Voice Recognition...

More information

ISAN: the Global ID for AV Content

ISAN: the Global ID for AV Content ISAN: the Global ID for AV Content A value added number for RMI Patrick Attallah Managing Director ISAN International Agency WIPO Geneva 17th of September 2007 ISAN International Agency 30 rue de Saint

More information

Prof. Ahmet Süerdem Istanbul Bilgi University London School of Economics

Prof. Ahmet Süerdem Istanbul Bilgi University London School of Economics Prof. Ahmet Süerdem Istanbul Bilgi University London School of Economics Media Intelligence Business intelligence (BI) Uses data mining techniques and tools for the transformation of raw data into meaningful

More information

Approach to Metadata Production and Application Technology Research

Approach to Metadata Production and Application Technology Research Approach to Metadata Production and Application Technology Research In the areas of broadcasting based on home servers and content retrieval, the importance of segment metadata, which is attached in segment

More information

Smore s Accessibility Conformance Report VPAT Version 2.1 March 2018

Smore s Accessibility Conformance Report VPAT Version 2.1 March 2018 Smore s Accessibility Conformance Report VPAT Version 2.1 March 2018 Voluntary Product Accessibility Template and VPAT are registered service marks of the Information Technology Industry Council (ITI)

More information

Today. Web Accessibility. No class next week. Spring Break

Today. Web Accessibility. No class next week. Spring Break HCI and Design Today Web Accessibility No class next week. Spring Break Who is affected? People with disabilities Visual, hearing, motor, cognitive, reading About 1 in 5 adults (webaim.org/intro) Older

More information

Read&Write 5 GOLD FOR MAC MANUAL

Read&Write 5 GOLD FOR MAC MANUAL Read&Write 5 GOLD FOR MAC MANUAL ABBYY FineReader Engine 8.0 ABBYY Software Ltd. 2005. ABBYY FineReader the keenest eye in OCR. ABBYY, FINEREADER and ABBYY FineReader are registered trademarks of ABBYY

More information

Speech Applications. How do they work?

Speech Applications. How do they work? Speech Applications How do they work? What is a VUI? What the user interacts with when using a speech application VUI Elements Prompts or System Messages Prerecorded or Synthesized Grammars Define the

More information

Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV

Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV Gender-dependent acoustic models fusion developed for automatic subtitling of Parliament meetings broadcasted by the Czech TV Jan Vaněk and Josef V. Psutka Department of Cybernetics, West Bohemia University,

More information

What s Working Now. October YouTube Optimization

What s Working Now. October YouTube Optimization What s Working Now October 2015 YouTube Optimization Software Updates Today s Content Crowd Force & Bounce Breaker Why use YouTube? Video content strategy Uploading your videos the right way Optimization

More information

Anthony Ho. Ian Brown

Anthony Ho. Ian Brown Anthony Ho Ian Brown Practical Uses of Video Intro Video Resources (commercial) Video Resources (making your own) Video for MOOCs Video@PolyU a summary Practical Uses of Video 2014 a pivotal year for elearning

More information

Create accessible video from a PowerPoint slide presentation

Create accessible video from a PowerPoint slide presentation Create accessible video from a PowerPoint slide presentation The instructions below outline the process and preparations to create an accessible video file from your PowerPoint slide presentation. Create

More information

Page 1. Arrakis Systems 6604 Powell St. Loveland, CO

Page 1. Arrakis Systems 6604 Powell St. Loveland, CO Page 1 REVISION 1.0 27 February 2014 Page 2 NEW~WAVE QUICK START GUIDE Congratulations on your purchase of the New~Wave automation system! This quick start guide is to help get you setup quickly and easily.

More information

CIMWOS: A MULTIMEDIA ARCHIVING AND INDEXING SYSTEM

CIMWOS: A MULTIMEDIA ARCHIVING AND INDEXING SYSTEM CIMWOS: A MULTIMEDIA ARCHIVING AND INDEXING SYSTEM Nick Hatzigeorgiu, Nikolaos Sidiropoulos and Harris Papageorgiu Institute for Language and Speech Processing Epidavrou & Artemidos 6, 151 25 Maroussi,

More information

Maine CITE Webinar Presenter s Guide

Maine CITE Webinar Presenter s Guide Maine CITE Webinar Presenter s Guide Revised January 2016 When presenting at a Maine CITE sponsored webinar, we ask that you use this guide in preparing for your session. Maine CITE is committed to ensuring

More information

Master Your Mac. simple ways to tweak, customize, and secure os x

Master Your Mac. simple ways to tweak, customize, and secure os x Master Your Mac simple ways to tweak, customize, and secure os x matt cone 10 Talking to Your Mac You don t need a degree in computer science to know that talking to your computer is one of the ultimate

More information

Create Swift mobile apps with IBM Watson services IBM Corporation

Create Swift mobile apps with IBM Watson services IBM Corporation Create Swift mobile apps with IBM Watson services Create a Watson sentiment analysis app with Swift Learning objectives In this section, you ll learn how to write a mobile app in Swift for ios and add

More information

INTERNET AUDIO GUY MIKE STEWART INTERVIEWED

INTERNET AUDIO GUY MIKE STEWART INTERVIEWED INTERNET AUDIO GUY MIKE STEWART INTERVIEWED WHO IS MIKE STEWART WHAT SOFTWARE MAKES IT EASIER TO RECORD AND EDIT THE SOUND FILES YOU CREATE? AUDIO MARKETING TIPS VIDEO MARKETING TIPS AMAZON S3 TO REDUCE

More information

WEB APPLICATION FOR VOICE OPERATED EXCHANGE

WEB APPLICATION FOR VOICE OPERATED  EXCHANGE WEB APPLICATION FOR VOICE OPERATED E-MAIL EXCHANGE Sangeet Sagar 1, Vaibhav Awasthi 2, Samarth Rastogi 3, Tushar Garg 4, S. Kuzhalvaimozhi 5 1, 2,3,4,5 Information Science and Engineering, National Institute

More information

Reaching All Learners With Leopard

Reaching All Learners With Leopard Reaching All Learners With Leopard Diverse Learners Learning disabilities English Language barriers Emotional, behavior problems Lack of interest or engagement Sensory and physical disabilities Teaching

More information

The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System

The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System Our first participation on the TRECVID workshop A. F. de Araujo 1, F. Silveira 2, H. Lakshman 3, J. Zepeda 2, A. Sheth 2, P. Perez

More information

Digital Audio Basics

Digital Audio Basics CSC 170 Introduction to Computers and Their Applications Lecture #2 Digital Audio Basics Digital Audio Basics Digital audio is music, speech, and other sounds represented in binary format for use in digital

More information

System 44 Next Generation Software Manual

System 44 Next Generation Software Manual System 44 Next Generation Software Manual For use with System 44 Next Generation version 2.4 or later and Student Achievement Manager version 2.4 or later PDF0836 (PDF) Houghton Mifflin Harcourt Publishing

More information

2-2-2, Hikaridai, Seika-cho, Soraku-gun, Kyoto , Japan 2 Graduate School of Information Science, Nara Institute of Science and Technology

2-2-2, Hikaridai, Seika-cho, Soraku-gun, Kyoto , Japan 2 Graduate School of Information Science, Nara Institute of Science and Technology ISCA Archive STREAM WEIGHT OPTIMIZATION OF SPEECH AND LIP IMAGE SEQUENCE FOR AUDIO-VISUAL SPEECH RECOGNITION Satoshi Nakamura 1 Hidetoshi Ito 2 Kiyohiro Shikano 2 1 ATR Spoken Language Translation Research

More information

FACE ANALYSIS AND SYNTHESIS FOR INTERACTIVE ENTERTAINMENT

FACE ANALYSIS AND SYNTHESIS FOR INTERACTIVE ENTERTAINMENT FACE ANALYSIS AND SYNTHESIS FOR INTERACTIVE ENTERTAINMENT Shoichiro IWASAWA*I, Tatsuo YOTSUKURA*2, Shigeo MORISHIMA*2 */ Telecommunication Advancement Organization *2Facu!ty of Engineering, Seikei University

More information

Multimedia Information Retrieval The case of video

Multimedia Information Retrieval The case of video Multimedia Information Retrieval The case of video Outline Overview Problems Solutions Trends and Directions Multimedia Information Retrieval Motivation With the explosive growth of digital media data,

More information

The 10 Questions Learning Leaders Should Ask in a Video Platform RFP

The 10 Questions Learning Leaders Should Ask in a Video Platform RFP The 10 Questions Learning Leaders Should Ask in a Video Platform RFP Steve Rozillis, Director, Customer Evangelism, Panopto ATD Watch & Learn webcast April 26, 2017 Storing in your LMS or CMS with 2GB

More information

The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect.

The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect. Web Accessibility The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect. Tim Berners-Lee, W3C Director and inventor of the World Wide Web 20% of

More information

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009 9 Video Retrieval Multimedia Databases 9 Video Retrieval 9.1 Hidden Markov Models (continued from last lecture) 9.2 Introduction into Video Retrieval Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme

More information

Hands-off Use of Computer towards Universal Access through Voice Control Human-Computer Interface

Hands-off Use of Computer towards Universal Access through Voice Control Human-Computer Interface Hands-off Use of Computer towards Universal Access through Voice Control Human-Computer Interface Dalila Landestoy, Melvin Ayala, Malek Adjouadi, and Walter Tischer Center for Advanced Technology and Education

More information

SNLP ELEC-E5550 Exercise 4: Speech Recognition

SNLP ELEC-E5550 Exercise 4: Speech Recognition : Speech Recognition Stig-Arne Grönroos Department of Signal Processing and Acoustics Aalto University, School of Electrical Engineering stig-arne.gronroos@aalto.fi 02.02.2017 Ex 4.1: Viterbi alignment

More information

Media Retrieval (2) Prepared by. Ling Guan Jose Lay Paisarn Muneesawang Ning Zhang Rui Zhang. Outlines (revisited)

Media Retrieval (2) Prepared by. Ling Guan Jose Lay Paisarn Muneesawang Ning Zhang Rui Zhang. Outlines (revisited) Media Retrieval (2) Prepared by Ling Guan Jose Lay Paisarn Muneesawang Ning Zhang Rui Zhang 1 Outlines (revisited) Introduction: Intellectual Foundation of Multimedia Information Retrieval Retrieval Models

More information

JAWS for Windows Training Bundle Outline

JAWS for Windows Training Bundle Outline Introduction to the Training Overview of topics to be covered in the training JAWS for Windows Training Bundle Outline Introduction to the DAISY format and why it is being used PlexTalk Pocket Introduction

More information

Accessibility on the Mac Website:

Accessibility on the Mac Website: Website: http://etc.usf.edu/te/ The Mac operating system includes several assistive technologies designed to make it easier for a person with a disability to use the computer. Whether you have difficulty

More information

Optical Character Recognition Based Speech Synthesis System Using LabVIEW

Optical Character Recognition Based Speech Synthesis System Using LabVIEW Optical Character Recognition Based Speech Synthesis System Using LabVIEW S. K. Singla* 1 and R.K.Yadav 2 1 Electrical and Instrumentation Engineering Department Thapar University, Patiala,Punjab *sunilksingla2001@gmail.com

More information

Topics in Operating Systems (mini-project)

Topics in Operating Systems (mini-project) Topics in Operating Systems (mini-project) Open-set speaker recognition by Ilya Kaganovsky Abstract Saya is a robotic receptionist of the Department of Computer Science in Ben- Gurion University of the

More information

C. The system is equally reliable for classifying any one of the eight logo types 78% of the time.

C. The system is equally reliable for classifying any one of the eight logo types 78% of the time. Volume: 63 Questions Question No: 1 A system with a set of classifiers is trained to recognize eight different company logos from images. It is 78% accurate. Without further information, which statement

More information
