Real-time large-scale analysis of audiovisual data


1 Finnish Center of Excellence in Computational Inference
Real-time large-scale analysis of audiovisual data
Department of Signal Processing and Acoustics, Aalto University School of Electrical Engineering
Thanks to: Jorma Laaksonen, Department of Computer Science, Aalto University School of Science
Thanks also to research groups at both departments

2 About Mikko
- Associate professor in speech and language processing at Aalto
- Background in machine learning algorithms and pattern recognition systems
- PhD 1997 at TKK on speech recognition training algorithms
- Research experience in several top speech and language groups:
  - Research centers: IDIAP (CH), SRI (USA), ICSI (USA)
  - Universities: Edinburgh, Cambridge, Colorado, Nagoya
- Head of the Aalto speech recognition research group; several national and European speech projects
- Research topics: speech recognition, language modeling, speaker adaptation, speech translation, information retrieval from audio and video data

3 Goals of today
1. Know why video data are so important today
2. Learn how large-scale video data are used
3. Learn about related research topics at Aalto
4. Learn how to study speech and video processing at Aalto

4 Most mobile data are video
- Global mobile data traffic grew 69 percent in 2014.
- Mobile video traffic exceeded 50 percent of total mobile data traffic for the first time in …

5 National audiovisual institute KAVI
- Archives Finnish television and radio streams (https://kavi.…)
- 32 main channels full time, every day
- 100 other channels by samples
- Available for studio viewing for researchers and the public since 2009 (no mobile viewing)

6 Digital archives of Yle
- Television and radio broadcasts of the Finnish Broadcasting Company (Yle) archived since 1935
- Full digital archive available for Yle, selected parts also for the public: Elävä Arkisto, Areena

7 What do people watch?
- Every day people watch hundreds of millions of hours on YouTube.
- Over 100 hours of video are uploaded every minute.
- More than half of YouTube views come from mobile devices.
(…ress/statistics.html)

8 How to use large-scale video data?
Give a few examples!

9 Research at COIN
Speech recognition: Turn the speech in videos to text
Content-based video retrieval: Analyse the visual content

10 Research at COIN
Speech recognition: Turn the speech in videos to text
- Index, summarize, search, browse, and play the video based on what was spoken
- Add captions, translations, and links to support understanding
- Recognize speakers and provide training data for improving speech recognition and speech synthesis systems
Content-based video retrieval: Analyse the visual content

11 Research at COIN
Speech recognition: Turn the speech in videos to text
- Index, summarize, search, browse, and play the video based on what was spoken
- Add closed captions, translations, and links to support understanding
- Recognize speakers and provide data for improving speech recognition and speech synthesis systems
Content-based video retrieval: Analyse the visual content
- Segment the video into shots, find visual objects and concepts, describe the video by natural language sentences
- Recognize people by faces etc.
- Detect non-speech sounds: explosions, clapping hands, laughing etc.
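The "segment the video into shots" step can be illustrated with a minimal sketch. It assumes a simple histogram-difference detector, which is a common textbook baseline and not necessarily the method used at COIN. Frames are represented as flat lists of 8-bit pixel intensities, and the names `histogram`, `shot_boundaries`, and the `threshold` value are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of one frame (a flat list of pixel values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [count / total for count in counts]


def shot_boundaries(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return every index i at which frame i is assumed to start a new shot.

    A boundary is declared when the L1 distance between the histograms of
    consecutive frames exceeds the (hypothetical) threshold.
    """
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Toy "video": three dark frames, two bright frames, one dark frame.
dark = [10] * 16
bright = [240] * 16
video = [dark, dark, dark, bright, bright, dark]
print(shot_boundaries(video))  # [3, 5]
```

A fixed threshold only catches hard cuts; gradual transitions such as fades would need a different criterion.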

12 Real-time analysis
In speech recognition optimize between:
- Acoustic and language model complexity
- Search accuracy in decoding
In visual concept detection optimize between:
- Number of concepts detected
- Number and type of features extracted
- Time-complexity of the classifier(s)
- Number of classifiers used in post fusion
- Number of detections made per second
- Obtainable accuracy
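One way to make the speech recognition trade-off concrete is the real-time factor (processing time divided by audio duration): a system keeps up with live input only if this factor stays at or below 1. The sketch below picks the most accurate configuration that still runs in real time. The class `DecoderConfig`, the function names, and all numbers in `CONFIGS` are made up for illustration and are not measurements from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DecoderConfig:
    name: str
    real_time_factor: float  # processing seconds per second of audio
    word_error_rate: float   # lower is better


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; values <= 1 keep up with live input."""
    return processing_seconds / audio_seconds


def best_real_time_config(configs: List[DecoderConfig],
                          max_rtf: float = 1.0) -> Optional[DecoderConfig]:
    """Most accurate configuration whose real-time factor fits the budget, if any."""
    feasible = [c for c in configs if c.real_time_factor <= max_rtf]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.word_error_rate)


# Hypothetical configurations: larger models are slower but more accurate.
CONFIGS = [
    DecoderConfig("small", real_time_factor=0.3, word_error_rate=0.25),
    DecoderConfig("medium", real_time_factor=0.8, word_error_rate=0.18),
    DecoderConfig("large", real_time_factor=2.5, word_error_rate=0.15),
]

print(real_time_factor(30.0, 60.0))         # 0.5
print(best_real_time_config(CONFIGS).name)  # medium
```

The same pattern applies to visual concept detection, with detections per second as the budget and the number of concepts, features, and classifiers as the knobs.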

13 Video content annotation demo
- Character recognition for name tags
- Visual concept detection
- Face recognition
- Speaker recognition
- Speech recognition

14 Match voice and face when appearing together
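A minimal sketch of the matching idea, assuming that speaker recognition and face recognition have already labeled each time segment with a voice identity and the set of visible face identities. Each voice is then assigned the face it co-occurs with most often. The function name `match_voices_to_faces` and the identifiers in the example are hypothetical; the slide does not say how the matching is actually done.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def match_voices_to_faces(segments: Iterable[Tuple[str, Set[str]]]) -> Dict[str, str]:
    """Assign each voice the face that is most often visible while it speaks.

    segments: (voice_id, face_ids visible during that segment) pairs.
    Voices that never appear together with any face are left out of the result.
    """
    cooccurrence: Dict[str, Counter] = defaultdict(Counter)
    for voice_id, face_ids in segments:
        for face_id in face_ids:
            cooccurrence[voice_id][face_id] += 1
    return {voice: counts.most_common(1)[0][0]
            for voice, counts in cooccurrence.items()}


# Toy input: two voices, two faces, some segments showing both faces.
segments = [
    ("voice_A", {"face_1"}),
    ("voice_A", {"face_1", "face_2"}),
    ("voice_B", {"face_2"}),
    ("voice_B", {"face_1", "face_2"}),
    ("voice_A", {"face_1"}),
    ("voice_C", set()),  # off-screen narrator, no face visible
]
print(match_voices_to_faces(segments))
```

Simple counting can be misled when two people always appear together, which is one reason to combine it with lip-movement cues.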

15 Speaker spotting: who is moving her lips?
- Detect faces and identify the rhythm of moving lips, eye blinks and eyebrows
- Results from Jorma Laaksonen
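As a rough sketch of how lip rhythm could identify the active speaker, one can correlate each face's mouth-opening signal with the audio energy over the same frames and pick the face with the highest correlation. This is an assumption made for illustration; the slide does not describe the actual method. The functions `pearson` and `spot_speaker` and the toy signals are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length signals; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spot_speaker(audio_energy: Sequence[float],
                 mouth_opening_by_face: Dict[str, Sequence[float]]) -> str:
    """Return the face whose mouth movement best follows the audio energy."""
    return max(mouth_opening_by_face,
               key=lambda face: pearson(audio_energy, mouth_opening_by_face[face]))


# Toy per-frame signals: the left face opens its mouth when the audio is loud.
audio_energy = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7]
mouth_opening = {
    "face_left": [0.0, 0.8, 0.1, 0.9, 0.0, 0.6],
    "face_right": [0.5, 0.5, 0.4, 0.5, 0.5, 0.4],
}
print(spot_speaker(audio_energy, mouth_opening))  # face_left
```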

16 Information for a second screen
- Use audiovisual analysis to provide additional information.
- Show it on another screen.
- The information can be links to Wikipedia, maps, or search results.

17 Research at COIN
Speech recognition: Turn the speech in videos to text
- Index, summarize, search, browse, and play the video based on what was spoken
- Add closed captions, translations, and links to support understanding
- Recognize speakers and provide new data for improving and personalizing speech recognition and synthesis
Content-based video retrieval: Analyse the visual content
- Segment the video into shots, find visual objects and concepts, describe the video by natural language sentences
- Recognize people by faces etc.
- Detect non-speech sounds: explosions, clapping hands, laughing etc.

18 Personalization requires adaptation of the computational speech models to speaker, language, speaking style, and recording conditions.
Speech recognition:
- Dictation
- Translation: input
- Interfaces: input
- Retrieval of A/V content
Speech synthesis:
- Reading text aloud
- Translation: output
- Interfaces: output
- Storing your personal voice

19 How to study the topic at Aalto?
COURSES
- ELEC-E5500 Speech processing
- ELEC-E5510 Speech recognition
- ELEC-E5520 Speech and Language processing methods
- ELEC-E5530 Speech and Language processing seminar
- ELEC-E5550 Statistical natural language processing
- CS-E4850 Computer vision
- CS-E3210 Machine learning
MASTER'S PROGRAMME
- Computer, Communication and Information Sciences
MAJORS
- Signal, Speech and Language Processing
- Machine Learning and Data Mining (Macadamia)

20 More demos, results etc. Contact: ELEC SCI
