Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards

Similar documents
MPEG-7 Audio: Tools for Semantic Audio Description and Processing

Lesson 11. Media Retrieval. Information Retrieval. Image Retrieval. Video Retrieval. Audio Retrieval

MPEG-4 General Audio Coding

Lecture 7: Introduction to Multimedia Content Description. Reji Mathew & Jian Zhang NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2009

Multimedia Database Systems. Retrieval by Content

ISO/IEC INTERNATIONAL STANDARD. Information technology Multimedia content description interface Part 4: Audio

RECOMMENDATION ITU-R BS Procedure for the performance test of automated query-by-humming systems

MPEG-7. Multimedia Content Description Standard

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

An Intelligent System for Archiving and Retrieval of Audiovisual Material Based on the MPEG-7 Description Schemes

Principles of Audio Coding

Audio Classification and Content Description

Optical Storage Technology. MPEG Data Compression

The following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes:

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Lesson 6. MPEG Standards. MPEG - Moving Picture Experts Group Standards - MPEG-1 - MPEG-2 - MPEG-4 - MPEG-7 - MPEG-21

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

USING METADATA TO PROVIDE SCALABLE BROADCAST AND INTERNET CONTENT AND SERVICES

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

9/8/2016. Characteristics of multimedia Various media types

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig

2.4 Audio Compression

The ToCAI Description Scheme for Indexing and Retrieval of Multimedia Documents 1

Detection of Acoustic Events in Meeting-Room Environment

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Audio-coding standards

MP3. Panayiotis Petropoulos

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Delivery Context in MPEG-21

MPEG-4. Today we'll talk about...

MPEG-4: Overview. Multimedia Naresuan University

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

MPEG-7 MULTIMEDIA TECHNIQUE

Mpeg 1 layer 3 (mp3) general overview

CHAPTER 8 Multimedia Information Retrieval

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Compression transparent low-level description of audio signals

Adaptive Multimedia Messaging based on MPEG-7 The M 3 -Box

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009

Lecture 3 Image and Video (MPEG) Coding

ISO/IEC INTERNATIONAL STANDARD. Information technology Multimedia content description interface Part 5: Multimedia description schemes

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music

Audio-coding standards

Contend Based Multimedia Retrieval

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Video Search and Retrieval Overview of MPEG-7 Multimedia Content Description Interface

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Audio for Everybody. OCPUG/PATACS 21 January Tom Gutnick. Copyright by Tom Gutnick. All rights reserved.

The MPEG-7 Description Standard 1

Available online Journal of Scientific and Engineering Research, 2016, 3(4): Research Article

The MPEG-4 General Audio Coder

Scalable Hierarchical Summarization of News Using Fidelity in MPEG-7 Description Scheme

08 Sound. Multimedia Systems. Nature of Sound, Store Audio, Sound Editing, MIDI

Optimal Video Adaptation and Skimming Using a Utility-Based Framework

Lecture Information. Mod 01 Part 1: The Need for Compression. Why Digital Signal Coding? (1)

ITNP80: Multimedia! Sound-II!

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

Multimedia Information Retrieval

Interoperable Content-based Access of Multimedia in Digital Libraries

M PEG-7,1,2 which became ISO/IEC 15398

Lecture Information Multimedia Video Coding & Architectures

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Digital Media. Daniel Fuller ITEC 2110

Extraction, Description and Application of Multimedia Using MPEG-7

Inserting multimedia objects in Dreamweaver

ISAN: the Global ID for AV Content

Multimedia Information Retrieval: Or This stuff is really hard... really! Matt Earp / Kid Kameleon

Music Signal Spotting Retrieval by a Humming Query Using Start Frame Feature Dependent Continuous Dynamic Programming

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Media Formats. Sound. Image. Video. WAV (uncompressed, PCM) MIDI. BMP XML dia MPEG

ELL 788 Computational Perception & Cognition July November 2015

5: Music Compression. Music Coding. Mark Handley

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Audio and video compression

Multimedia Communications. Audio coding

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

MPEG-4 aacplus - Audio coding for today s digital media world

MPEG-7 Context and Objectives

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Minimal-Impact Personal Audio Archives

WHITE PAPER. Director Prof. Dr.-Ing. Albert Heuberger Am Wolfsmantel Erlangen

EUROPEAN COMPUTER DRIVING LICENCE. Multimedia Audio Editing. Syllabus

CHAPTER 7 MUSIC INFORMATION RETRIEVAL

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Digital Image Processing

Multimedia Databases

6MPEG-4 audio coding tools

Automatic Classification of Audio Data

Multimedia Databases. 8 Audio Retrieval. 8.1 Music Retrieval. 8.1 Statistical Features. 8.1 Music Retrieval. 8.1 Music Retrieval 12/11/2009

Parametric Coding of High-Quality Audio

Introduction to LAN/WAN. Application Layer 4

Digital Audio Basics

MPEG-21: The 21st Century Multimedia Framework

Chapter 14 MPEG Audio Compression

MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES

Digital video coding systems MPEG-1/2 Video

Transcription:

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1

Overview Extracting meaning from audio Why semantic description / metadata? Why standardization? The MPEG-7 Standard MPEG-7 Audio Tools Some Applications Conclusions Jürgen Herre, hrr@iis.fhg.de Page 2

Extracting Meaning From Audio Example: Generic pattern matching system - data flow Feature Extraction Feature Processing Classification / Decision Musik Genre, Tempo, Audio data (PCM) Features Clustered Fs. Output 1.4 Mbit/s Æ 10kbit/s Æ 100bit/s Æ 1bit/s Jürgen Herre, hrr@iis.fhg.de Page 3

Extracting Meaning From Audio (2) Data reduction & semantic level perspective Feature Extraction Feature Processing Classification / Decision Musik Genre, Tempo Data Rate / Information Semantic Level What is the optimum type of description? Jürgen Herre, hrr@iis.fhg.de Page 4

Why Semantic Description / Metadata? Challenges of the Multimedia Age How to enable efficient search, filtering, management of multimedia content? Today: Text search (e.g. popular search engines, such as Google, Yahoo, Lycos,...) Find audio / images / movies just as easily by simple intuitive queries! Use semantic analysis directly? Jürgen Herre, hrr@iis.fhg.de Page 5 Extraction / semantic analysis of content is usually complex & costly requires access to raw A/V data (not always possible; high bandwidth demand) Not everything can be extracted from the content itself (e.g. labeling metadata )

The Semantic Description / Metadata Idea Approach Descriptions Supplement A/V data with Content Description (Metadata = data about data ) Holds semantic descriptions of any level Search metadata rather A/V content itself Æ Key to efficient content handling! enable search & retrieval / filtering enable intelligent navigation summarization... 2 Key Areas of Development Jürgen Herre, hrr@iis.fhg.de Page 6 Efficient automatic production and usage of descriptive metadata Standardization of metadata formats

Why Standardization? Imagine there is Then Key Aspect metadata available for the existing A/V resources (e.g. on the web) a common way of representing semantics which key parameters to use how to represent & package them any multimedia application could use & exploit these resources!! (incl. search & recommendation engines, user agents, ) Interoperability between metadata DBs and applications on a world wide scale fi Standardization! Jürgen Herre, hrr@iis.fhg.de Page 7

MPEG-7 or: This is not about source coding! MPEG MPEG-1 (1992) MPEG-2 (1994) MPEG-2 AAC (1997) MPEG-4 (1999+) MPEG-7 (2001+) Moving Pictures Expert Group, ISO/IEC JTC1/SC29/WG11 First generic audio coding standard, Layers 1-3, (DAB, Worldspace, DVB, Internet Audio, MP3 ) Extending MPEG-1 coders towards lower sampling rates & multi-channel... More powerful mono... multi-channel coding New functionalities (scalability, object oriented representation, interactivity...) Multimedia Content Description Interface Metadata standard (not compression!) Jürgen Herre, hrr@iis.fhg.de Page 8

The MPEG-7 Standard Scope Standardization of description language : Description production Standard description Description consumption Boundaries of the MPEG-7 standard unique! Description language based on XML / XML Schema (and binary versions of it) Provides elaborated concepts for describing Generic properties of A/V data ( metadata ) Visual data (content-derived) Audio data (content-derived) Jürgen Herre, hrr@iis.fhg.de Page 9

MPEG-7 & MPEG-7 Audio ISO/IEC JTC1/SC29/WG11 IS 15938 Multimedia Content Description Interface : Part 1 Part 2 Part 3 Part 4 Part 5 Part 6 Part 7 Part 8 Jürgen Herre, hrr@iis.fhg.de Page 10 Systems Description Definition Language (DDL) Visual Audio Multimedia Description Schemes Reference Software Conformance Testing Extraction & Use of MPEG-7 Descriptions

MPEG-7 Audio: Basic Architecture MPEG-7 Audio Low Level Descriptor (LLD) Framework Generic Description Schemes (DSs) Application specific / motivated Jürgen Herre, hrr@iis.fhg.de Page 11

Low Level Descriptor Framework (1) Collection of 17 Descriptors Usefulness shown in Core Experiments (CEs) for specific applications Generality Generic signal characteristics fi applicable to a wide range of signals fi useful for many purposes Toolbox: Future applications can make use of them in flexible ways (form DS according to their specific needs) Provides basic layer of interoperability Jürgen Herre, hrr@iis.fhg.de Page 12

Low Level Descriptor Framework (2) Basic Basic Spectral Signal Parameters Spectral Basis Representations Instantaneous waveform, power, silence Power spectrum, spectral centroid, spectral spread, spectral flatness Fundamental frequency, harmonicity Used primarily for sound recognition, projections into low-dimensional space Timbral Temporal Timbral Spectral Log attack time, temporal centroid of a monophonic sound Features specific to the harmonic portions of signals (harmonic spectral centroid, spectral deviation, spectral spread, ) etc. Jürgen Herre, hrr@iis.fhg.de Page 13

MPEG-7 Audio Version 2 Additions (2003) Additions Audio signal quality description: Background Noise, channel cross-correlation, relative delay, balance, DC offset, bandwidth, transmission technology Errors in recordings (clicks, clipping, drop outs, ) Musical tempo [bpm] Extensions Description of stereo / multi-channel signals Extensions for spoken content description Jürgen Herre, hrr@iis.fhg.de Page 14

Some MPEG-7 Audio Applications Spoken Content Search General Sound Recognition & Indexing Audio Archiving and Restoration Instrument Timbre Search Melody Search / Query by Humming Robust Audio Identification / Fingerprinting Jürgen Herre, hrr@iis.fhg.de Page 15

Application: Spoken Content Search Context Speech is most important means of communication for humans Today s automatic speech recognizers represent speech internally as word & phone lattices Taj Mahal Drawing (example by Phil Garner) Jürgen Herre, hrr@iis.fhg.de Page 16

Application: Spoken Content Search (2) Approach Store lattices as descriptive data (rather than plain text) Use for querying/matching Allows handling of ambiguities ( recognize speech vs. wreck a nice beach ) Allows even retrieval of words unknown at annotation time Applications Spoken document retrieval Annotation & retrieval using speech (e.g.: personal photo database with spoken comments a la the kids at the beach ) Jürgen Herre, hrr@iis.fhg.de Page 17

Application: General Sound Recognition/Indexing Context Approach Applications How can various sounds be recognized, even within complex sound mixtures? How to address sound similarity problem? Independent Component Analysis of spectral components (ICA), HMM A/V indexing & search by audio (index/find scenes with animals, gun fights, laughter, explosions, ) Determine scene content/context from audio portion: Gun shot / explosion fi action scene Jürgen Herre, hrr@iis.fhg.de Page 18

Application: Archiving and Restoration Context Approach Preservation of cultural heritage (audio) Archivists always like to keep raw digitized recording for later processing & restoration Audio quality description encodes: General audio quality Technical recording parameters Location & type of defects Applications Search for archived audio material Restoration of audio material (may use latest restoration technology) Jürgen Herre, hrr@iis.fhg.de Page 19

Application: Instrument Timbre Search Context Human perception of sound includes timbre (besides pitch, loudness, duration) Approach Use perceptually relevant features to describe monophonic isolated instrument sounds fi enables comparison of sounds Considers sustained harmonic & percussive instrument sounds Applications Navigation through / search in instrument sound databases ( Giga-samplers ) Content-based editing Jürgen Herre, hrr@iis.fhg.de Page 20

Application: Melody Search Context How to search for melodies? Needs to be both efficient and tolerant to pitch & timing imprecisions Approach Melody description encodes: Pitch Timing/rhythmic information Other optional information (key ) Applications Jürgen Herre, hrr@iis.fhg.de Page 21 Query by Humming / Playing / Singing Search for themes in music catalogs Musicology

Application: Audio Identification/Fingerprinting Context How to automatically recognize/identify a recording? Approach Jürgen Herre, hrr@iis.fhg.de Page 22 Store unique MPEG-7 signature / fingerprint of original items in DB (robust, compact) Identification of unknown audio material by matching its signature with the DB signatures

Audio Identification/Fingerprinting (2) Applications Music recognition on handheld devices (PDA, cell phone), Content search on the Internet Broadcast monitoring How to find metadata for a given piece of legacy content (CD, CC, VCR Tape,... no metadata attached!)? Jürgen Herre, hrr@iis.fhg.de Page 23

Conclusions Notion of Metadata Essential to allow efficient handling of A/V data for real-world applications Search&retrieval, summarization, navigation Container for semantic audio analysis output Interoperability is crucial MPEG-7 Audio First international standard providing content-derived audio metadata fi unique! Generic descriptor toolbox Support for a number of attractive applications Large potential for more, still unconceived applications Jürgen Herre, hrr@iis.fhg.de Page 24