Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig


Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

6 Audio Retrieval
- 6.1 Basics of Audio Data
- 6.2 Audio Information in Databases
- 6.3 Audio Retrieval

6.1 Basics of Audio Data
- Information transfer through sound
- Audio (Latin, "I hear")
- Three different types of data:
  - Music
  - Spoken text
  - Noise

6.1 Basics
- Auditory perception through pressure fluctuations in the air
- The eardrum vibrates synchronously
- The ear bones (ossicles) amplify and transmit the vibrations
- Auditory hair cells in the cochlea are stimulated by the vibrations
- Neurons produce electrical impulses

6.1 Basics
- [Figure: 3D model of the human ear]

6.1 Basics
- Our brain only interprets two major properties of sound:
  - Pitch
  - Volume

6.1 Basics
- Quantitative properties of the sound wave
- Amplitude as volume
  - Logarithmic perception (a tenfold increase in sound intensity, about +10 dB, roughly doubles the perceived loudness)
- Frequency as pitch
  - The number of periods per unit time is known as frequency (measured in hertz)
  - Hearing range between 20 Hz and 20 kHz
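
The logarithmic perception of volume can be made concrete with the decibel scale. The following is a minimal sketch; the function names are hypothetical, and the "+10 dB is about twice as loud" mapping is a common rule of thumb assumed here, not an exact law.

```python
import math

def intensity_ratio_to_db(ratio: float) -> float:
    """Level difference in dB for a ratio of two sound intensities (powers)."""
    return 10.0 * math.log10(ratio)

def amplitude_ratio_to_db(ratio: float) -> float:
    """Level difference in dB for a ratio of two amplitudes (pressures)."""
    return 20.0 * math.log10(ratio)

def loudness_factor(db_change: float) -> float:
    """Assumed rule of thumb: every +10 dB roughly doubles perceived loudness."""
    return 2.0 ** (db_change / 10.0)

# A tenfold intensity increase is +10 dB, i.e. roughly "twice as loud".
print(intensity_ratio_to_db(10.0), loudness_factor(intensity_ratio_to_db(10.0)))
```

Note that a tenfold increase in amplitude corresponds to +20 dB, because intensity grows with the square of the amplitude.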

6.1 Basics
- Audio signals are time-dependent (overlapping) waveforms


6.1 Basics
- Constructive and destructive interference
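
Interference can be illustrated by adding two sine waves of the same frequency sample by sample. This is a small sketch with hypothetical names and arbitrarily chosen parameters (440 Hz, 8000 samples per second); it only reports the peak of the summed signal.

```python
import math

def sine(freq_hz: float, phase: float, t: float) -> float:
    """Value of a unit-amplitude sine wave at time t (seconds)."""
    return math.sin(2 * math.pi * freq_hz * t + phase)

def superposed_peak(phase_shift: float, freq_hz: float = 440.0,
                    rate: int = 8000, n: int = 80) -> float:
    """Add two equal-frequency sines and return the largest absolute sum."""
    peak = 0.0
    for i in range(n):
        t = i / rate
        total = sine(freq_hz, 0.0, t) + sine(freq_hz, phase_shift, t)
        peak = max(peak, abs(total))
    return peak

print(superposed_peak(0.0))       # in phase: amplitudes add up (close to 2)
print(superposed_peak(math.pi))   # opposite phase: the waves cancel (close to 0)
```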

6.1 Basics
- [Audio examples]

6.1 Sound Creation
- Musical instruments are classified by how the vibration is excited
  - E.g., bowing or plucking, blowing, striking
- The sound depends on the vibration generator
  - E.g., string, air, and membrane instruments
- Synthetic creation needs an oscillator
  - The oscillator generates voltage oscillations
  - A speaker transfers the voltage changes to a membrane

6.1 Sound Creation
- Influence of the oscillator
  - Higher voltage → higher frequency (Moog, 1964)
- The amplifier influences the amplitude and thus the volume
- The ADSR (attack-decay-sustain-release) envelope influences the loudness of a sound over time

6.1 Sound Creation
- [Figures: Moog 901B (1964); Modular Moog Synthesizer (1967)]

6.1 Sound Creation
- Example: Emerson, Lake & Palmer: The Great Gates of Kiev (1974)

6.1 ADSR envelope
- When a synthesizer produces a single sound, four phases are typical:
  - Attack: speed and strength of the signal rise
  - Decay: drop of the level after the initial peak
  - Sustain: level held while the note lasts
  - Release: fading out at the end of the signal
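
The four phases can be sketched as a piecewise-linear gain curve that is multiplied onto an oscillator signal. This is an illustrative sketch only: the function name, the linear segments, and the parameter layout are assumptions, and real synthesizers often use exponential segments.

```python
def adsr_envelope(n_samples: int, rate: int, attack: float, decay: float,
                  sustain_level: float, release: float) -> list:
    """Piecewise-linear ADSR gain curve in 0..1; phase durations in seconds."""
    a = round(attack * rate)
    d = round(decay * rate)
    r = round(release * rate)
    s = max(n_samples - a - d - r, 0)          # samples held at the sustain level
    env = []
    env += [i / a for i in range(a)]                                  # attack: 0 -> 1
    env += [1.0 - (1.0 - sustain_level) * i / d for i in range(d)]    # decay: 1 -> sustain
    env += [sustain_level] * s                                        # sustain: constant
    env += [sustain_level * (1.0 - i / r) for i in range(r)]          # release: sustain -> 0
    return env[:n_samples]

# One second at 1000 samples/s: 0.1 s attack, 0.1 s decay, 0.2 s release.
env = adsr_envelope(1000, 1000, 0.1, 0.1, 0.5, 0.2)
```

Multiplying each oscillator sample by the corresponding envelope value shapes the loudness of the note over time.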

6.1 Digitization of Audio Data
- Transformation of the continuous sound wave into a discrete representation
- Sampling: store the current amplitude value of the vibration at regular intervals
- Clearly, we must be able to reconstruct the audio signal from these values
- [Figure: amplitude over time]

6.1 Sampling
- Basic characteristics
  - Sampling rate: how many times per unit time is the value of the continuous signal measured?
  - Resolution: with which accuracy are the values recorded?
- Often, a resolution of 16 bits is used (2^16 = 65,536 different amplitude values)
- The sampling rate is application dependent:
  - Audio CD: 44,100 Hz
  - Phone: 8,000 Hz
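
Sampling rate and resolution can be illustrated by sampling a sine wave and rounding each value to a signed integer code. This is a minimal sketch; the function name and the choice of a unit-amplitude sine as input are assumptions.

```python
import math

def sample_and_quantize(freq_hz: float, rate_hz: int, duration_s: float,
                        bits: int = 16) -> list:
    """Sample a unit-amplitude sine and quantize it to signed integer codes."""
    max_code = 2 ** (bits - 1) - 1             # 32767 for 16 bits
    n = round(rate_hz * duration_s)            # number of samples taken
    return [round(max_code * math.sin(2 * math.pi * freq_hz * i / rate_hz))
            for i in range(n)]

# One period of a 1000 Hz tone at the phone sampling rate of 8000 Hz.
codes = sample_and_quantize(1000, 8000, 0.001)
print(codes)
```

With fewer bits, `max_code` shrinks and neighbouring amplitude values collapse onto the same code, which is the quantization error.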

6.1 Sampling Rate
- It is very important that the oscillation can be recovered unambiguously
- The higher the sampling frequency, the more values must be saved
- Minimum sampling frequency?
- Sampling theorem (Nyquist, 1928)
  - The sampling rate must be at least twice as large as the highest frequency occurring in the signal
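
What happens when the sampling theorem is violated can be sketched as follows: a 5000 Hz tone sampled at 8000 Hz produces the same sample values (up to sign) as a 3000 Hz tone, so the two cannot be told apart afterwards. The function names are hypothetical.

```python
import math

def sample_sine(freq_hz: float, rate_hz: float, n: int) -> list:
    """First n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

def alias_frequency(freq_hz: float, rate_hz: float) -> float:
    """Apparent frequency after sampling: fold freq_hz into 0 .. rate_hz / 2."""
    f = freq_hz % rate_hz
    return min(f, rate_hz - f)

too_high = sample_sine(5000, 8000, 16)   # above half the sampling rate
alias = sample_sine(3000, 8000, 16)      # the frequency it is mistaken for
print(alias_frequency(5000, 8000))
```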


6.1 Sampling Rate
- Phone: 8,000 Hz
- DVD-Audio: 96,000 Hz or 192,000 Hz
- Audio CD: 44,100 samples per second for each of the two stereo channels, with 16 bits per sample
  - Results in 176.4 kB/s (about 1,411 kbit/s)
  - About 10 MB/min, i.e., 635 MB/h
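
The audio CD figures follow directly from sampling rate, channel count, and resolution. A short sketch of the arithmetic (the function name is hypothetical; MB is taken as 10^6 bytes):

```python
def pcm_data_rate(rate_hz: int, channels: int, bits: int) -> int:
    """Uncompressed PCM data rate in bytes per second."""
    return rate_hz * channels * bits // 8

cd = pcm_data_rate(44_100, 2, 16)
print(cd)                 # 176400 bytes per second = 176.4 kB/s
print(cd * 60 / 1e6)      # about 10.6 MB per minute
print(cd * 3600 / 1e6)    # about 635 MB per hour
```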

6.1 Audio Formats
- For space reasons, digital audio is usually stored in compressed form
- Known uncompressed formats:
  - AIFF: *.aif (Audio Interchange File Format, Apple)
  - Wave: *.wav (Windows)
  - IRCAM: *.snd (Institut de Recherche et Coordination Acoustique/Musique)
  - AU: *.au (Sun audio)

6.1 Compression
- Data reduction, usually with lossy algorithms
- There are also lossless compression methods, but they generally do not compress very much
  - Lossless Audio (LA)
  - Apple Lossless
  - WavPack
  - ...

6.1 Compression
- Lossy compression algorithms are typically based on simple transformations
  - Modified Discrete Cosine Transform (MDCT) or wavelets
- Encoding: transformation of the waveform into sequences of frequency coefficients
- Decoding: reconstruction of the waveform from these values

6.1 Compression
- Change the data without changing the subjective perception
  - Omit very high/low frequencies
  - Save superimposed frequencies with less precision
  - Use other effects according to the psychoacoustic model, e.g., quiet tones directly before/after very loud sounds and frequency changes below a minimum distance are impossible to hear
  - ...

6.1 Compression
- MPEG-1 Audio Layers I, II and III (MP3)
  - Near-CD quality at bit rates of about 128 kbit/s
- Coarse approach of MP3:
  - Channel coupling of the stereo signal by coding the difference between the channels
  - Cut off inaudible frequencies and all frequencies > 1/2 × sampling rate (sampling theorem)
  - Eliminate redundancy by exploiting psychoacoustic effects
  - Compress the data using Huffman coding
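
The idea behind channel coupling can be sketched with a simple mid/side transform: instead of left and right, the encoder stores their average and their difference, and the difference is small (cheap to code) when both channels are similar. This is only an illustration of the principle with hypothetical function names; the actual joint-stereo processing in MP3 is more involved.

```python
def to_mid_side(left: list, right: list) -> tuple:
    """Convert left/right samples into mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid: list, side: list) -> tuple:
    """Invert the transform: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.5, 0.25, -0.5]
right = [0.5, 0.75, -0.25]
mid, side = to_mid_side(left, right)
print(mid, side)
```

Where both channels carry the same value (first sample), the side signal is exactly zero.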

6.1 Compression
- AAC (Advanced Audio Coding)
  - Industry-driven improvement of the MP3 format (supported by MPEG)
  - Used in TV/radio broadcasts, Apple iTunes Store, ...
  - Better quality at the same file size
  - Support for multi-channel audio
  - Supports 48 main sound channels with up to 96 kHz sampling rate, 16 low-frequency channels (limited to 120 Hz) and 15 data streams
- Other formats: Ogg Vorbis, RealAudio, WMA 9, ...

6.1 Compression
- Lossless compression: http://members.home.nl/w.speek/comparison.htm
- Lossy compression: http://www.soundexpert.info/coders128.jsp

6.1 Compression
- Lossless compression, test procedure
  - Rip N audio CDs with different music types
  - Encode the resulting file
  - Measure speed and compression rate
  - Decode the encoded version
  - Measure speed and compression rate

6.1 Compression
- [Figure: Lossless compression, results]

6.1 Compression
- Lossy compression, test procedure
  - Rip N audio CDs with different music types
  - Compress with different methods
  - Measure the quality of the compressed sound file
  - Use the Mean Opinion Score (MOS)

6.1 Compression
- [Figure: Lossy compression, results]


6.1 The MIDI Format
- Communication protocol
  - For transmitting, recording, and playing musical control information between digital instruments and the PC
  - Introduced in 1983, developed by Sequential Circuits and Roland
- MIDI messages are not sounds, but commands that can be interpreted, e.g., by sound cards
- Some commands:
  - Note-on, note-off, key velocity, pitch, type of instrument
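
That MIDI carries commands rather than sound becomes clear from the size of a message: a note-on or note-off command is just three bytes. The sketch below builds such messages; the helper names are hypothetical, while the status bytes (0x90 for note-on, 0x80 for note-off, combined with the channel number) follow the MIDI 1.0 channel-message layout.

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """3-byte note-on message: status byte, note number, key velocity."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0..15, note and velocity 0..127")
    return bytes([0x90 | channel, note, velocity])

def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """3-byte note-off message for the same note."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0..15, note and velocity 0..127")
    return bytes([0x80 | channel, note, velocity])

# Note number 69 is the standard pitch A (440 Hz).
print(note_on(0, 69, 100).hex(), note_off(0, 69).hex())
```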

6.1 The MIDI Format
- 10 minutes of music amount to about 200 KB of MIDI data
- Significant savings compared to sampling, but the original sound is not preserved
- Data are entered into the PC via a keyboard and played back via a synthesizer
- A sequencer buffers the data and allows changes
- A musical score can be generated automatically

6.2 Audio Information in Databases
- Audio data
  - Music, CDs
  - Sound effects, earcons
- Audio data account for a large share of information transfer
  - Storage of historical speeches
  - Recordings of conversations, phone calls, or negotiations

6.2 Special Applications
- Three main applications of audio signals in the context of databases:
  - Identification of audio signals (audio as query)
  - Classification and search of similar signals (matching of audio)
  - Phonetic synchronization

6.2 Identification of Audio Signals
- Find the title, etc. of a given piece of music
- Monitoring of audio streams
  - Control of the broadcasting of advertisements on radio
  - Copyright control (GEMA)
- (Remote) diagnosis based on noise
- Audio on demand

6.2 Classification and Matching
- Find perceptually similar audio signals
  - E.g., similar pieces of music, the same quotation, ...
- Recommendation
  - E.g., bands with similar music
- Genre classification (rock, classical, ...)
  - E.g., in audio libraries

6.2 Synchronization
- Synchronization of audio streams
  - Speech ↔ text, score ↔ audio, ...
- Retrieval of text from or to speech
  - Find specific points in a speech
  - Verbal queries on text documents
- Score following in concerts, etc.

6.2 State of the Art
- Identification
  - The simplest of the three problems; research has been successful in recent years
- Classification and Matching
  - Often still based on manual annotations
  - Automatic classification only works roughly and on small collections
  - Matching is still largely unresolved
- Synchronization
  - By now, tolerable error rates (speech ↔ text)

6.2 Persistent Storage
- (Compressed) audio files are stored in the database as (smart) BLOBs
- Additionally, metadata and feature vectors are stored to realize the search functionality
  - Speech: transcription as text
  - Music: musical notation or MIDI

6.3 Audio Retrieval
- Search in audio data: metadata describe the audio file
  - Semantic metadata: difficult to generate (title, artist, speaker, keywords, ...)
  - File information: can be generated automatically (e.g., time/place of recording, filename, file size, ...)
- Widely used, e.g., music exchange markets, online shops, ...

6.3 Metadata-based Search
- Manual indexing is extremely labor intensive and expensive
- Information is often incomplete, partial, and subjective (e.g., genre classification)
- No possibility to query by example ("Sounds like...")
- Search only with SQL, approximate string search, etc.

6.3 Content-based Search
- Use the content of the audio files
- Compare sample by sample
  - Not very promising, inefficient
  - Differences in sampling rate and resolution
- Sounds can be differentiated by certain characteristics
  - Low Level Features
  - Logical Features

6.3 Low Level Features
- Acoustic features
  - Same basic idea as in image databases
  - Description of signal information by means of characteristic features
- In contrast to image information, we don't use a single feature vector, but a time-dependent vector function
  - What matters is the point in time at which an acoustic characteristic occurs, rather than merely whether it is contained in the audio file

6.3 Low Level Features
- Typical Low Level Features
  - Mean amplitude, loudness
  - Frequency distribution
  - Pitch
  - Brightness
  - Bandwidth
- Measured in the...
  - ...time domain (amplitude versus time)
  - ...frequency domain (intensity versus frequency)

6.3 Features in the Time Domain
- Amplitude
  - Pressure fluctuations around the zero point
  - Silence is equivalent to 0 amplitude
- Average energy
  - Characterizes the volume of the signal
  - $E = \frac{1}{N} \sum_{n=1}^{N} x_n^2$, with $N$ as the total number of measurements and $x_n$ as the $n$-th measurement
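
The average energy can be computed in a single pass over the samples. A minimal sketch, assuming the usual definition as the mean of the squared sample values (the function name is hypothetical):

```python
def average_energy(samples: list) -> float:
    """Mean of the squared sample values: E = (1/N) * sum(x_n ** 2)."""
    return sum(x * x for x in samples) / len(samples)

print(average_energy([0, 0, 0, 0]))      # silence has zero energy
print(average_energy([1, -1, 1, -1]))    # a full-scale square wave
```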

6.3 Features in the Time Domain
- Zero-Crossing Rate
  - Frequency of sign changes in the signal
  - $ZC = \frac{1}{2N} \sum_{n=2}^{N} \left| \operatorname{sgn}(x_n) - \operatorname{sgn}(x_{n-1}) \right|$, with $\operatorname{sgn}$ as the sign function (signum)
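
A sketch of the zero-crossing rate, assuming the common definition in which each sign change contributes |sgn(x_n) - sgn(x_(n-1))| = 2 and the sum is normalized by 2N. The function names are hypothetical.

```python
def sgn(x: float) -> int:
    """Sign function: -1, 0 or +1."""
    return (x > 0) - (x < 0)

def zero_crossing_rate(samples: list) -> float:
    """Sum of |sgn(x_n) - sgn(x_(n-1))| over the signal, divided by 2N."""
    n = len(samples)
    changes = sum(abs(sgn(samples[i]) - sgn(samples[i - 1])) for i in range(1, n))
    return changes / (2 * n)

print(zero_crossing_rate([1, 2, 3, 2]))      # no sign change
print(zero_crossing_rate([1, -1, 1, -1]))    # sign changes at every step
```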

6.3 Features in the Time Domain
- Silence Ratio
  - Proportion of values that belong to a period of complete silence
- We must first establish:
  - The amplitude value below which a sample is considered to be silent
  - The minimum number of consecutive samples that need to be silent to form a silence period
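
The two parameters translate directly into a run-length scan over the samples. A minimal sketch with hypothetical names; `threshold` is the amplitude limit and `min_run` the minimum number of consecutive silent samples:

```python
def silence_ratio(samples: list, threshold: float, min_run: int) -> float:
    """Fraction of samples lying in silent runs of at least min_run samples."""
    silent = 0
    run = 0
    for x in samples:
        if abs(x) < threshold:
            run += 1                 # extend the current quiet run
        else:
            if run >= min_run:
                silent += run        # the run was long enough to count as silence
            run = 0
    if run >= min_run:               # a quiet run may end with the signal
        silent += run
    return silent / len(samples)

print(silence_ratio([0, 0, 0, 5, 0, 5, 0, 0], threshold=1, min_run=2))
```

In the example, the single quiet sample between the two loud ones is too short to count as a silence period.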

6.3 Features in the Frequency Domain
- Fourier transform of the signal
  - Decomposition into frequency components with coefficients (Fourier coefficients)
- Representation of the frequency spectrum of the signal
  - Magnitude of the coefficient of each frequency (represents the amount of energy per frequency)
  - Usually measured in decibels (dB)
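
The decomposition can be sketched with a naive discrete Fourier transform. This is for illustration only: the function names are hypothetical, the quadratic-time loop is chosen for readability, and practical systems use an FFT library instead.

```python
import cmath
import math

def dft_magnitudes(samples: list) -> list:
    """Magnitudes |X_k| of the DFT for k = 0 .. N/2 (naive O(N^2) version)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

def to_db(magnitude: float, reference: float = 1.0, floor: float = 1e-12) -> float:
    """Magnitude in decibels relative to a reference (floor avoids log of 0)."""
    return 20 * math.log10(max(magnitude, floor) / reference)

# A sine with exactly 4 periods in 32 samples has all its energy in bin k = 4.
signal = [math.sin(2 * math.pi * 4 * i / 32) for i in range(32)]
mags = dft_magnitudes(signal)
print(mags.index(max(mags)), to_db(max(mags)))
```

Bin k corresponds to the frequency k × sampling rate / N.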

6.3 Frequency Spectrum
- [Figure: "Ahhh" sound and its Fourier spectrum]

6.3 Features in the Frequency Domain
- Bandwidth
  - Interval of occurring frequencies
  - Difference between the largest and smallest frequency in the spectrum (for the minimum frequency, only frequencies above the silence threshold are considered)
  - Can also be used for classification, e.g., the bandwidth of music is higher than that of voice
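
Given a spectrum as parallel lists of frequencies and magnitudes, the bandwidth is a simple range computation. A sketch with hypothetical names; applying the threshold to both ends of the range is an assumption made here for simplicity.

```python
def bandwidth(freqs_hz: list, magnitudes: list, threshold: float) -> float:
    """Highest minus lowest frequency whose magnitude exceeds the threshold."""
    present = [f for f, m in zip(freqs_hz, magnitudes) if m > threshold]
    if not present:
        return 0.0                  # nothing above the silence threshold
    return max(present) - min(present)

freqs = [0, 100, 200, 300, 400]
mags = [0.0, 0.5, 0.01, 0.8, 0.0]
print(bandwidth(freqs, mags, threshold=0.1))
```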

6.3 Features in the Frequency Domain
- Power Distribution
  - Can be read directly from the spectrum
  - Distinction of frequencies with high/low energy
  - Calculation of frequency bands with high/low energy
  - Centroid as the center of the spectral energy distribution (brightness)
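
The centroid can be sketched as an energy-weighted mean of the frequencies. The function name is hypothetical, and weighting by squared magnitude (energy) is an assumption; some definitions weight by the magnitude itself.

```python
def spectral_centroid(freqs_hz: list, magnitudes: list) -> float:
    """Energy-weighted mean frequency of a spectrum (a measure of brightness)."""
    total = sum(m * m for m in magnitudes)
    if total == 0:
        return 0.0                  # silent frame: no meaningful centroid
    return sum(f * m * m for f, m in zip(freqs_hz, magnitudes)) / total

print(spectral_centroid([100, 200, 300], [1, 1, 1]))    # flat spectrum
print(spectral_centroid([100, 200, 300], [0, 0, 2]))    # energy only at the top
```

A higher centroid means more energy in the high frequencies, i.e., a brighter sound.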

6.3 Features in the Frequency Domain
- Harmony
  - The lowest of all the "loud" frequencies is called the fundamental frequency
  - The harmony of a signal increases when the dominant components in the spectrum are multiples of the fundamental frequency
  - E.g., the standard pitch A (fundamental frequency 440 Hz) played on a violin creates harmonic oscillations at 880 Hz, 1320 Hz, 1760 Hz, etc.

6.3 Harmony
- Harmonic oscillations
- [Figure: frequency spectrum of a sound played on an instrument]

6.3 Features in the Frequency Domain
- Pitch
  - Detectable only for periodic sounds
  - Can be approximated by means of the Fourier spectrum
  - The value is calculated from the frequencies and amplitudes of the peaks
  - Related to the fundamental frequency, which is often used as an approximation
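
Using the fundamental frequency as an approximation of the pitch can be sketched as a search for the lowest significant peak in the spectrum. This is a deliberately simple heuristic with hypothetical names and an assumed relative threshold, not a complete pitch detector.

```python
def estimate_fundamental(freqs_hz: list, magnitudes: list,
                         rel_threshold: float = 0.1):
    """Frequency of the lowest local peak reaching rel_threshold * max magnitude."""
    limit = rel_threshold * max(magnitudes)
    for i in range(1, len(magnitudes) - 1):
        m = magnitudes[i]
        if m >= limit and m > magnitudes[i - 1] and m >= magnitudes[i + 1]:
            return freqs_hz[i]
    return None                     # no peak found, e.g. for a silent frame

# Spectrum of a tone with peaks at 440 Hz and its harmonics 880 Hz and 1320 Hz.
freqs = [0, 220, 440, 660, 880, 1100, 1320, 1540]
mags = [0.0, 0.01, 1.0, 0.02, 0.6, 0.01, 0.3, 0.0]
print(estimate_fundamental(freqs, mags))
```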

Next lecture
- Classification and Retrieval of Audio
- Low Level Audio Features
- Difference Limen
- Pitch Detection