MPEG-7 Audio and Beyond
Audio Content Indexing and Retrieval

Hyoung-Gook Kim, Samsung Advanced Institute of Technology, Korea
Nicolas Moreau, Technical University of Berlin, Germany
Thomas Sikora, Communication Systems Group, Technical University of Berlin, Germany
Copyright 2005 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

Telephone (+44) (for orders and customer service enquiries): Visit our Home Page on

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to permreq@wiley.co.uk, or faxed to

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA , USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D Weinheim, Germany
John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Cataloging in Publication Data
Kim, Hyoung-Gook.
Introduction to MPEG-7 audio / Hyoung-Gook Kim, Nicolas Moreau, Thomas Sikora.
p. cm.
Includes bibliographical references and index.
ISBN (cloth: alk. paper) ISBN X (cloth: alk. paper)
1. MPEG (Video coding standard) 2. Multimedia systems. 3. Sound Recording and reproducing Digital techniques Standards.
I. Moreau, Nicolas. II. Sikora, Thomas. III. Title.
TK K dc22

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN (HB) ISBN X (HB)

Typeset in 10/12pt Times by Integra Software Services Pvt. Ltd, Pondicherry, India
Printed and bound in Great Britain by TJ International Ltd, Padstow, Cornwall
This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.
Contents

List of Acronyms
List of Symbols

1 Introduction
  Audio Content Description
  MPEG-7 Audio Content Description: An Overview
  MPEG-7 Low-Level Descriptors
  MPEG-7 Description Schemes
  MPEG-7 Description Definition Language (DDL)
  BiM (Binary Format for MPEG-7)
  Organization of the Book

2 Low-Level Descriptors
  Introduction
  Basic Parameters and Notations
  Time Domain
  Frequency Domain
  Scalable Series
  Series of Scalars
  Series of Vectors
  Binary Series
  Basic Descriptors
  Audio Waveform
  Audio Power
  Basic Spectral Descriptors
  Audio Spectrum Envelope
  Audio Spectrum Centroid
  Audio Spectrum Spread
  Audio Spectrum Flatness
  Basic Signal Parameters
  Audio Harmonicity
  Audio Fundamental Frequency
  2.7 Timbral Descriptors
  Temporal Timbral: Requirements
  Log Attack Time
  Temporal Centroid
  Spectral Timbral: Requirements
  Harmonic Spectral Centroid
  Harmonic Spectral Deviation
  Harmonic Spectral Spread
  Harmonic Spectral Variation
  Spectral Centroid
  Spectral Basis Representations
  Silence Segment
  Beyond the Scope of MPEG-7
  Other Low-Level Descriptors
  Mel-Frequency Cepstrum Coefficients
  References

3 Sound Classification and Similarity
  Introduction
  Dimensionality Reduction
  Singular Value Decomposition (SVD)
  Principal Component Analysis (PCA)
  Independent Component Analysis (ICA)
  Non-Negative Factorization (NMF)
  Classification Methods
  Gaussian Mixture Model (GMM)
  Hidden Markov Model (HMM)
  Neural Network (NN)
  Support Vector Machine (SVM)
  MPEG-7 Sound Classification
  MPEG-7 Audio Spectrum Projection (ASP) Feature Extraction
  Training Hidden Markov Models (HMMs)
  Classification of Sounds
  Comparison of MPEG-7 Audio Spectrum Projection vs. MFCC Features
  Indexing and Similarity
  Audio Retrieval Using Histogram Sum of Squared Differences
  Simulation Results and Discussion
  Plots of MPEG-7 Audio Descriptors
  Parameter Selection
  Results for Distinguishing Between Speech, Music and Environmental Sound
  Results of Sound Classification Using Three Audio Taxonomy Methods
  Results for Speaker Recognition
  Results of Musical Instrument Classification
  Audio Retrieval Results
  Conclusions
  References

4 Spoken Content
  Introduction
  Automatic Speech Recognition
  Basic Principles
  Types of Speech Recognition Systems
  Recognition Results
  MPEG-7 SpokenContent Description
  General Structure
  SpokenContentHeader
  SpokenContentLattice
  Application: Spoken Document Retrieval
  Basic Principles of IR and SDR
  Vector Space Models
  Word-Based SDR
  Sub-Word-Based Vector Space Models
  Sub-Word String Matching
  Combining Word and Sub-Word Indexing
  Conclusions
  MPEG-7 Interoperability
  MPEG-7 Flexibility
  Perspectives
  References

5 Music Description Tools
  Timbre
  Introduction
  InstrumentTimbre
  HarmonicInstrumentTimbre
  PercussiveInstrumentTimbre
  Distance Measures
  Melody
  Melody
  Meter
  Scale
  Key
  MelodyContour
  MelodySequence
  Tempo
  AudioTempo
  AudioBPM
  Application Example: Query-by-Humming
  Monophonic Melody Transcription
  Polyphonic Melody Transcription
  Comparison of Melody Contours
  References

6 Fingerprinting and Audio Signal Quality
  Introduction
  Audio Signature
  Generalities on Audio Fingerprinting
  Fingerprint Extraction
  Distance and Searching Methods
  MPEG-7-Standardized AudioSignature
  Audio Signal Quality
  AudioSignalQuality Description Scheme
  BroadcastReady
  IsOriginalMono
  BackgroundNoiseLevel
  CrossChannelCorrelation
  RelativeDelay
  Balance
  DcOffset
  Bandwidth
  TransmissionTechnology
  ErrorEvent and ErrorEventList
  References

7 Application
  Introduction
  Automatic Audio Segmentation
  Feature Extraction
  Segmentation
  Metric-Based Segmentation
  Model-Selection-Based Segmentation
  Hybrid Segmentation
  Hybrid Segmentation Using MPEG-7 ASP
  Segmentation Results
  7.3 Sound Indexing and Browsing of Home Video Using Spoken Annotations
  A Simple Experimental System
  Retrieval Results
  Highlights Extraction for Sport Programmes Using Audio Event Detection
  Goal Event Segment Selection
  System Results
  A Spoken Document Retrieval System for Digital Photo Albums
  References

Index
Acronyms

ADSR: Attack, Decay, Sustain, Release
AFF: Audio Fundamental Frequency
AH: Audio Harmonicity
AP: Audio Power
ASA: Auditory Scene Analysis
ASB: Audio Spectrum Basis
ASC: Audio Spectrum Centroid
ASE: Audio Spectrum Envelope
ASF: Audio Spectrum Flatness
ASP: Audio Spectrum Projection
ASR: Automatic Speech Recognition
ASS: Audio Spectrum Spread
AWF: Audio Waveform
BIC: Bayesian Information Criterion
BP: Back Propagation
BPM: Beats Per Minute
CASA: Computational Auditory Scene Analysis
CBID: Content-Based Audio Identification
CM: Coordinate Matching
CMN: Cepstrum Mean Normalization
CRC: Cyclic Redundancy Checking
DCT: Discrete Cosine Transform
DDL: Description Definition Language
DFT: Discrete Fourier Transform
DP: Dynamic Programming
DS: Description Scheme
DSD: Divergence Shape Distance
DTD: Document Type Definition
EBP: Error Back Propagation
ED: Edit Distance
EM: Expectation and Maximization
EMIM: Expected Mutual Information Measure
EPM: Exponential Pseudo Norm
FFT: Fast Fourier Transform
GLR: Generalized Likelihood Ratio
GMM: Gaussian Mixture Model
GSM: Global System for Mobile Communications
HCNN: Hidden Control Neural Network
HMM: Hidden Markov Model
HR: Harmonic Ratio
HSC: Harmonic Spectral Centroid
HSD: Harmonic Spectral Deviation
HSS: Harmonic Spectral Spread
HSV: Harmonic Spectral Variation
ICA: Independent Component Analysis
IDF: Inverse Document Frequency
INED: Inverse Normalized Edit Distance
IR: Information Retrieval
ISO: International Organization for Standardization
KL: Karhunen-Loève
KL: Kullback-Leibler
KS: Knowledge Source
LAT: Log Attack Time
LBG: Linde-Buzo-Gray
LD: Levenshtein Distance
LHSC: Local Harmonic Spectral Centroid
LHSD: Local Harmonic Spectral Deviation
LHSS: Local Harmonic Spectral Spread
LHSV: Local Harmonic Spectral Variation
LLD: Low-Level Descriptor
LM: Language Model
LMPS: Logarithmic Maximum Power Spectrum
LP: Linear Predictive
LPC: Linear Predictive Coefficient
LPCC: Linear Prediction Cepstrum Coefficient
LSA: Log Spectral Amplitude
LSP: Linear Spectral Pair
LVCSR: Large-Vocabulary Continuous Speech Recognition
mAP: Mean Average Precision
MCLT: Modulated Complex Lapped Transform
MD5: Message Digest 5
MFCC: Mel-Frequency Cepstrum Coefficient
MFFE: Multiple Fundamental Frequency Estimation
MIDI: Musical Instrument Digital Interface
MIR: Music Information Retrieval
MLP: Multi-Layer Perceptron
M.M.: Metronom Mälzel
MMS: Multimedia Mining System
MPEG: Moving Picture Experts Group
MPS: Maximum Power Spectrum
MSD: Maximum Squared Distance
NASE: Normalized Audio Spectrum Envelope
NMF: Non-Negative Matrix Factorization
NN: Neural Network
OOV: Out-Of-Vocabulary
OPCA: Oriented Principal Component Analysis
PCA: Principal Component Analysis
PCM: Phone Confusion Matrix
PCM: Pulse Code Modulated
PLP: Perceptual Linear Prediction
PRC: Precision
PSM: Probabilistic String Matching
QBE: Query-By-Example
QBH: Query-By-Humming
RASTA: Relative Spectral Technique
RBF: Radial Basis Function
RCL: Recall
RMS: Root Mean Square
RSV: Retrieval Status Value
SA: Spectral Autocorrelation
SC: Spectral Centroid
SCP: Speaker Change Point
SDR: Spoken Document Retrieval
SF: Spectral Flux
SFM: Spectral Flatness Measure
SNF: Spectral Noise Floor
SOM: Self-Organizing Map
STA: Spectro-Temporal Autocorrelation
STFT: Short-Time Fourier Transform
SVD: Singular Value Decomposition
SVM: Support Vector Machine
TA: Temporal Autocorrelation
TPBM: Time Pitch Beat Matching
TC: Temporal Centroid
TDNN: Time-Delay Neural Network
ULH: Upper Limit of Harmonicity
UM: Ukkonen Measure
UML: Unified Modeling Language
VCV: Vowel Consonant Vowel
VQ: Vector Quantization
More informationInput speech signal. Selected /Rejected. Pre-processing Feature extraction Matching algorithm. Database. Figure 1: Process flow in ASR
Volume 5, Issue 1, January 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Feature Extraction
More informationWolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 6 Audio Retrieval 6 Audio Retrieval 6.1 Basics of
More informationAudio & Music Research at LabROSA
Audio & Music Research at LabROSA Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationA text-independent speaker verification model: A comparative analysis
A text-independent speaker verification model: A comparative analysis Rishi Charan, Manisha.A, Karthik.R, Raesh Kumar M, Senior IEEE Member School of Electronic Engineering VIT University Tamil Nadu, India
More informationDevice Activation based on Voice Recognition using Mel Frequency Cepstral Coefficients (MFCC s) Algorithm
Device Activation based on Voice Recognition using Mel Frequency Cepstral Coefficients (MFCC s) Algorithm Hassan Mohammed Obaid Al Marzuqi 1, Shaik Mazhar Hussain 2, Dr Anilloy Frank 3 1,2,3Middle East
More information2014, IJARCSSE All Rights Reserved Page 461
Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real Time Speech
More informationLabROSA Research Overview
LabROSA Research Overview Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu 1. Music 2. Environmental sound 3.
More informationAudio Classification and Content Description
2004:074 MASTER S THESIS Audio Classification and Content Description TOBIAS ANDERSSON MASTER OF SCIENCE PROGRAMME Department of Computer Science and Electrical Engineering Division of Signal Processing
More information3.5 Filtering with the 2D Fourier Transform Basic Low Pass and High Pass Filtering using 2D DFT Other Low Pass Filters
Contents Part I Decomposition and Recovery. Images 1 Filter Banks... 3 1.1 Introduction... 3 1.2 Filter Banks and Multirate Systems... 4 1.2.1 Discrete Fourier Transforms... 5 1.2.2 Modulated Filter Banks...
More informationDIFFERENTIAL EQUATION ANALYSIS IN BIOMEDICAL SCIENCE AND ENGINEERING
DIFFERENTIAL EQUATION ANALYSIS IN BIOMEDICAL SCIENCE AND ENGINEERING DIFFERENTIAL EQUATION ANALYSIS IN BIOMEDICAL SCIENCE AND ENGINEERING ORDINARY DIFFERENTIAL EQUATION APPLICATIONS WITH R William E. Schiesser
More informationA Wavelet Tour of Signal Processing The Sparse Way
A Wavelet Tour of Signal Processing The Sparse Way Stephane Mallat with contributions from Gabriel Peyre AMSTERDAM BOSTON HEIDELBERG LONDON NEWYORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY»TOKYO
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationParametric Coding of High-Quality Audio
Parametric Coding of High-Quality Audio Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau Technical University Ilmenau, Germany 1 Waveform vs Parametric Waveform Filter-bank approach Mainly exploits
More informationComparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification
More informationMPEG-7 Sound Recognition Tools
MPEG-7 Sound Recognition Tools Michael Casey, member IEEE Abstract The MPEG-7 sound recognition descriptors and description schemes consist of tools for indexing audio media using probabilistic sound models.
More informationMinimal-Impact Personal Audio Archives
Minimal-Impact Personal Audio Archives Dan Ellis, Keansub Lee, Jim Ogle Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu
More informationCS229 Final Project: Audio Query By Gesture
CS229 Final Project: Audio Query By Gesture by Steinunn Arnardottir, Luke Dahl and Juhan Nam {steinunn,lukedahl,juhan}@ccrma.stanford.edu December 2, 28 Introduction In the field of Music Information Retrieval
More information