YAAFE, AN EASY TO USE AND EFFICIENT AUDIO FEATURE EXTRACTION SOFTWARE


Benoit Mathieu, Slim Essid, Thomas Fillon, Jacques Prado, Gaël Richard
Institut Telecom, Telecom ParisTech, CNRS/LTCI

ABSTRACT

Music Information Retrieval systems are commonly built on a feature extraction stage. For applications involving automatic classification (e.g. speech/music discrimination, music genre or mood recognition), traditional approaches consider a large set of audio features to be extracted from a large dataset. In some cases this leads to computationally intensive systems, so there is a strong need for efficient feature extraction. In this paper, a new audio feature extraction software package, YAAFE, is presented and compared to widely used libraries. The main advantage of YAAFE is a significantly lower complexity, obtained through an appropriate exploitation of redundancy in the feature calculations. YAAFE remains easy to configure, and each feature can be parameterized independently. Finally, the YAAFE framework and most of its core feature library are released in source code under the GNU Lesser General Public License.

1. INTRODUCTION AND RELATED WORK

Most Music Information Retrieval (MIR) systems include an initial low-level or mid-level audio feature extraction stage. For applications involving automatic classification (e.g. speech/music discrimination, music genre or mood recognition), traditional approaches consider a large set of audio features to be extracted from a large dataset, possibly combined with early temporal integration, i.e. summarizing feature values over a segment or texture window by computing the mean, standard deviation and/or any relevant statistic ("early" meaning that the integration is performed before the classification step). The importance of the feature extraction stage justifies the increasing effort of the community in this domain, and a number of initiatives related to audio feature extraction have emerged in the last ten years, with various objectives. For example, Marsyas is a software framework for audio processing [1], written in C++.
It is designed as a dataflow processing framework, with the advantages of efficiency and low memory usage. Various building blocks are available to build real-time applications for audio analysis, synthesis, segmentation and classification, and Marsyas is widely and successfully used for various tasks. Note, however, that audio feature extraction (the bextract program) is only a small component of the whole Marsyas framework. Extracted features are written in ARFF format, and can be directly reused with the WEKA [6] machine learning toolkit. Some classic features are available out of the box. The user can select which features to extract, but parameters like frame size and overlap are global, and the user has little control over temporal integration.

© 2010 International Society for Music Information Retrieval.

VAMP Plugins is the specification of a C++ Application Programming Interface (API) for plugins that extract low-level features from audio signals. Its very permissive BSD-style license lets users develop their own plugins, or applications that use existing plugins, and several plugin libraries have been developed by various research labs. VAMP Plugins comes with the Sonic Visualiser [2] application, a tool for viewing the contents of music audio files together with extracted features. Batch feature extraction using VAMP Plugins can be done with the command-line tool Sonic Annotator.
Users can declare the features to extract in RDF (Resource Description Framework) files, with precise control over each feature's parameters. Output can be written to CSV (comma-separated values) or RDF files. Early temporal integration is limited to predefined segment summaries, and it is not possible to perform temporal integration over overlapping texture windows. The VAMP Plugins API allows the development of independent libraries, but prevents the development of new plugins that would depend on already existing plugins. Another example, the MIR toolbox, is a Matlab toolbox dedicated to musical feature extraction [3]. Algorithms are decomposed into stages that the user can parameterize, and functions are provided with a simple and adaptive syntax. The MIR toolbox relies on the Matlab environment and therefore benefits from existing toolboxes and built-in visualization capabilities, but suffers from memory management limitations. Other projects also exist. jAudio [5] is a Java-based audio feature extraction library, whose results are written in

XML format. Maaate is a C++ toolkit developed to analyze audio in the compressed frequency domain. FEAPI [4] is a plugin API similar to VAMP. MPEG-7 also provides Matlab and C code for feature extraction. Lately, MIR web services have surfaced; for instance, the Echo Nest provides a web service API for audio feature extraction, where input files are submitted through the web and the user receives an XML description.

Whatever the objectives are, the computational efficiency of the feature extraction process remains of utmost interest. It is also clear that many features share common intermediate representations, such as the spectrum magnitude, the signal envelope and the constant-Q transform. As already observed for the VAMP plugins with the Fast Fourier Transform (FFT), performance can be drastically improved if those representations are computed only once, especially when large feature sets are extracted. This philosophy can be extended to the different transformations (such as derivatives) of a given feature. YAAFE has therefore been created both to get the best of the previous tools and to address their main limitations in situations where a large feature set must be extracted from large audio collections with different parameterizations. In particular, YAAFE has been designed with the following requirements in mind:

- computational efficiency, with an appropriate exploitation of feature calculation redundancies;
- usage simplicity, with particular attention to the feature declaration syntax;
- the capability to process very long audio files;
- storage efficiency and simplicity.

The paper is organized as follows: the architecture of YAAFE is detailed in section 2, a detailed benchmark is proposed in section 3, and conclusions and future work are given in section 4.

2. YAAFE

2.1 Overview

YAAFE is a command-line program. Figure 1 describes how YAAFE handles feature extraction.
The user has to provide the audio files and a feature extraction plan: a text file in which the user declares the features to extract, their parameters and their transformations (see section 2.2). To take advantage of feature computation redundancy, YAAFE proceeds in two main stages. In the first stage, a parser analyzes the feature extraction plan to find common computational steps (implemented in C++ components) and produces a reduced dataflow graph. In the second stage, feature extraction is applied to the given audio files according to the reduced dataflow graph, and results are stored in HDF5 files (see section 2.6).

Figure 1. YAAFE internals overview.

Python is preferred to C++ for implementing the feature library and the parser, because the Python object model and reflection allow more concise and readable code. The dataflow engine and the component library have been developed in C++ for performance. YAAFE can be extended: anyone can create an extension, consisting of a feature library and a component library, and extensions are loaded at runtime.

2.2 Feature extraction plan

2.2.1 Features

A YAAFE feature extraction plan is a text file that describes the features to extract. Each line defines one feature, with the following syntax:

name: Feature param=value param=value

An example:

m: MFCC blocksize=1024 stepsize=512
z: ZCR blocksize=1024 stepsize=512
l: LPC LPCNbCoeffs=10
ss: SpectralSlope

The example above will produce four output datasets (see section 2.6), named m, z, l and ss, which hold the features MFCC (Mel-Frequency Cepstral Coefficients), ZCR (Zero Crossing Rate), LPC (Linear Prediction Coefficients) and SpectralSlope with the given parameters.
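As an illustration, this line syntax is simple enough to be parsed in a few lines of Python. The sketch below is hypothetical (the function name and the returned structure are ours, not YAAFE's actual parser); it also accepts the `>` transform chaining introduced in the next section.

```python
# Hypothetical sketch of parsing one feature-plan line (not YAAFE's parser).
def parse_plan_line(line):
    """Parse 'name: Feature param=value ... > Transform param=value ...'."""
    name, _, spec = line.partition(":")
    steps = []
    for step in spec.split(">"):          # each '>' introduces a transform
        tokens = step.split()
        feature, params = tokens[0], {}
        for tok in tokens[1:]:            # remaining tokens are key=value pairs
            key, _, value = tok.partition("=")
            params[key] = value
        steps.append((feature, params))
    return name.strip(), steps

name, steps = parse_plan_line("m: MFCC blocksize=1024 stepsize=512")
# name == "m"; steps == [("MFCC", {"blocksize": "1024", "stepsize": "512"})]
```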
Missing parameters are automatically set to predefined default values.

2.2.2 Transformations and temporal integration

One can also use spatial or temporal feature transforms, such as Derivate (which computes first and/or second derivatives), StatisticalIntegrator (which computes the mean and standard deviation over a given number of frames) or SlopeIntegrator (which computes the slope over a given number of frames), to enrich a feature extraction plan. For example, a plan to extract MFCCs along with their derivatives and perform early integration over 60 frames looks like this:

m: MFCC > StatisticalIntegrator NbFrames=60
m1: MFCC > Derivate DOrder=1 > StatisticalIntegrator NbFrames=60
m2: MFCC > Derivate DOrder=2 > StatisticalIntegrator NbFrames=60

Obviously, m, m1 and m2 are all based on the MFCC computation, which should be performed only once; this is discussed in the next section.

2.3 Feature plan parser

Within YAAFE, each feature is defined as a sequence of computational steps. For example, MFCC is the succession of the steps Frames, FFT, MelFilterBank and Cepstrum. The same applies to feature transforms and temporal integrators. As shown in Figure 2, the feature plan parser decomposes each declared feature into steps and groups together identical steps that have the same input, producing a reduced directed graph of computational steps. The reduced graph can be dumped to a dot file, so an advanced user can see how the features are actually computed.

Figure 2. Automatic redundancy removal performed when parsing the feature extraction plan. Fr(N) boxes are decompositions into analysis frames of size N; step sizes are omitted but assumed equal.

Figure 3. Temporally aligned frame decomposition for different frame sizes A and B, with the same step size.

2.4 Dataflow engine

Each computational step is implemented in a C++ component which performs its computation on a data block. Specific components manage audio file reading and output file writing. The dataflow engine loads the components, links them according to the given dataflow graph, and manages computations and data blocks. Reading, computation and writing are done block by block, so that arbitrarily long files can be processed with low memory occupation.
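The step grouping described in section 2.3 can be sketched as follows. This is a simplified illustration under assumed step representations, not YAAFE's implementation: each feature expands to a chain of (step, parameters) pairs, and a step is reused whenever an identical step with the same input already exists in the graph.

```python
# Simplified illustration of redundancy removal (assumed step names;
# not YAAFE's implementation). Chains sharing a prefix reuse the same nodes.
def build_reduced_graph(features):
    root, nodes, edges = ("root", None), {}, []
    for chain in features.values():
        parent = root
        for step in chain:
            key = (parent, step)          # same step + same input => same node
            if key not in nodes:
                nodes[key] = step
                edges.append((parent, step))
            parent = key
    return edges

features = {
    "m":  [("Frames", "1024/512"), ("FFT", ""), ("MelFilterBank", ""), ("Cepstrum", "")],
    "sf": [("Frames", "1024/512"), ("FFT", ""), ("Flatness", "")],
}
edges = build_reduced_graph(features)
# Frames and FFT are shared: 5 graph edges instead of 7 computation steps.
```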
2.5 Feature timestamp alignment

In a feature extraction plan, each feature may have its own analysis frame size and step size, and some features require longer analysis frames than others. As YAAFE is intended to feed classification systems, we have ensured that extracted features are temporally aligned. This is especially important for operations like the Constant-Q Transform (CQT), which may use very large analysis frames. YAAFE addresses the issue as follows. We assume that when a feature is computed over an analysis frame, the resulting value corresponds to the time of the frame's center. Beginning with a frame centered on the signal start (left-padded with zeros) then ensures that all features with the same step size are temporally aligned (see Figure 3). A feature may also have an intrinsic time delay; for example, when applying a derivative filter, we want the output value to be aligned with the center of the filter. The design of YAAFE ensures that this is handled properly and that output features are temporally aligned. YAAFE only deals with equidistantly sampled features. However, some features, like onsets, have a naturally event-based representation. In the current version, event-based features are represented as equidistantly sampled features whose first dimension is a boolean denoting the presence of an event.

2.6 Output format

YAAFE outputs its results in HDF5 (Hierarchical Data Format) files; other output formats will be added in the future. The choice of HDF5 was initially motivated by storage size and I/O performance: HDF5 is a binary format designed for efficient storage of large amounts of scientific data, and it allows on-the-fly compression. Results are stored as double-precision floating-point numbers, hence with no precision loss. HDF5 files can be read in the Matlab environment through built-in functions (hdf5info, hdf5read and hdf5write), and in the Python environment with the h5py package.
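The output layout described here (one dataset per declared feature, with attached attributes) can be illustrated with h5py. This is a hedged sketch: the file name, dataset shape and attribute names below are illustrative assumptions, not YAAFE's exact conventions.

```python
import numpy as np
import h5py  # third-party package: pip install h5py

# Write a file mimicking the described layout: one dataset per feature,
# with attributes such as frame size, step size and sample rate.
# (Attribute names here are assumptions for illustration.)
with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset("m", data=np.zeros((100, 13)))  # frames x MFCC coeffs
    dset.attrs["blockSize"] = 1024
    dset.attrs["stepSize"] = 512
    dset.attrs["sampleRate"] = 32000

# Read it back, as one would read a feature extraction result.
with h5py.File("example.h5", "r") as f:
    mfcc = f["m"][...]
    print(mfcc.shape, f["m"].attrs["stepSize"])  # (100, 13) 512
```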

HDF5 files are platform independent, so they can easily be shared. An HDF5 file can hold several datasets organized in a hierarchical structure. A dataset can be a table with several columns (fields) of different data types, or simply a 2-D matrix of a given data type. Attributes, each with a name and a value of any data type, can be attached to datasets. YAAFE creates one HDF5 file for each input audio file. For each feature declared in the feature extraction plan, one dataset is created, with attributes attached such as the feature definition, the frame size, the step size and the sample rate.

2.7 Availability and license

The YAAFE framework and a core feature library are released together under the GNU Lesser General Public License, so that they can freely be reused as components of a bigger system. The core feature library contains several spectral features, Mel-Frequency Cepstral Coefficients, loudness, autocorrelation, Linear Prediction Coefficients, Octave Band Signal Intensities (OBSI) and OBSI ratios, amplitude modulation (tremolo and graininess description), complex-domain onset detection [7] and the Zero Crossing Rate. Derivative and cepstral transforms, as well as statistical, slope and histogram early integrators, are also provided. YAAFE is available for Linux platforms, and its source code can be downloaded. A separate feature library will be available in binary form, for non-commercial use only. It will provide the Constant-Q Transform, chromas [8], chord detection [9], onset detection [10] and a beat histogram summary [11]. An implementation of the CQT with normalization and kernel temporal synchronicity improvements [12] over the reference implementation is provided.

3. BENCHMARK

We ran a small benchmark to compare YAAFE with Marsyas's bextract and Sonic Annotator.
The objective is to compare the design of the three systems, not the algorithms used to compute each feature. We chose a few similar and well-defined features available in all three systems, and compared CPU time, memory occupation and output size when extracting those features on the same audio collection.

3.1 Protocol

We chose to extract the following features: MFCC (13 coefficients), spectral centroid, spectral rolloff, spectral crest factor, spectral flatness and zero crossing rate. Features like chroma or beat detection have been avoided because the associated algorithms can differ widely between systems. In the case of Sonic Annotator, all features are available in the Vamp libxtract plugins (which rely on the libxtract library) [13]. Early temporal integration is not computed. We ran the feature extraction over about 40 hours of 32 kHz mono wav files (8.7 GB). The collection is composed of 80 radio excerpts of about 30 minutes each. The measurements were made on an Intel Core 2 Duo 3 GHz machine with 4 GB of RAM, under the Debian Lenny operating system. We checked that all systems used one core only. RAM usage was measured with the ps_mem.py script. We first ran the benchmark measuring the extraction of all features simultaneously, then a second time measuring the extraction of each feature independently.

Table 1. Feature extraction with Sonic Annotator (S.A.) using the Vamp libxtract plugins, Marsyas's bextract and YAAFE. All features are extracted simultaneously; the audio collection is 40 hours of 32 kHz mono wav files.

                  S.A.       Marsyas    YAAFE
  CPU time        52m05s     24m21s     6m34s
  RAM used        14.0 MB    10.6 MB    15.5 MB
  Output format   CSV        ARFF       HDF5
  Output size     1.74 GB    2.7 GB     1.22 GB
  Feature dim.    16         16 (32)    19

Table 2. CPU times for single-feature extraction on the same collection as Table 1.

  Feature    S.A.     Marsyas   YAAFE
  MFCC       25m06s   19m28s    2m22s
  Centroid   12m04s   15m42s    3m55s
  Rolloff    12m11s   15m51s    3m14s
  ZCR        3m41s    10m20s    0m57s
  Total      53m02s   61m21s    10m28s

3.2 Results

The results are given in Table 1 and Table 2. It is important to note some differences between the three systems that influence the results. Firstly, we could not prevent Marsyas from performing temporal integration, so we reduced the integration length to 1. Consequently, the output generated by Marsyas has 32 columns: 16 columns of feature data (means) and 16 columns of zeros (standard deviations), which explains why Marsyas has a larger output size. Secondly, YAAFE extracts spectral spread, skewness and kurtosis together with the spectral centroid, which explains why the output feature dimension is 19 for YAAFE and 16 for the other systems. Given those differences, the measurements must be taken with caution. We can say that all systems performed well.

They all succeed at extracting features from audio files of 30 minutes in length, with low memory occupation. Comparing the sum of the single-extraction times in Table 2 with the simultaneous extraction time in Table 1 shows that Sonic Annotator does not exploit computation redundancy. The VAMP plugin API allows features to be computed in the frequency domain, but the Vamp libxtract plugins do not do so, which explains why Sonic Annotator requires more CPU time than the others. Marsyas's performance clearly suffers from writing 16 columns of zeros. For the evaluated task, the CPU times in Table 1 show that YAAFE tends to be faster than Marsyas. As Sonic Annotator stores the timestamps in each output file (one per feature), and half of Marsyas's output consists of additional zeros, the Sonic Annotator and Marsyas outputs are roughly equivalent in size; this is no surprise, as both CSV and ARFF are text formats. Using the HDF5 format, YAAFE stores more feature data, with no precision loss, in less space.

3.3 Extracting many features

YAAFE is designed for extracting a large number of features simultaneously. To check how it performs in such a situation, we ran YAAFE a second time under the same conditions but with a larger feature extraction plan. In this run we extracted MFCCs, various spectral features, loudness, loudness-based sharpness and spread, and the zero crossing rate. For each feature except the zero crossing rate, we added first and second derivatives. We then performed early temporal integration by computing the mean and standard deviation over sliding windows of 1 second with a step of 0.5 second. The total output dimension is 288. The results are presented in Table 3.

Table 3. Large feature set extraction with YAAFE. The audio collection is 40 hours of 32 kHz mono wav files.

  CPU time        11m15s
  RAM used        30.3 MB
  Output format   HDF5
  Output size     0.64 GB
  Feature dim.    288
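The mean/standard-deviation early integration over sliding windows used in this run can be sketched in NumPy. This is an illustrative re-implementation under assumed window handling (YAAFE's StatisticalIntegrator takes an NbFrames parameter; the window step handling here is our assumption):

```python
import numpy as np

# Illustrative sketch of mean/std integration over sliding texture windows
# (not YAAFE's StatisticalIntegrator; window-step handling is assumed).
def statistical_integrator(feat, nb_frames, step_frames):
    """feat: (n_frames, dim) array -> (n_windows, 2*dim) array of [mean, std]."""
    out = []
    for start in range(0, feat.shape[0] - nb_frames + 1, step_frames):
        win = feat[start:start + nb_frames]
        out.append(np.concatenate([win.mean(axis=0), win.std(axis=0)]))
    return np.array(out)

feat = np.random.rand(600, 13)                    # e.g. 600 MFCC frames
integrated = statistical_integrator(feat, 60, 30)  # 60-frame windows, 30-frame step
# integrated.shape == (19, 26): frame rate divided by 30, dimension doubled
```

The doubling of the dimension (mean plus standard deviation) combined with the much lower output rate is why integrated output files are far smaller than frame-rate features.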
It should be emphasized that temporal integration is performed in this run, so the output size is much smaller than in the previous one. As a larger feature set is extracted, the dataflow graph is larger and uses more RAM. The CPU times show that YAAFE remains very efficient in this situation.

4. CONCLUSIONS AND FUTURE WORK

In this paper, a new audio feature extraction software package, YAAFE, has been introduced. YAAFE is especially efficient in situations where many features are simultaneously extracted from large audio collections. To achieve this, feature computation redundancies are exploited in a two-step extraction process: first, the feature extraction plan is analyzed, each feature is decomposed into computational steps, and a reduced dataflow graph is produced; then, a dataflow engine processes computations block by block over the given audio files. YAAFE remains easy to use: the feature extraction plan is a text file in which the user declares the features to extract, along with transformations and early temporal integration, using a very simple syntax. YAAFE has already been used in Quaero project internal evaluation campaigns for the music/speech discrimination and musical genre recognition tasks. Future plans include extending the toolbox with additional high-level features, such as a fundamental frequency estimator, melody detection and a tempo estimator, and adding alternative output formats.

5. ACKNOWLEDGMENT

This work was done as part of the Quaero Programme, funded by OSEO, the French State agency for innovation.

6. REFERENCES

[1] G. Tzanetakis and P. Cook: "MARSYAS: A framework for audio analysis," Org. Sound, Vol. 4, No. 3.

[2] C. Cannam, C. Landone, M. Sandler and J. Bello: "The Sonic Visualiser: A visualisation platform for semantic descriptors from musical signals," Proceedings of the International Conference on Music Information Retrieval, Victoria, Canada.

[3] O. Lartillot and P. Toiviainen: "A Matlab toolbox for musical feature extraction from audio," Proceedings of the International Conference on Digital Audio Effects (DAFx'07).

[4] A. Lerch, G. Eisenberg and K. Tanghe: "FEAPI: A low level features extraction plugin API," Proceedings of the International Conference on Digital Audio Effects (DAFx'05).

[5] D. McEnnis, C. McKay, I. Fujinaga and P. Depalle: "jAudio: A feature extraction library," Proceedings of the International Conference on Music Information Retrieval.

[6] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I. H. Witten: "The WEKA data mining software: An update," SIGKDD Explorations, Vol. 11, Issue 1, 2009.

[7] C. Duxbury et al.: "Complex domain onset detection for musical signals," Proceedings of the International Conference on Digital Audio Effects (DAFx'03).

[8] J. P. Bello and J. Pickens: "A robust mid-level representation for harmonic content in music signals," Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR'05).

[9] L. Oudre, Y. Grenier and C. Fevotte: "Template-based chord recognition: Influence of the chord types," Proceedings of the International Conference on Music Information Retrieval, 2009.

[10] M. Alonso, G. Richard and B. David: "Extracting note onsets from musical recordings," International Conference on Multimedia and Expo (IEEE-ICME'05).

[11] G. Tzanetakis: "Musical genre classification of audio signals," IEEE Transactions on Speech and Audio Processing, Vol. 10, No. 5.

[12] J. Prado: "Transformée à Q constant," technical report 2010D004, Institut Telecom, Telecom ParisTech, CNRS LTCI, 2010.

[13] J. Bullock: "Libxtract: A lightweight library for audio feature extraction," Proceedings of the International Computer Music Conference.


More information

1 Introduction. 3 Data Preprocessing. 2 Literature Review

1 Introduction. 3 Data Preprocessing. 2 Literature Review Rock or not? This sure does. [Category] Audio & Music CS 229 Project Report Anand Venkatesan(anand95), Arjun Parthipan(arjun777), Lakshmi Manoharan(mlakshmi) 1 Introduction Music Genre Classification continues

More information

Mining Large-Scale Music Data Sets

Mining Large-Scale Music Data Sets Mining Large-Scale Music Data Sets Dan Ellis & Thierry Bertin-Mahieux Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,thierry}@ee.columbia.edu

More information

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Extracting meaning

More information

Affective Music Video Content Retrieval Features Based on Songs

Affective Music Video Content Retrieval Features Based on Songs Affective Music Video Content Retrieval Features Based on Songs R.Hemalatha Department of Computer Science and Engineering, Mahendra Institute of Technology, Mahendhirapuri, Mallasamudram West, Tiruchengode,

More information

A Brief Overview of Audio Information Retrieval. Unjung Nam CCRMA Stanford University

A Brief Overview of Audio Information Retrieval. Unjung Nam CCRMA Stanford University A Brief Overview of Audio Information Retrieval Unjung Nam CCRMA Stanford University 1 Outline What is AIR? Motivation Related Field of Research Elements of AIR Experiments and discussion Music Classification

More information

Further Studies of a FFT-Based Auditory Spectrum with Application in Audio Classification

Further Studies of a FFT-Based Auditory Spectrum with Application in Audio Classification ICSP Proceedings Further Studies of a FFT-Based Auditory with Application in Audio Classification Wei Chu and Benoît Champagne Department of Electrical and Computer Engineering McGill University, Montréal,

More information

A SIMPLE, HIGH-YIELD METHOD FOR ASSESSING STRUCTURAL NOVELTY

A SIMPLE, HIGH-YIELD METHOD FOR ASSESSING STRUCTURAL NOVELTY A SIMPLE, HIGH-YIELD METHOD FOR ASSESSING STRUCTURAL NOVELTY Olivier Lartillot, Donato Cereghetti, Kim Eliard, Didier Grandjean Swiss Center for Affective Sciences, University of Geneva, Switzerland olartillot@gmail.com

More information

Dietrich Paulus Joachim Hornegger. Pattern Recognition of Images and Speech in C++

Dietrich Paulus Joachim Hornegger. Pattern Recognition of Images and Speech in C++ Dietrich Paulus Joachim Hornegger Pattern Recognition of Images and Speech in C++ To Dorothea, Belinda, and Dominik In the text we use the following names which are protected, trademarks owned by a company

More information

Image Classification for JPEG Compression

Image Classification for JPEG Compression Image Classification for Compression Jevgenij Tichonov Vilnius University, Institute of Mathematics and Informatics Akademijos str. 4 LT-08663, Vilnius jevgenij.tichonov@gmail.com Olga Kurasova Vilnius

More information

Detection of goal event in soccer videos

Detection of goal event in soccer videos Detection of goal event in soccer videos Hyoung-Gook Kim, Steffen Roeber, Amjad Samour, Thomas Sikora Department of Communication Systems, Technical University of Berlin, Einsteinufer 17, D-10587 Berlin,

More information

Mpeg 1 layer 3 (mp3) general overview

Mpeg 1 layer 3 (mp3) general overview Mpeg 1 layer 3 (mp3) general overview 1 Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction,

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

SCREAM AND GUNSHOT DETECTION IN NOISY ENVIRONMENTS

SCREAM AND GUNSHOT DETECTION IN NOISY ENVIRONMENTS SCREAM AND GUNSHOT DETECTION IN NOISY ENVIRONMENTS L. Gerosa, G. Valenzise, M. Tagliasacchi, F. Antonacci, A. Sarti Dipartimento di Elettronica e Informazione, Politecnico di Milano Piazza Leonardo da

More information

high performance medical reconstruction using stream programming paradigms

high performance medical reconstruction using stream programming paradigms high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming

More information

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi 1. Introduction The choice of a particular transform in a given application depends on the amount of

More information

LabROSA Research Overview

LabROSA Research Overview LabROSA Research Overview Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu 1. Music 2. Environmental sound 3.

More information

Two-layer Distance Scheme in Matching Engine for Query by Humming System

Two-layer Distance Scheme in Matching Engine for Query by Humming System Two-layer Distance Scheme in Matching Engine for Query by Humming System Feng Zhang, Yan Song, Lirong Dai, Renhua Wang University of Science and Technology of China, iflytek Speech Lab, Hefei zhangf@ustc.edu,

More information

SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION 1

SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION 1 SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION 1 Hualin Gao, Richard Duncan, Julie A. Baca, Joseph Picone Institute for Signal and Information Processing, Mississippi State University {gao, duncan, baca,

More information

arxiv: v1 [cs.lg] 5 Mar 2013

arxiv: v1 [cs.lg] 5 Mar 2013 GURLS: a Least Squares Library for Supervised Learning Andrea Tacchetti, Pavan K. Mallapragada, Matteo Santoro, Lorenzo Rosasco Center for Biological and Computational Learning, Massachusetts Institute

More information

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL SPREAD SPECTRUM WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL 1 Yüksel Tokur 2 Ergun Erçelebi e-mail: tokur@gantep.edu.tr e-mail: ercelebi@gantep.edu.tr 1 Gaziantep University, MYO, 27310, Gaziantep,

More information

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

More information

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION Prateek Verma, Yang-Kai Lin, Li-Fan Yu Stanford University ABSTRACT Structural segmentation involves finding hoogeneous sections appearing

More information

Saliency Detection for Videos Using 3D FFT Local Spectra

Saliency Detection for Videos Using 3D FFT Local Spectra Saliency Detection for Videos Using 3D FFT Local Spectra Zhiling Long and Ghassan AlRegib School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA ABSTRACT

More information

A framework for audio analysis

A framework for audio analysis MARSYAS: A framework for audio analysis George Tzanetakis 1 Department of Computer Science Princeton University Perry Cook 2 Department of Computer Science 3 and Department of Music Princeton University

More information

AUDIO information often plays an essential role in understanding

AUDIO information often plays an essential role in understanding 1062 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval Serkan Kiranyaz,

More information

MatCL - OpenCL MATLAB Interface

MatCL - OpenCL MATLAB Interface MatCL - OpenCL MATLAB Interface MatCL - OpenCL MATLAB Interface Slide 1 MatCL - OpenCL MATLAB Interface OpenCL toolkit for Mathworks MATLAB/SIMULINK Compile & Run OpenCL Kernels Handles OpenCL memory management

More information

Intel s MMX. Why MMX?

Intel s MMX. Why MMX? Intel s MMX Dr. Richard Enbody CSE 820 Why MMX? Make the Common Case Fast Multimedia and Communication consume significant computing resources. Providing specific hardware support makes sense. 1 Goals

More information

Digital Presentation and Preservation of Cultural and Scientific Heritage International Conference

Digital Presentation and Preservation of Cultural and Scientific Heritage International Conference Radoslav Pavlov Peter Stanchev Editors Digital Presentation and Preservation of Cultural and Scientific Heritage International Conference Veliko Tarnovo, Bulgaria September 18 21, 2013 Proceedings Volume

More information

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

New Results in Low Bit Rate Speech Coding and Bandwidth Extension Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without

More information

City Research Online. Permanent City Research Online URL:

City Research Online. Permanent City Research Online URL: Tidhar, D., Fazekas, G., Kolozali, S. & Sandler, M. (2009). Publishing Music Similarity Features on the Semantic Web.. Paper presented at the 10th International Society for Music Information Retrieval

More information

Contents. ACE Presentation. Comparison with existing frameworks. Technical aspects. ACE 2.0 and future work. 24 October 2009 ACE 2

Contents. ACE Presentation. Comparison with existing frameworks. Technical aspects. ACE 2.0 and future work. 24 October 2009 ACE 2 ACE Contents ACE Presentation Comparison with existing frameworks Technical aspects ACE 2.0 and future work 24 October 2009 ACE 2 ACE Presentation 24 October 2009 ACE 3 ACE Presentation Framework for using

More information

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.

More information

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

An Acceleration Scheme to The Local Directional Pattern

An Acceleration Scheme to The Local Directional Pattern An Acceleration Scheme to The Local Directional Pattern Y.M. Ayami Durban University of Technology Department of Information Technology, Ritson Campus, Durban, South Africa ayamlearning@gmail.com A. Shabat

More information

2014, IJARCSSE All Rights Reserved Page 461

2014, IJARCSSE All Rights Reserved Page 461 Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real Time Speech

More information

Multimedia Data Mining in Digital Libraries: Standards and Features

Multimedia Data Mining in Digital Libraries: Standards and Features Multimedia Data Mining in Digital Libraries: Standards and Features Sanjeevkumar R. Jadhav *, and Praveenkumar Kumbargoudar * Abstract The digital library retrieves, collects, stores and preserves the

More information

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding Heiko Purnhagen Laboratorium für Informationstechnologie University of Hannover, Germany Outline Introduction What is "Parametric Audio Coding"?

More information

Automatic Classification of Audio Data

Automatic Classification of Audio Data Automatic Classification of Audio Data Carlos H. C. Lopes, Jaime D. Valle Jr. & Alessandro L. Koerich IEEE International Conference on Systems, Man and Cybernetics The Hague, The Netherlands October 2004

More information

MULTIPLE HYPOTHESES AT MULTIPLE SCALES FOR AUDIO NOVELTY COMPUTATION WITHIN MUSIC. Florian Kaiser and Geoffroy Peeters

MULTIPLE HYPOTHESES AT MULTIPLE SCALES FOR AUDIO NOVELTY COMPUTATION WITHIN MUSIC. Florian Kaiser and Geoffroy Peeters MULTIPLE HYPOTHESES AT MULTIPLE SCALES FOR AUDIO NOVELTY COMPUTATION WITHIN MUSIC Florian Kaiser and Geoffroy Peeters STMS IRCAM-CNRS-UPMC 1 Place Igor Stravinsky 75004 Paris florian.kaiser@ircam.fr ABSTRACT

More information

FUSING BLOCK-LEVEL FEATURES FOR MUSIC SIMILARITY ESTIMATION

FUSING BLOCK-LEVEL FEATURES FOR MUSIC SIMILARITY ESTIMATION FUSING BLOCK-LEVEL FEATURES FOR MUSIC SIMILARITY ESTIMATION Klaus Seyerlehner Dept. of Computational Perception, Johannes Kepler University Linz, Austria klaus.seyerlehner@jku.at Gerhard Widmer Dept. of

More information

A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval

A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval 1 A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval Serkan Kiranyaz,

More information

DETECTING INDOOR SOUND EVENTS

DETECTING INDOOR SOUND EVENTS DETECTING INDOOR SOUND EVENTS Toma TELEMBICI, Lacrimioara GRAMA Signal Processing Group, Basis of Electronics Department, Faculty of Electronics, Telecommunications and Information Technology, Technical

More information

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

A MULTIPOINT VIDEOCONFERENCE RECEIVER BASED ON MPEG-4 OBJECT VIDEO. Chih-Kai Chien, Chen-Yu Tsai, and David W. Lin

A MULTIPOINT VIDEOCONFERENCE RECEIVER BASED ON MPEG-4 OBJECT VIDEO. Chih-Kai Chien, Chen-Yu Tsai, and David W. Lin A MULTIPOINT VIDEOCONFERENCE RECEIVER BASED ON MPEG-4 OBJECT VIDEO Chih-Kai Chien, Chen-Yu Tsai, and David W. Lin Dept. of Electronics Engineering and Center for Telecommunications Research National Chiao

More information

MUSIC GENRE CLASSIFICATION VIA COMPRESSIVE SAMPLING

MUSIC GENRE CLASSIFICATION VIA COMPRESSIVE SAMPLING MUSIC GENRE CLASSIFICATION VIA COMPRESSIVE SAMPLING Kaichun K. Chang Department of Computer Science King s College London London, United Kingdom ken.chang@kcl.ac.uk Jyh-Shing Roger Jang Department of Computer

More information

Multimedia Communications. Audio coding

Multimedia Communications. Audio coding Multimedia Communications Audio coding Introduction Lossy compression schemes can be based on source model (e.g., speech compression) or user model (audio coding) Unlike speech, audio signals can be generated

More information

Principles of Audio Coding

Principles of Audio Coding Principles of Audio Coding Topics today Introduction VOCODERS Psychoacoustics Equal-Loudness Curve Frequency Masking Temporal Masking (CSIT 410) 2 Introduction Speech compression algorithm focuses on exploiting

More information

Content-based Video Genre Classification Using Multiple Cues

Content-based Video Genre Classification Using Multiple Cues Content-based Video Genre Classification Using Multiple Cues Hazım Kemal Ekenel Institute for Anthropomatics Karlsruhe Institute of Technology (KIT) 76131 Karlsruhe, Germany ekenel@kit.edu Tomas Semela

More information

BAT: An open-source, web-based audio events annotation tool

BAT: An open-source, web-based audio events annotation tool BAT: An open-source, web-based audio events annotation tool Meléndez-Catalán, Blai; Molina, Emilio; Gómez, Emilia Attribution-NonCommercial-NoDerivs 3.0 United States For additional information about this

More information

ANALYZING THE MILLION SONG DATASET USING MAPREDUCE

ANALYZING THE MILLION SONG DATASET USING MAPREDUCE PROGRAMMING ASSIGNMENT 3 ANALYZING THE MILLION SONG DATASET USING MAPREDUCE Version 1.0 DUE DATE: Wednesday, October 18 th, 2017 @ 5:00 pm OBJECTIVE You will be developing MapReduce programs that parse

More information

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO International journal of computer science & information Technology (IJCSIT) Vol., No.5, October A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO Pranab Kumar Dhar *, Mohammad

More information

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia? Multimedia What is multimedia? Media types +Text + Graphics + Audio +Image +Video Interchange formats What is multimedia? Multimedia = many media User interaction = interactivity Script = time 1 2 Most

More information

Parallel FFT Program Optimizations on Heterogeneous Computers

Parallel FFT Program Optimizations on Heterogeneous Computers Parallel FFT Program Optimizations on Heterogeneous Computers Shuo Chen, Xiaoming Li Department of Electrical and Computer Engineering University of Delaware, Newark, DE 19716 Outline Part I: A Hybrid

More information

5. Feature Extraction from Images

5. Feature Extraction from Images 5. Feature Extraction from Images Aim of this Chapter: Learn the Basic Feature Extraction Methods for Images Main features: Color Texture Edges Wie funktioniert ein Mustererkennungssystem Test Data x i

More information

Analyzing structure: segmenting musical audio

Analyzing structure: segmenting musical audio Analyzing structure: segmenting musical audio Musical form (1) Can refer to the type of composition (as in multi-movement form), e.g. symphony, concerto, etc Of more relevance to this class, it refers

More information

Movie synchronization by audio landmark matching

Movie synchronization by audio landmark matching Movie synchronization by audio landmark matching Ngoc Q. K. Duong, Franck Thudor To cite this version: Ngoc Q. K. Duong, Franck Thudor. Movie synchronization by audio landmark matching. IEEE International

More information

Study and application of acoustic information for the detection of harmful content, and fusion with visual information.

Study and application of acoustic information for the detection of harmful content, and fusion with visual information. Study and application of acoustic information for the detection of harmful content, and fusion with visual information. Theodoros Giannakopoulos National and Kapodistrian University of Athens Department

More information

Design of Feature Extraction Circuit for Speech Recognition Applications

Design of Feature Extraction Circuit for Speech Recognition Applications Design of Feature Extraction Circuit for Speech Recognition Applications SaambhaviVB, SSSPRao and PRajalakshmi Indian Institute of Technology Hyderabad Email: ee10m09@iithacin Email: sssprao@cmcltdcom

More information

FPDJ. Baltazar Ortiz, Angus MacMullen, Elena Byun

FPDJ. Baltazar Ortiz, Angus MacMullen, Elena Byun Overview FPDJ Baltazar Ortiz, Angus MacMullen, Elena Byun As electronic music becomes increasingly prevalent, many listeners wonder how to make their own music. While there is software that allows musicians

More information

An Architecture for Animal Sound Identification based on Multiple Feature Extraction and Classification Algorithms

An Architecture for Animal Sound Identification based on Multiple Feature Extraction and Classification Algorithms An Architecture for Animal Sound Identification based on Multiple Feature Extraction and Classification Algorithms Leandro Tacioli 1, Luís Felipe Toledo 2, Claudia Bauzer Medeiros 1 1 Institute of Computing,

More information

CHAPTER 3. Preprocessing and Feature Extraction. Techniques

CHAPTER 3. Preprocessing and Feature Extraction. Techniques CHAPTER 3 Preprocessing and Feature Extraction Techniques CHAPTER 3 Preprocessing and Feature Extraction Techniques 3.1 Need for Preprocessing and Feature Extraction schemes for Pattern Recognition and

More information

Speech and audio coding

Speech and audio coding Institut Mines-Telecom Speech and audio coding Marco Cagnazzo, cagnazzo@telecom-paristech.fr MN910 Advanced compression Outline Introduction Introduction Speech signal Music signal Masking Codeurs simples

More information

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1 Multimedia What is multimedia? Media types + Text +Graphics +Audio +Image +Video Interchange formats Petri Vuorimaa 1 What is multimedia? Multimedia = many media User interaction = interactivity Script

More information

Lecture 3 Image and Video (MPEG) Coding

Lecture 3 Image and Video (MPEG) Coding CS 598KN Advanced Multimedia Systems Design Lecture 3 Image and Video (MPEG) Coding Klara Nahrstedt Fall 2017 Overview JPEG Compression MPEG Basics MPEG-4 MPEG-7 JPEG COMPRESSION JPEG Compression 8x8 blocks

More information

Performance of MPEG-7 low level audio descriptors with compressed data

Performance of MPEG-7 low level audio descriptors with compressed data University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2003 Performance of MPEG-7 low level audio descriptors with compressed

More information

MPEG-7 Audio: Tools for Semantic Audio Description and Processing

MPEG-7 Audio: Tools for Semantic Audio Description and Processing MPEG-7 Audio: Tools for Semantic Audio Description and Processing Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Why semantic description

More information

FFT-Based Astronomical Image Registration and Stacking using GPU

FFT-Based Astronomical Image Registration and Stacking using GPU M. Aurand 4.21.2010 EE552 FFT-Based Astronomical Image Registration and Stacking using GPU The productive imaging of faint astronomical targets mandates vanishingly low noise due to the small amount of

More information

MP3 Speech and Speaker Recognition with Nearest Neighbor. ECE417 Multimedia Signal Processing Fall 2017

MP3 Speech and Speaker Recognition with Nearest Neighbor. ECE417 Multimedia Signal Processing Fall 2017 MP3 Speech and Speaker Recognition with Nearest Neighbor ECE417 Multimedia Signal Processing Fall 2017 Goals Given a dataset of N audio files: Features Raw Features, Cepstral (Hz), Cepstral (Mel) Classifier

More information

ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS

ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS ALIGNED HIERARCHIES: A MULTI-SCALE STRUCTURE-BASED REPRESENTATION FOR MUSIC-BASED DATA STREAMS Katherine M. Kinnaird Department of Mathematics, Statistics, and Computer Science Macalester College, Saint

More information

Evaluation in Quaero. Edouard Geoffrois, DGA Quaero Technology Evaluation Manager. Quaero/imageCLEF workshop Aarhus, Denmark Sept 16 th, 2008

Evaluation in Quaero. Edouard Geoffrois, DGA Quaero Technology Evaluation Manager. Quaero/imageCLEF workshop Aarhus, Denmark Sept 16 th, 2008 Evaluation in Quaero Edouard Geoffrois, DGA Quaero Technology Evaluation Manager Quaero/imageCLEF workshop Aarhus, Denmark Sept 16 th, 2008 Presentation outline The Quaero program Context, scope and approach

More information