Genre Classification of Compressed Audio Data

Antonello Rizzi, Nicola Maurizio Buccino, Massimo Panella, and Aurelio Uncini
INFO-COM Department, University of Rome "La Sapienza", Via Eudossiana 18, Rome, Italy

Abstract: This paper deals with the musical genre classification problem, starting from a set of features extracted directly from MPEG-1 Layer III compressed audio data. The automatic classification of compressed audio signals into a short hierarchy of musical genres is explored. More specifically, three feature sets representing timbre, rhythmic content, and energy content are proposed for a four-leaf tree genre hierarchy. The adopted set of features is computed from the spectral information available in the MPEG decoding stage. The performance and relative importance of the proposed approach are investigated by training a classification model on the audio collections proposed in musical genre contests. We also used an optimization strategy based on genetic algorithms. The results are comparable to those obtained by PCM-based musical genre classification systems.

I. INTRODUCTION

Genre hierarchies, typically created manually by human experts, are currently one of the ways used to structure music content on the Web. Automatic musical genre classification can potentially automate this process and provide an important component of a complete music information retrieval system for audio signals. In this paper, the problem of automatically classifying MPEG-1 Layer III (MP3) compressed audio signals into a hierarchy of musical genres is addressed. Although music on the Web is usually in compressed form, in particular MP3, most known techniques use features calculated from PCM or MIDI audio data [1]. The technique proposed in this paper allows the direct classification of partially decompressed data, thus avoiding the complete PCM decompression.
For the datasets considered herein, we adopt a four-leaf taxonomy made of classical, electronic, pop and world music. This taxonomy seems to be a good compromise between the physical and perceptual features that characterize a musical genre. Very little research has addressed the classification problem using MPEG compressed audio data, and it mostly regards the music/speech classification task [2]-[5]. An interesting work on musical genre classification in the compressed domain is illustrated in [6]. In that paper, the author classifies audio into six genres: blues, easy listening, classical, opera, dance (techno) and indie rock. The features used for this task are energy-related, in particular the cepstrum coefficients, and two different classification strategies are compared: a GMM (Gaussian Mixture Model) and a Vector Tree Quantization (VTQ). Results show a 90.9% accuracy for the GMM and 85.1% for the VTQ. Comparing these results with the ones obtained using PCM audio data, there is an accuracy deterioration of about 4%. In our paper, three sets of features representing timbre, rhythmic content and energy content are proposed. Although timbral and energy features are also used for speech and general sound classification, the rhythmic feature set is novel and specifically designed to represent aspects of musical content. Moreover, a psychoacoustic preprocessing is performed in order to enhance the perceptual aspect of the proposed feature set. Its performance and relative importance are evaluated by training a classification model and by an automatic feature selection procedure based on a genetic optimization technique. The data used for the evaluation of the proposed classification system are audio collections available in well-known repositories on the Web.
II. PSYCHOACOUSTIC REMARKS

In our system, some psychoacoustic considerations are taken into account before the feature extraction stage proper, in order to extract the significant information related to the subbands. As can be seen in Fig. 1, the process starts as a normal MP3 decompression, including bitstream parsing and frequency-sample de-quantization. Once the subband data become available, they are used as a source for further computations rather than for synthesizing actual samples with the synthesis filter. We consider in this paper MP3 compressed audio data [7]. In the first step of the MPEG encoding process, the audio signal is converted into 32 equally spaced spectral components using an analysis filterbank. For every 32 consecutive PCM input audio samples (corresponding to $T = 32\,T_c$ seconds of audio signal sampled with period $T_c$), the filterbank provides 32 subband samples $s_i[k] = s_i(kT)$, one sample per subband, indexed by $i = 1, \ldots, 32$. The Layer III algorithm groups the input signal into frames of 1152 PCM samples. Each MP3 frame consists of two granules, each of 576 PCM samples. With the standard 44.1 kHz sampling rate, a granule occurs approximately every 13 ms. Each granule contains 18 consecutive subband samples, where each subband sample is a vector of 32 frequency-band amplitudes, each related to a subband of 689 Hz. The first observation is that the 32 constant filterbank bandwidths do not accurately reflect the human ear's critical bands; in particular, each bandwidth is too wide for the lower frequencies and too narrow for the higher ones.

©2008 IEEE (MMSP 2008)

Based on these considerations, we can say that a subband of 689 Hz is too wide to represent the lowest critical bands. This is especially true for the first critical band. In order to face this limitation, we apply a Discrete Wavelet Transform (DWT) to the first subband, thus obtaining two further subbands, each referring to a 345 Hz frequency range. The DWT is performed using a standard first-order biorthogonal filter. The second important perceptual consideration regards empirical results showing that the ear has a limited frequency selectivity, whose acuity varies from less than 100 Hz for the lowest audible frequencies to more than 4 kHz for the highest ones. Thus, not all of the audible spectrum is useful for a perceptual analysis, and some frequencies are perceptually more meaningful than others. This observation reflects the resolving power of the ear as a function of frequency, which is usually approximated by Fletcher's curve, representing the ear's sensitivity thresholds with respect to the sound pressure level. Looking at Fletcher's curve, it is evident that frequencies over 17 kHz are not perceptually relevant. So, in our analysis the subbands from 26 to 32 (ranging from about 17.2 kHz to more than 20 kHz) are not taken into account.

Fig. 1: Psychoacoustic analysis and feature extraction data.

The last psychoacoustic observation derives once again from Fletcher's curve.
In particular, due to the ear's high sensitivity around 4 kHz, the frequency range from 700 Hz to 7.5 kHz has been emphasized by amplifying subbands 2 to 11 by a factor of 3. In this way, after these perception-based computations, the actual data are made of 25 subband samples. These data feed the feature extraction stage.

III. FEATURE EXTRACTION PROCEDURE

The first step of our analysis system extracts a set of features from the audio data, in order to handle more meaningful information and to reduce subsequent processing. The features used here describe timbre, rhythm, and energy. For this experiment we used MP3 files (44.1 kHz, 128 kbps, stereo) [7]. The stereo channels have been processed separately and the resulting features averaged in order to represent the whole stereo file. First of all, a root mean squared subband granule vector $\mathrm{RMSGV}_{T_G}$ is calculated as follows:

$$\mathrm{RMSGV}_{T_G}[i] = \sqrt{\frac{1}{18}\sum_{k=1}^{18} s_i^2[k]}, \quad (1)$$

where the index $i$, $i = 1, \ldots, 25$, denotes the $i$-th element of the vector, each related to a subband; $s_i[k]$ is the $k$-th sample of the $i$-th subband; $T_G$ stands for Time of Granule and indicates that the calculation refers to $T_G \approx 13$ ms of audio signal. In the same way, the $i$-th element of a root mean squared subband frame vector $\mathrm{RMSFV}_{T_F}$ is calculated:

$$\mathrm{RMSFV}_{T_F}[i] = \sqrt{\frac{1}{36}\sum_{k=1}^{36} s_i^2[k]}, \quad (2)$$

where $i = 1, \ldots, 25$; $T_F$ stands for Time of Frame, indicating that the calculation refers to $T_F \approx 26$ ms of audio signal.

A. Timbral Features

Timbre is currently defined in the literature as the perceptual feature that makes two sounds with the same pitch and loudness different. Features characterizing timbre can be found in [2]-[5]. These features analyze the spectral distribution of the signal and are global, in the sense that they integrate the information of all sources and instruments at the same time. Most of these descriptors are computed at regular time intervals over short windows, of typical length between 10 and 60 ms.
In the context of classification, timbre descriptors are often summarized by evaluating low-order statistics of their distribution over larger windows, commonly called texture windows [1], [2], [5]. Modeling timbre on a higher time scale not only further reduces computation, but is also perceptually more meaningful, since the short signal frames used to evaluate the features are not long enough for human perception. Consequently, the timbral features used in our system are the following.

1) Spectral centroid: the spectral centroid $SC$ is the balancing point of the $\mathrm{RMSGV}_{T_G}$ vector, defined as:

$$SC = \frac{\sum_{i=1}^{25} i\,\mathrm{RMSGV}_{T_G}[i]}{\sum_{i=1}^{25} \mathrm{RMSGV}_{T_G}[i]}. \quad (3)$$

2) Spectral flux: the spectral flux $SF$ represents the spectral difference between two temporally successive normalized granule vectors $NGV_{T_G}$. The $i$-th element, $i = 1, \ldots, 25$, is defined as:

$$NGV_{T_G}[i] = \frac{\mathrm{RMSGV}_{T_G}[i]}{\lVert \mathrm{RMSGV}_{T_G} \rVert}, \quad (4)$$

$$SF = \sum_{i=1}^{25} \left( NGV_{T_G}[i] - NGV_{T_G-1}[i] \right)^2, \quad (5)$$

where the subscript $T_G-1$ refers to the previous granule.

3) Spectral roll-off: the spectral roll-off is defined as the number $SR$ of subbands in which 85% of the whole granule energy lies. $SR$ is the smallest number such that:

$$\sum_{i=1}^{SR} \mathrm{RMSGV}_{T_G}[i] \geq 0.85 \sum_{i=1}^{25} \mathrm{RMSGV}_{T_G}[i]. \quad (6)$$
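As a concrete illustration, Eqs. (1) and (3)-(6) can be sketched in Python as follows. This is a minimal sketch, assuming each granule arrives as a 25×18 matrix of de-quantized subband samples, and assuming the normalization in Eq. (4) is by the Euclidean norm of the granule vector (the transcription of that equation is ambiguous):

```python
import numpy as np

def rms_granule_vector(granule):
    """Eq. (1): per-subband RMS over the 18 samples of one granule.
    `granule` has shape (25, 18): 25 subbands after the psychoacoustic
    preprocessing, 18 subband samples each."""
    granule = np.asarray(granule, dtype=float)
    return np.sqrt(np.mean(granule ** 2, axis=1))

def spectral_centroid(rmsgv):
    """Eq. (3): balancing point of the subband-energy vector."""
    i = np.arange(1, len(rmsgv) + 1)
    return np.sum(i * rmsgv) / np.sum(rmsgv)

def spectral_flux(rmsgv, rmsgv_prev):
    """Eqs. (4)-(5): squared difference between successive granule
    vectors, each normalized by its Euclidean norm (an assumption)."""
    ngv = rmsgv / np.linalg.norm(rmsgv)
    ngv_prev = rmsgv_prev / np.linalg.norm(rmsgv_prev)
    return float(np.sum((ngv - ngv_prev) ** 2))

def spectral_rolloff(rmsgv, fraction=0.85):
    """Eq. (6): smallest number of subbands holding 85% of the energy."""
    cum = np.cumsum(rmsgv)
    return int(np.searchsorted(cum, fraction * cum[-1]) + 1)
```

On a flat-spectrum granule, for instance, the centroid falls on the middle subband (13) and the roll-off on subband 22.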

Once all the previous scalar features have been computed, we consider their mean and variance over a larger window (77 granules) of about 1 s. This larger window is used to capture significant perceptual characteristics of the audio signals.

B. Energy Features

Energy features are the least perceptual ones; they represent physical aspects of the audio signal, such as its energy content. These features are widely used in speech analysis and music/speech classification systems [1], [2], [4]-[6].

1) RMS: this feature is a measure of the granule's loudness and is defined as:

$$\mathrm{RMS} = \sqrt{\frac{1}{25}\sum_{i=1}^{25} \mathrm{RMSGV}_{T_G}^2[i]}. \quad (7)$$

2) Low energy: this feature, denoted as $LE$, represents the percentage of granules in a 1 s window (77 granules) that have less-than-average RMS power.

3) Pseudo-cepstral coefficients: cepstral coefficients are widely used in speech analysis. They are defined as the Discrete Cosine Transform (DCT) of the log-transformed Fourier coefficients of the signal. The availability of the 25 filterbank spectral coefficients allows us to bypass the Fourier transform step. In this way, 16 pseudo-cepstral coefficients are obtained as the 16 DCT coefficients of the first 16 subband samples of a frame. Once again, excluding $LE$, which already refers to a 1 s window, the actual features used for the analysis are the mean and variance of the 16 pseudo-cepstral coefficients and of the RMS, over a larger window (77 granules for the RMS and 38 frames for the cepstral coefficients) of about 1 s. In addition to these features, for each 1 s window the overall sums of the means and variances of the 16 cepstral coefficients are computed.

C. Rhythm Features

Automatic rhythm description may be oriented toward different applications: tempo induction, beat tracking, meter induction, quantization of performed rhythm, or characterization of intentional timing deviations. They all work with PCM or MIDI audio data.
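The three energy descriptors of Sect. III-B can be sketched as follows. This is a sketch, not the authors' implementation: the logarithm floor and the DCT normalization are assumptions the paper does not specify.

```python
import numpy as np
from scipy.fft import dct

def rms_energy(rmsgv):
    """Eq. (7): loudness of a granule from its 25 subband RMS values."""
    rmsgv = np.asarray(rmsgv, dtype=float)
    return np.sqrt(np.mean(rmsgv ** 2))

def low_energy(granule_rms):
    """LE: fraction of granules in a ~1 s window (77 granules) whose
    RMS power is below the window average."""
    granule_rms = np.asarray(granule_rms, dtype=float)
    return float(np.mean(granule_rms < granule_rms.mean()))

def pseudo_cepstral(rmsfv, n_coef=16):
    """16 pseudo-cepstral coefficients: DCT of the log of the first 16
    subband values of a frame (the small floor avoiding log(0) is an
    assumption, not specified in the paper)."""
    return dct(np.log(np.asarray(rmsfv, dtype=float)[:n_coef] + 1e-12),
               norm='ortho')
```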
A usual PCM automatic beat detector consists of a filterbank decomposition, followed by an envelope extraction step, and finally a periodicity detection algorithm used to detect the lag at which the signal's envelope is most similar to itself [1]. In our case, the 25 filterbank coefficients of the MPEG encoder/decoder make possible the direct extraction of the envelope of the signal. The feature set for representing the rhythm structure is based on detecting the most salient periodicity of the signal. Fig. 2 shows the flow diagram of the proposed beat analysis algorithm.

Fig. 2: Flow diagram of the beat histogram calculation (per-subband envelope extraction by FWR, LPF and MR, followed by autocorrelation, peak picking, and beat histogram accumulation).

1) Envelope extraction and autocorrelation: using the 25 filterbank coefficients, the time-domain amplitude envelope of the $i$-th subband, $i = 1, \ldots, 25$, is extracted separately. This is achieved by the following processing steps.

Full-Wave Rectification (FWR): $x_i[k] = |s_i[k]|$. This is applied in order to extract the temporal envelope of the signal rather than the time-domain signal itself.

Low-Pass Filtering (LPF): $y_i[k] = (1 - \alpha)x_i[k] + \alpha y_i[k-1]$. A one-pole filter with $\alpha = 0.99$ is used to smooth the envelope. Full-wave rectification followed by low-pass filtering is a standard envelope extraction technique.

Mean Removal (MR): $f_i[k] = y_i[k] - E\{y_i[k]\}$, where $E\{\cdot\}$ denotes the expected value. The MR is applied in order to center the signal around zero for the autocorrelation stage.

After mean removal, the envelopes of the bands are summed together: $S[k] = \sum_{i=1}^{25} f_i[k]$. The autocorrelation of the resulting sum envelope is then computed:

$$R[h] = \frac{1}{L}\sum_{m=1}^{L} S[m]S[m+h].$$

As specified in the following, $L$ is the length of a 1 s sequence. The autocorrelation function is further manipulated in order to reduce the effect of integer multiples of the basic periodicities.
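The envelope extraction chain and the autocorrelation above can be sketched as:

```python
import numpy as np

ALPHA = 0.99  # one-pole smoothing coefficient given in the text

def envelope(subband):
    """FWR + one-pole LPF + mean removal for one subband sequence."""
    x = np.abs(np.asarray(subband, dtype=float))  # full-wave rectification
    y = np.empty_like(x)
    acc = 0.0
    for k, xk in enumerate(x):                    # y[k] = (1-a)x[k] + a*y[k-1]
        acc = (1.0 - ALPHA) * xk + ALPHA * acc
        y[k] = acc
    return y - y.mean()                           # mean removal

def sum_envelope(subbands):
    """Sum the 25 per-subband envelopes; `subbands` has shape (25, L)."""
    return np.sum([envelope(s) for s in subbands], axis=0)

def autocorrelation(S, max_lag):
    """R[h] = (1/L) * sum_m S[m] S[m+h], for h = 0 .. max_lag-1."""
    S = np.asarray(S, dtype=float)
    L = len(S)
    return np.array([np.dot(S[:L - h], S[h:]) / L for h in range(max_lag)])
```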
The original autocorrelation $R[h]$ is clipped to positive values, downsampled by a factor of 2, and subtracted from the original clipped function. The same process is repeated with other integer factors (i.e., 3, 4, and 5); in this way, the repetitive peaks at integer multiples are removed. The dominant peaks of the resulting autocorrelation function correspond to the various periodicities of the signal envelope. This analysis refers to a window of 231 granules, which at a 44.1 kHz sampling rate corresponds to approximately 3 s, with an overlap of 154 granules, corresponding to about 2 s. This larger window is necessary to capture the signal repetitions at the beat and sub-beat levels. In this way, the above-mentioned autocorrelation is computed every 1 s of signal.

2) Peak detection and beat histogram computation: the first three peaks of each autocorrelation function (one for every 1 s of signal) are selected and accumulated over the whole sound file into a Beat Histogram (BH), where each bin corresponds to

the peak lag, i.e., the beat period in beats per minute (bpm). The lags of the histogram span a range from 40 to 200 bpm (corresponding to periods from 1.5 s down to 0.3 s, respectively). For each peak of the autocorrelation functions, the peak amplitude is added to the histogram. That way, peaks having high amplitude (where the signal is highly similar to itself) are weighted more strongly in the histogram computation than weaker peaks. So, when the signal is very similar to itself (strong beat), the histogram peaks will be higher.

3) Beat histogram features: the BH representation captures detailed information about the rhythmic content of the piece that can be used to effectively guess the musical genre of a song. Starting from this observation, a set of features based on the BH is calculated in order to represent the rhythmic content. These are: $P_1$, $P_2$, $P_3$: periods of the first, second and third peaks (in bpm); $S$: overall sum of the histogram (an indication of beat strength); $R_1$, $R_2$, $R_3$: first, second and third histogram peaks divided by $S$ (relative amplitudes); $Pr_2$: ratio of the period (in bpm) of the second peak to the period of the first peak; $Pr_3$: ratio of the period (in bpm) of the third peak to the period of the first peak.

D. Feature Vector

The feature calculation described so far allows us to represent each piece of music as a 52-dimensional vector. It is important to notice that the first nine (rhythm) features are global, in the sense that they are computed over the whole sound file. On the contrary, the other ones are computed every 1 s of music. So, the actual feature vector used to represent an audio file is obtained as the union of the 9 (global) rhythm features and the means of the other 43 (local) features over the whole duration of the audio clip.

IV. CLASSIFICATION MODEL

The feature vectors obtained by the extraction procedure described in Sect. III feed a classification system.
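The autocorrelation enhancement and the beat-histogram features of Sect. III-C can be sketched as follows. One assumption to flag: the text says the clipped function is "downsampled" before subtraction, but removing peaks at integer multiples of a lag requires the subtracted copy to be time-stretched (so a peak at lag p lands at lag f·p), which is what this sketch does; peak picking is also simplified to taking the three largest histogram bins.

```python
import numpy as np

def enhance_autocorr(R, factors=(2, 3, 4, 5)):
    """Clip R to positive values, then for each integer factor subtract a
    time-stretched copy of the clipped curve (a peak at lag p in the copy
    lands at lag f*p), re-clipping after every step.  This removes the
    repetitive peaks at integer multiples of the basic periodicities."""
    e = np.clip(np.asarray(R, dtype=float), 0.0, None)
    h = np.arange(len(e), dtype=float)
    for f in factors:
        stretched = np.interp(h / f, h, e)  # value at lag h taken from lag h/f
        e = np.clip(e - stretched, 0.0, None)
    return e

def beat_histogram_features(bins_bpm, counts):
    """P1-P3 (bpm of the three highest histogram peaks), S (histogram sum),
    R1-R3 (peak amplitudes relative to S), Pr2 and Pr3 (period ratios).
    Peak picking is simplified here to the three largest bins."""
    bins_bpm = np.asarray(bins_bpm, dtype=float)
    counts = np.asarray(counts, dtype=float)
    top = np.argsort(counts)[::-1][:3]
    P, A = bins_bpm[top], counts[top]
    S = counts.sum()
    return np.concatenate([P, [S], A / S, [P[1] / P[0], P[2] / P[0]]])
```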
The choice of the classification system is independent of the genre classification task; in this paper we adopt the <Min-Max, PARC> classification system [8], whose performance has been ascertained in many real-world problems with respect to well-known classification benchmarks. The classification model is a classical fuzzy Min-Max neural network. The classification strategy consists of covering the patterns of the training set with hyperboxes, whose boundary hyperplanes are parallel to the axes of the main reference system. The hyperbox can be considered as a crisp frame on which different types of membership functions can be adapted. The neuro-fuzzy classification model is trained by the Pruning Adaptive Resolution Classifier (PARC) algorithm. It is a constructive procedure able to generate a suitable succession of Min-Max classifiers, characterized by a decreasing complexity (number of neurons in the hidden layer). The regularized network is automatically selected according to learning theory, following the Ockham's Razor criterion. For this reason, PARC is characterized by a good generalization capability as well as a high degree of automation. Consequently, it is particularly suited to be used as the core modeling system of a wrapper-like feature optimization procedure, such as the one described in the following Section.

Fig. 3: Proposed classification tree (Node 1: classical/other; Node 2: electronic/other; Node 3: pop/world).

The implemented system consists of a classification tree where each node is a Min-Max classifier. Each node is trained to discriminate one genre from the remaining ones. As illustrated in Fig. 3, the overall system discriminates between the four said genres (classical, electronic, pop and world) in a sequential manner. The first node, classical/other, decides whether the audio clip belongs to classical or not (other).
All the patterns classified as other feed the second node, electronic/other, of the tree. Again, all the patterns classified as other feed the third node, pop/world, of the tree.

V. AUTOMATIC FEATURE SELECTION

In order to evaluate the effectiveness of the proposed features, an optimization strategy has been implemented. The technique is based on a genetic algorithm that selects the optimal subset of features for the assigned task. This reduces the dimension of the input space and improves the classification accuracy. Genetic algorithms are designed to manage a population of individuals, i.e., a set of potential solutions of the optimization problem at hand. Each individual is uniquely represented by a genetic code, which is typically a string of binary digits. The fitness of a particular individual coincides with the corresponding value of the objective function to be optimized. In our application, the adopted fitness function is a convex linear combination of two terms:

$$F_j = (1 - \lambda)E_j + \lambda C_j,$$

where $E_j$ is a performance measure on the training set $S_{tr}$ and $C_j$ is a complexity measure of the $j$-th classifier; $\lambda \in [0, 1]$ is a meta-parameter by which the user can control the training procedure, taking into account that small values of $\lambda$ yield more complex models and consequently higher accuracy on $S_{tr}$ (the opposite situation is characterized by large values of $\lambda$). For a feature selection problem, the genetic code represents a subset of the original feature set; in our application, each

genetic code is made of 52 binary digits, where the $n$-th digit, $n = 1, \ldots, 52$, equals one if the corresponding feature feeds the classification system, and zero otherwise. The evolution starts from a population of $P$ randomly selected individuals. At the $k$-th generation $G_k$, with $k = 0, \ldots, M_{gen}$, the next generation $G_{k+1}$ is determined by applying standard selection, mutation and crossover operators. The behavior of the whole algorithm depends on the values of $P$ and $M_{gen}$, as well as on the mutation rate $MR$ and the crossover rate $CR$, which are two probability thresholds that control the related operators. The convergence of the genetic algorithm is assured by elitism, i.e., by copying the best individual into the next generation.

VI. PERFORMANCE EVALUATION

The system's performance has been evaluated using different Web repositories (in particular, all-music.com and mp3.com) and different classification procedures (GMM, k-NN, SVM, Min-Max). Although we carried out several tests, for the sake of brevity and objectivity we show in the following the results obtained on the ISMIR2004 genre classification contest data [9]. The results in that contest were obtained working with PCM audio data and using a six-genre taxonomy: classical, electronic, jazz-blues, metal-punk, rock-pop and world. On the basis of the considerations made in Sect. I, we use a four-genre taxonomy: classical, electronic, pop (including the jazz-blues, metal-punk and rock-pop sub-genres), and world.

TABLE I: Data Composition

Genre      | S_tr^(n) | S_ts^(n) | S_ts
classical  |          |          |
electronic |          |          |
pop        |          |          |
world      |          |          |
Total      |   726    |   725    |  720

It is important to notice that the classification systems of ISMIR2004 were tested on a Test_Tracks collection (made up of 700 tracks) that was supplied during the contest.
This collection is not available, so our system has been trained on the Training_Tracks collection (like all the systems of the contest participants) and tested using the tracks of the Development_Tracks collection (supplied for the development of the participants' classification systems, not for testing). From this observation, it is evident that our system has been trained using less data than the contest participants. The data set used in our experiment is available for download at the ISMIR2004 website; it is distributed and copyrighted by Magnatune.com and is made up of two different collections. The first collection, called Training_Tracks, consists of 729 MP3 files (128 kbps, stereo, sampled at 44.1 kHz, 16 bit), divided into 6 genres: classical, electronic, jazz-blues, metal-punk, rock-pop and world. The files of this collection have been used for training each Min-Max neural network that makes up the decision tree. The second collection, called Development_Tracks, also consists of 729 MP3 files in the same format. The files of this collection are used for the validation of each Min-Max neural network that makes up the decision tree and for the test of the overall system.

A. Training

In order to train each node, every clip of the Training_Tracks collection has been represented as a unique mean feature vector referring to the 40 s at its center. For practical needs, the tracks with a duration of less than 10 s have been deleted. In this way, the node training set $S_{tr}^{(n)}$, shown in the first column of Table I, consists of 726 segments of 40 s, for a total duration of about 8 hours of music. An intermediate performance evaluation has been performed in order to check the reliability of each node of the classification tree. In order to test each node, every clip of the Development_Tracks collection has been represented as a unique mean feature vector referring to the 40 s at its center. Once again, the tracks with a duration of less than 10 s have been deleted. In this way, the node test set $S_{ts}^{(n)}$, indicated in the second column of Table I, consists of 725 segments of 40 s, for a total duration of about 8 hours of music. The first node, classical/other, has been trained using all four node training sets, where patterns belonging to the electronic, pop and world classes have been relabeled as other. Thus, the whole training set for the first node is made up of all 726 tracks. The node has been tested with all four node test sets, made up of all 725 tracks and labeled as for the training set. The second node, electronic/other, has been trained using the three node training sets related to the electronic, pop and world classes. Patterns belonging to the pop and world classes have been relabeled as other. Consequently, the whole training set of the second node is made up of 409 files. The test set for this node is built in the same way, using the corresponding node test sets, and is made up of 408 files. Finally, for the third node, pop/world, only the pop and world node training sets are used. Thus, the whole training set consists of 294 files, while the test set, built in the same way, consists of 295 files. In order to improve the classification results and to test the effectiveness of the proposed set of features, the genetic algorithm and feature selection technique described in Sect. V was applied to the available data. In particular, for each node of the decision tree a genetic algorithm with P = 100, M_gen = 100, MR = 0.3, and CR = 1 was run. The genetic codes that minimize the classification error, according to the criterion proposed in the previous Section, are used for the implementation of an optimized classification tree. We obtained the node performances shown in Table II; the actual number of selected features N_f is also reported.
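The wrapper-style selection loop of Sect. V can be sketched as follows. The paper fixes only the fitness form and the values of P, M_gen, MR and CR; the complexity measure (fraction of selected features), truncation selection and one-point crossover used here are illustrative assumptions, and `error_fn` stands in for training the <Min-Max, PARC> classifier on the selected features.

```python
import random

def fitness(code, error_fn, lam=0.2):
    """F_j = (1 - lambda)*E_j + lambda*C_j.  E_j is the training error of
    the classifier fed with the selected features; C_j is a complexity
    measure, here (as an assumption) the fraction of selected features."""
    E = error_fn(code)
    C = sum(code) / len(code)
    return (1 - lam) * E + lam * C

def evolve(error_fn, n_feat=52, P=100, M_gen=100, MR=0.3, CR=1.0, seed=0):
    """Minimal generational GA over binary feature-selection codes, with
    elitism, bit-flip mutation (rate MR) and one-point crossover (rate CR)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(P)]
    for _ in range(M_gen):
        scored = sorted(pop, key=lambda c: fitness(c, error_fn))
        nxt = [scored[0][:]]                       # elitism: keep the best
        while len(nxt) < P:
            a, b = rng.sample(scored[:P // 2], 2)  # truncation selection
            child = a[:]
            if rng.random() < CR:                  # one-point crossover
                cut = rng.randrange(1, n_feat)
                child = a[:cut] + b[cut:]
            if rng.random() < MR:                  # bit-flip mutation
                i = rng.randrange(n_feat)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return min(pop, key=lambda c: fitness(c, error_fn))
```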

B. Test of the Whole Classification Tree

The overall performance of the implemented decision tree has been tested on the Development_Tracks collection. Tracks with a duration of less than 10 s or of more than 10 min have been deleted. In this way, the test set $S_{ts}$, illustrated in the third column of Table I, consists of 720 tracks, for a total duration of about 47 hours of music. In particular, the classification is carried out in the following way: each track is split into consecutive segments of 40 s; a unique mean feature vector (with 52 components) is calculated for each segment; each segment flows through the decision tree and is assigned to a particular class; each track is classified on the basis of the most frequent class assigned to its segments; if two or more classes have the same frequency among the segments of the track, the track is classified as indeterminate.

TABLE II: Accuracy and Number of Selected Features N_f

Node   | Accuracy on S_tr^(n) | Accuracy on S_ts^(n) | N_f
Node 1 |                      | 88.97%               | 24
Node 2 |                      | 80.88%               | 26
Node 3 | 100%                 | 83.73%               | 31

The performance of the optimized classification tree on $S_{ts}$ is reported in Table III. The results show a classification accuracy of 71.12% (512 correctly classified tracks) and an indetermination rate of 4.86% (35 tracks).

C. Performance Comparison

To make a comparison possible, our results have been compared with the best and worst ISMIR2004 contest results. The best result was achieved in [10], the worst in [11]. Both systems work with PCM audio data and use Mel-Frequency Cepstral Coefficients with GMM and clustering techniques. These results show an 86.14% accuracy for the best and a 67.9% accuracy for the worst. Our system places in the middle, with a 71.12% accuracy. In all cases, the best discriminated class is classical; this result can be justified by two different kinds of observation: the relevant presence of patterns belonging to this class in the training set, and the fact that classical is an intrinsically well-defined musical genre in perceptual terms.
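The cascade of Fig. 3 and the per-track majority vote described above (with ties mapped to indeterminate) can be sketched as:

```python
from collections import Counter

def classify_segment(fv, node1, node2, node3):
    """Cascade of Fig. 3: classical/other, then electronic/other, then
    pop/world.  Each node is any callable returning a class label."""
    if node1(fv) == 'classical':
        return 'classical'
    if node2(fv) == 'electronic':
        return 'electronic'
    return node3(fv)  # 'pop' or 'world'

def classify_track(segment_labels):
    """Per-track decision: the most frequent class among the 40 s
    segments; a tie between the top classes yields 'indeterminate'."""
    counts = Counter(segment_labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return 'indeterminate'
    return counts[0][0]
```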
On the other hand, the worst classified genre is world, and a considerable overlap with classical is evident. This result can be physically justified by a real similarity, in terms of timbre and rhythm structure, of the tracks belonging to these two classes. As concerns pop and electronic, even in this case the results show a noticeable overlap between the classes; this can be ascribed to a significant track similarity (this is especially true for the files belonging to electronic and to the two sub-genres rock and pop belonging to the pop class).

TABLE III: Confusion Matrix and Relative Accuracy

(a) Confusion Matrix

Output \ Target | classical | electronic | pop | world
classical       |           |            |     |
electronic      |           |            |     |
pop             |           |            |     |
world           |           |            |     |

(b) Relative Accuracy

Genre      | Total patterns | Correctly classified patterns | Accuracy
classical  |                |                               |
electronic |                |                               |
pop        |                |                               |
world      |                |                               |

VII. CONCLUSION

In conclusion, it can be said that, despite the fuzzy nature of genre boundaries, musical genre classification can be performed automatically and directly in the compressed domain, with results and performance comparable to PCM genre classification. The classification system proposed in this paper, along with the psychoacoustic-based preprocessing of MP3 files, achieves a good accuracy. Moreover, this is obtained automatically, reducing to a minimum the intervention of human experts, as commonly required by Web and other multimedia applications.

REFERENCES

[1] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, July.
[2] G. Tzanetakis and P. Cook, "Sound analysis using MPEG compressed audio," in Proceedings of the 2000 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2000), vol. 2, Istanbul, Turkey, June 2000.
[3] R. Jarina, N. Murphy, N. O'Connor, and S. Marlow, "An experiment in audio classification from compressed data," in International Workshop on Systems, Signals and Image Processing, Poznan, Poland, September.
[4] S. Kiranyaz, M. Aubazac, and M. Gabbouj, "Unsupervised segmentation and classification over MP3 and AAC audio bitstreams," in Proceedings of the European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003), London, UK, April 2003.
[5] A. Rizzi, N. M. Buccino, M. Panella, and A. Uncini, "Optimal short-time features for music/speech classification of compressed audio data," in Proceedings of the IEEE International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA 2006), Sydney, Australia, November 2006.
[6] D. Pye, "Content-based methods for managing electronic music," in Proceedings of the 2000 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2000), vol. 4, Istanbul, Turkey, June 2000.
[7] Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, Part 3: Audio, ISO/IEC International Standard IS, Information Technology Std.
[8] A. Rizzi, M. Panella, and F. M. F. Mascioli, "Adaptive resolution min-max classifiers," IEEE Transactions on Neural Networks, vol. 13, no. 2, March.
[9] Available at:
[10] E. Pampalk, "A Matlab toolbox to compute music similarity from audio," in Proceedings of the International Symposium on Music Information Retrieval (ISMIR 2004), Barcelona, Spain, October 2004.
[11] D. Ellis and B. Whitman, "Automatic record reviews," in Proceedings of the International Symposium on Music Information Retrieval (ISMIR 2004), Barcelona, Spain, October 2004.


More information

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO International journal of computer science & information Technology (IJCSIT) Vol., No.5, October A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO Pranab Kumar Dhar *, Mohammad

More information

MPEG-1 Bitstreams Processing for Audio Content Analysis

MPEG-1 Bitstreams Processing for Audio Content Analysis ISSC, Cork. June 5- MPEG- Bitstreams Processing for Audio Content Analysis Roman Jarina, Orla Duffner, Seán Marlow, Noel O Connor, and Noel Murphy Visual Media Processing Group Dublin City University Glasnevin,

More information

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014 MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION Steve Tjoa kiemyang@gmail.com June 25, 2014 Review from Day 2 Supervised vs. Unsupervised Unsupervised - clustering Supervised binary classifiers (2 classes)

More information

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 SUBJECTIVE AND OBJECTIVE QUALITY EVALUATION FOR AUDIO WATERMARKING BASED ON SINUSOIDAL AMPLITUDE MODULATION PACS: 43.10.Pr, 43.60.Ek

More information

Mpeg 1 layer 3 (mp3) general overview

Mpeg 1 layer 3 (mp3) general overview Mpeg 1 layer 3 (mp3) general overview 1 Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction,

More information

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.

More information

Chapter 14 MPEG Audio Compression

Chapter 14 MPEG Audio Compression Chapter 14 MPEG Audio Compression 14.1 Psychoacoustics 14.2 MPEG Audio 14.3 Other Commercial Audio Codecs 14.4 The Future: MPEG-7 and MPEG-21 14.5 Further Exploration 1 Li & Drew c Prentice Hall 2003 14.1

More information

Speech-Music Discrimination from MPEG-1 Bitstream

Speech-Music Discrimination from MPEG-1 Bitstream Speech-Music Discrimination from MPEG-1 Bitstream ROMAN JARINA, NOEL MURPHY, NOEL O CONNOR, SEÁN MARLOW Centre for Digital Video Processing / RINCE Dublin City University, Dublin 9 IRELAND jarinar@eeng.dcu.ie

More information

Adaptive Resolution Min-Max Classifiers

Adaptive Resolution Min-Max Classifiers 402 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 2, MARCH 2002 Adaptive Resolution Min-Max Classifiers Antonello Rizzi, Massimo Panella, and Fabio Massimo Frattale Mascioli Abstract A high automation

More information

5: Music Compression. Music Coding. Mark Handley

5: Music Compression. Music Coding. Mark Handley 5: Music Compression Mark Handley Music Coding LPC-based codecs model the sound source to achieve good compression. Works well for voice. Terrible for music. What if you can t model the source? Model the

More information

Multimedia Database Systems. Retrieval by Content

Multimedia Database Systems. Retrieval by Content Multimedia Database Systems Retrieval by Content MIR Motivation Large volumes of data world-wide are not only based on text: Satellite images (oil spill), deep space images (NASA) Medical images (X-rays,

More information

Appendix 4. Audio coding algorithms

Appendix 4. Audio coding algorithms Appendix 4. Audio coding algorithms 1 Introduction The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Typically

More information

Optical Storage Technology. MPEG Data Compression

Optical Storage Technology. MPEG Data Compression Optical Storage Technology MPEG Data Compression MPEG-1 1 Audio Standard Moving Pictures Expert Group (MPEG) was formed in 1988 to devise compression techniques for audio and video. It first devised the

More information

AUDIO information often plays an essential role in understanding

AUDIO information often plays an essential role in understanding 1062 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval Serkan Kiranyaz,

More information

A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval

A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval 1 A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval Serkan Kiranyaz,

More information

Automatic Classification of Audio Data

Automatic Classification of Audio Data Automatic Classification of Audio Data Carlos H. C. Lopes, Jaime D. Valle Jr. & Alessandro L. Koerich IEEE International Conference on Systems, Man and Cybernetics The Hague, The Netherlands October 2004

More information

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio: Audio Compression Audio Compression CD quality audio: Sampling rate = 44 KHz, Quantization = 16 bits/sample Bit-rate = ~700 Kb/s (1.41 Mb/s if 2 channel stereo) Telephone-quality speech Sampling rate =

More information

Lecture 16 Perceptual Audio Coding

Lecture 16 Perceptual Audio Coding EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today s lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ Hero

More information

CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri

CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri 1 CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING Alexander Wankhammer Peter Sciri introduction./the idea > overview What is musical structure?

More information

AUDIOVISUAL COMMUNICATION

AUDIOVISUAL COMMUNICATION AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 213 http://acousticalsociety.org/ ICA 213 Montreal Montreal, Canada 2-7 June 213 Engineering Acoustics Session 2pEAb: Controlling Sound Quality 2pEAb1. Subjective

More information

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur Module 9 AUDIO CODING Lesson 29 Transform and Filter banks Instructional Objectives At the end of this lesson, the students should be able to: 1. Define the three layers of MPEG-1 audio coding. 2. Define

More information

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Extracting meaning

More information

A Robust Audio Fingerprinting Algorithm in MP3 Compressed Domain

A Robust Audio Fingerprinting Algorithm in MP3 Compressed Domain A Robust Audio Fingerprinting Algorithm in MP3 Compressed Domain Ruili Zhou, Yuesheng Zhu Abstract In this paper, a new robust audio fingerprinting algorithm in MP3 compressed domain is proposed with high

More information

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL SPREAD SPECTRUM WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL 1 Yüksel Tokur 2 Ergun Erçelebi e-mail: tokur@gantep.edu.tr e-mail: ercelebi@gantep.edu.tr 1 Gaziantep University, MYO, 27310, Gaziantep,

More information

2.4 Audio Compression

2.4 Audio Compression 2.4 Audio Compression 2.4.1 Pulse Code Modulation Audio signals are analog waves. The acoustic perception is determined by the frequency (pitch) and the amplitude (loudness). For storage, processing and

More information

Speech and audio coding

Speech and audio coding Institut Mines-Telecom Speech and audio coding Marco Cagnazzo, cagnazzo@telecom-paristech.fr MN910 Advanced compression Outline Introduction Introduction Speech signal Music signal Masking Codeurs simples

More information

Repeating Segment Detection in Songs using Audio Fingerprint Matching

Repeating Segment Detection in Songs using Audio Fingerprint Matching Repeating Segment Detection in Songs using Audio Fingerprint Matching Regunathan Radhakrishnan and Wenyu Jiang Dolby Laboratories Inc, San Francisco, USA E-mail: regu.r@dolby.com Institute for Infocomm

More information

AUDIOVISUAL COMMUNICATION

AUDIOVISUAL COMMUNICATION AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis

More information

Multimedia Communications. Audio coding

Multimedia Communications. Audio coding Multimedia Communications Audio coding Introduction Lossy compression schemes can be based on source model (e.g., speech compression) or user model (audio coding) Unlike speech, audio signals can be generated

More information

ELL 788 Computational Perception & Cognition July November 2015

ELL 788 Computational Perception & Cognition July November 2015 ELL 788 Computational Perception & Cognition July November 2015 Module 11 Audio Engineering: Perceptual coding Coding and decoding Signal (analog) Encoder Code (Digital) Code (Digital) Decoder Signal (analog)

More information

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy Patrick Brown EE382C Embedded Software Systems May 10, 2000 $EVWUDFW MPEG Audio Layer-3 is a standard for the compression of high-quality digital audio.

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 13 Audio Signal Processing 14/04/01 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Performance Analysis of Discrete Wavelet Transform based Audio Watermarking on Indian Classical Songs

Performance Analysis of Discrete Wavelet Transform based Audio Watermarking on Indian Classical Songs Volume 73 No.6, July 2013 Performance Analysis of Discrete Wavelet Transform based Audio ing on Indian Classical Songs C. M. Juli Janardhanan Department of ECE Government Engineering College, Wayanad Mananthavady,

More information

Speech Modulation for Image Watermarking

Speech Modulation for Image Watermarking Speech Modulation for Image Watermarking Mourad Talbi 1, Ben Fatima Sira 2 1 Center of Researches and Technologies of Energy, Tunisia 2 Engineering School of Tunis, Tunisia Abstract Embedding a hidden

More information

Available online Journal of Scientific and Engineering Research, 2016, 3(4): Research Article

Available online   Journal of Scientific and Engineering Research, 2016, 3(4): Research Article Available online www.jsaer.com, 2016, 3(4):417-422 Research Article ISSN: 2394-2630 CODEN(USA): JSERBR Automatic Indexing of Multimedia Documents by Neural Networks Dabbabi Turkia 1, Lamia Bouafif 2, Ellouze

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011 Audio Fundamentals, Compression Techniques & Standards Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011 Outlines Audio Fundamentals Sampling, digitization, quantization μ-law

More information

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding. Introduction to Digital Audio Compression B. Cavagnolo and J. Bier Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, CA 94704 (510) 665-1600 info@bdti.com http://www.bdti.com INTRODUCTION

More information

Audio Segmentation and Classification. Abdillahi Hussein Omar

Audio Segmentation and Classification. Abdillahi Hussein Omar Audio Segmentation and Classification Abdillahi Hussein Omar Kgs. Lyngby 2005 Preface The work presented in this thesis has been carried out at the Intelligent Signal Processing Group, at the Institute

More information

Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay

Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay ACOUSTICAL LETTER Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay Kazuhiro Kondo and Kiyoshi Nakagawa Graduate School of Science and Engineering, Yamagata University,

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 6 Audio Retrieval 6 Audio Retrieval 6.1 Basics of

More information

Overcompressing JPEG images with Evolution Algorithms

Overcompressing JPEG images with Evolution Algorithms Author manuscript, published in "EvoIASP2007, Valencia : Spain (2007)" Overcompressing JPEG images with Evolution Algorithms Jacques Lévy Véhel 1, Franklin Mendivil 2 and Evelyne Lutton 1 1 Inria, Complex

More information

CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION

CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION In chapter 4, SVD based watermarking schemes are proposed which met the requirement of imperceptibility, having high payload and

More information

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved. Compressed Audio Demystified Why Music Producers Need to Care About Compressed Audio Files Download Sales Up CD Sales Down High-Definition hasn t caught on yet Consumers don t seem to care about high fidelity

More information

Adaptive Quantization for Video Compression in Frequency Domain

Adaptive Quantization for Video Compression in Frequency Domain Adaptive Quantization for Video Compression in Frequency Domain *Aree A. Mohammed and **Alan A. Abdulla * Computer Science Department ** Mathematic Department University of Sulaimani P.O.Box: 334 Sulaimani

More information

Data Hiding in Video

Data Hiding in Video Data Hiding in Video J. J. Chae and B. S. Manjunath Department of Electrical and Computer Engineering University of California, Santa Barbara, CA 9316-956 Email: chaejj, manj@iplab.ece.ucsb.edu Abstract

More information

Bit or Noise Allocation

Bit or Noise Allocation ISO 11172-3:1993 ANNEXES C & D 3-ANNEX C (informative) THE ENCODING PROCESS 3-C.1 Encoder 3-C.1.1 Overview For each of the Layers, an example of one suitable encoder with the corresponding flow-diagram

More information

Spectral modeling of musical sounds

Spectral modeling of musical sounds Spectral modeling of musical sounds Xavier Serra Audiovisual Institute, Pompeu Fabra University http://www.iua.upf.es xserra@iua.upf.es 1. Introduction Spectral based analysis/synthesis techniques offer

More information

Compression of Stereo Images using a Huffman-Zip Scheme

Compression of Stereo Images using a Huffman-Zip Scheme Compression of Stereo Images using a Huffman-Zip Scheme John Hamann, Vickey Yeh Department of Electrical Engineering, Stanford University Stanford, CA 94304 jhamann@stanford.edu, vickey@stanford.edu Abstract

More information

Cepstral Analysis Tools for Percussive Timbre Identification

Cepstral Analysis Tools for Percussive Timbre Identification Cepstral Analysis Tools for Percussive Timbre Identification William Brent Department of Music and Center for Research in Computing and the Arts University of California, San Diego wbrent@ucsd.edu ABSTRACT

More information

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings MPEG-1 Overview of MPEG-1 1 Standard Introduction to perceptual and entropy codings Contents History Psychoacoustics and perceptual coding Entropy coding MPEG-1 Layer I/II Layer III (MP3) Comparison and

More information

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06 Fundamentals of Perceptual Audio Encoding Craig Lewiston HST.723 Lab II 3/23/06 Goals of Lab Introduction to fundamental principles of digital audio & perceptual audio encoding Learn the basics of psychoacoustic

More information

MPEG-7 Audio: Tools for Semantic Audio Description and Processing

MPEG-7 Audio: Tools for Semantic Audio Description and Processing MPEG-7 Audio: Tools for Semantic Audio Description and Processing Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Why semantic description

More information

CHAPTER 3. Preprocessing and Feature Extraction. Techniques

CHAPTER 3. Preprocessing and Feature Extraction. Techniques CHAPTER 3 Preprocessing and Feature Extraction Techniques CHAPTER 3 Preprocessing and Feature Extraction Techniques 3.1 Need for Preprocessing and Feature Extraction schemes for Pattern Recognition and

More information

Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation

Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation , 2009, 5, 363-370 doi:10.4236/ijcns.2009.25040 Published Online August 2009 (http://www.scirp.org/journal/ijcns/). Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation

More information

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING Christopher Burges, Daniel Plastina, John Platt, Erin Renshaw, and Henrique Malvar March 24 Technical Report MSR-TR-24-19 Audio fingerprinting

More information

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC Ralf Geiger 1, Gerald Schuller 1, Jürgen Herre 2, Ralph Sperschneider 2, Thomas Sporer 1 1 Fraunhofer IIS AEMT, Ilmenau, Germany 2 Fraunhofer

More information

Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model

Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model 1 M. Chinna Rao M.Tech,(Ph.D) Research scholar, JNTUK,kakinada chinnarao.mortha@gmail.com 2 Dr. A.V.S.N. Murthy Professor of Mathematics,

More information

Wavelet filter bank based wide-band audio coder

Wavelet filter bank based wide-band audio coder Wavelet filter bank based wide-band audio coder J. Nováček Czech Technical University, Faculty of Electrical Engineering, Technicka 2, 16627 Prague, Czech Republic novacj1@fel.cvut.cz 3317 New system for

More information

QueST: Querying Music Databases by Acoustic and Textual Features

QueST: Querying Music Databases by Acoustic and Textual Features QueST: Querying Music Databases by Acoustic and Textual Features Bin Cui 1 Ling Liu 2 Calton Pu 2 Jialie Shen 3 Kian-Lee Tan 4 1 Department of Computer Science & National Lab on Machine Perception, Peking

More information

An investigation of non-uniform bandwidths auditory filterbank in audio coding

An investigation of non-uniform bandwidths auditory filterbank in audio coding PAGE 360 An investigation of non-uniform bandwidths auditory filterbank in audio coding Andrew Lin, Stevan Berber, Waleed Abdulla Department of Electrical and Computer Engineering University of Auckland,

More information

CS229 Final Project: Audio Query By Gesture

CS229 Final Project: Audio Query By Gesture CS229 Final Project: Audio Query By Gesture by Steinunn Arnardottir, Luke Dahl and Juhan Nam {steinunn,lukedahl,juhan}@ccrma.stanford.edu December 2, 28 Introduction In the field of Music Information Retrieval

More information

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd Introducing Audio Signal Processing & Audio Coding Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd Overview Audio Signal Processing Applications @ Dolby Audio Signal Processing Basics

More information

Audio and video compression

Audio and video compression Audio and video compression 4.1 introduction Unlike text and images, both audio and most video signals are continuously varying analog signals. Compression algorithms associated with digitized audio and

More information

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

New Results in Low Bit Rate Speech Coding and Bandwidth Extension Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without

More information

A GENERIC SYSTEM FOR AUDIO INDEXING: APPLICATION TO SPEECH/ MUSIC SEGMENTATION AND MUSIC GENRE RECOGNITION

A GENERIC SYSTEM FOR AUDIO INDEXING: APPLICATION TO SPEECH/ MUSIC SEGMENTATION AND MUSIC GENRE RECOGNITION A GENERIC SYSTEM FOR AUDIO INDEXING: APPLICATION TO SPEECH/ MUSIC SEGMENTATION AND MUSIC GENRE RECOGNITION Geoffroy Peeters IRCAM - Sound Analysis/Synthesis Team, CNRS - STMS Paris, France peeters@ircam.fr

More information

Music Genre Classification

Music Genre Classification Music Genre Classification Matthew Creme, Charles Burlin, Raphael Lenain Stanford University December 15, 2016 Abstract What exactly is it that makes us, humans, able to tell apart two songs of different

More information

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd Introducing Audio Signal Processing & Audio Coding Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd Introducing Audio Signal Processing & Audio Coding 2013 Dolby Laboratories,

More information

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia? Multimedia What is multimedia? Media types +Text + Graphics + Audio +Image +Video Interchange formats What is multimedia? Multimedia = many media User interaction = interactivity Script = time 1 2 Most

More information

Parametric Coding of High-Quality Audio

Parametric Coding of High-Quality Audio Parametric Coding of High-Quality Audio Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau Technical University Ilmenau, Germany 1 Waveform vs Parametric Waveform Filter-bank approach Mainly exploits

More information

Voice Command Based Computer Application Control Using MFCC

Voice Command Based Computer Application Control Using MFCC Voice Command Based Computer Application Control Using MFCC Abinayaa B., Arun D., Darshini B., Nataraj C Department of Embedded Systems Technologies, Sri Ramakrishna College of Engineering, Coimbatore,

More information

Compression of RADARSAT Data with Block Adaptive Wavelets Abstract: 1. Introduction

Compression of RADARSAT Data with Block Adaptive Wavelets Abstract: 1. Introduction Compression of RADARSAT Data with Block Adaptive Wavelets Ian Cumming and Jing Wang Department of Electrical and Computer Engineering The University of British Columbia 2356 Main Mall, Vancouver, BC, Canada

More information

Detection of goal event in soccer videos

Detection of goal event in soccer videos Detection of goal event in soccer videos Hyoung-Gook Kim, Steffen Roeber, Amjad Samour, Thomas Sikora Department of Communication Systems, Technical University of Berlin, Einsteinufer 17, D-10587 Berlin,

More information

CISC 7610 Lecture 3 Multimedia data and data formats

CISC 7610 Lecture 3 Multimedia data and data formats CISC 7610 Lecture 3 Multimedia data and data formats Topics: Perceptual limits of multimedia data JPEG encoding of images MPEG encoding of audio MPEG and H.264 encoding of video Multimedia data: Perceptual

More information

Using Noise Substitution for Backwards-Compatible Audio Codec Improvement

Using Noise Substitution for Backwards-Compatible Audio Codec Improvement Using Noise Substitution for Backwards-Compatible Audio Codec Improvement Colin Raffel AES 129th Convention San Francisco, CA February 16, 2011 Outline Introduction and Motivation Coding Error Analysis

More information

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 1 Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 33720, Tampere, Finland toni.heittola@tut.fi,

More information

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1
