Genre Classification of Compressed Audio Data
Antonello Rizzi, Nicola Maurizio Buccino, Massimo Panella, and Aurelio Uncini
INFO-COM Department, University of Rome "La Sapienza", Via Eudossiana 18, Rome, Italy

Abstract: This paper deals with the musical genre classification problem, starting from a set of features extracted directly from MPEG-1 Layer III compressed audio data. The automatic classification of compressed audio signals into a short hierarchy of musical genres is explored. More specifically, three feature sets representing timbre, rhythmic content and energy content are proposed for a four-leaf tree genre hierarchy. The adopted features are computed from the spectral information available at the MPEG decoding stage. The performance and relative importance of the proposed approach are investigated by training a classification model on the audio collections proposed in musical genre contests. We also used an optimization strategy based on genetic algorithms. The results are comparable to those obtained by PCM-based musical genre classification systems.

I. INTRODUCTION

Genre hierarchies, typically created manually by human experts, are currently one of the main ways used to structure music content on the Web. Automatic musical genre classification can potentially automate this process and provide an important component of a complete music information retrieval system for audio signals. In this paper, the problem of automatically classifying MPEG-1 Layer III (MP3) compressed audio signals into a hierarchy of musical genres is addressed. Although music on the Web is usually in compressed form, in particular MP3, most of the known techniques use features calculated from PCM or MIDI audio data [1]. The technique proposed in this paper allows the direct classification of partially decompressed data, thus avoiding the complete PCM decompression.
For the datasets considered herein, we adopt a four-leaf taxonomy made of classical, electronic, pop and world music. This taxonomy seems to be a good compromise between the physical and perceptual features that characterize a musical genre. As concerns research related to the classification of MPEG compressed audio data, very little has been done, mostly regarding the music/speech classification task [2]-[5]. An interesting work on musical genre classification in the compressed domain is illustrated in [6]. In that paper, the author classifies tracks into 6 different genres: blues, easy listening, classical, opera, dance (techno) and indie rock. The features used for this task are energy-related, in particular the cepstrum coefficients, and two different classification strategies are compared: a Gaussian Mixture Model (GMM) and a Vector Tree Quantization (VTQ). Results show a 90.9% accuracy for the GMM and 85.1% for the VTQ. Compared with the results obtained using PCM audio data, there is an accuracy deterioration of about 4%. In our paper, three sets of features representing timbre, rhythmic content and energy content are proposed. Although timbral and energy features are also used for speech and general sound classification, the rhythmic feature set is novel and specifically designed to represent aspects of musical content. Moreover, a psychoacoustic preprocessing is performed in order to enhance the perceptual aspect of the proposed feature set. Its performance and relative importance are evaluated by training a classification model and by an automatic feature selection procedure based on a genetic optimization technique. The data used for the evaluation of the proposed classification system are audio collections available in well-known repositories on the Web.
II. PSYCHOACOUSTIC REMARKS

In our system some psychoacoustic considerations are taken into account before the actual feature extraction stage, in order to extract the significant information related to the subbands. As can be seen in Fig. 1, the process starts as a normal MP3 decompression, including bitstream parsing and frequency sample de-quantization. Once the subband data become available, they are used as a source for further computations rather than for synthesizing actual samples with the synthesis filterbank. We consider in this paper MP3 compressed audio data [7]. In the first step of the MPEG encoding process, the audio signal is converted into 32 equally spaced spectral components using an analysis filterbank. For every 32 consecutive PCM input audio samples (corresponding to T = 32 T_c seconds of audio signal sampled with a period of T_c seconds), the filterbank provides 32 subband samples s_i[k] = s_i(kT), one sample per subband, indexed by i = 1, ..., 32. The Layer III algorithm groups the input signal into frames of 1152 PCM samples. Each MP3 frame consists of two granules, each of 576 PCM samples. With a standard 44.1 kHz sampling rate, a granule occurs approximately every 13 ms. Each granule contains 18 consecutive subband samples, where each subband sample is a vector of 32 frequency band amplitudes, each related to a subband of 689 Hz.

The first observation is that the 32 constant filterbank bandwidths do not accurately reflect the human ear's critical bands; in particular, each bandwidth is too wide for the lower frequencies and too narrow for the higher ones. Based on these considerations we can say that a subband of 689 Hz is too wide to represent the lowest critical bands. This is true especially for the first critical band. To face this limitation we apply in the following a Discrete Wavelet Transform (DWT) to the first subband, thus obtaining two further subbands, each referring to a 345 Hz frequency range. The DWT is performed using a standard first-order biorthogonal filter. The second important perceptual consideration regards empirical results showing that the ear has a limited frequency selectivity, which varies in acuity from less than 100 Hz for the lowest audible frequencies to more than 4 kHz for the highest ones. Thus, not all the audible spectrum is useful for a perceptual analysis, and some frequencies are perceptually more meaningful than others. This observation reflects the resolving power of the ear as a function of frequency, which is usually approximated by Fletcher's curves, representing the ear's sensitivity thresholds with respect to the sound pressure level. Looking at Fletcher's curves, it is evident that frequencies over 17 kHz are not perceptually relevant. So, in our analysis the subbands from 26 to 32 (ranging from about 17.2 kHz to more than 20 kHz) are not taken into account.

Fig. 1: Psychoacoustic analysis and feature extraction data (bitstream unpacking and de-quantization, the frame/granule structure of the subband samples s_i[k], and the data used for psychoacoustic analysis and feature extraction in place of the inverse filterbank).

MMSP 2008, ©2008 IEEE

The last psychoacoustic observation derives once again from Fletcher's curves.
In particular, due to the ear's high sensitivity around 4 kHz, the frequency range from 700 Hz to 7.5 kHz has been emphasized by amplifying the subbands from 2 to 11 by a factor of 3. In this way, after these perceptually based computations, the actual data consist of 25 subband samples. These data will be used for the actual feature extraction.

III. FEATURE EXTRACTION PROCEDURE

The first step of our analysis system is intended to extract features from the audio data, in order to handle more meaningful information and to reduce the subsequent processing. The features used here describe timbre, rhythm, and energy. We used for this experiment MP3 files at 44.1 kHz, 128 kbps, stereo [7]. The stereo channels have been processed separately and the resulting features averaged in order to represent the whole stereo file. First of all, a root mean squared subband granule vector RMSGV_{T_G} is calculated as follows:

RMSGV_{T_G}[i] = \sqrt{ \frac{1}{18} \sum_{k=1}^{18} s_i^2[k] },   (1)

where the index i, i = 1...25, denotes the i-th element of the vector, each related to a subband; s_i[k] is the k-th sample of the i-th subband; T_G stands for Time of Granule and indicates that the calculation refers to T_G ≈ 13 ms of audio signal. In the same way, the i-th element of a root mean squared subband frame vector RMSFV_{T_F} is calculated:

RMSFV_{T_F}[i] = \sqrt{ \frac{1}{36} \sum_{k=1}^{36} s_i^2[k] },   (2)

where i = 1...25; T_F stands for Time of Frame, indicating that the calculation refers to T_F ≈ 26 ms of audio signal.

A. Timbral Features

Timbre is currently defined in the literature as the perceptual feature that makes two sounds with the same pitch and loudness different. Features characterizing timbre can be found in [2]-[5]. These features analyze the spectral distribution of the signal and are global, in the sense that they integrate the information of all sources and instruments at the same time. Most of these descriptors are computed at regular time intervals over short windows of typical length between 10 and 60 ms.
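As an illustration, the preprocessing of Sect. II and the RMS vectors of Eqs. (1)-(2) can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: the Haar-like handling of the DWT split and the truncation to the 25 bands used by the later formulas are our assumptions, and all function names are illustrative.

```python
import numpy as np

def preprocess_granules(subbands):
    """Perceptual preprocessing sketch. subbands: (32, K) array of
    de-quantized Layer III subband samples, K a multiple of 36."""
    sb = subbands.astype(float).copy()
    # Emphasize 700 Hz - 7.5 kHz: amplify subbands 2..11 (1-based) by 3.
    sb[1:11] *= 3.0
    # Discard the perceptually irrelevant subbands 26..32 (above ~17.2 kHz).
    sb = sb[:25]
    # Split subband 1 into two ~345 Hz bands with a Haar-like first-order
    # biorthogonal DWT; coefficients are repeated in time so that every
    # band keeps K samples (a simplification of the paper's step).
    lo = np.repeat((sb[0, 0::2] + sb[0, 1::2]) / np.sqrt(2.0), 2)
    hi = np.repeat((sb[0, 0::2] - sb[0, 1::2]) / np.sqrt(2.0), 2)
    # Keep 25 bands, as used by all subsequent formulas (assumption on
    # how the paper counts the split bands).
    return np.vstack([lo, hi, sb[1:]])[:25]

def rmsgv(bands, g):
    """Eq. (1): RMS granule vector over the 18 samples of granule g."""
    seg = bands[:, 18 * g: 18 * (g + 1)]
    return np.sqrt((seg ** 2).sum(axis=1) / 18.0)

def rmsfv(bands, f):
    """Eq. (2): RMS frame vector over the 36 samples of frame f."""
    seg = bands[:, 36 * f: 36 * (f + 1)]
    return np.sqrt((seg ** 2).sum(axis=1) / 36.0)
```

For one MP3 frame (36 subband samples per band) the two RMS functions return 25-component vectors, one entry per perceptual subband.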
In the context of classification, timbre descriptors are often summarized by evaluating low-order statistics of their distribution over larger windows, commonly called texture windows [1], [2], [5]. Modeling timbre on a higher time scale not only further reduces computation, but is also perceptually more meaningful, since the short signal frames used to evaluate the features are not long enough for human perception. Consequently, the timbral features used in our system are the following.

1) Spectral centroid: the spectral centroid SC is the balancing point of the RMSGV_{T_G} vector and is defined as follows:

SC = \frac{ \sum_{i=1}^{25} i \cdot RMSGV_{T_G}[i] }{ \sum_{i=1}^{25} RMSGV_{T_G}[i] }.   (3)

2) Spectral flux: the spectral flux SF represents the spectral difference between two temporally successive normalized granule vectors NGV_{T_G}. The i-th element, i = 1...25, is defined as:

NGV_{T_G}[i] = \frac{ RMSGV_{T_G}[i] }{ \| RMSGV_{T_G} \| },   (4)

SF = \sum_{i=1}^{25} \left( NGV_{T_G}[i] - NGV_{T_G - 1}[i] \right)^2,   (5)

where the subscript T_G - 1 refers to the previous granule.

3) Spectral roll-off: the spectral roll-off is defined as the number SR of subbands in which 85% of the whole granule energy lies. SR is defined such that:

\sum_{i=1}^{SR} RMSGV_{T_G}[i] \geq 0.85 \sum_{i=1}^{25} RMSGV_{T_G}[i].   (6)
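Under the same conventions, the three timbral descriptors of Eqs. (3)-(6) can be sketched as follows (a NumPy sketch; the Euclidean normalization in Eq. (4) is our assumption, since the paper does not specify the norm):

```python
import numpy as np

def spectral_centroid(v):
    """Eq. (3): balancing point of the RMS granule vector v."""
    i = np.arange(1, len(v) + 1)          # 1-based subband index
    return (i * v).sum() / v.sum()

def spectral_flux(v_cur, v_prev):
    """Eqs. (4)-(5): sum of squared differences between successive
    normalized granule vectors (Euclidean normalization assumed)."""
    n_cur = v_cur / np.linalg.norm(v_cur)
    n_prev = v_prev / np.linalg.norm(v_prev)
    return ((n_cur - n_prev) ** 2).sum()

def spectral_rolloff(v, frac=0.85):
    """Eq. (6): smallest number of subbands holding frac of the energy."""
    return int(np.searchsorted(np.cumsum(v), frac * v.sum()) + 1)
```

For a flat spectrum over 25 subbands, the centroid is 13 (the middle band) and 22 subbands are needed to hold 85% of the energy.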
Once all the previous scalar features have been computed, we consider their mean and variance over a larger window (77 granules) of about 1 s. This larger window is used to capture significant perceptual characteristics of the audio signals.

B. Energy Features

Energy features are the least perceptual ones; they represent physical aspects of audio signals such as the energy content. These features are widely used in speech analysis and music/speech classification systems [1], [2], [4]-[6].

1) RMS: this feature is a measure of the granule's loudness and is defined as:

RMS = \sqrt{ \frac{1}{25} \sum_{i=1}^{25} RMSGV_{T_G}^2[i] }.   (7)

2) Low energy: this feature, denoted as LE, represents the percentage of granules in a 1 s window (77 granules) that have less than the average RMS power.

3) Pseudo-cepstral coefficients: the cepstral coefficients are widely used in speech analysis. Cepstrum coefficients are defined as the Discrete Cosine Transform (DCT) of the log-transformed Fourier coefficients of the signal. The availability of the 25 filterbank spectral coefficients allows us to bypass the Fourier transform step. In this way, 16 cepstral coefficients are obtained as the 16 DCT coefficients of the first 16 subband samples of a frame. Once again, excluding LE, which already refers to a 1 s window, the actual features used for the analysis are the mean and variance of these 16 pseudo-cepstral coefficients and of the RMS, over a larger window (77 granules for the RMS and 38 frames for the cepstral coefficients) of about 1 s. In addition to these features, for each 1 s window the overall sums of the means and variances of the 16 cepstral coefficients are computed.

C. Rhythm Features

Automatic rhythm description may be oriented toward different applications: tempo induction, beat tracking, meter induction, quantization of performed rhythm, or characterization of intentional timing deviations. They all work with PCM or MIDI audio data.
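A sketch of the energy descriptors follows. The DCT-II is applied here to the log of the first 16 subband magnitudes, standing in for the Fourier step as the text suggests; whether the log is retained in the authors' implementation is our assumption, and the names are illustrative.

```python
import numpy as np

def rms_energy(v):
    """Eq. (7): loudness of a granule from its RMS granule vector v."""
    return np.sqrt((v ** 2).sum() / len(v))

def low_energy(window_rms):
    """LE: fraction of granules in a ~1 s window whose RMS lies below
    the window average."""
    r = np.asarray(window_rms, dtype=float)
    return float((r < r.mean()).mean())

def pseudo_cepstrum(frame_vector, n=16):
    """16 pseudo-cepstral coefficients: DCT-II of the log of the first
    16 subband values of a frame (small offset avoids log(0))."""
    logspec = np.log(np.asarray(frame_vector[:n], dtype=float) + 1e-12)
    k = np.arange(n)
    basis = np.cos(np.pi * (k[None, :] + 0.5) * k[:, None] / n)
    return basis @ logspec
```

The DCT is written out explicitly so the sketch needs only NumPy; any standard DCT-II routine would do the same job.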
A typical PCM automatic beat detector consists of a filterbank decomposition, followed by an envelope extraction step, and finally a periodicity detection algorithm that is used to detect the lag at which the signal's envelope is most similar to itself [1]. In our case, the 25 filterbank coefficients of the MPEG encoder/decoder make possible the direct extraction of the envelope of the signal. The feature set for representing the rhythm structure is based on detecting the most salient periodicity of the signal. Fig. 2 shows the flow diagram of the proposed beat analysis algorithm.

1) Envelope extraction and autocorrelation: using the 25 filterbank coefficients, the time domain amplitude envelope of the i-th subband, i = 1...25, is extracted separately. This is achieved by the following processing steps.

Full Wave Rectification (FWR): x_i[k] = |s_i[k]|. This is applied in order to extract the temporal envelope of the signal rather than the time domain signal itself.

Low-Pass Filtering (LPF): y_i[k] = (1 - α) x_i[k] + α y_i[k-1]. A one-pole filter with α = 0.99 is used to smooth the envelope. Full wave rectification followed by low-pass filtering is a standard envelope extraction technique.

Mean Removal (MR): f_i[k] = y_i[k] - E{y_i[k]}, where E{·} denotes the expected value. The MR is applied in order to center the signal around zero for the autocorrelation stage.

After mean removal, the envelopes of all bands are summed together: S[k] = \sum_{i=1}^{25} f_i[k]. The autocorrelation of the resulting sum envelope is then computed:

R[h] = \frac{1}{L} \sum_{m=1}^{L} S[m] S[m+h].

As specified in the following, L is the length of a 1 s sequence. The autocorrelation function is further manipulated in order to reduce the effect of integer multiples of the basic periodicities.

Fig. 2: Flow diagram of the beat histogram calculation (per-subband envelope extraction by FWR, LPF and MR, envelope summation, autocorrelation, peak picking, and beat histogram accumulation).
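The envelope extraction chain (FWR, LPF, MR) and the normalized autocorrelation can be sketched as follows (function names are ours):

```python
import numpy as np

def band_envelope(s, alpha=0.99):
    """FWR -> one-pole LPF -> mean removal, for one subband sequence."""
    x = np.abs(np.asarray(s, dtype=float))   # full wave rectification
    y = np.empty_like(x)
    y[0] = (1.0 - alpha) * x[0]
    for k in range(1, len(x)):               # y[k] = (1-a)x[k] + a*y[k-1]
        y[k] = (1.0 - alpha) * x[k] + alpha * y[k - 1]
    return y - y.mean()                      # mean removal

def envelope_autocorr(bands):
    """Sum of the band envelopes and its autocorrelation
    R[h] = (1/L) * sum_m S[m] S[m+h], over one analysis window."""
    S = np.sum([band_envelope(b) for b in bands], axis=0)
    L = len(S)
    return np.array([np.dot(S[:L - h], S[h:]) / L for h in range(L)])
```

Since the envelopes are zero-mean, the zero-lag term R[0] dominates and the later peaks expose the periodicities of the summed envelope.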
The original autocorrelation R[h] is clipped to positive values, downsampled by a factor of 2, and subtracted from the original clipped function. The same process is repeated with other integer factors (i.e., 3, 4, and 5); in this way the repetitive peaks at integer multiples are removed. The dominant peaks of the resulting autocorrelation function correspond to the various periodicities of the signal envelope. This analysis refers to a window of 231 granules, which at a 44.1 kHz sampling rate corresponds to approximately 3 s, with an overlap of 154 granules corresponding to about 2 s. This larger window is necessary to capture the signal repetitions at the beat and sub-beat levels. In this way, the above-mentioned autocorrelation is computed for every 1 s of signal.

2) Peak detection and beat histogram computation: the first three peaks of each autocorrelation function (one for every 1 s of signal) are selected and accumulated over the whole sound file into a Beat Histogram (BH), where each bin corresponds to
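Following the paper's description literally, the pruning of peaks at integer multiples can be sketched as below; the exact alignment of the downsampled copy against the original lags is our assumption.

```python
import numpy as np

def prune_multiples(R, factors=(2, 3, 4, 5)):
    """Clip the autocorrelation to positive values, then, for each
    integer factor, subtract the downsampled copy from the clipped
    function, attenuating the repetitive peaks at integer multiples."""
    out = np.clip(np.asarray(R, dtype=float), 0.0, None)
    for f in factors:
        down = out[::f].copy()               # downsample by factor f
        out[:len(down)] -= down              # subtract where defined
        out = np.clip(out, 0.0, None)        # keep the result positive
    return out
```

The `.copy()` matters: subtracting a strided view of `out` from `out` itself in place would alias the two buffers.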
the peak lag, i.e., the beat period in beats-per-minute (bpm). The lags of the histogram correspond to a range that spans from 40 to 200 bpm (corresponding to periodicities from 1.5 s to 0.3 s, respectively). For each peak of the autocorrelation functions, the peak amplitude is added to the histogram. In this way, peaks having high amplitude (where the signal is highly similar to itself) are weighted more strongly in the histogram computation than weaker peaks. So, when the signal is very similar to itself (strong beat), the histogram peaks will be higher.

3) Beat histogram features: the BH representation captures detailed information about the rhythmic content of the piece that can be used to effectively guess the musical genre of a song. Starting from this observation, a set of features based on the BH is calculated in order to represent the rhythmic content. These features are: P_1, P_2, P_3: periods of the first, second and third peak (in bpm); S: overall sum of the histogram (an indication of beat strength); R_1, R_2, R_3: first, second and third histogram peak divided by S (relative amplitudes); Pr_2: ratio of the period (in bpm) of the second peak to the period (in bpm) of the first peak; Pr_3: ratio of the period (in bpm) of the third peak to the period (in bpm) of the first peak.

D. Feature Vector

The feature calculation described so far allows us to represent each piece of music as a 52-dimensional vector. It is important to notice that the first nine (rhythm) features are global features, in the sense that they are computed over the whole sound file. On the contrary, the other ones are computed for every 1 s of music. So, the actual feature vector used to represent an audio file is obtained by the union of the 9 (global) rhythm features and the means of the other 43 (local) features over the whole duration of the audio clip.

IV. CLASSIFICATION MODEL

The feature vectors obtained by the extraction procedure described in Sect. III feed a classification system.
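The beat-histogram feature set of Sect. III-C can be sketched as follows, given the accumulated histogram (bin tempi in bpm and accumulated peak amplitudes; names are ours):

```python
import numpy as np

def bh_features(bpm, amp):
    """P1-P3: tempi of the three strongest bins; S: histogram sum;
    R1-R3: relative amplitudes; Pr2, Pr3: period ratios to P1."""
    bpm = np.asarray(bpm, dtype=float)
    amp = np.asarray(amp, dtype=float)
    top = np.argsort(amp)[::-1][:3]          # three strongest bins
    p1, p2, p3 = bpm[top]
    s = amp.sum()
    r1, r2, r3 = amp[top] / s
    return {"P1": p1, "P2": p2, "P3": p3, "S": s,
            "R1": r1, "R2": r2, "R3": r3,
            "Pr2": p2 / p1, "Pr3": p3 / p1}
```

For a histogram dominated by a 120 bpm bin with a secondary peak at 180 bpm, Pr2 comes out as 1.5, the half-time/double-time relation the ratios are meant to expose.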
The choice of the classification system is independent of the genre classification task; in this paper we adopt the <Min-Max, PARC> classification system [8], whose performance has been ascertained in many real-world problems with respect to well-known classification benchmarks. The classification model is a classical fuzzy Min-Max neural network. The classification strategy consists of covering the patterns of the training set with hyperboxes, whose boundary hyperplanes are parallel to the main reference system. The hyperbox can be considered as a crisp frame on which different types of membership functions can be adapted. The neuro-fuzzy classification model is trained by the Pruning Adaptive Resolution Classifier (PARC) algorithm. It is a constructive procedure able to generate a suitable succession of Min-Max classifiers, which are characterized by a decreasing complexity (number of neurons in the hidden layer). The regularized network is automatically selected according to learning theory, following the Ockham's Razor criterion. For this reason, PARC is characterized by a good generalization capability as well as a high degree of automation. Consequently, it is particularly suited to be used as the core modeling system of a wrapper-like feature optimization procedure, such as the one described in the following Section.

Fig. 3: Proposed classification tree (Node 1: classical (Class 1) vs. other; Node 2: electronic (Class 2) vs. other; Node 3: pop (Class 3) vs. world (Class 4)).

The implemented system consists of a classification tree where each node is a Min-Max classifier. Each node is trained to discriminate one genre from the remaining ones. As illustrated in Fig. 3, the overall system discriminates between the four said genres (classical, electronic, pop and world) in a sequential manner. The first node, classical/other, decides whether the audio clip belongs to classical or not (other).
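The sequential decision of Fig. 3 reduces to a short cascade. In this sketch each trained Min-Max node is modeled as a callable returning True when its target genre is detected; the interface is hypothetical.

```python
def classify_tree(x, is_classical, is_electronic, is_pop):
    """Cascade of Fig. 3: classical/other, then electronic/other,
    then pop/world. Each argument stands for a trained binary node."""
    if is_classical(x):
        return "classical"
    if is_electronic(x):
        return "electronic"
    return "pop" if is_pop(x) else "world"
```

A feature vector rejected by the first two nodes falls through to the final pop/world decision, mirroring the relabeling of patterns as "other" at each stage.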
All the patterns classified as other feed the second node, electronic/other, of the tree. Again, all the patterns classified as other feed the third node, pop/world, of the tree.

V. AUTOMATIC FEATURE SELECTION

In order to evaluate the effectiveness of the proposed features, an optimization strategy has been implemented. The technique is based on a genetic algorithm that selects the optimal subset of features for the assigned task. Consequently, this reduces the input space dimension and improves the classification accuracy. Genetic algorithms are designed to manage a population of individuals, i.e., a set of potential solutions of the optimization problem at hand. Each individual is uniquely represented by a genetic code, which is typically a string of binary digits. The fitness of a particular individual coincides with the corresponding value assumed by the objective function to be optimized. In our application, the adopted fitness function is a convex linear combination of two terms:

F_j = (1 - λ) E_j + λ C_j,

where E_j is a performance measure on the training set S_tr and C_j is a complexity measure of the j-th classifier; λ ∈ [0, 1] is a meta-parameter by which the user can control the training procedure, taking into account that small values of λ yield more complex models and consequently higher accuracy on S_tr (the opposite situation is characterized by large values of λ). For a feature selection problem, the genetic code represents a subset of the original feature set; in our application, each
genetic code is made of 52 binary digits, where the n-th digit, n = 1...52, is equal to one if the corresponding feature feeds the classification system, and zero otherwise. The evolution starts from a population of P randomly selected individuals. At the k-th generation G_k, with k = 0,...,M_gen, the next generation G_{k+1} is determined by applying standard selection, mutation and crossover operators. The behavior of the whole algorithm depends on the values of P and M_gen, as well as on the mutation rate MR and on the crossover rate CR, which are two probability thresholds that control the related operators. The convergence of the genetic algorithm is assured by using elitism, i.e., by copying the best individual into the next generation.

VI. PERFORMANCE EVALUATION

The system's performance has been evaluated using different Web repositories (in particular all-music.com and mp3.com) and different classification procedures (GMM, k-NN, SVM, Min-Max). In spite of the several tests we carried out, for the sake of synthesis and objectivity we show in the following the results obtained using the ISMIR2004 genre classification contest [9]. The results in that contest were obtained working with PCM audio data and using a six-genre taxonomy: classical, electronic, jazz_blues, metal_punk, rock_pop and world. On the basis of the considerations made in Sect. I, we use a four-genre taxonomy: classical, electronic, pop (including the jazz_blues, metal_punk and rock_pop sub-genres), and world.

TABLE I: Data Composition
Genre      | S_tr^(n) | S_ts^(n) | S_ts
classical  |          |          |
electronic |          |          |
pop        |          |          |
world      |          |          |
Total      |   726    |   725    |  720
(the per-genre entries were not recoverable from this transcription; the totals follow from the text of Sect. VI)

It is important to notice that the classification systems of ISMIR2004 were tested on a Test_Tracks collection (made up of 700 tracks) that was supplied during the contest.
This collection is not available, so our system has been trained with the Training_Tracks collection (like all the systems of the contest participants) and tested using the tracks of the Development_Tracks collection (supplied for the development of the participants' classification systems, not for the test). Starting from this observation, it is evident that our system has been trained using a quantity of data smaller than the one used by the contest participants. The data set used in our experiment is available for download at the ISMIR2004 website; it is distributed and copyrighted by Magnatune.com and is made up of two different collections. The first collection, called Training_Tracks, consists of 729 MP3, 128 kbps stereo files sampled at 44.1 kHz, 16 bit, divided into 6 genres: classical, electronic, jazz_blues, metal_punk, rock_pop and world. The files of this collection have been used for training each Min-Max neural network that makes up the decision tree. The second collection, called Development_Tracks, consists of 729 MP3, 128 kbps stereo files sampled at 44.1 kHz, 16 bit. The files of this collection are used for the validation of each Min-Max neural network that makes up the decision tree and for the test of the overall system.

A. Training

In order to train each node, every clip of the Training_Tracks collection has been represented as a unique mean feature vector referred to the 40 s at its center. For practical needs, the tracks with a duration of less than 10 s have been deleted. In this way, the node training set S_tr^(n), shown in the first column of Table I, consists of 726 segments of 40 s, for a total duration of about 8 hours of music. An intermediate performance evaluation has been performed in order to check the reliability of each node of the classification tree. In order to test each node, every clip of the Development_Tracks collection has been represented as a unique mean feature vector referred to the 40 s at its center. Once again, the tracks with a duration of less than 10 s have been deleted. In this way, the node test set S_ts^(n), indicated in the second column of Table I, consists of 725 segments of 40 s, for a total duration of about 8 hours of music.

The first node, classical/other, has been trained using all the four node training sets, where the patterns belonging to the electronic, pop and world classes have been relabeled as other. Thus, the whole training set of the first node is made up of all the 726 tracks. The node has been tested with all the four node test sets, made up of all the 725 tracks and labeled as for the training set. The second node, electronic/other, has been trained using the three node training sets related to the electronic, pop and world classes. Patterns belonging to the pop and world classes have been relabeled as other. Consequently, the whole training set of the second node is made up of 409 files. The test set of this node is built in the same way, using the corresponding node test sets, and is made up of 408 files. Finally, for the third node, pop/world, only the pop and world node training sets are used. Thus, the whole training set consists of 294 files, while the test set, built in the same way, consists of 295 files.

In order to improve the classification results and to test the effectiveness of the proposed feature set, the genetic algorithm and feature selection technique described in Sect. V was applied to the available data. In particular, for each node of the decision tree a genetic algorithm with P = 100, M_gen = 100, MR = 0.3, and CR = 1 was run. The genetic codes that minimize the classification error, according to the criterion proposed in the previous Section, are used for the implementation of an optimized classification tree. We obtained the node performances shown in Table II; the actual number of used features N_f is also reported.
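The feature-selection loop of Sect. V (52-bit genetic code, one-point crossover, per-gene mutation, elitism) can be sketched as follows. The fitness combines an error term supplied by the wrapped classifier with the fraction of selected features as the complexity term C_j (our choice); the small population and rate values are illustrative, not the paper's P = 100, M_gen = 100 setting.

```python
import random

def fitness(code, error_fn, lam=0.3):
    """F = (1 - lam) * E + lam * C, with C taken as the fraction of
    selected features (one possible complexity measure)."""
    c = sum(code) / len(code)
    return (1.0 - lam) * error_fn(code) + lam * c

def select_features(error_fn, n_feat=52, pop=20, gens=30,
                    mr=0.05, cr=0.9, seed=0):
    """Return the best 0/1 feature mask found by a simple GA."""
    rng = random.Random(seed)
    P = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop)]
    best = min(P, key=lambda c: fitness(c, error_fn))
    for _ in range(gens):
        ranked = sorted(P, key=lambda c: fitness(c, error_fn))
        best = min([best, ranked[0]], key=lambda c: fitness(c, error_fn))
        nxt = [best[:]]                        # elitism keeps the best
        while len(nxt) < pop:
            a, b = rng.sample(ranked[:pop // 2], 2)  # truncation selection
            if rng.random() < cr:              # one-point crossover
                cut = rng.randrange(1, n_feat)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            child = [g ^ (rng.random() < mr) for g in child]  # mutation
            nxt.append(child)
        P = nxt
    return best
```

In a real wrapper, `error_fn` would train a <Min-Max, PARC> classifier on the masked features and return its training-set error; below it is stubbed by the feature fraction itself, so the GA should prune features.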
B. Test of the Whole Classification Tree

The overall performance of the implemented decision tree was tested on the Development_Tracks collection. Tracks with a duration of less than 10 s or greater than 10 min have been deleted. In this way, the test set S_ts, illustrated in the third column of Table I, consists of 720 tracks for a total duration of about 47 hours of music. In particular, the classification is made in the following way: each track is split into consecutive segments of 40 s duration; a unique mean feature vector (with 52 components) is calculated for each segment; each segment flows through the decision tree and is assigned a particular class; each track is classified on the basis of the most frequent class assigned to its segments; if two or more classes have the same frequency among the segments of the track, the track is classified as indeterminate.

TABLE II: Accuracy and Number of Selected Features N_f
Node   | Accuracy on S_tr^(n) | Accuracy on S_ts^(n) | N_f
Node 1 |                      | 88.97%               | 24
Node 2 |                      | 80.88%               | 26
Node 3 | 100%                 | 83.73%               | 31
(the training accuracies of Nodes 1 and 2 were not recoverable from this transcription)

The performances of the optimized classification tree, using S_ts, are reported in Table III. These results show a classification accuracy of 71.12% (512 correctly classified tracks) and an indetermination of 4.86% (35 tracks).

C. Performance Comparison

To make a comparison possible, our results have been compared with the best and worst ISMIR2004 contest results. The best result was achieved in [10], the worst in [11]. Both systems work with PCM audio data and use Mel Frequency Cepstral Coefficients with GMM and clustering techniques. These results show an 86.14% accuracy for the best and a 67.9% accuracy for the worst. Our system places in the middle with a 71.12% accuracy. In all cases, the best discriminated class is classical; this result can be justified by two different kinds of observations: the relevant presence of patterns belonging to this class in the training set, and the fact that it is an intrinsically well-defined musical genre from the perceptual point of view.
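The per-track voting rule of Sect. VI-B can be sketched as:

```python
from collections import Counter

def classify_track(segment_labels):
    """A track takes the most frequent label among its 40 s segments;
    a tie for the top frequency yields 'indeterminate'."""
    ranked = Counter(segment_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "indeterminate"
    return ranked[0][0]
```

This majority vote over segments is what produces the 4.86% of indeterminate tracks reported above.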
On the other hand, the worst classified genre is world, and a considerable overlap with classical is evident. This result can be physically justified by a real similarity in terms of timbre and rhythm structure of the tracks belonging to these two classes. As concerns pop and electronic, even in this case the results show a noticeable overlap between the classes; this can be ascribed to a significant track similarity (this is especially true for the files belonging to electronic and to the two sub-genres rock and pop belonging to the pop class).

TABLE III: Confusion Matrix and Relative Accuracy
(a) Confusion Matrix: rows are the target classes (classical, electronic, pop, world) and columns the output classes. (b) Relative Accuracy: for each genre, the total number of patterns, the number of correctly classified patterns, and the resulting accuracy. (The numeric entries of both sub-tables were not recoverable from this transcription.)

VII. CONCLUSION

In conclusion it can be said that, despite the fuzzy nature of genre boundaries, musical genre classification can be performed automatically and directly in the compressed domain, with results and performance comparable to PCM genre classification. The classification system proposed in this paper, along with the psychoacoustic-based preprocessing of MP3 files, achieves a good accuracy. Moreover, this is obtained automatically, reducing to a minimum the intervention of human experts, as commonly required by Web and other multimedia applications.

REFERENCES
[1] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, July.
[2] G. Tzanetakis and P. Cook, "Sound analysis using MPEG compressed audio," in Proceedings of the 2000 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2000), vol. 2, Istanbul, Turkey, June 2000.
[3] R. Jarina, N. Murphy, N. O'Connor, and S. Marlow, "An experiment in audio classification from compressed data," in International Workshop on Systems, Signals and Image Processing, Poznan, Poland, September.
[4] S. Kiranyaz, M. Aubazac, and M.
Gabbouj, "Unsupervised segmentation and classification over MP3 and AAC audio bitstreams," in Proceedings of the European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003), London, UK, April 2003.
[5] A. Rizzi, N. M. Buccino, M. Panella, and A. Uncini, "Optimal short-time features for music/speech classification of compressed audio data," in Proceedings of the IEEE International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA 2006), Sydney, Australia, November 2006.
[6] D. Pye, "Content-based methods for managing electronic music," in Proceedings of the 2000 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2000), vol. 4, Istanbul, Turkey, June 2000.
[7] Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, Part 3: Audio, ISO/IEC International Standard, Information Technology Std.
[8] A. Rizzi, M. Panella, and F. M. F. Mascioli, "Adaptive resolution min-max classifiers," IEEE Transactions on Neural Networks, vol. 13, no. 2, March.
[9] Available at:
[10] E. Pampalk, "A Matlab toolbox to compute music similarity from audio," in Proceedings of the International Symposium on Music Information Retrieval (ISMIR 2004), Barcelona, Spain, October 2004.
[11] D. Ellis and B. Whitman, "Automatic record reviews," in Proceedings of the International Symposium on Music Information Retrieval (ISMIR 2004), Barcelona, Spain, October 2004.
More informationMPEG-1 Bitstreams Processing for Audio Content Analysis
ISSC, Cork. June 5- MPEG- Bitstreams Processing for Audio Content Analysis Roman Jarina, Orla Duffner, Seán Marlow, Noel O Connor, and Noel Murphy Visual Media Processing Group Dublin City University Glasnevin,
More informationMACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014
MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION Steve Tjoa kiemyang@gmail.com June 25, 2014 Review from Day 2 Supervised vs. Unsupervised Unsupervised - clustering Supervised binary classifiers (2 classes)
More information19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007
19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 SUBJECTIVE AND OBJECTIVE QUALITY EVALUATION FOR AUDIO WATERMARKING BASED ON SINUSOIDAL AMPLITUDE MODULATION PACS: 43.10.Pr, 43.60.Ek
More informationMpeg 1 layer 3 (mp3) general overview
Mpeg 1 layer 3 (mp3) general overview 1 Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction,
More informationPerceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding
Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.
More informationChapter 14 MPEG Audio Compression
Chapter 14 MPEG Audio Compression 14.1 Psychoacoustics 14.2 MPEG Audio 14.3 Other Commercial Audio Codecs 14.4 The Future: MPEG-7 and MPEG-21 14.5 Further Exploration 1 Li & Drew c Prentice Hall 2003 14.1
More informationSpeech-Music Discrimination from MPEG-1 Bitstream
Speech-Music Discrimination from MPEG-1 Bitstream ROMAN JARINA, NOEL MURPHY, NOEL O CONNOR, SEÁN MARLOW Centre for Digital Video Processing / RINCE Dublin City University, Dublin 9 IRELAND jarinar@eeng.dcu.ie
More informationAdaptive Resolution Min-Max Classifiers
402 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 2, MARCH 2002 Adaptive Resolution Min-Max Classifiers Antonello Rizzi, Massimo Panella, and Fabio Massimo Frattale Mascioli Abstract A high automation
More information5: Music Compression. Music Coding. Mark Handley
5: Music Compression Mark Handley Music Coding LPC-based codecs model the sound source to achieve good compression. Works well for voice. Terrible for music. What if you can t model the source? Model the
More informationMultimedia Database Systems. Retrieval by Content
Multimedia Database Systems Retrieval by Content MIR Motivation Large volumes of data world-wide are not only based on text: Satellite images (oil spill), deep space images (NASA) Medical images (X-rays,
More informationAppendix 4. Audio coding algorithms
Appendix 4. Audio coding algorithms 1 Introduction The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Typically
More informationOptical Storage Technology. MPEG Data Compression
Optical Storage Technology MPEG Data Compression MPEG-1 1 Audio Standard Moving Pictures Expert Group (MPEG) was formed in 1988 to devise compression techniques for audio and video. It first devised the
More informationAUDIO information often plays an essential role in understanding
1062 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval Serkan Kiranyaz,
More informationA Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval
A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval 1 A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval Serkan Kiranyaz,
More informationAutomatic Classification of Audio Data
Automatic Classification of Audio Data Carlos H. C. Lopes, Jaime D. Valle Jr. & Alessandro L. Koerich IEEE International Conference on Systems, Man and Cybernetics The Hague, The Netherlands October 2004
More informationAudio Compression. Audio Compression. Absolute Threshold. CD quality audio:
Audio Compression Audio Compression CD quality audio: Sampling rate = 44 KHz, Quantization = 16 bits/sample Bit-rate = ~700 Kb/s (1.41 Mb/s if 2 channel stereo) Telephone-quality speech Sampling rate =
More informationLecture 16 Perceptual Audio Coding
EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today s lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ Hero
More informationCHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri
1 CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING Alexander Wankhammer Peter Sciri introduction./the idea > overview What is musical structure?
More informationAUDIOVISUAL COMMUNICATION
AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 213 http://acousticalsociety.org/ ICA 213 Montreal Montreal, Canada 2-7 June 213 Engineering Acoustics Session 2pEAb: Controlling Sound Quality 2pEAb1. Subjective
More informationModule 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur
Module 9 AUDIO CODING Lesson 29 Transform and Filter banks Instructional Objectives At the end of this lesson, the students should be able to: 1. Define the three layers of MPEG-1 audio coding. 2. Define
More informationWorkshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards
Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Extracting meaning
More informationA Robust Audio Fingerprinting Algorithm in MP3 Compressed Domain
A Robust Audio Fingerprinting Algorithm in MP3 Compressed Domain Ruili Zhou, Yuesheng Zhu Abstract In this paper, a new robust audio fingerprinting algorithm in MP3 compressed domain is proposed with high
More informationSPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL
SPREAD SPECTRUM WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL 1 Yüksel Tokur 2 Ergun Erçelebi e-mail: tokur@gantep.edu.tr e-mail: ercelebi@gantep.edu.tr 1 Gaziantep University, MYO, 27310, Gaziantep,
More information2.4 Audio Compression
2.4 Audio Compression 2.4.1 Pulse Code Modulation Audio signals are analog waves. The acoustic perception is determined by the frequency (pitch) and the amplitude (loudness). For storage, processing and
More informationSpeech and audio coding
Institut Mines-Telecom Speech and audio coding Marco Cagnazzo, cagnazzo@telecom-paristech.fr MN910 Advanced compression Outline Introduction Introduction Speech signal Music signal Masking Codeurs simples
More informationRepeating Segment Detection in Songs using Audio Fingerprint Matching
Repeating Segment Detection in Songs using Audio Fingerprint Matching Regunathan Radhakrishnan and Wenyu Jiang Dolby Laboratories Inc, San Francisco, USA E-mail: regu.r@dolby.com Institute for Infocomm
More informationAUDIOVISUAL COMMUNICATION
AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis
More informationMultimedia Communications. Audio coding
Multimedia Communications Audio coding Introduction Lossy compression schemes can be based on source model (e.g., speech compression) or user model (audio coding) Unlike speech, audio signals can be generated
More informationELL 788 Computational Perception & Cognition July November 2015
ELL 788 Computational Perception & Cognition July November 2015 Module 11 Audio Engineering: Perceptual coding Coding and decoding Signal (analog) Encoder Code (Digital) Code (Digital) Decoder Signal (analog)
More informationModeling of an MPEG Audio Layer-3 Encoder in Ptolemy
Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy Patrick Brown EE382C Embedded Software Systems May 10, 2000 $EVWUDFW MPEG Audio Layer-3 is a standard for the compression of high-quality digital audio.
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 13 Audio Signal Processing 14/04/01 http://www.ee.unlv.edu/~b1morris/ee482/
More informationPerformance Analysis of Discrete Wavelet Transform based Audio Watermarking on Indian Classical Songs
Volume 73 No.6, July 2013 Performance Analysis of Discrete Wavelet Transform based Audio ing on Indian Classical Songs C. M. Juli Janardhanan Department of ECE Government Engineering College, Wayanad Mananthavady,
More informationSpeech Modulation for Image Watermarking
Speech Modulation for Image Watermarking Mourad Talbi 1, Ben Fatima Sira 2 1 Center of Researches and Technologies of Energy, Tunisia 2 Engineering School of Tunis, Tunisia Abstract Embedding a hidden
More informationAvailable online Journal of Scientific and Engineering Research, 2016, 3(4): Research Article
Available online www.jsaer.com, 2016, 3(4):417-422 Research Article ISSN: 2394-2630 CODEN(USA): JSERBR Automatic Indexing of Multimedia Documents by Neural Networks Dabbabi Turkia 1, Lamia Bouafif 2, Ellouze
More informationQUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose
QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,
More informationAudio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011
Audio Fundamentals, Compression Techniques & Standards Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011 Outlines Audio Fundamentals Sampling, digitization, quantization μ-law
More informationFigure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.
Introduction to Digital Audio Compression B. Cavagnolo and J. Bier Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, CA 94704 (510) 665-1600 info@bdti.com http://www.bdti.com INTRODUCTION
More informationAudio Segmentation and Classification. Abdillahi Hussein Omar
Audio Segmentation and Classification Abdillahi Hussein Omar Kgs. Lyngby 2005 Preface The work presented in this thesis has been carried out at the Intelligent Signal Processing Group, at the Institute
More informationSimple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay
ACOUSTICAL LETTER Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay Kazuhiro Kondo and Kiyoshi Nakagawa Graduate School of Science and Engineering, Yamagata University,
More informationWolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 6 Audio Retrieval 6 Audio Retrieval 6.1 Basics of
More informationOvercompressing JPEG images with Evolution Algorithms
Author manuscript, published in "EvoIASP2007, Valencia : Spain (2007)" Overcompressing JPEG images with Evolution Algorithms Jacques Lévy Véhel 1, Franklin Mendivil 2 and Evelyne Lutton 1 1 Inria, Complex
More informationCHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION
CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION In chapter 4, SVD based watermarking schemes are proposed which met the requirement of imperceptibility, having high payload and
More informationCompressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.
Compressed Audio Demystified Why Music Producers Need to Care About Compressed Audio Files Download Sales Up CD Sales Down High-Definition hasn t caught on yet Consumers don t seem to care about high fidelity
More informationAdaptive Quantization for Video Compression in Frequency Domain
Adaptive Quantization for Video Compression in Frequency Domain *Aree A. Mohammed and **Alan A. Abdulla * Computer Science Department ** Mathematic Department University of Sulaimani P.O.Box: 334 Sulaimani
More informationData Hiding in Video
Data Hiding in Video J. J. Chae and B. S. Manjunath Department of Electrical and Computer Engineering University of California, Santa Barbara, CA 9316-956 Email: chaejj, manj@iplab.ece.ucsb.edu Abstract
More informationBit or Noise Allocation
ISO 11172-3:1993 ANNEXES C & D 3-ANNEX C (informative) THE ENCODING PROCESS 3-C.1 Encoder 3-C.1.1 Overview For each of the Layers, an example of one suitable encoder with the corresponding flow-diagram
More informationSpectral modeling of musical sounds
Spectral modeling of musical sounds Xavier Serra Audiovisual Institute, Pompeu Fabra University http://www.iua.upf.es xserra@iua.upf.es 1. Introduction Spectral based analysis/synthesis techniques offer
More informationCompression of Stereo Images using a Huffman-Zip Scheme
Compression of Stereo Images using a Huffman-Zip Scheme John Hamann, Vickey Yeh Department of Electrical Engineering, Stanford University Stanford, CA 94304 jhamann@stanford.edu, vickey@stanford.edu Abstract
More informationCepstral Analysis Tools for Percussive Timbre Identification
Cepstral Analysis Tools for Percussive Timbre Identification William Brent Department of Music and Center for Research in Computing and the Arts University of California, San Diego wbrent@ucsd.edu ABSTRACT
More informationMPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings
MPEG-1 Overview of MPEG-1 1 Standard Introduction to perceptual and entropy codings Contents History Psychoacoustics and perceptual coding Entropy coding MPEG-1 Layer I/II Layer III (MP3) Comparison and
More informationFundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06
Fundamentals of Perceptual Audio Encoding Craig Lewiston HST.723 Lab II 3/23/06 Goals of Lab Introduction to fundamental principles of digital audio & perceptual audio encoding Learn the basics of psychoacoustic
More informationMPEG-7 Audio: Tools for Semantic Audio Description and Processing
MPEG-7 Audio: Tools for Semantic Audio Description and Processing Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Why semantic description
More informationCHAPTER 3. Preprocessing and Feature Extraction. Techniques
CHAPTER 3 Preprocessing and Feature Extraction Techniques CHAPTER 3 Preprocessing and Feature Extraction Techniques 3.1 Need for Preprocessing and Feature Extraction schemes for Pattern Recognition and
More informationAuthentication and Secret Message Transmission Technique Using Discrete Fourier Transformation
, 2009, 5, 363-370 doi:10.4236/ijcns.2009.25040 Published Online August 2009 (http://www.scirp.org/journal/ijcns/). Authentication and Secret Message Transmission Technique Using Discrete Fourier Transformation
More informationDUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING
DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING Christopher Burges, Daniel Plastina, John Platt, Erin Renshaw, and Henrique Malvar March 24 Technical Report MSR-TR-24-19 Audio fingerprinting
More informationScalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC
Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC Ralf Geiger 1, Gerald Schuller 1, Jürgen Herre 2, Ralph Sperschneider 2, Thomas Sporer 1 1 Fraunhofer IIS AEMT, Ilmenau, Germany 2 Fraunhofer
More informationAudio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model
Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model 1 M. Chinna Rao M.Tech,(Ph.D) Research scholar, JNTUK,kakinada chinnarao.mortha@gmail.com 2 Dr. A.V.S.N. Murthy Professor of Mathematics,
More informationWavelet filter bank based wide-band audio coder
Wavelet filter bank based wide-band audio coder J. Nováček Czech Technical University, Faculty of Electrical Engineering, Technicka 2, 16627 Prague, Czech Republic novacj1@fel.cvut.cz 3317 New system for
More informationQueST: Querying Music Databases by Acoustic and Textual Features
QueST: Querying Music Databases by Acoustic and Textual Features Bin Cui 1 Ling Liu 2 Calton Pu 2 Jialie Shen 3 Kian-Lee Tan 4 1 Department of Computer Science & National Lab on Machine Perception, Peking
More informationAn investigation of non-uniform bandwidths auditory filterbank in audio coding
PAGE 360 An investigation of non-uniform bandwidths auditory filterbank in audio coding Andrew Lin, Stevan Berber, Waleed Abdulla Department of Electrical and Computer Engineering University of Auckland,
More informationCS229 Final Project: Audio Query By Gesture
CS229 Final Project: Audio Query By Gesture by Steinunn Arnardottir, Luke Dahl and Juhan Nam {steinunn,lukedahl,juhan}@ccrma.stanford.edu December 2, 28 Introduction In the field of Music Information Retrieval
More informationIntroducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd
Introducing Audio Signal Processing & Audio Coding Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd Overview Audio Signal Processing Applications @ Dolby Audio Signal Processing Basics
More informationAudio and video compression
Audio and video compression 4.1 introduction Unlike text and images, both audio and most video signals are continuously varying analog signals. Compression algorithms associated with digitized audio and
More informationNew Results in Low Bit Rate Speech Coding and Bandwidth Extension
Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without
More informationA GENERIC SYSTEM FOR AUDIO INDEXING: APPLICATION TO SPEECH/ MUSIC SEGMENTATION AND MUSIC GENRE RECOGNITION
A GENERIC SYSTEM FOR AUDIO INDEXING: APPLICATION TO SPEECH/ MUSIC SEGMENTATION AND MUSIC GENRE RECOGNITION Geoffroy Peeters IRCAM - Sound Analysis/Synthesis Team, CNRS - STMS Paris, France peeters@ircam.fr
More informationMusic Genre Classification
Music Genre Classification Matthew Creme, Charles Burlin, Raphael Lenain Stanford University December 15, 2016 Abstract What exactly is it that makes us, humans, able to tell apart two songs of different
More informationIntroducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd
Introducing Audio Signal Processing & Audio Coding Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd Introducing Audio Signal Processing & Audio Coding 2013 Dolby Laboratories,
More informationWhat is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?
Multimedia What is multimedia? Media types +Text + Graphics + Audio +Image +Video Interchange formats What is multimedia? Multimedia = many media User interaction = interactivity Script = time 1 2 Most
More informationParametric Coding of High-Quality Audio
Parametric Coding of High-Quality Audio Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau Technical University Ilmenau, Germany 1 Waveform vs Parametric Waveform Filter-bank approach Mainly exploits
More informationVoice Command Based Computer Application Control Using MFCC
Voice Command Based Computer Application Control Using MFCC Abinayaa B., Arun D., Darshini B., Nataraj C Department of Embedded Systems Technologies, Sri Ramakrishna College of Engineering, Coimbatore,
More informationCompression of RADARSAT Data with Block Adaptive Wavelets Abstract: 1. Introduction
Compression of RADARSAT Data with Block Adaptive Wavelets Ian Cumming and Jing Wang Department of Electrical and Computer Engineering The University of British Columbia 2356 Main Mall, Vancouver, BC, Canada
More informationDetection of goal event in soccer videos
Detection of goal event in soccer videos Hyoung-Gook Kim, Steffen Roeber, Amjad Samour, Thomas Sikora Department of Communication Systems, Technical University of Berlin, Einsteinufer 17, D-10587 Berlin,
More informationCISC 7610 Lecture 3 Multimedia data and data formats
CISC 7610 Lecture 3 Multimedia data and data formats Topics: Perceptual limits of multimedia data JPEG encoding of images MPEG encoding of audio MPEG and H.264 encoding of video Multimedia data: Perceptual
More informationUsing Noise Substitution for Backwards-Compatible Audio Codec Improvement
Using Noise Substitution for Backwards-Compatible Audio Codec Improvement Colin Raffel AES 129th Convention San Francisco, CA February 16, 2011 Outline Introduction and Motivation Coding Error Analysis
More informationSOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2
Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 1 Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 33720, Tampere, Finland toni.heittola@tut.fi,
More informationMultimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1
Multimedia What is multimedia? Media types + Text +Graphics +Audio +Image +Video Interchange formats Petri Vuorimaa 1 What is multimedia? Multimedia = many media User interaction = interactivity Script
More informationDAB. Digital Audio Broadcasting
DAB Digital Audio Broadcasting DAB history DAB has been under development since 1981 at the Institut für Rundfunktechnik (IRT). In 1985 the first DAB demonstrations were held at the WARC-ORB in Geneva
More informationDietrich Paulus Joachim Hornegger. Pattern Recognition of Images and Speech in C++
Dietrich Paulus Joachim Hornegger Pattern Recognition of Images and Speech in C++ To Dorothea, Belinda, and Dominik In the text we use the following names which are protected, trademarks owned by a company
More informationA Brief Overview of Audio Information Retrieval. Unjung Nam CCRMA Stanford University
A Brief Overview of Audio Information Retrieval Unjung Nam CCRMA Stanford University 1 Outline What is AIR? Motivation Related Field of Research Elements of AIR Experiments and discussion Music Classification
More informationPolitecnico di Torino. Porto Institutional Repository
Politecnico di Torino Porto Institutional Repository [Proceeding] Detection and classification of double compressed MP3 audio tracks Original Citation: Tiziano Bianchi;Alessia De Rosa;Marco Fontani;Giovanni
More informationA Miniature-Based Image Retrieval System
A Miniature-Based Image Retrieval System Md. Saiful Islam 1 and Md. Haider Ali 2 Institute of Information Technology 1, Dept. of Computer Science and Engineering 2, University of Dhaka 1, 2, Dhaka-1000,
More informationVideo Key-Frame Extraction using Entropy value as Global and Local Feature
Video Key-Frame Extraction using Entropy value as Global and Local Feature Siddu. P Algur #1, Vivek. R *2 # Department of Information Science Engineering, B.V. Bhoomraddi College of Engineering and Technology
More informationImage Classification Using Wavelet Coefficients in Low-pass Bands
Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August -7, 007 Image Classification Using Wavelet Coefficients in Low-pass Bands Weibao Zou, Member, IEEE, and Yan
More informationCSCD 443/533 Advanced Networks Fall 2017
CSCD 443/533 Advanced Networks Fall 2017 Lecture 18 Compression of Video and Audio 1 Topics Compression technology Motivation Human attributes make it possible Audio Compression Video Compression Performance
More informationCompression transparent low-level description of audio signals
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 25 Compression transparent low-level description of audio signals Jason
More informationOptimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform
Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Torsten Palfner, Alexander Mali and Erika Müller Institute of Telecommunications and Information Technology, University of
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationIMPROVED CONTEXT-ADAPTIVE ARITHMETIC CODING IN H.264/AVC
17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009 IMPROVED CONTEXT-ADAPTIVE ARITHMETIC CODING IN H.264/AVC Damian Karwowski, Marek Domański Poznań University
More informationCHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover
38 CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING Digital image watermarking can be done in both spatial domain and transform domain. In spatial domain the watermark bits directly added to the pixels of the
More informationAn adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria
An adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria N. RUIZ REYES 1, M. ROSA ZURERA 2, F. LOPEZ FERRERAS 2, D. MARTINEZ MUÑOZ 1 1 Departamento
More informationCHAPTER 7 MUSIC INFORMATION RETRIEVAL
163 CHAPTER 7 MUSIC INFORMATION RETRIEVAL Using the music and non-music components extracted, as described in chapters 5 and 6, we can design an effective Music Information Retrieval system. In this era
More information