Feature and Dissimilarity Representations for the Sound-Based Recognition of Bird Species

Size: px
Start display at page:

Download "Feature and Dissimilarity Representations for the Sound-Based Recognition of Bird Species"

Transcription

1 Feature and Dissimilarity Representations for the Sound-Based Recognition of Bird Species José Francisco Ruiz-Muñoz, Mauricio Orozco-Alzate César Germán Castellanos-Domínguez Universidad Nacional de Colombia Sede Manizales 16th Iberoamerican Congress on Pattern Recognition Pucón, Chile. November 17, /40

2 Outline 1 Introduction 2 Representations 3 Experimental setup 4 Results 5 Discussion 6 Conclusions 2/40

3 Outline 1 Introduction 2 Representations 3 Experimental setup 4 Results 5 Discussion 6 Conclusions 3/40

4 Introduction Advances in PR and DSP allow the identification of bird species by their emitted sounds and, thereby, the design of automated systems for avian monitoring. In spite of those advances, biodiversity assessments have typically been carried out by visual inspection, which requires human involvement and, therefore, may be expensive and have a limited coverage. 4/40

5 Introduction Advances in PR and DSP allow the identification of bird species by their emitted sounds and, thereby, the design of automated systems for avian monitoring. In spite of those advances, biodiversity assessments have typically been carried out by visual inspection, which requires human involvement and, therefore, may be expensive and have a limited coverage. 4/40

6 Introduction Advances in PR and DSP allow the identification of bird species by their emitted sounds and, thereby, the design of automated systems for avian monitoring. In spite of those advances, biodiversity assessments have typically been carried out by visual inspection, which requires human involvement and, therefore, may be expensive and have a limited coverage. 4/40

7 Automatic systems: a solution Automatic acoustic monitoring is a non-intrusive and cost effective alternative that may provide good temporal and spatial coverages [Brandes, 2008]. Source: [Frommolt et al., 2008] 5/40

8 Units in a bird song TATIONS OF BIRD SOUNDS 2253 were possible, the recognied even in a noisy environalternative approach of recllenging. The main problem be several birds singing sial variability in the songs of ongs of other species makes ng patterns for each species. rd sounds was based on sinugnition results were encourl was clearly oversimplified: he frequency and amplitude sinusoid. It was also shown el is significantly better in center frequency. In [16], roduced which characterize le. This model is revised in airs of stylized syllables in information about consecion accuracy. In the current mparison of different paras. The problem of decomzable units is discussed in ing the segmentation of a g is also introduced. In Sece introduced and compared, operties of the classification ection V, we provide the eriments. In addition to the f bird vocalization, we also syllables. Finally, after the sions from the current work Source: [Somervuo et al., 2006] Fig. 1. Hierarchical levels of song of the Common Chaffinch. Sound-based recognition of bird-species is suitable when considering syllables as elementary units [Fagerlund, 2007, Acevedo et al., 2009]. Raw measurements corresponding to those elementary units can be represented in vector spaces where classification rules can afterwards be applied. In this paper, we call the smallest unit a syllable. A syllable is basically a sound that a bird produces with a single blow of air from the lungs. This is also somewhat inaccurate definition as many birds are capable of complicated circular breathing cycles during singing [18]. The rate of events in bird vocalization may also be so high that the separation of individual syllables is difficult to perform in a natural environment due to reverberation. The segmentationof a recording toindividual syllables is performed using an iterative time-domain algorithm [19]. First, a smoothenergy envelopeofthe signaliscomputedand theglobal minimumenergyis selectedastheinitialbackgroundnoiselevel estimate. Initial threshold is set to the half of the initial noise level, which is set to the lowest signal envelope energy. Noise level and threshold are updated using the following 6/40

9 Units in a bird song TATIONS OF BIRD SOUNDS 2253 were possible, the recognied even in a noisy environalternative approach of recllenging. The main problem be several birds singing sial variability in the songs of ongs of other species makes ng patterns for each species. rd sounds was based on sinugnition results were encourl was clearly oversimplified: he frequency and amplitude sinusoid. It was also shown el is significantly better in center frequency. In [16], roduced which characterize le. This model is revised in airs of stylized syllables in information about consecion accuracy. In the current mparison of different paras. The problem of decomzable units is discussed in ing the segmentation of a g is also introduced. In Sece introduced and compared, operties of the classification ection V, we provide the eriments. In addition to the f bird vocalization, we also syllables. Finally, after the sions from the current work Source: [Somervuo et al., 2006] Fig. 1. Hierarchical levels of song of the Common Chaffinch. Sound-based recognition of bird-species is suitable when considering syllables as elementary units [Fagerlund, 2007, Acevedo et al., 2009]. Raw measurements corresponding to those elementary units can be represented in vector spaces where classification rules can afterwards be applied. In this paper, we call the smallest unit a syllable. A syllable is basically a sound that a bird produces with a single blow of air from the lungs. This is also somewhat inaccurate definition as many birds are capable of complicated circular breathing cycles during singing [18]. The rate of events in bird vocalization may also be so high that the separation of individual syllables is difficult to perform in a natural environment due to reverberation. The segmentationof a recording toindividual syllables is performed using an iterative time-domain algorithm [19]. First, a smoothenergy envelopeofthe signaliscomputedand theglobal minimumenergyis selectedastheinitialbackgroundnoiselevel estimate. Initial threshold is set to the half of the initial noise level, which is set to the lowest signal envelope energy. Noise level and threshold are updated using the following 6/40

10 Representations Representations for bioacoustic signals have traditionally been built by feature extraction. Dissimilarity representations are also a feasible option to face this problem: dissimilarities have the potential to build either simpler or richer representations; the later case, a richer representation, when considering for instance time-varying measurements that take into account non-stationarity. 7/40

11 Representations Representations for bioacoustic signals have traditionally been built by feature extraction. Dissimilarity representations are also a feasible option to face this problem: dissimilarities have the potential to build either simpler or richer representations; the later case, a richer representation, when considering for instance time-varying measurements that take into account non-stationarity. 7/40

12 Outline 1 Introduction 2 Representations 3 Experimental setup 4 Results 5 Discussion 6 Conclusions 8/40

13 Feature representations Several methods of feature-based representations, commonly used in bioacoustics and bird sound classification, are compared: Standard features [Acevedo et al., 2009]: minimum and maximum frequencies, temporal duration and maximum power. Coarse representation of segment structure [Acevedo et al., 2009]: minimum and maximum frequencies, temporal duration and frequency of maximum power in eight non-overlapping frames. 9/40

14 Feature representations Several methods of feature-based representations, commonly used in bioacoustics and bird sound classification, are compared: Standard features [Acevedo et al., 2009]: minimum and maximum frequencies, temporal duration and maximum power. Coarse representation of segment structure [Acevedo et al., 2009]: minimum and maximum frequencies, temporal duration and frequency of maximum power in eight non-overlapping frames. 9/40

15 Feature representations Several methods of feature-based representations, commonly used in bioacoustics and bird sound classification, are compared: Standard features [Acevedo et al., 2009]: minimum and maximum frequencies, temporal duration and maximum power. Coarse representation of segment structure [Acevedo et al., 2009]: minimum and maximum frequencies, temporal duration and frequency of maximum power in eight non-overlapping frames. 9/40

16 Feature representations Descriptive features [Fagerlund, 2007]: this set includes both temporal and spectral features. Segments are divided into overlapping frames of 256 samples with 50% overlap. Mel-cepstrum representation: the first 12 MFCCs, the log-energy and the so-called delta and delta-delta coeficients are obtained for each frame. Their mean values along the frames are used as features. 10/40

17 Feature representations Descriptive features [Fagerlund, 2007]: this set includes both temporal and spectral features. Segments are divided into overlapping frames of 256 samples with 50% overlap. Mel-cepstrum representation: the first 12 MFCCs, the log-energy and the so-called delta and delta-delta coeficients are obtained for each frame. Their mean values along the frames are used as features. 10/40

18 Dissimilarity representations A dissimilarity representation consists in building vectorial spaces where coordinate axes represent dissimilarities, typically distance measures to prototypes. 11/40

19 Dissimilarity representations: one dimensional (1-D) transforms Analysis of signal properties is usually done in the frequency domain, for this study were calculated the spectrum for each signal by using two different approaches: Fast Fourier Transform (FFT) and Power Spectral Density (PSD). 12/40

20 Dissimilarity representations: two dimensional (2-D) transforms Time-varying representations are computed by dividing sound segments into frames and converting each one to either a spectral or a feature representation. Feature sets measured for each frame were: 1 Spectrogram, also known as short time Fourier transform (SFT). 2 PSD by using the Yule Walker method. 3 Selected descriptive features (spectral centroid, signal bandwidth, spectral rolloff frequency, spectral flux, spectral flatness, zero crossing rate and short time energy). 4 Mel-cepstrum representation. 13/40

21 Dissimilarity representations: two dimensional (2-D) transforms Time-varying representations are computed by dividing sound segments into frames and converting each one to either a spectral or a feature representation. Feature sets measured for each frame were: 1 Spectrogram, also known as short time Fourier transform (SFT). 2 PSD by using the Yule Walker method. 3 Selected descriptive features (spectral centroid, signal bandwidth, spectral rolloff frequency, spectral flux, spectral flatness, zero crossing rate and short time energy). 4 Mel-cepstrum representation. 13/40

22 Dissimilarity representations: two dimensional (2-D) transforms Time-varying representations are computed by dividing sound segments into frames and converting each one to either a spectral or a feature representation. Feature sets measured for each frame were: 1 Spectrogram, also known as short time Fourier transform (SFT). 2 PSD by using the Yule Walker method. 3 Selected descriptive features (spectral centroid, signal bandwidth, spectral rolloff frequency, spectral flux, spectral flatness, zero crossing rate and short time energy). 4 Mel-cepstrum representation. 13/40

23 Dissimilarity representations: two dimensional (2-D) transforms Time-varying representations are computed by dividing sound segments into frames and converting each one to either a spectral or a feature representation. Feature sets measured for each frame were: 1 Spectrogram, also known as short time Fourier transform (SFT). 2 PSD by using the Yule Walker method. 3 Selected descriptive features (spectral centroid, signal bandwidth, spectral rolloff frequency, spectral flux, spectral flatness, zero crossing rate and short time energy). 4 Mel-cepstrum representation. 13/40

24 Dissimilarity representations: two dimensional (2-D) transforms Time-varying representations are computed by dividing sound segments into frames and converting each one to either a spectral or a feature representation. Feature sets measured for each frame were: 1 Spectrogram, also known as short time Fourier transform (SFT). 2 PSD by using the Yule Walker method. 3 Selected descriptive features (spectral centroid, signal bandwidth, spectral rolloff frequency, spectral flux, spectral flatness, zero crossing rate and short time energy). 4 Mel-cepstrum representation. 13/40

25 Dissimilarity representations: two dimensional (2-D) transforms Time-varying representations are computed by dividing sound segments into frames and converting each one to either a spectral or a feature representation. Feature sets measured for each frame were: 1 Spectrogram, also known as short time Fourier transform (SFT). 2 PSD by using the Yule Walker method. 3 Selected descriptive features (spectral centroid, signal bandwidth, spectral rolloff frequency, spectral flux, spectral flatness, zero crossing rate and short time energy). 4 Mel-cepstrum representation. 13/40

26 Dissimilarity representations: measures In the case of equally sized representations (e.g. FFT or PSD) a classical measure, as the Euclidean distance, can be used. Euclidean distance can not be directly applied to time-varying representations. To overcome this dificulty, in this study we have used the EMD [Rubner et al., 2000]. 14/40

27 Dissimilarity representations: measures In the case of equally sized representations (e.g. FFT or PSD) a classical measure, as the Euclidean distance, can be used. Euclidean distance can not be directly applied to time-varying representations. To overcome this dificulty, in this study we have used the EMD [Rubner et al., 2000]. 14/40

28 Outline 1 Introduction 2 Representations 3 Experimental setup 4 Results 5 Discussion 6 Conclusions 15/40

29 Introduction Representations Experimental setup Results Discussion Conclusions References Dataset A set of birdsong recordings of eleven species were used in the experiments. 16/40

30 Dataset Raw field recordings were taken at Reserva Natural Río Blanco in Manizales, Colombia. The sampling frequency of the recordings is 44.1 khz. Bird sounds are detected in raw data using segmentation techniques based on the energy [Härmä, 2003]. The data set contains a total of 595 syllables distributed per species as shown in Table 1. 17/40

31 Dataset Raw field recordings were taken at Reserva Natural Río Blanco in Manizales, Colombia. The sampling frequency of the recordings is 44.1 khz. Bird sounds are detected in raw data using segmentation techniques based on the energy [Härmä, 2003]. The data set contains a total of 595 syllables distributed per species as shown in Table 1. 17/40

32 Dataset Raw field recordings were taken at Reserva Natural Río Blanco in Manizales, Colombia. The sampling frequency of the recordings is 44.1 khz. Bird sounds are detected in raw data using segmentation techniques based on the energy [Härmä, 2003]. The data set contains a total of 595 syllables distributed per species as shown in Table 1. 17/40

33 Dataset Raw field recordings were taken at Reserva Natural Río Blanco in Manizales, Colombia. The sampling frequency of the recordings is 44.1 khz. Bird sounds are detected in raw data using segmentation techniques based on the energy [Härmä, 2003]. The data set contains a total of 595 syllables distributed per species as shown in Table 1. 17/40

34 Dataset Table: 1. Birds species in the current study. Latin name Abbreviation Syllables Grallaria ruficapilla GR 33 Henicorhina leucophrys HL 64 Mimus gilvus MG 66 Myadestes ralloides MR 58 Pitangus sulphuratus PS 53 Pyrrhomyias cinnamomea PC 36 Troglodytes aedon TA 33 Turdus ignobilis TI 74 Turdus serranus TS 78 Xiphocolaptes promeropirhynchus XP 46 Zonotrichia capensis ZC 54 18/40

35 Outline 1 Introduction 2 Representations 3 Experimental setup 4 Results 5 Discussion 6 Conclusions 19/40

36 Results Three approaches for representing bird sounds have been analyzed: 1 Feature representations. 2 Dissimilarity representations for signals in the frequency domain. 3 Dissimilarity representations for time-varying signal transforms. 20/40

37 Results Three approaches for representing bird sounds have been analyzed: 1 Feature representations. 2 Dissimilarity representations for signals in the frequency domain. 3 Dissimilarity representations for time-varying signal transforms. 20/40

38 Results Three approaches for representing bird sounds have been analyzed: 1 Feature representations. 2 Dissimilarity representations for signals in the frequency domain. 3 Dissimilarity representations for time-varying signal transforms. 20/40

39 Results Three approaches for representing bird sounds have been analyzed: 1 Feature representations. 2 Dissimilarity representations for signals in the frequency domain. 3 Dissimilarity representations for time-varying signal transforms. 20/40

40 Performance measure Evaluation for each representation was assessed by using measures of the leave-one-out 1-NN performance. For each representation, a confusion matrix is reported. The following performance measures per class are presented: True Positive rate (TP), False Positive rate (FP), Accuracy (ACC) and F1 score. 21/40

41 Performance measure Evaluation for each representation was assessed by using measures of the leave-one-out 1-NN performance. For each representation, a confusion matrix is reported. The following performance measures per class are presented: True Positive rate (TP), False Positive rate (FP), Accuracy (ACC) and F1 score. 21/40

42 Performance measure Evaluation for each representation was assessed by using measures of the leave-one-out 1-NN performance. For each representation, a confusion matrix is reported. The following performance measures per class are presented: True Positive rate (TP), False Positive rate (FP), Accuracy (ACC) and F1 score. 21/40

43 Results for feature representations 22/40

44 Results for feature representations 23/40

45 Results for feature representations 24/40

46 Results for feature representations This table shows the best performance of feature representation. 25/40

47 Results for feature representations This table shows the best performance of feature representation. 25/40

48 Results for dissimilarity representations: 1-D transform This table shows the poorest performance with a total accuracy of 50.58%. 26/40

49 Results for dissimilarity representations: 1-D transform This table shows the poorest performance with a total accuracy of 50.58%. 26/40

50 Results for dissimilarity representations: 1-D transform This table shows the best performance of dissimilarity representations for signals in the frequency domain. 27/40

51 Results for dissimilarity representations: 1-D transform This table shows the best performance of dissimilarity representations for signals in the frequency domain. 27/40

52 Results for dissimilarity representations: time-varying 28/40

53 Results for dissimilarity representations: time-varying 29/40

54 Results for dissimilarity representations: time-varying 30/40

55 Results for dissimilarity representations: time-varying This table shows the best performance of dissimilarity representations for time-varying signal transforms. 31/40

56 Results for dissimilarity representations: time-varying This table shows the best performance of dissimilarity representations for time-varying signal transforms. 31/40

57 Outline 1 Introduction 2 Representations 3 Experimental setup 4 Results 5 Discussion 6 Conclusions 32/40

58 Discussion Mel-cepstrum and descriptive features showed a good performance in both feature and dissimilarity representations, as expected because those representations are specifically designed for audio recognition. The dissimilarity representation for PSD, in spite of being a rather simple representation, yielded an acceptable performance with a total accuracy of 80.67%. 33/40

59 Discussion Mel-cepstrum and descriptive features showed a good performance in both feature and dissimilarity representations, as expected because those representations are specifically designed for audio recognition. The dissimilarity representation for PSD, in spite of being a rather simple representation, yielded an acceptable performance with a total accuracy of 80.67%. 33/40

60 Discussion Performances of dissimilarity representations for time-varying signal transforms are remarkable. This fact reveals the importance of taking into account the non-stationarity; which is observable by comparing results of dissimilarity representations computed for FFT. The poorest ones with a total accuracy of 50.58%, and results of the dissimilarity representations for time-varying FFT (SFT) that had a total accuracy of 93.78%. 34/40

61 Discussion In the case of PSD, the performance also increased when deriving dissimilarities for time-varying transforms but in less proportion, with a total accuracy of 86.22%. 35/40

62 Outline 1 Introduction 2 Representations 3 Experimental setup 4 Results 5 Discussion 6 Conclusions 36/40

63 Conclusions We conclude that Mel-cepstrum coefficients are suitable for bird sound representation, even more when dissimilarities are computed from them; i.e. when non-stationarity is taken into account. Representation in dissimilarity spaces derived from time-varying representations was found to be preferable instead of doing so in the corresponding 1-D representations. 37/40

64 Conclusions We conclude that Mel-cepstrum coefficients are suitable for bird sound representation, even more when dissimilarities are computed from them; i.e. when non-stationarity is taken into account. Representation in dissimilarity spaces derived from time-varying representations was found to be preferable instead of doing so in the corresponding 1-D representations. 37/40

65 Future work Continuation of the data collection stage: Larger and more representative set. Construction of trained classifiers in the dissimilarity space. Comparison with hidden markov model-based classification: Classical Bayesian approach. Generative embeddings. 38/40

66 Future work Continuation of the data collection stage: Larger and more representative set. Construction of trained classifiers in the dissimilarity space. Comparison with hidden markov model-based classification: Classical Bayesian approach. Generative embeddings. 38/40

67 Future work Continuation of the data collection stage: Larger and more representative set. Construction of trained classifiers in the dissimilarity space. Comparison with hidden markov model-based classification: Classical Bayesian approach. Generative embeddings. 38/40

68 Future work Continuation of the data collection stage: Larger and more representative set. Construction of trained classifiers in the dissimilarity space. Comparison with hidden markov model-based classification: Classical Bayesian approach. Generative embeddings. 38/40

69 Future work Continuation of the data collection stage: Larger and more representative set. Construction of trained classifiers in the dissimilarity space. Comparison with hidden markov model-based classification: Classical Bayesian approach. Generative embeddings. 38/40

70 Future work Continuation of the data collection stage: Larger and more representative set. Construction of trained classifiers in the dissimilarity space. Comparison with hidden markov model-based classification: Classical Bayesian approach. Generative embeddings. 38/40

71 References Acevedo, M. A., Corrada-Bravo, C. J., Corrada-Bravo, H., Villanueva-Rivera, L. J., and Aide, T. M. (2009). Automated classification of bird and amphibian calls using machine learning: A comparison of methods. Ecological Informatics, 4(4): Brandes, T. S. (2008). Automated sound recording and analysis techniques for bird surveys and conservation. Bird Conservation International, 18(S1):S163 S173. Fagerlund, S. (2007). Bird species recognition using support vector machines. EURASIP Journal on Advances in Signal Processing, 2007(1): Frommolt, K.-H., Bardeli, R., and Clausen, M., editors (2008). Computational bioacoustics for assessing biodiversity: Proceedings of the International Expert meeting on IT-based detection of bioacoustical patterns, December 7th until December 10th, 2007 at the International Academy for Nature Conservation (INA). Isle of Vilm, Germany, volume 234 of BfN - Skripten, Bonn, Germany. Federal Agency for Nature Conservation. Härmä, A. (2003). Automatic identification of bird species based on sinusoidal modeling of syllables. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 03, volume 5, pages Rubner, Y., Tomasi, C., and Guibas, L. (2000). The earth mover s distance as a metric for image retrieval. International Journal of Computer Vision, 40(S2):S99 -S121. Somervuo, P., Härmä, A., and Fagerlund, S. (2006). Parametric representations of bird sounds for automatic species recognition. IEEE Transactions on Audio, Speech, and Language Processing, 14(6): /40

72 Questions? 40/40

Repeating Segment Detection in Songs using Audio Fingerprint Matching

Repeating Segment Detection in Songs using Audio Fingerprint Matching Repeating Segment Detection in Songs using Audio Fingerprint Matching Regunathan Radhakrishnan and Wenyu Jiang Dolby Laboratories Inc, San Francisco, USA E-mail: regu.r@dolby.com Institute for Infocomm

More information

Detection of goal event in soccer videos

Detection of goal event in soccer videos Detection of goal event in soccer videos Hyoung-Gook Kim, Steffen Roeber, Amjad Samour, Thomas Sikora Department of Communication Systems, Technical University of Berlin, Einsteinufer 17, D-10587 Berlin,

More information

Archetypal Analysis Based Sparse Convex Sequence Kernel for Bird Activity Detection

Archetypal Analysis Based Sparse Convex Sequence Kernel for Bird Activity Detection Archetypal Analysis Based Sparse Convex Sequence Kernel for Bird Activity Detection V. Abrol, P. Sharma, A. Thakur, P. Rajan, A. D. Dileep, Anil K. Sao School of Computing and Electrical Engineering, Indian

More information

Bird Species Identification Using Support Vector Machine

Bird Species Identification Using Support Vector Machine Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 9, September 2015,

More information

BIRD-PHRASE SEGMENTATION AND VERIFICATION: A NOISE-ROBUST TEMPLATE-BASED APPROACH

BIRD-PHRASE SEGMENTATION AND VERIFICATION: A NOISE-ROBUST TEMPLATE-BASED APPROACH BIRD-PHRASE SEGMENTATION AND VERIFICATION: A NOISE-ROBUST TEMPLATE-BASED APPROACH Kantapon Kaewtip, Lee Ngee Tan, Charles E.Taylor 2, Abeer Alwan Department of Electrical Engineering, 2 Department of Ecology

More information

A SPARSE REPRESENTATION-BASED CLASSIFIER FOR IN-SET BIRD PHRASE VERIFICATION AND CLASSIFICATION WITH LIMITED TRAINING DATA

A SPARSE REPRESENTATION-BASED CLASSIFIER FOR IN-SET BIRD PHRASE VERIFICATION AND CLASSIFICATION WITH LIMITED TRAINING DATA A SPARSE REPRESENTATION-BASED CLASSIFIER FOR IN-SET BIRD PHRASE VERIFICATION AND CLASSIFICATION WITH LIMITED TRAINING DATA Lee Ngee Tan, George Kossan 2, Martin L. Cody 2, Charles E. Taylor 2, Abeer Alwan

More information

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 SUBJECTIVE AND OBJECTIVE QUALITY EVALUATION FOR AUDIO WATERMARKING BASED ON SINUSOIDAL AMPLITUDE MODULATION PACS: 43.10.Pr, 43.60.Ek

More information

An Architecture for Animal Sound Identification based on Multiple Feature Extraction and Classification Algorithms

An Architecture for Animal Sound Identification based on Multiple Feature Extraction and Classification Algorithms An Architecture for Animal Sound Identification based on Multiple Feature Extraction and Classification Algorithms Leandro Tacioli 1, Luís Felipe Toledo 2, Claudia Bauzer Medeiros 1 1 Institute of Computing,

More information

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)

GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd. 0 MICROPHONE

More information

1 Introduction. 3 Data Preprocessing. 2 Literature Review

1 Introduction. 3 Data Preprocessing. 2 Literature Review Rock or not? This sure does. [Category] Audio & Music CS 229 Project Report Anand Venkatesan(anand95), Arjun Parthipan(arjun777), Lakshmi Manoharan(mlakshmi) 1 Introduction Music Genre Classification continues

More information

Automatic Classification of Audio Data

Automatic Classification of Audio Data Automatic Classification of Audio Data Carlos H. C. Lopes, Jaime D. Valle Jr. & Alessandro L. Koerich IEEE International Conference on Systems, Man and Cybernetics The Hague, The Netherlands October 2004

More information

Multimedia Database Systems. Retrieval by Content

Multimedia Database Systems. Retrieval by Content Multimedia Database Systems Retrieval by Content MIR Motivation Large volumes of data world-wide are not only based on text: Satellite images (oil spill), deep space images (NASA) Medical images (X-rays,

More information

Accelerometer Gesture Recognition

Accelerometer Gesture Recognition Accelerometer Gesture Recognition Michael Xie xie@cs.stanford.edu David Pan napdivad@stanford.edu December 12, 2014 Abstract Our goal is to make gesture-based input for smartphones and smartwatches accurate

More information

Detection of Acoustic Events in Meeting-Room Environment

Detection of Acoustic Events in Meeting-Room Environment 11/Dec/2008 Detection of Acoustic Events in Meeting-Room Environment Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of 34 Content Introduction State of the Art Acoustic

More information

Texture Classification by Combining Local Binary Pattern Features and a Self-Organizing Map

Texture Classification by Combining Local Binary Pattern Features and a Self-Organizing Map Texture Classification by Combining Local Binary Pattern Features and a Self-Organizing Map Markus Turtinen, Topi Mäenpää, and Matti Pietikäinen Machine Vision Group, P.O.Box 4500, FIN-90014 University

More information

Voice Command Based Computer Application Control Using MFCC

Voice Command Based Computer Application Control Using MFCC Voice Command Based Computer Application Control Using MFCC Abinayaa B., Arun D., Darshini B., Nataraj C Department of Embedded Systems Technologies, Sri Ramakrishna College of Engineering, Coimbatore,

More information

K-Nearest Neighbor Classification Approach for Face and Fingerprint at Feature Level Fusion

K-Nearest Neighbor Classification Approach for Face and Fingerprint at Feature Level Fusion K-Nearest Neighbor Classification Approach for Face and Fingerprint at Feature Level Fusion Dhriti PEC University of Technology Chandigarh India Manvjeet Kaur PEC University of Technology Chandigarh India

More information

Research Article Bird Species Recognition Using Support Vector Machines

Research Article Bird Species Recognition Using Support Vector Machines Hindawi Publishing Corporation EURASIP Journal on Advances in Signal Processing Volume 2007, Article ID 38637, 8 pages doi:10.1155/2007/38637 Research Article Bird Species Recognition Using Support Vector

More information

A Brief Overview of Audio Information Retrieval. Unjung Nam CCRMA Stanford University

A Brief Overview of Audio Information Retrieval. Unjung Nam CCRMA Stanford University A Brief Overview of Audio Information Retrieval Unjung Nam CCRMA Stanford University 1 Outline What is AIR? Motivation Related Field of Research Elements of AIR Experiments and discussion Music Classification

More information

Real Time Speaker Recognition System using MFCC and Vector Quantization Technique

Real Time Speaker Recognition System using MFCC and Vector Quantization Technique Real Time Speaker Recognition System using MFCC and Vector Quantization Technique Roma Bharti Mtech, Manav rachna international university Faridabad ABSTRACT This paper represents a very strong mathematical

More information

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Comparing MFCC and MPEG-7 Audio Features for Feature Extraction, Maximum Likelihood HMM and Entropic Prior HMM for Sports Audio Classification

More information

Chapter 3. Speech segmentation. 3.1 Preprocessing

Chapter 3. Speech segmentation. 3.1 Preprocessing , as done in this dissertation, refers to the process of determining the boundaries between phonemes in the speech signal. No higher-level lexical information is used to accomplish this. This chapter presents

More information

Saliency Detection for Videos Using 3D FFT Local Spectra

Saliency Detection for Videos Using 3D FFT Local Spectra Saliency Detection for Videos Using 3D FFT Local Spectra Zhiling Long and Ghassan AlRegib School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA ABSTRACT

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

METRIC LEARNING BASED DATA AUGMENTATION FOR ENVIRONMENTAL SOUND CLASSIFICATION

METRIC LEARNING BASED DATA AUGMENTATION FOR ENVIRONMENTAL SOUND CLASSIFICATION METRIC LEARNING BASED DATA AUGMENTATION FOR ENVIRONMENTAL SOUND CLASSIFICATION Rui Lu 1, Zhiyao Duan 2, Changshui Zhang 1 1 Department of Automation, Tsinghua University 2 Department of Electrical and

More information

Automatic Singular Spectrum Analysis for Time-Series Decomposition

Automatic Singular Spectrum Analysis for Time-Series Decomposition Automatic Singular Spectrum Analysis for Time-Series Decomposition A.M. Álvarez-Meza and C.D. Acosta-Medina and G. Castellanos-Domínguez Universidad Nacional de Colombia, Signal Processing and Recognition

More information

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University

More information

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2

SOUND EVENT DETECTION AND CONTEXT RECOGNITION 1 INTRODUCTION. Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 Toni Heittola 1, Annamaria Mesaros 1, Tuomas Virtanen 1, Antti Eronen 2 1 Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 33720, Tampere, Finland toni.heittola@tut.fi,

More information

Robustness and independence of voice timbre features under live performance acoustic degradations

Robustness and independence of voice timbre features under live performance acoustic degradations Robustness and independence of voice timbre features under live performance acoustic degradations Dan Stowell and Mark Plumbley dan.stowell@elec.qmul.ac.uk Centre for Digital Music Queen Mary, University

More information

2014, IJARCSSE All Rights Reserved Page 461

2014, IJARCSSE All Rights Reserved Page 461 Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real Time Speech

More information

Further Studies of a FFT-Based Auditory Spectrum with Application in Audio Classification

Further Studies of a FFT-Based Auditory Spectrum with Application in Audio Classification ICSP Proceedings Further Studies of a FFT-Based Auditory with Application in Audio Classification Wei Chu and Benoît Champagne Department of Electrical and Computer Engineering McGill University, Montréal,

More information

Implementing a Speech Recognition System on a GPU using CUDA. Presented by Omid Talakoub Astrid Yi

Implementing a Speech Recognition System on a GPU using CUDA. Presented by Omid Talakoub Astrid Yi Implementing a Speech Recognition System on a GPU using CUDA Presented by Omid Talakoub Astrid Yi Outline Background Motivation Speech recognition algorithm Implementation steps GPU implementation strategies

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION Far East Journal of Electronics and Communications Volume 3, Number 2, 2009, Pages 125-140 Published Online: September 14, 2009 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing

More information

Clustering and Dissimilarity Measures. Clustering. Dissimilarity Measures. Cluster Analysis. Perceptually-Inspired Measures

Clustering and Dissimilarity Measures. Clustering. Dissimilarity Measures. Cluster Analysis. Perceptually-Inspired Measures Clustering and Dissimilarity Measures Clustering APR Course, Delft, The Netherlands Marco Loog May 19, 2008 1 What salient structures exist in the data? How many clusters? May 19, 2008 2 Cluster Analysis

More information

Better Than MFCC Audio Classification Features. Author. Published. Book Title DOI. Copyright Statement. Downloaded from. Griffith Research Online

Better Than MFCC Audio Classification Features. Author. Published. Book Title DOI. Copyright Statement. Downloaded from. Griffith Research Online Better Than MFCC Audio Classification Features Author Gonzalez, Ruben Published 2013 Book Title The Era of Interactive Media DOI https://doi.org/10.1007/978-1-4614-3501-3_24 Copyright Statement 2013 Springer.

More information

Content-based retrieval of music using mel frequency cepstral coefficient (MFCC)

Content-based retrieval of music using mel frequency cepstral coefficient (MFCC) Content-based retrieval of music using mel frequency cepstral coefficient (MFCC) Abstract Xin Luo*, Xuezheng Liu, Ran Tao, Youqun Shi School of Computer Science and Technology, Donghua University, Songjiang

More information

A Neural Network for Real-Time Signal Processing

A Neural Network for Real-Time Signal Processing 248 MalkofT A Neural Network for Real-Time Signal Processing Donald B. Malkoff General Electric / Advanced Technology Laboratories Moorestown Corporate Center Building 145-2, Route 38 Moorestown, NJ 08057

More information

Structural Health Monitoring Using Guided Ultrasonic Waves to Detect Damage in Composite Panels

Structural Health Monitoring Using Guided Ultrasonic Waves to Detect Damage in Composite Panels Structural Health Monitoring Using Guided Ultrasonic Waves to Detect Damage in Composite Panels Colleen Rosania December, Introduction In the aircraft industry safety, maintenance costs, and optimum weight

More information

Forensic Image Recognition using a Novel Image Fingerprinting and Hashing Technique

Forensic Image Recognition using a Novel Image Fingerprinting and Hashing Technique Forensic Image Recognition using a Novel Image Fingerprinting and Hashing Technique R D Neal, R J Shaw and A S Atkins Faculty of Computing, Engineering and Technology, Staffordshire University, Stafford

More information

Noise-based Feature Perturbation as a Selection Method for Microarray Data

Noise-based Feature Perturbation as a Selection Method for Microarray Data Noise-based Feature Perturbation as a Selection Method for Microarray Data Li Chen 1, Dmitry B. Goldgof 1, Lawrence O. Hall 1, and Steven A. Eschrich 2 1 Department of Computer Science and Engineering

More information

Voiced-Unvoiced-Silence Classification via Hierarchical Dual Geometry Analysis

Voiced-Unvoiced-Silence Classification via Hierarchical Dual Geometry Analysis Voiced-Unvoiced-Silence Classification via Hierarchical Dual Geometry Analysis Maya Harel, David Dov, Israel Cohen, Ronen Talmon and Ron Meir Andrew and Erna Viterbi Faculty of Electrical Engineering Technion

More information

A Method for the Identification of Inaccuracies in Pupil Segmentation

A Method for the Identification of Inaccuracies in Pupil Segmentation A Method for the Identification of Inaccuracies in Pupil Segmentation Hugo Proença and Luís A. Alexandre Dep. Informatics, IT - Networks and Multimedia Group Universidade da Beira Interior, Covilhã, Portugal

More information

Spectral modeling of musical sounds

Spectral modeling of musical sounds Spectral modeling of musical sounds Xavier Serra Audiovisual Institute, Pompeu Fabra University http://www.iua.upf.es xserra@iua.upf.es 1. Introduction Spectral based analysis/synthesis techniques offer

More information

Using Acoustic Recorders as Cost Effective Monitoring for Marine Bioacoustics and Ambient Noise Simultaneously

Using Acoustic Recorders as Cost Effective Monitoring for Marine Bioacoustics and Ambient Noise Simultaneously Using Acoustic Recorders as Cost Effective Monitoring for Marine Bioacoustics and Ambient Noise Simultaneously Mona Doss Director Wildlife Acoustics, Inc. Concord, Massachusetts USA Topics Brief introduction

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

Text-Independent Speaker Identification

Text-Independent Speaker Identification December 8, 1999 Text-Independent Speaker Identification Til T. Phan and Thomas Soong 1.0 Introduction 1.1 Motivation The problem of speaker identification is an area with many different applications.

More information

DETECTING INDOOR SOUND EVENTS

DETECTING INDOOR SOUND EVENTS DETECTING INDOOR SOUND EVENTS Toma TELEMBICI, Lacrimioara GRAMA Signal Processing Group, Basis of Electronics Department, Faculty of Electronics, Telecommunications and Information Technology, Technical

More information

Handwritten Script Recognition at Block Level

Handwritten Script Recognition at Block Level Chapter 4 Handwritten Script Recognition at Block Level -------------------------------------------------------------------------------------------------------------------------- Optical character recognition

More information

AUTOMATED ROBUST ANURAN CLASSIFICATION BY EXTRACTING ELLIPTICAL FEATURE PAIRS FROM AUDIO SPECTROGRAMS

AUTOMATED ROBUST ANURAN CLASSIFICATION BY EXTRACTING ELLIPTICAL FEATURE PAIRS FROM AUDIO SPECTROGRAMS IEEE International Conference on Acoustic, Speech, and Signal Processing - ICASSP 2017 AUTOMATED ROBUST ANURAN CLASSIFICATION BY EXTRACTING ELLIPTICAL FEATURE PAIRS FROM AUDIO SPECTROGRAMS Marcello Tomasini

More information

Implementation of Speech Based Stress Level Monitoring System

Implementation of Speech Based Stress Level Monitoring System 4 th International Conference on Computing, Communication and Sensor Network, CCSN2015 Implementation of Speech Based Stress Level Monitoring System V.Naveen Kumar 1,Dr.Y.Padma sai 2, K.Sonali Swaroop

More information

Dietrich Paulus Joachim Hornegger. Pattern Recognition of Images and Speech in C++

Dietrich Paulus Joachim Hornegger. Pattern Recognition of Images and Speech in C++ Dietrich Paulus Joachim Hornegger Pattern Recognition of Images and Speech in C++ To Dorothea, Belinda, and Dominik In the text we use the following names which are protected, trademarks owned by a company

More information

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús

More information

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Please contact the conference organizers at dcasechallenge@gmail.com if you require an accessible file, as the files provided by ConfTool Pro to reviewers are filtered to remove author information, and

More information

Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing

Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Samer Al Moubayed Center for Speech Technology, Department of Speech, Music, and Hearing, KTH, Sweden. sameram@kth.se

More information

Fuzzy Multilevel Graph Embedding for Recognition, Indexing and Retrieval of Graphic Document Images

Fuzzy Multilevel Graph Embedding for Recognition, Indexing and Retrieval of Graphic Document Images Cotutelle PhD thesis for Recognition, Indexing and Retrieval of Graphic Document Images presented by Muhammad Muzzamil LUQMAN mluqman@{univ-tours.fr, cvc.uab.es} Friday, 2 nd of March 2012 Directors of

More information

2. Basic Task of Pattern Classification

2. Basic Task of Pattern Classification 2. Basic Task of Pattern Classification Definition of the Task Informal Definition: Telling things apart 3 Definition: http://www.webopedia.com/term/p/pattern_recognition.html pattern recognition Last

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

CT516 Advanced Digital Communications Lecture 7: Speech Encoder CT516 Advanced Digital Communications Lecture 7: Speech Encoder Yash M. Vasavada Associate Professor, DA-IICT, Gandhinagar 2nd February 2017 Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February

More information

Response surface methods for selecting spectrogram hyperparameters with application to acoustic classification of environmental-noise signatures

Response surface methods for selecting spectrogram hyperparameters with application to acoustic classification of environmental-noise signatures Response surface methods for selecting spectrogram hyperparameters with application to acoustic classification of environmental-noise signatures Waterford at Springfield, April 4 th, 2017 Ed Nykaza (ERDC-CERL)

More information

An Introduction to Pattern Recognition

An Introduction to Pattern Recognition An Introduction to Pattern Recognition Speaker : Wei lun Chao Advisor : Prof. Jian-jiun Ding DISP Lab Graduate Institute of Communication Engineering 1 Abstract Not a new research field Wide range included

More information

Corner Detection. Harvey Rhody Chester F. Carlson Center for Imaging Science Rochester Institute of Technology

Corner Detection. Harvey Rhody Chester F. Carlson Center for Imaging Science Rochester Institute of Technology Corner Detection Harvey Rhody Chester F. Carlson Center for Imaging Science Rochester Institute of Technology rhody@cis.rit.edu April 11, 2006 Abstract Corners and edges are two of the most important geometrical

More information

A text-independent speaker verification model: A comparative analysis

A text-independent speaker verification model: A comparative analysis A text-independent speaker verification model: A comparative analysis Rishi Charan, Manisha.A, Karthik.R, Raesh Kumar M, Senior IEEE Member School of Electronic Engineering VIT University Tamil Nadu, India

More information

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of

More information

Cepstral Analysis Tools for Percussive Timbre Identification

Cepstral Analysis Tools for Percussive Timbre Identification Cepstral Analysis Tools for Percussive Timbre Identification William Brent Department of Music and Center for Research in Computing and the Arts University of California, San Diego wbrent@ucsd.edu ABSTRACT

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Neetha Das Prof. Andy Khong

Neetha Das Prof. Andy Khong Neetha Das Prof. Andy Khong Contents Introduction and aim Current system at IMI Proposed new classification model Support Vector Machines Initial audio data collection and processing Features and their

More information

Content Based Classification of Audio Using MPEG-7 Features

Content Based Classification of Audio Using MPEG-7 Features Content Based Classification of Audio Using MPEG-7 Features ManasiChoche, Dr.SatishkumarVarma Abstract The segmentation plays important role in audio classification. The audio data can be divided into

More information

A Fast Approximated k Median Algorithm

A Fast Approximated k Median Algorithm A Fast Approximated k Median Algorithm Eva Gómez Ballester, Luisa Micó, Jose Oncina Universidad de Alicante, Departamento de Lenguajes y Sistemas Informáticos {eva, mico,oncina}@dlsi.ua.es Abstract. The

More information

Efficient Similarity Search in Scientific Databases with Feature Signatures

Efficient Similarity Search in Scientific Databases with Feature Signatures DATA MANAGEMENT AND DATA EXPLORATION GROUP Prof. Dr. rer. nat. Thomas Seidl DATA MANAGEMENT AND DATA EXPLORATION GROUP Prof. Dr. rer. nat. Thomas Seidl Efficient Similarity Search in Scientific Databases

More information

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun 1. Introduction The human voice is very versatile and carries a multitude of emotions. Emotion in speech carries extra insight about

More information

Speaker Diarization System Based on GMM and BIC

Speaker Diarization System Based on GMM and BIC Speaer Diarization System Based on GMM and BIC Tantan Liu 1, Xiaoxing Liu 1, Yonghong Yan 1 1 ThinIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences Beijing 100080 {tliu, xliu,yyan}@hccl.ioa.ac.cn

More information

3. Cluster analysis Overview

3. Cluster analysis Overview Université Laval Multivariate analysis - February 2006 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as

More information

Video shot segmentation using late fusion technique

Video shot segmentation using late fusion technique Video shot segmentation using late fusion technique by C. Krishna Mohan, N. Dhananjaya, B.Yegnanarayana in Proc. Seventh International Conference on Machine Learning and Applications, 2008, San Diego,

More information

Music Genre Classification

Music Genre Classification Music Genre Classification Matthew Creme, Charles Burlin, Raphael Lenain Stanford University December 15, 2016 Abstract What exactly is it that makes us, humans, able to tell apart two songs of different

More information

Deficiencies in NIST Fingerprint Image Quality Algorithm Predicting biometric performance using image quality metrics

Deficiencies in NIST Fingerprint Image Quality Algorithm Predicting biometric performance using image quality metrics Deficiencies in NIST Fingerprint Image Quality Algorithm Predicting biometric performance using image quality metrics Martin Aastrup Olsen, Hochschule Darmstadt, CASED 12. Deutscher IT-Sicherheitskongress,

More information

A Gaussian Mixture Model Spectral Representation for Speech Recognition

A Gaussian Mixture Model Spectral Representation for Speech Recognition A Gaussian Mixture Model Spectral Representation for Speech Recognition Matthew Nicholas Stuttle Hughes Hall and Cambridge University Engineering Department PSfrag replacements July 2003 Dissertation submitted

More information

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Anil K Goswami 1, Swati Sharma 2, Praveen Kumar 3 1 DRDO, New Delhi, India 2 PDM College of Engineering for

More information

International Journal of the Korean Society of Precision Engineering, Vol. 1, No. 1, June 2000.

International Journal of the Korean Society of Precision Engineering, Vol. 1, No. 1, June 2000. International Journal of the Korean Society of Precision Engineering, Vol. 1, No. 1, June 2000. Jae-Yeol Kim *, Gyu-Jae Cho * and Chang-Hyun Kim ** * School of Mechanical Engineering, Chosun University,

More information

Multimedia Databases

Multimedia Databases Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Previous Lecture Audio Retrieval - Low Level Audio

More information

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi 1. Introduction The choice of a particular transform in a given application depends on the amount of

More information

Input speech signal. Selected /Rejected. Pre-processing Feature extraction Matching algorithm. Database. Figure 1: Process flow in ASR

Input speech signal. Selected /Rejected. Pre-processing Feature extraction Matching algorithm. Database. Figure 1: Process flow in ASR Volume 5, Issue 1, January 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Feature Extraction

More information

Customer Clustering using RFM analysis

Customer Clustering using RFM analysis Customer Clustering using RFM analysis VASILIS AGGELIS WINBANK PIRAEUS BANK Athens GREECE AggelisV@winbank.gr DIMITRIS CHRISTODOULAKIS Computer Engineering and Informatics Department University of Patras

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

In this project, I examined methods to classify a corpus of s by their content in order to suggest text blocks for semi-automatic replies.

In this project, I examined methods to classify a corpus of  s by their content in order to suggest text blocks for semi-automatic replies. December 13, 2006 IS256: Applied Natural Language Processing Final Project Email classification for semi-automated reply generation HANNES HESSE mail 2056 Emerson Street Berkeley, CA 94703 phone 1 (510)

More information

Robust PDF Table Locator

Robust PDF Table Locator Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records

More information

Chapter 4: Text Clustering

Chapter 4: Text Clustering 4.1 Introduction to Text Clustering Clustering is an unsupervised method of grouping texts / documents in such a way that in spite of having little knowledge about the content of the documents, we can

More information

Available online Journal of Scientific and Engineering Research, 2016, 3(4): Research Article

Available online   Journal of Scientific and Engineering Research, 2016, 3(4): Research Article Available online www.jsaer.com, 2016, 3(4):417-422 Research Article ISSN: 2394-2630 CODEN(USA): JSERBR Automatic Indexing of Multimedia Documents by Neural Networks Dabbabi Turkia 1, Lamia Bouafif 2, Ellouze

More information

An Unsupervised Technique for Statistical Data Analysis Using Data Mining

An Unsupervised Technique for Statistical Data Analysis Using Data Mining International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 5, Number 1 (2013), pp. 11-20 International Research Publication House http://www.irphouse.com An Unsupervised Technique

More information

Machine Learning for. Artem Lind & Aleskandr Tkachenko

Machine Learning for. Artem Lind & Aleskandr Tkachenko Machine Learning for Object Recognition Artem Lind & Aleskandr Tkachenko Outline Problem overview Classification demo Examples of learning algorithms Probabilistic modeling Bayes classifier Maximum margin

More information

A Neural Classifier for Anomaly Detection in Magnetic Motion Capture

A Neural Classifier for Anomaly Detection in Magnetic Motion Capture A Neural Classifier for Anomaly Detection in Magnetic Motion Capture Iain Miller 1 and Stephen McGlinchey 2 1 University of Paisley, Paisley. PA1 2BE, UK iain.miller@paisley.ac.uk, 2 stephen.mcglinchey@paisley.ac.uk

More information

A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics

A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics Helmut Berger and Dieter Merkl 2 Faculty of Information Technology, University of Technology, Sydney, NSW, Australia hberger@it.uts.edu.au

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Dynamic Human Shape Description and Characterization

Dynamic Human Shape Description and Characterization Dynamic Human Shape Description and Characterization Z. Cheng*, S. Mosher, Jeanne Smith H. Cheng, and K. Robinette Infoscitex Corporation, Dayton, Ohio, USA 711 th Human Performance Wing, Air Force Research

More information

Texture April 17 th, 2018

Texture April 17 th, 2018 Texture April 17 th, 2018 Yong Jae Lee UC Davis Announcements PS1 out today Due 5/2 nd, 11:59 pm start early! 2 Review: last time Edge detection: Filter for gradient Threshold gradient magnitude, thin

More information

A framework for audio analysis

A framework for audio analysis MARSYAS: A framework for audio analysis George Tzanetakis 1 Department of Computer Science Princeton University Perry Cook 2 Department of Computer Science 3 and Department of Music Princeton University

More information

Clustering & Classification (chapter 15)

Clustering & Classification (chapter 15) Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical

More information

Localized Anomaly Detection via Hierarchical Integrated Activity Discovery

Localized Anomaly Detection via Hierarchical Integrated Activity Discovery Localized Anomaly Detection via Hierarchical Integrated Activity Discovery Master s Thesis Defense Thiyagarajan Chockalingam Advisors: Dr. Chuck Anderson, Dr. Sanjay Rajopadhye 12/06/2013 Outline Parameter

More information

Ergodic Hidden Markov Models for Workload Characterization Problems

Ergodic Hidden Markov Models for Workload Characterization Problems Ergodic Hidden Markov Models for Workload Characterization Problems Alfredo Cuzzocrea DIA Dept., University of Trieste and ICAR-CNR, Italy alfredo.cuzzocrea@dia.units.it Enzo Mumolo DIA Dept., University

More information

A SIMPLE, HIGH-YIELD METHOD FOR ASSESSING STRUCTURAL NOVELTY

A SIMPLE, HIGH-YIELD METHOD FOR ASSESSING STRUCTURAL NOVELTY A SIMPLE, HIGH-YIELD METHOD FOR ASSESSING STRUCTURAL NOVELTY Olivier Lartillot, Donato Cereghetti, Kim Eliard, Didier Grandjean Swiss Center for Affective Sciences, University of Geneva, Switzerland olartillot@gmail.com

More information