QUERY BY EXAMPLE IN LARGE DATABASES USING KEY-SAMPLE DISTANCE TRANSFORMATION AND CLUSTERING

Marko Helén
Tampere University of Technology
Institute of Signal Processing
Korkeakoulunkatu 1, Tampere, Finland

Tommi Lahti
Nokia Research Center
Interaction Core Technology Center, Personal Media and Content team, Finland

ABSTRACT

Calculating similarity estimates between a query sample and the database samples becomes a prohibitively expensive task in large, usually continuously updated multimedia databases. In this paper, a fast and low-complexity transformation from the original feature space into a k-dimensional vector space, combined with clustering, is proposed to alleviate the problem. First, k key-samples are chosen randomly from the database. These samples and a distance function specify a transformation from a series of feature vectors into a k-dimensional vector space, in which the database can be (re)clustered quickly with any of a variety of traditional clustering techniques whenever required. In the experiments, the similarity between samples was calculated using the Euclidean distance between their associated feature-vector probability density functions, and the k-means algorithm was used to cluster the transformed samples in the vector space. The experiments show that considerable savings in time and computation are achieved with only a marginal drop in performance.

1. INTRODUCTION

Query by example aims at the automatic retrieval of samples from a database which are similar to an example provided by the user. The most accurate way of making a query is naturally the exhaustive full query, in which features are extracted for the query sample and for every database sample, and the distance from the query sample to every sample in the database is calculated. Similarity calculation between audio samples is used as an example throughout this paper. After feature extraction, a series of feature vectors is associated with each sample. Helén and Virtanen provided a closed-form solution for the Euclidean distance between probability density functions (pdfs) and proposed a full query using this distance measure for series of feature vectors [1]. However, in large, usually continuously updated multimedia databases, the full query is impractically slow and computationally expensive.
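
For concreteness, the following is a minimal Python sketch of the exhaustive full query just described; the distance argument stands for any sample-to-sample distance measure (such as the Euclidean distance between pdfs of Section 3.1), and all names are illustrative rather than taken from the paper.

    import numpy as np

    def full_query(query, database, distance, n_retrieved=5):
        """Exhaustive full query: compute the distance from the example to
        every database sample and return the indices of the closest ones."""
        dists = np.array([distance(query, x) for x in database])
        return np.argsort(dists)[:n_retrieved]

Its cost grows linearly with the database size, which is exactly what the method proposed below avoids.
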
In order to reduce the computation time, clustering can be applied as a preprocessing step. However, clustering techniques like k-means, which rely on vector operations when calculating the cluster centroids, cannot be applied directly to series of feature vectors. There are alternatives which can be applied; for example, the k-medoids algorithm, which is strongly related to k-means. The difference is that in k-means the cluster centroid is the mean of the cluster, whereas in k-medoids the cluster centroid is an actual sample in the database having the smallest sum of distances to the other samples in the cluster. The drawback is that during the clustering, the expensive distance calculations (discussed below) need to be carried out many times iteratively. Shapiro calculated distances from the database samples to predefined reference points; the query was then made only among the samples lying at nearly the same distance from the reference points [2]. In this way the number of distance calculations could be reduced, speeding up the query. Several indexing techniques have also been developed for the purpose. Spatial Access Methods (SAMs) utilize hierarchical tree structures to cluster the feature space [3]. Another common class of techniques, Metric Access Methods (MAMs), assumes only the availability of a distance function [4]. Clustering algorithms like k-means, again, cannot operate directly in the distance space. One obvious solution is therefore to map all the samples into an n-dimensional vector space while preserving the distances between all the database samples. Multidimensional Scaling (MDS) techniques like FastMap are known to preserve the distances, but they essentially require calculating the distances between all database pairs at some point [5]. Kiranyaz and Gabbouj proposed a technique called Progressive Query (PQ) that performs a series of intermediate queries, returns intermediate results to the user, and finally converges to the full-scale query [6].

Having the most promising intermediate queries (in other words, the most promising clusters of samples) processed first naturally improves the usability of this idea. Prior information on the underlying classes can be utilized in ranking the clusters; categorizing music into genres and training statistical models on the genres is one common example. Berenzweig, Ellis, and Lawrence proposed a method of mapping music into an anchor space [7]. In the general query by example situation, however, prior information enabling semantic or similar modeling cannot always be assumed. Moreover, such modeling may become costly and may even require hand-labeled training material.

We are aiming at a very general query by example method which would be practical in large multimedia databases. In this paper, a transformation from a series of feature vectors to a fixed-dimensional feature space is proposed. This enables the use of effective clustering methods in order to reduce the query times.

This paper is organized as follows. Section 2 gives an overview of the fast query by example system. Section 3 presents the proposed transformation and introduces the distance measure used here. Section 4 presents the experimental results of the fast search compared to the exhaustive full search. Finally, the conclusions are presented in Section 5.

[Fig. 1. Overview of the fast query by example system: for the database, feature extraction, GMM estimation, key-sample distance transformation (after key-sample selection), and clustering; for the example signal, feature extraction, GMM estimation, and the same key-sample distance transformation, followed by finding the nearest cluster, calculating distances inside the cluster, sorting by distance, and returning the similar database samples.]

2. SYSTEM OVERVIEW

An overview of the system is illustrated in Fig. 1. First, the features are extracted from the example signal given by the user. Second, a GMM which models the feature distribution is estimated using the expectation maximization (EM) algorithm; the same set of features and a GMM are estimated for each sample in the query database beforehand. Third, the samples are transformed into a k-dimensional space using the key-sample distance transformation (explained in Section 3). Fourth, the database is clustered into n clusters using standard clustering algorithms. Fifth, the example signal is compared against the clusters and the one with the shortest distance is chosen. Sixth, the example is compared against the signals in the chosen cluster one by one, and the similarity of each pair is estimated by the Euclidean distance between their pdfs. Finally, when all the similarity values have been calculated, a decision is made regarding the similarity of the samples to the example, and those considered similar are returned to the user.
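
As an illustration of the second step, the per-sample GMM estimation could look as follows. This is a minimal sketch, assuming scikit-learn's GaussianMixture as the EM implementation; the paper itself does not name a library, and the variance floor value is an arbitrary placeholder (Section 3.1 only states that the variances must be kept above a fixed minimum level).

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def estimate_gmm(features, n_components=8, min_var=1e-2):
        """Fit a diagonal-covariance GMM to one sample's series of feature
        vectors (shape: n_frames x n_features) with the EM algorithm."""
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type="diag",  # diagonal covariances, as in Section 3.1
            reg_covar=min_var,       # keeps low-variance components from
                                     # dominating the distance measure
        )
        gmm.fit(features)
        return gmm
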
3. KEY-SAMPLE DISTANCE TRANSFORMATION

Clustering is an important step in the query process, because the size of the database can easily grow so large that going linearly through the whole database would be too expensive. However, traditional clustering algorithms require setting cluster centroids, which are points in the feature space [8]. Unfortunately, if the samples consist of series of feature vectors, they cannot be placed in any fixed-dimensional feature space. One solution would be to apply some statistical measure (mean, median, ...) to the series of feature vectors, but then a lot of information that could be used in the distance calculation would be lost. The feature vectors could also be concatenated if the series were all of the same length, but since varying-length series are considered here, this does not solve the problem.

We propose a transformation which enables the usage of effective clustering methods in large databases where each sample is a series of feature vectors. The transformation is based on distances to key-samples chosen from the database, and is defined as

    T(x, O, d) : F \to \mathbb{R}^k,    (1)

where x is the original series of feature vectors, O is the set of k key-samples, d is the distance measure, F is the original feature space, and \mathbb{R}^k is the k-dimensional feature space in which the i-th element is the distance from x to the i-th key-sample (i = 1, ..., k).
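
A minimal Python sketch of Eq. (1): each sample is mapped to the vector of its distances to the k key-samples. The function and variable names are illustrative; distance stands for the measure d (the one used in the paper is sketched in Section 3.1).

    import numpy as np

    def key_sample_transform(samples, key_samples, distance):
        """Map every sample (a series of feature vectors, here represented
        by its GMM) to a k-dimensional vector whose i-th element is the
        distance to the i-th key-sample, as in Eq. (1)."""
        return np.array([[distance(x, o) for o in key_samples]
                         for x in samples])

    # usage sketch: draw 10 key-samples at random, then transform
    # rng = np.random.default_rng(0)
    # keys = [samples[i] for i in rng.choice(len(samples), 10, replace=False)]
    # V = key_sample_transform(samples, keys, gmm_euclidean_distance)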

[Fig. 2. The key-sample distance transformation: original feature set, random sampling, data transformation, clustering.]

The system is illustrated in Fig. 2. First, k samples are chosen randomly from the database to work as key-samples. Second, the distances from each sample in the database to these key-samples are calculated; the distances from one sample to all of the key-samples are taken as the feature vector of that sample. Third, the database is clustered using these new feature vectors. The samples are now points in a k-dimensional feature space, and thus traditional clustering algorithms can be applied. We used k-means clustering in the simulations, since it is known to be a very efficient algorithm. When a query is made, the nearest cluster to the query sample is found using the key-sample distances as features, and the actual query is made only inside the closest clusters. In order to achieve the most accurate results, we applied the Euclidean distance between pdfs [1] for the query inside the cluster.

The advantage of this transformation is a significant speedup of the clustering, since instead of series of feature vectors we can operate with single feature vectors. Simultaneously, we are able to use very accurate distance measures since, in contrast to the full search, only a small fraction of all sample pairs has to be evaluated.

3.1. Euclidean distance between pdfs

The distribution p(x) of the features of each sample is modelled using a Gaussian mixture model (GMM), defined as

    p(x) = \sum_{i=1}^{I} w_i N_i(x; \mu_i, \Sigma_i),    (2)

where w_i is the weight of the i-th component, I is the number of components, and N_i is the multivariate normal distribution with mean vector \mu_i and diagonal covariance matrix \Sigma_i. The weights are non-negative and sum to unity. The parameters of the GMMs (the means, variances, and weights for a fixed number of components) are estimated using the expectation maximization (EM) algorithm. It should be noted that the variances have to be restricted above a relatively high fixed minimum level, since low-variance components would otherwise dominate the measure.

The similarity of two samples is measured by the square of the Euclidean distance e between their distributions p_1(x) and p_2(x). This is obtained by integrating the squared difference over the whole feature space:

    e = \int \cdots \int [p_1(x) - p_2(x)]^2 \, dx_1 \cdots dx_N.    (3)

Helén and Virtanen derived a closed-form solution for this in [1]:

    e = \sum_{i=1}^{I} \sum_{j=1}^{I} w_i w_j Q_{i,j,1,1} - 2 \sum_{i=1}^{I} \sum_{j=1}^{J} w_i v_j Q_{i,j,1,2} + \sum_{i=1}^{J} \sum_{j=1}^{J} v_i v_j Q_{i,j,2,2},    (4)

where w_i and w_j are the weights of the i-th and j-th components of GMM 1, v_i and v_j are the weights of the i-th and j-th components of GMM 2, and I and J are the numbers of components in GMM 1 and GMM 2, respectively. Q_{i,j,k,m} denotes the integral of the product of the i-th component of GMM k ∈ {1, 2} and the j-th component of GMM m ∈ {1, 2}:

    \int N_1(x; \mu_1, \Sigma_1) N_2(x; \mu_2, \Sigma_2) \, dx = \frac{1}{(2\pi)^{N/2} \prod_{n=1}^{N} \sqrt{\sigma_{1,n}^2 + \sigma_{2,n}^2}} \exp\left[ -\frac{1}{2} \sum_{n=1}^{N} \frac{(\mu_{1,n} - \mu_{2,n})^2}{\sigma_{1,n}^2 + \sigma_{2,n}^2} \right],    (5)

where \mu_{k,n} is the n-th entry of the mean vector \mu_k, k ∈ {1, 2}, and \sigma_{k,n}^2 is the corresponding variance.
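
A direct Python transcription of Eqs. (4) and (5) might look as follows. This is a sketch under the assumption that the GMMs are objects exposing weights_, means_, and diagonal covariances_ arrays (as a scikit-learn GaussianMixture with covariance_type='diag' does); the function names are illustrative.

    import numpy as np

    def gauss_product_integral(mu1, var1, mu2, var2):
        """Integral of the product of two diagonal-covariance Gaussians, Eq. (5)."""
        s = var1 + var2                      # elementwise sigma_1^2 + sigma_2^2
        norm = (2.0 * np.pi) ** (len(mu1) / 2.0) * np.sqrt(np.prod(s))
        return np.exp(-0.5 * np.sum((mu1 - mu2) ** 2 / s)) / norm

    def gmm_euclidean_distance(gmm1, gmm2):
        """Squared Euclidean distance between two GMM densities, Eq. (4)."""
        def cross(a, b):
            # double sum of weight-weighted product integrals Q_{i,j,.,.}
            return sum(wa * wb * gauss_product_integral(ma, va, mb, vb)
                       for wa, ma, va in zip(a.weights_, a.means_, a.covariances_)
                       for wb, mb, vb in zip(b.weights_, b.means_, b.covariances_))
        return cross(gmm1, gmm1) - 2.0 * cross(gmm1, gmm2) + cross(gmm2, gmm2)
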
4. EXPERIMENTAL RESULTS

To evaluate the performance of the proposed transformation in query by example, the following simulations were run. The audio database used in the tests contained 1332 samples with a 16 kHz sampling rate. The samples were manually annotated into 4 main categories and 17 subcategories, and samples falling into the same class were considered to be similar. The classes and the number of samples in each class are listed in Table 1.

Table 1. Classes used in the simulations.

    Main class           Subclasses
    Environmental (231)  Inside car (151), In restaurant (42), Traffic (38)
    Music (620)          Acoustic (264), Drums (56), Electroacoustic (249), Symphony (51)
    Sing (165)           Humming (52), Singing (60), Whistling (53)
    Speech (316)         Speaker1 (50), Speaker2 (47), Speaker3 (44), Speaker4 (40), Speaker5 (47), Speaker6 (38), Speaker7 (50)

The representative samples for each class were selected by listening and choosing, for each class, the samples which do not contain content from the other classes. The samples of the Environmental class are taken from the CASR recordings [9]; the subclasses correspond to the classes in CASR (car, restaurant, road). The drum samples are the acoustic drum sequences used by Paulus [10]. The rest of the Music subclasses are from the RWC Music Database [11]: the Acoustic class is from the RWC Jazz Music Database, Electroacoustic is from the RWC Popular Music Database, and Symphony is from the RWC Classical Music Database. The Sing class was taken from the Vox database presented in [12].

The speech samples are from the CMU Arctic speech database [13]. All the samples in our database are 10 seconds long. The length of the speech samples in the Arctic database is 2-4 seconds, so the samples from each speaker were combined to obtain 10-second samples. The original samples in the other databases are longer than 10 seconds, so random 10-second clips were cut from them.

4.1. Feature extraction

Feature extraction aims at modelling the perceptually most relevant information in the original signal using only a small number of parameters. Most features are extracted in short (20-60 ms) frames, and typically they parametrize the spectrum of the sound, because in comparison to the time-domain signal, the spectrum correlates better with human sound perception. Since we are aiming at a very general audio signal query, we chose features which measure different properties of the sound. In our earlier studies, different feature sets were tested, and the best feature set was chosen based on those experiments [1] [14]. The frequency content of a frame is described using three Mel-frequency cepstral coefficients, spectral centroid, noise likeness [15], spectral spread, spectral flux, harmonic ratio [16], and maximum autocorrelation lag. Temporal characteristics of the signal are described using zero crossing rate, crest factor, total energy, and variance of instantaneous power. The features are extracted in 46 ms frames, and each feature is normalized to have zero mean and unity variance over the whole database. The total number of features is 13.

4.2. Evaluation procedure

The database was first transformed and clustered using the proposed method. One sample at a time was drawn from the database to serve as the query sample, and the rest were considered as the database. The nearest cluster was found by calculating the Euclidean distance between the transformed query sample and the cluster centroids. The query was then made inside the nearest cluster using the original series of feature vectors of the samples and the Euclidean distance between their pdfs. The number of Gaussian components used in the simulations was 8. In addition to the nearest cluster, the effect of searching the two and three nearest clusters was also tested.

If a retrieved sample was labeled with the same class as the query sample, it was counted as correctly retrieved from the database. The query results were compared to the full search, which gives an upper limit for the precision achievable with optimal clustering. The results are presented here in terms of the precision rate, the portion of correctly retrieved samples over all samples retrieved from the database:

    \text{precision} = \frac{c}{r},    (6)

where c is the number of correctly retrieved samples and r is the number of all retrieved samples.
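
The evaluation procedure above could be sketched in Python as follows, reusing the helpers sketched in earlier sections. The 17-cluster k-means setting matches the experiments, but everything else (names, structure, labels being a NumPy array of class annotations) is an illustrative assumption.

    import numpy as np
    from sklearn.cluster import KMeans

    # V: transformed database (n_samples x k), gmms: per-sample GMMs,
    # labels: np.array of class annotations; all produced as sketched earlier.
    def evaluate(V, gmms, labels, n_clusters=17, n_retrieved=5):
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(V)
        precisions = []
        for q in range(len(gmms)):           # leave-one-out over the database
            nearest = np.argmin(np.linalg.norm(km.cluster_centers_ - V[q], axis=1))
            members = np.flatnonzero(km.labels_ == nearest)
            members = members[members != q]  # the rest act as the database
            if members.size == 0:
                continue
            # accurate pdf distance only inside the chosen cluster
            dists = [gmm_euclidean_distance(gmms[q], gmms[i]) for i in members]
            retrieved = members[np.argsort(dists)][:n_retrieved]
            precisions.append(np.mean(labels[retrieved] == labels[q]))  # Eq. (6)
        return float(np.mean(precisions))
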
4.3. Results

Figure 3 presents the results of query by example using the key-sample distance transformation with 17 clusters, compared to the full search. Here the 5 most similar samples are retrieved from the database. The figure illustrates how the precision changes with the number of key-samples. As can be seen, a higher number of key-samples results in higher precision. However, the improvement in precision is quite small beyond 10 key-samples. It is advantageous to keep the number of key-samples as small as possible, because the distances from these samples to all the other samples in the database have to be calculated, and thus with a large number of key-samples the clustering becomes expensive. Compared to the full search, the difference in precision when using 10 key-samples is only 3 percentage points. On the other hand, the speedup of the query is directly proportional to the number of clusters, assuming that the clustering is done offline and the time to find the nearest cluster is negligible. The search for the nearest cluster may not always be negligible, since the distances from the query sample to all key-samples have to be calculated; with 10 key-samples, however, the computation time is acceptable.

[Fig. 3. Precision values when the number of key-samples is changing (fast search vs. full search; precision as a function of the number of key-samples).]

Figure 4 illustrates the effect that the number of clusters has on the retrieval accuracy. Here 10 key-samples are used and the 5 most similar samples are retrieved. The one-cluster case corresponds to the full search, and, as expected, the search accuracy decreases as the number of clusters is increased.

[Fig. 4. Precision values when the number of clusters is changing (fast search within the 1, 2, and 3 nearest clusters vs. full search; precision as a function of the number of clusters).]

The choice of how many clusters should be used depends on the application. When the number of clusters is low, the clustering phase is faster but the query is slower; likewise, when the number of clusters is high, the query is fast but the clustering is slow. Since the clustering phase can be done offline, the query part is usually the more critical one. The accuracy of the query can be increased by searching for similar samples in the other nearby clusters as well. Figure 4 also illustrates the effect of making the query in the two or three nearest clusters: in these cases the precision is very close to the full search even with cluster counts as high as 50.

Table 2 presents the confusion matrix of the query with 10 key-samples and 17 clusters, when the 10 most similar samples were retrieved. The overall precision in this test case was 91.1 %. It can be seen that almost all falsely retrieved samples were nevertheless inside the correct main class. The worst cases were acoustic vs. electroacoustic music and singing vs. humming vs. whistling. These errors are understandable, because those classes are close to each other for a human listener as well.

[Table 2. Confusion matrix for the proposed method when the 10 most similar samples are retrieved, over the 17 subclasses of Table 1.]

5. CONCLUSIONS AND FUTURE WORK

A novel method for speeding up query by example in large databases using a key-sample distance transformation and clustering was proposed. The method was tested on audio query by example, but it is applicable to any query by example or classification task. The running time of query by example was reduced significantly (to less than one tenth) compared to the full search, while the precision dropped by only 3 percentage points when the search was made only inside the closest cluster. When the search was expanded to the 2 or 3 nearest clusters, the difference in precision to the full search was only around 1 percentage point.

In our future work we will concentrate on choosing the number of clusters and on updating the clusters as the number of samples in the database changes. When samples are added to or removed from the database, the existing clusters must be updated, since the samples inside the clusters change. Furthermore, at some point clusters must be split or combined in order to maintain the desired cluster size.

6. ACKNOWLEDGEMENTS

This work was supported by the Academy of Finland (Finnish Centre of Excellence program) and Nokia Research Center.

7. REFERENCES

[1] M. Helén and T. Virtanen, "Query by Example of Audio Signals Using Euclidean Distance Between Gaussian Mixture Models," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), Honolulu, Hawaii, USA, Apr. 2007.

[2] M. Shapiro, "The Choice of Reference Points in Best-Match File Searching," Communications of the ACM, vol. 20, no. 5, 1977.

[3] V. Gaede and O. Günther, "Multidimensional Access Methods," ACM Computing Surveys, vol. 30, no. 2, 1998.

[4] C. Traina, A. Traina, B. Seeger, and C. Faloutsos, "Slim-Trees: High Performance Metric Trees Minimizing Overlap Between Nodes," Lecture Notes in Computer Science, vol. 1777, 2000.

[5] C. Faloutsos and K.-I. Lin, "FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets," in Proc. ACM SIGMOD International Conference on Management of Data, San Jose, California, 1995.

[6] S. Kiranyaz and M. Gabbouj, "A Novel Multimedia Retrieval Technique: Progressive Query (WHY WAIT?)," IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 3, June 2005.

[7] A. Berenzweig, D. P. W. Ellis, and S. Lawrence, "Anchor Space for Classification and Similarity Measurement of Music," in Proc. International Conference on Multimedia and Expo (ICME '03), 2003.

[8] H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, and A. El Abbadi, "Approximate Nearest Neighbor Searching in Multimedia Databases," in Proc. 17th International Conference on Data Engineering, Heidelberg, Germany, 2001.

[9] V. Peltonen, J. Tuomi, A. Klapuri, J. Huopaniemi, and T. Sorsa, "Computational Auditory Scene Recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Florida, May 2002.

[10] J. Paulus and T. Virtanen, "Drum Transcription with Non-negative Spectrogram Factorisation," in Proc. 13th European Signal Processing Conference (EUSIPCO 2005), Antalya, Turkey, Sept. 2005.

[11] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka, "RWC Music Database: Popular, Classical, and Jazz Music Databases," in Proc. 3rd International Conference on Music Information Retrieval, Oct. 2002.

[12] T. Viitaniemi, A. Klapuri, and A. Eronen, "A Probabilistic Model for the Transcription of Single-Voice Melodies," in Proc. Finnish Signal Processing Symposium (FINSIG '03), Finland, May 2003.

[13] J. Kominek and A. Black, "The CMU Arctic Speech Databases," in Proc. 5th ISCA Speech Synthesis Workshop, Pittsburgh, USA, 2004.

[14] M. Helén and T. Lahti, "Query by Example Methods for Audio Signals," in Proc. 7th IEEE Nordic Signal Processing Symposium, Iceland, June 2006.

[15] C. Uhle, C. Dittmar, and T. Sporer, "Extraction of Drum Tracks From Polyphonic Music Using Independent Subspace Analysis," in Proc. 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), Nara, Japan, Apr. 2003.

[16] J. J. Burred and A. Lerch, "A Hierarchical Approach to Automatic Musical Genre Classification," in Proc. 6th International Conference on Digital Audio Effects (DAFX), London, UK, Sept. 2003.
