Hierarchical 3-D von Mises-Fisher Mixture Model


Md. Abul Hasnat, Université Jean Monnet, Saint-Étienne, France. Olivier Alata, Université Jean Monnet, Saint-Étienne, France. Alain Trémeau, Université Jean Monnet, Saint-Étienne, France.

Abstract

In this paper, we propose a complete method for clustering data which are in the form of unit vectors. The solution consists of a distribution-based clustering algorithm with the assumption of a generative model. In the model, the data are generated from a finite statistical mixture model based on the von Mises-Fisher (vmf) distribution. Initially, the Bregman soft clustering algorithm is applied to obtain the parameters of the vmf mixture model (vmf-mm) for a certain maximum number of components. Then, a hierarchy of mixture models is generated from these parameters. The hierarchy is built by using the Bregman divergence both to compute the dissimilarity among distributions and to fuse/merge the centroids of the clusters. After constructing the hierarchy, the Kullback-Leibler divergence (KLD) is used to compute the distance between statistical mixture models with different numbers of components. Finally, a threshold on the KLD value is used to select the number of components of the mixture model. The proposed method is called the Hierarchical 3-D von Mises-Fisher mixture model. We validated the method by applying it on simulated data. Additionally, we applied the proposed method to cluster image normals, which are computed from depth images. As an outcome of the clustering, we obtained a bottom-up segmentation of the depth image. The obtained results confirm our assumption that the proposed method can be a potential tool to analyze depth images.

Proceedings of the 1st Workshop on Divergences and Divergence Learning (at ICML 2013), Atlanta, Georgia, USA. Copyright 2013 by the author(s).

1. Introduction

Data/features in the form of a unit vector exhibit directional behavior. For this type of feature, directional distributions (Mardia & Jupp 2000) are the standard choice to construct a statistical mixture model (Murphy 2012). Such data frequently appear in a variety of domains, e.g. to analyze images, speech signals, text documents, gene expressions (Banerjee et al. 2005) or treatment beams (Bangert et al. 2010). The sample space for directional distributions is the circle, the sphere or a hypersphere. The most prominent distributions in directional statistics (von Mises-Fisher, Kent, Watson, Bingham, etc.) belong to the exponential family of distributions (Mardia & Jupp 2000). Directional distributions are associated with complicated normalizing constants. For this reason, an analytical solution for the maximum likelihood estimate (MLE) of the parameters, even for a single distribution, is difficult to obtain (Sra 2012). The minimal set of parameters of a directional distribution is the mean and the concentration. A satisfactory approximation is available (Mardia & Jupp 2000) for lower dimensional data. However, for higher dimensional data, estimation of the concentration parameter is non-trivial since it involves functional inversion of ratios of special functions such as Bessel functions (Banerjee et al. 2005). The fundamental directional distribution is the von Mises-Fisher (vmf) distribution, which is also called the Fisher distribution for d = 3 (Mardia & Jupp 2000). It models data concentrated around a mean direction. A heuristic approximation of the parameters for higher dimensional data distributed according to a mixture of vmf distributions is given by (Banerjee et al. 2005).
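For d = 3 this heuristic is particularly simple. The sketch below (an illustrative aside, not code from the paper) estimates the mean direction and, using the approximation of Banerjee et al. (2005), the concentration of a single vmf component from a sample of unit vectors; the function name and the use of NumPy are our own choices.

```python
import numpy as np

def estimate_vmf_parameters(X, d=3):
    """Approximate MLE for a single vMF component (heuristic of Banerjee et al. 2005).

    X : (N, d) array of unit vectors.
    Returns the mean direction mu and an approximate concentration kappa.
    """
    r = X.mean(axis=0)                  # resultant vector
    r_bar = np.linalg.norm(r)           # mean resultant length, in (0, 1)
    mu = r / r_bar                      # MLE of the mean direction
    # Approximation that avoids inverting ratios of Bessel functions
    kappa = r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)
    return mu, kappa
```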
In the data clustering problem, the statistical mixture model is one of the most prominent and widely used tools. It consists of a base probability distribution for the observed data and a prior probability for the clusters that generate the data samples (Murphy 2012). Therefore, a mixture model is a powerful mechanism to explain the appearance of data through a generative process.

Expectation Maximization (EM) is the most common algorithm to learn the parameters of a mixture model. The standard EM approach maximizes the log likelihood of the data (Murphy 2012), while respecting constraints in the optimization goal. However, the maximization step (M-step) of the EM algorithm is often computationally expensive. Banerjee et al. (2005) proposed the Bregman soft clustering algorithm for MLE of the parameters. The algorithm has the following attractive features: (a) it is equivalent to EM for mixtures of exponential families; (b) it simplifies the computationally expensive M-step; (c) it is applicable to mixed data types; and (d) its computational complexity is linear in the number of data points.

Bregman soft clustering is a centroid-based parametric clustering approach (like k-means) which arises from a particular choice of Bregman divergence (Banerjee et al. 2005). Bregman divergences include a large number of distortion functions commonly used in data clustering problems. Due to the bijection between Bregman divergences and the exponential family, the maximization step for the density parameters in the EM algorithm reduces to a simple weighted averaging step (equivalent to updating the centroid of the cluster). Moreover, the Bregman divergence can be applied to compute the relative entropy (KLD) between statistical distributions that belong to the exponential family. Therefore, in a hierarchical mixture model (Garcia & Nielsen 2010) the Bregman divergence can be effectively used to compute the dissimilarity matrix. These benefits of using the Bregman divergence provide a strong motivation to design a clustering model that estimates the parameters as well as the optimal number of components.
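To make the role of the Bregman divergence concrete, the short sketch below (an illustrative aside, not part of the original paper) evaluates the generic definition $B_F(x, y) = F(x) - F(y) - \langle x - y, \nabla F(y) \rangle$ and recovers two familiar distortion functions as special cases of the convex generator F.

```python
import numpy as np

def bregman_divergence(F, grad_F, x, y):
    """Generic Bregman divergence B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>."""
    return F(x) - F(y) - np.dot(x - y, grad_F(y))

# F(x) = ||x||^2 yields the squared Euclidean distance.
sq_norm = lambda x: np.dot(x, x)
grad_sq_norm = lambda x: 2.0 * x

# F(p) = sum_i p_i log p_i (negative Shannon entropy) yields the KL divergence
# between probability vectors p and q.
neg_entropy = lambda p: np.sum(p * np.log(p))
grad_neg_entropy = lambda p: np.log(p) + 1.0

x, y = np.array([1.0, 2.0]), np.array([0.5, 1.5])
print(bregman_divergence(sq_norm, grad_sq_norm, x, y))          # equals ||x - y||^2
p, q = np.array([0.7, 0.3]), np.array([0.4, 0.6])
print(bregman_divergence(neg_entropy, grad_neg_entropy, p, q))  # equals KL(p || q)
```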
A statistical distribution can take advantage of the Bregman divergence if its canonical exponential family representation is available. While such a representation exists for several distributions (Nielsen & Garcia 2009), the vmf distribution is yet to have one. Finite mixtures of vmf distributions were introduced by (Banerjee et al. 2005), who proposed an EM algorithm for MLE of the parameters. However, they did not address the issue of component selection. A nonparametric Bayesian framework based on an infinite vmf mixture model was proposed by (Bangert et al. 2010), where a strategy to obtain the number of components of the model is discussed. However, that model is nondeterministic and computationally expensive. A nonlinear least-squares technique to compute a vmf mixture model was proposed by (McGraw et al. 2006), which is different from the family of methods we consider in this research.

In this paper, we propose a hierarchical von Mises-Fisher mixture model (H-vMF-MM) on the 2-D sphere (i.e., for unit vectors in 3-D Euclidean space). The model addresses several prominent issues: parameter estimation, selection of the number of components, and computational efficiency. In order to adapt our clustering problem to Bregman soft clustering, we derive the canonical exponential family representation of the vmf distribution. We compute the Bregman divergence by exploiting the expectation parameters and the Legendre dual of the log normalizing function (Banerjee et al. 2005). The clustering approach begins with the Bregman soft clustering algorithm considering a maximum number of components, called K_max, in the mixture. A hierarchical clustering is then performed on the components of the vmf-mm. This provides a hierarchical structure as well as the parameters of simplified mixture models (Garcia & Nielsen 2010) with K_max down to 1 components. Finally, we apply a KLD-based threshold among the different mixture models in order to obtain the desired number of components. Experiments on simulated data exhibit satisfactory clustering performance of the proposed approach. We applied the proposed method to analyze depth images (captured by a Microsoft Kinect camera) with the aim of obtaining a bottom-up image segmentation. For this purpose, we consider the image normal (a 3-dimensional unit vector describing the surface property at each pixel) as the feature vector. We conducted experiments on the NYU depth dataset (Silberman et al. 2012). Experimental results show that the proposed method can be considered a very useful tool for analyzing depth images captured by range sensing devices.

Our key contributions in this paper are: (a) we provide a mathematical formulation to compute the Bregman divergence among vmf distributions (for d = 3); (b) we exploit this divergence to design a computationally efficient hierarchical clustering scheme. Overall, we propose a complete clustering algorithm. We validate the performance of the proposed algorithm by applying it on simulated data as well as on real image features.

The remainder of this paper is structured as follows: Section 2 describes the complete clustering model. Experimental results followed by a discussion are reported in Section 3. Finally, Section 4 draws conclusions and outlines possible future extensions of the model.

2. Hierarchical 3-D von Mises-Fisher (vmf) Mixture Model

2.1 vmf distributions and their canonical form (for d = 3)

EXPONENTIAL FAMILY OF DISTRIBUTIONS

A multivariate probability density function belongs to the exponential family if it has the following form (Murphy 2012; Banerjee et al. 2005; Garcia & Nielsen 2010):

$f(x; \theta) = \exp\big( \langle t(x), \theta \rangle - F(\theta) + k(x) \big)$    (1)

Here, $t(x)$ denotes the sufficient statistics, $\theta$ denotes the natural parameters, $F(\theta)$ is the log partition function, $k(x)$ is the carrier measure and $\langle \cdot, \cdot \rangle$ is the inner product.

The expectation of the sufficient statistics w.r.t. the density function (Eq. (1)) is called the expectation parameter (Banerjee et al. 2005), which is computed as $\eta = E_{f}[t(x)]$. There exists a one-to-one correspondence between the expectation and natural parameters, which allows them to span spaces that exhibit a dual relationship (Banerjee et al. 2005). The relationship between these two forms of parameters can be expressed as:

$\eta = \nabla F(\theta)$    (2)

Here, $\nabla F$ is the gradient of $F$.

VON MISES-FISHER DISTRIBUTIONS FOR 3 DIMENSIONS

For a d-dimensional random unit vector $x$ (i.e. $x \in \mathbb{R}^d$ and $\|x\| = 1$), the vmf distribution on the (d-1)-dimensional sphere $S^{d-1}$ is defined as (Mardia & Jupp 2000):

$f(x; \mu, \kappa) = c_d(\kappa) \exp(\kappa \mu^{T} x)$    (3)

Here, $\mu$ denotes the mean direction (with $\|\mu\| = 1$) and $\kappa$ denotes the concentration parameter (with $\kappa \geq 0$). The normalization constant is equal to:

$c_d(\kappa) = \dfrac{\kappa^{d/2-1}}{(2\pi)^{d/2} I_{d/2-1}(\kappa)}$

Here $I_r$ represents the modified Bessel function of the first kind and order r. For d = 3, the normalizing factor of the distribution is (Mardia & Jupp 2000):

$c_3(\kappa) = \dfrac{\kappa}{2\pi (e^{\kappa} - e^{-\kappa})}$

Therefore we can rewrite Eq. (3) as:

$f(x; \mu, \kappa) = \dfrac{\kappa}{2\pi (e^{\kappa} - e^{-\kappa})} \exp(\kappa \mu^{T} x)$, for d = 3    (4)

2.2 Bregman divergence (BD) for vmf (d = 3) distributions

BD FOR EXPONENTIAL FAMILY OF DISTRIBUTIONS

For a strictly convex function F (with natural parameters $\theta_1$ and $\theta_2$), the BD can be formally defined as (Banerjee et al. 2005):

$B_F(\theta_1, \theta_2) = F(\theta_1) - F(\theta_2) - \langle \theta_1 - \theta_2, \nabla F(\theta_2) \rangle$    (5)

Here, $F$ is the log normalizing function of the natural parameter and $\nabla F$ is its gradient. The one-to-one correspondence (Eq. (2)) between the natural and expectation parameters provides an equivalent form of the BD (an alternative to Eq. (5)) as:

$B_F(\theta_1, \theta_2) = B_{F^*}(\eta_2, \eta_1) = F^*(\eta_2) - F^*(\eta_1) - \langle \eta_2 - \eta_1, \nabla F^*(\eta_1) \rangle$    (6)

Here, $F^*$ is the Legendre dual of the log normalizing function (Banerjee et al. 2005) and $\nabla F^*$ is its gradient. $F^*$ can be computed by exploiting its relationship with the log normalization function (Banerjee et al. 2005), which is expressed as:

$F^*(\eta) = \langle \theta, \eta \rangle - F(\theta)$, with $\theta = \nabla F^*(\eta) = (\nabla F)^{-1}(\eta)$    (7)

Due to the bijection¹ between BDs and exponential families, Eqs. (5) and (6) can be used to compute the distance between exponential family distributions.

¹ The bijection is expressed as $f(x; \theta) = \exp\big(-B_{F^*}(t(x), \eta)\big)\, b_{F^*}(x)$, where $b_{F^*}$ is a uniquely determined function. For more details, please see Theorem 3 of (Banerjee et al. 2005).

BD AMONG VMF (3 DIMENSIONS) DISTRIBUTIONS

Considering the canonical representation of the exponential family (Eq. (1)), the vmf for d = 3 (Eq. (4)) can be decomposed as: sufficient statistics $t(x) = x$, natural parameter $\theta = \kappa \mu$, log normalizing function $F(\theta) = \log\big( 2\pi (e^{\|\theta\|} - e^{-\|\theta\|}) / \|\theta\| \big)$ and carrier measure $k(x) = 0$. The mean and concentration parameters can be written in terms of the natural parameter as:

$\kappa = \|\theta\|, \quad \mu = \theta / \|\theta\|$    (8)

The gradient of the log normalizing function can be written as:

$\nabla F(\theta) = \left( \dfrac{e^{\|\theta\|} + e^{-\|\theta\|}}{e^{\|\theta\|} - e^{-\|\theta\|}} - \dfrac{1}{\|\theta\|} \right) \dfrac{\theta}{\|\theta\|}$    (9)

Considering Eq. (2) we can write $\eta = \nabla F(\theta)$ and, from Eq. (7), $\theta = \nabla F^*(\eta)$. Since $\theta$ and $\eta$ are collinear, we can now write:

$F^*(\eta) = \langle \theta, \eta \rangle - F(\theta) = \|\theta\| \|\eta\| - \log\big( 2\pi (e^{\|\theta\|} - e^{-\|\theta\|}) / \|\theta\| \big)$    (10)

From Eq. (9), using the property of collinear vectors, we can also write:

$\|\eta\| = \dfrac{e^{\|\theta\|} + e^{-\|\theta\|}}{e^{\|\theta\|} - e^{-\|\theta\|}} - \dfrac{1}{\|\theta\|}$    (11)

We can apply the Newton-Raphson method to compute $\|\theta\|$ from $\|\eta\|$ using the following iterative update equation:

$\|\theta\|_{t+1} = \|\theta\|_{t} - \dfrac{A(\|\theta\|_{t}) - \|\eta\|}{A'(\|\theta\|_{t})}$    (12)

where $A(\kappa) = \coth(\kappa) - 1/\kappa$ denotes the right-hand side of Eq. (11) and $A'(\kappa) = 1/\kappa^{2} - 1/\sinh^{2}(\kappa)$ is its derivative.
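The sketch below implements these conversions numerically for d = 3, under the decomposition derived above ($\theta = \kappa\mu$, $t(x) = x$, and $F(\theta) = \log(4\pi \sinh\|\theta\| / \|\theta\|)$, which is the same quantity as in Eq. (10)); the function names, the Banerjee-style starting point for the Newton-Raphson iteration and the fixed iteration count are illustrative choices rather than details taken from the paper.

```python
import numpy as np

def F_natural(theta):
    """Log normalizer F(theta) of the vMF (d = 3), theta = kappa * mu.
    F(theta) = log(2*pi*(exp(k) - exp(-k)) / k) = log(4*pi*sinh(k)/k), with k = ||theta||."""
    k = np.linalg.norm(theta)
    # written with logs to stay finite for large kappa
    return np.log(4.0 * np.pi) + k + np.log1p(-np.exp(-2.0 * k)) - np.log(2.0 * k)

def grad_F(theta):
    """Expectation parameter eta = grad F(theta), Eq. (9)."""
    k = np.linalg.norm(theta)
    return (1.0 / np.tanh(k) - 1.0 / k) * theta / k

def theta_from_eta(eta, n_iter=20):
    """Invert eta -> theta by Newton-Raphson on Eqs. (11)-(12): A(k) = coth(k) - 1/k = ||eta||."""
    r = np.linalg.norm(eta)
    k = r * (3.0 - r ** 2) / (1.0 - r ** 2)      # Banerjee et al. (2005) starting point
    for _ in range(n_iter):
        A = 1.0 / np.tanh(k) - 1.0 / k
        A_prime = 1.0 / k ** 2 - 1.0 / np.sinh(k) ** 2
        k -= (A - r) / A_prime
    return k * eta / r

def F_dual(eta):
    """Legendre dual F*(eta) = <theta(eta), eta> - F(theta(eta)), Eqs. (7) and (10)."""
    theta = theta_from_eta(eta)
    return float(np.dot(theta, eta)) - F_natural(theta)

def bregman_dual(eta1, eta2):
    """Expectation-parameter form of the divergence, B_{F*}(eta1, eta2), cf. Eq. (6)."""
    theta2 = theta_from_eta(eta2)
    return F_dual(eta1) - F_dual(eta2) - float(np.dot(eta1 - eta2, theta2))
```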

Considering (Nielsen & Garcia 2009), we can use Eqs. (6), (10), (11) and (12) to compute the BD between clusters. However, this computation is not appropriate for computing the divergence between a sample and a cluster. The reason is that, for a vmf sample (a unit vector), the concentration computed from the corresponding expectation parameter is very high and the dual log normalizer $F^*$ eventually diverges. This problem has already been addressed for the univariate Gaussian, and a corresponding solution was proposed by (Garcia & Nielsen 2010; Nielsen & Garcia 2009). Adopting that solution in our case, we can write the simplified BD (a simplification of Eq. (6)) as:

$\tilde{B}_{F^*}(x, \eta_j) = -\big[ F^*(\eta_j) + \langle x - \eta_j, \theta_j \rangle \big]$    (13)

2.3 Clustering

A generative model (Murphy 2012) is assumed for the appearance of the directional data clusters. The model consists of a mixture of vmf distributions (Banerjee et al. 2005):

$f(x_i \mid \Theta) = \sum_{j=1}^{K} \pi_j\, f_j(x_i \mid \theta_j)$    (14)

Here, $x_i$ denotes a single sample, $\Theta = \{\pi_j, \theta_j\}_{j=1}^{K}$ is the set of component parameters, $\pi_j$ is the mixing proportion and $f_j$ is the vmf distribution of a particular component.

BREGMAN SOFT CLUSTERING WITH FIXED K

The solution of this clustering problem (Eq. (14)) is to compute the MLE of the parameters of each component (vmf distribution) (Banerjee et al. 2005). Therefore, the goal is to obtain $\Theta^*$ such that:

$\Theta^* = \arg\max_{\Theta} \sum_{i=1}^{N} \log \sum_{j=1}^{K} \pi_j\, f_j(x_i \mid \theta_j)$, with $\sum_{j=1}^{K} \pi_j = 1$ and $\pi_j \geq 0$    (15)

Here, $X = \{x_1, \ldots, x_N\}$ and N denotes the total number of data samples. Bregman soft clustering exploits the BD in the EM framework (Murphy 2012) to compute the MLE of the model (mixture of exponential family distributions) parameters and provides a soft clustering of the dataset (Banerjee et al. 2005). In the expectation step (E-step) of the algorithm, the posterior probability is computed as:

$p(j \mid x_i) = \dfrac{\pi_j \exp\big(-\tilde{B}_{F^*}(x_i, \eta_j)\big)}{\sum_{l=1}^{K} \pi_l \exp\big(-\tilde{B}_{F^*}(x_i, \eta_l)\big)}$    (16)

Here, $x_i$ denotes the expectation parameter associated with data sample i and $\eta_j$ denotes the expectation parameter of cluster j. Eq. (13) is used to compute the BD. The maximization step (M-step) updates the mixing proportion and the expectation parameter of each class as (Banerjee et al. 2005):

$\pi_j = \dfrac{1}{N} \sum_{i=1}^{N} p(j \mid x_i), \qquad \eta_j = \dfrac{\sum_{i=1}^{N} p(j \mid x_i)\, x_i}{\sum_{i=1}^{N} p(j \mid x_i)}$    (17)
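A minimal sketch of this E-step/M-step loop for the 3-D vmf-mm is given below. It reuses F_natural and theta_from_eta from the sketch in Section 2.2, initialises the expectation parameters from randomly chosen (shrunk) data points rather than with k-means++ as the paper does, and should be read as one possible reconstruction of Eqs. (15)-(17), not as the authors' implementation.

```python
import numpy as np
# Reuses F_natural and theta_from_eta from the sketch in Section 2.2.

def sample_divergence(X, theta):
    """Simplified sample-to-cluster divergence of Eq. (13), written as F(theta) - <x, theta>,
    i.e. the negative vMF log-density up to a term that does not depend on the cluster."""
    return F_natural(theta) - X @ theta

def bregman_soft_clustering(X, K, n_iter=50, seed=0):
    """EM / Bregman soft clustering for a K-component vMF-MM in 3-D (Eqs. (15)-(17))."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    pi = np.full(K, 1.0 / K)
    # K random data points, shrunk so that the expectation parameters lie inside the unit ball
    eta = 0.9 * X[rng.choice(N, size=K, replace=False)]
    for _ in range(n_iter):
        theta = np.stack([theta_from_eta(e) for e in eta])
        # E-step (Eq. (16)): responsibilities p(j | x_i) proportional to pi_j * exp(-divergence)
        log_resp = np.log(pi)[None, :] - np.stack(
            [sample_divergence(X, theta[j]) for j in range(K)], axis=1)
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step (Eq. (17)): mixing proportions and weighted-average expectation parameters
        pi = resp.mean(axis=0)
        eta = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return pi, eta, resp
```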
HIERARCHICAL CLUSTERING

In the previous step, we applied Bregman soft clustering with a fixed number of components. Let us denote this number K_max for further usage, where K_max is the maximum number of clusters to begin with. In the context of data clustering with a finite mixture model, the choice of the number of components is one of the principal problems (Murphy 2012). In our approach we address this problem by first generating a hierarchical structure consisting of K_max down to 1 components and then choosing the optimal number of components from this hierarchical structure. We apply agglomerative hierarchical clustering (Murphy 2012; Garcia & Nielsen 2010) on the mixture model parameters (computed after applying Bregman soft clustering). The dissimilarity matrix is computed using a BD-based sided distance (Garcia & Nielsen 2010). The left-sided divergence based on the expectation parameters (the type of divergence is chosen empirically, as explained in Section 3.1.3) of Eq. (6) is chosen to compute the distance between two clusters as:

$d(c_j, c_l) = B_{F^*}(\eta_j, \eta_l)$    (18)

The average distance criterion (Murphy 2012) is chosen empirically (explained in Section 3.1.3) as the linkage criterion of the hierarchical clustering in order to determine the order of subset merging.

2.4 Hierarchical mixture model for vmf distributions

We consider a statistical mixture model consisting of K_max vmf distributions (Eq. (14)). Let $T = \{\pi_j, \eta_j\}_{j=1}^{K_{max}}$ be the set of component parameters that we obtained after applying Bregman soft clustering on X. Applying hierarchical clustering on T generates a nested cluster structure through a bottom-up merging process (Murphy 2012). During the merging process, in each iteration two components are merged and the number of clusters is reduced by 1. The parameters of the merged cluster are computed as a weighted average of the expectation parameters:

$\pi_{merge} = \pi_a + \pi_b, \qquad \eta_{merge} = \dfrac{\pi_a \eta_a + \pi_b \eta_b}{\pi_a + \pi_b}$    (19)

Computing the expectation parameter of a merged cluster in this way is analogous to the computation of the left-sided Bregman centroid² (the type of centroid is chosen empirically, as explained in Section 3.1.3) (Garcia & Nielsen 2010).
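The sketch below performs this bottom-up simplification for a 3-D vmf-mm, reusing bregman_dual from the sketch in Section 2.2. For brevity it recomputes pairwise left-sided divergences between the fused centroids at each step instead of maintaining an average-linkage dissimilarity matrix as the paper does, so it should be read as an illustrative variant of Eqs. (18)-(19) rather than the exact procedure.

```python
import numpy as np
# Reuses bregman_dual (cf. Eq. (6)) from the sketch in Section 2.2.

def simplify_mixture(pi, eta):
    """Bottom-up simplification of a vMF-MM: returns the parameters of every level
    of the hierarchy, from the initial K_max components down to a single component.

    pi  : (K,) mixing proportions;  eta : (K, 3) expectation parameters.
    Dissimilarity: left-sided divergence B_{F*}(eta_i, eta_j) (Eq. (18));
    fusion: weighted average of expectation parameters (Eq. (19)).
    """
    levels = [(pi.copy(), eta.copy())]
    while len(pi) > 1:
        K = len(pi)
        best, pair = np.inf, None
        for i in range(K):
            for j in range(K):
                if i != j:
                    d = bregman_dual(eta[i], eta[j])
                    if d < best:
                        best, pair = d, (i, j)
        a, b = pair
        w = pi[a] + pi[b]
        eta_new = (pi[a] * eta[a] + pi[b] * eta[b]) / w
        keep = [k for k in range(K) if k not in (a, b)]
        pi = np.append(pi[keep], w)
        eta = np.vstack([eta[keep], eta_new])
        levels.append((pi.copy(), eta.copy()))
    return levels
```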

Note that the type of Bregman centroid used for merging/fusing the parameters of two clusters must correspond to the sided/symmetric BD used to build the dissimilarity matrix. The decision to use a sided (left or right) or symmetric centroid is made empirically. Moreover, the merging process updates the cluster membership of each sample. The nested structure obtained from the above method is called the hierarchical von Mises-Fisher mixture model (H-vMF-MM). This model facilitates flexible access to the cluster parameters and the associated data members at any particular resolution (i.e., for any particular number of clusters) between K_max and 1. In the literature, this is called the mixture model simplification process (Garcia & Nielsen 2010).

2.5 Choice of the optimal number of components

Let us consider T as the vmf-mm with K_max components. The H-vMF-MM generates parameters for any particular number of clusters between K_max and 1. The problem of finding the optimal mixture model size can be described as the identification of the desired mixture model M* with k* components from T.

KULLBACK-LEIBLER DIVERGENCE (KLD) BASED COMPONENT SELECTION

The KLD is the fundamental measure of distance between two statistical distributions (Garcia & Nielsen 2010; Hershey & Olsen 2007). An equivalent measure can be obtained with the BD (using Eq. (6)). In order to select the number of components, it is necessary to measure the distance among mixture models. To the best of our knowledge, no solution exists for using the BD for such a measure. Similarly, no closed-form expression exists for computing the KL divergence among mixture models (Hershey & Olsen 2007). Therefore, we need to use an approximation of the distance (Hershey & Olsen 2007). The goal of applying the KL divergence $D_{KL}$ among mixture models is to obtain M* such that:

$D_{KL}(T \,\|\, M^*) \leq t_{KLD}$    (20)

where the threshold value $t_{KLD}$ is user defined (Garcia & Nielsen 2010). A classical Monte-Carlo sampling based approximation is employed to compute $D_{KL}$ between two mixture models f and g (Hershey & Olsen 2007) in the following form:

$D_{KL}(f \,\|\, g) \approx \dfrac{1}{n} \sum_{i=1}^{n} \big[ \log f(x_i) - \log g(x_i) \big], \quad x_i \sim f$    (21)

Here, n denotes the number of i.i.d. samples obtained using a sampling procedure from the associated mixture model.

2.6 Complete clustering method

We propose a complete data clustering method, which is illustrated in Fig. 1. We consider the data as N x 3 unit vectors. The clustering procedure begins by applying Bregman soft clustering on the vmf-mm with K_max components. We initialize the mixture model using the k-means++ (Arthur & Vassilvitskii 2007) clustering algorithm. For each component, Bregman soft clustering generates the associated probability and parameters. For the data samples, it provides the associated labels. In the next step, we apply hierarchical clustering on the set of parameters obtained from the previous step. The outcome of the hierarchical clustering is a nested cluster structure composed of mixture model parameters with different numbers of components ranging from K_max to 1. Additionally, the hierarchical clustering updates the labels of the input data samples. Finally, the clustered data membership of every sample is obtained as:

$l_i = \arg\min_{j} \tilde{B}_{F^*}(x_i, \eta_j)$

Here, $x_i$ and $\eta_j$ denote the expectation parameters associated with sample i and cluster $j = \{1, \ldots, k^*\}$. Let $H = \{M_{K_{max}}, \ldots, M_{1}\}$ be the set of mixture models consisting of different numbers of components, with $k \in \{1, \ldots, K_{max}\}$. An example of a mixture model from the set H is $M_k = \{\pi_j, \eta_j\}_{j=1}^{k}$.

² In Eq. (19), the centroid is computed from the expectation parameters, which is different from the centroid computation with the natural parameters in (Garcia & Nielsen 2010).
Figure 1. Proposed data clustering method. Prm: parameters $\{\pi_j, \eta_j\}$ of the mixture model; $p(j \mid x_i)$: soft data membership to the clusters, computed with Eq. (16).
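Before moving to the experiments, the sketch below illustrates Eqs. (20)-(21): a Monte-Carlo estimate of the KLD between two vmf mixture models and a threshold-based selection of the simplified model. It reuses F_natural and theta_from_eta from the sketch in Section 2.2, expects the samples to be drawn from the K_max-component model (e.g. with the sampler sketched in Section 3.1), and reads Eq. (20) as "keep the most simplified model whose KLD to the full model stays below the threshold"; that stopping rule is our reconstruction.

```python
import numpy as np
# Reuses F_natural and theta_from_eta from the sketch in Section 2.2.

def mixture_log_density(X, pi, eta):
    """Log density of a vMF-MM (d = 3) given mixing proportions and expectation parameters."""
    thetas = [theta_from_eta(e) for e in eta]
    comp = np.stack([np.log(p) + X @ t - F_natural(t) for p, t in zip(pi, thetas)], axis=1)
    m = comp.max(axis=1, keepdims=True)               # log-sum-exp for numerical stability
    return (m + np.log(np.exp(comp - m).sum(axis=1, keepdims=True))).ravel()

def mc_kld(samples, model_f, model_g):
    """Monte-Carlo estimate of KLD(f || g), Eq. (21); samples are i.i.d. draws from f."""
    return np.mean(mixture_log_density(samples, *model_f)
                   - mixture_log_density(samples, *model_g))

def select_model(levels, samples, threshold):
    """Pick the most simplified model in the hierarchy whose KLD to the full
    K_max-component model stays below the user-defined threshold (Eq. (20))."""
    full = levels[0]                   # (pi, eta) of the initial vMF-MM
    chosen = full
    for model in levels[1:]:           # progressively simplified models
        if mc_kld(samples, full, model) <= threshold:
            chosen = model
        else:
            break
    return chosen
```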

3. Experiments

3.1 Clustering with simulated data

3.1.1 SIMULATED DATA SAMPLES

We considered a finite set of sample unit vectors $X = \{x_1, \ldots, x_N\}$ on the sphere. These samples were drawn independently from a vmf-mm with different numbers of components. For generating the samples, we followed the standard sampling method for the vmf-mm (Dhillon & Sra 2003) and the Metropolis-Hastings (MH) algorithm (Murphy 2012). In order to verify the efficiency of our proposed method, we generated two types of samples: (a) well separated, with manually selected parameters, and (b) not well separated, with random parameters. For each type, we generated 10,000 i.i.d. samples. Fig. 2 illustrates examples of the different types of simulated data samples.

Figure 2. (a) Well separated samples generated from a 3-component vmf-mm. (b) Not well separated samples generated from a 7-component vmf-mm.

3.1.2 BREGMAN SOFT CLUSTERING

According to the proposed model, the primary choice in the entire clustering process is the maximum number of components K_max. For the simulated samples, we set K_max to a value determined empirically. In order to evaluate the Bregman soft clustering performance, we computed the negative log likelihood (nllh) value:

$\mathrm{nllh} = -\sum_{i=1}^{N} \log \sum_{j=1}^{K_{max}} \pi_j\, f_j(x_i \mid \theta_j)$

We kept track of the nllh values with respect to the number of iterations necessary to converge. At this stage, we did not compare the resulting clusters with the simulated ground truth, due to the fact that we preset K_max for the soft clustering.

3.1.3 HIERARCHICAL CLUSTERING CONSTRUCTION

Similar to hierarchical representations of mixtures of exponential families (Garcia & Nielsen 2010), we evaluated the appropriate distance types (left/right/symmetric) and linkage criteria (Murphy 2012) with respect to the KLD and the resolution (number of components). The parameter fusion choices (left/right/symmetric centroid) during subset merging correspond to the type of divergence/distance that we used to compute the dissimilarity matrix. We computed the average of the KLD values obtained from the evaluation of data samples consisting of mixture models (vmf-mm) with different numbers of components. Below (Table 1 and Fig. 3), we present results obtained from evaluating the hierarchical cluster construction with a mixture model which consists of 7 components. We began our experiments by selecting an appropriate linkage criterion (single, complete, average, Ward, weighted, median and centroid). To this aim, we computed the cophenetic correlation coefficient (Romesburg 1984) for combinations of distances and linkage criteria. Table 1 presents the numerical evaluation, which indicates that the average linkage criterion is the best choice for our model.

Table 1. Numerical evaluation using the cophenetic correlation coefficient. Each entry in the table indicates the average value for a particular choice of distance type (left sided, right sided, symmetric) and linkage criterion (single, complete, average, Ward, weighted, median, centroid).

Fig. 3 illustrates the results obtained when evaluating the distance/divergence types. Similar to (Garcia & Nielsen 2010), we observe that the left-sided BD provides the best simplification quality with respect to the KLD values.

Figure 3. Evaluation of distance type and linkage criteria. Average KLD values for the different types of distances. Linkage criterion: average link. KLD threshold value: 0.1.
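For completeness, a minimal generator for such simulated data is sketched below. It exploits the closed-form inverse CDF available for d = 3 (the cosine of the angle to the mean direction has density proportional to $\exp(\kappa w)$ on [-1, 1]); the paper instead follows Dhillon & Sra (2003) and a Metropolis-Hastings sampler, so this is an illustrative alternative rather than the procedure used for Fig. 2.

```python
import numpy as np

def sample_vmf3(mu, kappa, n, rng):
    """Draw n samples from a vMF on the 2-sphere (d = 3) via the closed-form inverse CDF."""
    mu = mu / np.linalg.norm(mu)
    xi = rng.random(n)
    w = 1.0 + np.log(xi + (1.0 - xi) * np.exp(-2.0 * kappa)) / kappa   # cosine of angle to mu
    # Uniform directions in the tangent plane orthogonal to mu
    helper = np.array([1.0, 0.0, 0.0]) if abs(mu[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(mu, helper); e1 /= np.linalg.norm(e1)
    e2 = np.cross(mu, e1)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    v = np.outer(np.cos(phi), e1) + np.outer(np.sin(phi), e2)
    return w[:, None] * mu + np.sqrt(1.0 - w ** 2)[:, None] * v

def sample_vmf_mixture(pis, mus, kappas, n, seed=0):
    """Draw n i.i.d. samples (and their component labels) from a vMF-MM."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(rng.choice(len(pis), size=n, p=pis), minlength=len(pis))
    X = np.vstack([sample_vmf3(mus[j], kappas[j], counts[j], rng) for j in range(len(pis))])
    labels = np.repeat(np.arange(len(pis)), counts)
    return X, labels
```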

Figure 4. Resulting clusters generated for different numbers of components; the associated KLD threshold values are provided.

We applied these experiments on the simulated data (mixture models with different numbers of components). Based on the complete evaluation, we chose the left-sided BD with the average-link criterion to construct the H-vMF-MM.

3.1.4 KLD BASED COMPONENT SELECTION

In this approach, a simplified mixture is obtained based on a user-defined threshold value. From Fig. 3 we can get an idea of how to select a threshold value for the simulated data. From our experiments we observed that, for the well separated samples, choosing a very small threshold value (see Fig. 3, line at the bottom) allows the selection of the correct number of components. However, this observation did not hold for the not-well-separated samples. Therefore, we learnt the optimal threshold from the ground truth data with our threshold selection algorithm. The threshold selection algorithm was applied on the not-well-separated samples only. For this purpose, we generated simulated data with different numbers of samples (2k, 5k, 10k, 20k, 50k), different numbers of components (3, 5, 7, 9) and different parameter values. In each group of samples, the optimal threshold value algorithm was applied ~5000 times (50 simulations, 10 times per simulation, 10 different values). Several such groups of samples were experimented on, with sufficient randomness ensured in the parameter selection; the optimal threshold selection algorithm was therefore applied ~50k times in total. From the experiments on threshold learning, we observed that the threshold value has an inverse relationship with the number of classes. Table 2 presents the threshold values obtained for different numbers of classes.

Table 2. Empirical thresholds obtained from learning the threshold value on simulated data (columns: number of classes, threshold value).

3.1.5 EVALUATION AND COMPARATIVE STUDY

We evaluated and compared the performance of the H-vMF-MM based on accuracy and computational efficiency. In order to analyze the accuracy, we used simulated data sets for which the ground truths were known. Table 3 presents the comparison³ of clustering accuracy for simulated samples with different numbers of classes and types.

Table 3. Comparison of clustering accuracy. Experimented on two different numbers of classes (3 and 5) and two different types of simulated samples, well separated (ws) and not well separated (nws); rows: 3 cl ws, 3 cl nws, 5 cl ws, 5 cl nws. Methods (columns): k-means++ (KMPP), Gaussian Mixture Model (GMM), Spherical k-means (SPKM), vmf-mm and H-vMF-MM.

³ In order to compare with the different methods, we obtained MATLAB implementations either provided by the authors (KMPP, SPKM and VMFMM) or from a standard toolbox (GMM).
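Comparing clusterings against ground truth requires matching the predicted cluster labels to the true class labels; the paper does not spell out the matching it uses, so the sketch below shows one common choice (permutation-matched accuracy via the Hungarian algorithm) purely as an illustration of how the figures in Table 3 could be computed, not as the authors' protocol.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, predicted_labels):
    """Accuracy of a clustering, maximised over the assignment of cluster to class labels."""
    K = int(max(true_labels.max(), predicted_labels.max())) + 1
    count = np.zeros((K, K))
    for t, p in zip(true_labels, predicted_labels):
        count[t, p] += 1
    rows, cols = linear_sum_assignment(-count)     # maximise the matched counts
    return count[rows, cols].sum() / len(true_labels)
```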

From the results in Table 3, it is evident that the H-vMF-MM provides sufficiently high clustering accuracy. Moreover, it is better than or equivalent to the other methods. It appears that for the not-well-separated samples its performance is notably better than the others. Next, we evaluated the computational efficiency. For this purpose, we explored an alternative pathway that does not use the BD. Its steps are: (a) apply classical EM (Banerjee et al. 2005) to obtain the vmf-mm parameters; (b) construct the hierarchical structure with the cosine distance; (c) use the KLD threshold to obtain the clustering at a particular resolution. We observed that, due to the incorporation of the BD, our proposed model improves the computational efficiency of the first two steps compared with the alternative pathway.

3.2 Depth Image Analysis

The proposed method was experimented on depth images obtained from the NYU depth dataset (Silberman et al. 2012). First, we computed the image normal (Silberman et al. 2012) for every pixel. Then, we applied the H-vMF-MM to cluster the image. The resulting clusters generate a bottom-up segmentation of the depth image. Fig. 4 illustrates the clustering performance of the H-vMF-MM on a depth image at different resolutions (numbers of components). In Fig. 4, the RGB image is shown in order to give the reader an idea of the contents of the depth image. The KLD threshold exhibits an inverse relation with the resolution (see Fig. 4), which is similar to the experiments with the not-well-separated simulated data (see Section 3.1.4). Therefore, we can interpret the obtained image segmentation from the perspective of increasing or decreasing the threshold value. Increasing the threshold is equivalent to merging image regions. This is evident when the threshold value is increased from 0.19 to 0.2 (the resolution decreases from 7 to 6; see the 3rd row and 3rd column of Fig. 4). On the contrary, decreasing the threshold is equivalent to splitting image regions. We observe from the results that the clustering provides a sufficient semantic interpretation of the structure of the indoor scene. Most interestingly, we notice that it provides the three principal surfaces (planes in the indoor scene) when the resolution is 4 (see the 2nd row, 3rd column of Fig. 4). It appears that the more we increase the resolution (starting from 2), the more principal surfaces present in the image we can discover. However, increasing the resolution too much will cause over-segmentation (evident at resolution 7). Therefore, a careful choice of the threshold value is very important. On the other hand, we observe from Table 2 that a unique threshold will not recover the true number of classes in all cases; rather, a unique threshold value for the overall clustering task will over- or under-partition the generated clusters. Therefore, in the context of this research, this remains an open problem for future work. Additionally, we noticed that the computed normals contain noisy information. This may be another significant issue that affects the final clustering result. It is evident from Fig. 4, where a new cluster appears around the paper towel dispenser when the number of components is 6 or more (see the 3rd row). The source of this noise is the low depth accuracy.
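The image normals used as features above can be obtained in several ways. The sketch below is a simple stand-in, assuming an orthographic approximation in which the normal at each pixel is taken as $(-\partial Z/\partial x, -\partial Z/\partial y, 1)$ normalized; it is not the normal computation of Silberman et al. (2012), which the paper actually uses.

```python
import numpy as np

def normals_from_depth(depth):
    """Per-pixel unit normals from a depth map Z(y, x) using image gradients
    (orthographic approximation: n proportional to (-dZ/dx, -dZ/dy, 1))."""
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    n = np.dstack((-dz_dx, -dz_dy, np.ones_like(dz_dx)))
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return n   # shape (H, W, 3); reshape to (H*W, 3) unit vectors before clustering
```

The resulting (H x W) x 3 array of unit vectors is then the input to the clustering pipeline of Section 2.6.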
4. Conclusion

Statistical distributions which belong to the exponential family provide an advantage when designing a mixture model: the model parameters can be computed efficiently by exploiting algorithms such as Bregman soft clustering (Banerjee et al. 2005). Directional distributions provide the benefit of designing a mixture model that captures the true nature of data which have the form of a unit vector. Among them, the most prominent distributions belong to the exponential family. However, because of the complicated normalization term in those distributions, it is difficult to take advantage of this membership of the exponential family. It turns out that, for vectors of three or fewer dimensions, the normalization term is mathematically less complicated. We take advantage of this and derive the canonical representation of the most fundamental directional distribution, the von Mises-Fisher distribution. With this canonical representation, we propose a complete clustering model called the Hierarchical von Mises-Fisher Mixture Model (H-vMF-MM), which most closely resembles the hierarchical models proposed by Garcia & Nielsen (2010). In our proposed model we exploited the Bregman divergence at two stages of the clustering task: (a) soft clustering and (b) hierarchical clustering. In addition, we used the KLD in order to determine the resolution (number of components) of the clustering. Therefore, the appropriate use of divergences plays a significant role in the design of the complete method. We conducted initial experiments on simulated data, which clearly justify the validity of the proposed model for directional data. We then used the model to cluster image normals (Silberman et al. 2012), which eventually generates a bottom-up segmentation of the depth image. The segmentation results generated by the model provide a semantic interpretation of indoor scene surfaces. Therefore, the proposed model can be used for computer vision tasks. We believe, however, that this model will be equally applicable to other data mining and clustering tasks which involve directional data. We foresee several future directions to extend the work, such as: (a) determining the KLD threshold value in order to obtain the optimal number of semantic classes; (b) exploring other clustering approaches, such as total Bregman soft clustering (Liu et al. 2012), in order to enhance robustness w.r.t. noise and outliers; (c) proposing similar models for other directional distributions which belong to the exponential family; (d) proposing a model for high dimensional data; and (e) extending the model to a Bayesian framework.

References

Arthur, D., and Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana. Society for Industrial and Applied Mathematics.

Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. (2005). Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research (JMLR), 6.

Banerjee, A., Merugu, S., Dhillon, I., and Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6.

Bangert, M., Hennig, P., and Oelfke, U. (2010). Using an infinite von Mises-Fisher mixture model to cluster treatment beam directions in external radiation therapy. In International Conference on Machine Learning and Applications (ICMLA).

Dhillon, I. S., and Sra, S. (2003). Modeling data using directional distributions. Technical report, Department of Computer Sciences, University of Texas at Austin.

Garcia, V., and Nielsen, F. (2010). Simplification and hierarchical representations of mixtures of exponential families. Signal Processing, 90(12).

Hershey, J. R., and Olsen, P. A. (2007). Approximating the Kullback-Leibler divergence between Gaussian mixture models. In IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 4.

Liu, M., Vemuri, B. C., Amari, S.-I., and Nielsen, F. (2012). Shape retrieval using hierarchical total Bregman soft clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12).

Mardia, K. V., and Jupp, P. (2000). Directional Statistics (2nd ed.). John Wiley and Sons Ltd.

McGraw, T., Vemuri, B., Yezierski, R., and Mareci, T. (2006). Segmentation of high angular resolution diffusion MRI modeled as a field of von Mises-Fisher mixtures. In European Conference on Computer Vision. Springer-Verlag.

Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.

Nielsen, F., and Garcia, V. (2009). Statistical exponential families: A digest with flash cards. Computing Research Repository (CoRR).

Romesburg, H. C. (1984). Cluster Analysis for Researchers. Belmont, Calif.: Lifetime Learning Publications.

Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012). Indoor segmentation and support inference from RGBD images. In European Conference on Computer Vision.

Sra, S. (2012). A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of I_s(x). Computational Statistics, 27.


More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Unsupervised Learning: Clustering Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com (Some material

More information

Clustering: Classic Methods and Modern Views

Clustering: Classic Methods and Modern Views Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering

More information

10601 Machine Learning. Hierarchical clustering. Reading: Bishop: 9-9.2

10601 Machine Learning. Hierarchical clustering. Reading: Bishop: 9-9.2 161 Machine Learning Hierarchical clustering Reading: Bishop: 9-9.2 Second half: Overview Clustering - Hierarchical, semi-supervised learning Graphical models - Bayesian networks, HMMs, Reasoning under

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications

Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Anil K Goswami 1, Swati Sharma 2, Praveen Kumar 3 1 DRDO, New Delhi, India 2 PDM College of Engineering for

More information

Clustering algorithms and introduction to persistent homology

Clustering algorithms and introduction to persistent homology Foundations of Geometric Methods in Data Analysis 2017-18 Clustering algorithms and introduction to persistent homology Frédéric Chazal INRIA Saclay - Ile-de-France frederic.chazal@inria.fr Introduction

More information

Keywords: clustering algorithms, unsupervised learning, cluster validity

Keywords: clustering algorithms, unsupervised learning, cluster validity Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Clustering Based

More information

AN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION

AN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION AN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION WILLIAM ROBSON SCHWARTZ University of Maryland, Department of Computer Science College Park, MD, USA, 20742-327, schwartz@cs.umd.edu RICARDO

More information

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which

More information

A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval

A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval Florent Perronnin, Yan Liu and Jean-Michel Renders Xerox Research Centre Europe (XRCE) Textual and

More information

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 70 CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 3.1 INTRODUCTION In medical science, effective tools are essential to categorize and systematically

More information

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model One-Shot Learning with a Hierarchical Nonparametric Bayesian Model R. Salakhutdinov, J. Tenenbaum and A. Torralba MIT Technical Report, 2010 Presented by Esther Salazar Duke University June 10, 2011 E.

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany

More information

Foundations of Machine Learning CentraleSupélec Fall Clustering Chloé-Agathe Azencot

Foundations of Machine Learning CentraleSupélec Fall Clustering Chloé-Agathe Azencot Foundations of Machine Learning CentraleSupélec Fall 2017 12. Clustering Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr Learning objectives

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

Bioimage Informatics

Bioimage Informatics Bioimage Informatics Lecture 14, Spring 2012 Bioimage Data Analysis (IV) Image Segmentation (part 3) Lecture 14 March 07, 2012 1 Outline Review: intensity thresholding based image segmentation Morphological

More information

Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu

Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu 1 Introduction Yelp Dataset Challenge provides a large number of user, business and review data which can be used for

More information

Cluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6

Cluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6 Cluster Analysis and Visualization Workshop on Statistics and Machine Learning 2004/2/6 Outlines Introduction Stages in Clustering Clustering Analysis and Visualization One/two-dimensional Data Histogram,

More information

Traditional clustering fails if:

Traditional clustering fails if: Traditional clustering fails if: -- in the input space the clusters are not linearly separable; -- the distance measure is not adequate; -- the assumptions limit the shape or the number of the clusters.

More information

COMS 4771 Clustering. Nakul Verma

COMS 4771 Clustering. Nakul Verma COMS 4771 Clustering Nakul Verma Supervised Learning Data: Supervised learning Assumption: there is a (relatively simple) function such that for most i Learning task: given n examples from the data, find

More information

Modeling and Reasoning with Bayesian Networks. Adnan Darwiche University of California Los Angeles, CA

Modeling and Reasoning with Bayesian Networks. Adnan Darwiche University of California Los Angeles, CA Modeling and Reasoning with Bayesian Networks Adnan Darwiche University of California Los Angeles, CA darwiche@cs.ucla.edu June 24, 2008 Contents Preface 1 1 Introduction 1 1.1 Automated Reasoning........................

More information

University of Florida CISE department Gator Engineering. Clustering Part 5

University of Florida CISE department Gator Engineering. Clustering Part 5 Clustering Part 5 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville SNN Approach to Clustering Ordinary distance measures have problems Euclidean

More information

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves

More information

IMAGE ANALYSIS, CLASSIFICATION, and CHANGE DETECTION in REMOTE SENSING

IMAGE ANALYSIS, CLASSIFICATION, and CHANGE DETECTION in REMOTE SENSING SECOND EDITION IMAGE ANALYSIS, CLASSIFICATION, and CHANGE DETECTION in REMOTE SENSING ith Algorithms for ENVI/IDL Morton J. Canty с*' Q\ CRC Press Taylor &. Francis Group Boca Raton London New York CRC

More information