Topographic Local PCA Maps
Peter Meinicke and Helge Ritter
Neuroinformatics Group, University of Bielefeld

Abstract

We present a model for coupling Local Principal Component Analysers based on a probabilistic notion of neighbourhood, which is inspired by the Self-Organizing Map (SOM). With our approach, topologically ordered configurations of Local PCAs arise from homotopy-based minimization of a global error function. We indicate that such an approach can be viewed as a natural generalization of the basic SOM while, unlike the SOM, it is not restricted to capturing the variation of multivariate data only along a small number of grid dimensions. We show the close relations to the Adaptive Subspace SOM (ASSOM), and by experimental results on synthetic and high-dimensional real-world data we demonstrate the capabilities of the model.

1 Introduction

Local PCA (LPCA) learning [7, 1] can be viewed as a plausible extension of the conventional Vector Quantization (VQ) framework. It replaces the point prototypes by linear manifolds, which can considerably improve generalization, especially in high-dimensional data spaces. In both the LPCA and the VQ learning framework the problem of overfitting quickly arises if we increase the number of prototypes to improve the approximation capabilities of the model. In the VQ case, one successful way to restrict the model complexity is to introduce couplings that constrain the flexibility of each prototype relative to a set of "neighbours". The well-known SOM [8, 10] imposes the relationships among the K prototypes by means of a neighbourhood function h_jk which determines the coupling strength between prototypes j and k. Usually, h_jk is related to the closeness of lattice points ("nodes") j, k on a fictitious grid, which is chosen to resemble the topology of the data in order to obtain good generalization. For a biological motivation of such an approach we refer to [8, 10].

In this paper we propose to extend the point-wise SOM prototypes to linear manifold prototypes, which are able to extract some local subspace structure from the data. Optimization of the neighbourhood-coupled manifolds results in a special kind of feature map which we refer to as a "Topographic Local PCA" (T-LPCA) map. With respect to its representational capabilities, T-LPCA can be viewed as a natural generalization of the classical SOM, and it can also realize the Adaptive Subspace SOM (ASSOM) [8], which turns out to be a special case of the T-LPCA framework as well. Therefore a probabilistic version of the SOM can easily be realized within the Topographic Local PCA framework, and indeed it plays an important role during learning via successive model refinement.

2 T-LPCA Prototypes

In essence, the definition of our T-LPCA model is based on the combination of two formalisms, which provide variability with respect to the kind of prototype and a certain kind of probabilistic neighbourhood, respectively. More specifically, the prototype variability is achieved by a parametrized distance function, which provides a smooth transition from points to linear manifolds. The neighbourhood coupling is achieved by a set of assignment probabilities, which combine the squared distances w.r.t. distinct prototypes. This idea of a probabilistic notion of neighbourhood goes back to [9] and has been extended to "Soft Topographic Vector Quantization" by [4].
The combination of the above formalisms results in a function which measures the error of a d-dimensional point x with respect to the neighbourhood of a prototype j:

    E_j(x; \alpha) = \frac{1}{2} \sum_{k=1}^{K} h_{jk} \, \| (I - \alpha V_k V_k^T)(x - w_k) \|^2     (1)

for j = 1, ..., K with α ∈ [0, 1]. The error function sums up squared neighbourhood-weighted central (SOM) distances for α = 0 and orthogonal distances w.r.t. some linear manifolds for α = 1. While I is the d × d identity matrix, the V_k are d × q matrices with orthonormal columns which determine the directions of the manifolds.
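To make the notation concrete, the following minimal NumPy sketch shows how the α-distance of (1) could be evaluated. It is our own illustration (names such as neighbourhood_error are not from the paper), assuming one local mean w_k and one orthonormal direction matrix V_k per prototype.

```python
import numpy as np

def neighbourhood_error(x, j, H, W, V, alpha):
    """Expected squared alpha-distance of point x w.r.t. the j-th neighbourhood, eq. (1).

    x     : (d,) data point
    H     : (K, K) neighbourhood matrix, rows sum to one
    W     : (K, d) local means w_k
    V     : (K, d, q) orthonormal direction matrices V_k
    alpha : in [0, 1]; 0 gives central (SOM) distances, 1 gives orthogonal distances
    """
    K = len(W)
    err = 0.0
    for k in range(K):
        diff = x - W[k]
        # (I - alpha * V_k V_k^T)(x - w_k): subtract alpha times the projection onto span(V_k)
        residual = diff - alpha * V[k] @ (V[k].T @ diff)
        err += H[j, k] * np.dot(residual, residual)
    return 0.5 * err
```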
Each V_k V_k^T thus represents an orthogonal projection onto a q-dimensional subspace that becomes associated with node k. The w_k are points in data space which correspond to the SOM prototypes for α = 0 and which determine the distance of the linear manifolds from the origin for α = 1.

Within a statistical interpretation [9, 4] a data point is not deterministically assigned to a single prototype but instead to a set of probabilistically related prototypes. The final assignment depends on a discrete random variable r_j which assigns a point to the k-th prototype with probability h_jk. Therefore Σ_k h_jk = 1, and the distribution of r_j defines what we refer to as the "neighbourhood" of prototype j or, for short, the j-th neighbourhood. The error function (1) therefore defines the expected squared α-distance of a data point w.r.t. the distribution of r_j. The topological justification for this probabilistic notion of neighbourhood comes from the fact that within a neighbourhood j we require the probability h_jk to be a monotone function of the distance between the corresponding nodes of prototypes j and k on a prespecified grid ("array") of chosen topology. The corresponding neighbourhood function, which maps array distances to probabilities, establishes the connection to the classical SOM. In subsection 3.3 we will propose a Gaussian realization of this function, which leads to a convenient parametrization of the probabilities. For notational convenience the neighbourhood probabilities of all K random variables are collected in the prespecified neighbourhood matrix H = (h_jk), which encodes the topological relations between all prototypes. For a simplified notation we suppress the H-dependency of E_j(·) in the following.

3 Optimization

Learning from the sample X = {x_1, ..., x_N} ⊂ R^d requires minimization of the following global error function:

    E(M; \Theta; \alpha) = \sum_{i=1}^{N} \sum_{j=1}^{K} m_{ij} \, E_j(x_i; \alpha)     (2)

where Θ comprises all model parameters w_1, V_1, ..., w_K, V_K and the N × K matrix M = (m_ij) contains a set of membership variables m_ij, which denote the membership of data point i w.r.t. a certain neighbourhood j. Within a hard-clustering framework the membership variables would be binary and would assign each data point to exactly one neighbourhood. In order to avoid poor local minima of the above objective function we do not start with a direct minimization of (2) by hard clustering, where a nearest-neighbour(hood) partitioning of the data and a subsequent re-estimation of the parameter values are iterated. Instead we use a homotopy-based method which gradually deforms an initial error function with a well-defined global minimum until the original error function is minimized in a final optimization step. This technique is usually referred to as deterministic annealing [11] and is the subject of the next subsection.

3.1 Deterministic Annealing

In the following, the value of m_ij is viewed as the probability of data point i belonging to neighbourhood j, requiring

    \sum_{j=1}^{K} m_{ij} = 1, \quad m_{ij} \ge 0, \quad j = 1, \dots, K     (3)

In this way we introduce a set of N random variables s_i with probability distributions P{s_i = j} = m_ij which randomize the data-to-neighbourhood assignments. Thus (2) denotes the expected error w.r.t. these s_i distributions. In contrast to the r_j of the previous section it does not make sense to prespecify the distributions of the s_i; the corresponding probabilities have to be estimated from the data.
However, such an estimation scheme is not well-defined unless some further constraints are imposed on the m_ij. A suitable approach for this purpose is to constrain the entropy of the s_i distributions. This technique can be derived from the well-known maximum entropy principle [5] and can simply be implemented by adding a regularization term to the error function (2):

    E(M; \Theta; \alpha; \beta) = E(M; \Theta; \alpha) + \frac{1}{\beta} \sum_{i=1}^{N} \sum_{j=1}^{K} m_{ij} \log m_{ij}     (4)

where β plays the role of an inverse temperature [11]. For an infinitely high temperature, i.e. β → 0, minimization of (4) w.r.t. the m_ij yields the maximum entropy solution for the s_i distributions with all probabilities equal to 1/K. For point prototypes (α = 0) the optimization then yields ŵ_k = (1/N) Σ_i x_i for all k as the unique minimizer of (4), since the Hessian of the error function is positive definite in this case [11]. This means that for β → 0 all optimal prototypes coincide at the global sample mean. If β is increased, without neighbourhood constraint this state remains stable as long as the value of 1/β exceeds the largest eigenvalue of the sample covariance matrix, which defines the "critical temperature" [11]. With a neighbourhood constraint this critical temperature also depends on H [4].
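As a small worked example of this stability argument (and of the β_min initialization used later in section 5), the following sketch, which is our own and not part of the paper, estimates the critical inverse temperature from the largest eigenvalue of the sample covariance matrix.

```python
import numpy as np

def critical_beta(X):
    """Inverse temperature at which the single-cluster solution becomes unstable.

    Without neighbourhood coupling, the all-prototypes-at-the-mean solution stays
    stable as long as 1/beta exceeds the largest eigenvalue of the sample
    covariance matrix; the returned value is the beta at which equality holds.
    X : (N, d) data matrix
    """
    S = np.cov(X, rowvar=False)               # sample covariance matrix
    lambda_max = np.linalg.eigvalsh(S).max()  # largest eigenvalue
    return 1.0 / lambda_max
```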
With further increasing β the prototype vectors undergo a series of splittings in order to minimize the regularized error function, and in the limit β → ∞ a hard clustering of the data is achieved. The technique described so far is well known as "deterministic annealing", and it has the reputation of being rather robust against shallow local minima, provided an adequate annealing schedule is chosen.

3.2 Parameter Estimation

For given values of α, β and H, minimization of (4) can be achieved by a special version of the EM algorithm [2], in which the following two steps are iterated until convergence.

E-Step: Given some parameter values Θ, the optimal membership probabilities can be derived from the corresponding stationarity conditions (zero first derivatives) under the constraint (3), which yield

    \hat{m}_{ij} = \frac{\exp\{-\beta E_j(x_i; \alpha)\}}{\sum_{k} \exp\{-\beta E_k(x_i; \alpha)\}}     (5)

for i = 1, ..., N and j = 1, ..., K.

M-Step: Given some values for the membership probabilities, the optimal prototypes are derived from the corresponding stationarity conditions, which yield the following local means for k = 1, ..., K:

    \hat{w}_k = \frac{1}{n_k} \sum_{i=1}^{N} \sum_{j=1}^{K} \hat{m}_{ij} h_{jk} \, x_i     (6)

    n_k = \sum_{i=1}^{N} \sum_{j=1}^{K} \hat{m}_{ij} h_{jk}     (7)

For α > 0, the optimal direction matrices follow from (4) as

    \hat{V}_k = \arg\max_{V} \mathrm{tr}(S_k V V^T) \quad \text{subject to} \quad V^T V = I_q     (8)

with tr(·) denoting the trace operation, I_q being the q × q identity matrix and

    S_k = \frac{1}{n_k} \sum_{i=1}^{N} \sum_{j=1}^{K} \hat{m}_{ij} h_{jk} \, (x_i - \hat{w}_k)(x_i - \hat{w}_k)^T     (9)

being a local covariance matrix. It can be shown that a (non-unique) maximizer of the trace in (8) can be found from an eigenvalue decomposition of S_k, with V̂_k containing as columns those eigenvectors which are associated with the q largest eigenvalues of S_k (see e.g. [6], pp. 9). Thus estimation of the optimal direction matrices is achieved by performing K local PCAs.

3.3 Varying α and H

To obtain a good local minimum of the global error we combine the above deterministic annealing with two other deformations of the error function, which involve a gradual increase of the parameter α and a successive modification of H that realizes a "shrinking" neighbourhood. Since the splitting scheme of the previous subsection is not well-defined for general linear manifolds, it makes sense to first apply deterministic annealing to a set of initial point prototypes with α = 0.

In order to control the extent of the neighbourhoods it is necessary to provide a suitable parametrization of the neighbourhood matrix H. A convenient choice is a Gaussian neighbourhood function, which for a 1-D array leads to the following probabilities:

    h_{jk} = \frac{1}{Z_j} \exp\left( -\frac{1}{2\sigma^2} |j - k|^2 \right)     (10)

where Z_j is chosen to provide unit row sums of H. Such a scheme easily generalizes to higher-dimensional arrays and provides a suitable control of the neighbourhood width by the variance σ² of the Gaussian neighbourhood function.

With the specification of a neighbourhood function we can now extend the global error function to E(M; Θ; α; β; σ) in order to make the σ-dependency explicit. As in the SOM case, for the minimization of this function it is recommendable to start with a large neighbourhood width and successively decrease the width until, in the limit, all couplings between prototypes may vanish. Although other strategies are conceivable, the following overall optimization scheme has proven useful in all our experiments. We always start with α = 0 at some high temperature 1/β_min and with a large neighbourhood width. Then β is increased in a few optimization steps according to an exponential schedule.
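The E- and M-steps of subsection 3.2, together with the Gaussian neighbourhood of (10), can be sketched as follows. This is our own illustration of the stated equations (names such as em_step and gaussian_neighbourhood are hypothetical), not the authors' implementation, and it assumes fixed α, β and σ.

```python
import numpy as np

def gaussian_neighbourhood(K, sigma):
    """Neighbourhood matrix H for a 1-D array of K nodes, eq. (10); rows sum to one."""
    idx = np.arange(K)
    H = np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2 / max(sigma, 1e-12) ** 2)
    return H / H.sum(axis=1, keepdims=True)

def em_step(X, W, V, H, alpha, beta):
    """One EM iteration for fixed alpha, beta and H (subsection 3.2).

    X : (N, d) data, W : (K, d) local means, V : (K, d, q) direction matrices.
    Returns updated W, V and the membership matrix M.
    """
    K, _, q = V.shape

    # squared alpha-distances of every point to every prototype, then eq. (1) via H
    D = np.empty((X.shape[0], K))
    for k in range(K):
        diff = X - W[k]                                    # (N, d)
        resid = diff - alpha * (diff @ V[k]) @ V[k].T      # (I - alpha V_k V_k^T)(x - w_k)
        D[:, k] = 0.5 * np.sum(resid ** 2, axis=1)
    E = D @ H.T                                            # E[i, j] = sum_k h_jk D[i, k]

    # E-step: soft memberships, eq. (5), with a shift for numerical stability
    M = np.exp(-beta * (E - E.min(axis=1, keepdims=True)))
    M /= M.sum(axis=1, keepdims=True)

    # M-step: neighbourhood-weighted means and local PCAs, eqs. (6)-(9)
    R = M @ H                                              # R[i, k] = sum_j m_ij h_jk
    n = R.sum(axis=0) + 1e-12
    W_new = (R.T @ X) / n[:, None]
    V_new = np.empty_like(V)
    for k in range(K):
        diff = X - W_new[k]
        S_k = (diff * R[:, k:k + 1]).T @ diff / n[k]       # local covariance, eq. (9)
        eigval, eigvec = np.linalg.eigh(S_k)
        V_new[k] = eigvec[:, np.argsort(eigval)[::-1][:q]] # top-q eigenvectors, eq. (8)
    return W_new, V_new, M
```

Iterating em_step until convergence for fixed α, β and σ corresponds to one of the "Minimize" steps in the schedule of table 1 below.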
After this initial deterministic annealing phase we continue with zero-temperature hard clustering, and according to a linear schedule we increase α and decrease σ in a few steps until α = 1 and σ = 0, respectively. For the case σ = 0 the Gaussian neighbourhood function is replaced by a Dirac impulse. The overall optimization scheme is shown in table 1 in a more algorithmic fashion.

➊ Define β_max > β_min > 0; σ_max > 0; η > 1; Δ > 0
➋ Initialize α = 0; β = β_min; σ = σ_max
➌ Minimize E(M; Θ; α; β; σ)
➍ Set β = η β
➎ If β < β_max goto ➌
➏ Minimize E(M; Θ; α; ∞; σ)
➐ Set α = α + Δ; σ = (1 − α) σ_max
➑ If α ≤ 1 goto ➏, else stop.

Table 1: Homotopy-based optimization scheme; for example values of the constants β_min, β_max, σ_max, η and Δ see section 5.
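A sketch of how the schedule in table 1 might be driven in code is given below. It reflects our own reading of the table: em_step and gaussian_neighbourhood are the hypothetical helpers sketched above, the very large β in the second phase stands in for β → ∞, and the default values of η and Δ are assumptions rather than the paper's settings.

```python
import numpy as np

def fit_tlpca(X, K, q, beta_min, beta_max, sigma_max, eta=2.0, delta=0.2, n_inner=20):
    """Homotopy-based optimization following table 1 (a sketch, not the authors' code)."""
    N, d = X.shape
    rng = np.random.default_rng(0)
    W = np.tile(X.mean(axis=0), (K, 1)) + 1e-3 * rng.standard_normal((K, d))
    V = np.stack([np.linalg.qr(rng.standard_normal((d, q)))[0] for _ in range(K)])

    # phase 1: deterministic annealing with point prototypes (alpha = 0)
    alpha, beta, sigma = 0.0, beta_min, sigma_max
    while beta < beta_max:
        H = gaussian_neighbourhood(K, sigma)
        for _ in range(n_inner):
            W, V, _ = em_step(X, W, V, H, alpha, beta)
        beta *= eta

    # phase 2: zero temperature; grow alpha, shrink sigma (huge beta emulates hard clustering)
    while alpha <= 1.0:
        H = gaussian_neighbourhood(K, sigma)
        for _ in range(n_inner):
            W, V, _ = em_step(X, W, V, H, alpha, beta_max * 1e3)
        alpha += delta
        sigma = max((1.0 - alpha) * sigma_max, 0.0)
    return W, V
```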
4 Relations to the ASSOM

Learning with the Adaptive Subspace SOM (ASSOM) [8] can be viewed as an online variant of our T-LPCA optimization scheme for the particular case β → ∞ and α = 1, with the linear manifolds passing through the origin, i.e. w_k = 0 for k = 1, ..., K. Due to the latter constraint the ASSOM cannot formally be viewed as a generalization of the SOM, and in practice it would not be possible to build the ASSOM from an initially given SOM by simply extending the prototypes. In addition, the constraint specializes the ASSOM to certain kinds of data distributions, as illustrated by the experiments of the next section. In cases where the local means w_k of the T-LPCA map are highly correlated with the main directions of the V_k, the ASSOM can be expected to yield a similarly good representation of the data. Our experimental results indicate that this might be the case for the handwritten digit image data which we used for T-LPCA training, since figure 3 shows that the main directions in the second row (from the bottom) are approximately scaled versions of the corresponding means in the bottom row. However, the noisy circle (see figures 1 and 2) provides an example which is better suited for the more general T-LPCA model, since the latter allows the linear manifolds to have arbitrary offsets w.r.t. the origin.

5 Experimental Results

In all experiments we applied the optimization scheme described in section 3 and table 1. The initial neighbourhood width σ_max was set to twice the grid spacing, and the initial temperature 1/β_min was set to the largest eigenvalue of the sample covariance matrix. In the deterministic annealing phase we used a factor η = 2 to increase β over 10 iterations. During each iteration the EM optimization of subsection 3.2 was applied to reduce the global error. In the second, zero-temperature phase α was incremented in fixed steps of Δ.

5.1 Noisy Circle

In the first experiment, prototypes with one-dimensional subspaces (q = 1) were fitted to a synthetic data set of 100 points, which were generated by sampling from the unit circle and adding isotropic Gaussian noise with standard deviation 0.1. We used a model with K = 6 prototypes and a 1-D array of equally spaced nodes. The average residual squared error per data point was measured for both models considered here; the resulting T-LPCA model is depicted in figure 1, and the comparison model with zero local means, fitted to achieve an ASSOM-like representation, is shown in figure 2.

Figure 1: T-LPCA model with 1-D topology and K = 6 prototypes fitted to a 100-point sample of the noisy circle.

5.2 Feature Representation

In the second experiment we used a downsampled 1000-point subset of the MNIST database containing 8 × 8 images of handwritten "1" digits. In the 64-dimensional data space we fitted a T-LPCA model with K = 6 prototypes and q = 5 subspace dimensions.
Figure 2: T-LPCA map with w_k = 0 for k = 1, ..., K, leading to an ASSOM-like model with all K = 6 lines passing through the origin.

Again the nodes were arranged on a regular 1-D array. From the result in figure 3 we see that most of the non-linear variation, in this case mainly due to rotation in the image plane, is captured along the horizontal array dimension. In addition, some linear feature filters emerge along the vertical subspace dimensions.

Figure 3: T-LPCA map from a 1000-point sample of 64-dimensional image vectors of handwritten "1" digits; the bottom row shows the local means w_k, the rows above show the column vectors of the direction matrices V_k for q = 5.

5.3 Visualization

A convenient visualization of a 2-D SOM can be achieved if the distance between neighbouring prototypes is mapped to the grey level of a corresponding image region, according to the topology of the underlying SOM array. In essence the resulting visualization resembles the so-called U-map [12], and one might argue that this concept easily carries over to linear manifold prototypes. However, a suitable distance metric is not quite obvious. The distance between two subspaces S_j and S_k, represented by matrices V_j, V_k (see section 2), can be defined as [3]

    \mathrm{dist}(S_j, S_k) = \| V_j V_j^T - V_k V_k^T \|_2     (11)

which equals the largest singular value of V_j V_j^T − V_k V_k^T. However, this distance does not involve the local means, and the results we achieved by using the direction matrices only were rather poor. As a possible alternative we investigated an extension of the usual point-to-point distance to a sum of point-to-manifold distances,

    D_{jk} = \frac{1}{2} \| (2I - V_j V_j^T - V_k V_k^T)(w_j - w_k) \|     (12)

which is simply half the orthogonal distance of the local mean w_j to linear manifold k plus half the distance of w_k to manifold j. For zero-dimensional subspaces it reduces to the usual point-to-point distance normally used for U-map imaging. As an illustrative example we used this "pseudo"-distance to build a U-map from a T-LPCA model which we had trained on 8 × 8 images of the digits "0" to "4". The training set contained 1000 examples of each digit, which were used to optimize a model with 36 nodes arranged on a regular 6 × 6 grid. The resulting U-map is shown in figure 4.
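For illustration, both distances can be computed directly from the model parameters; the following sketch is our own (subspace_distance and pseudo_distance are not names from the paper).

```python
import numpy as np

def subspace_distance(V_j, V_k):
    """Distance between the subspaces spanned by V_j and V_k, eq. (11):
    the largest singular value (spectral norm) of V_j V_j^T - V_k V_k^T."""
    P = V_j @ V_j.T - V_k @ V_k.T
    return np.linalg.norm(P, ord=2)

def pseudo_distance(w_j, V_j, w_k, V_k):
    """Point-to-manifold "pseudo"-distance D_jk of eq. (12), used for U-map imaging."""
    d = w_j.shape[0]
    M = 2.0 * np.eye(d) - V_j @ V_j.T - V_k @ V_k.T
    return 0.5 * np.linalg.norm(M @ (w_j - w_k))
```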
6 Conclusion

We conclude that the T-LPCA map is a highly promising extension of the SOM, which is capable of capturing some high-dimensional local linear variation in addition to the global non-linear variation along the low-dimensional SOM array. We showed that T-LPCA maps can be formulated as a probabilistic generalization of the SOM by means of a suitable parametrization of a global error function, which is minimized by homotopy-based optimization.

Figure 4: T-LPCA U-map for the 6 × 6 model built from the digit data; fields between units have a grey level proportional to the D_jk "distance" defined in the text; on diagonals the minimum of both distances is taken; fields of the units take the median of their surrounding fields; labels indicate the most common digit class mapped to the corresponding prototype.

References

[1] C. Bregler and S. M. Omohundro. Surface learning with applications to lipreading. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6. Morgan Kaufmann.

[2] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1-38.

[3] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, third edition.

[4] T. Graepel, M. Burger, and K. Obermayer. Phase transitions in stochastic self-organizing maps. Physical Review E, 56(4).

[5] E. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4).

[6] I. T. Jolliffe. Principal Component Analysis. Springer, New York.

[7] N. Kambhatla and T. K. Leen. Fast nonlinear dimension reduction. In Advances in Neural Information Processing Systems, volume 6. Morgan Kaufmann.

[8] T. Kohonen. Self-Organizing Maps. Springer, Berlin.

[9] S. P. Luttrell. A Bayesian analysis of self-organizing maps. Neural Computation, 6(5).

[10] H. Ritter, T. Martinetz, and K. Schulten. Neural Computation and Self-Organizing Maps: An Introduction. Addison-Wesley, Reading, MA.

[11] K. Rose, E. Gurewitz, and G. C. Fox. Vector quantization by deterministic annealing. IEEE Transactions on Information Theory, 38(4).

[12] A. Ultsch. Self-organizing neural networks for visualization and classification. In O. Opitz, B. Lausen, and R. Klar, editors, Information and Classification. Springer, Berlin.
More information