A Stochastic Optimization Approach for Unsupervised Kernel Regression


Oliver Kramer, Institute of Structural Mechanics, Bauhaus-University Weimar
Fabian Gieseke, Institute of Structural Mechanics, Bauhaus-University Weimar

Abstract
Unsupervised kernel regression (UKR), the unsupervised counterpart of the Nadaraya-Watson estimator, is a dimension reduction technique for learning low-dimensional manifolds. It is based on optimizing representative low-dimensional latent variables with regard to the data space reconstruction error. The problem of scaling initial local linear embedding solutions and the subsequent optimization in latent space is a continuous multi-modal optimization problem. In this paper we employ evolutionary approaches for the optimization of the UKR model. Based on local linear embedding solutions, the stochastic search techniques are used to optimize the UKR model. An experimental study compares the covariance matrix adaptation evolution strategy to an iterated local search evolution strategy.

I. INTRODUCTION

Unsupervised kernel regression (UKR) is the unsupervised counterpart of the Nadaraya-Watson estimator. It is based on optimizing latent variables w.r.t. the data space reconstruction error. The evolved latent variables define a UKR solution; projection back into data space yields the manifold. But the UKR optimization problem has multiple local optima. In the following, we employ evolutionary optimization strategies, i.e., the CMA-ES and the Powell evolution strategy (Powell-ES) [11], to optimize UKR manifolds. Furthermore, we analyze the influence of local linear embedding (LLE) [18] as initialization procedure. This paper is organized as follows. In Section II we give a brief overview of manifold learning techniques. In Section III we introduce the UKR problem by Meinicke et al. [12], [13]. In this section we also shortly present the standard optimization approach that we are going to replace by an evolutionary framework. Section IV introduces the components of the evolutionary approach. An experimental study in Section V gives insights into the evolutionary optimization process. Important results are summarized in Section VI.

II. RELATED WORK

High-dimensional data is usually difficult to interpret and visualize, but many data sets show correlations between their variables. For high-dimensional data, a low-dimensional simplified manifold can often be found that represents characteristics of the original data. The assumption is that the structurally relevant data lies on an embedded manifold of lower dimension. In the following, we give an overview of well-known manifold learning techniques, pointing out that the overview can only be a subjective depiction of a wide field of methods. One of the most famous and widely used dimension reduction methods is principal component analysis (PCA), which assumes linearity of the manifold. Pearson [16] provided early work in this field: he fitted lines and planes to a given set of points. The standard PCA as it is most frequently used today can be found in the depiction of Jolliffe [8]. PCA computes the eigenvectors corresponding to the largest eigenvalues of the covariance matrix of the data samples. An approach for learning nonlinear manifolds is kernel PCA [19], which projects the data into a Hilbert space, similarly to the SVM and SVR principle. Hastie and Stuetzle [6] introduced principal curves, which are self-consistent smooth curves that pass through the middle of data clouds. Self-consistency means that the principal curve is the average of the data projected onto it. Bad initializations can lead to bad local optima in the previous approaches.
A solution to this problem is the k-segments approach by Verbeek, Vlassis, and Kröse [21], which alternates the fitting of unconnected local principal axes and the connection of the segments to form a polygonal line. Self-organizing feature maps [10] proposed by Kohonen learn a topological mapping from data space to a map of neurons, i.e., they perform a mapping to discrete values based on neural (codebook) vectors in data space. During the training phase the neural vectors are pulled into the direction of the training data. Generative topographic mapping (GTM) by Bishop, Svensén, and Williams [2], [3] is similar to self-organizing feature maps, but assumes that the observed data has been generated by a parametric model, e.g., a Gaussian mixture model. It can be seen as a constrained mixture of Gaussians, while the SOM can be viewed as a constrained vector quantizer. Multi-dimensional scaling is a further class of dimension reduction methods and is based on the pointwise embedding of the data set, i.e., for each high-dimensional point y_i a low-dimensional point x_i is found for which similarity or dissimilarity conditions hold. For example, the pairwise (Euclidean) distances of two low-dimensional points shall be consistent with their high-dimensional counterparts. Another famous dimension reduction method based on multi-dimensional scaling is Isomap, introduced by Tenenbaum, de Silva, and Langford [20]. It is based on three steps: first, a neighborhood graph of Y is computed; second, the shortest distances between its nodes y_i and y_j are computed (e.g., with Dijkstra's algorithm); third, multi-dimensional scaling is applied based on the distances along the neighborhood graph, which represent curvilinear rather than Euclidean distances. The optimal embedding can be computed by the solution of an eigendecomposition.
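To make the three Isomap steps described above concrete, the following sketch assembles them from standard scikit-learn and SciPy building blocks. It is an illustration of the generic algorithm, not code from the paper; the neighborhood size k and the target dimension q are arbitrary example choices, and a connected neighborhood graph is assumed.

```python
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import MDS

def isomap_sketch(Y, k=10, q=2):
    """Y: (N, d) data matrix with samples as rows; returns an (N, q) embedding."""
    # Step 1: k-nearest-neighbor graph with Euclidean edge weights
    graph = kneighbors_graph(Y, n_neighbors=k, mode='distance')
    # Step 2: curvilinear (geodesic) distances along the graph via Dijkstra
    D = shortest_path(graph, method='D', directed=False)
    # Step 3: metric MDS on the precomputed geodesic distance matrix
    mds = MDS(n_components=q, dissimilarity='precomputed')
    return mds.fit_transform(D)
```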

III. UNSUPERVISED KERNEL REGRESSION

In this section we introduce the UKR approach, regularization techniques, and the Klanke-Ritter optimization approach [9].

A. Kernel Functions

UKR is based on kernel density functions K : R^d -> R. A typical kernel function is the (multivariate) Gaussian kernel

    K_G(z) = \frac{1}{(2\pi)^{q/2} \det(H)} \exp\left( -\frac{1}{2} \| H^{-1} z \|^2 \right),    (1)

with bandwidth matrix H = diag(h_1, h_2, ..., h_d). Figure 1 illustrates that the result significantly depends on the choice of an appropriate bandwidth: for a random data cloud the kernel density estimate is visualized, and a too small bandwidth results in an overfitted model (left).

[Fig. 1. Comparison of kernel density estimates with two bandwidths of a data cloud.]

B. UKR Formulation

UKR has been introduced by Meinicke et al. [12], [13] within a general regression framework for the reversed problem of learning manifolds. The task of UKR is to find the input parameters of the regression function that are the latent variables of the low-dimensional manifold. UKR reverses the Nadaraya-Watson estimator [14], [22], i.e., the latent variables X = (x_1, ..., x_N) in R^{q x N} become parameters of the system; in particular, X becomes the lower-dimensional representation of the observed data Y = (y_1, ..., y_N) in R^{d x N}. The regression function can be written as follows:

    f(x; X) = \sum_{i=1}^{N} y_i \frac{K(x - x_i)}{\sum_{j=1}^{N} K(x - x_j)},    (2)

which is the reversed Nadaraya-Watson estimator. The free parameters X define the manifold, i.e., the low-dimensional representation of the data Y. Parameter x is the location where the function is evaluated, and is based on the entries of X. For convenience, Klanke and Ritter [9] introduced a vector b(.) in R^N of basis functions that define the ratios of the kernels:

    b_i(x; X) = \frac{K(x - x_i)}{\sum_{j=1}^{N} K(x - x_j)}.    (3)

Each component i of the vector b(.) contains the relative kernel density of point x w.r.t. the i-th point of matrix X. Equation (2) can also be written in terms of these basis functions:

    f(x; X) = \sum_{i=1}^{N} y_i b_i(x; X) = Y b(x; X).    (4)

The matrix Y of observed data is fixed, and the basis functions are tuned during the learning process. The basis functions b_i sum to one as they are normalized by the denominator. The quality of the principal manifold learning is evaluated with the data space reconstruction error, i.e., the Euclidean distance between the training data and its reconstruction,

    R(X) = \frac{1}{N} \| Y - Y B(X) \|_F^2,    (5)

using the Frobenius norm and the matrix of basis functions. The Frobenius norm of an m x n matrix A is defined as follows:

    \| A \|_F = \sqrt{ \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 }.    (6)

To summarize, b_i in R is a relative kernel density, b in R^N is a vector of basis functions, and B in R^{N x N} is a matrix whose columns consist of the vectors of basis functions. Hence, the product of Y in R^{d x N} and B(X) in R^{N x N} (which is the Nadaraya-Watson estimator) results in a d x N matrix, with

    B(X) = (b(x_1; X), ..., b(x_N; X)).    (7)

Leave-one-out cross-validation (LOO-CV) can easily be employed by setting the diagonal entries of B to zero, normalizing the columns, and then applying Equation (5). Instead of applying LOO-CV, a UKR model can be regularized via penalizing extension in latent space (corresponding to penalizing small bandwidths in kernel regression) [9]:

    R(X) = \frac{1}{N} \| Y - Y B(X) \|_F^2 + \lambda \| X \|_F^2.    (8)

The regularized approach will be used in the comparison between the CMA-ES and the Powell-ES in Section V-C.
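The following NumPy sketch illustrates Equations (2) to (8) under the stated conventions (columns of X and Y are samples). It builds the basis function matrix B(X) with a Gaussian kernel, optionally applies the LOO-CV trick of zeroing the diagonal, and evaluates the (regularized) reconstruction error. This is an illustration of the formulas rather than the authors' implementation; a unit kernel bandwidth is assumed, since in UKR the scale of the latent variables plays the role of the inverse bandwidth.

```python
import numpy as np

def basis_matrix(X, loo=False):
    """B(X) from Eqs. (3) and (7); columns of the (q, N) matrix X are latent points."""
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # pairwise squared distances
    K = np.exp(-0.5 * D2)                            # Gaussian kernel, unit bandwidth
    if loo:
        np.fill_diagonal(K, 0.0)                     # leave-one-out: ignore self-density
    return K / np.maximum(K.sum(axis=0, keepdims=True), 1e-12)  # normalize columns

def reconstruction_error(X, Y, lam=0.0, loo=True):
    """Data space reconstruction error, Eq. (5), with optional penalty term of Eq. (8)."""
    B = basis_matrix(X, loo=loo)
    R = np.linalg.norm(Y - Y @ B, ord='fro')**2 / Y.shape[1]
    return R + lam * np.linalg.norm(X, ord='fro')**2
```

An initial X, for instance an LLE solution (Section III-D), can then be passed together with the data matrix Y to any black-box optimizer of this error.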
C. Klanke-Ritter Optimization Scheme

Klanke and Ritter [9] have introduced an optimization scheme consisting of various steps. It uses PCA [8] and multiple LLE [18] solutions for initialization, see Section III-D. In particular, the optimization scheme consists of the following steps:
1) initialization of n + 1 candidate solutions, n solutions from LLE and one solution from PCA,
2) selection of the best initial solution w.r.t. the CV-error,
3) search for optimal scale factors that scale the best LLE solution to a UKR solution w.r.t. the CV-error,
4) selection of the most promising solution w.r.t. the CV-error,
5) CV-error minimization: if the best solution stems from PCA, search for optimal regularization parameters η with the homotopy method;

3 if the best solution steps form LLE: CV-error minimization with the homotopy method / resiliant backpropagation (RPROP) [7], 6) final density threshold selection. For a detailed and formal description of the steps, we refer to the depiction of Klanke and Ritter [9]. They discuss that spectral methods do not always yield an initial set of latent variables that is close to a sufficient deep local minimum. We will employ evolution strategies to solve the global optimization problem, but also use LLE for initial solutions. D. Local Linear Embedding For non-linear manifolds LLE by Roweis and Saul [8] is often employed. LLE assumes local linearity of manifolds. It works as follows for mapping high-dimensional points y Y to low-dimensional embedding vectors x X. LLE computes a neighborhood graph like for the other nonlinear spectral embedding methods, see Section II. Then it computes the weights w ij that best reconstruct each point y i from its neighbors, minimizing the cost function: R(w) = i= y i j w ij y j. (9) The resulting weights capture the geometric structure of the as they are invariant under rotation, scaling and translation of the vectors. Then, LLE computes the vectors y i best reconstructed by the weights w ij minimizing R(w) = x i w ij x j () i= j= For a detailed introduction to LLE we refer to [8], and Chang and Yeung [4] for a variant robust against outliers. A free parameter of LLE is the number of local models, which can reach from to N. LLE is employed as initialization routine. The best LLE solution (w.r.t. the CV-error of the manifold) is used as basis for the subsequent stochastic CVerror minimization. IV. EVOLUTIONARY UNSUPERVISED KERNEL REGRESSION A. Evolutionary Optimization Scheme We employ the CMA-ES, and the Powell-ES to solve two steps of the optimization framework. Our aim is to replace the rather complicated optimization scheme we briefly summarized in Section III-C. It consists of the following steps: ) Initialization of n candidate LLE solutions, ) selection of the best initial solution ˆX init w.r.t. CV-error, ) search for optimal scale factors s = arg min s R CV (diag(s) ˆX init) () with CMA-ES/Powell-ES, and 4) CV-error minimization with CMA-ES/Powell-ES. We employ Huber s loss function [7], see Section IV-B, for the following experiments. The subsequent sections describe an experimental analysis of the evolutionary approach. A side effect of the use of an evolutionary scheme is that arbitrary, also non-differentiable kernel functions can be employed. LOO-CV can easily be implemented by setting the diagonal entries of X to zero, and normalizing the columns, and then applying Equation (5). In the following, we compare two optimization approaches for optimizing the model: () Powell s conjugate gradient ES [], and () the CMA- ES [5]. B. Huber s Loss In regression typically different loss function are used that weight the residuals. In the best case the loss function is chosen according to the needs of the underlying mining model. With the design of a loss function, the emphasis of outliers can be controlled. Let L : R q R d R be the loss function. In the univariate case d = a loss function is defined as L = N i= L(y i, f(x i )). The L loss is defined as and L is defined as L = L = y i f(x i ), () i= (y i f(x i )). 
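Since LLE only serves as an initialization routine here, an off-the-shelf implementation is sufficient in practice. The sketch below (an assumption of this write-up, not part of the original paper) generates candidate embeddings for several neighborhood sizes with scikit-learn and keeps the one with the lowest CV-error; the error function is passed in as an argument, for instance the reconstruction_error sketch given after Section III-B.

```python
from sklearn.manifold import LocallyLinearEmbedding

def best_lle_initialization(Y, error_fn, q=2, neighbor_grid=(5, 10, 15, 20)):
    """Y: (d, N) data matrix; returns the (q, N) LLE solution with the lowest error_fn value."""
    best_X, best_err = None, float('inf')
    for k in neighbor_grid:
        lle = LocallyLinearEmbedding(n_neighbors=k, n_components=q)
        X = lle.fit_transform(Y.T).T   # scikit-learn expects samples as rows
        err = error_fn(X, Y)
        if err < best_err:
            best_X, best_err = X, err
    return best_X
```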
IV. EVOLUTIONARY UNSUPERVISED KERNEL REGRESSION

A. Evolutionary Optimization Scheme

We employ the CMA-ES and the Powell-ES to solve two steps of the optimization framework. Our aim is to replace the rather complicated optimization scheme we briefly summarized in Section III-C. The evolutionary scheme consists of the following steps:
1) initialization of n candidate LLE solutions,
2) selection of the best initial solution X̂_init w.r.t. the CV-error,
3) search for optimal scale factors

    s^* = \arg\min_{s} R_{CV}(\mathrm{diag}(s) \cdot \hat{X}_{init})    (11)

with the CMA-ES/Powell-ES, and
4) final CV-error minimization with the CMA-ES/Powell-ES.

We employ Huber's loss function [7], see Section IV-B, for the following experiments. The subsequent sections describe an experimental analysis of the evolutionary approach. A side effect of the use of an evolutionary scheme is that arbitrary, also non-differentiable, kernel functions can be employed. LOO-CV can easily be implemented by setting the diagonal entries of B(X) to zero, normalizing the columns, and then applying Equation (5). In the following, we compare two optimization approaches for optimizing the UKR model: (1) Powell's conjugate gradient ES [11], and (2) the CMA-ES [15].

B. Huber's Loss

In regression, typically different loss functions are used to weight the residuals. In the best case, the loss function is chosen according to the needs of the underlying data mining model. With the design of a loss function, the emphasis of outliers can be controlled. Let L : R^q x R^d -> R be the loss function. In the univariate case d = 1, a loss function is defined as L = \sum_{i=1}^{N} L(y_i, f(x_i)). The L1 loss is defined as

    L_1 = \sum_{i=1}^{N} | y_i - f(x_i) |,    (12)

and the L2 loss is defined as

    L_2 = \sum_{i=1}^{N} ( y_i - f(x_i) )^2.    (13)

Huber's loss [7] is a differentiable alternative to the L1 loss and makes use of a trade-off point δ between the L1 and the L2 characteristic, i.e.,

    L_H = \sum_{i=1}^{N} L_h( y_i - f(x_i) ),    (14)

with

    L_h(r) = \begin{cases} \frac{1}{2} r^2 & \text{if } |r| < \delta \\ \delta |r| - \frac{1}{2} \delta^2 & \text{if } |r| \geq \delta \end{cases}.    (15)

Parameter δ allows a problem-specific adjustment to certain problem characteristics. In the experimental part we use a fixed setting of δ.
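A compact sketch of the two evolutionary steps, scale search and final CV-error minimization, is given below. SciPy's Powell method is used as a stand-in black-box minimizer; the restart logic, step-size adaptation, and population handling of the actual CMA-ES and Powell-ES are omitted, so this is a simplified illustration of the scheme rather than the authors' optimizer. The function reconstruction_error refers to the sketch given after Section III-B, and the budget values are arbitrary examples.

```python
import numpy as np
from scipy.optimize import minimize

def optimize_ukr(X_init, Y, lam=0.0, scale_budget=1000, latent_budget=3000):
    """Two-stage scheme of Section IV-A with a black-box local searcher."""
    q, N = X_init.shape

    # Step 3: search for optimal scale factors s (Eq. 11), one factor per latent dimension
    def scale_objective(s):
        return reconstruction_error(np.diag(s) @ X_init, Y, lam=lam)
    res = minimize(scale_objective, x0=np.ones(q), method='Powell',
                   options={'maxfev': scale_budget})
    X = np.diag(res.x) @ X_init

    # Step 4: final CV-error minimization over all q * N latent coordinates
    def latent_objective(x_flat):
        return reconstruction_error(x_flat.reshape(q, N), Y, lam=lam)
    res = minimize(latent_objective, x0=X.ravel(), method='Powell',
                   options={'maxfev': latent_budget})
    return res.x.reshape(q, N)
```

A population-based optimizer such as a CMA-ES implementation could be substituted for either minimize call without changing the structure of the scheme.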

V. EXPERIMENTAL STUDY

A. Datasets and Fitness Measure

The experimental analysis is based on the following data sets:
- 2-D-S (noisy S): an S-shaped data set with d = 2, with noise of magnitude σ added, and a set of test points without noise,
- 3-D-S (noisy S): an S-shaped data set with d = 3, with noise of magnitude σ added, and a set of test points without noise,
- digits 7: samples with d = 256 (16 x 16 greyscale values) of the digit 7 from the digits data set [5], and a set of test samples.

The test error is computed by the projection of test points uniformly generated in latent space and mapped to data space. The test error is the sum of distances between each test point x_t in T and its closest projection, plus the sum of distances between each projected point x_p in P and its closest test point:

    R_t = \sum_{x_t \in T} \min_{x_p \in P} \| x_p - x_t \| + \sum_{x_p \in P} \min_{x_t \in T} \| x_p - x_t \|.    (16)

For training (validation) and test measurement of residuals, Huber's loss function is employed, see Section IV-B.
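As a small sketch of the test error R_t of Equation (16), assuming the test points and the projected points are given as rows of NumPy arrays (again an illustration, not the original evaluation code):

```python
from scipy.spatial.distance import cdist

def test_error(T, P):
    """Symmetric nearest-neighbor distance sum of Eq. (16).

    T: (n_test, d) test points without noise,
    P: (n_proj, d) projections of latent points into data space.
    """
    D = cdist(P, T)                        # pairwise Euclidean distances
    return D.min(axis=0).sum() + D.min(axis=1).sum()
```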
B. LLE Initialization and Scaling Balance

Initialization with LLE is part of the optimization scheme introduced by Klanke and Ritter [9]. The question arises how much optimization effort should be invested into the scaling in comparison to the CV-error minimization process. In the following, we analyze the balance of optimization effort for both optimization steps with a fixed budget of optimization steps. Table I shows the CV-errors (1) of the best LLE model, (2) after scaling optimization with the CMA-ES, (3) after the final CV-error minimization, and (4) the error on the test set. We test five optimization balances; the first number indicates the share of the budget spent on the LLE scaling optimization, the second number the share spent on the latent variable-based optimization. We test the following combinations: (0/4), meaning no scaling and the whole budget spent on final CV minimization, the balances (1/3), (2/2), and (3/1), and finally (4/0), meaning scaling only and no final CV optimization. The values shown in Table I present the best (minimal) values of the runs. The results show that a mixed variant achieves the lowest test error, while the (0/4) variant achieves the lowest training error.

[Table I. Experimental analysis of optimization steps invested into scaling of the LLE solutions and CV-error minimization (minimal values); rows: LLE, scaling, CV, and test error for the balances (0/4), (1/3), (2/2), (3/1), and (4/0).]

C. CMA-ES and Powell-ES

In the following, we compare two optimization algorithms, i.e., the CMA-ES and the Powell-ES (also known as Powell-ILS, see [11]), as optimization approaches for the UKR learning problem. The Powell-ES is based on Powell's fast local search. To overcome local optima, the Powell-ES makes use of a (µ+λ)-ES [1]. Each objective variable x in R is mutated using Gaussian mutation [1] with a step-size (variance σ), and a Powell search is conducted until a local optimum is found. The step-sizes of the ES are adapted as follows: if in successive iterations the same local optimum is found, the step-sizes are increased to overcome local optima; in turn, if different local optima are found, the step-sizes are decreased. Table II compares the two optimizers on the three data sets, using the penalized UKR variant, see Equation (8). The values show the test error, i.e., the distance between the original data and the projections of the samples after the full budget of fitness function evaluations, split between scale optimization and CV-error minimization. The experiments show that the Powell-ES achieves a lower training error in each of the experiments. But only on one of the S data sets is this reflected in a lower test error. This means that on the other data sets an overfitting effect occurred that could not be prevented with the penalty regularization approach. A deeper analysis of the balancing parameter λ will be necessary.

[Table II. Experimental comparison of CMA-ES and Powell-ES on the three data sets 2-D-S, 3-D-S, and digits; for each optimizer the training error, test error, and maximum error are reported.]

[Fig. 2. Evolutionary UKR on the noisy S data set. The figures show the results of various optimization efforts spent on scaling and final CV-error minimization, and of a regularized approach, see Equation (8). The projection is used for visualization.]

Figure 2 gives a visual impression of the evolved manifolds. The original data is shown by (blue) spots, the UKR manifold is drawn as a (red) line. With the help of the test set, we compute a test error, see Equation (16).

VI. CONCLUSIONS

The multi-modal optimization problem of UKR can be solved with the CMA-ES or the Powell-ES, leading to an easier optimization framework that is also capable of handling non-differentiable loss and kernel functions. However, initial LLE solutions still improve the optimization process. Overfitting effects might occur that have to be avoided by improved regularization approaches. For this sake, further search effort has to be invested into the parameter λ, e.g., employing grid search.

In the future, we plan to balance regularization with multi-objective optimization techniques.

REFERENCES

[1] H.-G. Beyer and H.-P. Schwefel. Evolution strategies - A comprehensive introduction. Natural Computing, 1:3-52, 2002.
[2] C. M. Bishop, M. Svensén, and C. K. I. Williams. Developments of the generative topographic mapping. Neurocomputing, 21(1-3):203-224, 1998.
[3] C. M. Bishop, M. Svensén, and C. K. I. Williams. GTM: The generative topographic mapping. Neural Computation, 10(1):215-234, 1998.
[4] H. Chang and D. Yeung. Robust locally linear embedding. Pattern Recognition, 39:1053-1065, 2006.
[5] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, Berlin, 2009.
[6] T. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, 84(406):502-516, 1989.
[7] P. J. Huber. Robust Statistics. Wiley, New York, 1981.
[8] I. Jolliffe. Principal Component Analysis. Springer Series in Statistics. Springer, New York, 1986.
[9] S. Klanke and H. Ritter. Variants of unsupervised kernel regression: General cost functions. Neurocomputing, 70(7-9):1289-1303, 2007.
[10] T. Kohonen. Self-Organizing Maps. Springer, 2001.
[11] O. Kramer. Fast blackbox optimization: Iterated local search and the strategy of Powell. In Proceedings of the 2009 International Conference on Genetic and Evolutionary Methods, 2009.
[12] P. Meinicke. Unsupervised Learning in a Generalized Regression Framework. PhD thesis, University of Bielefeld, 2000.
[13] P. Meinicke, S. Klanke, R. Memisevic, and H. Ritter. Principal surfaces from unsupervised kernel regression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1379-1391, 2005.
[14] E. Nadaraya. On estimating regression. Theory of Probability and Its Applications, 10:186-190, 1964.
[15] A. Ostermeier, A. Gawelczyk, and N. Hansen. A derandomized approach to self-adaptation of evolution strategies. Evolutionary Computation, 2(4):369-380, 1994.
[16] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6):559-572, 1901.
[17] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Proceedings of the IEEE International Conference on Neural Networks, 1993.
[18] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.
[19] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319, 1998.
[20] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000.
[21] J. Verbeek, N. Vlassis, and B. Kröse. A k-segments algorithm for finding principal curves. Pattern Recognition Letters, 23:1009-1017, 2002.
[22] G. Watson. Smooth regression analysis. Sankhya, Series A, 26:359-372, 1964.
