Facing Non-Convex Optimization to Scale Machine Learning to AI


1 Facing Non-Convex Optimization to Scale Machine Learning to AI October 10th 2006 Thanks to: Yann Le Cun, Geoffrey Hinton, Pascal Lamblin, Olivier Delalleau, Nicolas Le Roux, Hugo Larochelle

2 Machine Learning for Artificial Intelligence Knowledge-based AI has stalled: extracting knowledge from humans and formalizing it into a coherent AI is too labor-intensive, and it does not work because much human knowledge is not explicit. The promise of Machine Learning to solve AI: use data and learn the AI tasks from examples. Remains elusive! Here: examine limitations, relevant to AI, of a large class of non-parametric approaches (e.g. SVMs) that enjoy easy (convex) optimization, and discuss approaches that do not suffer from these limitations but are non-convex.

3 ML for AI : Desiderata Need a large number of examples n because we are learning complex functions. Most examples will be unlabeled: need semi-supervised learning. Need to be able to efficiently represent such complex functions: statistical scaling. Computational scaling should be O(n): online learning. Human labor required should NOT increase linearly with the number of sub-tasks. Many interrelated tasks in the world of humans: need multi-task learning.

4 No Free Lunch but Broad Priors for AI No Free Lunch theorems for ML: no completely general learning algorithm exists. Restrict to AI tasks, those that animals perform effortlessly: perception, control; for higher animals and humans: long-term prediction, reasoning, planning, language. Should we hand-craft priors for each particular task (e.g. recognizing one class of objects in images)? Overwhelming human labor would be required. Or can we hope to find a few broad priors (i.e. learning principles) that cover most AI tasks?

5 Kernel Machines f(x) = b + Σ_i α_i K(x, x_i). In some cases K depends mildly on the data (e.g. normalization). Used in Support Vector Machines (SVMs) and Gaussian Processes, but also in unsupervised manifold learning (LLE, Isomap, kernel PCA, etc.) and non-parametric semi-supervised learning (based on a neighborhood graph). Easy optimization (analytic or convex optimization problems), but scaling may still be unacceptable w.r.t. the number of training examples (e.g. quadratic). Kernel machines usually embody a smoothness prior (x ≈ y ⇒ f(x) ≈ f(y)) through a local kernel function K(x, y) that is large for x near y. Unsurprisingly, relying only on this prior is inadequate for learning functions with many variations, such as in AI.
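To make the form of such a predictor concrete, here is a minimal sketch (not from the talk) of a Gaussian-kernel machine in numpy; the coefficients alpha and bias b are made-up stand-ins for what a convex solver (an SVM or Gaussian-process fit) would produce.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Local kernel: large when x is near y, essentially 0 when x is far from y."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def kernel_machine(x, train_X, alpha, b, sigma=1.0):
    """f(x) = b + sum_i alpha_i K(x, x_i); each alpha_i only matters for x near x_i."""
    return b + sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, train_X))

# Toy usage with hypothetical coefficients (in practice alpha and b come from convex training).
train_X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
alpha, b = np.array([0.5, -0.3, 0.8]), 0.1
print(kernel_machine(np.array([0.9, 1.1]), train_X, alpha, b))
```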

6 Shallow vs Deep Architectures 1. Linear predictors f(x) = w · φ(x), with φ(x) low-dimensional. 2. Kernel machines f(x) = Σ_i α_i K(x, x_i) (the kernel trick allows high-dimensional φ(x)). 3. 2-layer machines: truly adaptive kernel; RBF networks, standard 1-hidden-layer neural nets, boosting. 4. Deep architectures: many levels (4 to 12 reported to date). All of the above (except 1, if φ is low-dimensional) can theoretically approximate any function. Good thing about types 1 and 2: convex optimization programs. But what is the price?

7 The Depth-Breadth Tradeoff The disadvantage of shallow architectures is inefficient representation; the worst case can be exponentially bad. Examples from boolean circuit theory: Multiplier circuits (N bits × N bits) can be shallow, needing O(2^N) elements, or deep, needing O(N log N) elements. The DNF (shallow) representation of an O(N) formula may require O(2^N) terms. N-bit parity: shallow needs O(2^N) elements, deep needs O(N log N) elements with log N levels. FFT: the shallow (matrix) representation needs O(N^2) operations, the Fast Fourier Transform needs O(N log N) with log N levels. (See also Utgoff 2002.)
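The parity case can be made concrete with a toy sketch (mine, not the slides'), assuming Python: a balanced XOR tree is a depth-log2(N) circuit using N-1 two-input gates, while a flat DNF needs one AND term per odd-parity input, i.e. 2^(N-1) terms.

```python
from itertools import product

def parity_tree(bits):
    """Deep representation: balanced XOR tree, N-1 gates, depth about log2(N)."""
    if len(bits) == 1:
        return bits[0]
    mid = len(bits) // 2
    return parity_tree(bits[:mid]) ^ parity_tree(bits[mid:])

def parity_dnf_terms(n):
    """Shallow (DNF) representation: one AND term per odd-parity assignment."""
    return [bits for bits in product([0, 1], repeat=n) if sum(bits) % 2 == 1]

n = 8
print(len(parity_dnf_terms(n)))               # 2**(n-1) = 128 terms in the shallow form
print(parity_tree([1, 0, 1, 1, 0, 0, 1, 0]))  # deep form uses only n-1 = 7 XOR gates
```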

8 Myopic vs Far-Reaching Learning Algorithms Current algorithms are myopic because they must rely on highly local data to characterize the data distribution. We should develop algorithms that allow one to generalize far from the training set, for example by sharing information about global parameters that describe the structure of the manifold. DEEP ARCHITECTURES ALLOW NON-LOCAL GENERALIZATION. But they come at a price: non-convex optimization.

11 Local Learning Algorithms A learned parameter of the model influences the value of the learned function in a local area of the input domain. With a local kernel machine f(x) = Σ_i α_i K(x, x_i), α_i only influences f(x) for x near x_i. Examples: nearest-neighbor algorithms, local kernel machines, and most non-parametric models (but not multi-layer neural networks).

13 Mathematical Problem with Local Learning Theorem: with K the Gaussian kernel and f(·) changing sign at least 2k times along some straight line (i.e. that line crosses the decision surface at least 2k times), at least k examples are required. (Figure: a straight line crossing a wiggly decision surface many times, alternating between the two classes.) With local kernels, learning a function that has many bumps requires as many examples as bumps.

16 The Curse of Dimensionality Mathematical problem with classical non-parametric models: we may need examples for each probable combination of the variables of interest. OK for 2 or 3 variables, NOT OK for abstract concepts...

19 Mathematical Problem with Local Kernels Theorem: with K the Gaussian kernel, and the goal of learning a maximally varying binary function (f(x) ≠ f(x′) whenever ‖x − x′‖ = 1) over d inputs, at least 2^(d−1) examples are required. We need to cover the space of possibilities with examples, which may require a number of examples exponential in the number of inputs ⇒ strongly negative mathematical results for local kernel machines. Other similar results appear in (Bengio, Delalleau, Le Roux, NIPS 2005).
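The flavour of this result can be checked numerically; below is a small experiment sketch (not from the talk), assuming numpy and scikit-learn: an RBF-kernel SVM trained on half of the d-bit hypercube is expected to sit near chance on the held-out half of the parity problem.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 10
X = np.array(np.meshgrid(*[[0, 1]] * d)).reshape(d, -1).T   # all 2^d binary configurations
y = X.sum(axis=1) % 2                                        # parity target

perm = rng.permutation(len(X))
train, test = perm[: len(X) // 2], perm[len(X) // 2 :]       # see only half the hypercube

clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X[train], y[train])
print("train accuracy:", clf.score(X[train], y[train]))
print("test accuracy :", clf.score(X[test], y[test]))        # expected to be near chance (~0.5)
```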

20 Local Manifold Learning : Local Linear Patches Current manifold learning algorithms cannot handle highly curved manifolds because they are based on locally linear patches estimated locally (possibly aligned globally). (Figure: a high-contrast image and its shifted version, with the corresponding tangent images and tangent directions.)

21 Local Manifold Learning Algorithms We have shown that LLE, Isomap, kernel PCA, Laplacian Eigenmaps, etc. are kernel machines with a local kernel (Bengio et al 2005). Local manifold learning algorithms derive information about the manifold structure near x using mostly the neighbors of x. For LLE, kernel PCA with Gaussian kernel, spectral clustering, and Laplacian Eigenmaps, K_D(x, y) ≈ 0 for x far from y, so e_k(x) only depends on the neighbors of x. Therefore the tangent plane ∂e_k(x)/∂x = (1/λ_k) Σ_{i=1}^{n} v_ki ∂K_D(x, x_i)/∂x also only depends on the neighbors of x. We can't say anything about the manifold structure near a new example x that is far from the training examples!
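As a concrete illustration of that formula, here is a minimal numpy sketch (my own, with hypothetical data): the eigenvectors v_k and eigenvalues λ_k of the Gram matrix give the out-of-sample embedding e_k(x) = (1/λ_k) Σ_i v_ki K(x, x_i), and far from the training set every K(x, x_i) vanishes, so the embedding carries no information there.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                        # hypothetical training points

# Gram matrix and its spectrum, as in kernel PCA / spectral embedding methods.
G = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])
lam, V = np.linalg.eigh(G)                          # columns of V are the eigenvectors v_k

def embed(x, k):
    """Out-of-sample e_k(x) = (1/lambda_k) * sum_i v_ki K(x, x_i)."""
    return (1.0 / lam[k]) * np.sum(V[:, k] * gaussian_kernel(x, X))

k = G.shape[0] - 1                                  # leading eigenpair (eigh sorts ascending)
print(embed(X[0], k))                               # near the data: informative coordinate
print(embed(X[0] + 100.0, k))                       # far from the data: ~0, no structure left
```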

22 The Curse of Dimensionality on a Manifold Similar to ordinary curse of dimensionality for classical non-parametric statistics, but where d = dimension of the manifold. Hurts all local manifold learning methods!

23 Fundamental Problems with Local Manifold Learning High noise: constraints not perfectly satisfied, data not strictly on the manifold; more noise ⇒ more data needed per local patch. High curvature: need more, smaller patches, O((1/r)^d) of them, with the patch radius r decreasing as curvature increases. High manifold dimension: O((1/r)^d) patches are needed (curse of dimensionality), with at least O(d) examples per patch (more with noise); e.g. r = 0.1 and d = 10 already give 10^10 patches. Many manifolds: e.g. images of transformed object instances give one manifold per instance or per object class; local manifold learning can't take advantage of shared structure across multiple manifolds.

24 Fat but Shallow Neural Networks are Equivalent to SVMs In Convex Neural Networks (Bengio, Le Roux, Vincent, Delalleau, Marcotte, 2006) we show an equivalence between shallow (1-hidden-layer) neural networks and SVMs or Gaussian Processes when the number of hidden units becomes large, using L2 regularization of the output weights. The only difference is in the type of kernel, but it still depends on the Euclidean distance between its arguments. So ordinary MLPs may have the disadvantages of SVMs (shallow architecture) without the advantages (convex optimization).

25 Non-Smooth Functions are Learnable Wrong common belief: that without strong prior knowledge, highly variable (non-smooth) functions are not learnable. Simple counter-examples: a prior can generally be encoded using Kolmogorov complexity, via an MDL strategy. Some highly variable functions are nonetheless simple functions in the C language: sine, parity. Such functions could be learned from only a few examples under a C-language MDL prior.
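A tiny illustration of the point (not from the slides): both functions below change value between very close inputs, yet each has a one-line description, so an MDL-style prior over short programs still assigns them high prior probability.

```python
import math

# Highly variable functions with tiny description length.
parity = lambda bits: sum(bits) % 2        # output flips whenever any single bit flips
wiggly = lambda x: math.sin(1000.0 * x)    # oscillates roughly 159 times on [0, 1]

print(parity([1, 0, 1, 1]), parity([1, 0, 1, 0]))   # neighbouring inputs, different outputs
print(wiggly(0.0), wiggly(0.0016))                  # nearby inputs, very different outputs
```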

26 Convex Optimization can be Too Slow Convex optimization for SVMs: between O(n^2) and O(n^3) CPU time for n examples, and O(n^2) memory. Convex optimization for manifold learning based on a neighborhood graph: between O(n^2) (LLE), O(n^3 log n) (Isomap) and O(n^7) (semi-definite embedding). Does not scale well as n grows. On the other hand, sub-optimal stochastic gradient descent for multi-layer neural networks (which only approaches a local minimum) can be applied online, i.e. it needs only constant computation per example, O(n) in total. Recent work on approximate and online optimization of SVMs (Bottou et al 2006) has better scaling properties, but the number of support vectors may still grow too fast.
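To make the O(n) claim concrete, here is a bare-bones numpy sketch of online stochastic gradient descent for a 1-hidden-layer net (illustrative only, with hypothetical data): each update touches a single example and costs a constant amount of work, and nothing of size n x n is ever stored or factored.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 10000, 20, 50                        # examples, input size, hidden units
X = rng.normal(size=(n, d))
y = (X[:, 0] * X[:, 1] > 0).astype(float)      # hypothetical binary target

W1, b1 = 0.1 * rng.normal(size=(d, h)), np.zeros(h)
w2, b2 = 0.1 * rng.normal(size=h), 0.0
lr = 0.05

for i in rng.permutation(n):                   # one online pass: O(1) work per example
    a = np.tanh(X[i] @ W1 + b1)                # hidden layer
    p = 1.0 / (1.0 + np.exp(-(a @ w2 + b2)))   # output probability
    g = p - y[i]                               # gradient of cross-entropy w.r.t. the logit
    w2 -= lr * g * a
    b2 -= lr * g
    ga = g * w2 * (1.0 - a ** 2)               # backpropagate through tanh
    W1 -= lr * np.outer(X[i], ga)
    b1 -= lr * ga
```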

27 Non-Convex Optimization of Deep Architectures What deep architectures are known? Various kinds of multi-layer neural networks with many layers. Except for a very special kind of architecture for machine vision (convolutional networks), deep architectures have been neglected in machine learning. Why? Training gets stuck in mediocre solutions (Tesauro 92). No hope?

28 Convex Optimization can be a Good Initialization Trading Convexity for Scalability, Collobert, Weston and Bottou (2005): shows that SVM classification can be significantly improved by continuing training (local optimization) with a non-convex, more discriminant criterion. In addition, this non-convex criterion can be applied to transductive SVMs, allowing them, for the first time, to be optimized in a reasonable time.

30 Greedy Learning of Abstractions Greedily learning simple things first, and higher-level abstractions on top of lower-level ones, seems like a good strategy and is psychologically plausible. Consistent with the psychological literature starting with Piaget: we learn baby math before arithmetic, before algebra, before differential equations... Also evidence from neurobiology: (Guillery 2005) Is postnatal neocortical maturation hierarchical?

33 Deep Networks Some functions can be represented very efficiently with a deep network, but require many more computational elements with a 1-layer or 2-layer network. E.g. d-bit parity: 1 adaptive layer (SVM): O(2^d) units and parameters required; 2 adaptive layers (neural net): d units, d^2 parameters; d-layer net: 2d units, 5d parameters; recurrent net: 2 units, 5 parameters.

37 Deep Belief Networks Geoff Hinton just introduced a deep network model (Hinton, Osindero and Teh, 2006) that provides more evidence that this direction is worthwhile: unsupervised learning of each layer, each trying to model the distribution of its inputs; unsupervised greedy layer-wise training serves as INITIALIZATION, replacing the traditional random initialization of multi-layer networks; beats state-of-the-art statistical learning in experiments on a large machine learning benchmark task (MNIST).

38 Greedy Layer-wise Initialization The principle of greedy layer-wise initialization proposed by Hinton can be generalized to other algorithms. We replaced the probabilistic model (Restricted Boltzmann Machine) used by Hinton for the unsupervised training of each layer by a simple auto-associator: find W which minimizes the cross-entropy loss in predicting x from sigmoid(W tanh(W x)). In this context W could be initialized using a convex + analytic heuristic.
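A rough numpy sketch of this auto-associator idea (my illustration under stated assumptions, not the authors' code, with two untied weight matrices): each greedy stage minimizes the cross-entropy between the input and its reconstruction through a tanh code layer and a sigmoid output layer, and the learned input-to-code weights then initialize that layer of the deep network before supervised fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 64)).astype(float)    # hypothetical inputs in [0, 1]

def train_autoassociator(X, n_hidden, lr=0.1, epochs=10):
    """One greedy layer: reconstruct x as sigmoid(W2 @ tanh(W1 @ x)), cross-entropy loss."""
    d = X.shape[1]
    W1 = 0.1 * rng.normal(size=(d, n_hidden))
    W2 = 0.1 * rng.normal(size=(n_hidden, d))
    for _ in range(epochs):
        for x in X:
            h = np.tanh(x @ W1)                          # code
            r = 1.0 / (1.0 + np.exp(-(h @ W2)))          # reconstruction
            g = r - x                                    # d(cross-entropy)/d(pre-sigmoid)
            gh = (g @ W2.T) * (1.0 - h ** 2)             # backpropagate through tanh
            W2 -= lr * np.outer(h, g)
            W1 -= lr * np.outer(x, gh)
    return W1

# Greedy stacking: each layer models the codes of the layer below; the W1 matrices
# then serve as the initialization of the deep network for supervised fine-tuning.
W_layer1 = train_autoassociator(X, 30)
H1 = 0.5 * (1.0 + np.tanh(X @ W_layer1))                 # rescale codes to [0, 1] for the next stage
W_layer2 = train_autoassociator(H1, 15)
```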

39 Experiments on Greedy Layer-wise Initialization Deep nets with 3 to 4 hidden layers. Compare SUPERVISED and UNSUPERVISED (auto-associator or DBN) greedy strategies. Classification error on MNIST training, validation, and test sets, with the best hyper-parameters according to validation error, with and without pre-training, using purely supervised or purely unsupervised pre-training:

                                         train.   valid.  test
DBN, unsupervised pre-training           0%       1.3%    1.4%
Deep net, auto-associator pre-training   0%       1.4%    1.4%
Deep net, supervised pre-training        0%       1.75%   2.0%
Deep net, no pre-training                0.004%   2.1%    2.4%
Shallow net, no pre-training             0.004%   1.8%    1.9%

Model selection settles on around 500 hidden units per layer. SUPERVISED GREEDY is TOO GREEDY.

40 It is Really an Optimization Problem Why 0% train error even with the deep net and no pre-training? Because the last, fat hidden layer did all the work. Classification error on MNIST with 20 hidden units on the top layer:

                                         train.   valid.  test
Deep net, auto-associator pre-training   0%       1.4%    1.6%
Deep net, supervised pre-training        0%       1.8%    1.9%
Deep net, no pre-training                0.59%    2.1%    2.2%
Shallow net, no pre-training             3.6%     4.7%    5.0%

YES, IT IS REALLY AN OPTIMIZATION PROBLEM, AND GREEDY UNSUPERVISED PRE-TRAINING HELPS A LOT.

41 Learning Visual Invariances A number of experiments from three different labs (Hinton, LeCun, Bengio) point to the inability of kernel machines to efficiently (in both the computational and statistical senses) learn from data involving many complex invariances, e.g. from the geometry of images: manifolds due to translation, rotation, scaling, shear, thickness, etc. Example: consider N objects in an image, each with K different geometric degrees of freedom, with enough curvature that M different values of each dimension must be considered: O(M^(KN)) templates are needed. We are investigating the ability of various non-local, non-shallow architectures to capture such invariances. But research on deep architectures is still very young!

42 Conclusions The needs of AI require ML for highly varying functions. Shallow architectures / local kernel machines with convex optimization do not deliver: the number of required examples grows linearly with the number of desired variations (curse-of-dimensionality arguments). We can trade convexity for scalability (computational and statistical)! Deep architectures were thought not to be trainable, but new methods appear to break through this obstacle.

43-46 References
Belkin, M., Matveeva, I., and Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. In Shawe-Taylor, J. and Singer, Y., editors, COLT. Springer.
Belkin, M. and Niyogi, P. (2003). Using manifold structure for partially labeled classification. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA.
Bengio, Y., Delalleau, O., and Le Roux, N. (2006). The curse of highly variable functions for local kernel machines. In Advances in Neural Information Processing Systems 18. MIT Press.
Bengio, Y., Delalleau, O., Le Roux, N., Paiement, J.-F., Vincent, P., and Ouimet, M. (2004a). Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10).
Bengio, Y. and Larochelle, H. (2006). Non-local manifold Parzen windows. In Weiss, Y., Schölkopf, B., and Platt, J., editors, Advances in Neural Information Processing Systems 18. MIT Press.
Bengio, Y., Paiement, J., Vincent, P., Delalleau, O., Le Roux, N., and Ouimet, M. (2004b). Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16. MIT Press.
Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2).
Bourlard, H. and Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59.
Brand, M. (2003). Charting a manifold. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15. MIT Press.
Chapelle, O., Weston, J., and Schölkopf, B. (2003). Cluster kernels for semi-supervised learning. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA.
Cox, T. and Cox, M. (1994). Multidimensional Scaling. Chapman & Hall, London.
Delalleau, O., Bengio, Y., and Le Roux, N. (2005). Efficient non-parametric function induction in semi-supervised learning. In Cowell, R. and Ghahramani, Z., editors, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS 2005), Barbados. Society for Artificial Intelligence and Statistics.
Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference.
Ghahramani, Z. and Hinton, G. (1996). The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, Dept. of Computer Science, University of Toronto.
Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8).
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation.
Ng, A. Y., Jordan, M. I., and Weiss, Y. (2002). On spectral clustering: analysis and an algorithm. In Dietterich, T., Becker, S., and Ghahramani, Z., editors, Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA.
Rao, R. and Ruderman, D. (1999). Learning Lie groups for invariant visual perception. In Kearns, M., Solla, S., and Cohn, D., editors, Advances in Neural Information Processing Systems 11. MIT Press, Cambridge, MA.
Roweis, S. and Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500).
Rumelhart, D., Hinton, G., and Williams, R. (1986). Learning internal representations by error propagation. In Rumelhart, D. and McClelland, J., editors, Parallel Distributed Processing, volume 1, chapter 8. MIT Press, Cambridge.
Saul, L. and Roweis, S. (2002). Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4.
Saund, E. (1989). Dimensionality-reduction using connectionist networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(3).
Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10.
Szummer, M. and Jaakkola, T. (2002). Partially labeled classification with Markov random walks. In Dietterich, T., Becker, S., and Ghahramani, Z., editors, Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA.
Teh, Y. W. and Roweis, S. (2003). Automatic alignment of local representations. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15. MIT Press.
Tenenbaum, J., de Silva, V., and Langford, J. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500).
Tipping, M. and Bishop, C. (1999). Mixtures of probabilistic principal component analysers. Neural Computation, 11(2).
Torgerson, W. (1952). Multidimensional scaling, 1: Theory and method. Psychometrika, 17.
Vincent, P. and Bengio, Y. (2003). Manifold Parzen windows. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA.
Weiss, Y. (1999). Segmentation using eigenvectors: a unifying view. In Proceedings of the IEEE International Conference on Computer Vision.
Zhou, D., Bousquet, O., Navin Lal, T., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.
Zhu, X., Ghahramani, Z., and Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In ICML 2003.
