Facing Non-Convex Optimization to Scale Machine Learning to AI
1 Facing Non-Convex Optimization to Scale Machine Learning to AI. October 10th, 2006. Thanks to: Yann Le Cun, Geoffrey Hinton, Pascal Lamblin, Olivier Delalleau, Nicolas Le Roux, Hugo Larochelle
2 Machine Learning for Artificial Intelligence. Knowledge-based AI has stalled: extracting knowledge from humans and formalizing it into a coherent AI is too labor-intensive, and it does not work because much human knowledge is not explicit. The promise of Machine Learning to solve AI: use data and learn the AI tasks from examples. That promise remains elusive! Here we examine limitations, linked to AI, of a large class of non-parametric approaches (e.g. SVMs) that enjoy easy (convex) optimization, and we discuss approaches that do not suffer from these limitations but are non-convex.
3 ML for AI: Desiderata. Need a large number of examples n, because we are learning complex functions. Most examples will be unlabeled: need semi-supervised learning. Need to efficiently represent such complex functions: statistical scaling. Computational scaling should be O(n): online learning. Human labor required should NOT increase linearly with the number of sub-tasks. Many interrelated tasks in the world of humans: need multi-task learning.
4 No Free Lunch but Broad Priors for AI. No Free Lunch theorems for ML: no completely general learning algorithm exists. Restrict to AI tasks: those that animals perform effortlessly (perception, control) and those of higher animals and humans (long-term prediction, reasoning, planning, language). Should we hand-craft priors for each particular task (e.g. recognizing one class of objects in images)? That would require overwhelming human labor. Or can we hope to find a few broad priors (i.e. learning principles) that cover most AI tasks?
5 Kernel Machines: f(x) = b + Σ_i α_i K(x, x_i). In some cases K depends mildly on the data (e.g. normalization). This form appears in Support Vector Machines (SVMs) and Gaussian Processes, but also in unsupervised manifold learning (LLE, Isomap, kernel PCA, etc.) and in non-parametric semi-supervised learning (based on a neighborhood graph). Easy optimization (analytic or convex optimization problems), but scaling may still be unacceptable w.r.t. the number of training examples (e.g. quadratic). Kernel machines usually embody a smoothness prior (x ≈ y ⇒ f(x) ≈ f(y)) through a local kernel function K(x, y) that is large for x near y. Unsurprisingly, relying only on this prior is inadequate to learn functions with many variations, such as in AI.
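The generic kernel-machine predictor above is easy to state in code. Below is a minimal numpy sketch of f(x) = b + Σ_i α_i K(x, x_i) with a Gaussian kernel; the function names, the toy data and the bandwidth are illustrative choices, not anything prescribed by the slides.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Local kernel: large when x is near y, close to 0 when x is far from y."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def kernel_machine_predict(x, X_train, alpha, b=0.0, sigma=1.0):
    """f(x) = b + sum_i alpha_i K(x, x_i): the generic kernel-machine prediction."""
    return b + sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, X_train))

# Toy usage: three 1-D training points and arbitrary coefficients alpha.
X_train = np.array([[0.0], [1.0], [2.0]])
alpha = np.array([0.5, -1.0, 0.7])
print(kernel_machine_predict(np.array([0.9]), X_train, alpha))
```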
6 Shallow vs Deep Architectures. (1) Linear predictors f(x) = w·φ(x), with φ(x) low-dimensional. (2) Kernel machines f(x) = Σ_i α_i K(x, x_i) (the kernel trick allows a high-dimensional φ(x)). (3) 2-layer machines: truly adaptive kernels, RBF networks, standard 1-hidden-layer neural nets, boosting. (4) Deep architectures: many levels (4 to 12 reported to date). All of the above (except 1, if φ is low-dimensional) can theoretically approximate any function. The good thing about types 1 and 2: convex optimization programs. But what is the price?
7 The Depth-Breadth Tradeoff. The disadvantage of shallow architectures is inefficient representation; the worst case can be exponentially bad. Examples from Boolean circuit theory: multiplier circuits (N bits × N bits) can be shallow, needing O(2^N) elements, or deep, needing O(N log N) elements. A DNF (shallow) representation of an O(N) formula may require O(2^N) terms. N-bit parity: shallow needs O(2^N) elements, deep needs O(N log N) elements with log N levels. FFT: the shallow (matrix) representation needs O(N^2) operations, the Fast Fourier Transform needs O(N log N) with log N levels. (See also Utgoff 2002.)
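The parity example can be made concrete. The sketch below (our illustration of the tradeoff, not taken from the slides) contrasts a "deep" parity computation, a chain of N−1 XOR operations, with a "shallow" one, an explicit table holding one entry per input configuration, i.e. 2^N entries.

```python
from functools import reduce
from itertools import product
from operator import xor

def parity_deep(bits):
    """Deep representation: a chain of N-1 XOR gates, O(N) elements."""
    return reduce(xor, bits, 0)

def parity_shallow_table(n):
    """Shallow representation: an explicit table with 2^n entries (one per input pattern)."""
    return {bits: reduce(xor, bits, 0) for bits in product((0, 1), repeat=n)}

bits = (1, 0, 1, 1, 0, 1)
table = parity_shallow_table(len(bits))
assert parity_deep(bits) == table[bits]   # same function, very different sizes
print(len(table))                          # 2^6 = 64 table entries vs. 5 XOR operations
```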
8 Myopic vs Far-Reaching Learning Algorithms. The current algorithms are myopic because they must rely on highly local data to characterize the data distribution. We should develop algorithms that can generalize far from the training set, for example by sharing information about global parameters that describe the structure of the manifold. DEEP ARCHITECTURES ALLOW NON-LOCAL GENERALIZATION. But they come with a price: non-convex optimization.
11 Local Learning Algorithms. A learned parameter of the model influences the value of the learned function in a local area of the input domain. With a local kernel machine f(x) = Σ_i α_i K(x, x_i), α_i only influences f(x) for x near x_i. Examples: nearest-neighbor algorithms, local kernel machines, and most non-parametric models, except multi-layer neural networks.
13 Mathematical Problem with Local Learning. Theorem: with K the Gaussian kernel and f(·) changing sign at least 2k times along some straight line (i.e. that line crosses the decision surface at least 2k times), at least k examples are required. [Figure: a decision surface whose regions alternate between the two classes along a straight line.] With local kernels, learning a function that has many bumps requires as many examples as bumps.
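The theorem can be illustrated numerically. The sketch below (our illustration, not from the slides) fits a Gaussian-kernel predictor, here kernel ridge regression as a simple stand-in for a kernel machine, to a 1-D target that changes sign several times along the line; with fewer examples than sign changes the fitted function cannot follow the bumps, while with enough examples it can. The bandwidth, regularization and sample sizes are arbitrary illustrative values.

```python
import numpy as np

def gaussian_gram(A, B, sigma=0.05):
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

target = lambda x: np.sign(np.sin(8 * np.pi * x))   # changes sign 7 times on (0, 1)

for n in (5, 50):                                   # too few vs. enough training points
    X = np.linspace(0.01, 0.99, n)
    K = gaussian_gram(X, X)
    alpha = np.linalg.solve(K + 1e-3 * np.eye(n), target(X))   # kernel ridge fit
    Xt = np.linspace(0.01, 0.99, 1000)
    pred = gaussian_gram(Xt, X) @ alpha
    err = np.mean(np.sign(pred) != target(Xt))
    print(f"n={n:3d}  fraction of the line with the wrong sign: {err:.2f}")
```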
16 The Curse of Dimensionality. Mathematical problem with classical non-parametric models: one may need examples for each probable combination of the variables of interest. OK for 2 or 3 variables, NOT OK for abstract concepts...
19 Mathematical Problem with Local Kernels. Theorem: with K the Gaussian kernel, and the goal of learning a maximally changing binary function (f(x) ≠ f(x′) whenever ‖x − x′‖ = 1) with d inputs, at least 2^(d−1) examples are required. We need to cover the space of possibilities with examples, which may require a number of examples exponential in the number of inputs ⇒ strongly negative mathematical results on local kernel machines. Other similar results in (Bengio, Delalleau, Le Roux, NIPS 2005).
20 Local Manifold Learning: Local Linear Patches. Current manifold learning algorithms cannot handle highly curved manifolds because they are based on locally linear patches estimated locally (and possibly aligned globally). [Figure: a high-contrast image and a shifted version of it, with the corresponding tangent directions and tangent images.]
21 Local Manifold Learning Algorithms. We have shown that LLE, Isomap, kernel PCA, Laplacian Eigenmaps, etc. are kernel machines with a local kernel (Bengio et al 2005). Local manifold learning algorithms derive information about the manifold structure near x using mostly the neighbors of x. For LLE, kernel PCA with a Gaussian kernel, spectral clustering and Laplacian Eigenmaps, K_D(x, y) ≈ 0 for x far from y, so the embedding function e_k(x) only depends on the neighbors of x. Therefore the tangent plane ∂e_k(x)/∂x = (1/λ_k) Σ_{i=1..n} v_{ki} ∂K_D(x, x_i)/∂x also only depends on the neighbors of x. We can't say anything about the manifold structure near a new example x that is far from the training examples!
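To make the locality argument concrete, here is a small numpy sketch of the out-of-sample eigenfunction formula e_k(x) = (1/λ_k) Σ_i v_{ki} K(x, x_i) with a plain Gaussian kernel (the data-dependent normalization of K_D is omitted for brevity; all names and sizes are illustrative). For a query far from every training point the kernel row is essentially zero, so the embedding, and hence its tangent plane, carries no information there.

```python
import numpy as np

def gaussian_K(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # training points
lam, V = np.linalg.eigh(gaussian_K(X, X))     # spectrum of the Gram matrix
lam, V = lam[::-1], V[:, ::-1]                # sort by decreasing eigenvalue

def embed(x, k=2):
    """Out-of-sample embedding e_k(x) = (1/lambda_k) sum_i v_ki K(x, x_i)."""
    Kx = gaussian_K(x[None, :], X)[0]
    return np.array([(V[:, j] @ Kx) / lam[j] for j in range(k)])

print(embed(X[0]))                  # near the data: informative coordinates
print(embed(np.full(3, 50.0)))      # far from all training points: ~0, no information
```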
22 The Curse of Dimensionality on a Manifold Similar to ordinary curse of dimensionality for classical non-parametric statistics, but where d = dimension of the manifold. Hurts all local manifold learning methods!
23 Fundamental Problems with Local Manifold Learning. High noise: constraints are not perfectly satisfied and the data are not strictly on the manifold; more noise means more data needed per local patch. High curvature: need more, smaller patches, O((1/r)^d) of them, with r = patch radius decreasing with curvature. High manifold dimension: O((1/r)^d) patches are needed (curse of dimensionality), with at least O(d) examples per patch (because of noise). Many manifolds: e.g. images of transformed object instances give 1 manifold per instance or per object class; local manifold learning can't take advantage of shared structure across multiple manifolds.
24 Fat but Shallow Neural Networks are Equivalent to SVMs. In Convex Neural Networks (Bengio, Le Roux, Vincent, Delalleau, Marcotte, 2006) we show an equivalence between shallow neural networks (1 hidden layer) and SVMs or Gaussian Processes when the number of hidden units becomes large, using L2 regularization of the output weights. The only difference is in the type of kernel, but it still depends on the Euclidean distance between its arguments. So ordinary MLPs may have the disadvantages of SVMs (shallow architecture) without the advantages (convex optimization).
25 Non-Smooth Functions are Learnable. Wrong common belief: that if we do not have strong prior knowledge, then highly variable (non-smooth) functions are not learnable. Simple counter-examples: a prior can generally be encoded using Kolmogorov complexity, with an MDL strategy. Some functions are highly variable yet have simple programs in the C language: sine, parity. Such functions could be learned using only a few examples under a C-language MDL prior.
26 Convex Optimization can be Too Slow. Convex optimization for SVMs: between O(n^2) and O(n^3) CPU for n examples, and O(n^2) memory. Convex optimization for manifold learning based on a neighborhood graph: between O(n^2) (LLE), O(n^3 log n) (Isomap) and O(n^7) (semi-definite embedding). This does not scale well as n grows. On the other hand, sub-optimal stochastic gradient descent for multi-layer neural networks (which only approaches a local minimum) can be applied online: it needs only constant computation per example, i.e. O(n) computation in total. Recent work on approximate and on-line optimization of SVMs (Bottou et al 2006) gives better scaling properties, but the number of support vectors may still grow too fast.
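As a reminder of why the online regime is O(n), here is a minimal sketch of a constant-cost-per-example stochastic gradient update for a 1-hidden-layer network with squared loss; the architecture, sizes, learning rate and toy target are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, lr = 10, 32, 0.01
W1 = rng.normal(scale=0.1, size=(d_hid, d_in))   # hidden-layer weights
w2 = rng.normal(scale=0.1, size=d_hid)           # output weights

def sgd_step(x, y):
    """One constant-cost update: total work over n streamed examples is O(n)."""
    global W1, w2
    h = np.tanh(W1 @ x)
    err = w2 @ h - y                              # gradient of 0.5*(pred - y)^2 w.r.t. pred
    grad_w2 = err * h
    grad_W1 = np.outer(err * w2 * (1 - h ** 2), x)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1
    return err ** 2

# Online pass: each example is seen once, e.g. streamed from disk.
for t in range(1000):
    x = rng.normal(size=d_in)
    loss = sgd_step(x, np.sin(x[0]))              # toy regression target
```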
27 Non-Convex Optimization of Deep Architectures. What deep architectures are known? Various kinds of multi-layer neural networks with many layers. Except for a very special kind of architecture for machine vision (convolutional networks), deep architectures have been neglected in machine learning. Why? Training gets stuck in mediocre solutions (Tesauro 92). No hope?
28 Convex Optimization can be a Good Initialization. Trading Convexity for Scalability (Collobert, Weston and Bottou, 2005) shows that SVM classification can be significantly improved by continuing training (local optimization) with a non-convex, more discriminant criterion. In addition, this non-convex criterion can be applied to transductive SVMs, allowing them, for the first time, to be optimized in a reasonable time.
30 Greedy Learning of Abstractions. Greedily learning simple things first, and higher-level abstractions on top of lower-level ones, seems like a possibly good strategy and is psychologically plausible. It is coherent with the psychological literature starting with Piaget: we learn baby math before arithmetic, before algebra, before differential equations... There is also evidence from neurobiology: (Guillery 2005) Is postnatal neocortical maturation hierarchical?
33 Deep Networks. Some functions can be represented very efficiently with a deep network, but require many more computational elements with a 1-layer or 2-layer network. E.g. d-bit parity: 1 adaptive layer (SVM): 2^d parameters/units required; 2 adaptive layers (neural net): d units, d^2 parameters; d-layer net: 2d units, 5d parameters; recurrent net: 2 units, 5 parameters.
37 Deep Belief Networks. Geoff Hinton just introduced a deep network model (Hinton, Osindero and Teh, 2006) that provides more evidence that this direction is worthwhile: unsupervised learning of each layer, each trying to model the distribution of its inputs; unsupervised greedy layer-wise training serves as INITIALIZATION, replacing the traditional random initialization of multi-layer networks; it beats state-of-the-art statistical learning in experiments on a large machine learning benchmark task (MNIST).
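The layer-wise building block in a Deep Belief Network is the Restricted Boltzmann Machine, trained with contrastive divergence (Hinton 2002). As a rough sketch of that idea (our numpy illustration with toy sizes and learning rate, not the authors' code), a single CD-1 update looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b, c, lr=0.05):
    """One contrastive-divergence (CD-1) step of a binary RBM on one visible vector v0."""
    ph0 = sigmoid(W @ v0 + c)                            # P(h=1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)     # sample hidden units
    pv1 = sigmoid(W.T @ h0 + b)                          # reconstruction P(v=1 | h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(W @ v1 + c)
    W += lr * (np.outer(ph0, v0) - np.outer(ph1, v1))    # positive minus negative statistics
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)

n_visible, n_hidden = 20, 10
W = rng.normal(scale=0.01, size=(n_hidden, n_visible))
b, c = np.zeros(n_visible), np.zeros(n_hidden)           # visible and hidden biases
data = (rng.random((500, n_visible)) < 0.3).astype(float)  # toy binary data
for epoch in range(5):
    for v in data:
        cd1_update(v, W, b, c)
```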
38 Greedy Layer-wise Initialization. The principle of greedy layer-wise initialization proposed by Hinton can be generalized to other algorithms. We replaced the probabilistic model (Restricted Boltzmann Machine) used by Hinton for unsupervised training of each layer by a simple auto-associator: find W which minimizes the cross-entropy loss in predicting x from sigmoid(W tanh(W x)). In this context W could be initialized using a convex and analytic heuristic.
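Below is a minimal numpy sketch of the greedy layer-wise idea with a simple tied-weight auto-associator per layer (the decoder reuses the transpose of W); for brevity it trains with plain SGD on squared reconstruction error rather than the cross-entropy criterion mentioned above, and all sizes, learning rates and function names are illustrative rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(H, n_hidden, lr=0.1, epochs=5):
    """Train one tied-weight auto-associator on inputs H; return its weight matrix W."""
    W = rng.normal(scale=0.1, size=(n_hidden, H.shape[1]))
    for _ in range(epochs):
        for x in H:
            h = np.tanh(W @ x)                       # code
            r = sigmoid(W.T @ h)                     # reconstruction of x
            dr = (r - x) * r * (1 - r)               # squared-error gradient at the output
            dh = (W @ dr) * (1 - h ** 2)             # backprop through the encoder
            W -= lr * (np.outer(h, dr) + np.outer(dh, x))
    return W

def greedy_pretrain(X, layer_sizes):
    """Each auto-associator models the outputs of the previous layer (greedy stacking)."""
    weights, H = [], X
    for n_hidden in layer_sizes:
        W = pretrain_layer(H, n_hidden)
        weights.append(W)
        H = np.tanh(H @ W.T)                         # representation fed to the next layer
    return weights   # afterwards used to initialize a supervised deep network

X = rng.uniform(size=(100, 20))                      # toy data in [0, 1]
print([W.shape for W in greedy_pretrain(X, [16, 12, 8])])
```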
39 Experiments on Greedy Layer-wise Initialization. Deep nets with 3 to 4 hidden layers. Compare the SUPERVISED and UNSUPERVISED (auto-associator or DBN) greedy strategies.

                                         train.   valid.   test
DBN, unsupervised pre-training           0%       1.3%     1.4%
Deep net, auto-associator pre-training   0%       1.4%     1.4%
Deep net, supervised pre-training        0%       1.75%    2.0%
Deep net, no pre-training                0.004%   2.1%     2.4%
Shallow net, no pre-training             0.004%   1.8%     1.9%

Classification error on the MNIST training, validation and test sets, with the best hyper-parameters according to validation error, with and without pre-training, using purely supervised or purely unsupervised pre-training. Model selection finds around 500 hidden units per layer. SUPERVISED GREEDY is TOO GREEDY.
40 It is Really an Optimization Problem. Why 0% train error even with a deep net and no pre-training? Because the last fat hidden layer did all the work. Classification error on MNIST with 20 hidden units in the top layer:

                                         train.   valid.   test
Deep net, auto-associator pre-training   0%       1.4%     1.6%
Deep net, supervised pre-training        0%       1.8%     1.9%
Deep net, no pre-training                0.59%    2.1%     2.2%
Shallow net, no pre-training             3.6%     4.7%     5.0%

YES, IT IS REALLY AN OPTIMIZATION PROBLEM, AND GREEDY UNSUPERVISED PRE-TRAINING HELPS A LOT.
41 Learning Visual Invariances. A number of experiments from three different labs (Hinton, LeCun, Bengio) point to the inability of kernel machines to efficiently (in both the computational and the statistical sense) learn from data involving many complex invariances, e.g. from the geometry of images: manifolds due to translation, rotation, scaling, shear, thickness, etc. Example: consider N objects in an image, each with K geometric degrees of freedom, with curvatures such that M different values of each dimension must be considered; this requires O(M^(KN)) templates. The open question is the ability of various non-local, non-shallow architectures to capture such invariances. But research on deep architectures is still very young!
42 Conclusions. The need for AI means we need ML for highly-varying functions. Shallow architectures / local kernel machines with convex optimization do not deliver: the number of required examples grows linearly with the number of desired variations (curse of dimensionality arguments). We can trade convexity for scalability (computational and statistical)! Deep architectures were thought not to be trainable, but new methods appear to break through that obstacle.
References:
Belkin, M., Matveeva, I., and Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. In COLT. Springer.
Belkin, M. and Niyogi, P. (2003). Using manifold structure for partially labeled classification. In Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA.
Bengio, Y., Delalleau, O., and Le Roux, N. (2006). The curse of highly variable functions for local kernel machines. In Advances in Neural Information Processing Systems 18. MIT Press.
Bengio, Y., Delalleau, O., Le Roux, N., Paiement, J.-F., Vincent, P., and Ouimet, M. (2004a). Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10).
Bengio, Y. and Larochelle, H. (2006). Non-local manifold Parzen windows. In Advances in Neural Information Processing Systems 18. MIT Press.
Bengio, Y., Paiement, J., Vincent, P., Delalleau, O., Le Roux, N., and Ouimet, M. (2004b). Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering. In Advances in Neural Information Processing Systems 16. MIT Press.
Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2).
Bourlard, H. and Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59.
Brand, M. (2003). Charting a manifold. In Advances in Neural Information Processing Systems 15. MIT Press.
Chapelle, O., Weston, J., and Schölkopf, B. (2003). Cluster kernels for semi-supervised learning. In Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA.
Cox, T. and Cox, M. (1994). Multidimensional Scaling. Chapman & Hall, London.
Delalleau, O., Bengio, Y., and Le Roux, N. (2005). Efficient non-parametric function induction in semi-supervised learning. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, Barbados. Society for Artificial Intelligence and Statistics.
Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference.
Ghahramani, Z. and Hinton, G. (1996). The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, Dept. of Comp. Sci., Univ. of Toronto.
Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8).
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation.
Ng, A. Y., Jordan, M. I., and Weiss, Y. (2002). On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA.
Rao, R. and Ruderman, D. (1999). Learning Lie groups for invariant visual perception. In Advances in Neural Information Processing Systems 11. MIT Press, Cambridge, MA.
Roweis, S. and Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500).
Rumelhart, D., Hinton, G., and Williams, R. (1986). Learning internal representations by error propagation. In Parallel Distributed Processing, volume 1, chapter 8. MIT Press, Cambridge.
Saul, L. and Roweis, S. (2002). Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4.
Saund, E. (1989). Dimensionality-reduction using connectionist networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(3).
Schölkopf, B., Smola, A., and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10.
Szummer, M. and Jaakkola, T. (2002). Partially labeled classification with Markov random walks. In Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA.
Teh, Y. W. and Roweis, S. (2003). Automatic alignment of local representations. In Advances in Neural Information Processing Systems 15. MIT Press.
Tenenbaum, J., de Silva, V., and Langford, J. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500).
Tipping, M. and Bishop, C. (1999). Mixtures of probabilistic principal component analysers. Neural Computation, 11(2).
Torgerson, W. (1952). Multidimensional scaling, 1: Theory and method. Psychometrika, 17.
Vincent, P. and Bengio, Y. (2003). Manifold Parzen windows. In Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA.
Weiss, Y. (1999). Segmentation using eigenvectors: a unifying view. In Proceedings of the IEEE International Conference on Computer Vision.
Zhou, D., Bousquet, O., Navin Lal, T., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.
Zhu, X., Ghahramani, Z., and Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In ICML 2003.
More information