Day 3 Lecture 1. Unsupervised Learning


Semi-supervised and transfer learning

Myth: you can't do deep learning unless you have a million labelled examples for your problem.

Reality: you can learn useful representations from unlabelled data; you can transfer learned representations from a related task; and you can train on a nearby surrogate objective for which it is easy to generate labels.

Using unlabelled examples: 1D example. Max-margin decision boundary.

Using unlabelled examples: 1D example. Semi-supervised decision boundary.

Using unlabelled examples: 2D example

Using unlabelled examples: 2D example

A probabilistic perspective

Bayes' rule: P(Y|X) = P(X|Y) P(Y) / P(X), so P(Y|X) depends on P(X|Y) and P(X). Knowledge of P(X) can therefore help to predict P(Y|X). A good model of P(X) must have Y as an implicit latent variable.
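As a concrete illustration (not from these slides), here is a minimal sketch: a Gaussian mixture model is fit to unlabelled data assumed to come from two well-separated classes; the learned components line up with the classes, so one labelled example per component is enough to read off predictions. The dataset, component count, and mapping step are illustrative assumptions.

# Minimal sketch (assumptions: two well-separated Gaussian classes, a 2-component GMM).
# A model of P(X) alone discovers components that correspond to Y, the implicit latent variable.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, y = make_blobs(n_samples=400, centers=[(-3, 0), (3, 0)], random_state=0)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)   # models P(X); no labels used
comp = gmm.predict(X)                                          # inferred latent component

# Map each mixture component to a class using one labelled example per component.
mapping = {k: y[comp == k][0] for k in range(2)}
y_hat = np.array([mapping[k] for k in comp])
print("agreement with true labels:", (y_hat == y).mean())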

Example: data in two dimensions (x1, x2) whose classes are not linearly separable :(

Example: cluster the same data into 4 clusters and represent each point by a 4D bag-of-words (cluster assignment) representation; in that representation the classes become separable. https://github.com/kevinmcguinness/ml-examples/blob/master/notebooks/semi_supervised_simple.ipynb
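The following sketch reproduces the spirit of that example (it is not the linked notebook's code): four blobs laid out in an XOR pattern are not linearly separable in the raw coordinates, but after clustering and re-encoding each point as a one-hot cluster indicator, a linear classifier trained on a handful of labels separates them. The blob layout, cluster count, and number of labelled points are assumptions made for illustration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Four blobs in an XOR layout: the two classes are not linearly separable in (x1, x2).
centers = [(-2, -2), (-2, 2), (2, -2), (2, 2)]
X, blob = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)
y = np.isin(blob, [1, 2]).astype(int)               # opposite corners share a label

# Unsupervised step: cluster, then encode each point as a one-hot cluster indicator (4D).
assign = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
Z = np.eye(4)[assign]

# Supervised step: a linear classifier trained on only two labelled points per cluster.
labelled = np.concatenate([np.flatnonzero(assign == k)[:2] for k in range(4)])
clf = LogisticRegression().fit(Z[labelled], y[labelled])
print("accuracy in the 4D cluster representation:", clf.score(Z, y))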

Assumptions

To model P(X) given data, it is necessary to make some assumptions. "You can't do inference without making assumptions" -- David MacKay, Information Theory, Inference, and Learning Algorithms.

Typical assumptions:
Smoothness assumption: points which are close to each other are more likely to share a label.
Cluster assumption: the data form discrete clusters; points in the same cluster are likely to share a label.
Manifold assumption: the data lie approximately on a manifold of much lower dimension than the input space.

Examples

Smoothness assumption: label propagation. Recursively propagate labels to nearby points. Problem: in high dimensions, your nearest neighbour may be very far away!

Cluster assumption: bag-of-words models. K-means, etc. Represent points by cluster centers. Soft assignment. VLAD. Fisher vectors. Gaussian mixture models.

Manifold assumption. Linear manifolds: PCA, linear autoencoders, random projections, ICA. Non-linear manifolds: non-linear autoencoders, deep autoencoders, restricted Boltzmann machines, deep belief nets.
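A minimal label-propagation-style sketch of the smoothness assumption, using scikit-learn's LabelPropagation (the dataset, kernel, and number of labelled points are illustrative assumptions):

import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

y_partial = np.full_like(y, -1)                      # -1 marks "unlabelled"
labelled = np.random.RandomState(0).choice(len(X), size=6, replace=False)
y_partial[labelled] = y[labelled]

# Labels spread to nearby unlabelled points along the data graph.
model = LabelPropagation(kernel='knn', n_neighbors=7).fit(X, y_partial)
print("accuracy on all points:", (model.transduction_ == y).mean())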

The manifold hypothesis

The data distribution lies close to a low-dimensional manifold. Example: consider image data. It is very high dimensional (1,000,000D), yet a randomly generated image will almost certainly not look like any real world scene; the space of images that occur in nature is almost completely empty. Hypothesis: real world images lie on a smooth, low-dimensional manifold, and manifold distance is a good measure of similarity. Similar arguments apply to audio and text.

The manifold hypothesis. Figure: a linear manifold (w^T x + b) versus a non-linear manifold in the (x1, x2) plane.

The Johnson-Lindenstrauss lemma

Informally: a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz continuous.

Intuition: imagine threading a string through a few points in 2D. The manifold hypothesis guesses that such a manifold generalizes well to unseen data.
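A small numerical check of the lemma, using scikit-learn's random projection utilities (the number of points, ambient dimension, and distortion eps are illustrative assumptions):

# Minimal sketch (assumptions: 100 points in 10,000-D, target distortion eps=0.25).
# A random Gaussian projection to the dimension suggested by the J-L lemma keeps
# pairwise distances approximately intact.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection, johnson_lindenstrauss_min_dim
from sklearn.metrics import pairwise_distances

X = np.random.RandomState(0).randn(100, 10_000)

k = johnson_lindenstrauss_min_dim(n_samples=100, eps=0.25)    # lower bound on target dim
X_low = GaussianRandomProjection(n_components=k, random_state=0).fit_transform(X)

ratio = pairwise_distances(X_low) / (pairwise_distances(X) + 1e-12)
off_diag = ratio[~np.eye(len(X), dtype=bool)]
print(f"projected to {k} dims; distance ratios in [{off_diag.min():.2f}, {off_diag.max():.2f}]")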

Energy-based models

It is often intractable to explicitly model the probability density. An energy-based model instead assigns high energy to data far from the manifold of observed data and low energy to data near it.

Fitting energy-based models: push down on the energy near observations; push up everywhere else.

Examples. Encoder-decoder models: measure energy with reconstruction error. K-means: push down near prototypes; push up based on distance from prototypes. PCA: push down near the line of maximum variation; push up based on distance to the line. Autoencoders: non-linear manifolds...

LeCun et al., A Tutorial on Energy-Based Learning, Predicting Structured Data, 2006. http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf
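A minimal sketch of PCA viewed as an energy-based model, where the energy of a point is its squared reconstruction error from the principal subspace (the synthetic data and subspace dimension are assumptions for illustration):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(500, 2) @ np.array([[3.0, 0.0], [0.0, 0.3]])   # data stretched along one axis

pca = PCA(n_components=1).fit(X)

def energy(points):
    # Squared distance between a point and its reconstruction from the 1-D subspace:
    # low near the line of maximum variation, high far from it.
    recon = pca.inverse_transform(pca.transform(points))
    return ((points - recon) ** 2).sum(axis=1)

print("energy of an on-manifold point: ", energy(np.array([[3.0, 0.0]]))[0])
print("energy of an off-manifold point:", energy(np.array([[0.0, 3.0]]))[0])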

Autoencoders. Diagram: the data passes through an encoder (weights W1) to latent variables h (the representation/features), then through a decoder (weights W2) to a reconstruction; the loss is the reconstruction error.

Autoencoders. Diagram: the data passes through the encoder (W1) to latent variables h, then through a classifier (WC) to a prediction of the label y; the loss is the cross entropy.

Autoencoders

We need to somehow push up on the energy far from the manifold. Standard autoencoders: limit the dimension of the hidden representation. Sparse autoencoders: add a penalty to make the hidden representation sparse. Denoising autoencoders: add noise to the data and reconstruct it without the noise.

Autoencoders can be stacked to attempt to learn higher-level features; stacked autoencoders can be trained by greedy layerwise training and then finetuned for classification using backprop.

Denoising autoencoder example: https://github.com/kevinmcguinness/ml-examples/blob/master/notebooks/denoisingautoencoder.ipynb
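A minimal denoising-autoencoder sketch in PyTorch, in the same spirit as the linked notebook but not its code (the input dimension, bottleneck size, noise level, and training loop details are assumptions):

import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(32, 784)                          # stand-in batch; replace with real data
for step in range(100):
    noisy = x + 0.3 * torch.randn_like(x)        # corrupt the input
    recon = model(noisy)
    loss = loss_fn(recon, x)                     # reconstruct the *clean* input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()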

Greedy layerwise training. Diagram: layer 1 is trained to reconstruct the input, layer 2 to reconstruct layer 1, and layer 3 to reconstruct layer 2; a supervised objective on Y is then placed on top and the whole stack is finetuned with backprop.
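A minimal sketch of greedy layerwise pretraining in PyTorch (the layer sizes, step counts, and stand-in data are assumptions; each layer is trained as a small autoencoder on the previous layer's output, then the stack gets a supervised head for finetuning):

import torch
import torch.nn as nn

def pretrain_layer(data, in_dim, out_dim, steps=100):
    # Train one encoder layer (plus a throwaway decoder) to reconstruct its input.
    encoder = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
    decoder = nn.Linear(out_dim, in_dim)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.mse_loss(decoder(encoder(data)), data)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder

x = torch.rand(256, 784)                            # stand-in unlabelled batch
enc1 = pretrain_layer(x, 784, 256)                  # layer 1 reconstructs the input
enc2 = pretrain_layer(enc1(x).detach(), 256, 64)    # layer 2 reconstructs layer 1's output

# Stack the pretrained encoders, add a classifier head, and finetune with backprop on labels.
classifier = nn.Sequential(enc1, enc2, nn.Linear(64, 10))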

Unsupervised learning from video

Slow feature analysis. Temporal coherence assumption: features should change slowly over time in video.

Steady feature analysis. Second-order changes should also be small: changes in the past should resemble changes in the future.

Train on triples of frames from video. The loss encourages nearby frames to have slow and steady features, and far frames to have different features.

Jayaraman and Grauman. Slow and steady feature analysis: higher order temporal coherence in video. CVPR 2016. https://arxiv.org/abs/1506.04714
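A rough sketch of what such a loss might look like (this shows the general shape of a slowness/steadiness penalty plus a contrastive term for far frames; it is not the exact formulation from the paper):

import torch
import torch.nn.functional as F

def slow_steady_loss(z1, z2, z3, z_far, margin=1.0):
    # z1, z2, z3: features of three consecutive frames; z_far: features of a distant frame.
    slow = (z2 - z1).pow(2).sum(dim=1).mean()                    # first-order change is small
    steady = ((z3 - z2) - (z2 - z1)).pow(2).sum(dim=1).mean()    # second-order change is small
    contrast = F.relu(margin - (z1 - z_far).norm(dim=1)).mean()  # far frames should differ
    return slow + steady + contrast

z1, z2, z3, z_far = (torch.randn(8, 128) for _ in range(4))
print(slow_steady_loss(z1, z2, z3, z_far))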

Ladder networks

Combine supervised and unsupervised objectives and train them together. There is a clean path and a noisy path, plus a decoder which can invert the mappings on each layer. The loss is a weighted sum of the supervised and unsupervised costs. Result: 1.13% error on permutation-invariant MNIST with only 100 labelled examples.

Rasmus et al. Semi-Supervised Learning with Ladder Networks. NIPS 2015. http://arxiv.org/abs/1507.02672
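A sketch of the combined objective only (the per-layer reconstruction costs and their weights would be produced by the full ladder network; the values below are placeholders, not the paper's settings):

def ladder_loss(supervised_cost, layer_recon_costs, layer_weights):
    # Weighted sum of the supervised cost and the per-layer denoising reconstruction costs.
    return supervised_cost + sum(w * c for w, c in zip(layer_weights, layer_recon_costs))

print(ladder_loss(0.7, [0.20, 0.10, 0.05], [1.0, 0.1, 0.01]))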

Summary

Many methods are available for learning from unlabelled data: autoencoders (many variations), restricted Boltzmann machines, learning from video and ego-motion, and semi-supervised methods (e.g. ladder networks). This is a very active research area!