
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
Akarsh Pokkunuru, EECS Department, 03-16-2017

Agenda
- Introduction to auto-encoders
- Types of auto-encoders
- Analysis of different auto-encoders
- Contractive auto-encoder
- Results and benchmarking tests
- Conclusion

Introduction to Auto-Encoders

Auto-Encoder Introduction
Notation: auto-encoder is abbreviated as AE. An AE aims to retain the useful information in the input and discard the rest.
The AE is a great technique for:
- Characterizing the input distribution
- Dimensionality reduction
- Feature-rich extraction
When the number of hidden-layer nodes is smaller than the number of input nodes, the hidden layer forms a bottleneck; if it is too narrow, the AE may fail to extract enough useful information.

Auto-Encoder Illustration
An auto-encoder is composed of two parts: an encoder and a decoder.

Auto-Encoder Mathematical Expression
The encoder maps the input x to a hidden, higher-level representation:
h = f(x) = s_f(W x + b_h)
where h is the hidden-layer representation, s_f is the encoder activation function, W is the weight matrix, and b_h is the bias. The encoder output is a reduced-dimension, compact representation of the data.

Auto-Encoder Mathematical Expression (cont.)
The decoder tries to reconstruct the original input with as little error as possible:
y = g(h) = s_g(W' h + b_y)
where g(h) is the decoder function, s_g is the decoder activation, W' is the transpose of the encoder weight matrix, and b_y is the bias.
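
To make the two mappings concrete, here is a minimal NumPy sketch of the encoder and decoder described above (not code from the talk). It assumes sigmoid activations, tied weights (W' = W.T), and illustrative dimensions (784 inputs, 1000 hidden units); these numbers are assumptions, not values given by the equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes only: 784-dimensional input, 1000 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(1000, 784))   # encoder weights
b_h = np.zeros(1000)                           # encoder bias
b_y = np.zeros(784)                            # decoder bias

def encode(x):
    """h = s_f(W x + b_h): map the input to the hidden representation."""
    return sigmoid(W @ x + b_h)

def decode(h):
    """y = s_g(W' h + b_y): reconstruct the input, with tied weights W' = W.T."""
    return sigmoid(W.T @ h + b_y)

x = rng.random(784)       # dummy input vector
y = decode(encode(x))     # reconstruction of x
```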

Auto-Encoder (cont.)
Types of activation functions used:
- Linear (identity, binary, etc.)
- Non-linear (sigmoid, tanh, etc.)
Why linear activation? It is very simple to implement, but the output carries little interesting information.
Why non-linear activation? It yields a feature-rich output at a higher computational cost, and it is by far the more popular choice.

Training an AE and the Cost Function
- Initialize the weights and biases of the encoder and decoder.
- Train on the data set by minimizing the reconstruction cost:
τ_AE(θ) = Σ_{x ∈ D_n} L(x, g(f(x)))
where L is the reconstruction error function (e.g. mean squared error or cross-entropy) and τ_AE(θ) is the cost function.
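
As a sketch of this training procedure (my illustration, not code from the talk), the following minimizes a cross-entropy reconstruction error with per-example stochastic gradient descent, using tied weights and sigmoid activations; the learning rate, epoch count, and hidden size are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_basic_ae(X, n_hidden=100, lr=0.1, epochs=10, seed=0):
    """Basic auto-encoder: tied weights, sigmoid activations, cross-entropy loss,
    trained with per-example SGD. X has shape (n, d) with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W = rng.normal(scale=0.01, size=(n_hidden, n_in))
    b_h, b_y = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            h = sigmoid(W @ x + b_h)            # encode
            y = sigmoid(W.T @ h + b_y)          # decode (tied weights)
            d_y = y - x                         # grad of cross-entropy w.r.t. decoder pre-activation
            d_h = (W @ d_y) * h * (1.0 - h)     # backprop through the encoder
            W   -= lr * (np.outer(d_h, x) + np.outer(h, d_y))
            b_h -= lr * d_h
            b_y -= lr * d_y
    return W, b_h, b_y
```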

Types of Auto-Encoders

Types of Auto-Encoders
Auto-encoders can be categorized as follows:
- Normal (basic) AE
- Regularized AE
- Denoising AE
- Sparse AE
- Contractive AE
We will focus on the regularized, the denoising, and the proposed contractive AE.

Regularized Auto-Encoder
The idea is to favor small weights by adding a weight-decay term that shrinks uninformative features:
τ_{AE+wd}(θ) = Σ_{x ∈ D_n} L(x, g(f(x))) + λ Σ_{ij} W_{ij}²
where λ controls the strength of the regularization and W are the weight parameters.
This offers significantly better results than the normal AE on most benchmark data sets (MNIST, CIFAR, etc.).
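
A small sketch of how such a weight-decay term could be added to the SGD update in the training sketch above; the value λ = 1e-4 is a placeholder, not a number quoted in the talk.

```python
import numpy as np

def weight_decay_penalty(W, lam=1e-4):
    """L2 weight-decay term lam * sum_ij W_ij^2 and its gradient with respect to W."""
    return lam * np.sum(W ** 2), 2.0 * lam * W

# Inside the SGD loop of the earlier sketch, the weight update gains one extra term:
#   _, d_wd = weight_decay_penalty(W, lam=1e-4)
#   W -= lr * (np.outer(d_h, x) + np.outer(h, d_y) + d_wd)
```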

Denoising Auto-Encoder
A modification of the regularized AE. The idea is to add noise to the input on purpose and train the AE to reconstruct a clean version:
x̃ = x + ε is the corrupted version of the input, and q(x̃ | x) is the corruption process (e.g. additive Gaussian noise).
Optimization is done with the stochastic gradient descent algorithm.
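
As an illustration (again not from the talk), one possible corruption process q(x̃ | x) and how it would fit into the training step: the encoder sees the corrupted input, but the reconstruction error is measured against the clean input. The noise level is an arbitrary placeholder.

```python
import numpy as np

def corrupt(x, noise_std=0.3, rng=None):
    """Sample x_tilde ~ q(x_tilde | x): additive Gaussian noise (one possible choice of q)."""
    if rng is None:
        rng = np.random.default_rng()
    return x + rng.normal(scale=noise_std, size=x.shape)

# Denoising training step (sketch, reusing names from the earlier SGD loop):
#   x_tilde = corrupt(x)
#   h = sigmoid(W @ x_tilde + b_h)     # encode the corrupted input
#   y = sigmoid(W.T @ h + b_y)         # decode
#   d_y = y - x                        # error is measured against the *clean* input x
```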

Contractive Auto-Encoders

Contractive AE
A modification of the regularized AE. The idea is to avoid/penalize uninteresting features: introduce a penalty that punishes hidden representations that are highly sensitive to the input, in order to increase robustness:
||J_f(x)||²_F = Σ_{ij} (∂h_j(x) / ∂x_i)²
As a result, the learned mapping becomes flat, i.e. invariant to small variations around the input samples.

Contractive Auto-Encoder (cont.)
||J_f(x)||²_F is the squared Frobenius norm of the Jacobian matrix of the encoder.
- If the encoder is linear, the contractive penalty reduces to weight decay, so the regularized AE and the CAE are identical.
- The CAE and the denoising AE (DAE) behave in a similar way, but the CAE encourages flatness of the first hidden layer directly, whereas the DAE encourages flatness only through the reconstruction layer.
- The cost of computation, however, remains about the same.

Contractive Auto-Encoder (cont.)
The cost function is given as follows:
τ_CAE(θ) = Σ_{x ∈ D_n} [ L(x, g(f(x))) + λ ||J_f(x)||²_F ]
where λ has the same role as in the regularized AE and ||J_f(x)||²_F is the Jacobian penalty discussed previously.
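
For a sigmoid encoder the Jacobian has the closed form J_ji = h_j (1 - h_j) W_ji, so the penalty can be evaluated cheaply. Below is a minimal NumPy sketch of that computation; the default λ is a placeholder value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b_h, lam=0.1):
    """lam * ||J_f(x)||_F^2 for a sigmoid encoder h = sigmoid(W x + b_h).
    Since J_ji = h_j (1 - h_j) W_ji, the squared Frobenius norm factorizes as
    sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2."""
    h = sigmoid(W @ x + b_h)
    return lam * np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1))
```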

Example
A received-power data set with 4 million samples.

Results and Benchmarking

Considered Models for Comparison
The models considered for performance comparison with the CAE are listed in the table on the slide.

Experimental Setting
The experimental setting for the auto-encoders is as follows:
- Unsupervised training; first a single-layer network, then extended to multiple layers.
- All auto-encoder variants used tied weights (faster convergence and fewer parameters to optimize).
- A sigmoid activation function for both encoder and decoder.
- A cross-entropy reconstruction error function.
- Optimization by stochastic gradient descent.
- 1000 hidden units during training.

Experimental Setting (cont.)
The experimental setting for the RBM network is as follows:
- Unsupervised training; first a single-layer network, then extended to multiple layers.
- Contrastive divergence is used to train the RBM.
- After training, the feature-extraction parameters W, b are fed into an MLP with a randomly initialized output layer for classification.
- Gradient descent is then used for fine-tuning.

Results
Two standard data sets are considered: MNIST and CIFAR-bw. The classification results are summarized in the table on the slide.

Results (cont.)
SAT indicates the average fraction of saturated units. A unit is saturated if its activation output is below a lower threshold (e.g. 0.05) or above an upper threshold (e.g. 0.95).
The Jacobian penalty is a measure of contraction/flatness: the lower its average, the better the invariance to small variations.
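
As an illustration of how such a saturation measure could be computed (my sketch, assuming the 0.05/0.95 thresholds mentioned above):

```python
import numpy as np

def saturation_fraction(H, low=0.05, high=0.95):
    """Average fraction of hidden units whose activation is saturated,
    i.e. below `low` or above `high`. H has shape (n_samples, n_hidden)."""
    return float(np.mean((H < low) | (H > high)))
```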

Results (cont.)
Results for the stacked networks are as follows: a two-layer CAE outperforms the other three-layer networks.

How Does Contraction Work?
For a better understanding of how contraction works, we use the following analysis:
- Examine the local behavior around a data point when the contractive penalty is applied.
- Look at the singular values of the Jacobian matrix.
- Contraction affects not only the immediate samples but also the region beyond them (mean and variance).
- Define the contraction ratio d2/d1 between two close points (detailed below).
- Define the average contraction ratio for a hidden layer using points randomly generated on a sphere of radius r.

Effect of Singular Values
A large singular value of the Jacobian corresponds to a direction of allowed variation. The CAE concentrates its large singular values in only a few directions, making it better at characterizing low-dimensional input structure.
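
To show how this spectrum could be inspected, here is a short sketch (not from the talk) that builds the encoder Jacobian at a point and takes its singular values; it assumes the sigmoid encoder parameterization used in the earlier sketches.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def jacobian_spectrum(x, W, b_h):
    """Singular values of the sigmoid-encoder Jacobian J_ji = h_j (1 - h_j) W_ji at x.
    A fast decay suggests the representation allows variation in only a few local directions."""
    h = sigmoid(W @ x + b_h)
    J = (h * (1.0 - h))[:, None] * W          # shape (n_hidden, n_input)
    return np.linalg.svd(J, compute_uv=False)
```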

Contraction Ratio
The contraction ratio can be computed as follows:
- x0 is some point from the validation data set.
- x1 is a randomly generated point on a sphere of radius r centered at x0 in the input space.
- The contraction ratio between x0 and x1 after mapping is d2/d1 as a function of r, where d1 is the distance in the original input space and d2 is the distance in the mapped (feature) space.
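
A sketch of how this average contraction ratio could be estimated (my illustration, assuming the sigmoid encoder from the earlier sketches; the number of random directions per point is arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def average_contraction_ratio(X_valid, W, b_h, r, n_dirs=20, seed=0):
    """Average d2/d1 over validation points, where d1 = ||x1 - x0|| = r in input space
    and d2 = ||f(x1) - f(x0)|| in the hidden representation space."""
    rng = np.random.default_rng(seed)
    ratios = []
    for x0 in X_valid:
        h0 = sigmoid(W @ x0 + b_h)
        for _ in range(n_dirs):
            u = rng.normal(size=x0.shape)
            x1 = x0 + r * u / np.linalg.norm(u)   # random point on the sphere of radius r
            h1 = sigmoid(W @ x1 + b_h)
            ratios.append(np.linalg.norm(h1 - h0) / r)
    return float(np.mean(ratios))
```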

Contraction Ratio vs. Radius
For the CAE, the contraction ratio keeps decreasing out to the largest radii: the CAE is trying to make the features invariant in all directions around the training examples, while the reconstruction error ensures that the representation still retains enough information to distinguish them.

Contraction Ratio vs. Radius (cont.)
The same contraction-ratio measurement, for the CIFAR-bw data set.

Contraction Ratio vs. Radius (cont.)
Deeper encoders produce features that are more invariant, over a larger distance.

Conclusion
The contractive AE uses a Jacobian penalty to induce flatness, i.e. invariance to small variations of the input. By examining the contraction ratio and the singular values of the Jacobian, we have studied how the CAE becomes robust to small-scale variations in the data set. Finally, this penalty helps the CAE improve performance compared to the other auto-encoders.

Thank you