Restricted Boltzmann Machines. Shallow vs. deep networks. Stacked RBMs. Boltzmann Machine learning: Unsupervised version

Similar documents
Deep Learning with Tensorflow AlexNet

Machine Learning. MGS Lecture 3: Deep Learning

Deep Learning. Volker Tresp Summer 2014

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Neural Networks for Machine Learning. Lecture 15a From Principal Components Analysis to Autoencoders

Advanced Introduction to Machine Learning, CMU-10715

Convolutional Neural Networks

COMP9444 Neural Networks and Deep Learning 5. Geometry of Hidden Units

Machine Learning 13. week

Neural Networks for unsupervised learning From Principal Components Analysis to Autoencoders to semantic hashing

Global Optimality in Neural Network Training

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies

Deep Learning for Computer Vision II

CPSC 340: Machine Learning and Data Mining. Deep Learning Fall 2016

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

DEEP LEARNING TO DIVERSIFY BELIEF NETWORKS FOR REMOTE SENSING IMAGE CLASSIFICATION

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

Machine Learning. The Breadth of ML Neural Networks & Deep Learning. Marc Toussaint. Duy Nguyen-Tuong. University of Stuttgart

Computer Vision Lecture 16

Deep Learning. Volker Tresp Summer 2015

Artificial Neural Networks. Introduction to Computational Neuroscience Ardi Tampuu

Deconvolutions in Convolutional Neural Networks

ImageNet Classification with Deep Convolutional Neural Networks

Study of Residual Networks for Image Recognition

Deep Neural Networks:

Energy Based Models, Restricted Boltzmann Machines and Deep Networks. Jesse Eickholt

CS489/698: Intro to ML

Lecture 7: Neural network acoustic models in speech recognition

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla

Alternatives to Direct Supervision

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

On the Effectiveness of Neural Networks Classifying the MNIST Dataset

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Unsupervised Learning

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset

Deep Learning. Volker Tresp Summer 2017

Supplementary material for Analyzing Filters Toward Efficient ConvNet

Transfer Learning. Style Transfer in Deep Learning

Structured Prediction using Convolutional Neural Networks

Deep Learning for Computer Vision

Dynamic Routing Between Capsules

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah

Seeing Beyond Seeing with Enhanced Deep Tracking

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network

Deep Convolutional Neural Networks. Nov. 20th, 2015 Bruce Draper

Video Compression Using Recurrent Convolutional Neural Networks

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

Deep Learning & Neural Networks

Depth Image Dimension Reduction Using Deep Belief Networks

Ruslan Salakhutdinov and Geoffrey Hinton. University of Toronto, Machine Learning Group IRGM Workshop July 2007

Weighted Convolutional Neural Network. Ensemble.

Introduction to Deep Learning

Convolutional Neural Network for Facial Expression Recognition

Know your data - many types of networks

Emotion Detection using Deep Belief Networks

Capsule Networks. Eric Mintun

To be Bernoulli or to be Gaussian, for a Restricted Boltzmann Machine

Deep Boltzmann Machines

Deep Learning Applications

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 © Alan Blair, 2017

Does the Brain do Inverse Graphics?

Dropout. Sargur N. Srihari This is part of lecture slides on Deep Learning:

Single Image Depth Estimation via Deep Learning

IMPLEMENTING DEEP LEARNING USING CUDNN 이예하 VUNO INC.

MoonRiver: Deep Neural Network in C++

A Fast Learning Algorithm for Deep Belief Nets

Seminars in Artificial Intelligence and Robotics

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, 2017

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Semantic Segmentation. Zhongang Qi

COMPARATIVE DEEP LEARNING FOR CONTENT- BASED MEDICAL IMAGE RETRIEVAL

Advanced Video Analysis & Imaging

ROB 537: Learning-Based Control

Does the Brain do Inverse Graphics?

Autoencoder. Representation learning (related to dictionary learning) Both the input and the output are x

6. Convolutional Neural Networks

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro

Unsupervised Learning of Spatiotemporally Coherent Metrics

Day 3 Lecture 1. Unsupervised Learning

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro

Su et al. Shape Descriptors - III

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

Deep Neural Networks Optimization

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford University

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,

END-TO-END CHINESE TEXT RECOGNITION

Deep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES

Exploring Capsules. Binghui Peng Runzhou Tao Shunyu Yao IIIS, Tsinghua University {pbh15, trz15,

Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity

11. Neural Network Regularization

Deep Learning Workshop. Nov. 20, 2015 Andrew Fishberg, Rowan Zellers

3D model classification using convolutional neural network

Unsupervised Feature Learning for Optical Character Recognition

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,

Early Stopping. Sargur N. Srihari

Transcription:

Slide 1 / 18: Shallow vs. deep networks
- Shallow: one hidden layer
  - Features can be learned more-or-less independently
  - Arbitrary function approximator (with enough hidden units)
- Deep: two or more hidden layers
  - Upper hidden units reuse lower-level features to compute more complex, general functions
  - Learning is slow: learning high-level features is not independent of learning low-level features
- Recurrent: a form of deep network that reuses features over time

Slide 2 / 18: Boltzmann Machine learning: Unsupervised version
- Visible units clamped to external input in positive phase (analogous to outputs in the standard formulation)
- Network free-runs in negative phase (nothing clamped)
- Network learns to make its free-running behavior look like its behavior when receiving input (i.e., learns to generate input patterns)
- Objective function (unsupervised):
  G = \sum_{\alpha} p^{+}(V_\alpha) \log \frac{p^{+}(V_\alpha)}{p^{-}(V_\alpha)}
  where V_\alpha denotes the visible units in pattern \alpha, p^{+} denotes probabilities in the positive phase [outputs (= "inputs") clamped], and p^{-} denotes probabilities in the negative phase [nothing clamped]
- [Compare the supervised objective: G = \sum_{\alpha,\beta} p^{+}(I_\alpha, O_\beta) \log \frac{p^{+}(O_\beta \mid I_\alpha)}{p^{-}(O_\beta \mid I_\alpha)}]

Slide 3 / 18: Restricted Boltzmann Machines
- No connections among units within a layer; allows fast settling
- Fast/efficient learning procedure (see the code sketch below)
- Can be stacked; successive hidden layers can be learned incrementally (starting closest to the input)
(Hinton)

Slide 4 / 18: Stacked RBMs
- Train iteratively; only use top-down (generative) weights in lower-level RBMs.
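The slides do not spell out the RBM learning procedure itself. As a rough illustration of the "fast/efficient learning procedure" mentioned on slide 3, here is a minimal numpy sketch of one contrastive-divergence (CD-1) update for a binary RBM; the layer sizes, learning rate, and the CD-1 approximation are illustrative assumptions, not details taken from the lecture.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_step(v0, W, b_vis, b_hid, rng, lr=0.1):
        """One contrastive-divergence (CD-1) update for a binary RBM,
        given a single training pattern v0 (the positive-phase data)."""
        # Positive phase: visible units clamped to the input pattern.
        p_h0 = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one step of free-running reconstruction.
        p_v1 = sigmoid(h0 @ W.T + b_vis)
        v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
        p_h1 = sigmoid(v1 @ W + b_hid)
        # Push model statistics toward data statistics
        # (an approximation to descending the objective G above).
        W = W + lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        b_vis = b_vis + lr * (v0 - v1)
        b_hid = b_hid + lr * (p_h0 - p_h1)
        return W, b_vis, b_hid

    # Toy usage: a 6-visible / 4-hidden RBM trained on one random binary pattern.
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((6, 4))
    b_vis, b_hid = np.zeros(6), np.zeros(4)
    v0 = (rng.random(6) < 0.5).astype(float)
    for _ in range(100):
        W, b_vis, b_hid = cd1_step(v0, W, b_vis, b_hid, rng)

Stacking (slide 4) would repeat this layer by layer: train one RBM, then use its hidden activations as the "visible" data for the next RBM up.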

Slide 5 / 18: Deep autoencoder (Hinton & Salakhutdinov, 2006, Science)
(figure slide; see the code sketch below)

Slide 6 / 18: Digit reconstructions (Hinton & Salakhutdinov, 2006)
- Top: original images in test set
- Middle: network reconstructions (30-unit bottleneck)
- Bottom: PCA reconstructions (30 components)

Slide 7 / 18: Face reconstructions (Hinton & Salakhutdinov, 2006)
- PCA reconstructions (2 components) vs. network reconstructions (2-unit bottleneck)

Slide 8 / 18: Document retrieval (Hinton & Salakhutdinov, 2006)
- Network (2D) vs. Latent Semantic Analysis (2D)
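As a rough sketch of the architecture only, the following builds a deep autoencoder with a 30-unit bottleneck in tf.keras, assuming the 784-1000-500-250-30 encoder shape used for the digit experiments. It is trained end-to-end with back-propagation; Hinton & Salakhutdinov actually pretrained each layer as an RBM and then fine-tuned, which this sketch skips, and the random data here merely stands in for MNIST pixels in [0, 1].

    import numpy as np
    import tensorflow as tf

    def build_autoencoder(bottleneck=30):
        inputs = tf.keras.Input(shape=(784,))
        x = inputs
        for units in (1000, 500, 250):          # encoder
            x = tf.keras.layers.Dense(units, activation="relu")(x)
        code = tf.keras.layers.Dense(bottleneck, activation="linear", name="code")(x)
        x = code
        for units in (250, 500, 1000):          # mirrored decoder
            x = tf.keras.layers.Dense(units, activation="relu")(x)
        outputs = tf.keras.layers.Dense(784, activation="sigmoid")(x)
        return tf.keras.Model(inputs, outputs)

    model = build_autoencoder()
    model.compile(optimizer="adam", loss="binary_crossentropy")

    x = np.random.rand(256, 784).astype("float32")   # stand-in for MNIST images
    model.fit(x, x, epochs=1, batch_size=32, verbose=0)

    # Low-dimensional codes from the bottleneck layer (used for retrieval/visualization).
    encoder = tf.keras.Model(model.input, model.get_layer("code").output)
    codes = encoder.predict(x, verbose=0)            # shape (256, 30)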

Slide 9 / 18: Deep learning with back-propagation
- Sigmoid function leads to extremely small derivatives for early layers (due to asymptotes)
- Linear units preserve derivatives but cannot alter similarity structure
- Rectified linear units (ReLUs) preserve derivatives but impose (limited) non-linearity
- [Figure: unit activation functions plotted against net input]
- Often applied with dropout: on any given trial, only a random subset of units (e.g., half) actually works (i.e., produces output if input > 0)

Slide 10 / 18: Online simulator
- playground.tensorflow.org

Slide 11 / 18: Deep learning with back-propagation: Technical advances
- Huge datasets available via the internet ("big data")
- Application of GPUs (Graphics Processing Units) for very efficient 2D image processing
- Krizhevsky, Sutskever, and Hinton (2012, NIPS)

Slide 12 / 18: What does a deep network learn?
- Feedforward network: 40 inputs to 40 outputs via 6 hidden layers (of size 40)
- Random input patterns map to random output patterns (n = 100)
- Compute pairwise similarities of representations at each hidden layer and compare them to (correlate with) the pairwise similarities of the inputs and of the outputs ("Representational Similarity Analysis"; sketched in code below)
- Network gradually transforms from input similarity to output similarity
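A minimal sketch of the analysis described on slide 12, assuming tanh hidden units, a mean-squared-error objective, and Pearson correlation as the similarity measure; none of these choices are specified on the slide, and the training details are arbitrary.

    import numpy as np
    import tensorflow as tf

    rng = np.random.default_rng(0)
    X = rng.random((100, 40)).astype("float32")   # 100 random input patterns
    Y = rng.random((100, 40)).astype("float32")   # mapped to 100 random targets

    # 40-input, 40-output feedforward net with 6 hidden layers of size 40.
    inputs = tf.keras.Input(shape=(40,))
    h, hidden = inputs, []
    for _ in range(6):
        h = tf.keras.layers.Dense(40, activation="tanh")(h)
        hidden.append(h)
    outputs = tf.keras.layers.Dense(40, activation="sigmoid")(h)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, Y, epochs=300, verbose=0)

    def pairwise_sims(A):
        """Pairwise pattern-to-pattern correlations (upper triangle only)."""
        C = np.corrcoef(A)
        return C[np.triu_indices_from(C, k=1)]

    readout = tf.keras.Model(inputs, hidden)          # expose every hidden layer
    layer_acts = readout.predict(X, verbose=0)
    sim_in, sim_out = pairwise_sims(X), pairwise_sims(Y)
    for i, acts in enumerate(layer_acts, start=1):
        sim_h = pairwise_sims(acts)
        print(f"hidden layer {i}: "
              f"r(with input sims) = {np.corrcoef(sim_h, sim_in)[0, 1]:+.2f}, "
              f"r(with output sims) = {np.corrcoef(sim_h, sim_out)[0, 1]:+.2f}")

The printed correlations correspond to the claim on the slide: early layers track the input similarity structure, later layers increasingly track the output similarity structure.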

Slide 13 / 18: Promoting generalization
- Prevent overfitting by constraining the network in a general way (weight decay, cross-validation)
- Train on so much data that it's not possible to overfit
  - Including fabricating new data by transforming existing data in a way that you know the network must generalize over (e.g., viewpoint, color, lighting transformations; sketched in code below)
  - Can also train an adversarial network to generate examples that produce high error
- Constrain the structure of the network in a way that forces a specific type of generalization
  - Temporal invariance: time-delay neural networks (TDNNs)
  - Position invariance: convolutional neural networks (CNNs)

Slide 14 / 18: Time-delay neural networks (TDNNs)

Slide 16 / 18:
- Learning long-distance dependencies requires preserving information over multiple time steps
- Conventional networks (e.g., SRNs) must learn to do this
- LSTM networks use much more complex units that intrinsically preserve and manipulate information
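To illustrate the "fabricating new data" point on slide 13, here is a small numpy sketch that expands one image into several transformed variants; the particular transforms and parameter ranges are assumptions chosen for illustration, not the ones used in the lecture.

    import numpy as np

    def augment(image, rng):
        """Return a randomly transformed copy of an H x W x 3 image in [0, 1]."""
        out = image.copy()
        if rng.random() < 0.5:                       # horizontal flip (viewpoint)
            out = out[:, ::-1, :]
        out = out * rng.uniform(0.8, 1.2)            # global brightness (lighting)
        out = out * rng.uniform(0.9, 1.1, size=3)    # per-channel scaling (color)
        out = np.roll(out, rng.integers(-2, 3), axis=1)  # small translation
        return np.clip(out, 0.0, 1.0)

    # Toy usage: turn one random "image" into a batch of 8 augmented variants.
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3))
    batch = np.stack([augment(img, rng) for _ in range(8)])

Because the network sees many label-preserving variants of each example, it is pushed to generalize over exactly those transformations rather than memorizing the originals.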

Slide 17 / 18: Convolutional neural networks (CNNs)
- Hidden units organized into feature maps (each using weight sharing to enforce identical receptive fields)
- Subsequent layer pools across features at similar locations (e.g., MAX function); see the code sketch below

Slide 18 / 18: Deep learning with back-propagation: Technical advances
- Huge datasets available via the internet ("big data")
- Application of GPUs (Graphics Processing Units) for very efficient 2D image processing
- Krizhevsky, Sutskever, and Hinton (2012, NIPS)
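A minimal sketch of the two operations named on slide 17, using tf.keras layers; the kernel size, number of feature maps, and input size are arbitrary choices for illustration and this is not the Krizhevsky et al. (2012) architecture.

    import numpy as np
    import tensorflow as tf

    # One feature map = one shared 5x5 kernel applied at every position (weight sharing);
    # max-pooling then keeps the strongest response within each 2x2 neighbourhood.
    conv = tf.keras.layers.Conv2D(filters=8, kernel_size=5, activation="relu")
    pool = tf.keras.layers.MaxPooling2D(pool_size=2)

    images = np.random.rand(4, 28, 28, 1).astype("float32")  # stand-in for real images
    maps = conv(images)    # shape (4, 24, 24, 8): 8 feature maps per image
    pooled = pool(maps)    # shape (4, 12, 12, 8): MAX over non-overlapping 2x2 blocks
    print(maps.shape, pooled.shape)

Stacking several such conv/pool stages, followed by dense layers, gives the kind of deep image classifier referenced on slides 11 and 18.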