Deep Learning: A New Wave of Machine Learning. Kai Yu, Deputy Managing Director, Baidu IDL
Transcription
1 Deep Learning: A New Wave of Machine Learning. Kai Yu, Deputy Managing Director, Baidu IDL
2 Machine Learning: the design and development of algorithms that allow computers to evolve behaviors based on empirical data. 7/23/2013
3 All Machine Learning Models in One Page: Shallow Models, Deep Models
4 Shallow Models Since the Late 80s: Neural Networks, Boosting, Support Vector Machines, Maximum Entropy. Given good features, how to do classification?
5 Since 2000, Learning with Structures: Kernel Learning, Transfer Learning, Semi-supervised Learning, Manifold Learning, Sparse Learning, Matrix Factorization? Given good features, how to do classification?
6 Mining for Structure (Mission Yet Accomplished). Massive increase in both computational power and the amount of data available from web, video cameras, laboratory measurements. Images & Video; Text & Language; Speech & Audio; Gene Expression; Product Recommendation; Relational data / Social Network; Climate Change; Geological Data. Mostly unlabeled. Develop statistical models that can discover underlying structure, cause, or statistical correlation from data in an unsupervised or semi-supervised way. Multiple application domains. Slide Courtesy: Russ Salakhutdinov
7 The pipeline of machine visual perception: Low-level sensing, Preprocessing, Feature extraction, Feature selection, Inference (prediction, recognition). Most Efforts in Machine Learning. Most critical for accuracy. Account for most of the computation for testing. Most time-consuming in development cycle. Often hand-crafted in practice.
8 Computer vision features: SIFT, Spin image, HoG, RIFT, GLOH. Slide Courtesy: Andrew Ng
9 Deep Learning: learning features from data. Machine Learning: Low-level sensing, Preprocessing, Feature extraction, Feature selection, Inference (prediction, recognition). Feature Learning.
10 Convolution Neural Networks: Coding, Pooling, Coding, Pooling. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation.
11 Winter of Neural Networks Since the 90s: Non-convex. Need a lot of tricks to play with. Hard to do theoretical analysis.
12 The Paradigm of Deep Learning
13 Deep Learning Since 2006: Neural networks enjoy a second spring!
14 Race on ImageNet (Top 5 Hit Rate): 72%, …
15 The Best System on ImageNet (by ): Local Gradients (e.g., SIFT, HOG), Pooling, Variants of Sparse Coding (LLC, Super-vector), Pooling, Linear Classifier. - This is a moderately deep model - The first two layers are hand-designed
16 Challenge to Deep Learners. Key questions: What if there are no hand-crafted features at all? What if we use much deeper neural networks? -- By Geoff Hinton
17 Answer from Geoff Hinton
18 The Architecture. Slide Courtesy: Geoff Hinton
19 The Architecture: 7 hidden layers not counting max pooling. Early layers are conv., last two layers globally connected. Uses rectified linear units in every layer. Uses competitive normalization to suppress hidden activities. Slide Courtesy: Geoff Hinton
20 Revolution on Speech Recognition: Word error rates from MSR, IBM, and the Google speech group. Slide Courtesy: Geoff Hinton
21 Deep Learning for NLP: Natural Language Processing (Almost) from Scratch, Collobert et al., JMLR
22 Biological & Theoretical Justification
23 Why Hierarchy? Theoretical: well-known depth-breadth tradeoff in circuit design [Hastad 1987]. This suggests many functions can be much more efficiently represented with deeper architectures [Bengio & LeCun 2007]. Biological: Visual cortex is hierarchical (Hubel-Wiesel Model, 1981 Nobel Prize) [Thorpe]
24 Deep Architecture in the Brain: Retina (pixels), Area V1 (edge detectors), Area V2 (primitive shape detectors), Area V4 (higher level visual abstractions). Sparse DBN, training on face images: pixels, edges, object parts (combination of edges), object models. Note: Sparsity important for these results [Lee, Grosse, Ranganath & Ng, 2009]
25 Sensor representation in the brain. Auditory cortex learns to see. (Same rewiring process also works for touch/somatosensory cortex.) Seeing with your tongue. [Roe et al., 1992; BrainPort; Welsh & Blasch, 1997]
26 Deep Learning in Industry
27 Microsoft: First successful deep learning models for speech recognition, by MSR in …
28 Google Brain Project: Led by Google Fellow Jeff Dean. Published two papers (ICML 2012, NIPS 2012). Company-wide large-scale deep learning infrastructure. Significant success on images, speech, …
29 Deep Baidu: Started working on deep learning in summer 2012. Big success in speech recognition and image recognition; 6 products have been launched so far. Recently, very encouraging results on CTR. Meanwhile, efforts are under way in areas like LTR and NLP.
30 Deep Learning Ingredients. Building blocks: RBMs, autoencoder neural nets, sparse coding, … Go deeper: layerwise feature learning (layer-by-layer unsupervised training, layer-by-layer supervised training). Fine tuning via backpropagation: if data are big enough, direct fine tuning is enough. Sparsity on hidden layers is often helpful.
31 Building Blocks of Deep Learning
32 CVPR 2012 Tutorial: Deep Learning for Vision. 09.00am: Introduction, Rob Fergus (NYU); 10.00am: Coffee Break; 10.30am: Sparse Coding, Kai Yu (Baidu); 11.30am: Neural Networks, Marc'Aurelio Ranzato (Google); 12.30pm: Lunch; 01.30pm: Restricted Boltzmann Machines, Honglak Lee (Michigan); 02.30pm: Deep Boltzmann Machines, Ruslan Salakhutdinov (Toronto); 03.00pm: Coffee Break; 03.30pm: Transfer Learning, Ruslan Salakhutdinov (Toronto); 04.00pm: Motion & Video, Graham Taylor (Guelph); 05.00pm: Summary / Q & A, All; 05.30pm: End
33 Building Block 1 - RBM
34 Restricted Boltzmann Machine. Slide Courtesy: Russ Salakhutdinov
35 Restricted Boltzmann Machine. Restricted: … Slide Courtesy: Russ Salakhutdinov
36 Model Parameter Learning. i.i.d. Approximate maximum likelihood learning: … Slide Courtesy: Russ Salakhutdinov
37 Contrastive Divergence. Slide Courtesy: Russ Salakhutdinov
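The slide's formulas did not survive transcription, so the following is only a minimal sketch of one common form of contrastive divergence (CD-1) for a binary RBM. The function name `cd1_step`, the toy data, the learning rate, and the layer sizes are all illustrative assumptions, not taken from the talk.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM on a batch v0 of shape (n, n_visible)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step (reconstruct visibles, then hiddens).
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    n = v0.shape[0]
    # Approximate likelihood gradient: data statistics minus reconstruction statistics.
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b = b + lr * (v0 - pv1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return W, b, c, pv1

# Toy data (assumed): two complementary 6-bit patterns, repeated.
rng = np.random.default_rng(0)
patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = np.tile(patterns, (10, 1))
W = 0.01 * rng.standard_normal((6, 4))   # 6 visible units, 4 hidden units
b = np.zeros(6)
c = np.zeros(4)

# Reconstruction error before training (lr=0 leaves the parameters unchanged).
_, _, _, recon = cd1_step(data, W, b, c, lr=0.0, rng=rng)
err_before = np.mean((data - recon) ** 2)
for _ in range(500):
    W, b, c, recon = cd1_step(data, W, b, c, rng=rng)
err_after = np.mean((data - recon) ** 2)
```

Reconstruction error is used here only as a convenient progress signal; it is not the quantity that CD optimizes.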
38 RBMs for Images. Slide Courtesy: Russ Salakhutdinov
39 Layerwise Pre-training. Slide Courtesy: Russ Salakhutdinov
40 DBNs for MNIST Classification. Slide Courtesy: Russ Salakhutdinov
41 Deep Autoencoders for Unsupervised Feature Learning. Slide Courtesy: Russ Salakhutdinov
42 Searching Large Image Database using binary codes. Image Retrieval using Binary Codes: map images into binary codes for fast retrieval. Small Codes, Torralba, Fergus, Weiss, CVPR 2008; Spectral Hashing, Y. Weiss, A. Torralba, R. Fergus, NIPS 2008; Kulis and Darrell, NIPS 2009; Gong and Lazebnik, CVPR 2011; Norouzi and Fleet, ICML 2011. Slide Courtesy: Russ Salakhutdinov
43 Building Block 2 - Autoencoder Neural Net
44 Autoencoder Neural Network: input, encoder, code, decoder, prediction, error. - input higher dimensional than code - error: $\|\text{prediction} - \text{input}\|^2$ - training: back-propagation. Slide Courtesy: Marc'Aurelio Ranzato
45 Sparse Autoencoder: input, encoder, code (with sparsity penalty), decoder, prediction, error. - sparsity penalty: $\|\text{code}\|_1$ - error: $\|\text{prediction} - \text{input}\|^2$ - loss: sum of squared reconstruction error and sparsity penalty - training: back-propagation. Slide Courtesy: Marc'Aurelio Ranzato
46 Sparse Autoencoder: input, encoder, code (with sparsity penalty), decoder, prediction, error. - input: $X$, code: $h = W^\top X$ - loss: $L(X; W) = \|W h - X\|^2 + \lambda \sum_j |h_j|$ - training: back-propagation. Le et al. ICA with reconstruction cost. NIPS 2011. Slide Courtesy: Marc'Aurelio Ranzato
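As a rough illustration of the tied-weight loss on this slide, read here as h = W^T X and L(X; W) = ||W h - X||^2 + lambda * sum_j |h_j|, the sketch below evaluates the loss and its (sub)gradient with NumPy. The weight `lam`, the matrix shapes, and the function name are assumptions made for the example.

```python
import numpy as np

def sparse_autoencoder_loss(X, W, lam):
    """Loss and gradient for a tied-weight sparse autoencoder.

    X: (d, n) matrix whose columns are inputs; W: (d, k) weights.
    Code h = W^T X; loss = ||W h - X||^2 + lam * sum |h|.
    """
    H = W.T @ X                      # codes, shape (k, n)
    R = W @ H - X                    # reconstruction residual, shape (d, n)
    loss = np.sum(R ** 2) + lam * np.sum(np.abs(H))
    # W appears in both the encoder and the decoder, so the reconstruction
    # term contributes two gradient pieces; the L1 term uses a subgradient.
    grad = 2.0 * (R @ H.T + X @ (W.T @ R).T) + lam * (X @ np.sign(H).T)
    return loss, grad

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
W = 0.1 * rng.standard_normal((5, 3))
loss, grad = sparse_autoencoder_loss(X, W, lam=0.1)
W_next = W - 1e-3 * grad             # one plain gradient step
```

In practice the gradient would normally come from back-propagation in an autodiff framework; the closed form is written out only to make the two roles of W visible.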
47 Building Block 3 Sparse Coding
48 Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection). Training: given a set of random patches x, learn a dictionary of bases [Φ_1, Φ_2, …]. Coding: for data vector x, solve LASSO to find the sparse coefficient vector a.
49 Sparse coding: training time. Input: images x_1, x_2, …, x_m (each in R^d). Learn: dictionary of bases f_1, f_2, …, f_k (also in R^d). Alternating optimization: 1. Fix dictionary f_1, f_2, …, f_k, optimize a (a standard LASSO problem). 2. Fix activations a, optimize dictionary f_1, f_2, …, f_k (a convex QP problem).
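The two alternating steps can be sketched as follows. The objective assumed here is 0.5*||X - F A||_F^2 + lam*||A||_1 with dictionary columns of norm at most 1, since the slide's formula was lost. Step 1 uses ISTA as one standard LASSO solver. Step 2 is a simplification: a regularised least-squares fit followed by column rescaling, rather than an exact solution of the constrained QP. All names and sizes are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, F, lam, n_iter=200):
    """Step 1: min_A 0.5*||X - F A||_F^2 + lam*||A||_1 with F fixed (ISTA)."""
    L = np.linalg.norm(F, 2) ** 2 + 1e-12     # Lipschitz constant of the smooth part
    A = np.zeros((F.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - F.T @ (F @ A - X) / L, lam / L)
    return A

def update_dictionary(X, A, F_old):
    """Step 2 (simplified): least squares in F with A fixed, then rescale columns to norm <= 1."""
    k = A.shape[0]
    F = np.linalg.solve(A @ A.T + 1e-6 * np.eye(k), A @ X.T).T
    dead = np.linalg.norm(F, axis=0) < 1e-8
    F[:, dead] = F_old[:, dead]               # keep bases that no code uses
    return F / np.maximum(np.linalg.norm(F, axis=0), 1.0)

def objective(X, F, A, lam):
    return 0.5 * np.sum((X - F @ A) ** 2) + lam * np.sum(np.abs(A))

rng = np.random.default_rng(0)
d, k, m, lam = 8, 12, 200, 0.1                # patch dim, number of bases, patches, penalty
X = rng.standard_normal((d, m))               # stand-in for image patches
F = rng.standard_normal((d, k))
F /= np.linalg.norm(F, axis=0)
history = []
for _ in range(10):
    A = lasso_ista(X, F, lam)                 # fix dictionary, optimize activations
    F = update_dictionary(X, A, F)            # fix activations, optimize dictionary
    history.append(objective(X, F, A, lam))
```

Because the dictionary step here is a heuristic projection, the recorded objective is not guaranteed to decrease at every iteration; an exact QP solver would restore that property.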
50 Sparse coding: testing time. Input: novel image patch x_i (in R^d) and previously learned f_i's. Output: representation [a_{i,1}, a_{i,2}, …, a_{i,k}] of image patch x_i. Represent x_i as: a_i = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
51 Sparse coding illustration. Natural images. Learned bases (f_1, …, f_64): edges. Test example: x ≈ 0.8 * f_… + 0.3 * f_… + 0.5 * f_63. [a_1, …, a_64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0] (feature representation). Compact & easily interpretable. Slide credit: Andrew Ng
52 RBM & autoencoders - also involve activation and reconstruction - but have explicit f(x) - not necessarily enforce sparsity on a - but if put sparsity on a, often get improved results [e.g. sparse RBM, Lee et al. NIPS08]. Encoding: a = f(x); decoding: x' = g(a).
53 Sparse coding: A broader view. Any feature mapping from x to a, i.e. a = f(x), where - a is sparse (and often higher dim. than x) - f(x) is nonlinear - reconstruction x' = g(a), such that x' ≈ x. Therefore, sparse RBMs, sparse auto-encoders, even VQ can be viewed as a form of sparse coding.
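To make the "even VQ" remark concrete, here is a small sketch of vector quantization written as an encoder f and decoder g. The codebook `D` and the input `x` are made-up values. The code is one-hot, so it is sparse and higher-dimensional than x, the encoder is nonlinear (an argmin), and the decoder is linear.

```python
import numpy as np

def vq_encode(x, D):
    """f(x): one-hot code selecting the nearest codeword (a column of D)."""
    dists = np.sum((D - x[:, None]) ** 2, axis=0)
    a = np.zeros(D.shape[1])
    a[np.argmin(dists)] = 1.0
    return a

def vq_decode(a, D):
    """g(a): linear reconstruction from the code."""
    return D @ a

# Four codewords in R^2 (illustrative values).
D = np.array([[0.0, 1.0, 0.0, 5.0],
              [0.0, 0.0, 1.0, 5.0]])
x = np.array([0.9, 0.2])
a = vq_encode(x, D)        # [0., 1., 0., 0.]: only one active dimension
x_hat = vq_decode(a, D)    # [1., 0.]: the nearest codeword approximates x
```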
54 Example of sparse activations (sparse coding): inputs x_1, x_2, x_3, …, x_m with codes a_1, a_2, a_3, …, a_m. Different x has different dimensions activated. Locally-shared sparse representation: similar x's tend to have similar non-zero dimensions.
55 Example of sparse activations (sparse coding): another example, preserving manifold structure. More informative in highlighting richer data structures, i.e. clusters, manifolds, …
56 Sparsity vs. Locality (sparse coding vs. local sparse coding). Intuition: similar data should get similar activated features. Local sparse coding: data in the same neighborhood tend to have shared activated features; data in different neighborhoods tend to have different features activated.
57 Sparse coding is not always local: example. Case 1, independent subspaces: each basis is a direction. Sparsity: each datum is a linear combination of only several bases. Case 2, data manifold (or clusters): each basis is an anchor point. Sparsity: each datum is a linear combination of neighbor anchors. Sparsity is caused by locality.
58 Two approaches to local sparse coding. Approach 1, coding via local anchor points (local coordinate coding): Learning locality-constrained linear coding for image classification, Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang. In CVPR. Nonlinear learning using local coordinate coding, Kai Yu, Tong Zhang, and Yihong Gong. In NIPS. Approach 2, coding via local subspaces (super-vector coding): Image Classification using Super-Vector Coding of Local Image Descriptors, Xi Zhou, Kai Yu, Tong Zhang, and Thomas Huang. In ECCV. Large-scale Image Classification: Fast Feature Extraction and SVM Training, Yuanqing Lin, Fengjun Lv, Shenghuo Zhu, Ming Yang, Timothee Cour, Kai Yu, LiangLiang Cao, Thomas Huang. In CVPR.
59 Two approaches to local sparse coding. Approach 1: coding via local anchor points (local coordinate coding). Approach 2: coding via local subspaces (super-vector coding). - Sparsity achieved by explicitly ensuring locality - Sound theoretical justifications - Much simpler to implement and compute - Strong empirical success
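A minimal sketch of Approach 1, coding via local anchor points, under one common simplification: pick the k nearest anchors and solve a small least-squares problem with a sum-to-one constraint. This is in the spirit of locality-constrained coding but is not claimed to be the exact procedure of the cited papers; `k`, `reg`, and the toy anchors are assumptions.

```python
import numpy as np

def local_anchor_code(x, B, k=2, reg=1e-4):
    """Code x (shape (d,)) using its k nearest anchors from B (shape (M, d)).

    Returns a length-M code with at most k non-zeros that sum to 1.
    """
    d2 = np.sum((B - x) ** 2, axis=1)
    idx = np.argsort(d2)[:k]                  # locality: keep only nearby anchors
    Z = B[idx] - x                            # neighbours shifted so x is the origin
    C = Z @ Z.T                               # local covariance, shape (k, k)
    C = C + (reg * np.trace(C) + 1e-12) * np.eye(k)   # regularise for stability
    w = np.linalg.solve(C, np.ones(k))        # minimises ||x - w^T B[idx]||^2 up to scale
    w = w / w.sum()                           # enforce the sum-to-one constraint
    code = np.zeros(B.shape[0])
    code[idx] = w
    return code

# Toy anchors in R^2 (illustrative values).
B = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
x = np.array([0.25, 0.0])
code = local_anchor_code(x, B, k=2)           # non-zeros only on the two nearest anchors
x_hat = code @ B                              # reconstruction from neighbouring anchors
```

Sparsity here is a by-product of locality: only the anchors near x can receive non-zero weight, which matches the slide's point that the code is sparse because it is local.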
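60 Hierarchical Sparse Coding. Data $X = [x_1 \cdots x_n] \in \mathbb{R}^{d \times n}$, first-layer dictionary $B \in \mathbb{R}^{d \times p}$, codes $W = (w_1\, w_2 \cdots w_n) \in \mathbb{R}^{p \times n}$, second-layer weights $\alpha \in \mathbb{R}^{q}$. $(\hat{W}, \hat{\alpha}) = \arg\min_{W,\alpha} L(W,\alpha) + \frac{\lambda_1}{n}\|W\|_1 + \gamma\|\alpha\|_1$ subject to $\alpha \succeq 0$, where $L(W,\alpha) = \frac{1}{2n}\|X - BW\|_F^2 + \frac{\lambda_2}{n}\operatorname{tr}\big(S(W)\,\Omega(\alpha)\big) = \frac{1}{n}\sum_{i=1}^{n}\Big\{\frac{1}{2}\|x_i - Bw_i\|^2 + \lambda_2\, w_i^\top \Omega(\alpha)\, w_i\Big\}$, with $S(W) = \sum_{i=1}^{n} w_i w_i^\top$ and $\Omega(\alpha) = \Big(\sum_{k=1}^{q} \alpha_k \operatorname{diag}(\phi_k)\Big)^{-1}$. Yu, Lin, & Lafferty, CVPR 11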
61 Hierarchical Sparse Coding on MNIST. HSC vs. CNN: HSC provides even better performance than CNN; more amazingly, HSC learns features in an unsupervised manner! Yu, Lin, & Lafferty, CVPR 11
62 Second-layer dictionary. A hidden unit in the second layer is connected to a unit group in the 1st layer: invariance to translation, rotation, and deformation. Yu, Lin, & Lafferty, CVPR 11
63 Adaptive Deconvolutional Networks for Mid and High Level Feature Learning Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus, ICCV 2011 Hierarchical Convolutional Sparse Coding. Trained with respect to image from all layers (L1-L4). Pooling both spatially and amongst features. Learns invariant midlevel features. L4 Feature Maps L3 Feature Maps L2 Feature Maps L1 Feature Maps Image Select L4 Features Select L3 Feature Groups Select L2 Feature Groups L1 Features
64 Recap: Deep Learning Ingredients. Building blocks: RBMs, autoencoder neural nets, sparse coding, … Go deeper: layerwise feature learning (layer-by-layer unsupervised training, layer-by-layer supervised training). Fine tuning via backpropagation: if data are big enough, direct fine tuning is enough. Sparsity on hidden layers is often helpful.
65 Layer-by-layer unsupervised training + fine tuning. Loss = supervised_error + unsupervised_error (input, prediction, label). Weston et al. Deep learning via semi-supervised embedding. ICML 2008. Slide Courtesy: Marc'Aurelio Ranzato
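The combined objective can be sketched as a single function. The slide only states Loss = supervised_error + unsupervised_error; the specific choices below (softmax cross-entropy for the supervised term, squared reconstruction error for the unsupervised term, and an optional `weight` that defaults to the plain sum) are assumptions for illustration.

```python
import numpy as np

def combined_loss(x, x_recon, logits, label, weight=1.0):
    """Return (total, supervised_error, unsupervised_error) for one example."""
    # Unsupervised term: how well the model reconstructs its input.
    unsup = np.sum((x_recon - x) ** 2)
    # Supervised term: softmax cross-entropy, computed in a numerically stable way.
    z = logits - np.max(logits)
    log_probs = z - np.log(np.sum(np.exp(z)))
    sup = -log_probs[label]
    return sup + weight * unsup, sup, unsup

x = np.zeros(3)
x_recon = np.array([1.0, 0.0, 0.0])
logits = np.zeros(2)                       # uniform prediction over two classes
total, sup, unsup = combined_loss(x, x_recon, logits, label=0)
# sup is log(2), unsup is 1.0, and total is their sum.
```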
66 Layer-by-Layer Supervised Training
67 Geoff Hinton, ICASSP 2013
68 Large-scale training: Stochastic vs. batch
69 The Challenge. A large-scale problem has: lots of training samples (>10M), lots of classes (>10K), and lots of input dimensions (>10K). - The best optimizer in practice is on-line SGD, which is naturally sequential and hard to parallelize. - Layers cannot be trained independently and in parallel, so training is hard to distribute. - The model can have lots of parameters that may clog the network, so it is hard to distribute across machines. Slide Courtesy: Marc'Aurelio Ranzato
70 A solution by model parallelism. Model parallelism (1st machine, 2nd machine, 3rd machine) + data parallelism (input #1, input #2, input #3). Le et al. Building high-level features using large-scale unsupervised learning. ICML 2012. Slide Courtesy: Marc'Aurelio Ranzato
71 Model parallelism + data parallelism (input #1, input #2, input #3). Le et al. Building high-level features using large-scale unsupervised learning. ICML 2012. Slide Courtesy: Marc'Aurelio Ranzato
72 Asynchronous SGD: parameter server; 1st replica, 2nd replica, 3rd replica. Slide Courtesy: Marc'Aurelio Ranzato
73 Asynchronous SGD: parameter server; 1st replica, 2nd replica, 3rd replica. Slide Courtesy: Marc'Aurelio Ranzato
74 Parameter server; 1st replica, 2nd replica, 3rd replica. Slide Courtesy: Marc'Aurelio Ranzato
75 Parameter server; 1st replica, 2nd replica, 3rd replica. Slide Courtesy: Marc'Aurelio Ranzato
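The parameter-server pattern in these slides can be imitated with a small single-process simulation. This is a sketch, not a distributed implementation: replicas take turns instead of running concurrently, each computes a gradient on its own data shard using a possibly stale copy of the parameters, pushes it to the server, and pulls fresh parameters. The `ParameterServer` class, the linear-regression task, and all constants are illustrative assumptions.

```python
import numpy as np

class ParameterServer:
    """Holds the shared parameters and applies gradients as they arrive."""
    def __init__(self, dim, lr):
        self.w = np.zeros(dim)
        self.lr = lr

    def pull(self):
        return self.w.copy()

    def push(self, grad):
        self.w -= self.lr * grad

def replica_gradient(w, X, y):
    """Gradient of 0.5 * mean((X w - y)^2) on one replica's mini-batch."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
X = rng.standard_normal((300, 3))
y = X @ w_true                                  # noise-free targets for a clean check
shards = np.array_split(np.arange(300), 3)      # data parallelism: one shard per replica

server = ParameterServer(dim=3, lr=0.05)
local = [server.pull() for _ in shards]         # each replica's local parameter copy
for step in range(600):
    r = step % 3                                # replicas take turns; no global barrier
    batch = rng.choice(shards[r], size=16, replace=False)
    # The gradient uses the replica's local copy, which is stale by the
    # updates the other replicas pushed since this replica last pulled.
    server.push(replica_gradient(local[r], X[batch], y[batch]))
    local[r] = server.pull()                    # refresh only this replica's copy
```

With a small learning rate the stale gradients still drive `server.w` toward `w_true`; larger learning rates or longer delays can make such updates unstable.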
76 Unsupervised Learning with 1B Parameters: Training a Model with 1B Parameters. Deep net: 3 stages; each stage consists of local filtering, L2 pooling, LCN - 18x18 filters - 8 filters at each location - L2 pooling and LCN over 5x5 neighborhoods. Training jointly the three layers by: - reconstructing the input of each layer - sparsity on the code. Slide Courtesy: Marc'Aurelio Ranzato
77 Concluding Remarks. We see the promise of deep learning. Big success on speech, vision, and many other areas. Large-scale training is the key challenge. More success stories are coming.
Bidirectional Recurrent Convolutional Networks for Video Super-Resolution Qi Zhang & Yan Huang Center for Research on Intelligent Perception and Computing (CRIPAC) National Laboratory of Pattern Recognition
More informationAutoencoders, denoising autoencoders, and learning deep networks
4 th CiFAR Summer School on Learning and Vision in Biology and Engineering Toronto, August 5-9 2008 Autoencoders, denoising autoencoders, and learning deep networks Part II joint work with Hugo Larochelle,
More informationAutoencoder Using Kernel Method
2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC) Banff Center, Banff, Canada, October 5-8, 2017 Autoencoder Using Kernel Method Yan Pei Computer Science Division University of
More informationTiled convolutional neural networks
Tiled convolutional neural networks Quoc V. Le, Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang Wei Koh, Andrew Y. Ng Computer Science Department, Stanford University {quocle,jngiam,zhenghao,danchia,pangwei,ang}@cs.stanford.edu
More informationLecture 7: Semantic Segmentation
Semantic Segmentation CSED703R: Deep Learning for Visual Recognition (207F) Segmenting images based on its semantic notion Lecture 7: Semantic Segmentation Bohyung Han Computer Vision Lab. bhhanpostech.ac.kr
More informationFace Recognition A Deep Learning Approach
Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar 2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison
More informationLearning Transferable Features with Deep Adaptation Networks
Learning Transferable Features with Deep Adaptation Networks Mingsheng Long, Yue Cao, Jianmin Wang, Michael I. Jordan Presented by Changyou Chen October 30, 2015 1 Changyou Chen Learning Transferable Features
More informationNeural Networks: promises of current research
April 2008 www.apstat.com Current research on deep architectures A few labs are currently researching deep neural network training: Geoffrey Hinton s lab at U.Toronto Yann LeCun s lab at NYU Our LISA lab
More informationAn Analysis of Single-Layer Networks in Unsupervised Feature Learning
An Analysis of Single-Layer Networks in Unsupervised Feature Learning Adam Coates Honglak Lee Andrew Y. Ng Stanford University Computer Science Dept. 353 Serra Mall Stanford, CA 94305 University of Michigan
More informationComputer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction
Computer vision: models, learning and inference Chapter 13 Image preprocessing and feature extraction Preprocessing The goal of pre-processing is to try to reduce unwanted variation in image due to lighting,
More informationAdaptive Deconvolutional Networks for Mid and High Level Feature Learning
ICCV 2011 submission. Currently under review. Please do not distribute. Adaptive Deconvolutional Networks for Mid and High Level Feature Learning Matthew D. Zeiler, Graham W. Taylor and Rob Fergus Dept.
More informationJOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation
JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based
More informationDeep Learning. Deep Learning provided breakthrough results in speech recognition and image classification. Why?
Data Mining Deep Learning Deep Learning provided breakthrough results in speech recognition and image classification. Why? Because Speech recognition and image classification are two basic examples of
More informationarxiv: v1 [cs.lg] 20 Dec 2013
Unsupervised Feature Learning by Deep Sparse Coding Yunlong He Koray Kavukcuoglu Yun Wang Arthur Szlam Yanjun Qi arxiv:1312.5783v1 [cs.lg] 20 Dec 2013 Abstract In this paper, we propose a new unsupervised
More informationLocally Scale-Invariant Convolutional Neural Networks
Locally Scale-Invariant Convolutional Neural Networks Angjoo Kanazawa Department of Computer Science University of Maryland, College Park, MD 20740 kanazawa@umiacs.umd.edu Abhishek Sharma Department of
More informationBilevel Sparse Coding
Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional
More informationSingle Image Depth Estimation via Deep Learning
Single Image Depth Estimation via Deep Learning Wei Song Stanford University Stanford, CA Abstract The goal of the project is to apply direct supervised deep learning to the problem of monocular depth
More informationUnsupervised Learning
Deep Learning for Graphics Unsupervised Learning Niloy Mitra Iasonas Kokkinos Paul Guerrero Vladimir Kim Kostas Rematas Tobias Ritschel UCL UCL/Facebook UCL Adobe Research U Washington UCL Timetable Niloy
More informationRestricted Boltzmann Machines. Shallow vs. deep networks. Stacked RBMs. Boltzmann Machine learning: Unsupervised version
Shallow vs. deep networks Restricted Boltzmann Machines Shallow: one hidden layer Features can be learned more-or-less independently Arbitrary function approximator (with enough hidden units) Deep: two
More informationKnow your data - many types of networks
Architectures Know your data - many types of networks Fixed length representation Variable length representation Online video sequences, or samples of different sizes Images Specific architectures for
More informationEnergy Based Models, Restricted Boltzmann Machines and Deep Networks. Jesse Eickholt
Energy Based Models, Restricted Boltzmann Machines and Deep Networks Jesse Eickholt ???? Who s heard of Energy Based Models (EBMs) Restricted Boltzmann Machines (RBMs) Deep Belief Networks Auto-encoders
More informationA Deep Learning Framework for Authorship Classification of Paintings
A Deep Learning Framework for Authorship Classification of Paintings Kai-Lung Hua ( 花凱龍 ) Dept. of Computer Science and Information Engineering National Taiwan University of Science and Technology Taipei,
More informationSupervised Translation-Invariant Sparse Coding
Supervised Translation-Invariant Sparse Coding Jianchao Yang,KaiYu, Thomas Huang Beckman Institute, University of Illinois at Urbana-Champaign NEC Laboratories America, Inc., Cupertino, California {jyang29,
More informationDynamic Routing Between Capsules
Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet
More informationDoes the Brain do Inverse Graphics?
Does the Brain do Inverse Graphics? Geoffrey Hinton, Alex Krizhevsky, Navdeep Jaitly, Tijmen Tieleman & Yichuan Tang Department of Computer Science University of Toronto The representation used by the
More informationUsing Machine Learning for Classification of Cancer Cells
Using Machine Learning for Classification of Cancer Cells Camille Biscarrat University of California, Berkeley I Introduction Cell screening is a commonly used technique in the development of new drugs.
More informationDEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla
DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple
More informationHierarchical Representation Using NMF
Hierarchical Representation Using NMF Hyun Ah Song 1, and Soo-Young Lee 1,2 1 Department of Electrical Engineering, KAIST, Daejeon 305-701, Republic of Korea 2 Department of Bio and Brain Engineering,
More informationNeural Networks. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Neural Networks These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides
More informationAn Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation
An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation Hugo Larochelle larocheh@iro.umontreal.ca Dumitru Erhan erhandum@iro.umontreal.ca Aaron Courville courvila@iro.umontreal.ca
More informationCS 179 Lecture 16. Logistic Regression & Parallel SGD
CS 179 Lecture 16 Logistic Regression & Parallel SGD 1 Outline logistic regression (stochastic) gradient descent parallelizing SGD for neural nets (with emphasis on Google s distributed neural net implementation)
More informationCMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 18: Deep learning and Vision: Convolutional neural networks Teacher: Gianni A. Di Caro DEEP, SHALLOW, CONNECTED, SPARSE? Fully connected multi-layer feed-forward perceptrons: More powerful
More informationDiscriminative classifiers for image recognition
Discriminative classifiers for image recognition May 26 th, 2015 Yong Jae Lee UC Davis Outline Last time: window-based generic object detection basic pipeline face detection with boosting as case study
More informationLearning Convolutional Feature Hierarchies for Visual Recognition
Learning Convolutional Feature Hierarchies for Visual Recognition Koray Kavukcuoglu, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Michael Mathieu, Yann LeCun Computer Science Department Courant Institute
More informationLayerwise Interweaving Convolutional LSTM
Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States
More informationLearning to Align from Scratch
Learning to Align from Scratch Gary B. Huang 1 Marwan A. Mattar 1 Honglak Lee 2 Erik Learned-Miller 1 1 University of Massachusetts, Amherst, MA {gbhuang,mmattar,elm}@cs.umass.edu 2 University of Michigan,
More informationAkarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
Akarsh Pokkunuru EECS Department 03-16-2017 Contractive Auto-Encoders: Explicit Invariance During Feature Extraction 1 AGENDA Introduction to Auto-encoders Types of Auto-encoders Analysis of different
More informationIntroduction to Neural Networks
Introduction to Neural Networks Jakob Verbeek 2017-2018 Biological motivation Neuron is basic computational unit of the brain about 10^11 neurons in human brain Simplified neuron model as linear threshold
More information