Learning Feature Hierarchies for Object Recognition

1 Learning Feature Hierarchies for Object Recognition
Koray Kavukcuoglu
Computer Science Department, Courant Institute of Mathematical Sciences, New York University
Marc'Aurelio Ranzato, Kevin Jarrett, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Arthur Szlam, Rob Fergus and Yann LeCun

2 Overview
Feature Extractors
Unsupervised Feature Learning
Sparse Coding
Learning Invariance
Convolutional Sparse Coding
Hierarchical Object Recognition

3 Object Recognition
Feature Extraction: Gabor, SIFT, HoG, Color, combinations, ...
Classification: PMK-SVM, Linear, ...
References: Grauman 05, Lazebnik 06, Serre 05, Mutch 06, ...

4 Object Recognition
Pipeline: Feature Extractor, then Classifier
It would be better to learn everything, so that the system adapts to different domains
Learn the feature extractor and the classifier together

5 Feature Extraction
Can be based on unsupervised learning
Features should be efficient to extract
Overcomplete sparse representations are easily separable

6 Sparse Coding
$$\min_z \; \frac{1}{2}\|x - Dz\|_2^2 + \lambda \sum_i |z_i|$$
x: input, z: code
First term: reconstruction; second term: sparsity
Dictionary D is given; search for the optimal z
This defines a mapping $f : x \rightarrow z$
Running inference for every input x takes too much time
References: Mallat 93, Chen 98, Beck 09, Li 09
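The inference problem on this slide can be sketched with ISTA, one of the iterative shrinkage methods the deck names later. This is a minimal illustration, not the authors' implementation: the function names (`soft_threshold`, `sparse_code_ista`, `energy`), the iteration count and the random test data are all assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise shrinkage: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def energy(x, D, z, lam):
    """0.5 * ||x - D z||^2 + lam * sum_i |z_i|."""
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))

def sparse_code_ista(x, D, lam, n_iter=200):
    """Approximately minimize the energy over z for a fixed dictionary D."""
    # Squared spectral norm of D bounds the curvature of the reconstruction term.
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)              # gradient of the reconstruction term
        z = soft_threshold(z - grad / L, lam / L)
    return z

# Hypothetical data: a 4x overcomplete random dictionary with unit-norm columns.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))
D /= np.linalg.norm(D, axis=0)
x = rng.standard_normal(16)
z = sparse_code_ista(x, D, lam=0.5)
```

Every input needs its own run of this loop, which is the cost the slide points to when it says inference takes too much time.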

7 Sparse Modeling
$$\min_{z, D} \; \frac{1}{2}\|x - Dz\|_2^2 + \lambda \sum_i |z_i|$$
Learn D from data
D has to be bounded to avoid trivial solutions
Online or batch algorithms for updating the dictionary
Learn the mapping $f_D : x \rightarrow z$
References: Olshausen and Field 97, Aharon 06, Lee 07, Ranzato 07, Kavukcuoglu 08, Zeiler 10, ...

8 Sparse Modeling
Per-sample energy:
$$E(x, z, D) = \frac{1}{2}\|x - Dz\|_2^2 + \lambda \sum_i |z_i|$$
Loss:
$$L(x, D) = \frac{1}{|X|} \sum_{x \in X} E(x, z, D)$$
For each sample:
1. Do inference: minimize E(x, z, D) with respect to z (sparse coding)
2. Update the parameters keeping z fixed: $D \leftarrow D - \eta \, \partial E / \partial D$
3. Project the columns of D onto the unit sphere
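The three steps on this slide can be sketched as an online learning loop. This is a rough illustration under assumptions: the inner solver is a short ISTA loop (the slide does not say which inference method is used), and the names `ista` and `learn_dictionary`, the step size `eta`, the epoch count and the random data are all hypothetical.

```python
import numpy as np

def ista(x, D, lam, n_iter=50):
    """Step 1: approximate sparse coding of x for a fixed dictionary D."""
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        v = z - D.T @ (D @ z - x) / L
        z = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
    return z

def learn_dictionary(X, n_atoms, lam=0.1, eta=0.05, n_epochs=5, seed=0):
    """Online dictionary learning over the rows of X (one sample per row)."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[1], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_epochs):
        for x in X:
            z = ista(x, D, lam)                   # 1. inference with D fixed
            grad_D = -np.outer(x - D @ z, z)      # dE/dD with z fixed
            D -= eta * grad_D                     # 2. gradient step on D
            norms = np.linalg.norm(D, axis=0, keepdims=True)
            D /= np.maximum(norms, 1e-12)         # 3. project columns to unit norm
    return D

# Hypothetical data: 100 random 8-dimensional samples, 2x overcomplete dictionary.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))
D = learn_dictionary(X, n_atoms=16)
```

The projection in step 3 is what keeps D bounded; without it the energy could be lowered simply by growing D while shrinking z.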

9 Sparse Modeling
$$\min_z \; \frac{1}{2}\|x - Dz\|_2^2 + \lambda \sum_i |z_i|$$
[Figure: code z at iteration 1 and at convergence]
The inference process suppresses all but a few code units

10 Sparse Modeling Problems
1. Inference takes a long time. Remedy: train a predictor function
2. Sparse coding is unstable. Remedy: complex cell model
3. Patch-based modeling produces redundant features. Remedy: convolutional sparse modeling

11 Predictive Sparse Decomposition
$$\min_z \; \frac{1}{2}\|x - Dz\|_2^2 + \lambda \sum_i |z_i| + \alpha \|z - F_e(x; K)\|_2^2$$
Encoder choices for $F_e(x; K)$:
$z = g \tanh(k^T x)$
$z = \mathrm{sh}_\lambda(k^T x)$
$z = \mathrm{sh}_\lambda(k^T x + S \, \mathrm{sh}_\lambda(k^T x))$ (Learned ISTA, Gregor 10)
Learning: for each sample from the data, do:
1. Fix K and D, minimize to get the optimal z
2. Using the optimal value of z, update D and K
3. Scale the elements of D to unit norm
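One PSD training step with the tanh encoder can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the encoder parameters K are taken to be a filter matrix `W` and a per-unit gain `g`, inference uses a proximal gradient loop started at the encoder prediction, and all names, step sizes and data are hypothetical.

```python
import numpy as np

def psd_energy(x, D, z, z_pred, lam, alpha):
    """0.5*||x - D z||^2 + lam*sum|z_i| + alpha*||z - F_e(x; K)||^2."""
    return (0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))
            + alpha * np.sum((z - z_pred) ** 2))

def psd_infer(x, D, z_pred, lam, alpha, n_iter=50):
    """Step 1: minimize the PSD energy over z with D and the encoder fixed."""
    L = np.linalg.norm(D, 2) ** 2 + 2.0 * alpha   # curvature bound of the smooth part
    z = z_pred.copy()
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x) + 2.0 * alpha * (z - z_pred)
        v = z - grad / L
        z = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
    return z

def psd_step(x, D, W, g, lam=0.1, alpha=1.0, eta=0.01):
    """One sample update of decoder D and encoder (W, g)."""
    a = np.tanh(W @ x)
    z_pred = g * a                                  # encoder output g * tanh(W x)
    z = psd_infer(x, D, z_pred, lam, alpha)         # 1. optimal code
    D = D + eta * np.outer(x - D @ z, z)            # 2a. descend on reconstruction
    dF = -2.0 * alpha * (z - z_pred)                # dE/dF_e for the prediction term
    g_new = g - eta * dF * a                        # 2b. encoder gain update
    W_new = W - eta * np.outer(dF * g * (1.0 - a ** 2), x)  # 2c. encoder filters
    D = D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)  # 3.
    return D, W_new, g_new

# Hypothetical data: 8-dimensional inputs, 16 code units.
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16))
D /= np.linalg.norm(D, axis=0, keepdims=True)
W = 0.1 * rng.standard_normal((16, 8))
g = np.ones(16)
for x in rng.standard_normal((20, 8)):
    D, W, g = psd_step(x, D, W, g)
```

After training, a code can be predicted with a single pass `g * np.tanh(W @ x)` instead of running the iterative loop.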

12 Predictive Sparse Decomposition
[Figure: Encoder (k) and Decoder (D) filters]
12x12 image patches, 256 dictionary elements

13 Predictive Sparse Decomposition
[Figure: Encoder (k) and Decoder (D) filters]
28x28 MNIST digit images, 200 dictionary elements
Strokes for digit parts

14 Good Representation?
Performance on MNIST using 28x28 filters
Compare representations from different methods
PSD: worse reconstruction than other models, but better recognition
References: Ranzato 07, Kavukcuoglu 08

15 Recognition
Filterbank + Non-linearity + Pooling
[Diagram: 64 filters; non-linearity: rectification and local contrast normalization (Pinto 08); pooling: max / average]

16 Recognition - C101
Optimal (Feature Sign, Lee 07) vs PSD features
PSD features perform slightly better
There is a naturally optimal point of sparsity
Beyond 64 features there is not much gain
PSD features are an order of magnitude faster

17 (In)Stability of Sparse Coding
16x16 input patch
1024 dictionary elements (4x overcomplete)
[Figure: input patch and pixel-shifted patch]

18 Learning Invariance
$$\min_z \; \frac{1}{2}\|x - Dz\|_2^2 + \lambda \sum_{i=1}^{K} \sqrt{\sum_{j \in P_i} w_j z_j^2} + \alpha \|z - F_e(x; K)\|_2^2$$
Group sparsity: idea proposed by Hyvarinen & Hoyer (2001) in the context of square ICA
$w_j$: Gaussian weighting window
The learning algorithm is the same as for PSD
The feedforward regressor $F_e(x; K)$, followed by the pooling function, produces invariant representations efficiently
Ability to learn the necessary transformations from data

19 Learning Invariance
[Figure, panels (a) and (b): map of z divided into overlapping neighborhoods $P_1, \ldots, P_K$; Gaussian window $w_j$ over a pool $P_i$]
Sparsity across pools rather than units
Drives basis functions in a pool to be similar
Overlapping pools ensure smooth representation manifolds
Pool size = 1 gives regular PSD
Reference: Kavukcuoglu 09

20 Topographic Maps
Circular boundary conditions in both directions
6x6 pools with stride 2 in both dimensions
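The pooling layout on this slide, combined with the group sparsity term of slide 18, can be sketched as below. The 32x32 map size, the Gaussian width `sigma`, the window normalization and all function names are assumptions made for illustration; only the 6x6 pools, the stride of 2 and the circular boundaries come from the slides.

```python
import numpy as np

def make_pools(side, pool=6, stride=2):
    """Flat unit indices of every pool on a side x side map with wrap-around.

    Returns an integer array of shape (n_pools, pool * pool).
    """
    offs = np.arange(pool)
    pools = []
    for r in range(0, side, stride):
        rows = (r + offs) % side              # circular boundary, vertical
        for c in range(0, side, stride):
            cols = (c + offs) % side          # circular boundary, horizontal
            pools.append((rows[:, None] * side + cols[None, :]).ravel())
    return np.array(pools)

def gaussian_window(pool=6, sigma=1.5):
    """Gaussian weights w_j over one pool, flattened and normalized to sum to 1."""
    ax = np.arange(pool) - (pool - 1) / 2.0
    w = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return (w / w.sum()).ravel()

def group_sparsity(z, pools, w):
    """sum_i sqrt(sum_{j in P_i} w_j * z_j^2) for a flat code vector z."""
    return np.sum(np.sqrt(np.sum(w[None, :] * z[pools] ** 2, axis=1)))

# Hypothetical 32x32 topographic map (1024 code units).
side = 32
pools = make_pools(side)
w = gaussian_window()
z = np.random.default_rng(0).standard_normal(side * side)
penalty = group_sparsity(z, pools, w)
```

With `pool=1` and `stride=1` each unit forms its own pool and the penalty reduces to the plain L1 norm, which matches the remark on slide 19 that a pool size of 1 gives regular PSD.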

21 How invariant?
[Figure: normalized MSE versus horizontal shift, at rotation 0 degrees (left) and rotation 25 degrees (right); curves: SIFT non rot. inv., SIFT, Our alg. non inv., Our alg. inv.]
Left: normalized MSE between representations of original and transformed 16x16 patches
Right: same after a 25 degree rotation
IPSD is more invariant

22 Good for Recognition?

Caltech 101 (Accuracy)
Classifier    Features                          Result
Linear SVM    IPSD (24x24)                      50.9%
Linear SVM    SIFT (not rot. inv.) (24x24)      51.2%
Linear SVM    SIFT (rot. inv.) (24x24)          45.2%
PMK-SVM       IPSD (34x34)                      59.6%
PMK-SVM       IPSD (56x56)                      62.6%
PMK-SVM       IPSD (120x120)                    65.6%

MNIST (Error Rate)
Classifier    Features                          Result
Linear SVM    IPSD (5x5)                        1.0%
Linear SVM    SIFT (not rot. inv.) (5x5)        1.5%

23 Multi-Stage Object Recognition
Each stage contains a filter-bank, non-linearity and pooling
Stage components: Filterbank, Tanh, Abs, LCN, Pooling
Conv Net: learned filterbank, average pooling
HMAX: Gabor filterbank, max pooling
Reference: Jarrett 09
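A single stage of the kind listed on this slide can be sketched as a chain of filterbank, tanh, absolute value, local contrast normalization (LCN) and average pooling. This is a simplified illustration: the LCN below uses a uniform window and edge padding, which is an assumption and may differ from the weighting used in the cited work, and all function names, sizes and the random filters are hypothetical.

```python
import numpy as np

def correlate_valid(img, k):
    """'Valid' 2D cross-correlation of img (H, W) with a square kernel k (s, s)."""
    s = k.shape[0]
    H, W = img.shape
    out = np.zeros((H - s + 1, W - s + 1))
    for i in range(s):
        for j in range(s):
            out += k[i, j] * img[i:i + H - s + 1, j:j + W - s + 1]
    return out

def box_mean(maps, size):
    """Mean over a size x size neighborhood (odd size) and across all maps."""
    pad = size // 2
    m = np.pad(maps.mean(axis=0), pad, mode="edge")
    H, W = maps.shape[1:]
    out = np.zeros((H, W))
    for i in range(size):
        for j in range(size):
            out += m[i:i + H, j:j + W]
    return out / size ** 2

def lcn(maps, size=5, eps=1e-8):
    """Simplified LCN: subtract the local mean, divide by the local std."""
    v = maps - box_mean(maps, size)
    sigma = np.sqrt(box_mean(v ** 2, size))
    # Floor the divisor at the mean std so flat regions are not amplified.
    return v / np.maximum(sigma, max(sigma.mean(), eps))

def average_pool(maps, size=2):
    """Non-overlapping size x size average pooling on maps of shape (K, H, W)."""
    K, H, W = maps.shape
    H2, W2 = H // size, W // size
    m = maps[:, :H2 * size, :W2 * size]
    return m.reshape(K, H2, size, W2, size).mean(axis=(2, 4))

def stage(img, filters):
    """Filterbank -> tanh -> abs -> LCN -> average pooling."""
    maps = np.stack([correlate_valid(img, k) for k in filters])
    return average_pool(lcn(np.abs(np.tanh(maps))))

# Hypothetical input: a 16x16 image and four random 5x5 filters.
rng = np.random.default_rng(0)
features = stage(rng.standard_normal((16, 16)), rng.standard_normal((4, 5, 5)))
```

Stacking two such stages, with the output maps of the first fed to the second, gives the two-stage systems compared on the following slides.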

24 Multi-Stage Object Recognition
[Diagram: x -> (Filter Bank, Non-Linearity, Pooling) -> z_1 -> (Filter Bank, Non-Linearity, Pooling) -> z_2; unsupervised pre-training of each stage, followed by supervised refinement]
Building block of a multi-stage architecture: Filterbank $F_e(x; K)$, non-linearities, pooling

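25 Multi-Stage Object Recognition
[Table: results by architecture and training protocol; numeric entries not preserved]
Columns: R, U, R+, U+ (one stage); RR, UU, R+R+, U+U+ (two stages)
Rows: Pa, N-Pa, N-Pm, Rabs-Pa, Rabs-N-Pa, C-Rabs-N-Pa
Legend: U = unsupervised, R = random, + = supervised fine-tuning
Legend: Pa = average pooling, Pm = max pooling, N = local contrast normalization, Rabs = absolute value rectification, C = convolutional unsupervised
2 stage > 1 stage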

26 Multi-Stage Object Recognition
[Same table and legend as slide 25]
Abs > No Abs

27 Multi-Stage Object Recognition
[Same table and legend as slide 25]
LCN > No LCN

28 Multi-Stage Object Recognition
[Same table and legend as slide 25]
Even Random Works!!!

29 Optimal Stimuli
[Figure panels: PSD, Random]
Optimize the input to maximize the output of one unit after abs + LCN + average pooling
Random feature extractors respond to oriented gratings too

30 Random Filter Performance
NORB Dataset:
1. 96x96 grayscale images
2. 5 classes (human, car, truck, airplane, animal)
3. Almost 5000 training samples per class
[Figure: error rate versus number of training samples per class; a "Caltech 101" label also appears in the figure; curves: F_CSG-P_A (R+R+), F_CSG-R_abs-N-P_A (UU), F_CSG-R_abs-N-P_A (R+R+), F_CSG-R_abs-N-P_A (RR), F_CSG-R_abs-N-P_A (U+U+)]

31 Redundancy in Feature Extraction
[Diagram: Filters -> Convolve -> Feature maps]
Patch-based learning has to model the same structure at every location
Patch-based models therefore produce highly redundant features

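32 Convolutional PSD
$$\frac{1}{2}\Big\|x - \sum_k D_k * z_k\Big\|_2^2 + \lambda \|z\|_1 + \alpha \|z - F_e(x)\|_2^2$$
$$x \in \mathbb{R}^{w \times h}, \quad D \in \mathbb{R}^{K \times s \times s}, \quad z \in \mathbb{R}^{K \times (w-s+1) \times (h-s+1)}$$
[Figure panels: Patch based, Convolutional]
Convolutional training yields a more diverse set of features
Reference: Kavukcuoglu 10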

33 Convolutional PSD
Measuring the redundancy in the dictionary
Cumulative histogram of the angle between ALL PAIRS of dictionary elements:
$$\arccos\big(\max\big(|d_i D_j^T|\big)\big)$$
[Figure: x-axis: deg; y-axis: # of cross corr < deg; curves: Patch Based Training, Convolutional Training]
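The redundancy measure on this slide can be sketched as follows. The slide does not spell out how the maximum is taken; this sketch assumes it runs over all relative shifts of the two filters (a cross-correlation), which is suggested by the axis label but remains an assumption. The function names and the toy filters are hypothetical.

```python
import numpy as np

def max_abs_xcorr(a, b):
    """Largest |cross-correlation| between two s x s filters over all shifts."""
    s = a.shape[0]
    bp = np.pad(b, s - 1)                     # zero-pad so every overlap is covered
    best = 0.0
    for i in range(2 * s - 1):
        for j in range(2 * s - 1):
            best = max(best, abs(np.sum(a * bp[i:i + s, j:j + s])))
    return best

def pairwise_angles_deg(D):
    """Angles in degrees for all pairs i < j of unit-norm filters D (K, s, s)."""
    K = D.shape[0]
    angles = []
    for i in range(K):
        for j in range(i + 1, K):
            c = min(max_abs_xcorr(D[i], D[j]), 1.0)   # guard against rounding above 1
            angles.append(np.degrees(np.arccos(c)))
    return np.array(angles)

def cumulative_histogram(angles, edges):
    """Number of pairs whose angle is below each edge (in degrees)."""
    return np.array([(angles < e).sum() for e in edges])

# Toy dictionary: an impulse, a shifted copy of it, and a flat filter.
D = np.array([
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
])
angles = pairwise_angles_deg(D)
counts = cumulative_histogram(angles, [30.0, 90.0])
```

In the toy dictionary the first two filters are shifted copies of each other, so their angle is 0 degrees: under a convolutional model such a pair is fully redundant even though the two filters are orthogonal as plain vectors.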

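34 Convolutional PSD
$$\frac{1}{2}\Big\|x - \sum_k D_k * z_k\Big\|_2^2 + \lambda \|z\|_1 + \alpha \|z - F_e(x)\|_2^2$$
$$x \in \mathbb{R}^{w \times h}, \quad D \in \mathbb{R}^{K \times s \times s}, \quad z \in \mathbb{R}^{K \times (w-s+1) \times (h-s+1)}$$
[Diagram: x = sum over k of D_k * z_k]
Convolutional sparse coding models large images rather than small image patches
Each iteration reduces redundancy in the feature representation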
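The shapes in the convolutional model can be checked with a small sketch of the reconstruction and of the first two energy terms. It assumes a "full" 2D convolution, so that feature maps of size (w-s+1) x (h-s+1) reconstruct a w x h image; the encoder term is left out for brevity, and the function names and sizes are hypothetical.

```python
import numpy as np

def conv2d_full(z, d):
    """Full 2D convolution of a map z (p, q) with a square kernel d (s, s).

    The output has shape (p + s - 1, q + s - 1).
    """
    p, q = z.shape
    s = d.shape[0]
    out = np.zeros((p + s - 1, q + s - 1))
    for i in range(s):
        for j in range(s):
            out[i:i + p, j:j + q] += d[i, j] * z
    return out

def reconstruct(D, z):
    """sum_k D_k * z_k for D of shape (K, s, s) and z of shape (K, p, q)."""
    return sum(conv2d_full(zk, dk) for zk, dk in zip(z, D))

def conv_energy(x, D, z, lam):
    """0.5 * ||x - sum_k D_k * z_k||^2 + lam * ||z||_1 (encoder term omitted)."""
    r = x - reconstruct(D, z)
    return 0.5 * np.sum(r ** 2) + lam * np.sum(np.abs(z))

# Hypothetical sizes: w = h = 10, s = 3, K = 4, so each map z_k is 8 x 8.
rng = np.random.default_rng(0)
w, h, s, K = 10, 10, 3, 4
x = rng.standard_normal((w, h))
D = rng.standard_normal((K, s, s))
z = np.zeros((K, w - s + 1, h - s + 1))
z[0, 2, 3] = 1.0                       # a single active code unit
x_hat = reconstruct(D, z)              # places a copy of D[0] at rows 2:5, cols 3:6
```

A single active unit in `z` stamps one copy of its filter into the image, so one filter can explain the same structure at any location instead of needing a shifted duplicate per position.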

35 Convolutional PSD
[Figure: Input (x), Dictionary (D), Reconstruction, Code (z) at Iteration 1]
Each iteration reduces redundancy in the feature representation

36 Convolutional PSD
[Figure: Input (x), Dictionary (D), Reconstruction, Code (z) at Iteration 2]
Each iteration reduces redundancy in the feature representation

37 Convolutional PSD
[Figure: Input (x), Dictionary (D), Reconstruction, Code (z) at Convergence]
Each iteration reduces redundancy in the feature representation

38 Convolutional PSD - Better Encoders
Simple encoders are inadequate for predicting convolutional sparse representations
A better encoder should use a shrinkage operator with a learned suppression matrix to approximate sparse codes (Gregor 10)
Encoders:
$z = \mathrm{sh}_\lambda(k^T x)$
$z = \mathrm{sh}_\lambda(k^T x + S \, \mathrm{sh}_\lambda(k^T x))$
Encoder Training:
2nd order information is important for fast convergence
Smooth shrinkage is important for conserving derivatives:
$$\frac{1}{\beta} \log\big(\exp(\beta b) + \exp(\beta s) - 1\big) - b$$
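The smooth shrinkage expression on this slide can be sketched next to ordinary soft thresholding. Extending it to negative inputs by odd symmetry (using |s| and the sign of s) is an assumption here, as are the function names and the parameter values; the rearrangement inside the logarithm only avoids overflow and does not change the value.

```python
import numpy as np

def smooth_shrink(s, beta, b):
    """sign(s) * ((1/beta) * log(exp(beta*b) + exp(beta*|s|) - 1) - b).

    Requires beta > 0 and b > 0. Computed in a shifted form for stability.
    """
    a = np.abs(s)
    m = np.maximum(beta * b, beta * a)        # largest exponent, factored out
    log_term = m + np.log(np.exp(beta * b - m) + np.exp(beta * a - m) - np.exp(-m))
    return np.sign(s) * (log_term / beta - b)

def soft_shrink(s, b):
    """Ordinary soft thresholding: sign(s) * max(|s| - b, 0)."""
    return np.sign(s) * np.maximum(np.abs(s) - b, 0.0)

# Hypothetical parameters: threshold b = 1, sharpness beta = 5.
s = np.linspace(-3.0, 3.0, 13)
smooth = smooth_shrink(s, beta=5.0, b=1.0)
hard = soft_shrink(s, b=1.0)
```

Inside the threshold the soft version is exactly zero and passes no gradient, while the smooth version stays slightly non-zero there, which is the sense in which it conserves derivatives for encoder training. Larger `beta` brings the two curves closer together.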

39 Convolutional Training
Inference and training are an order of magnitude more costly
Efficient inference algorithms are crucial (ISTA, FISTA, CD)
64 filters = 64 times overcomplete representation
Proper handling of border effects is important
Test time is the same as for the patch-based model

40 Convolutional PSD
Recognition performance on C101: low-level convolutional feature learning improves it

Stage      Training    Patch Based    Convolutional
1 Stage    Unsup       52.2%          57.1%
1 Stage    Unsup       (missing)      57.6%
2 Stage    Unsup       63.7%          65.5%
2 Stage    Unsup       (missing)      66.3%

41 Pedestrian Detection on INRIA
[Figure: miss rate versus false positives per image]
Legend (miss rate): Shapelet orig (90.5%), PoseInvSvm (68.6%), VJ OpenCv (53.0%), PoseInv (51.4%), Shapelet (50.4%), VJ (47.5%), FtrMine (34.0%), Pls (23.4%), HOG (23.1%), HikSvm (21.9%), LatSvm V1 (17.5%), MultiFtr (15.6%), R+R+ (14.8%), U+U+ (11.5%), MultiFtr+CSS (10.9%), LatSvm V2 (9.3%), FPDW (9.3%), ChnFtrs (8.7%)
Purely supervised training: 14.8% miss rate
Unsupervised pre-training with Conv PSD + supervised refinement: 11.5% miss rate
Close to state of the art and improving quickly...

42 Questions?


More information

Object Recognition II

Object Recognition II Object Recognition II Linda Shapiro EE/CSE 576 with CNN slides from Ross Girshick 1 Outline Object detection the task, evaluation, datasets Convolutional Neural Networks (CNNs) overview and history Region-based

More information

Histograms of Sparse Codes for Object Detection

Histograms of Sparse Codes for Object Detection Histograms of Sparse Codes for Object Detection Xiaofeng Ren (Amazon), Deva Ramanan (UC Irvine) Presented by Hossein Azizpour What does the paper do? (learning) a new representation local histograms of

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:

More information

Large-Scale Visual Recognition With Deep Learning

Large-Scale Visual Recognition With Deep Learning Large-Scale Visual Recognition With Deep Learning Marc'Aurelio ranzato@google.com www.cs.toronto.edu/~ranzato Sunday 23 June 2013 Why Is Recognition Hard? Object Recognizer panda 2 Why Is Recognition Hard?

More information

Deformable Part Models

Deformable Part Models CS 1674: Intro to Computer Vision Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 9, 2016 Today: Object category detection Window-based approaches: Last time: Viola-Jones

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts

More information

An Analysis of Single-Layer Networks in Unsupervised Feature Learning

An Analysis of Single-Layer Networks in Unsupervised Feature Learning An Analysis of Single-Layer Networks in Unsupervised Feature Learning Adam Coates Honglak Lee Andrew Y. Ng Stanford University Computer Science Dept. 353 Serra Mall Stanford, CA 94305 University of Michigan

More information

Tiled convolutional neural networks

Tiled convolutional neural networks Tiled convolutional neural networks Quoc V. Le, Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang Wei Koh, Andrew Y. Ng Computer Science Department, Stanford University {quocle,jngiam,zhenghao,danchia,pangwei,ang}@cs.stanford.edu

More information

Image Restoration and Background Separation Using Sparse Representation Framework

Image Restoration and Background Separation Using Sparse Representation Framework Image Restoration and Background Separation Using Sparse Representation Framework Liu, Shikun Abstract In this paper, we introduce patch-based PCA denoising and k-svd dictionary learning method for the

More information

The Fastest Pedestrian Detector in the West

The Fastest Pedestrian Detector in the West DOLLÁR, et al.: THE FASTEST PEDESTRIAN DETECTOR IN THE WEST The Fastest Pedestrian Detector in the West Piotr Dollár pdollar@caltech.edu Serge Belongie 2 sjb@cs.ucsd.edu Pietro Perona perona@caltech.edu

More information

Image Restoration Using DNN

Image Restoration Using DNN Image Restoration Using DNN Hila Levi & Eran Amar Images were taken from: http://people.tuebingen.mpg.de/burger/neural_denoising/ Agenda Domain Expertise vs. End-to-End optimization Image Denoising and

More information

Large-Scale Deep Learning

Large-Scale Deep Learning Large-Scale Deep Learning IPAM SUMMER SCHOOL Marc'Aurelio ranzato@google.com www.cs.toronto.edu/~ranzato UCLA, 24 July 2012 Two Approches to Deep Learning Deep Neural Nets: - usually more efficient (at

More information

An Exploration of Computer Vision Techniques for Bird Species Classification

An Exploration of Computer Vision Techniques for Bird Species Classification An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex

More information

3D Object Recognition with Deep Belief Nets

3D Object Recognition with Deep Belief Nets 3D Object Recognition with Deep Belief Nets Vinod Nair and Geoffrey E. Hinton Department of Computer Science, University of Toronto 10 King s College Road, Toronto, M5S 3G5 Canada {vnair,hinton}@cs.toronto.edu

More information

Alternatives to Direct Supervision

Alternatives to Direct Supervision CreativeAI: Deep Learning for Graphics Alternatives to Direct Supervision Niloy Mitra Iasonas Kokkinos Paul Guerrero Nils Thuerey Tobias Ritschel UCL UCL UCL TUM UCL Timetable Theory and Basics State of

More information

Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations

Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations Honglak Lee Roger Grosse Rajesh Ranganath Andrew Y. Ng Computer Science Department, Stanford University,

More information

A New Algorithm for Training Sparse Autoencoders

A New Algorithm for Training Sparse Autoencoders A New Algorithm for Training Sparse Autoencoders Ali Shahin Shamsabadi, Massoud Babaie-Zadeh, Seyyede Zohreh Seyyedsalehi, Hamid R. Rabiee, Christian Jutten Sharif University of Technology, University

More information

Developing Open Source code for Pyramidal Histogram Feature Sets

Developing Open Source code for Pyramidal Histogram Feature Sets Developing Open Source code for Pyramidal Histogram Feature Sets BTech Project Report by Subodh Misra subodhm@iitk.ac.in Y648 Guide: Prof. Amitabha Mukerjee Dept of Computer Science and Engineering IIT

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Learning global properties of scene images based on their correlational structures

Learning global properties of scene images based on their correlational structures Learning global properties of scene images based on their correlational structures Wooyoung Lee Machine Learning Department Carnegie Mellon University Pittsburgh, PA 15213 wooyoung@cs.cmu.edu Michael S.

More information

arxiv: v2 [cs.lg] 22 Mar 2014

arxiv: v2 [cs.lg] 22 Mar 2014 Alireza Makhzani makhzani@psi.utoronto.ca Brendan Frey frey@psi.utoronto.ca University of Toronto, 10 King s College Rd. Toronto, Ontario M5S 3G4, Canada arxiv:1312.5663v2 [cs.lg] 22 Mar 2014 Abstract

More information

Supplementary material: Efficient pedestrian detection by directly optimizing the partial area under the ROC curve

Supplementary material: Efficient pedestrian detection by directly optimizing the partial area under the ROC curve Supplementary material: Efficient pedestrian detection by directly optimizing the partial area under the ROC curve Sakrapee Paisitkriangkrai, Chunhua Shen, Anton van den Hengel The University of Adelaide,

More information

Computer Vision for HCI. Topics of This Lecture

Computer Vision for HCI. Topics of This Lecture Computer Vision for HCI Interest Points Topics of This Lecture Local Invariant Features Motivation Requirements, Invariances Keypoint Localization Features from Accelerated Segment Test (FAST) Harris Shi-Tomasi

More information

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Abstract: This project aims at creating a benchmark for Deep Learning (DL) algorithms

More information

Return of the Devil in the Details: Delving Deep into Convolutional Nets

Return of the Devil in the Details: Delving Deep into Convolutional Nets Return of the Devil in the Details: Delving Deep into Convolutional Nets Ken Chatfield - Karen Simonyan - Andrea Vedaldi - Andrew Zisserman University of Oxford The Devil is still in the Details 2011 2014

More information

Learning Algorithms for Medical Image Analysis. Matteo Santoro slipguru

Learning Algorithms for Medical Image Analysis. Matteo Santoro slipguru Learning Algorithms for Medical Image Analysis Matteo Santoro slipguru santoro@disi.unige.it June 8, 2010 Outline 1. learning-based strategies for quantitative image analysis 2. automatic annotation of

More information

arxiv: v1 [cs.lg] 16 Jan 2013

arxiv: v1 [cs.lg] 16 Jan 2013 Stochastic Pooling for Regularization of Deep Convolutional Neural Networks arxiv:131.3557v1 [cs.lg] 16 Jan 213 Matthew D. Zeiler Department of Computer Science Courant Institute, New York University zeiler@cs.nyu.edu

More information

Introduction to Deep Learning

Introduction to Deep Learning ENEE698A : Machine Learning Seminar Introduction to Deep Learning Raviteja Vemulapalli Image credit: [LeCun 1998] Resources Unsupervised feature learning and deep learning (UFLDL) tutorial (http://ufldl.stanford.edu/wiki/index.php/ufldl_tutorial)

More information

Deconvolution Networks

Deconvolution Networks Deconvolution Networks Johan Brynolfsson Mathematical Statistics Centre for Mathematical Sciences Lund University December 6th 2016 1 / 27 Deconvolution Neural Networks 2 / 27 Image Deconvolution True

More information

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Part II: Visual Features and Representations Liangliang Cao, IBM Watson Research Center Evolvement of Visual Features

More information

Face Recognition A Deep Learning Approach

Face Recognition A Deep Learning Approach Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar 2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison

More information