Bilinear Models for Fine-Grained Visual Recognition
1 Bilinear Models for Fine-Grained Visual Recognition Subhransu Maji College of Information and Computer Sciences University of Massachusetts, Amherst
2 Fine-grained visual recognition. Example: distinguish between closely related categories (California gull vs. Ringed beak gull). Inter-category variation vs. intra-category variation: location, pose, viewpoint, background, lighting, gender, season, etc.
3 Part-based models. Localize parts and compare corresponding locations. Factor out the variation due to pose, viewpoint and location.
4 General image classification. Classical approaches: image as a collection of patches; bag-of-visual-words models [Leung & Malik 99, Csurka et al. 04]; orderless pooling and no explicit modeling of pose or viewpoint; Fisher vectors are effective for fine-grained classification [california, ringed beak, heermann, ...]. Modern approaches: Convolutional Neural Networks (CNNs).
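As a rough illustration of the orderless pooling mentioned on the slide above, the following sketch builds a bag-of-visual-words histogram with NumPy. The codebook, the descriptor dimensions, and the function name `bovw_histogram` are hypothetical choices made for this example; they are not taken from the talk.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Orderless pooling: hard-assign each local descriptor to its nearest
    visual word and count the assignments; patch positions are discarded."""
    # Squared Euclidean distances, shape (num_descriptors, num_words).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()  # normalize so the histogram sums to 1

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))   # 8 hypothetical visual words
patches = rng.normal(size=(50, 16))   # 50 local descriptors from one image
h = bovw_histogram(patches, codebook)
# Shuffling the patches leaves the representation unchanged (orderless).
h_shuffled = bovw_histogram(patches[rng.permutation(50)], codebook)
```

Because the histogram only counts assignments, it is identical for any ordering of the patches, which is the sense in which pose and location are not modeled explicitly.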
5 Texture representations vs CNNs. Texture pipeline: image → non-linear filters → feature field → encoder → representation ɸ(x), with handcrafted features and orderless pooling. CNN pipeline: x → convolutional layers (c1 to c5) → fully-connected (FC) layers (f6 to f8) → ɸ(x).
6 Mix and match. Pipeline: image → non-linear filters → feature field → encoder → representation ɸ(x). Local descriptors: handcrafted or CNN. Pooling: orderless or CNN FC.
7 Mix and match: standard texture representation. Handcrafted local descriptors with orderless pooling [Sivic and Zisserman 03, Csurka et al. 04, Perronnin and Dance 07, Perronnin et al. 10, Jegou et al. 10].
8 Mix and match: standard application of CNN. CNN local descriptors with CNN FC pooling (FC-CNN) [Chatfield et al. 14, Girshick et al. 14, Gong et al. 14, Razavian et al. 14].
9 Mix and match: orderless pooling of CNN local descriptors.
10 Mix and match: CNN descriptors pooled by Fisher vector (FV-CNN) [Cimpoi et al., CVPR 15].
11 CNNs for texture recognition. Texture recognition accuracy table (columns: Dataset, FV-SIFT, FC-CNN, FV-CNN; rows: KTH-T2b, FMD, DTD; values not preserved) [Cimpoi et al., CVPR 15]. Significant improvements over simply using CNN features. KTH-T2b dataset (11 material categories).
12 CNNs for texture recognition. Texture recognition accuracy table (columns: Dataset, FV-SIFT, FC-CNN, FV-CNN, FV-CNN (VD); rows: KTH-T2b, FMD, DTD; values not preserved) [Cimpoi et al., CVPR 15]. FV-CNN (VD) uses the very deep model from the Oxford VGG group that performed among the best on ILSVRC 2014 (ImageNet classification challenge). Facebook
13 Scenes as textures. MIT Indoor dataset (67 classes): prev. best 70.8% [Zhou et al., NIPS 14]; FV-CNN 81.0% [Cimpoi et al., CVPR 15]. Domain-specific CNNs with texture models (ImageNet, no data augmentation; AlexNet): the advantage of domain-specific training disappears with FV-CNN. Better CNNs lead to better performance, and FV is better than FC.
14 Drawbacks of FV-CNN. The features in the FV-CNN are not learned: they are hand-crafted (e.g., SIFT), or come from a CNN trained with a different architecture (e.g., fully connected layers). The GMM parameters are learned in an unsupervised manner. Can we learn the features for FV models? The gradient of the FV with respect to x is nasty: $\eta_i(x) = \frac{\exp\left(-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right)}{\sum_k \exp\left(-\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right)}$. Partial attempt: Sydorov et al. 10 learn GMM params discriminatively.
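To make the soft-assignment weight η(x) from the slide above concrete, here is a minimal NumPy sketch. It assumes the formula is a softmax over negative half Mahalanobis distances to each GMM component, with no mixing weights or determinant terms; the function name `soft_assignments` and the toy two-component GMM are hypothetical.

```python
import numpy as np

def soft_assignments(x, means, covs):
    """eta_i(x) for one descriptor x: a softmax over
    -0.5 * (x - mu_i)^T Sigma_i^{-1} (x - mu_i).
    Assumption: no mixing weights or determinant terms."""
    logits = np.empty(len(means))
    for i, (mu, cov) in enumerate(zip(means, covs)):
        diff = x - mu
        # Solve cov @ z = diff instead of inverting cov explicitly.
        logits[i] = -0.5 * diff @ np.linalg.solve(cov, diff)
    logits -= logits.max()  # stabilize the exponentials
    weights = np.exp(logits)
    return weights / weights.sum()

means = np.array([[0.0, 0.0], [3.0, 3.0]])  # toy component means
covs = np.array([np.eye(2), np.eye(2)])     # toy component covariances
eta = soft_assignments(np.array([0.1, -0.2]), means, covs)
eta_mid = soft_assignments(np.array([1.5, 1.5]), means, covs)
```

Every entry of η depends on x through all components at once, which is one reason differentiating the full Fisher vector with respect to x gets unwieldy.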
15 Bilinear models for classification. A bilinear model for classification is a four-tuple $B = (f_A, f_B, P, C)$: feature extractors $f_A$ and $f_B$, pooling $P$, and classification $C$. A feature extractor is a map $f : L \times I \to R^{c \times D}$ from an image and a location to local features, e.g., SIFT is $R^{1 \times 128}$. At each location: $bilinear(l, I) = f_A(l, I)^T f_B(l, I)$.
16 Bilinear models for classification. Pipeline: image → local features $f_A(l, I)$, $f_B(l, I)$ → $bilinear(l, I) = f_A(l, I)^T f_B(l, I)$ → pooling → descriptor $\phi(I) = \sum_{l \in L} bilinear(l, I)$ → C → class.
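The pooled bilinear descriptor defined on the two slides above can be sketched in a few lines of NumPy. The sketch assumes c = 1 (one row vector per location), sum pooling over locations, and random arrays standing in for the outputs of f_A and f_B; the function name `bilinear_pool` is hypothetical.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Sum-pool the outer products f_A(l)^T f_B(l) over all locations l.

    feat_a: (num_locations, M), feat_b: (num_locations, N).
    Returns the descriptor flattened to length M * N."""
    # One matrix product equals the sum over l of outer(feat_a[l], feat_b[l]).
    phi = feat_a.T @ feat_b
    return phi.reshape(-1)

rng = np.random.default_rng(0)
fa = rng.normal(size=(6, 3))  # stand-in for f_A at 6 locations
fb = rng.normal(size=(6, 4))  # stand-in for f_B at the same locations
phi = bilinear_pool(fa, fb)

# Reference: explicit sum of per-location outer products.
phi_loop = sum(np.outer(fa[l], fb[l]) for l in range(6)).reshape(-1)
# Shuffling locations jointly in both streams leaves phi unchanged.
perm = rng.permutation(6)
phi_perm = bilinear_pool(fa[perm], fb[perm])
```

The descriptor has M * N entries regardless of how many locations the image has, and it does not depend on the order of the locations.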
17 Fisher vector is a bilinear model [california, ringed beak, heermann, ...]. Fisher vector (FV) models [Perronnin et al., 10] locally encode statistics of feature x weighted by η(x). FV is a bilinear model with [equation lost].
18 Bilinear CNN model. Decouple $f_A$ and $f_B$ by using separate CNNs. Pipeline: image → CNN stream A and CNN stream B (convolutional + pooling layers) → pooled bilinear vector → softmax → class (e.g., chestnut sided warbler).
19 Bilinear CNN model. Back-propagation through the bilinear layer is easy, which allows end-to-end training.
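One way to see why the backward pass is simple: if the pooled output is x = A^T B, with A and B holding the two streams' features (one row per location), then the chain rule gives dℓ/dA = B (dℓ/dx)^T and dℓ/dB = A (dℓ/dx). The sketch below checks these expressions against finite differences. The toy linear loss, the array shapes, and all function names are assumptions made for illustration only.

```python
import numpy as np

def bilinear_forward(A, B):
    # A: (num_locations, M), B: (num_locations, N) -> x: (M, N)
    return A.T @ B

def bilinear_backward(A, B, grad_x):
    # Chain rule through x = A^T B, given grad_x = dloss/dx of shape (M, N).
    return B @ grad_x.T, A @ grad_x

def numerical_grad(f, X, eps=1e-6):
    """Central finite differences of the scalar f() with respect to X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        old = X[idx]
        X[idx] = old + eps
        f_plus = f()
        X[idx] = old - eps
        f_minus = f()
        X[idx] = old  # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(5, 4))
W = rng.normal(size=(3, 4))  # toy loss: sum(W * x), so dloss/dx = W

def loss():
    return float((W * bilinear_forward(A, B)).sum())

grad_A, grad_B = bilinear_backward(A, B, W)
num_A = numerical_grad(loss, A)
num_B = numerical_grad(loss, B)
```

Both gradients are single matrix products, so they can be passed straight back into the two CNN streams during training.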
20 Experiments: Methods. Local features: SIFT descriptor [Lowe ICCV99]; VGG-M (5 conv + 2 fc layers) [Chatfield et al., BMVC14]; VGG-VD (16 conv + 2 fc layers) [Simonyan and Zisserman, ICLR15]. Pooling architectures: fully connected pooling (FC); Fisher vector pooling (FV); bilinear pooling (B). Notation examples: FC-CNN (M) = fully connected pooling with VGG-M; FV-CNN (D) = Fisher vector pooling with VGG-VD [Cimpoi et al., 15]; B-CNN (D, M) = bilinear pooling with VGG-VD and VGG-M.
21 Experiments: Datasets. CUB (bird species): 11,788 images; FGVC Aircraft: 100 variants, 10,000 images; Stanford cars: 196 models, 16,185 images. Challenges noted on the example images: small objects, clutter. All models are trained with image labels only; no part or object annotations are used at training or test time.
22 Results: Birds classification. Accuracy on CUB dataset. Setting: provided with only the image at test time. Table (Method, w/o ft, w/ ft): FV-SIFT.
23 Results: Birds classification (same setting). Table: FV-SIFT 18.8; FC-CNN (M).
24 Results: Birds classification (same setting). Table: FV-SIFT 18.8; FC-CNN (M) 52.7; FV-CNN (M).
25 Results: Birds classification (same setting). Table: FV-SIFT 18.8; FC-CNN (M) 52.7; FV-CNN (M) 61.1; B-CNN (M,M).
26 Results: Birds classification (same setting). Table: FV-SIFT; FC-CNN (M); FV-CNN (M) 61.1; B-CNN (M,M). Note: fine-tuning helps.
27 Results: Birds classification (same setting). Table: FV-SIFT; FC-CNN (M); FV-CNN (M); B-CNN (M,M) 72.0. Notes: direct fine-tuning is hard, so use fine-tuned FC-CNN models; indirect fine-tuning helps; outperforms multi-scale FV-CNN [Cimpoi et al., CVPR 15].
28 Results: Birds classification (same setting). Table: FV-SIFT; FC-CNN (M); FV-CNN (M); B-CNN (M,M). Notes: direct fine-tuning is hard, so use fine-tuned FC-CNN models; indirect fine-tuning helps; outperforms multi-scale FV-CNN [Cimpoi et al., CVPR 15].
29 Results: Birds classification (same setting). Table: FV-SIFT; FC-CNN (M); FC-CNN (D); FV-CNN (M); B-CNN (M,M).
30 Results: Birds classification (same setting). Table: FV-SIFT; FC-CNN (M); FC-CNN (D); FV-CNN (M); FV-CNN (D); B-CNN (M,M).
31 Results: Birds classification (same setting). Table: FV-SIFT; FC-CNN (M); FC-CNN (D); FV-CNN (M); FV-CNN (D); B-CNN (M,M); B-CNN (D,M); B-CNN (D,D).
32 Results: Birds classification (same setting). Table: FV-SIFT; FC-CNN (M); FC-CNN (D); FV-CNN (M); FV-CNN (D); B-CNN (M,M); B-CNN (D,M); B-CNN (D,D). SoTA: 84.1 [1], 82.0 [2], 73.9 [3], 75.7 [4]. [1] Spatial Transformer Networks, Jaderberg et al., NIPS 15. [2] Fine-Grained Rec. w/o Part Annotations, Krause et al., CVPR 15 (+ object bounding-boxes). [3] Part-based R-CNNs, Zhang et al., ECCV 14 (+ part bounding-boxes). [4] Pose normalized CNNs, Branson et al., BMVC 14 (+ landmarks).
33 Results: Comparison with SoTA. Table (columns: FC-CNN, FV-CNN, B-CNN, SoTA; rows: CUB, FGVC-Aircraft, Stanford-Cars; values not preserved).
34 Model visualization. Visualizing top activations of the fine-tuned B-CNN (D,M) on CUB, for the D-Net and the M-Net.
35 Most confused categories: CUB-200, Aircrafts, Stanford cars.
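36 Visualizing invariances: invariant images, i.e., images whose representations ɸ(I) match.
37 Visualizing categories. $\hat{I} = \arg\max_I \; score(c, I) + \Gamma(I)$, where $score(c, I)$ is the log-likelihood term and $\Gamma(I)$ is the image prior.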
38 Conclusion. Bilinear models: achieve high accuracy on fine-grained tasks with limited data; allow end-to-end training of bag-of-visual-words models; are fast at test time (B-CNN [D, D]: 10 images/second on a Tesla K40). Code and models: References: Fisher vector CNNs [Cimpoi et al., CVPR 15]; Bilinear CNNs [Lin et al., ICCV 2015]; Visualizing and Understanding Deep Texture Representations [Lin and Maji, arxiv: , Nov 2015]. Tsung-Yu Lin, Aruni RoyChowdhury.
Supplementary material for Analyzing Filters Toward Efficient Net Takumi Kobayashi National Institute of Advanced Industrial Science and Technology, Japan takumi.kobayashi@aist.go.jp A. Orthonormal Steerable
More informationImproving Face Recognition by Exploring Local Features with Visual Attention
Improving Face Recognition by Exploring Local Features with Visual Attention Yichun Shi and Anil K. Jain Michigan State University Difficulties of Face Recognition Large variations in unconstrained face
More informationYOLO9000: Better, Faster, Stronger
YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object
More informationDeconvolutions in Convolutional Neural Networks
Overview Deconvolutions in Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Deconvolutions in CNNs Applications Network visualization
More informationAdapting Models to Signal Degradation using Distillation
SU, MAJI: ADAPTING MODELS TO SIGNAL DEGRADATION USING DISTILLATION 1 Adapting Models to Signal Degradation using Distillation Jong-Chyi Su jcsu@cs.umass.edu Subhransu Maji smaji@cs.umass.edu University
More informationECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016
ECCV 2016 Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016 Fundamental Question What is a good vector representation of an object? Something that can be easily predicted from 2D
More informationarxiv: v2 [cs.cv] 30 Jul 2016
arxiv:1512.04065v2 [cs.cv] 30 Jul 2016 Cross-dimensional Weighting for Aggregated Deep Convolutional Features Yannis Kalantidis, Clayton Mellina and Simon Osindero Computer Vision and Machine Learning
More informationLearning Transferable Features with Deep Adaptation Networks
Learning Transferable Features with Deep Adaptation Networks Mingsheng Long, Yue Cao, Jianmin Wang, Michael I. Jordan Presented by Changyou Chen October 30, 2015 1 Changyou Chen Learning Transferable Features
More informationarxiv: v1 [cs.cv] 6 Jul 2016
arxiv:607.079v [cs.cv] 6 Jul 206 Deep CORAL: Correlation Alignment for Deep Domain Adaptation Baochen Sun and Kate Saenko University of Massachusetts Lowell, Boston University Abstract. Deep neural networks
More informationFuzzy Set Theory in Computer Vision: Example 3
Fuzzy Set Theory in Computer Vision: Example 3 Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Purpose of these slides are to make you aware of a few of the different CNN architectures
More informationPixels to Voxels: Modeling Visual Representation in the Human Brain
Pixels to Voxels: Modeling Visual Representation in the Human Brain Authors: Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant Presenters: JunYoung Gwak, Kuan Fang Outlines Background Motivation
More informationPIXELS TO VOXELS: MODELING VISUAL REPRESENTATION IN THE HUMAN BRAIN
PIXELS TO VOXELS: MODELING VISUAL REPRESENTATION IN THE HUMAN BRAIN By Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant University of California Berkeley Presented by Tim Patzelt AGENDA
More informationarxiv: v1 [cs.cv] 26 Dec 2015
Part-Stacked CNN for Fine-Grained Visual Categorization arxiv:1512.08086v1 [cs.cv] 26 Dec 2015 Abstract Shaoli Huang* University of Technology, Sydney Sydney, Ultimo, NSW 2007, Australia shaoli.huang@student.uts.edu.au
More information