PASCAL VOC Classification: Local Features vs. Deep Features. Shuicheng YAN, NUS
1 PASCAL VOC Classification: Local Features vs. Deep Features Shuicheng YAN, NUS
2 PASCAL VOC. Why valuable? Multi-label, real scenarios! Visual object recognition: object classification, object detection, object segmentation (person, horse, barrier, table, etc.). PASCAL VOC: Visual Object Classes challenges. Main tasks: object classification, detection and segmentation. Other tasks: person layout, action recognition, etc. Held yearly; tens of teams from universities and industry participated, including INRIA, Berkeley, Oxford, NEC, etc. It has become the standard dataset for visual object recognition research. Data: 20 object classes, ~23,000 images with fine labeling.
3 PASCAL VOC: NUS-(PSL) team results. 2014, Classification MAP to , 2011, 2010, Winner of object classification task (cls). 2012, Winner of object segmentation task (seg). 2010, Honorable mention of object detection task (det). NUS-(PSL) architecture for visual object recognition: a joint learning of cls-det-seg. Cls: global information. Det: local information. Seg: fine-detailed information.
4 PASCAL VOC classification MAP over the years (figure). HCP 2014: 91.4%; Deep feature 2014: 83.2%; Deep feature 2013: 79.0%; Sub-category 2012: 82.2%; GHM 2011: 78.7%; Context-SVM 2010: 73.8%; LLC. Marked gain: 25%.
5 I. Spring of Local Features:
6 Pipeline. Feature representation: low-level features, feature encoding, feature pooling. GHM [2]: Generalized Hierarchical Matching for object-centric problems; object-centric pooling. Model learning: classifier learning. Subcategory mining [1]: automatically mining the visual subcategories based on ambiguity modeling. Context modeling: contextualization [3], i.e. mutual contextualization for object classification and detection tasks. Great performance improvement.
[1] Jian Dong, Qiang Chen, Jiashi Feng, Wei Xia, Zhongyang Huang, Shuicheng Yan. Subcategory-aware Object Classification. In CVPR.
[2] Qiang Chen, Zheng Song, Yang Hua, Zhongyang Huang, Shuicheng Yan. Hierarchical Matching with Side Information for Image Classification. In CVPR.
[3] Zheng Song*, Qiang Chen*, Zhongyang Huang, Yang Hua, and Shuicheng Yan. Contextualizing Object Detection and Classification. In CVPR'11.
7 Framework NUS-PSL 2010 (diagram). Components: visual features; local feature extraction; feature coding; feature pooling (SPM, max pooling); kernel (nonlinear kernel, linear kernel); classification (SVM, kernel regression); detection results; post-processing (confidence refinement with exclusive prior). Example class: chair.
8 Framework NUS-PSL 2012 (diagram). The 2010 components plus three marked additions: I. contextualized object classification and detection; II. hierarchical matching (generalized SPM, GHM); III. subcategory mining. Further labels in the diagram: flipping; feature coding with FK; nonlinear + linear kernel; subcategory detection results. Example class: chair.
9 Outline for VOC. Context model: Contextualized Object Classification and Detection. Feature pooling: Generalized Hierarchical Matching/Pooling. Subcategory learning: Sub-Category Aware Detection & Classification.
10 Contextualized Object Classification and Detection. Det: local patches with matched local shape/texture. Cls: global probabilities that the image contains the objects (occurrence probability). Can Det and Cls exchange information?
11 Observations. Object classification and detection are mutually complementary. Each subject task serves as the context task for the other. Context is not robust for the subject task, so use it only when necessary. Scene/global-level information is not stable for object detection. False alarms of object detection harm object classification.
12 Contextualized SVM - Formulation. Adaptive contextualization: sample-specific classification through adaptive embedding of context features into the original classification hyperplane (n: feature dim, m: context dim). Configurable model complexity: a low-rank constraint reduces the dimension from n x m to R x (n + m). Terms labeled in the formula: context model (dim m); selection to ambiguous samples (dim n). Easy to solve and kernelize if the selection to ambiguous samples is fixed.
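The formula itself did not survive in the transcript, so the following is only a sketch under a stated assumption: the score is taken to be (w0 + P c) . x + b, where x is the subject-task feature (dim n), c the context feature (dim m), w0 the original hyperplane, and P an n x m matrix constrained to rank R as a sum of outer products u_r q_r^T. The names `context_svm_score`, `U` and `Q` are hypothetical. The second function simply reproduces the parameter-count comparison n x m versus R x (n + m) from the slide.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def context_svm_score(x, c, w0, b, U, Q):
    """Assumed form: (w0 + P c) . x + b with P = sum_r u_r q_r^T.

    U holds R vectors of dim n (selection side), Q holds R vectors of
    dim m (context side). Because P is low rank, (P c) . x reduces to
    sum_r (u_r . x) * (q_r . c), so P is never built explicitly.
    """
    adaptation = sum(dot(u, x) * dot(q, c) for u, q in zip(U, Q))
    return dot(w0, x) + adaptation + b


def parameter_counts(n, m, R):
    """Number of context parameters with and without the low-rank constraint."""
    return {"full": n * m, "low_rank": R * (n + m)}


if __name__ == "__main__":
    x = [1.0, 2.0]            # subject-task feature, n = 2
    c = [1.0, 0.0, 1.0]       # context feature, m = 3
    w0, b = [0.5, -0.5], 0.1  # original hyperplane
    U, Q = [[1.0, 0.0]], [[1.0, 1.0, 1.0]]  # rank R = 1
    print(context_svm_score(x, c, w0, b, U, Q))
    print(parameter_counts(n=1000, m=20, R=5))
```

With n = 1000, m = 20 and R = 5 (illustrative sizes only), the full matrix has 20,000 entries while the low-rank version has 5,100.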
13 Contextualized SVM - Formulation. Ambiguity modeling: define the ambiguity degree of a sample as the hinge loss of the subject task. Learn the Ambiguity-guided Mixture Model (AMM) through EM to maximize its objective. The multi-mode ambiguity term is defined as the posterior of each mixture r.
14 Iterative Co-training of Detection and Classification (diagram). Learn to detect: the detection pipeline combines the detection feature with context from the initial classification, and then with context from the 1st classification, through Context SVM. Learn to classify: the classification pipeline combines the classification feature with context from the initial detection, and then with context from the 1st detection, through Context SVM. Stages: a) initial model; b) 1st iteration of ContextSVM; c) 2nd iteration of ContextSVM.
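As a small illustration of the ambiguity degree, the sketch below uses the standard hinge loss max(0, 1 - y f(x)) with labels y in {-1, +1}. The slide does not show the exact expression, so this form and the name `ambiguity_degree` are assumptions.

```python
def ambiguity_degree(label, score):
    """Hinge loss of the subject task for one sample.

    label is +1 or -1, score is the classifier output f(x).
    Confidently correct samples get 0; samples near or on the wrong
    side of the margin get a positive ambiguity degree.
    """
    return max(0.0, 1.0 - label * score)


if __name__ == "__main__":
    samples = [(+1, 2.3), (+1, 0.2), (-1, 0.4), (-1, -1.7)]
    for label, score in samples:
        print(label, score, ambiguity_degree(label, score))
```

Only the second and third samples receive a non-zero ambiguity degree, which is the sense in which context is applied "only when necessary".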
15 Results. Iterative contextualization: mean AP values of 20 classes on VOC 2010 train/val.
16 Results. Comparison with the state of the art on VOC 2010.
17 Exemplar results. Representative examples of the baseline (without contextualization) and Context-SVM for the classification task.
18 Outline for VOC. Context model: Contextualized Object Classification and Detection. Feature pooling: Generalized Hierarchical Matching/Pooling. Subcategory learning: Sub-Category Aware Detection & Classification.
19 Generalized Hierarchical Matching/Pooling. Traditional pooling: SPM = approximate geometric constraint. It is not optimal for object recognition due to misalignment. Figure panels: (a) images; (b) SPM partitions; (c) object confidence map partition.
20 Hierarchical Pooling for Image Classification. Design a general form of hierarchical matching with side information. Represent the image with a hierarchical structure.
21 Hierarchical Matching Kernel. The image similarity kernel is defined as the weighted sum over each cluster kernel. It is a general form of SPM, PMK, etc., and is flexible enough to integrate other side information.
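The weighted sum over cluster kernels can be sketched as follows. This is a minimal illustration, assuming each image is already represented by one pooled feature vector per cluster and that a linear kernel is used within each cluster; the weights and the function names are hypothetical.

```python
def linear_kernel(a, b):
    """Inner product of two pooled feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def hierarchical_matching_kernel(clusters_a, clusters_b, weights,
                                 base_kernel=linear_kernel):
    """Image similarity as a weighted sum of per-cluster kernels.

    clusters_a, clusters_b: one pooled vector per cluster, same cluster
    order for both images. weights: one non-negative weight per cluster.
    """
    if not (len(clusters_a) == len(clusters_b) == len(weights)):
        raise ValueError("one weight and one vector per cluster expected")
    return sum(w * base_kernel(a, b)
               for w, a, b in zip(weights, clusters_a, clusters_b))


if __name__ == "__main__":
    image_a = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    image_b = [[1.0, 1.0], [0.5, 0.0], [0.0, 2.0]]
    weights = [0.25, 0.25, 0.5]
    print(hierarchical_matching_kernel(image_a, image_b, weights))  # 1.3125
```

If the clusters are spatial grid cells, the same function gives an SPM-style kernel, which is one way to read the "general form of SPM" remark.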
22 Generalized Hierarchical Matching/Pooling. Encoded local features vs. side information. Figure panels: (a) side information and image; (b) hierarchically cluster by side information, level 1 (top), 2 (mid), 3 (bottom); (c) hierarchical structure representation; (d) matching/pooling within each cluster. Utilize side information to hierarchically pool local features.
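A rough sketch of pooling guided by side information follows. The talk does not specify how clusters are formed, so this example assumes one scalar side value per local feature (for instance a detection confidence at the feature's location) and forms clusters by thresholding it, with more thresholds at deeper levels. It also assumes non-negative codes so that zero is a valid starting value for max pooling. All names are hypothetical.

```python
def max_pool(vectors, dim):
    """Element-wise max over vectors; all zeros if the cluster is empty."""
    pooled = [0.0] * dim
    for vector in vectors:
        pooled = [max(p, v) for p, v in zip(pooled, vector)]
    return pooled


def hierarchical_pool(codes, side_values, levels):
    """Pool encoded local features within side-information clusters.

    codes: encoded local features (non-negative), one vector each.
    side_values: one scalar per feature.
    levels: list of ascending threshold lists; an empty list means a
        single cluster covering the whole image.
    Returns the pooled vectors of all clusters, level by level.
    """
    dim = len(codes[0])
    pooled = []
    for thresholds in levels:
        clusters = [[] for _ in range(len(thresholds) + 1)]
        for code, value in zip(codes, side_values):
            # Cluster index = number of thresholds the value reaches.
            clusters[sum(value >= t for t in thresholds)].append(code)
        pooled.extend(max_pool(cluster, dim) for cluster in clusters)
    return pooled


if __name__ == "__main__":
    codes = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.3]]
    side_values = [0.1, 0.9, 0.6]
    print(hierarchical_pool(codes, side_values, levels=[[], [0.5]]))
    # [[0.9, 0.8], [0.9, 0.1], [0.4, 0.8]]
```

The first vector pools the whole image; the next two pool the low-confidence and high-confidence regions separately.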
23 Side information design. Side information: detection confidence map. Steps shown in the figure: images; sliding-window process; each sub-window is scored by a shape model and an appearance model; the scores vote back to the image; the object confidence maps are fused.
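The "score votes back to image" step might look like the sketch below. The slide does not say how votes are combined, so averaging the scores of all windows covering a pixel is an assumption here, and the window scores are made-up numbers rather than outputs of the shape or appearance models.

```python
def confidence_map(height, width, windows):
    """Build a per-pixel confidence map from scored sub-windows.

    windows: list of (top, left, bottom, right, score) with bottom and
    right exclusive. Every window adds its score to each pixel it
    covers; each pixel then takes the mean of the votes it received
    (0.0 where no window voted).
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, bottom, right, score in windows:
        for r in range(max(top, 0), min(bottom, height)):
            for c in range(max(left, 0), min(right, width)):
                total[r][c] += score
                count[r][c] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(width)]
            for r in range(height)]


if __name__ == "__main__":
    windows = [(0, 0, 2, 2, 1.0), (1, 1, 3, 3, 0.5)]
    for row in confidence_map(3, 3, windows):
        print(row)
```

Maps produced by different models could then be fused, for example by averaging them pixel by pixel, before being used as side information for pooling.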
24 Results VOC
25 Outline for VOC. Context model: Contextualized Object Classification and Detection. Feature pooling: Generalized Hierarchical Matching/Pooling. Subcategory learning: Sub-Category Aware Detection & Classification.
26 Sub-Category Mining. Example classes: chair, sofa, diningtable. Ambiguity-guided subcategory mining feeds subcategory-aware object classification: subcategory model 1, subcategory model 2, ..., subcategory model N, combined by a fusion model.
27 Sub-Category Mining. Figure labels: ambiguous categories (sofa, chair); instance affinity graph (similarity, ambiguity); graph shift; detected subgraphs; corresponding subcategories; visualization. Subcategory mining is based on both similarity and ambiguity: calculate the sample intra-class similarity; calculate the sample inter-class ambiguity; detect dense subgraphs by the graph shift algorithm [1]; map subgraphs to subcategories.
[1] Hairong Liu, Shuicheng Yan. Robust Graph Mode Seeking by Graph Shift. ICML 2010.
28 Sub-Category Aware Detection & Classification (diagram). A testing image goes through feature extraction. Detection branch: sliding/selective window search with a detection model. Classification branch: local feature extraction and coding, GHM pooling, image representation, classification model. Each subcategory model 1..N produces a subcategory classification result and a subcategory detection result; a fusion model combines them into the category-level result.
29 Sub-Category Mining Result (figure). Subcategories of bus and chair; outliers.
30 Summary of VOC results (table; the values are not preserved). Column groups start with 2010, each group pairing Our Best with Other's Best. Rows: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor, MAP.
31 II. Spring of Deep Feature:
32 CNN: Single-label Image Classification. Definition: assign one and only one label from a pre-defined set to an image. Explicit assumption: the object is roughly aligned. AlexNet [1] made a great breakthrough in single-label classification in ILSVRC 2012 (with a 10% gain over the previous methods).
[1] A. Krizhevsky, I. Sutskever, G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
33 CNN: Multi-label Image Classification. Definition: assign multiple labels from a pre-defined set to an image. Figure: single-label images vs. multi-label images. Challenges: foreground objects are not roughly aligned; different objects interact, e.g. through partial visibility and occlusion; a large number of training images is required; the label space is expanded from n to 2^n. Direct CNN training is unreasonable and unreliable!
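To make the growth from n to 2^n concrete, here is a short calculation for the 20 VOC classes, plus an explicit enumeration for a toy case of 3 classes. The helper name is hypothetical.

```python
from itertools import product


def count_label_sets(n_classes):
    """Number of possible label subsets when each class is present or absent."""
    return 2 ** n_classes


if __name__ == "__main__":
    print(count_label_sets(20))  # 1048576 label sets for 20 classes

    # Explicit enumeration for 3 classes: each tuple is a presence mask.
    for mask in product([0, 1], repeat=3):
        print(mask)
```

A single-label classifier over 20 classes chooses among 20 outputs, while treating every label combination as its own class would need over a million outputs, most of them with no training images.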
34
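35 Hypotheses-CNN-Pooling (HCP): our framework (diagram). Hypotheses (assumption: each hypothesis is single-labeled) are fed into a shared convolutional neural network, which outputs scores for each individual hypothesis (output dimension c; example labels: dog, person, sheep). Max pooling over the hypotheses produces the image-level prediction. The diagram also labels max-pooling layers (96, 256) inside the CNN.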
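The pooling step across hypotheses can be sketched in a few lines. The scores below are invented stand-ins for the shared CNN's outputs, and the 0.5 decision threshold is a hypothetical choice for illustration.

```python
def cross_hypothesis_max_pool(hypothesis_scores):
    """Fuse per-hypothesis score vectors into one image-level vector.

    hypothesis_scores: one length-c score vector per hypothesis.
    Returns a length-c vector holding, for each class, the maximum
    score that any hypothesis gave to that class.
    """
    return [max(column) for column in zip(*hypothesis_scores)]


if __name__ == "__main__":
    classes = ["dog", "person", "sheep"]
    scores = [
        [0.90, 0.05, 0.05],  # hypothesis covering the dog
        [0.10, 0.80, 0.10],  # hypothesis covering the person
        [0.20, 0.10, 0.70],  # hypothesis covering the sheep
        [0.30, 0.30, 0.40],  # noisy background hypothesis
    ]
    image_scores = cross_hypothesis_max_pool(scores)
    print(image_scores)  # [0.9, 0.8, 0.7]
    print([name for name, s in zip(classes, image_scores) if s > 0.5])
```

The noisy fourth hypothesis never wins the max for any class, which gives some intuition for why such a scheme can tolerate redundant or poor hypotheses.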
36 Characteristics of Our Framework. No ground-truth bounding box information is required for training on the multi-label image dataset. The proposed HCP infrastructure is robust to noisy and/or redundant hypotheses. No explicit hypothesis label is required for training. The shared CNN can be well pre-trained with a large-scale single-label image dataset. The HCP outputs are naturally multi-label prediction results.
37 Training of HCP. Hypotheses extraction. Initialization of HCP: pre-training on a large-scale single-label image set, e.g. ImageNet; image-fine-tuning on a multi-label image set. Hypotheses-fine-tuning.
38 Hypotheses Extraction. Criteria: high object detection recall rate; small number of hypotheses; high computational efficiency. Solution: BING [2] + boxes clustering.
[2] M.-M. Cheng, J. Warrell, W.-Y. Lin, and P.H.S. Torr. BING: Binarized normed gradients for objectness estimation at 300fps. CVPR 2014.
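The slide does not say which clustering is applied to the boxes, so the sketch below uses a simple greedy grouping by intersection-over-union (IoU) purely as a stand-in, keeping the best-scored box of each group to obtain a small, diverse hypothesis set. It does not implement BING; the objectness scores, the threshold and the function names are all hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_hypotheses(boxes, scores, iou_threshold=0.5, per_cluster=1):
    """Greedy stand-in for box clustering.

    Boxes are visited by descending score. A box joins the first
    cluster whose seed (highest-scored member) overlaps it by at least
    iou_threshold; otherwise it seeds a new cluster. The top
    per_cluster boxes of every cluster are returned.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters = []
    for i in order:
        for cluster in clusters:
            if iou(boxes[i], boxes[cluster[0]]) >= iou_threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return [boxes[i] for cluster in clusters for i in cluster[:per_cluster]]


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 9, 9)]
    scores = [0.9, 0.8, 0.7, 0.6]
    print(select_hypotheses(boxes, scores))
    # [(0, 0, 10, 10), (50, 50, 60, 60)]
```

Grouping before selection is one way to trade off the three criteria: near-duplicate proposals collapse into one hypothesis, so recall is kept while the number of CNN forward passes stays small.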
39 Hypotheses Extraction
40 Initialization of HCP. Step 1: pre-training on single-label images (e.g. ImageNet). Parameters transferring. Step 2: image-fine-tuning on multi-label images (e.g. Pascal VOC).
41 Hypotheses-fine-tuning
42 Experimental Results. A subset of the ILSVRC 2013 detection dataset is used for BING training.
43 Experimental Results Performance on PASCAL VOC 2007 New
44 Experimental Results Performance on PASCAL VOC 2012 New-1 New-2
45 Experimental Results Complementary Analysis: Hand-crafted features vs. Deep features
46 Experimental Results. One test sample from VOC; hypotheses for each image, 1~1.5s. Steps: generate hypotheses; feed into the shared CNN (per-hypothesis labels: person, horse, car, person); cross-hypothesis max-pooling (image labels: person, horse, car).
47 New Result: Network in Network (NIN). NIN: a CNN with non-linear filters, yet without the final fully-connected NN layer. Compared with a CNN: intuitively less overfitting globally and more discriminative locally, with fewer parameters [4] (not finally used in our submission due to the surgery of our main team member, but very effective).
[4] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron C. Courville, Yoshua Bengio. Maxout Networks. ICML (3) 2013.
48 Better Local Abstraction. Each local patch is projected to its feature vector using a small network. Motivation: better local abstraction! Cascaded Cross Channel Parametric Pooling (CCCP). Lin, Min, Qiang Chen, and Shuicheng Yan. "Network In Network." ICLR 2014.
49 CCCP: cascaded 1x1 convolution in implementation.
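A plain-Python sketch of what "cascaded 1x1 convolution" means: at every spatial position the same small linear map mixes the channels, followed here by a ReLU, and several such layers are stacked. The use of ReLU, the layer sizes and the weights are illustrative assumptions, not values from the talk.

```python
def relu(vector):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in vector]


def conv1x1(feature_map, weights, bias):
    """Apply one 1x1 convolution plus ReLU.

    feature_map: H x W x C_in nested lists.
    weights: C_out x C_in matrix shared by all positions; bias: C_out.
    Each output pixel depends only on the channels of the same input pixel.
    """
    return [[relu([sum(w * x for w, x in zip(w_row, pixel)) + b
                   for w_row, b in zip(weights, bias)])
             for pixel in row]
            for row in feature_map]


def cccp(feature_map, layers):
    """Cascade several 1x1 convolutions; layers is a list of (weights, bias)."""
    for weights, bias in layers:
        feature_map = conv1x1(feature_map, weights, bias)
    return feature_map


if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [0.0, -1.0]]]              # 1 x 2 map, 2 channels
    layers = [
        ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),    # 2 -> 2 channels
        ([[1.0, 1.0]], [0.5]),                      # 2 -> 1 channel
    ]
    print(cccp(fmap, layers))  # [[[2.0], [1.5]]]
```

Seen this way, the cascade is a tiny multilayer perceptron applied to every pixel's channel vector, which matches the "small network" described for local abstraction.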
50 Global Average Pooling. CNN vs. NIN: NIN outputs a confidence map for each category and averages it into the category score, which saves tons of parameters.
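Global average pooling over per-category confidence maps can be sketched as below. The second helper counts the weights and biases a fully-connected classifier would need on the same input, to show where the saving comes from; the 256 x 6 x 6 input and 1000 classes are illustrative sizes, not figures from the slides.

```python
def global_average_pool(confidence_maps):
    """One score per category: the mean of that category's H x W map."""
    scores = []
    for cmap in confidence_maps:
        values = [v for row in cmap for v in row]
        scores.append(sum(values) / len(values))
    return scores


def fc_parameter_count(channels, height, width, n_classes):
    """Weights plus biases of a fully-connected layer on a flattened map."""
    return channels * height * width * n_classes + n_classes


if __name__ == "__main__":
    maps = [
        [[1.0, 3.0], [5.0, 7.0]],   # confidence map of category 0
        [[0.0, 0.0], [0.0, 2.0]],   # confidence map of category 1
    ]
    print(global_average_pool(maps))              # [4.0, 0.5]
    print(fc_parameter_count(256, 6, 6, 1000))    # 9217000
    # Global average pooling itself has no trainable parameters.
```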
51 NIN in ILSVRC 2014. To avoid hyper-parameter tuning, we put the cccp layers directly on the convolution layers of ZFNet (Network in ZFNet).
Network in ZFNet:
layer   details
Conv1   Stride = 2, kernel = 7x7, channel_out = 96
Cccp1   Output = 96
Conv2   Stride = 2, kernel = 5x5, channel_out = 256
Cccp2   Output = 256
Conv3   Stride = 1, kernel = 3x3, channel_out = 512
Cccp3   Output = 256
Conv4   Stride = 1, kernel = 3x3, channel_out = 1024
Cccp4   Output = 512
Cccp5   Output = 384
Conv5   Stride = 1, kernel = 3x3, channel_out = 512
Cccp6   Output = 256
Fc1     Output = 4096
Fc2     Output = 4096
Fc3     Output = 1000
ZFNet:
layer   details
Conv1   Stride = 2, kernel = 7x7, channel_out = 96
Conv2   Stride = 2, kernel = 5x5, channel_out = 256
Conv3   Stride = 1, kernel = 3x3, channel_out = 512
Conv4   Stride = 1, kernel = 3x3, channel_out = 1024
Conv5   Stride = 1, kernel = 3x3, channel_out = 512
Fc1     Output = 4096
Fc2     Output = 4096
Fc3     Output = 1000
(10.91%) With 256xN training and 3 view test.
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision ECCV. Springer International Publishing.
52 NIN in HCP (diagram). Hypotheses are fed into a shared NIN, which outputs scores for each individual hypothesis (output dimension c; example labels: dog, person, sheep), followed by max pooling.
53 Compared with the state of the art on VOC 2012 (table; the values are not preserved). Annotations: From 81.7%; < 90.3%. Columns (Category): plane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, table, dog, horse, motor, person, plant, sheep, sofa, train, tv, MAP. Rows: NUS-PSL [1], PRE-1000C [2], PRE-1512 [2], Chatfield et al. [3], HCP-NIN, HCP-NIN+NUS-PSL.
[1] S. Yan, J. Dong, Q. Chen, Z. Song, Y. Pan, W. Xia, H. Zhongyang, Y. Hua, and S. Shen. Generalized hierarchical matching for subcategory aware object classification. In Visual Recognition Challenge workshop, ECCV.
[2] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. CVPR.
[3] K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. BMVC, 2014.
54 Demo Online Demo
55 Highest and Lowest Score Five Images for Each Class Aeroplane Bicycle Bird Boat Bottle
56 Highest and Lowest Score Five Images for Each Class Bus Car Cat Chair Cow
57 Highest and Lowest Score Five Images for Each Class Dining table Dog Horse Motorbike Person
58 Highest and Lowest Score Five Images for Each Class Pottedplant Sheep Sofa Train TV monitor
59 What's next? Better solution for small/occluded objects? More extra data? Better local features? Better deep features? Timeline (figure): HCP 2014: 91.4%; Deep feature 2014: 83.2%; Sub-category 2012: 82.2%; GHM 2011: 78.7%; Context-SVM 2010: 73.8%; LLC 2009: 66.5%. Marked gain: 25%.
60 Shuicheng YAN
Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals
More informationc 2011 by Pedro Moises Crisostomo Romero. All rights reserved.