NeoNet: Object centric training for image recognition

NeoNet: Object centric training for image recognition
Daniel Fontijne, Koen E. A. van de Sande, Eren Gölge, R. Blythe Towal, Anthony Sarah, Cees G. M. Snoek
Qualcomm Technologies, Inc., December 17, 2015
Presented by: Daniel Fontijne, Senior Staff Engineer

Summary
                  Score   Ranking
Classification    4.8     -
Localization      12.6    3
Detection         53.6    2
Places 2          17.6    3
Key component: object centric training

Agenda
1. Foundation
2. Classification
3. Localization
4. Detection
5. Places 2

Foundation: Batch-normalized inception
The base network for all our submissions is the inception network as introduced in the batch normalization paper by Ioffe & Szegedy (ICML 2015).

Network in an inception module (Lin et al., ICLR 2014)
Note: the 5x5 path is not used.

Classification overview
Ensemble of 12 networks.
Train really long, 350 epochs.
Randomized ReLU (Xu et al., ICML workshop 2015).
Test at 14 scales, 10 crops.
Object preserving crops.
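
The slide names randomized ReLU without giving its parameters. A minimal sketch of the idea follows: during training, negative inputs are scaled by a slope drawn uniformly at random, and at test time the mean slope is used. The function name and the bounds `lower=1/8` and `upper=1/3` are assumptions for illustration, not values taken from the slides.

```python
import random


def randomized_relu(values, lower=1 / 8, upper=1 / 3, training=True, rng=None):
    """Randomized leaky ReLU on a list of floats.

    lower/upper are assumed slope bounds (not stated in the slides).
    """
    if rng is None:
        rng = random.Random()
    out = []
    for v in values:
        if v >= 0:
            # Non-negative inputs pass through unchanged.
            out.append(v)
        elif training:
            # Training: draw a fresh negative slope for every element.
            out.append(rng.uniform(lower, upper) * v)
        else:
            # Testing: use the deterministic mean slope.
            out.append((lower + upper) / 2 * v)
    return out
```

At test time the function is deterministic, so repeated evaluations of the same view give the same activations.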

Quiz: What is this?

Answer: Flower

Quiz: In case you got that right, what is this?

Answer: Butterfly

Object preserving crops
Random crop selection might miss the object of interest.
The network then tries to remember "butterfly" when presented with leaves.
Solution: use the provided boxes to ensure the crop contains the object.
For images without box annotation, use the best box predicted by the localization system.
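
One way to sketch an object preserving crop is to restrict the random crop offset so that the annotated box always falls inside the window. The function name and signature below are hypothetical, and the code assumes integer pixel coordinates and a box that lies inside the image; how the original system handled boxes larger than the crop is not stated, so this sketch simply enlarges the crop in that case.

```python
import random


def object_preserving_crop(img_w, img_h, box, crop_w, crop_h, rng=None):
    """Return (left, top, right, bottom) of a random crop containing box.

    box is (x0, y0, x1, y1) in integer pixels, assumed inside the image.
    """
    if rng is None:
        rng = random.Random()
    x0, y0, x1, y1 = box
    # Assumption: if the box is larger than the crop, grow the crop to fit it.
    crop_w = min(img_w, max(crop_w, x1 - x0))
    crop_h = min(img_h, max(crop_h, y1 - y0))
    # The crop must start at or before the box, end at or after it,
    # and stay within the image bounds.
    left = rng.randint(max(0, x1 - crop_w), min(x0, img_w - crop_w))
    top = rng.randint(max(0, y1 - crop_h), min(y0, img_h - crop_h))
    return left, top, left + crop_w, top + crop_h
```

For an image without a box annotation, the same function could be called with a predicted box in place of the annotated one.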

Component breakdown
                                          Epochs   Single view   Multi-view
First attempt at inception + batch norm   112      8.63%         6.58%
Train ~325 epochs                         324      8.77%         6.34%
32 images / mini-batch                    130      8.74%         6.68%
Object preserving, 32 images/mini-batch   120      8.59%         6.51%
Object preserving with generated boxes    130      8.47%         6.46%
Ensemble of 12                            -        -             4.84%
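
The multi-view column corresponds to testing at 14 scales with 10 crops each. The slides do not say which 10 crops were used or how the views were combined, so the sketch below assumes the common layout (four corners plus centre, each with and without horizontal mirroring) and a plain average of class probabilities. All names are hypothetical.

```python
NUM_SCALES = 14  # from the slides
CROPS_PER_SCALE = 10  # from the slides


def ten_crop_views(width, height, size):
    """Return 10 (left, top, mirrored) views of a width x height image."""
    cx, cy = (width - size) // 2, (height - size) // 2
    origins = [
        (0, 0),
        (width - size, 0),
        (0, height - size),
        (width - size, height - size),
        (cx, cy),
    ]
    # Each origin is used twice: once as is, once mirrored.
    return [(l, t, flip) for flip in (False, True) for (l, t) in origins]


def average_views(view_probs):
    """Average per-class probabilities over all views (assumed aggregation)."""
    n = len(view_probs)
    return [sum(col) / n for col in zip(*view_probs)]
```

With these settings each test image is evaluated `NUM_SCALES * CROPS_PER_SCALE = 140` times per network.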

Final classification results
Top-5 classification error on test set
MSRA                         3.6
ReCeption                    3.6
Trimps-Soushen               4.6
NeoNet                       4.8
Ioffe & Szegedy, ICML '15    4.9
GoogLeNet ('14)              6.7
Clarifai ('13)               11.7
SuperVision ('12)            16.4
NeoNet is competitive on object classification
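
For reference, top-5 error counts an image as correct when the true label is among the five highest-scoring classes. A small sketch, with a hypothetical function name and plain lists as input:

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k best scores.

    scores: one list of per-class scores per sample.
    labels: the true class index for each sample.
    """
    errors = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(labels)
```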

Localization overview
Foundations:
Generate box proposals using fast selective search (Uijlings et al., IJCV 2013).
Train box-classification networks on crops (Girshick et al., PAMI 2016).
Object centric training:
Object pre-training network.
Object localization network.
Object alignment network.

Object centric pre-training
Use the bounding box annotations for pre-training.
Increase the number of classes from N to 2*N+1:
N classes for the object, well-framed.
N classes for partially framed objects.
1 class for background, i.e., object not visible.
1% to 1.5% improvement compared to standard pre-training.
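
The 2*N+1 label space can be sketched as a simple mapping from a crop to a label index. The slides do not say how "well-framed" is distinguished from "partially framed", so the visible-fraction threshold below is a hypothetical criterion, as are the function and parameter names.

```python
def object_centric_label(class_id, visible_fraction, num_classes=1000,
                         well_framed_threshold=0.5):
    """Map a crop to one of 2*N+1 labels.

    class_id: object class in [0, N), or None if no object is in the crop.
    visible_fraction: share of the object box inside the crop (assumed input).
    well_framed_threshold: hypothetical cut-off, not given in the slides.
    """
    background = 2 * num_classes
    if class_id is None or visible_fraction <= 0.0:
        return background                # label 2N: object not visible
    if visible_fraction >= well_framed_threshold:
        return class_id                  # labels 0..N-1: well-framed
    return num_classes + class_id        # labels N..2N-1: partially framed
```

With N = 1000 the label indices run from 0 to 2000, which gives 2001 outputs.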

Object centric pre-training
Dual-head network to account for missing bounding boxes:
One head with 1000 outputs.
One head with 2001 outputs.
No error gradient when box annotation is missing.
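
A minimal sketch of the dual-head cost, assuming that the 2001-output head is the one that is skipped when an image has no box annotation and that both heads use softmax cross entropy. Leaving the term out of the loss means it contributes no gradient. Function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross entropy for one sample, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def dual_head_loss(class_logits, class_label, object_logits, object_label):
    """Sum the losses of both heads; skip the object head without a box.

    object_label is None when the image has no bounding box annotation.
    """
    loss = cross_entropy(class_logits, class_label)
    if object_label is not None:
        # Only annotated images contribute to the object centric head.
        loss += cross_entropy(object_logits, object_label)
    return loss
```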

Object localization network
Fully connected layer on top of Inception 4e and 5b.
Re-train Inception 5b and the new head.
Then fine-tune the entire network.

Quiz: Is this an entire skyscraper?

Bordering the object
A 40% border worked best, so that in the 7x7 resolution of Inception 5b there is a 1 pixel border.
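
The arithmetic behind the 1 pixel border can be checked with a short sketch. It assumes that "40% border" means the crop is 1.4 times the box size, split evenly over both sides; this reading is an assumption, but it reproduces the slide's figure: 7 / 1.4 = 5 cells for the object and (7 - 5) / 2 = 1 cell of border per side. Function names are hypothetical.

```python
def add_border(box, border=0.4):
    """Grow (x0, y0, x1, y1) by `border` of its size, split over both sides."""
    x0, y0, x1, y1 = box
    pad_x = (x1 - x0) * border / 2
    pad_y = (y1 - y0) * border / 2
    return (x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)


def border_cells(grid=7, border=0.4):
    """Border width per side, in feature-map cells, for a bordered crop."""
    object_cells = grid / (1 + border)
    return (grid - object_cells) / 2
```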

Object alignment network
Extra head for object box alignment.
Classification head is also used, but with cross entropy cost.

Object alignment border
Object box alignment moves corners up to 50% of the width and height.
A 100% border allows the network to see the full range of possible alignments.
~2% gain.
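
The relation between the 50% corner shift and the 100% border can be illustrated as follows. The sketch assumes the border is split evenly over both sides (so 100% adds half a box width on each side) and that alignment offsets are expressed as fractions of the box width and height; the actual parameterization of the alignment head is not given in the slides, and the names are hypothetical.

```python
def align_box(box, dx0, dy0, dx1, dy1, max_shift=0.5):
    """Shift box corners by fractions of width/height, clamped to max_shift."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0

    def clamp(d):
        return max(-max_shift, min(max_shift, d))

    return (x0 + clamp(dx0) * w, y0 + clamp(dy0) * h,
            x1 + clamp(dx1) * w, y1 + clamp(dy1) * h)


def bordered_crop(box, border=1.0):
    """Crop around box with `border` of its size, split over both sides."""
    x0, y0, x1, y1 = box
    pad_x, pad_y = (x1 - x0) * border / 2, (y1 - y0) * border / 2
    return (x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)
```

Under these assumptions the largest outward shift of every corner lands exactly on the edge of the 100% bordered crop, so every reachable alignment is visible to the network.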

Component breakdown
                                              Top-5 localization error
First attempt                                 24.0%
40% border, FC on top of inception 5b         22.5%
FC on top of inception 5b+4e                  21.8%
Object centric pre-training                   20.3%
Ensemble of 8                                 17.5%
Object alignment                              15.5%
Final result with ILSVRC blacklist applied    14.5%

Final localization results
Top-5 localization error on test set
MSRA                 9.0
Trimps-Soushen       12.3
NeoNet               12.6
VGG ('14)            25.3
OverFeat ('13)       30.0
SuperVision ('12)    34.2
UvA ('11)            42.5
NeoNet is competitive on object localization

Improved selective search
                        Fast     Improved
Color spaces            2        3
Segmentations           2        4
Similarity functions    2        4
Average boxes           1,600    5,000
MABO                    77.5     82.6
Time (s)                0.8      2.4
mAP                     41.2     44.0
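
MABO in the table is the mean average best overlap: for each ground-truth object, take the best intersection-over-union with any proposal in the same image, average per class, then average over classes. A small sketch with hypothetical data structures (dictionaries keyed by class name and image id):

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def mabo(ground_truth, proposals):
    """Mean over classes of the average best overlap.

    ground_truth: {class_name: [(image_id, box), ...]}, lists non-empty.
    proposals: {image_id: [box, ...]}.
    """
    per_class = []
    for objects in ground_truth.values():
        best = [
            max((iou(box, p) for p in proposals.get(image_id, [])), default=0.0)
            for image_id, box in objects
        ]
        per_class.append(sum(best) / len(best))
    return sum(per_class) / len(per_class)
```

The function returns a value in [0, 1]; the table reports the same quantity as a percentage.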

Object detection network
Five inception-style networks for feature extraction:
Two trained on 1,000 object classes, no input border, fine-tuning on detection boxes.
Three trained on 1,000 object windows with input border, no fine-tuning.

Component breakdown
                                mAP on validation set
Best object class network       44.6
Best object centric network     47.7
Ensemble of 5                   51.9
+ context                       53.2
+ object alignment              54.6
Context: four classification networks fine-tuned with 200 detection class labels.

Final detection results
Mean average precision on test set
MSRA                   62.1
NeoNet                 53.6
Deep-ID Net            52.7
GoogLeNet ('14)        43.9
UvA/Euvision ('13)     22.6
NeoNet is competitive on object detection

Places 2 overview
Our best submission: an ensemble of two inception nets.
Reduce fully connected layer from 1,000 to 401 outputs.
Use pre-trained weights from ImageNet 1,000 (~325 epochs).
Train Inception 5b and fully connected layer for two epochs.
Fine-tune entire network for eight epochs.
Adding other networks reduced the accuracy.
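
The two-stage fine-tuning schedule can be written down as a small helper that says which layers receive updates at a given epoch. This is only an illustration of the schedule described on the slide; the layer names and the function are hypothetical and not tied to any particular framework.

```python
def trainable_layers(epoch, layers, head=("inception_5b", "fc"),
                     head_epochs=2, full_epochs=8):
    """Return the layers to update at a zero-based epoch.

    Stage 1 (head_epochs): only Inception 5b and the new FC layer.
    Stage 2 (full_epochs): the entire network.
    """
    if epoch < head_epochs:
        return [name for name in layers if name in head]
    if epoch < head_epochs + full_epochs:
        return list(layers)
    return []  # training is finished


# Hypothetical layer names, for illustration only.
layers = ["inception_3a", "inception_4e", "inception_5b", "fc"]
stage_one = trainable_layers(0, layers)
stage_two = trainable_layers(2, layers)
```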

Component breakdown (top-5 error)
                                           Single view   Multi view
~325 epochs pre-training                   17.9%         16.8%
First attempt. 112 epochs pre-training.    19.1%         17.9%
512 channel 5b, Alex-style FC head         20.0%         18.4%
32 images / batch                          18.7%         17.6%
Randomized ReLU                            18.2%         17.5%
Ensemble of 7                              -             16.7%
Ensemble of 2                              -             16.5%

Final Places 2 results
Top-5 classification error on test set
WM                16.9
SIAT_MMLAB        17.4
NeoNet            17.6
Trimps-Soushen    18.0
ntu_rose          19.3
MERL              19.4
HiVision          20
NeoNet is competitive on scene classification

On device recognition at 18 ms

Summary
                  Score   Ranking
Classification    4.8     -
Localization      12.6    3
Detection         53.6    2
Places 2          17.6    3
Key component: object centric training

Thank you
For more information on Qualcomm, visit us at: www.qualcomm.com & www.qualcomm.com/blog
2013-2015 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.