Todo before next class
Each project group should submit a short project report (4 pages of presentation slides) including
1. Problem definition
2. Related work
3. Preliminary results
4. Future plan
Submission: Email to chad.dechant@columbia.edu by April 5
Note: your slides will be put on the course website. The submission must be a PDF file, named by your group number.

Deep Networks for Image Classification and Detection
Liangliang Cao
llcao.net/cu-deeplearning17

Outline
- Differences between vision, speech, and NLP
- ImageNet and model adaptation
- Recent trends in industry and academia
- Two recent works
  - HyperFace
  - Mask R-CNN with ResNet

Image Recognition is Lucky
Why?
- Data: Images are easier to label than speech/language
- Data: Fei-Fei et al. made a lot of effort to release ImageNet
- Platform: Nvidia's cuDNN standardizes the most important computations
- Platform: A number of great toolkits are built on cuDNN

ImageNet LSVRC

Treasure from ImageNet Dataset
By adapting models trained on ImageNet, we can build a decent classifier with limited data.
- Very few new labels: tune the last layer, or use the last layer as features for an SVM
  Example code: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
- Enough new labels, or new tasks: tune the whole network
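
Sketch (not from the slides, and not the linked Caffe example): a toy version of the "very few new labels" case. A fixed function stands in for the ImageNet-pretrained layers, and only a logistic-regression last layer is trained on top of it. The names frozen_features, train_last_layer, and predict, and the toy data in the usage note, are hypothetical and chosen only for illustration.

```python
import math


def frozen_features(x):
    """Stand-in for a pretrained network's penultimate layer (never updated)."""
    return [x[0], x[1], x[0] * x[1], 1.0]


def train_last_layer(samples, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression 'last layer' on top of the frozen features."""
    dim = len(frozen_features(samples[0]))
    weights = [0.0] * dim
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            feats = frozen_features(x)
            score = sum(w * f for w, f in zip(weights, feats))
            prob = 1.0 / (1.0 + math.exp(-score))
            # Gradient step on the log loss; only the last-layer weights change.
            weights = [w - lr * (prob - y) * f for w, f in zip(weights, feats)]
    return weights


def predict(weights, x):
    """Return the predicted class (0 or 1) for one input."""
    score = sum(w * f for w, f in zip(weights, frozen_features(x)))
    return 1 if score > 0 else 0
```

Usage: with samples [(0.0, 0.0), (0.1, 0.2), (0.9, 0.8), (1.0, 1.0)] and labels [0, 0, 1, 1], train_last_layer returns four weights, one per frozen feature. Tuning the whole network would additionally update the parameters inside frozen_features.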

More Data, More Computation
- 2006: Caltech101, 8K images
- 2010: ImageNet, 1.2M images
- 2016: Yahoo YFCC, 100M images
Will this trend plateau or keep expanding?
Figure: Nvidia stock price

Deep Learning in Industry
- Data cleaning
- Startup examples
- More data
- Better model
- Model evolving

Deep Learning for Competition
- Kaggle small-scale image recognition
  - Adapt several ImageNet models
  - Data augmentation
  - Study the failure examples, and find ways to conquer them
- ImageNet LSVRC
  - Fuse complementary features using multi-GPU systems
  - Identify the problem of existing models and fix it
- Larger scale (e.g., YouTube-8M video)
  - Explore new scalable models

Plan for the remaining time:
- Fusing complementary features
  - HyperFace: Ranjan, Patel, Chellappa, arXiv 1603.01249, 2016
- Integrating multiple tasks
  - Mask R-CNN (w. ResNet): He, Gkioxari, Dollár, Girshick, arXiv 1703.06870, 2017
- Deeper understanding of the challenge for existing models

HyperFace
A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
Rajeev Ranjan, Vishal M. Patel, Rama Chellappa
arXiv 1603.01249, 2016

Tasks of HyperFace
- Face Detection
- Landmark Localization
- Pose Estimation
- Gender Recognition

HyperFace: Basic Idea
- Lower layers respond to edges and corners, and hence have better localization properties.
- Higher layers are class-specific and suitable for semantic recognition, including face recognition and gender.
- Features from lower and higher layers are complementary. Fuse them!
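
Sketch (an illustration, not the HyperFace architecture): one simple way to fuse features is to bring the higher-resolution low-level maps down to the resolution of the high-level maps and then concatenate along the channel axis. The max-pool downsampling and the names max_pool and fuse are assumptions made for brevity; the exact layers and operators used in the paper are not reproduced here.

```python
def max_pool(fmap, k):
    """Downsample a 2-D feature map (list of rows) with a k x k max pool, stride k."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i + di][j + dj] for di in range(k) for dj in range(k))
         for j in range(0, w - k + 1, k)]
        for i in range(0, h - k + 1, k)
    ]


def fuse(low_maps, high_maps):
    """Concatenate channels after pooling low-level maps to the high-level size.

    Each argument is a list of channels; each channel is a square 2-D list.
    Assumes the low-level size is an integer multiple of the high-level size.
    """
    target = len(high_maps[0])
    k = len(low_maps[0]) // target
    pooled = [max_pool(m, k) for m in low_maps]
    return pooled + high_maps  # channel-wise concatenation
```

Usage: fusing one 4x4 low-level channel with one 2x2 high-level channel gives two 2x2 channels, which a later layer can consume jointly.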

HyperFace Network

Baseline

Loss function
- Detection
- Landmark location
- Visibility
- Pose
- Gender
- Total
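
Sketch (the formulas on this slide were not captured in the transcript): a generic multi-task loss in which the total is a weighted sum of the per-task losses. It assumes cross-entropy for the classification-style tasks (detection, gender) and squared error for the regression-style tasks (landmarks, visibility, pose), with landmark errors counted only for visible points. The function names and any weights passed in are placeholders, not values taken from the slides.

```python
import math


def cross_entropy(p, label):
    """Binary cross-entropy for a predicted probability p and a 0/1 label."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def landmark_loss(pred, target, visible):
    """Squared error over (x, y) landmarks, counting only visible points."""
    n = len(target)
    total = sum(
        v * ((px - tx) ** 2 + (py - ty) ** 2)
        for (px, py), (tx, ty), v in zip(pred, target, visible)
    )
    return total / (2 * n)


def mse(pred, target):
    """Mean squared error, usable for visibility or pose vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def total_loss(losses, weights):
    """Weighted sum of per-task losses; both arguments are dicts keyed by task."""
    return sum(weights[name] * value for name, value in losses.items())
```

Usage: compute each task loss separately, collect them in a dict such as {"detection": ..., "landmark": ..., "visibility": ..., "pose": ..., "gender": ...}, and pass a matching dict of weights to total_loss.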

Procedure
1. Selective search to get candidate regions
2. Normalize and scale each region
3. Predict the four tasks
4. Refine the prediction based on landmark detections

Face Detection Performance
- Annotated Faces in the Wild (AFW)

Face Detection Performance
- Face Detection Dataset and Benchmark (FDDB)

Landmark localization
- AFW

Landmark localization
- Annotated Facial Landmarks in the Wild (AFLW)

Landmark localization (continued)
- Annotated Facial Landmarks in the Wild (AFLW)

Pose Estimation
- AFW

Gender Recognition

Speed
- GTX Titan-X GPUs
- 3 seconds per image
  - 2 s for selective search to generate region proposals
  - 0.2 s to evaluate the HyperFace network
Questions or comments?
OK, my question: will a better region detector help HyperFace?

Mask R-CNN
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
arXiv 1703.06870, 2017

History
- Selective Search
- R-CNN
- Fast R-CNN
- Faster R-CNN
- Residual Network
- Mask R-CNN

R-CNN and Fast R-CNN
- R-CNN
- Fast R-CNN: 200x faster than R-CNN in the testing stage

Faster R-CNN -> Mask R-CNN
- No longer use Selective Search
- Instead use a network for the region-proposal task
- Add a segmentation (mask) branch in addition to detection
- RoI pooling -> RoI align

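RoI Pooling Layer
- Typical pooling layer: with a w x h pooling window (and matching stride), the output size is 1/(wh) of the input size.
- RoI pooling layer: the output size is fixed (7x7) no matter how large the input region is.
    // Excerpt of the max-pooling loops. hstart/hend and wstart/wend bound the
    // current output bin; index and pool_index address the input cell and the
    // output bin. Their computation is not shown on the slide.
    for (int n = 0; n < num_rois; ++n) {
      for (int c = 0; c < channels_; ++c) {
        for (int ph = 0; ph < pooled_height_; ++ph) {
          for (int pw = 0; pw < pooled_width_; ++pw) {
            for (int h = hstart; h < hend; ++h) {
              for (int w = wstart; w < wend; ++w) {
                // Keep the largest activation seen in this bin.
                if (batch_data[index] > top_data[pool_index])
                  top_data[pool_index] = batch_data[index];
              }
            }
          }
        }
      }
    }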
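
Sketch (a single-channel, single-RoI illustration in Python, not the Caffe source): the region is split into a fixed grid of bins and each bin keeps its maximum, so the output size does not depend on the region size. The bin boundaries use a floor/ceil split, which is an assumption about the code not shown on the slide; roi_max_pool and its argument layout are hypothetical names.

```python
import math


def roi_max_pool(fmap, roi, pooled_h, pooled_w):
    """Max-pool one region of a 2-D feature map into a pooled_h x pooled_w grid.

    fmap: 2-D list of numbers.
    roi:  (y0, x0, y1, x1), inclusive, already rounded to whole cells
          (this rounding is the hard quantization step).
    """
    y0, x0, y1, x1 = roi
    height, width = len(fmap), len(fmap[0])
    bin_h = max(y1 - y0 + 1, 1) / pooled_h
    bin_w = max(x1 - x0 + 1, 1) / pooled_w
    out = []
    for ph in range(pooled_h):
        # Bin boundaries, clipped to the feature map.
        hstart = min(max(y0 + math.floor(ph * bin_h), 0), height)
        hend = min(max(y0 + math.ceil((ph + 1) * bin_h), 0), height)
        row = []
        for pw in range(pooled_w):
            wstart = min(max(x0 + math.floor(pw * bin_w), 0), width)
            wend = min(max(x0 + math.ceil((pw + 1) * bin_w), 0), width)
            cells = [fmap[h][w]
                     for h in range(hstart, hend)
                     for w in range(wstart, wend)]
            row.append(max(cells) if cells else 0.0)  # empty bins yield 0
        out.append(row)
    return out
```

Usage: on a 4x4 map holding 1..16 row by row, both roi=(0, 0, 3, 3) and roi=(0, 0, 2, 2) produce a 2x2 output; in the second case the bins overlap because 3 cells do not divide evenly into 2 bins.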

From RoI pooling to RoI align (source code pending)
- RoI pooling is not designed for pixel-to-pixel alignment
- RoI align
  - Use bilinear interpolation instead of hard quantization
  - Sample four locations per RoI bin and aggregate them
- The idea of RoI align seems simple, but it may require some effort to implement efficiently on GPUs
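
Sketch (single channel, plain Python, no GPU concerns): RoI coordinates stay continuous, each bin is sampled at a regular 2x2 grid of points (four per bin), each sample is read with bilinear interpolation, and the samples are averaged. Averaging as the aggregation, the convention that fmap[i][j] sits at point (i, j), and the names bilinear and roi_align are assumptions made for this illustration.

```python
import math


def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2-D list at continuous position (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - lx) + fmap[y0][x1] * lx
    bottom = fmap[y1][x0] * (1 - lx) + fmap[y1][x1] * lx
    return top * (1 - ly) + bottom * ly


def roi_align(fmap, roi, pooled_h, pooled_w, samples=2):
    """Average bilinear samples per bin; roi = (y0, x0, y1, x1), not rounded."""
    y0, x0, y1, x1 = roi
    bin_h = (y1 - y0) / pooled_h
    bin_w = (x1 - x0) / pooled_w
    out = []
    for ph in range(pooled_h):
        row = []
        for pw in range(pooled_w):
            total = 0.0
            # samples x samples regularly spaced points inside this bin.
            for iy in range(samples):
                for ix in range(samples):
                    y = y0 + (ph + (iy + 0.5) / samples) * bin_h
                    x = x0 + (pw + (ix + 0.5) / samples) * bin_w
                    total += bilinear(fmap, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out
```

Usage: on a map whose value equals its column index, roi_align with roi=(0.0, 0.5, 2.0, 2.5) and a 1x1 output averages samples taken at x = 1.0 and x = 2.0. No coordinate is rounded, unlike in RoI pooling.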

Use Residual Network structure
- Benefits of Residual Network in ImageNet/COCO 2016

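Why Residual Network?
- Problem: Is learning better networks as simple as stacking more layers?
- Not quite: plain networks that simply stack more layers tend to become harder to optimize (the degradation problem).
- Deep network + residual learning can solve this problem.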
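
Sketch (fully connected layers stand in for convolutions, which is a simplification): a residual block computes y = relu(F(x) + x), so the stacked layers only need to learn the residual F(x). The helper names relu, linear, and residual_block are hypothetical.

```python
def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), with F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])
```

Usage: with all-zero weights, F(x) is zero and the block passes a non-negative input through unchanged, which shows that the identity mapping is trivially available to every block.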

Residual net

Back to Mask R-CNN
- Combine cost for
  - Classification
  - Detection
  - Mask (new)
- Training speed: 32-40 hours on an 8-GPU machine to train on COCO data
- Testing speed: 200 ms per image on a Tesla M40
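
Sketch (per-RoI, plain Python): the combined cost written as an unweighted sum of three terms. It assumes a log loss for classification, a smooth L1 loss on the box coordinates for the detection term, and an average per-pixel binary cross-entropy for the mask term. The function names and argument layouts are hypothetical.

```python
import math


def classification_loss(probs, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_class])


def smooth_l1(pred_box, target_box):
    """Smooth L1 summed over box coordinates: quadratic near 0, linear beyond 1."""
    total = 0.0
    for p, t in zip(pred_box, target_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def mask_loss(mask_probs, target_mask):
    """Average per-pixel binary cross-entropy over a 2-D mask."""
    eps = 1e-12  # avoids log(0)
    terms = [
        -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
        for prow, trow in zip(mask_probs, target_mask)
        for p, t in zip(prow, trow)
    ]
    return sum(terms) / len(terms)


def combined_loss(probs, true_class, pred_box, target_box, mask_probs, target_mask):
    """Sum of the classification, box, and mask terms for one RoI."""
    return (classification_loss(probs, true_class)
            + smooth_l1(pred_box, target_box)
            + mask_loss(mask_probs, target_mask))
```

Usage: call combined_loss once per RoI and average over the RoIs in a batch. The mask term is the part that is new relative to Faster R-CNN.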

Experiments

Summary of this class
Now we have covered a number of vision applications:
- Image classification (programming in class 3)
- Face recognition/detection/alignment
- Object detection/segmentation

Any questions so far?
- No good results for your projects?
- Problem with Google Cloud/Paperspace?
- Problem with Keras/TensorFlow/Caffe?
- Others?
