Object Detection on Self-Driving Cars in China. Lingyun Li


Introduction
- Motivation: perception is key to self-driving cars.
- Data set: 10,000 annotated images, plus 2,000 unannotated images (not used); 640*360 pixels; complex road conditions in China.
- Annotation: object category and bounding box.
- 4 categories: Vehicle, Pedestrian, Cyclist, Traffic_lights.
- Task: predict bounding box, category, and confidence.
- Split: 2,000 of the 10,000 annotated images are randomly selected as the test/validation set.

Data
- Small objects
- Overlapping and cropped objects
- Poor image quality
- Poor annotations

Why is Object Detection Difficult?
- Image classification: a shift of the object inside the image should not change the prediction, so it favours translation invariance (CNN).
- Object detection: the output must describe how well a candidate box overlaps the object, so it needs both translation invariance (for the category) and translation variance (for the location).
- Deep CNNs are less sensitive to translation.

R-CNN: Region Proposal + CNN (2014)

SPPnet: Spatial Pyramid Pooling (2014)
- Fully connected layers take fixed-size input; the convolutional layers can take input of any size.
- Pool to a fixed size after the convolutional layers.
- Improvements: accepts images of any size; runs the CNN only once per input image.
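The "pool to a fixed size" step can be sketched as follows. This is a minimal pure-Python illustration, not the SPPnet implementation: the function name, the nested-list feature-map layout, and the pyramid levels (1, 2, 4) are assumptions made for the example.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a feature map of any size into a fixed-length list.

    feature_map: list of channels, each channel a list of rows (H x W numbers).
    For each pyramid level n, every channel is split into an n*n grid of bins
    and each bin is max-pooled, so the output length is
    channels * sum(n * n for n in levels), independent of H and W.
    """
    pooled = []
    for n in levels:
        for channel in feature_map:
            height, width = len(channel), len(channel[0])
            for i in range(n):
                # Integer floor/ceil bin edges; every bin keeps at least one row.
                r0 = (i * height) // n
                r1 = max(((i + 1) * height + n - 1) // n, r0 + 1)
                for j in range(n):
                    c0 = (j * width) // n
                    c1 = max(((j + 1) * width + n - 1) // n, c0 + 1)
                    pooled.append(max(
                        channel[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1)
                    ))
    return pooled


small = [[[1, 2, 3], [4, 5, 6]]]                                     # 1 channel, 2 x 3
large = [[[r * 10 + c for c in range(10)] for r in range(7)]]       # 1 channel, 7 x 10
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Both inputs produce the same output length, which is what lets the fully connected layers sit behind a CNN that sees images of different sizes.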

Fast R-CNN (2015)
- RoI (Region of Interest) pooling layer: a special type of SPP applied after the CNN; run for each region proposal to get a fixed-size output.
- Multi-task loss: train the category classifier and the bounding-box regressor together.

Faster R-CNN (2015)
- RPN: Region Proposal Network.
- End-to-end training without a separate region proposal step.
- Computation in the fully connected layers after the RPN is still done separately for each RoI.

Faster R-CNN (2015)
- RPN: Region Proposal Network.
- Generate k different anchor boxes (RoI candidates) for each 3*3 sliding window on the feature map.
- The center of the sliding window on the feature map maps to the center of its anchor boxes on the original image.

FCN: Fully Convolutional Networks (2016)
- For semantic image segmentation
- Convolutionalization
- Upsampling
- Skip architecture

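R-FCN (2016)
- Divide each RoI into a k*k grid (k^2 bins).
- The fully convolutional network generates k^2*(c+1) score maps: one per bin position for each of the c categories plus background.
- Position-sensitive RoI pooling generates k^2*(c+1) scores per RoI.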
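A rough sketch of position-sensitive RoI pooling, assuming average pooling inside each bin and average voting across bins. The channel ordering of `score_maps`, the function name, and the toy maps are illustrative assumptions, not the layout used in the Caffe model.

```python
def ps_roi_pool(score_maps, roi, k, num_classes):
    """Return one voted score per class (background included) for a RoI.

    score_maps[(i * k + j) * (num_classes + 1) + c] is the H x W map that
    bin (row i, column j) reads for class c (assumed ordering).
    roi = (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    class_scores = []
    for c in range(num_classes + 1):
        bin_scores = []
        for i in range(k):
            r0 = y0 + (i * roi_h) // k
            r1 = max(y0 + ((i + 1) * roi_h + k - 1) // k, r0 + 1)
            for j in range(k):
                c0 = x0 + (j * roi_w) // k
                c1 = max(x0 + ((j + 1) * roi_w + k - 1) // k, c0 + 1)
                # Each bin pools only from its own dedicated score map.
                score_map = score_maps[(i * k + j) * (num_classes + 1) + c]
                values = [score_map[r][col]
                          for r in range(r0, r1) for col in range(c0, c1)]
                bin_scores.append(sum(values) / len(values))
        # Vote: average the k*k bin scores for this class.
        class_scores.append(sum(bin_scores) / len(bin_scores))
    return class_scores


def quadrant_map(bin_index, active):
    """Toy 4 x 4 map that fires only in the quadrant matching bin_index."""
    return [[1.0 if active and (r // 2) * 2 + (c // 2) == bin_index else 0.0
             for c in range(4)] for r in range(4)]


# k = 2, one object class plus background; only the object class responds.
maps = [quadrant_map(b, cls == 1) for b in range(4) for cls in range(2)]
print(ps_roi_pool(maps, (0, 0, 4, 4), 2, 1))  # RoI aligned with the object
print(ps_roi_pool(maps, (0, 0, 2, 2), 2, 1))  # RoI covering only one quadrant
```

In the toy example the aligned RoI gets a high object score because every bin finds its matching part, while the misaligned RoI is voted down. This is how the score maps reintroduce translation variance.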

YOLO (2015) & YOLOv2 (2016)
- No Region Proposal Network
- Divide the image into a k*k grid
- Each grid cell is responsible for the objects centered in that cell
- Fast
- Bad for small and overlapping objects
- YOLOv2 combines YOLO with ideas from Faster R-CNN, such as anchor boxes
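The "responsible grid cell" rule can be illustrated with a small helper. The function name, the box format, and the grid size of 7 are assumptions for the example, not values taken from the slides.

```python
def responsible_cell(box, image_size, grid_size):
    """Return (row, col) of the grid cell that contains the box center.

    box = (x_min, y_min, x_max, y_max) in pixels; image_size = (width, height).
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    center_x = (x_min + x_max) / 2.0
    center_y = (y_min + y_max) / 2.0
    # Clamp so a center on the right or bottom edge stays inside the grid.
    col = min(int(center_x / width * grid_size), grid_size - 1)
    row = min(int(center_y / height * grid_size), grid_size - 1)
    return row, col


image_size = (640, 360)
pedestrian = (300, 100, 340, 140)
cyclist = (310, 110, 350, 150)
print(responsible_cell(pedestrian, image_size, 7))
print(responsible_cell(cyclist, image_size, 7))
```

Here both boxes fall into the same cell, so they compete for that cell's predictions. This is one plausible reading of why the grid design struggles with small and overlapping objects.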

Analysis & Evaluation
- For each category:
    - Intersection over Union (IoU) threshold: 50%
    - Average precision: 11-point interpolated average precision over the precision/recall curve
    - Same as the PASCAL Visual Object Classes (VOC) Challenge
- Proportion of bounding boxes: Vehicle 87%, Pedestrian 7%, Cyclist 6%, Traffic_lights 3%
- Evaluation: weighted average precision (Weighted mAP, W-mAP)
- Low IoU
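A minimal sketch of the per-category metric, assuming the usual VOC-style procedure: detections are matched greedily in score order, each ground-truth box can be matched once, and precision is interpolated at recall levels 0, 0.1, ..., 1. The function names and data layout are illustrative, and details such as the handling of "difficult" boxes are omitted.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def eleven_point_ap(detections, ground_truth, iou_threshold=0.5):
    """11-point interpolated AP for a single category.

    detections: list of (image_id, score, box).
    ground_truth: dict mapping image_id to a list of boxes.
    """
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    true_pos = false_pos = 0
    curve = []  # (recall, precision) after each detection
    for image_id, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(image_id, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[image_id][best_idx]:
            matched[image_id][best_idx] = True
            true_pos += 1
        else:
            false_pos += 1  # poor overlap or duplicate of a matched object
        curve.append((true_pos / total_gt, true_pos / (true_pos + false_pos)))
    ap = 0.0
    for step in range(11):
        level = step / 10.0
        precisions = [p for r, p in curve if r >= level]
        ap += max(precisions) if precisions else 0.0
    return ap / 11.0


ground_truth = {"a": [(0, 0, 10, 10)], "b": [(0, 0, 10, 10)]}
detections = [
    ("a", 0.9, (0, 0, 10, 10)),     # correct
    ("a", 0.8, (20, 20, 30, 30)),   # false positive
    ("b", 0.7, (0, 0, 10, 10)),     # correct
]
print(eleven_point_ap(detections, ground_truth))
```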

Baseline
- Models trained on VOC 2007+2012
- Classifier: ResNet101 > DarkNet19 > ZF
- VOC classes used for each category: Vehicle (car + bus), Pedestrian (person), Cyclist (bicycle + motorcycle), Traffic_lights (N/A)
Model        Classifier  FPS    Vehicle  Pedestrian  Cyclist  Traffic_lights  Weighted mAP
Faster-RCNN  ZF          7.87   0.4246   0.0695      0.0609   0               0.3779
R-FCN        ResNet101   4.58   0.6144   0.1412      0.2033   0               0.55661
YOLOv2       DarkNet19   37.04  0.4466   0.0499      0.0543   0               0.395293
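The slides do not define the Weighted mAP column. A minimal sketch, assuming it is each category's AP weighted by that category's share of bounding boxes from the evaluation slide (the function name and dictionary layout are illustrative). Under that assumption the baseline rows are reproduced:

```python
# Bounding-box proportions as listed on the Analysis & Evaluation slide.
WEIGHTS = {
    "Vehicle": 0.87,
    "Pedestrian": 0.07,
    "Cyclist": 0.06,
    "Traffic_lights": 0.03,
}


def weighted_map(ap_by_category, weights=WEIGHTS):
    """Sum of per-category AP times the category's bounding-box share."""
    return sum(ap * weights[name] for name, ap in ap_by_category.items())


baseline = {
    "Faster-RCNN": {"Vehicle": 0.4246, "Pedestrian": 0.0695,
                    "Cyclist": 0.0609, "Traffic_lights": 0.0},
    "R-FCN": {"Vehicle": 0.6144, "Pedestrian": 0.1412,
              "Cyclist": 0.2033, "Traffic_lights": 0.0},
    "YOLOv2": {"Vehicle": 0.4466, "Pedestrian": 0.0499,
               "Cyclist": 0.0543, "Traffic_lights": 0.0},
}
for model, ap_by_category in baseline.items():
    print(model, round(weighted_map(ap_by_category), 6))
```

The listed proportions sum to 103%. With these weights the later training tables are not reproduced; their values are closer to a vehicle weight of 0.84, which would also make the four proportions sum to 100%. The slides do not say which figure is intended.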

Train
- Setup: Caffe + AWS g3.16xlarge (4 * Tesla M60, 64 vCPUs, 488 GB RAM)
- Modify the network and data pipeline to fit our data
Model   Classifier  Iterations  Detection FPS  Vehicle  Pedestrian  Cyclist  Traffic_lights  Weighted mAP
R-FCN   ResNet101   60000       4.57           0.8002   0.3184      0.5783   0.1215          0.7329
YOLOv2  DarkNet19   30000       27.8           0.8045   0.2335      0.4739   0.146           0.7249

Sample W-mAP vs Iterations

NMS (Non-Maximum Suppression)
- Remove duplicate boxes for the same object
- W-mAP at each NMS IoU threshold:
IoU threshold  0.4     0.45    0.5
R-FCN          0.7273  0.7329  0.7316
YOLOv2         0.723   0.7249  0.7228
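A minimal sketch of greedy NMS for one category, using the 0.45 threshold that scored best in the table. The function names and the (score, box) layout are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.45):
    """Keep the highest-scoring box, drop boxes that overlap it, repeat.

    detections: list of (score, box). Returns the kept detections,
    highest score first.
    """
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(best[1], d[1]) <= iou_threshold]
    return kept


detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 0, 11, 10)),     # near-duplicate of the first box
    (0.7, (20, 20, 30, 30)),   # separate object
]
print(nms(detections))
```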

Soft-NMS (2017)
W-mAP   NMS (threshold = 0.45)  Soft-NMS
R-FCN   0.7329                  0.7408
YOLOv2  0.7249                  0.7231
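Soft-NMS lowers the scores of overlapping boxes instead of deleting them. The slides do not say which variant was used, so the sketch below assumes the Gaussian penalty, with sigma = 0.5 and a score cutoff of 0.001 as illustrative settings.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(detections, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS over a list of (score, box) for one category.

    After each pick, every remaining score is multiplied by
    exp(-iou^2 / sigma); boxes whose score falls below score_threshold
    are discarded.
    """
    remaining = list(detections)
    kept = []
    while remaining:
        best_index = max(range(len(remaining)), key=lambda i: remaining[i][0])
        best_score, best_box = remaining.pop(best_index)
        kept.append((best_score, best_box))
        decayed = []
        for score, box in remaining:
            overlap = iou(best_box, box)
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_threshold:
                decayed.append((new_score, box))
        remaining = decayed
    return kept


detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 0, 11, 10)),     # heavy overlap with the first box
    (0.7, (20, 20, 30, 30)),   # separate object
]
for score, box in soft_nms(detections):
    print(round(score, 4), box)
```

The overlapping box survives with a reduced score rather than being removed, which can help when two real objects overlap heavily.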

Multi-Scale Training (YOLOv2 Only)
- Randomly scale the input image to 704*352, 640*320, 576*288 or 512*256
- R-FCN already has multi-scale anchors in its Region Proposal Network
Model   Multi-Scale  Iterations  Vehicle  Pedestrian  Cyclist  Traffic_lights  Weighted mAP
YOLOv2  Yes          30000       0.81     0.2459      0.4837   0.149           0.7314
YOLOv2  No           30000       0.8045   0.2335      0.4739   0.146           0.7249
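A small sketch of the scale selection, showing only how the bounding boxes follow the resized image. Resizing the pixels themselves is left to an image library, and the function name and box format are assumptions for the example.

```python
import random

# Input sizes listed on the slide (width, height).
TRAIN_SIZES = [(704, 352), (640, 320), (576, 288), (512, 256)]


def pick_scale_and_rescale_boxes(boxes, source_size, rng=random):
    """Pick a random training size and scale the boxes to match it.

    boxes: list of (x_min, y_min, x_max, y_max) in source pixels.
    source_size: (width, height) of the original image.
    Returns ((target_width, target_height), rescaled_boxes).
    """
    target_w, target_h = rng.choice(TRAIN_SIZES)
    source_w, source_h = source_size
    scale_x = target_w / source_w
    scale_y = target_h / source_h
    rescaled = [(x0 * scale_x, y0 * scale_y, x1 * scale_x, y1 * scale_y)
                for x0, y0, x1, y1 in boxes]
    return (target_w, target_h), rescaled


rng = random.Random(0)
size, boxes = pick_scale_and_rescale_boxes([(0, 0, 640, 360)], (640, 360), rng)
print(size, boxes)
```

All four sizes are multiples of 32 in both dimensions, which fits a network that downsamples its input by a factor of 32.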

Modify RPN Anchors (R-FCN Only)
- Original anchors (scale is based on an input size of 1000*563):
    - Scale: [8, 16, 32] * 16 pixels
    - Ratio: [0.5, 1, 2]
    - RPN_MIN_SIZE = 16 pixels
    - 9 anchors per sliding window
- Observations:
    - A lot of small objects
    - Objects with large aspect ratios: pedestrian & cyclist
- Modified anchors:
    - Scale: [2, 4, 8, 16, 32] * 16 pixels
    - Ratio: [0.3, 0.5, 1, 2, 3]
    - RPN_MIN_SIZE = 4 pixels
    - 25 anchors per sliding window
R-FCN   Weighted mAP
Before  0.7408
After   0.7895
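The anchor counts follow from pairing every scale with every ratio. A minimal sketch, assuming each anchor keeps the area (scale * 16)^2 and that the ratio means height / width; the function name is illustrative and the rounding done by real anchor generators is omitted.

```python
import math


def make_anchors(scales, ratios, base_size=16):
    """Return a (width, height) pair for every scale/ratio combination."""
    anchors = []
    for ratio in ratios:
        for scale in scales:
            side = scale * base_size            # side of the square anchor
            width = side / math.sqrt(ratio)     # same area, new aspect ratio
            height = side * math.sqrt(ratio)
            anchors.append((width, height))
    return anchors


original = make_anchors([8, 16, 32], [0.5, 1, 2])
modified = make_anchors([2, 4, 8, 16, 32], [0.3, 0.5, 1, 2, 3])
print(len(original), len(modified))
print(min(math.sqrt(w * h) for w, h in original))
print(min(math.sqrt(w * h) for w, h in modified))
```

Under these assumptions the smallest original anchor is about 128 pixels on a side, while the modified set goes down to about 32 pixels, which is in line with the observation that the data contains many small objects.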

Data Augmentation
- Crop
    - Most objects appear in the bottom 75% of the image
    - Crop the bottom-left and bottom-right regions (480*270)
    - Discard bounding boxes that are cropped by more than 75%
- Flip
- Results in 48,000 training images
- Also tried stretching the images, but this failed to improve W-mAP
W-mAP   Before  After
R-FCN   0.7895  0.7941
YOLOv2  0.7314  0.7388
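A sketch of how the annotations might follow the crops and flips. It assumes three views per image (original, bottom-left crop, bottom-right crop), each also flipped horizontally, which is one reading consistent with 8,000 training images becoming 48,000. The function names and box format are illustrative, and only the boxes are transformed here, not the pixels.

```python
def crop_boxes(boxes, crop_x, crop_y, crop_w=480, crop_h=270, max_cropped=0.75):
    """Clip boxes to a crop window and shift them into crop coordinates.

    Boxes that lose more than max_cropped of their area are discarded.
    """
    kept = []
    for x0, y0, x1, y1 in boxes:
        cx0, cy0 = max(x0, crop_x), max(y0, crop_y)
        cx1, cy1 = min(x1, crop_x + crop_w), min(y1, crop_y + crop_h)
        area = (x1 - x0) * (y1 - y0)
        visible = max(0, cx1 - cx0) * max(0, cy1 - cy0)
        if area <= 0 or visible == 0:
            continue
        if 1.0 - visible / area > max_cropped:
            continue
        kept.append((cx0 - crop_x, cy0 - crop_y, cx1 - crop_x, cy1 - crop_y))
    return kept


def flip_boxes(boxes, image_w):
    """Mirror boxes horizontally inside an image of width image_w."""
    return [(image_w - x1, y0, image_w - x0, y1) for x0, y0, x1, y1 in boxes]


def augment(boxes, image_w=640, image_h=360, crop_w=480, crop_h=270):
    """Return six box lists: each of three views followed by its flip."""
    crop_y = image_h - crop_h  # crops are taken from the bottom of the image
    views = [
        (image_w, list(boxes)),
        (crop_w, crop_boxes(boxes, 0, crop_y, crop_w, crop_h)),
        (crop_w, crop_boxes(boxes, image_w - crop_w, crop_y, crop_w, crop_h)),
    ]
    result = []
    for width, view_boxes in views:
        result.append(view_boxes)
        result.append(flip_boxes(view_boxes, width))
    return result


boxes = [(500, 200, 600, 300), (100, 100, 200, 200)]
views = augment(boxes)
for view in views:
    print(view)
```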

Finally, Model Integration
- Detect with both YOLOv2 and R-FCN
- Remove overlapping boxes using Soft-NMS
Model            W-mAP
R-FCN (before)   0.7895
YOLOv2 (before)  0.7314
Integration      0.7912

Sample results: nine image pairs comparing Ground Truth with Detection (images not reproduced).

Future Work
- Clean the data
- Fine-tune for pedestrian, cyclist, and traffic_lights (this will lose generalization)
- Deformable R-FCN (2017)
- OHEM: Online Hard Example Mining (2016)
- Stratified OHEM (2017)

Q&A