Training models for road scene understanding with automated ground truth Dan Levi


With: Noa Garnett, Ethan Fetaya, Shai Silberstein, Rafi Cohen, Shaul Oron, Uri Verner, Ariel Ayash, Kobi Horn, Vlad Golder, Tomer Peer

GM Advanced Technical Center in Israel (ATCI)

Agenda: Road scene understanding; acquiring training data with automated ground truth (AGT); test cases: general obstacle detection, road segmentation, general obstacle classification, curb detection, freespace; challenges and limitations; summary and future work.

On-board road scene understanding. Static: road edges; road markings and complex lane understanding; signs; obstacles (clutter, construction-zone cones). Dynamic: classified objects (cars, pedestrians, bicycles, animals) and general obstacles (animals, carts).

Obstacle detection: general and category based

General obstacles, freespace, road segmentation. Road segmentation (vision): mono-camera using semantic segmentation. General obstacle detection and freespace (all non-flat road delimiters): 3D sensors (stereo, Lidar).

Training Data Revisiting Unreasonable Effectiveness of Data in Deep Learning Era [Sun et al. 2017]

Manual Annotation The Cityscapes Dataset for Semantic Urban Scene Understanding [Cordts et al. 2016] Time: ~60 min per image ~1000 annotators

Computer graphics simulated data Photo-realism Scenario generation

Simulated data for optical flow: FlowNet: Learning Optical Flow with Convolutional Networks [Dosovitskiy et al. 2015]

Automated ground truth (AGT) / cross-sensor learning: Velodyne Lidar

Depth from a single image Semi-Supervised Deep Learning for Monocular Depth Map Prediction [Kuznietsov et al. 2017]

Map supervised road detection Map-supervised road detection [Laddha et al. 2016]

AGT for road scene understanding, general setup: supervising sensors and target sensors. Prerequisite: full alignment and synchronization between the sensors.

AGT for road scene understanding: scheme. Given supervising sensors, target sensors and a task (e.g. object detection), AGT: 1. Compute the task on the supervising sensors (offline, temporal). 2. Project the output to the target sensor domain to obtain ground truth.

Automated ground truth / cross-sensor learning. Pros: 1. Solves an easier problem (offline run time, completeness). 2. Promise: scalability, continuous (unbounded) improvement. Cons: 1. Challenging setup. 2. Annotation quality / accuracy. 3. Inherent limitations of the supervisor: learning beyond the supervisor's capabilities, learning from the same sensor (bootstrapping).

AGT for general obstacle detection. Supervising sensor: Velodyne HDL64. Target sensor: front camera. Task: general obstacle detection.

StixelNet: Monocular obstacle detection. Dan Levi, Noa Garnett, Ethan Fetaya. StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation. BMVC 2015.

StixelNet: column-based approach. Input: image column; output per column.

AGT for obstacle detection version I KITTI Dataset [Geiger et al. 2013]

AGT for obstacle detection, version I. Velodyne Lidar. Raw images: 56 sequences (50 train, 6 test); 6,000 training images (every 5th frame) and 800 test images. Resulting ground truth: 331K training columns and 57K testing columns.

AGT for general obstacles in the image plane: 1. Project Lidar points to the image plane. 2. Interpolate depth to all pixels. 3. Find columns with a depth profile typical of a road-to-obstacle transition (road, then roughly vertical obstacle). Figure taken from [Fernandes et al. 2015].
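Step 1 above can be sketched with a standard pinhole camera model. A minimal sketch, assuming known intrinsics K and Lidar-to-camera extrinsics (R, t); the function name and the toy calibration values are illustrative, not from the talk:

```python
import numpy as np

def project_lidar_to_image(points, K, R, t):
    """Project Nx3 Lidar points into pixel coordinates.
    K: 3x3 camera intrinsics; R, t: Lidar-to-camera extrinsics.
    Returns pixel coordinates and depths of points in front of the camera."""
    cam = points @ R.T + t             # Lidar frame -> camera frame
    valid = cam[:, 2] > 0              # keep points in front of the camera
    uvw = cam[valid] @ K.T             # apply intrinsics
    pixels = uvw[:, :2] / uvw[:, 2:3]  # perspective divide
    return pixels, cam[valid, 2]

# Toy example: identity extrinsics and simple intrinsics (illustrative values)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
pts = np.array([[0.0, 0.0, 10.0],
                [1.0, 0.5,  5.0]])
px, depth = project_lidar_to_image(pts, K, np.eye(3), np.zeros(3))
print(px.tolist())  # -> [[320.0, 240.0], [420.0, 290.0]]
```

The interpolated depth image from step 2 would then be built from these sparse (pixel, depth) samples.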

General obstacles AGT examples. Limitations: cannot handle close obstacles or clear columns; low coverage (~30%).

StixelNet (v1): a 5-layer CNN. Input: a 370x24 image column. Layer 1: convolutional, 11x5 kernels, 64 maps, followed by 8x4 max pooling. Layer 2: convolutional, 5x3 kernels, 200 maps, followed by 4x3 max pooling. Layers 3-5: fully connected (dense) with 1024, 2048 and 50 units. Output: distribution over the obstacle position y in the column.
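The layer dimensions on this slide can be sanity-checked with a small shape calculation (a sketch assuming "valid" convolutions and non-overlapping max pooling, which reproduces the intermediate sizes that appear on the slide):

```python
# Shape check for the StixelNet v1 layer dimensions described above.
def conv_valid(h, w, kh, kw):
    """Output size of a 'valid' convolution with a kh x kw kernel."""
    return h - kh + 1, w - kw + 1

def max_pool(h, w, ph, pw):
    """Output size of non-overlapping ph x pw max pooling."""
    return h // ph, w // pw

h, w = 370, 24                  # input image column
h, w = conv_valid(h, w, 11, 5)  # layer 1: 11x5 conv, 64 maps -> 360x20
h, w = max_pool(h, w, 8, 4)     # 8x4 max pooling -> 45x5
h, w = conv_valid(h, w, 5, 3)   # layer 2: 5x3 conv, 200 maps -> 41x3
h, w = max_pool(h, w, 4, 3)     # 4x3 max pooling -> 10x1
flat = 200 * h * w              # features entering the dense layers
print(h, w, flat)               # -> 10 1 2000
```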

Experimental Results (Max Probability)

Comparison with stereo: stereo [Badino et al. 2009] vs. StixelNet.

AGT for obstacle classification. Target sensor: front camera. Task: obstacle classification.

AGT for obstacle classification: image-based detection, Lidar-based verification. Source: http://self-driving-future.com/the-eyes/velodyne/

Obstacle classification trained net result: pedestrians

AGT for general obstacle detection (version 2). Supervising sensor: Velodyne HDL64. Target sensor: front camera. Task: general obstacle detection.

Unified network: StixelNet + object detection + object pose estimation. Noa Garnett, Shai Silberstein, Shaul Oron, Ethan Fetaya, Uri Verner, Ariel Ayash, Vlad Goldner, Rafi Cohen, Kobi Horn, Dan Levi. Real-time category-based and general obstacle detection for autonomous driving. CVRSUAD Workshop, ICCV 2017.

Object-centric obstacle detection AGT: 1. Estimate and subtract the road plane. 2. 3D clustering (objects above 20 cm). 3. Detect clear columns: no object above 5 cm and far enough returns. 4. Near obstacles (below Lidar coverage) are handled during training. 5. Project to the image and smooth. 6. Extract the bottom contour via dynamic programming.
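The last step, extracting a bottom contour via dynamic programming, can be sketched as choosing one obstacle-bottom row per image column so that a local score is maximized while large jumps between neighboring columns are penalized. The score matrix, the penalty and all names here are illustrative assumptions, not the talk's actual formulation:

```python
import numpy as np

def smooth_bottom_contour(scores, jump_penalty=1.0):
    """Pick one obstacle-bottom row per column via dynamic programming.
    scores: (rows, cols) array, higher = more likely obstacle bottom.
    A penalty proportional to the row jump between adjacent columns
    enforces a smooth contour."""
    rows, cols = scores.shape
    dp = scores[:, 0].copy()                  # best score ending at each row
    back = np.zeros((rows, cols), dtype=int)  # backpointers for the path
    row_idx = np.arange(rows)
    for c in range(1, cols):
        # trans[i, j]: score of moving from row j (prev column) to row i
        trans = dp[None, :] - jump_penalty * np.abs(row_idx[:, None] - row_idx[None, :])
        back[:, c] = trans.argmax(axis=1)
        dp = scores[:, c] + trans.max(axis=1)
    path = np.empty(cols, dtype=int)          # backtrack the best contour
    path[-1] = int(dp.argmax())
    for c in range(cols - 1, 0, -1):
        path[c - 1] = back[path[c], c]
    return path

scores = np.zeros((5, 4))
scores[2, :] = 1.0   # a consistent obstacle bottom at row 2
scores[4, 1] = 1.2   # a spurious stronger response in one column
path = smooth_bottom_contour(scores, jump_penalty=1.0)
print(path.tolist())  # -> [2, 2, 2, 2]: the outlier is smoothed away
```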

General obstacles: old vs. new AGT

New general obstacle dataset with fisheye lens camera:
Dataset         #images   #instances (columns)
KITTI-train     6K        5M
Internal-train  16K       20M
KITTI-test      760       11K
Internal-test   910       19K

StixelNet2: New network architecture

Improved results on KITTI

Experimental results with new AGT [bar chart: old vs. new AGT on KITTI and internal test sets, max and average probability]

Experimental results with new AGT, edge cases excluded (near, clear) [bar chart: old vs. new AGT on KITTI and internal test sets, max and average probability]

Cross dataset generalization [bar chart: KITTI-test and internal-test (max and average probability), comparing training on KITTI, on internal data, and on both]

AGT for car pose estimation. Supervising sensors include the IMU. Task: pose estimation.

AGT for pose estimation: multi-sensor, temporal object detection; pose represented by 8 orientation bins (dynamic and static objects). Source: http://self-driving-future.com/theeyes/velodyne/
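The 8-orientation-bin pose representation amounts to quantizing the object's yaw angle into 45-degree sectors. A minimal sketch; anchoring bin 0 on yaw = 0 is an assumption, since the talk does not specify where the bins are centered:

```python
import math

NUM_BINS = 8
BIN_WIDTH = 2 * math.pi / NUM_BINS  # 45-degree sectors

def yaw_to_bin(yaw):
    """Quantize a yaw angle (radians) into one of 8 orientation bins.
    Bin 0 is centered on yaw = 0 (an assumption)."""
    return int(((yaw + BIN_WIDTH / 2) % (2 * math.pi)) // BIN_WIDTH)

def bin_center(b):
    """Representative yaw angle for bin b (its center)."""
    return b * BIN_WIDTH

print(yaw_to_bin(0.0), yaw_to_bin(math.pi / 2), yaw_to_bin(-math.pi / 2))
# -> 0 2 6
```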

Pose estimation trained with mixed AGT and Manual

AGT for Curb detection

Curb detection trained net result examples

AGT for freespace. Task: freespace.

AGT for freespace with 3D beams: 1. Estimate and subtract the road plane. 2. Analyze each single Lidar beam along the Velodyne scan direction. 3. Project the freespace limit to the ground plane. 4. Project the freespace limit to the image plane; find near and clear cases.
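Step 2, analyzing a single Lidar beam, can be sketched as scanning the beam's returns outward and stopping at the first return that rises above the estimated road plane. The 0.2 m default mirrors the 20 cm object threshold on the earlier slide; the function name and max-range fallback are illustrative assumptions:

```python
def freespace_limit(ranges, heights, obstacle_height=0.2, max_range=50.0):
    """Freespace limit along one Lidar beam after road-plane subtraction.
    ranges:  distances of the beam's returns, sorted ascending.
    heights: height of each return above the estimated road plane.
    Returns the distance of the first return that rises above the road,
    or max_range when the beam is "clear" (no obstacle seen)."""
    for r, h in zip(ranges, heights):
        if h > obstacle_height:
            return r      # obstacle: freespace ends here
    return max_range      # clear beam: no obstacle along this direction

print(freespace_limit([5.0, 10.0, 15.0], [0.02, 0.05, 0.5]))  # -> 15.0
print(freespace_limit([5.0, 10.0], [0.0, 0.1]))               # -> 50.0
```

Repeating this per beam and projecting the resulting limits gives the ground-plane freespace contour of step 3.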

Obstacles vs. Freespace AGT

Freespace + object detection + car 3D pose


Finetuning from AGT: road segmentation. 1. Fine-tune on KITTI road segmentation (manually labelled). 2. Graph-cut segmentation. 3. State-of-the-art accuracy among non-anonymous submissions (94.88% MaxF).

AGT challenges: How accurate is the AGT?

AGT challenges: calibration, synchronization

AGT challenges: perception mistakes (non-flat road), assumptions / coverage.

Thank you!