Visual Perception for Autonomous Driving on the NVIDIA DrivePX2 and using SYNTHIA

Transcription:

Visual Perception for Autonomous Driving on the NVIDIA DrivePX2 and using SYNTHIA
Dr. Juan C. Moure, Dr. Antonio Espinosa
http://grupsderecerca.uab.cat/hpca4se/en/content/gpu
http://adas.cvc.uab.es/elektra/
http://www.synthia-dataset.net

Our Background & Current Research Work
Computer Architecture Group: GPU acceleration for bioinformatics, computer vision and image compression.
Computer Vision Group: computer vision algorithms plus deep learning for camera-based ADAS.
Goal: camera-based perception for autonomous driving, combining a robotized car (Elektra car + DrivePX2), GPU-accelerated algorithms, and a deep learning and simulation infrastructure (SYNTHIA).

Overview of Presentation
GPU-accelerated perception: depth computation; semantic and slanted stixels (collaboration with Daimler); speeding up a MAP estimation problem solved by dynamic programming, using CNNs.
SYNTHIA toolkit: new datasets, new ground-truth data, LIDARs.

Stereo Vision for Depth Computation
Disparity is the distance (in pixels) between the projections of the same 3D point in the left and right images: the higher the disparity, the closer the object. (Figure: example stereo pair with an object at 10 meters.)
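
For reference, a minimal Python sketch of the standard disparity-to-depth relation implied by this slide; the focal length and baseline values in the usage example are assumptions for illustration, not parameters of the Elektra setup.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to metric depth (meters).

    depth = focal_length * baseline / disparity, so a larger disparity
    means the point is closer to the camera (as stated in the slide).
    """
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > eps                      # disparity ~0 => point at infinity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Example: a 64-pixel disparity with an assumed 700 px focal length and 30 cm baseline
print(disparity_to_depth([[64.0]], focal_px=700.0, baseline_m=0.30))  # ~3.3 m
```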

Semi-Global Matching (SGM) on GPU: Parallelism
The GPU implementation exploits large-, medium- and fine-grain parallelism across the matching-cost and smoothed-cost stages [Hernández ICCS 2016]. (Figure: diagram of the two stages annotated with their parallelism levels.)
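
To make the matching-cost/smoothed-cost step concrete, here is a minimal CPU reference sketch (NumPy) of SGM cost aggregation along a single left-to-right path; the P1/P2 penalties and the random cost volume are placeholders, and this is not the CUDA implementation described in [Hernández ICCS 2016].

```python
import numpy as np

def sgm_aggregate_left_to_right(cost, P1=10, P2=120):
    """Single-path SGM aggregation along image rows (left -> right).

    cost: matching-cost volume of shape (H, W, D).
    Returns the smoothed cost L for this path; full SGM sums the L volumes
    of several paths (4 directions in the DrivePX demo) before the
    winner-take-all disparity selection.
    """
    H, W, D = cost.shape
    L = np.empty_like(cost, dtype=np.float32)
    L[:, 0, :] = cost[:, 0, :]
    for x in range(1, W):
        prev = L[:, x - 1, :]                        # (H, D) costs of previous pixel
        prev_min = prev.min(axis=1, keepdims=True)   # min over all disparities
        # Candidate transitions: same d, d +/- 1 (penalty P1), any d (penalty P2)
        same = prev
        up = np.pad(prev[:, 1:], ((0, 0), (0, 1)), constant_values=np.inf) + P1
        down = np.pad(prev[:, :-1], ((0, 0), (1, 0)), constant_values=np.inf) + P1
        best = np.minimum(np.minimum(same, up), np.minimum(down, prev_min + P2))
        L[:, x, :] = cost[:, x, :] + best - prev_min  # subtract min to bound growth
    return L

# Tiny usage example with a random cost volume
L = sgm_aggregate_left_to_right(np.random.rand(4, 8, 16).astype(np.float32))
disparity = L.argmin(axis=2)   # winner-take-all per pixel (single path only)
```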

SGM on GPU: Results
Configuration: SGM with 4 path directions; maximum disparity = image height / 4; image sizes 960x360, 1280x480 and 1920x720. (Figure: frames-per-second bar chart, 0-150 fps, for Tegra X1 (DrivePX) and Tegra Parker (DrivePX2), with the real-time threshold marked.)
Tegra Parker improves performance about 4x over Tegra X1, due to 3.5x higher effective memory bandwidth and higher execution overlap among kernels.

Stixel World: Compact Representation of the World
A stixel ("stick" + "pixel") is a thin vertical segment of fixed width; each image column holds a variable number of stixels computed from the stereo disparity, separating ground, object and sky regions. (Figure: stereo images, stereo disparity and stixels, with slope and horizon annotations.)
First proposed by a research group at Daimler [Pfeiffer BMVC 2011].
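
As a concrete picture of the representation, one image column can be held as a short, variable-length list of stixels; the field names below are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Stixel:
    """One stick-like segment of an image column (illustrative field names)."""
    v_top: int          # first image row covered by the stixel
    v_bottom: int       # one past the last image row covered (exclusive end)
    label: str          # "ground", "object" or "sky"
    disparity: float    # representative disparity of the stixel

# A column is a variable-length list of stixels covering all of its rows, e.g.:
column = [
    Stixel(v_top=0,  v_bottom=18, label="sky",    disparity=0.0),
    Stixel(v_top=18, v_bottom=40, label="object", disparity=35.0),
    Stixel(v_top=40, v_bottom=64, label="ground", disparity=12.0),
]
```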

Semantic Stixels: Unified Approach
Semantic stixels combine the stereo disparity with per-pixel semantic segmentation (road, sidewalk, building, pedestrian, sky, ...) into a single representation [Schneider IV 2016]. (Figure: stereo images, stereo disparity, semantic segmentation and semantic stixels.)

Enhanced Model: Slanted Stixels
A MAP estimation problem jointly over semantics and depth: a Bayesian model converted to an energy-minimization (negative log-likelihood) formulation.
The stixel disparity model now includes a slant term b, so the disparity inside a stixel is a linear function of the image row rather than a constant, and the energy function is redefined accordingly.
Priors enforce assumptions such as: no sky below the horizon, and objects stand on the road.
Best Industrial Paper award [Hernández BMVC 2017].
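
A sketch of the formulation in LaTeX; the notation below is assumed here for illustration, since the slide only hints at the slant term b and the negative log-likelihood energy.

```latex
% Disparity inside stixel s is linear in the image row v, with slant b_s:
d_s(v) = a_s + b_s\, v
% MAP estimation rewritten as energy minimization (negative log posterior),
% with depth, semantic and prior terms:
E(\mathbf{s}) = -\log P(\mathbf{s} \mid \mathbf{d}, \boldsymbol{\ell})
             = \sum_{s \in \mathbf{s}} \bigl( E_{\mathrm{depth}}(s) + E_{\mathrm{sem}}(s) \bigr)
               + E_{\mathrm{prior}}(\mathbf{s})
```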

New SYNTHIA-San Francisco Dataset
A San Francisco-like city designed with the SYNTHIA toolkit: 2224 photorealistic images featuring slanted roads, with pixel-level depth and semantic ground truth. Generating equivalent real-world data would be very expensive.

Results: Quantitative & Visual
Accuracy on SYNTHIA-SF: disparity error drops from 30.9% to 12.9% and IoU improves from 46% to 48.5% with slanted stixels; accuracy on the other datasets remains the same. (Figure: left image, original stixels, slanted stixels, and 3D representation.)

Computation Complexity: Dynamic Programming
Inputs: the disparity image and the semantic segmentation; the output labels each column into ground, object and sky stixels (figure). Each column is processed independently, and a dynamic programming strategy efficiently evaluates all possible stixel configurations. Work complexity per column: O(h^2), where h is the image height.
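
A minimal CPU sketch of the per-column O(h^2) dynamic program; the segment cost used in the toy example is a stand-in for the depth and semantic likelihood terms of the actual model.

```python
import numpy as np

def best_segmentation(column_cost, h):
    """O(h^2) dynamic program over one image column.

    column_cost(top, bottom) must return the cost of explaining rows
    [top, bottom) with a single stixel (a placeholder for the depth +
    semantic energy terms of the real model).
    Returns the minimal total cost and the list of cut rows.
    """
    best = np.full(h + 1, np.inf)   # best[v] = minimal cost of rows [0, v)
    best[0] = 0.0
    back = np.zeros(h + 1, dtype=int)
    for bottom in range(1, h + 1):          # O(h) end positions ...
        for top in range(bottom):           # ... times O(h) start positions
            c = best[top] + column_cost(top, bottom)
            if c < best[bottom]:
                best[bottom], back[bottom] = c, top
    # Recover the stixel cuts by backtracking
    cuts, v = [], h
    while v > 0:
        cuts.append(v)
        v = back[v]
    return best[h], cuts[::-1]

# Toy usage: a column of disparities; segment cost = variance + a constant
disp = np.concatenate([np.full(30, 5.0), np.full(20, 40.0), np.zeros(14)])
cost = lambda a, b: disp[a:b].var() + 1.0
print(best_segmentation(cost, len(disp)))
```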

Stixel (DP) Algorithm on GPU: Parallelism
Large-grain parallelism: each column of the stereo disparity is assigned to its own CTA (thread block). Medium- and fine-grain parallelism: inside a column, the dynamic program proceeds through steps 1, 2, 3, ..., h as a sequential operation with decreasing parallelism.

Performance Results
Original Stixel Model (fps):
  Tegra X1 (DrivePX):       53 (960x360)    24 (1280x480)    7 (1920x720)
  Tegra Parker (DrivePX2): 369 (960x360)   164 (1280x480)   49 (1920x720)
Slanted + Semantic Stixel Model, including time for semantic inference (fps):
  Tegra X1 (DrivePX):       17 (960x360)     8 (1280x480)    3 (1920x720)
  Tegra Parker (DrivePX2): 107 (960x360)    49 (1280x480)   17 (1920x720)
Real-time performance on DrivePX2 for all image sizes (6x-7x faster on DrivePX2 than on DrivePX). For the complex stixel model, 60-70% of the time goes to the stixel algorithm and 30-40% to semantic inference.

Improving Computation Complexity: Pre-segmentation
Infer a set of candidate stixel cuts per column (a pre-segmentation of size ĥ) from the inputs, avoiding the check of all possible stixel combinations. Work complexity per column becomes O(h · ĥ), with ĥ << h. With the naive pre-segmentation, accuracy degrades by 10-20%.
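
A toy illustration of the idea: propose a small set of candidate cut rows per column and run the dynamic program only over them. The gradient-peak heuristic below is an assumption for illustration; the slide does not specify the criterion used by the naive pre-segmentation.

```python
import numpy as np

def naive_cut_candidates(disp_column, k=16):
    """Propose up to k candidate stixel cuts from one disparity column.

    Toy heuristic (assumed, not from the slides): keep the rows with the
    largest vertical disparity change, plus the top and bottom of the column.
    """
    grad = np.abs(np.diff(disp_column))
    rows = np.argsort(grad)[-k:] + 1            # rows where disparity jumps
    return np.unique(np.concatenate(([0], rows, [len(disp_column)])))

def dp_on_candidates(column_cost, cuts):
    """Run the per-column stixel DP only over candidate cut rows."""
    n = len(cuts)
    best = np.full(n, np.inf); best[0] = 0.0
    back = np.zeros(n, dtype=int)
    for j in range(1, n):
        for i in range(j):
            c = best[i] + column_cost(cuts[i], cuts[j])
            if c < best[j]:
                best[j], back[j] = c, i
    return best[-1]

# Same toy column and placeholder cost as in the full O(h^2) sketch above
disp = np.concatenate([np.full(30, 5.0), np.full(20, 40.0), np.zeros(14)])
cost = lambda a, b: disp[a:b].var() + 1.0
print(dp_on_candidates(cost, naive_cut_candidates(disp)))
```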

Pre-segmentation using a DNN
Candidate stixel cuts are instead inferred from the inputs by a DNN, which can exploit more general relations in the data (including across columns). With the DNN-based pre-segmentation, accuracy now improves slightly.

Improved Performance Results
Slanted + Semantic Stixel Model, including time for semantic inference (fps):
  Without pre-segmentation:
    Tegra X1 (DrivePX):       17 (960x360)     8 (1280x480)    3 (1920x720)
    Tegra Parker (DrivePX2): 107 (960x360)    49 (1280x480)   17 (1920x720)
  With pre-segmentation:
    Tegra X1 (DrivePX):       37 (960x360)    17 (1280x480)    7 (1920x720)
    Tegra Parker (DrivePX2): 193 (960x360)    87 (1280x480)   35 (1920x720)
Pre-segmentation improves performance by about 2x on both DrivePX and DrivePX2. Now 15-30% of the time goes to the stixel algorithm and 70-85% to semantic inference. The increase in inference time is almost negligible (<10%), since most of the CNN used for pre-segmentation is shared with the CNN for semantic segmentation.

SYNTHIA Dataset Toolkit
An image generator of precisely annotated data for training DNNs on autonomous driving tasks. Ground-truth data: RGB plus per-pixel depth, semantic class, optical flow and 3D bounding boxes. Fully compatible with the Cityscapes classes. Generation of LIDAR data. Problem customization: e.g., SYNTHIA-San Francisco. www.synthia-dataset.net

Summary: real-sequence video.

Thank you
Dr. Juan C. Moure, juancarlos.moure@uab.es
http://grupsderecerca.uab.cat/hpca4se/en/content/gpu
Autonomous University of Barcelona