Visual Perception for Autonomous Driving on the NVIDIA DrivePX2 and using SYNTHIA

Transcription:

Visual Perception for Autonomous Driving on the NVIDIA DrivePX2 and using SYNTHIA
Dr. Juan C. Moure, Dr. Antonio Espinosa
http://grupsderecerca.uab.cat/hpca4se/en/content/gpu
http://adas.cvc.uab.es/elektra/
http://www.synthia-dataset.net

Our Background & Current Research Work
Computer Architecture Group: GPU acceleration for bioinformatics, computer vision and image compression.
Computer Vision Group: computer vision algorithms plus deep learning for camera-based ADAS.
Goal: camera-based perception for autonomous driving, combining a robotized car (Elektra car + DrivePX2), GPU-accelerated algorithms, and a deep learning and simulation infrastructure (SYNTHIA).

Overview of Presentation
GPU-accelerated perception: depth computation; semantic and slanted stixels (collaboration with Daimler); speeding up a MAP estimation problem solved by dynamic programming, using CNNs.
SYNTHIA toolkit: new datasets, new ground-truth data, LIDARs.

Stereo Vision for Depth Computation
Disparity is the distance (in pixels) between the projections of the same 3D point in the left and right images: the higher the disparity, the closer the object. (Figure: example stereo pair with an object at 10 meters.)
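
For reference, a minimal Python sketch of the standard disparity-to-depth relation implied by this slide; the focal length and baseline values in the usage example are assumptions for illustration, not parameters of the Elektra setup.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to metric depth (meters).

    depth = focal_length * baseline / disparity, so a larger disparity
    means the point is closer to the camera (as stated in the slide).
    """
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > eps                      # disparity ~0 => point at infinity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Example: a 64-pixel disparity with an assumed 700 px focal length and 30 cm baseline
print(disparity_to_depth([[64.0]], focal_px=700.0, baseline_m=0.30))  # ~3.3 m
```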

Semi-Global Matching (SGM) on GPU: Parallelism
The GPU implementation exploits large-, medium- and fine-grain parallelism across the matching-cost and smoothed-cost stages [Hernández ICCS 2016]. (Figure: diagram of the two stages annotated with their parallelism levels.)
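
To make the matching-cost/smoothed-cost step concrete, here is a minimal CPU reference sketch (NumPy) of SGM cost aggregation along a single left-to-right path; the P1/P2 penalties and the random cost volume are placeholders, and this is not the CUDA implementation described in [Hernández ICCS 2016].

```python
import numpy as np

def sgm_aggregate_left_to_right(cost, P1=10, P2=120):
    """Single-path SGM aggregation along image rows (left -> right).

    cost: matching-cost volume of shape (H, W, D).
    Returns the smoothed cost L for this path; full SGM sums the L volumes
    of several paths (4 directions in the DrivePX demo) before the
    winner-take-all disparity selection.
    """
    H, W, D = cost.shape
    L = np.empty_like(cost, dtype=np.float32)
    L[:, 0, :] = cost[:, 0, :]
    for x in range(1, W):
        prev = L[:, x - 1, :]                        # (H, D) costs of previous pixel
        prev_min = prev.min(axis=1, keepdims=True)   # min over all disparities
        # Candidate transitions: same d, d +/- 1 (penalty P1), any d (penalty P2)
        same = prev
        up = np.pad(prev[:, 1:], ((0, 0), (0, 1)), constant_values=np.inf) + P1
        down = np.pad(prev[:, :-1], ((0, 0), (1, 0)), constant_values=np.inf) + P1
        best = np.minimum(np.minimum(same, up), np.minimum(down, prev_min + P2))
        L[:, x, :] = cost[:, x, :] + best - prev_min  # subtract min to bound growth
    return L

# Tiny usage example with a random cost volume
L = sgm_aggregate_left_to_right(np.random.rand(4, 8, 16).astype(np.float32))
disparity = L.argmin(axis=2)   # winner-take-all per pixel (single path only)
```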

SGM on GPU: Results
Configuration: SGM with 4 path directions; maximum disparity = image height / 4; image sizes 960x360, 1280x480 and 1920x720. (Figure: frames-per-second bar chart, 0-150 fps, for Tegra X1 (DrivePX) and Tegra Parker (DrivePX2), with the real-time threshold marked.)
Tegra Parker improves performance about 4x over Tegra X1, due to 3.5x higher effective memory bandwidth and higher execution overlap among kernels.

Stixel World: Compact Representation of the World
A stixel ("stick" + "pixel") is a thin vertical segment of fixed width; each image column holds a variable number of stixels computed from the stereo disparity, separating ground, object and sky regions. (Figure: stereo images, stereo disparity and stixels, with slope and horizon annotations.)
First proposed by a research group at Daimler [Pfeiffer BMVC 2011].
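
As a concrete picture of the representation, one image column can be held as a short, variable-length list of stixels; the field names below are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Stixel:
    """One stick-like segment of an image column (illustrative field names)."""
    v_top: int          # first image row covered by the stixel
    v_bottom: int       # one past the last image row covered (exclusive end)
    label: str          # "ground", "object" or "sky"
    disparity: float    # representative disparity of the stixel

# A column is a variable-length list of stixels covering all of its rows, e.g.:
column = [
    Stixel(v_top=0,  v_bottom=18, label="sky",    disparity=0.0),
    Stixel(v_top=18, v_bottom=40, label="object", disparity=35.0),
    Stixel(v_top=40, v_bottom=64, label="ground", disparity=12.0),
]
```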

Semantic Stixels: Unified Approach
Semantic stixels combine the stereo disparity with per-pixel semantic segmentation (road, sidewalk, building, pedestrian, sky, ...) into a single representation [Schneider IV 2016]. (Figure: stereo images, stereo disparity, semantic segmentation and semantic stixels.)

Enhanced Model: Slanted Stixels
A MAP estimation problem jointly over semantics and depth: a Bayesian model converted to an energy-minimization (negative log-likelihood) formulation.
The stixel disparity model now includes a slant term b, so the disparity inside a stixel is a linear function of the image row rather than a constant, and the energy function is redefined accordingly.
Priors enforce assumptions such as: no sky below the horizon, and objects stand on the road.
Best Industrial Paper award [Hernández BMVC 2017].
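
A sketch of the formulation in LaTeX; the notation below is assumed here for illustration, since the slide only hints at the slant term b and the negative log-likelihood energy.

```latex
% Disparity inside stixel s is linear in the image row v, with slant b_s:
d_s(v) = a_s + b_s\, v
% MAP estimation rewritten as energy minimization (negative log posterior),
% with depth, semantic and prior terms:
E(\mathbf{s}) = -\log P(\mathbf{s} \mid \mathbf{d}, \boldsymbol{\ell})
             = \sum_{s \in \mathbf{s}} \bigl( E_{\mathrm{depth}}(s) + E_{\mathrm{sem}}(s) \bigr)
               + E_{\mathrm{prior}}(\mathbf{s})
```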

New SYNTHIA-San Francisco Dataset
A San Francisco-like city designed with the SYNTHIA toolkit: 2224 photorealistic images featuring slanted roads, with pixel-level depth and semantic ground truth. Generating equivalent real-world data would be very expensive.

Results: Quantitative & Visual
Accuracy on SYNTHIA-SF: disparity error drops from 30.9% to 12.9% and IoU improves from 46% to 48.5% with slanted stixels; accuracy on the other datasets remains the same. (Figure: left image, original stixels, slanted stixels, and 3D representation.)

Computation Complexity: Dynamic Programming
Inputs: the disparity image and the semantic segmentation; the output labels each column into ground, object and sky stixels (figure). Each column is processed independently, and a dynamic programming strategy efficiently evaluates all possible stixel configurations. Work complexity per column: O(h^2), where h is the image height.
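
A minimal CPU sketch of the per-column O(h^2) dynamic program; the segment cost used in the toy example is a stand-in for the depth and semantic likelihood terms of the actual model.

```python
import numpy as np

def best_segmentation(column_cost, h):
    """O(h^2) dynamic program over one image column.

    column_cost(top, bottom) must return the cost of explaining rows
    [top, bottom) with a single stixel (a placeholder for the depth +
    semantic energy terms of the real model).
    Returns the minimal total cost and the list of cut rows.
    """
    best = np.full(h + 1, np.inf)   # best[v] = minimal cost of rows [0, v)
    best[0] = 0.0
    back = np.zeros(h + 1, dtype=int)
    for bottom in range(1, h + 1):          # O(h) end positions ...
        for top in range(bottom):           # ... times O(h) start positions
            c = best[top] + column_cost(top, bottom)
            if c < best[bottom]:
                best[bottom], back[bottom] = c, top
    # Recover the stixel cuts by backtracking
    cuts, v = [], h
    while v > 0:
        cuts.append(v)
        v = back[v]
    return best[h], cuts[::-1]

# Toy usage: a column of disparities; segment cost = variance + a constant
disp = np.concatenate([np.full(30, 5.0), np.full(20, 40.0), np.zeros(14)])
cost = lambda a, b: disp[a:b].var() + 1.0
print(best_segmentation(cost, len(disp)))
```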

Stixel (DP) Algorithm on GPU: Parallelism
Large-grain parallelism: each column of the stereo disparity is assigned to its own CTA (thread block). Medium- and fine-grain parallelism: inside a column, the dynamic program proceeds through steps 1, 2, 3, ..., h as a sequential operation with decreasing parallelism.

Performance Results
Original Stixel Model (fps):
  Tegra X1 (DrivePX):       53 (960x360)    24 (1280x480)    7 (1920x720)
  Tegra Parker (DrivePX2): 369 (960x360)   164 (1280x480)   49 (1920x720)
Slanted + Semantic Stixel Model, including time for semantic inference (fps):
  Tegra X1 (DrivePX):       17 (960x360)     8 (1280x480)    3 (1920x720)
  Tegra Parker (DrivePX2): 107 (960x360)    49 (1280x480)   17 (1920x720)
Real-time performance on DrivePX2 for all image sizes (6x-7x faster on DrivePX2 than on DrivePX). For the complex stixel model, 60-70% of the time goes to the stixel algorithm and 30-40% to semantic inference.

Improving Computation Complexity: Pre-segmentation
Infer a set of candidate stixel cuts per column (a pre-segmentation of size ĥ) from the inputs, avoiding the check of all possible stixel combinations. Work complexity per column becomes O(h · ĥ), with ĥ << h. With the naive pre-segmentation, accuracy degrades by 10-20%.
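
A toy illustration of the idea: propose a small set of candidate cut rows per column and run the dynamic program only over them. The gradient-peak heuristic below is an assumption for illustration; the slide does not specify the criterion used by the naive pre-segmentation.

```python
import numpy as np

def naive_cut_candidates(disp_column, k=16):
    """Propose up to k candidate stixel cuts from one disparity column.

    Toy heuristic (assumed, not from the slides): keep the rows with the
    largest vertical disparity change, plus the top and bottom of the column.
    """
    grad = np.abs(np.diff(disp_column))
    rows = np.argsort(grad)[-k:] + 1            # rows where disparity jumps
    return np.unique(np.concatenate(([0], rows, [len(disp_column)])))

def dp_on_candidates(column_cost, cuts):
    """Run the per-column stixel DP only over candidate cut rows."""
    n = len(cuts)
    best = np.full(n, np.inf); best[0] = 0.0
    back = np.zeros(n, dtype=int)
    for j in range(1, n):
        for i in range(j):
            c = best[i] + column_cost(cuts[i], cuts[j])
            if c < best[j]:
                best[j], back[j] = c, i
    return best[-1]

# Same toy column and placeholder cost as in the full O(h^2) sketch above
disp = np.concatenate([np.full(30, 5.0), np.full(20, 40.0), np.zeros(14)])
cost = lambda a, b: disp[a:b].var() + 1.0
print(dp_on_candidates(cost, naive_cut_candidates(disp)))
```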

Pre-segmentation using a DNN
Candidate stixel cuts are instead inferred from the inputs by a DNN, which can exploit more general relations in the data (including across columns). With the DNN-based pre-segmentation, accuracy now improves slightly.

Improved Performance Results
Slanted + Semantic Stixel Model, including time for semantic inference (fps):
  Without pre-segmentation:
    Tegra X1 (DrivePX):       17 (960x360)     8 (1280x480)    3 (1920x720)
    Tegra Parker (DrivePX2): 107 (960x360)    49 (1280x480)   17 (1920x720)
  With pre-segmentation:
    Tegra X1 (DrivePX):       37 (960x360)    17 (1280x480)    7 (1920x720)
    Tegra Parker (DrivePX2): 193 (960x360)    87 (1280x480)   35 (1920x720)
Pre-segmentation improves performance by about 2x on both DrivePX and DrivePX2. Now 15-30% of the time goes to the stixel algorithm and 70-85% to semantic inference. The increase in inference time is almost negligible (<10%), since most of the CNN used for pre-segmentation is shared with the CNN for semantic segmentation.

SYNTHIA Dataset Toolkit
An image generator of precisely annotated data for training DNNs on autonomous driving tasks. Ground-truth data: RGB plus per-pixel depth, semantic class, optical flow and 3D bounding boxes. Fully compatible with the Cityscapes classes. Generation of LIDAR data. Problem customization: e.g., SYNTHIA-San Francisco. www.synthia-dataset.net

Summary: real-sequence video.

Thank you
Dr. Juan C. Moure, juancarlos.moure@uab.es
http://grupsderecerca.uab.cat/hpca4se/en/content/gpu
Autonomous University of Barcelona