Deep Learning Requirements for Autonomous Vehicles

1 Deep Learning Requirements for Autonomous Vehicles
Pierre Paulin, Director of R&D, Synopsys Inc.
Chipex, 1 May

2 Agenda
- Deep Learning and Convolutional Neural Networks for Embedded Vision
- Automotive
- Why deep learning vs. conventional computer vision?
- State-of-the-art CNN algorithms
- Implementing Vision Processing on Embedded Systems
- Introduction to the DesignWare EV6x Embedded Vision Processor with CNN

3 Problem to Solve: Humans are Fallible Drivers
Annual Global Road Crash Statistics
- Nearly 1.3 million people die in road crashes each year, on average 3,287 deaths a day
- An additional million are injured or disabled
- Road crashes cost USD $518 billion globally, costing individual countries from 1-2% of their annual GDP
- About 94% of accidents are caused by human error (2% environment, 2% mechanical, 2% margin error)
Cameras in Automobiles
- Rear View Camera (Intuitive Parking Assist)
- Front Camera (Pedestrian Detection, AEB/Automatic Emergency Braking)
- Surround View Cameras
- Interior camera: drowsiness / gaze detection
Goal: To be More Responsive than a Human

4 Autonomous Driving Evolution
[Chart: Global Autonomous Vehicle Sales Forecast, in millions, by automation level L0 to L5]

5 Technical Challenge: Inferring Information from a 2D Image (Frame)
Example requirements:
- Performance priority: What is the highest performance I can get for pedestrian detection using an ADAS front camera? Measured in TOP/s or TMAC/s and/or FPS
- Power priority: How much CNN performance can I get for a power budget of 200 mW? Measured in TMAC/s/W or TOP/s/W
- Area priority: How much embedded vision performance can I get in 2 mm2? Measured in mm2 or TMAC/mm2
Other concerns:
- External memory (DDR) bandwidth
- Software tools / time to market
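The three priorities above reduce to simple ratios of peak throughput to power and area. A minimal sketch of that arithmetic follows. The 200 mW and 2 mm2 budgets come from the slide; the MAC count, clock frequency, workload size, and the `Accelerator` type itself are hypothetical values chosen only to make the numbers easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Hypothetical accelerator description; every field is an illustrative input."""
    macs_per_cycle: int  # parallel multiply-accumulate units (assumed)
    clock_hz: float      # clock frequency (assumed)
    power_w: float       # power budget in watts
    area_mm2: float      # silicon area budget in mm2


def tmacs_per_second(acc):
    """Peak throughput in tera-MACs per second."""
    return acc.macs_per_cycle * acc.clock_hz / 1e12


def tmacs_per_watt(acc):
    """Power-priority metric: TMAC/s/W."""
    return tmacs_per_second(acc) / acc.power_w


def tmacs_per_mm2(acc):
    """Area-priority metric: TMAC/s per mm2."""
    return tmacs_per_second(acc) / acc.area_mm2


def frames_per_second(acc, macs_per_frame, utilization=1.0):
    """Performance-priority metric: FPS for a network needing macs_per_frame MACs."""
    return acc.macs_per_cycle * acc.clock_hz * utilization / macs_per_frame


# 200 mW and 2 mm2 are the budgets named on the slide; the rest is made up.
example = Accelerator(macs_per_cycle=1000, clock_hz=1e9, power_w=0.2, area_mm2=2.0)
print(tmacs_per_second(example), tmacs_per_watt(example), tmacs_per_mm2(example))
print(frames_per_second(example, macs_per_frame=2e9))
```

Real utilization is normally below 1.0 because of memory stalls, which is one reason DDR bandwidth appears under "Other concerns".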

6 Traditional Computer Vision for Object Detection
Histogram of Oriented Gradients (HoG) example
- In the past, most pattern recognition tasks were performed on vector processing units with programs hand-tuned for feature extraction followed by shallow learning
- Histogram of Oriented Gradients (HoG): object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions
- Shallow learning, e.g. SVM
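The core HoG idea, a histogram of gradient directions weighted by gradient strength, can be sketched for a single cell as shown here. This is a simplified illustration, not a full HoG pipeline: it omits the block normalization and bin interpolation that practical implementations use, and the function name and the 9-bin default are assumptions. The resulting histograms would then be fed to a shallow classifier such as an SVM.

```python
import math


def hog_cell_histogram(image, bins=9):
    """Unsigned-orientation gradient histogram for one cell.

    image: list of rows of grayscale intensities (at least 3x3).
    Returns a list of `bins` values covering 0 to 180 degrees.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the intensity gradient.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into [0, 180): edges have no sign.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A vertical edge: intensity changes only along x, so all votes land in bin 0.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(hog_cell_histogram(vertical_edge))
```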

7 Recent EV Market Trends & Perspective
- Displacement of traditional vision algorithms with deep learning for improved accuracy, e.g. pedestrian detection (HoG) and face detection (Viola-Jones) moving to Convolutional Neural Network (CNN) based deep learning approaches
- Rapid evolution in deep learning technology to a wide set of applications: classification, localization, detection, semantic segmentation

8 Deep Learning using CNN
Convolutional Neural Networks
- Take advantage of the spatial structure of the image
- Shared weights/biases reduce the number of parameters
- Consist of convolution, pooling and fully connected layers
- Easier to train than a fully connected network
- Current standard for embedded vision object detection
[Figures: DNN network or graph; CNN architecture]
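The effect of shared weights on parameter count can be made concrete with a small calculation. The sketch below compares a convolution layer with a fully connected layer that produces the same number of outputs. The layer sizes (a 32x32 RGB input, 16 feature maps, 3x3 kernels) are illustrative assumptions, not figures from the slides.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output map; the kernel is reused at every position."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


def dense_params(in_features, out_features):
    """Every output has its own weight for every input, plus a bias."""
    return out_features * (in_features + 1)


# Assumed example: 32x32 RGB image, 16 output feature maps of the same size.
height, width, in_ch, out_ch = 32, 32, 3, 16

shared = conv_params(in_ch, out_ch, kernel_size=3)
unshared = dense_params(height * width * in_ch, height * width * out_ch)

print(shared)    # 448
print(unshared)  # 50348032
```

The convolution layer needs a few hundred parameters where the equivalent fully connected layer needs tens of millions, which is also part of why the convolutional form is easier to train.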

9 CNN for Object Detection
Pipeline: low-level features -> mid-level features -> high-level features -> classifier

10 CNN-based Object Classification
Top 5 classes:
1: moped
2: motor scooter, scooter
3: barrow, garden cart, lawn cart, wheelbarrow
4: tricycle, trike, velocipede
5: crash helmet
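A top-5 list like the one above is typically obtained by converting the classifier's raw output scores to probabilities and keeping the five largest. A minimal sketch, assuming a softmax output layer; the raw scores and the sixth label ("bicycle") are invented for illustration, and only the first five labels come from the slide.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels, scores, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["moped", "motor scooter", "wheelbarrow", "tricycle", "crash helmet", "bicycle"]
scores = [4.0, 3.0, 2.0, 1.0, 0.5, -1.0]  # hypothetical network outputs

for rank, (label, prob) in enumerate(top_k(labels, scores), start=1):
    print(f"{rank}: {label} ({prob:.3f})")
```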

11 Deep Learning Approaches Human Levels of Accuracy
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) results, classification error:
- Traditional computer vision (shallow): 28%, 26%
- AlexNet, 8 layers: 16%
- ZF, 8 layers: 12%
- VGG, 19 layers: 7.3%
- GoogLeNet, 22 layers: 6.7%
- ResNet, 152 layers: 3.6%
- CUImage: 3.0%
- BDAT (2017)
[Chart annotations: shallow vs. deep; human error level; traditional computer vision vs. deep learning computer vision]
ILSVRC Competition
- Research teams compete to achieve higher accuracy on several visual recognition tasks
- Algorithms must identify images belonging to one of a thousand categories
- 100% accuracy and reliability not realistic
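The slide does not say how classification error is scored. A common convention for ILSVRC classification is top-5 error: a prediction counts as correct if the true label is among the model's five highest-ranked guesses. The sketch below assumes that convention, and the sample predictions are invented.

```python
def top5_error(predictions, truths):
    """Fraction of samples whose true label is not among the first five guesses.

    predictions: list of ranked label lists, best guess first.
    truths: list of true labels, one per sample.
    """
    misses = sum(
        1 for ranked, truth in zip(predictions, truths) if truth not in ranked[:5]
    )
    return misses / len(truths)


# Hypothetical ranked outputs for four images.
predictions = [
    ["moped", "scooter", "wheelbarrow", "tricycle", "helmet"],
    ["cat", "dog", "fox", "wolf", "lynx"],
    ["car", "truck", "bus", "van", "tram", "bicycle"],
    ["tree", "bush", "grass", "moss", "fern"],
]
truths = ["scooter", "lynx", "bicycle", "flower"]

print(top5_error(predictions, truths))  # 0.5
```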

12 YOLO: Object Detection and Localization

13 CNN-based Denoiser
[Figure: before and after]

14 ICNet for Real-Time Semantic Segmentation
Source: "ICNet for Real-Time Semantic Segmentation on High-Resolution Images", Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia, 27 Apr

15 Neural Networks for Radar Waveform Recognition
- An automatic radar waveform recognition system to detect, track and locate low probability of intercept (LPI) radars
- The detected signals are processed into binary images, which are resized for the CNN
- The finished binary images are used in the CNN and for feature extraction
Source: Ming Zhang, Ming Diao, Lipeng Gao and Lutao Liu
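The preprocessing described above, turning a signal representation into a binary image and resizing it to the CNN's input size, might look roughly like the sketch below. The slide does not specify the authors' method; fixed thresholding and nearest-neighbor resizing are assumptions chosen for simplicity, and the input values are invented.

```python
def binarize(image, threshold):
    """Map each value to 1 if it reaches the threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


def resize_nearest(image, out_height, out_width):
    """Nearest-neighbor resize of a 2D list to the requested shape."""
    in_height, in_width = len(image), len(image[0])
    return [
        [image[r * in_height // out_height][c * in_width // out_width]
         for c in range(out_width)]
        for r in range(out_height)
    ]


# Hypothetical 4x4 intensity map derived from a detected signal.
intensity = [
    [0.1, 0.9, 0.2, 0.8],
    [0.7, 0.3, 0.6, 0.4],
    [0.0, 1.0, 0.5, 0.5],
    [0.9, 0.9, 0.1, 0.1],
]

binary = binarize(intensity, threshold=0.5)
cnn_input = resize_nearest(binary, out_height=2, out_width=2)
print(cnn_input)  # [[0, 0], [0, 1]]
```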

16 Choosing a Processor for Embedded Vision
[Figure: Embedded Vision as the overlap of Computer Vision, Machine Learning, and Embedded Systems (on-chip, cost efficient, energy efficient, real-time)]
Processor options compared on performance, power, and area: GPU, FPGA, VDSP, VDSP + CNN
- Leading GPU vendor: GoogLeNet at 138 fps / 119 W
- Synopsys EV6x: GoogLeNet at 200 fps / 0.5 W (16 nm FFC)
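The two GoogLeNet data points can be compared on a frames-per-second-per-watt basis. The short calculation below uses only the figures quoted on the slide; the helper name is an assumption, and the result does not account for differences in process node, batch size, or numeric precision, none of which the slide states for the GPU.

```python
def fps_per_watt(fps, watts):
    """Energy efficiency as frames per second per watt."""
    return fps / watts


gpu = fps_per_watt(fps=138, watts=119)   # leading GPU vendor figure
ev6x = fps_per_watt(fps=200, watts=0.5)  # Synopsys EV6x figure, 16 nm FFC

print(f"GPU:   {gpu:.2f} fps/W")
print(f"EV6x:  {ev6x:.2f} fps/W")
print(f"Ratio: {ev6x / gpu:.0f}x")
```

On these numbers the EV6x figure is roughly 345 times more efficient per watt.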

17 DesignWare EV6x

18 DesignWare EV6x Embedded Vision Processor IP
Scalable Hardware-Software Solution for High Accuracy Vision Processing
Vision CPU (1/2/4 cores): 32-bit scalar unit with SFPU and a vector DSP with VFPU per core
Wide Vector DSP Processing
- 8-, 16- & 32-bit datatypes
- Up to 776 OPs/cycle (16b)
- Up to 256 MACs/cycle (16b)
- Easy OpenCL C programming
Easy system integration
- Control and communication
- Low area and power
- C/C++ programming
Libraries (OpenCV) & API (OpenVX)
MetaWare EV
- Compilers / Debuggers (C/C++, OpenCL C)
- Simulators (fast NSIM, EV VDK)
- CNN Mapping Tool
CNN Engine (scalable)
- 3520 MAC Engine, 1760 MAC Engine, or 880 MAC Engine
- Convolution (Conv. 2D), Classification (Conv. 1D), DMA
High-performance CNN Engine
- Up to 3520 MACs/cycle
- Dedicated memory architecture, DMA
- Multi-dimension parallelism
- Supports 8 or 12 bit processing
- Automatic programming tools
Shared Memory
- Low latency access from all EV cores and CNN engine
- Shared memory visible to host
Fast, Easy SoC connection
- To host processor
- To access frame data
Background Memory Access
- Load next frame in advance from on- or off-chip frame buffer
Other blocks: Sync & Debug, Streaming Transfer Unit, AXI Interconnect

19 Denoiser Filter Results
Close comparison of 12-bit and 8-bit accuracy: original versus denoised
Metrics:
- PSNR: Peak Signal to Noise Ratio
- SSIM: Structural SIMilarity index
- ED: Euclidean Difference
Panels: Original; Noise added; Denoised (12b Fixed Point); Denoised (8b Fixed Point)
- Denoised (12b Fixed Point): PSNR 28.59, SSIM 93.5%
- Denoised (8b Fixed Point): PSNR 21.98, SSIM 81.7%
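Two of the metrics named above, PSNR and Euclidean difference, are straightforward to compute, and a toy experiment shows why bit width matters for them. The sketch below quantizes a smooth ramp of pixel values to 8 and 12 bits and compares the error. It illustrates fixed-point rounding only; it is not the EV6x denoiser, whose internal quantization the slides do not describe, and all function names are assumptions.

```python
import math


def quantize(value, bits):
    """Round a value in [0, 1] to an unsigned fixed-point grid with `bits` bits."""
    levels = (1 << bits) - 1
    clamped = min(1.0, max(0.0, value))
    return round(clamped * levels) / levels


def mse(reference, test):
    """Mean squared error between two equal-length pixel lists."""
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the signals match."""
    error = mse(reference, test)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak * peak / error)


def euclidean_difference(reference, test):
    """L2 distance between two pixel lists."""
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(reference, test)))


# A smooth ramp stands in for image data.
pixels = [i / 999 for i in range(1000)]
pixels_8b = [quantize(p, 8) for p in pixels]
pixels_12b = [quantize(p, 12) for p in pixels]

print(f"8b:  PSNR {psnr(pixels, pixels_8b):.1f} dB, ED {euclidean_difference(pixels, pixels_8b):.5f}")
print(f"12b: PSNR {psnr(pixels, pixels_12b):.1f} dB, ED {euclidean_difference(pixels, pixels_12b):.5f}")
```

Each extra bit halves the rounding step, so the 12-bit version shows a higher PSNR and a smaller Euclidean difference than the 8-bit one.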

20 EV6 Vector DSP and CNN Engine: Benefits of Specialization
Division of work:
- Scalar units: running stacks, decision making, etc.
- Vector DSPs: pixel processing, scaling / pyramids, filtering, etc.
- CNN engine: object classification, detection, localization, scene segmentation, etc.
DesignWare EV62 + CNN1760
- Scalar units, vector DSPs, CNN engine with 1760 12b MACs
- More MACs, higher accuracy, for same area
Competing implementations
- Option A: scalar unit, vector unit, CNN engine with 1024 8b MACs
- Option B: scalar unit, vector unit MACs

21 Summary
Fast growing automotive applications
- ADAS and Autonomous Driving
- Deep learning techniques, like convolutional neural networks, offer the highest accuracy for object classification, detection, and scene segmentation
- CNN replacing traditional computer vision algorithms
- Specialized CNN architecture offers area and power efficiencies, and higher accuracy for image quality improvement applications
DesignWare EV6x Processors
- Unified multicore processor for automotive vision processing
- Scalar + vector DSP + CNN engine
- State-of-the-art convolutional neural network (CNN)

22 Thank You
