S7105 ADAS/AD CHALLENGES: GPU SCHEDULING & SYNCHRONIZATION. Venugopala Madumbu, NVIDIA GTC D

Size: px

Start display at page:

Download "S7105 ADAS/AD CHALLENGES: GPU SCHEDULING & SYNCHRONIZATION. Venugopala Madumbu, NVIDIA GTC D"

Easter Conley
6 years ago
Views:

1 S7105 ADAS/AD CHALLENGES: GPU SCHEDULING & SYNCHRONIZATION Venugopala Madumbu, NVIDIA GTC D

2 ADVANCED DRIVING ASSIST SYSTEMS (ADAS) & AUTONOMOUS DRIVING (AD) High Compute Workloads Mapped to GPU 2

3 ADAS/AD Requirements & Challenges Real-Time Behavior Determinism Freedom from Interference Performance Maximum Throughput Minimal Latency Priority of Functionalities Multi-Core CPU GPU/DSP/HWA 3

ADAS/AD WORKLOADS Challenges Illustrated Scenario#1 Standalone Exec GL

Concurrent Exec GL Workload CUDA Workload > (X+Y) msec If so, How to

Workload over other While also having maximum throughput minimum

4 ADAS/AD WORKLOADS Challenges Illustrated Scenario#1 Standalone Exec GL Workload X msec Scenario#2 Standalone Exec CUDA Workload Scenario#3 Concurrent Exec GL Workload CUDA Workload > (X+Y) msec If so, How to Achieve determinism Achieve Freedom from interference Prioritize one Workload over other While also having maximum throughput minimum latency CUDA Workload GL Workload Y msec Time Shared GPU Execution Y msec X msec 4

5 GPU IN TEGRA High Level Tegra SoC Block Diagram CPU Host GPU Engines Other Clients (ISP, Display, etc.) CPU submits job/work to GPU GPU runs asynchronously to CPU GPU Memory Interface Memory Controller GPU has its own hardware scheduler (Host) It switches between workloads without CPU involvement DRAM 5

6 GPU SCHEDULING Concepts Channel independent stream of work on the GPU Command Push Buffer Command buffer written by Software and read by Hardware Channel Switching Save/restore GPU state on a channel switch Semaphores/SyncPoints Synchronization mechanism for events within the GPU Time Slice How long a GPU executes commands of a channel before a channel switch Run-list An ordered list of channels that SW wants the GPU to execute 6

7 GPU SCHEDULING Timesharing by Channel Switching Channel switching occurs when any ONE of the following happens: Time slice expires Engine runs out of work (no more commands) Blocked on a semaphore Channel Switch time = Drain Time + Save/Restore time Preemption can reduce Channel Switch times drastically App1 App2 App3 App4 Timesliced Round-Robin GPU GPU Occupancy Time 7

8 GPU SCHEDULING Preemption 8

9 GPU SCHEDULING Channel Switching with Time Slice Scenarios Channel 1 Time slice Channel Channel 1 1 Time slice Channel Channel 1 1 Time slice Channel Switch Timeout Channel Switch Timeout 1. Channel finishes before time slice expires Context switch to next channel Channel Reset 2. Channel preemption Stop all commands in pipeline Wait for engines to idle Higher Context Switch time 3. Channel Reset Engine could not idle and context could not save before channel switch timeout Callback to notify kernel of channel reset event 9

10 CHALLENGE REVISTED How can we achieve both? Real-Time behavior: Determinism Freedom from Interference Priority of Functionalities Performance: Maximum Throughput Minimal Latency 10

11 GPU SYNCHRONIZATION & SCHEDULING Software Control 1. User Driver Level (GPU Synchronization Approach) Syncpoints/Semaphores for Synchronization Through EglStreams, EGLSync etc 2. Kernel Driver Level (GPU Priority Scheduling Approach) Run-List Engineering How long channel runs Order of Channel execution 11

12 GPU SYNCHRONIZATION APPROACH No Synchronization Case Kernel launch GPU Semaphore GPU Priority GPU Task GPU Task Latency due to Concurrent Execution GPU Task CPU CPU Task CPU Task CPU Task msec 12

13 GPU SYNCHRONIZATION APPROACH Synchronization on CPU: Not good for GPU Kernel launch GPU Semaphore GPU Priority GPU Task GPU Task GPU Task CPU CPU Task CPU Task CPU Task msec 13

14 GPU SYNCHRONIZATION APPROACH Synchronization on GPU: No Context Switches Kernel launch GPU Semaphore GPU Priority GPU Task Delayed Start GPU Task GPU Task Determinism Freedom from Interference Priority of Functionalities CPU CPU Task CPU Task CPU Task msec 14

15 GPU PRIORITY SCHEDULING APPROACH Hypothetical Example TASK PRIORITY FPS WORST CASE EXECUTION TIME (WCET) H1 High 60 9ms H1 M1 Medium 30 4ms M1 M2 Medium 30 4ms M2 L1 Low/Best Effort 30 10ms L1 15

16 GPU PRIORITY SCHEDULING APPROACH Engineered Run-list and Time Slice Ensuring FPS and Latency H1 H1 H1 (Max Exec Time = 9 ms) M1 (Max Exec Time = 4 ms) M1 M1 Time slice = 9 ms Time slice = 3 ms M2 M2 L1 M2 (Max Exec Time = 4 ms) Time slice = 3 ms L1 (Max Exec Time = 10 ms) Time slice = 1 ms Run-List Ensured not >16ms for 60fps operation Work on GPU Time 16

17 GPU PRIORITY SCHEDULING APPROACH Reduce Latency for GPU Work Completion Ensure timeslice is long enough to complete work Ensure work is continually submitted and also well ahead in time To Avoid GPU idle time Unnecessary context switches 17

18 GPU SCHEDULING Best Practices to Keep GPU Busy Submit work in advance So the GPU has some work to execute at any point of time Try to reduce/eliminate work dependencies Have contingency plan for work overload If feedback shows over budget, submit work few frames ahead and spread Plan for worst case scenario Deal with GPU reset case esp for the Low priority cases GL Robustness Extensions 18

19 CONCLUSION GPU Synchronization & Scheduling Approaches Real-Time behavior: Determinism Freedom from Interference Priority of Functionalities Performance: Maximum Throughput Minimal Latency 19

20 Scott Whitman, NVIDIA Vladislav Buzov, NVIDIA Amit Rao, NVIDIA Yogesh Kini, NVIDIA ACKNOWLEDGEMENTS GTC Instructor led Lab:: L7105 EGLSTREAMS : INTEROPERABILITY OF CAMERA, CUDA AND OPENGL 11 TH MAY :30-11:30AM LL21D 20

21 Q&A 21

22 THANK YOU

EGLSTREAMS: INTEROPERABILITY FOR CAMERA, CUDA AND OPENGL. Debalina Bhattacharjee Sharan Ashwathnarayan

53023 - EGLSTREAMS: INTEROPERABILITY FOR CAMERA, CUDA AND OPENGL Debalina Bhattacharjee Sharan Ashwathnarayan Tegra SOC and typical use-cases Why Interops EGLStream and Its Key Features Agenda Examples