PERFORMANCE OPTIMIZATIONS FOR AUTOMOTIVE SOFTWARE

April 4-7, 2016, Silicon Valley
PERFORMANCE OPTIMIZATIONS FOR AUTOMOTIVE SOFTWARE
Pradeep Chandrahasshenoy, Automotive Solutions Architect, NVIDIA
Stefan Schoenefeld, ProViz DevTech, NVIDIA
4th April 2016

SESSION OVERVIEW
- Overview of the methodologies to optimize automotive HMI applications
- Introduction to the Tegra profiler tools
- Case study: Qt5 OSS samples

WHY OPTIMIZE?
Performance is User Experience
SOFTWARE DEFINED CAR
[Chart: lines of source code (in millions, axis 0 to 100) for a luxury car**, an IVI system**, the Linux kernel 4.x*, the Boeing 787** and the NASA Mars Rover#]
IDEAL VS REALITY
[Chart: used processing vs. available headroom (0% to 100%) for ideal car computers and today's car computer]
* Source: Linux kernel Wikipedia page: https://en.wikipedia.org/wiki/linux_kernel#lines_of_code
# Monitoring the Execution of Space Craft Flight Software, NASA
** IEEE: Automotive Designline

WHY OPTIMIZE? COMPLEXITY
- Multi-tasking & multi-rendering contexts
- Pixel explosion & multi-display synchronization

HOW TO OPTIMIZE? METHODOLOGY
IDENTIFYING BOTTLENECKS
- CPU
- GPU
- Memory bandwidth
- HW accelerators
- Other
WHAT'S NEEDED
- System level
- Your application & libraries: instrumentation (how much time is spent in every module?)
- Third-party libraries and drivers: tools (what do they do?)
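
The question "how much time is spent in every module?" can be answered coarsely with manual instrumentation before reaching for a profiler. The following is a minimal sketch and not part of the original deck: `ModuleTimes` and `ScopedTimer` are hypothetical helper names, and the module names passed to them are placeholders.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulates wall-clock time and call counts per named module.
// (Hypothetical helper; single-threaded use is assumed.)
class ModuleTimes {
public:
    void add(const std::string &module, std::chrono::nanoseconds elapsed) {
        totals_[module] += elapsed;
        ++calls_[module];
    }
    std::chrono::nanoseconds total(const std::string &module) const {
        auto it = totals_.find(module);
        return it == totals_.end() ? std::chrono::nanoseconds{0} : it->second;
    }
    int calls(const std::string &module) const {
        auto it = calls_.find(module);
        return it == calls_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, std::chrono::nanoseconds> totals_;
    std::map<std::string, int> calls_;
};

// RAII helper: measures the lifetime of the enclosing scope and
// reports it to the sink when the scope is left.
class ScopedTimer {
public:
    ScopedTimer(ModuleTimes &sink, std::string module)
        : sink_(sink), module_(std::move(module)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        sink_.add(module_,
                  std::chrono::duration_cast<std::chrono::nanoseconds>(end - start_));
    }
    ScopedTimer(const ScopedTimer &) = delete;
    ScopedTimer &operator=(const ScopedTimer &) = delete;

private:
    ModuleTimes &sink_;
    std::string module_;
    std::chrono::steady_clock::time_point start_;
};
```

Placing a `ScopedTimer` at the top of each module's per-frame entry point gives per-module totals. That is usually enough to decide which module deserves a closer look with a real profiler.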

HOW TO OPTIMIZE? KEY TEGRA TOOLS OVERVIEW
- Tegra System Profiler
- Tegra Graphics Debugger

TEGRA SYSTEM PROFILER (TSP)
Multi-core CPU profiler for Tegra
- Easily prepare a device and deploy an application for profiling
- Quickly identify CPU hot spots, hot paths and L1/L2 cache issues
- Visualize multi-core CPU activities with a new timeline view
- Maximize multi-core CPU utilization
- Visualize CPU, GPU and EMC frequencies
- Visualize thread state

TEGRA GRAPHICS DEBUGGER (TGD)
A console-grade tool to debug & profile OpenGL ES
TGD enables graphics development, debugging & optimization on Tegra devices for OpenGL ES 2.0, 3.0 & 3.1 applications.
- Identifying performance bottlenecks and GPU utilization
- Interactive examination of GPU pipeline state
- Real-time examination of draw calls

PROFILING SETUP OVERVIEW
DRIVE CX WITH LINUX
[Diagram: the host PC connects to the DRIVE CX over SSH; the DRIVE CX display output feeds the display]

QT5: CASE STUDY
With Qt5 samples
BIG SCENE (qt3d)
- Lots of small geometry
- Many draw calls
- Scene graph usage
- Tools showcase: TSP, NVTX
PLANETS (qt3d, qml, quick)
- GPU intensive
- Optimizing GL call stack
- Tools showcase: TGD

QT3D RENDER.CPP
    Attribute *Renderer::updateBuffersAndAttributes(Geometry *geometry, RenderCommand *command,
                                                    GLsizei &count, bool forceUpdate)
    {
        Attribute *indexAttribute = Q_NULLPTR;
        uint estimatedCount = 0;
        m_dirtyAttributes.reserve(m_dirtyAttributes.size() + geometry->attributes().size());
        Q_FOREACH (const QNodeId &attributeId, geometry->attributes()) {
            Attribute *attribute = m_nodesManager->attributeManager() ;
            if (attribute == Q_NULLPTR)
                continue;

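NVIDIA TOOLS EXTENSION (NVTX)
    #include "nvToolsExt.h"
    void hotspotfunc()
    {
        // Drop a single named marker on the profiler timeline.
        nvtxMarkA("hotspot reached");
    }
    void render()
    {
        // Open a named range; it is shown as a bar on the timeline until it is ended.
        nvtxRangeId_t r = nvtxRangeStartA("rendering scene");
        // render everything
        nvtxRangeEnd(r);
    }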

GL STATE CACHING
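
The slide itself carries no text beyond its title, so the following is only a minimal sketch of what GL state caching usually means: remember the last value sent to the driver and skip calls that would not change it. `StateCache` and `StateBackend` are hypothetical names. The backend callback stands in for driver entry points such as `glEnable`/`glDisable`, so the sketch compiles without a GL context.

```cpp
#include <functional>
#include <unordered_map>
#include <utility>

// Stand-in for the driver. In a real application the callback would
// forward to glEnable(cap) or glDisable(cap). (Assumption for this sketch.)
struct StateBackend {
    std::function<void(unsigned cap, bool on)> setCapability;
};

// Filters redundant capability changes before they reach the driver.
class StateCache {
public:
    explicit StateCache(StateBackend backend) : backend_(std::move(backend)) {}

    // Returns true if the call was forwarded, false if it was redundant.
    bool setCapability(unsigned cap, bool on) {
        auto it = caps_.find(cap);
        if (it != caps_.end() && it->second == on)
            return false;               // driver already has this value
        backend_.setCapability(cap, on);
        caps_[cap] = on;
        return true;
    }

    // Call whenever code outside the cache may have touched GL state
    // (third-party libraries, context switches); the cache then re-learns.
    void invalidate() { caps_.clear(); }

private:
    StateBackend backend_;
    std::unordered_map<unsigned, bool> caps_;
};
```

The cache deliberately starts empty instead of assuming GL default values, so the first request for each capability is always forwarded. This costs one extra call per capability but avoids relying on the state the context was left in.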

UNIFORM CACHING
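
As with state caching, the slide only names the technique. A minimal sketch, assuming the usual meaning: remember the last value uploaded for each (program, location) pair and skip uploads that repeat it. `UniformCache` and its `Upload` callback are hypothetical; in a real renderer the callback would wrap a call such as `glUniform4fv`.

```cpp
#include <array>
#include <functional>
#include <map>
#include <utility>

using Vec4 = std::array<float, 4>;

// Skips uniform uploads whose value is already stored in the program.
class UniformCache {
public:
    // Stand-in for the real upload call. (Assumption for this sketch.)
    using Upload = std::function<void(unsigned program, int location, const Vec4 &)>;

    explicit UniformCache(Upload upload) : upload_(std::move(upload)) {}

    // Returns true if the value was uploaded, false if it was redundant.
    bool set(unsigned program, int location, const Vec4 &value) {
        const auto key = std::make_pair(program, location);
        auto it = last_.find(key);
        if (it != last_.end() && it->second == value)
            return false;
        upload_(program, location, value);
        last_[key] = value;
        return true;
    }

    // Uniform values belong to the program object, so cached entries
    // must be dropped when a program is relinked or deleted.
    void forgetProgram(unsigned program) {
        for (auto it = last_.begin(); it != last_.end();) {
            if (it->first.first == program)
                it = last_.erase(it);
            else
                ++it;
        }
    }

private:
    Upload upload_;
    std::map<std::pair<unsigned, int>, Vec4> last_;
};
```

Keying on the program as well as the location matters because the same location index refers to different uniforms in different programs.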

EFFICIENT GPU PROGRAMMING BEST PRACTICES
STATES
- Do not set states redundantly
- Try to sort draw calls according to common states
- Disable unused vertex arrays
GEOMETRY
- Use buffer objects
- Pack small buffers into a single one and use one draw call
- Use indexed primitives
- Pack vertex attributes
- Use uniform winding (clockwise or counter-clockwise) for geometry
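
Sorting draw calls by common state can be sketched as follows. This is an illustration and not code from the deck: `DrawCall`, `sortByState` and `countStateChanges` are hypothetical, and the sort key (program first, then texture) is an assumption about which state switches cost most.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// One queued draw with the state it needs. (Hypothetical record.)
struct DrawCall {
    unsigned program;  // shader program handle
    unsigned texture;  // bound texture handle
    unsigned mesh;     // which geometry to draw
};

// Counts how many program/texture binds a naive submitter would issue
// if it rebinds whenever the value differs from the previous draw.
std::size_t countStateChanges(const std::vector<DrawCall> &calls) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < calls.size(); ++i) {
        if (i == 0 || calls[i].program != calls[i - 1].program) ++changes;
        if (i == 0 || calls[i].texture != calls[i - 1].texture) ++changes;
    }
    return changes;
}

// Groups draws that share state so consecutive draws need fewer binds.
// stable_sort keeps the submission order within each state group.
void sortByState(std::vector<DrawCall> &calls) {
    std::stable_sort(calls.begin(), calls.end(),
                     [](const DrawCall &a, const DrawCall &b) {
                         return std::tie(a.program, a.texture) <
                                std::tie(b.program, b.texture);
                     });
}
```

Reordering like this is only safe for draws whose result does not depend on submission order, typically opaque geometry with depth testing. Blended geometry usually has to keep its back-to-front order.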

EFFICIENT GPU PROGRAMMING BEST PRACTICES
TEXTURES
- Use texture compression when possible
- Prefer immutable textures created with glTexStorage[23]D()
- Use mipmaps
- Consider using texture atlases/maps
- Avoid random access
- Update textures with glTexSubImage[23]D()
- Update dynamically generated textures through FBOs

EFFICIENT GPU PROGRAMMING BEST PRACTICES
RENDERING
- If possible, render front to back
- Avoid reading back from the GPU
- Disable modes/tests that you do not need
- Clear buffers only if you need to
- Avoid memory management during runtime
- Update data only when needed
- Cull early and often
- Do computations as early as possible
- Use a shader cache for faster application start
- Use instancing
- Use indirect draw calls

CONCLUSION
- Optimize as you develop
- Identify your use cases
- Get an overview of the application modules: how much time is spent in every module?
- Profile the modules for hot spots
- Invest the most time in reducing the big hot spots
- Get the low-hanging fruit first
- Use Tegra Graphics Debugger to analyze your GPU usage
- Optimize your GL stream and minimize driver overhead

RECOMMENDED SESSIONS
TALK, TUTORIAL, HANDS ON LAB, HANGOUTS
- S6181 - Memory Bandwidth Bootcamp: Collaborative Access Patterns
- S6710 - Developer Tools for Next Generation Graphics APIs
- S6131 - Nvpro-Pipeline: Handling Massive Transform Updates in a SceneGraph
- S6810 - Optimizing Application Performance with CUDA Profiling Tools
- S6111, S6112 - NVIDIA CUDA Optimization with NVIDIA Nsight Eclipse Edition
- L6135A, L6135B - Jetson Developer Tools Lab
- H6122, H6157 - Performance Optimization & Analysis

THANK YOU
JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join