April 4-7, 2016 Silicon Valley

Similar documents
NEW DEVELOPER TOOLS FEATURES IN CUDA 8.0. Sanjiv Satoor

NSIGHT ECLIPSE EDITION

PERFWORKS A LIBRARY FOR GPU PERFORMANCE ANALYSIS

CUDA Development Using NVIDIA Nsight, Eclipse Edition. David Goodwin

PERFORMANCE OPTIMIZATIONS FOR AUTOMOTIVE SOFTWARE

NSIGHT ECLIPSE EDITION

CUDA Tools for Debugging and Profiling

THE LEADER IN VISUAL COMPUTING

NVIDIA Parallel Nsight. Jeff Kiel

Enabling the Next Generation of Computational Graphics with NVIDIA Nsight Visual Studio Edition. Jeff Kiel Director, Graphics Developer Tools

April 4-7, 2016 Silicon Valley. CUDA DEBUGGING TOOLS IN CUDA8 Vyas Venkataraman, Kudbudeen Jalaludeen, April 6, 2016

OPTIMIZING HPC SIMULATION AND VISUALIZATION CODE USING NVIDIA NSIGHT SYSTEMS

Graphics Performance Analyzer for Android

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

NVIDIA Nsight Visual Studio Edition 4.0 A Fast-Forward of All the Greatness of the Latest Edition. Sébastien Dominé, NVIDIA

Raise your VR game with NVIDIA GeForce Tools

CUDA Tools for Debugging and Profiling. Jiri Kraus (NVIDIA)

S CUDA on Xavier

Qualcomm Snapdragon Profiler

GPU Computing Master Clss. Development Tools

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator

NSIGHT ECLIPSE EDITION

Tesla GPU Computing A Revolution in High Performance Computing

Positional tracking for VR/AR using Stereo Vision. Edwin AZZAM, CTO Stereolabs

Autonomous Driving Solutions

Profiling and Debugging Games on Mobile Platforms

More performance options

EE , GPU Programming

Seamless Compute and OpenGL Graphics Development in NVIDIA Nsight 3.0 Visual Studio Edition and Beyond 3/20/2013

OpenACC Course. Office Hour #2 Q&A

CSE 591: GPU Programming. Programmer Interface. Klaus Mueller. Computer Science Department Stony Brook University

Jomar Silva Technical Evangelist

Parallel Programming and Debugging with CUDA C. Geoff Gerfin Sr. System Software Engineer

The Witness on Android Post Mortem. Denis Barkar 3 March, 2017

OPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER

Deep Learning: Transforming Engineering and Science The MathWorks, Inc.

Tegra 250 Development Kit Android Setup Experience

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

CUDA on ARM Update. Developing Accelerated Applications on ARM. Bas Aarts and Donald Becker

PROFILER. DU _v5.0 October User's Guide

Using the PowerVR SDK to Optimize your Renderer

CUDA 7.5 OVERVIEW WEBINAR 7/23/15

System Wide Tracing User Need

Intel Parallel Studio XE 2017 Composer Edition BETA C++ - Debug Solutions Release Notes

The PowerVR Insider SDK. PowerVR Developer Technology

Intel System Studio 2014 Overview

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

Intel Integrated Native Developer Experience 2015 Update 2(OS X* Host)

Profiling of Data-Parallel Processors

Android Sdk Tutorial For Windows 7 64 Bit Full Version

Nvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018

AMD CodeXL 1.3 GA Release Notes

Siggraph Agenda. Usability & Productivity. FX Composer 2.5. Usability & Productivity 9/12/2008 9:16 AM

Expose NVIDIA s performance counters to the userspace for NV50/Tesla

ArcGIS Runtime: Building Cross-Platform Apps. Rex Hansen Mark Baird Michael Tims Morten Nielsen

Developers Road Map to ArcGIS Desktop and ArcGIS Engine

Infrastructure Middleware (Part 3): Android Runtime Core & Native Libraries

Android. Lesson 1. Introduction. Android Developer Fundamentals. Android Developer Fundamentals. to Android 1

Intel INDE Integrated Native Developer Experience

ArcGIS Runtime SDK for Java: A Beginner s Guide. Mark Baird JC Malott

NVIDIA CUDA TOOLKIT

SHWETANK KUMAR GUPTA Only For Education Purpose

NVIDIA CUDA TOOLKIT

An Introduction to Android. Jason Chen Developer Advocate Google I/O 2008

Turing Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA

Systems software design. Software build configurations; Debugging, profiling & Quality Assurance tools

Using Intel VTune Amplifier XE and Inspector XE in.net environment

High-Productivity CUDA Programming. Cliff Woolley, Sr. Developer Technology Engineer, NVIDIA

NVIDIA CUDA TOOLKIT

ADVANCED trouble-shooting of real-time systems. Bernd Hufmann, Ericsson

Android PerfHUD ES quick start guide

Profiling GPU Code. Jeremy Appleyard, February 2016

μc/probe on the RIoTboard (Linux)

ArcGIS Runtime SDK for.net Getting Started. Jo Fraley

Ecosystem Overview Neil Trevett Khronos President NVIDIA Vice President Developer

GTC 2013 March San Jose, CA The Smartest People. The Best Ideas. The Biggest Opportunities. Opportunities for Participation:

CUDA on ARM Update. Developing Accelerated Applications on ARM. Bas Aarts and Donald Becker

EGLSTREAMS: INTEROPERABILITY FOR CAMERA, CUDA AND OPENGL. Debalina Bhattacharjee Sharan Ashwathnarayan

S WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS. Jakob Progsch, Mathias Wagner GTC 2018

SIGGRAPH Briefing August 2014

Advanced CUDA Optimization 1. Introduction

Streaming mode snapshot mode Faster Troubleshooting Higher Quality Better Performance Control System Tuning Other Benefits

DNWSH - Version: 2.3..NET Performance and Debugging Workshop

IDE for medical device software development. Hyun-Do Lee, Field Application Engineer

CSE 599 I Accelerated Computing Programming GPUS. Intro to CUDA C

Adding Advanced Shader Features and Handling Fragmentation

Porting mobile web application engine to the Android platform

Beyond Hardware IP An overview of Arm development solutions

NVIDIA Developer Tools for Graphics and PhysX

Resource 2 Embedded computer and development environment

CUDA 5 and Beyond. Mark Ebersole. Original Slides: Mark Harris 2012 NVIDIA

PROFILER USER'S GUIDE. DU _v7.5 September 2015

μc/probe on the element14 BeagleBone Black

TR An Overview of NVIDIA Tegra K1 Architecture. Ang Li, Radu Serban, Dan Negrut

WHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016

Mobile and Ubiquitous Computing: Android Programming (part 1)

Android Application Development A Beginners Tutorial

GPU Programming. Rupesh Nasre.

Mobile AR Hardware Futures

Transcription:

April 4-7, 2016 Silicon Valley

TEGRA PLATFORMS GAMING DRONES ROBOTICS IVA AUTOMOTIVE 2

Compile Debug Profile Trace C/C++ NVTX NVIDIA Tools extension Getting Started CodeWorks JetPack Installers IDE Integration Standalone and CLI Hardware Support 3

SOFTWARE DEVELOPMENT WORKFLOW Desktop Tools Software Development CodeWorks JetPack Install. Toolchain Setup Cross-compilation Porting Running Nsight EE Nsight Tegra VSE Tegra System Profiler Nsight EE Tegra Graphics Debugger Tegra Graphics Debugger CUDA Visual Profiler Remote Debugging Remote Profiling Ship it! Debugging CPU/GPU Profiling System/CPU/GPU/IO/ Cuda-gdb Cuda-memcheck PerfWorks CUPTI nvprof 4

COMMON CORE TOOLS OFFERING Specialization Where Needed CUDA Debugging and Profiling Graphics Debugging and Profiling GPU Performance Counters Libraries CPU Profiling and System Trace NVIDIA Tools Extension (NVTX) Embedded Gaming Tegra Common Tools Automotive NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. 5

GETTING STARTED CodeWorks 1r4 ANDROID TOOL CHAIN Jump starts developing for SHIELD/Android platform Installs Android NDK & SDK tool chain Installs Developer tools, GameWorks, Libraries, Reference documentation and samples Cross-compiles code samples, pushes them to devkit And Runs one sample Android SDK/NDK Java TEGRA ANDROID TOOLKIT Tools Docs Samples ANDROID OS IMAGES Eclipse Kit Kat Marshmallow Lollipop NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. 6

GETTING STARTED JetPack Installer For DriveCX/PX and Jetson Jump starts developing for Embedded platforms Installs Linux ARM cross-compilation tool chain Installs Developer tools, CUDA, Libraries, Flashes Drive PX/CX, Jetson TK1/TX1 OS Image Reference documentation and samples Compiles code samples, pushes them to devkit And Runs one sample NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. 7

JETPACK AND CODEWORKS New SDK Component Manager Parallel downloads Component update notification Component dependency resolution Integrated terminal window 8

CUDA STANDALONE TOOLS Visual Profiler Trace CUDA activities Profile CUDA kernels Correlate performance instrumentation with source code Expert-guided performance analysis NVPROF Collect performance events and metrics GPU Library Advisor Detect CUDA library optimization opportunities NVDISASM, CUOBJDUMP CUDA-MEMCHECK Detect out-of-bounds memory accesses Detect race condition in memory accesses Detect uninitialized variable accesses Detect incorrect GPU thread synchronization CUDA-GDB Debug CUDA kernels with CLI Debug CPU and GPU code CPU and GPU core dump support 9

NVIDIA NSIGHT ECLIPSE EDITION Homogeneous application development for CPU+GPU compute platforms CUDA-Aware Editor CUDA Debugger CPU+GPU CUDA Profiler 10

CUDA 8.0 For Tegra Platforms Tegra Parker Support CUDA Debugging with Compute Preemption CUDA Profiling with advanced PC Sampling metrics CUDA Visual Profiler with Critical Path Analysis 11

DEPENDENCY ANALYSIS Provide insight into application-level performance limiters Expose dependencies between activities according to the programming model Identify waiting time due to inter-stream dependencies Highlight activities on the critical application runtime path Supports CUDA (Linux/Mac/Windows) and POSIX threads (Linux/Mac) NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. 12

DEPENDENCY ANALYSIS EXAMPLE Dependencies between events derived from programming model constraints Allows to compute wait states and the critical path cudalaunch cudastreamsynchronize Stalls CPU (waiting time) Kernel NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. 13

DEPENDENCY ANALYSIS IN NVPROF New option to run post-mortem dependency analysis New option to trace POSIX threads for multi-threaded applications ==22557== Dependency Analysis: Critical path(%) Critical path Waiting time Name 94.61% 3.942181s 0ns clock_block(long*, long) 5.20% 216.857718ms 0ns cudamalloc 0.16% 6.617667ms 0ns <Other> 0.01% 293.028000us 0ns cudevicegetattribute 0.01% 235.154000us 0ns cudagetdeviceproperties 0.01% 221.116000us 0ns cudafree 0.00% 158.703000us 0ns cudastreamcreate 0.00% 35.252000us 0ns cudaconfigurecall 0.00% 35.248000us 0ns cudevicegetname 0.00% 33.139000us 0ns cudevicetotalmem_v2 0.00% 20.298000us 0ns cudasetupargument 0.00% 19.433000us 0ns cudagetdevice 0.00% 0ns 3.942147s pthread_join 0.00% 0ns 3.942136s cudastreamsynchronize 0.00% 0ns 1.001459s pthread_mutex_lock 0.00% 0ns 980.464357ms pthread_cond_wait NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. 14

DEMO - DEPENDENCY ANALYSIS IN VISUAL PROFILER 15

TEGRA GRAPHICS DEBUGGER Next-gen graphics development tools Supports OpenGL ES 2.0/3.0/3.1 + Android Extension Pack Monitor key software and hardware performance metrics Debug draw calls, related states and resources Live capture of a single rendering frame Edit and recompile shaders live Automatic GPU bottleneck analysis Advanced timings for draw calls and kernel dispatches 16

NEW WITH TEGRA GRAPHICS DEBUGGER 2.X Graphics Range Profiler Advanced Interposer for non-rooted devices Highlight redundant API state changes NVTX support for perf markers and ranges Highlight drawcalls based on shader selection 17

TEGRA SYSTEM PROFILER Multi-core CPU profiler for all Tegra platforms Easily prepare a device and deploy application for profiling Maximize multi-core CPU utilization Quickly identify CPU hot spots, hot paths and L1/L2 cache issues Visualize multi-core CPU activities with a new timeline view Time range filtering 18

NEW WITH TEGRA SYSTEM PROFILER 2.5/2.6 Tegra Parker support and Expanded system trace NVIDIA Tools extension Support (NVTX) Visualize thread state: running/ready/blocked Trace CUDA kernel workload execution (Jetson/DriveCX/PX) Trace OpenGL-ES API calls Visualize CPU and EMC frequencies 19

STEREOLABS Real-time HD SLAM with GPU processing @ 25Hz Zed Jetson TX1 2560 x 720 @ 60Hz USB3.0 20

DEMO STEREOLABS 21

PERFWORKS Next-gen GPU Performance Counter Data Collection C API for collecting GPU performance counters and data from NVIDIA GPUs. CUDA, OpenGL, OpenGL ES, D3D11, and D3D12 Cross-Platform, with support for Kepler, Maxwell, Pascal GPUs Target Audience: tools developers, engine developers Schedule: Beta in 3Q 2016 22

PERFWORKS SDK Successor to the NVIDIA Perfkit SDK (NVPMAPI) Collect GPU metrics for Performance Monitoring (e.g. HUD). Automated serialized drawcall bottleneck analysis Adds range-based profiling Collect metrics per user defined range, draw calls, or dispatches (Perf Markers, RTs,...) Supports multi-threaded GPU work submission Collect data on modern multi-threaded APIs Collect consistent metrics across multiple generations of NVIDIA GPUs 23

NVTX 1R2 Events and Domains Win10 Differentiate annotations from libraries and application Middleware libraries have their own domain Process 0 Process 1 Thread 0 DX12 Memory GPU Thread 1 Annotation HairWorks Annotation CUDA Memory GPU Thread 2 PhysX Annotation CUDA Memory GPU build 3D world Animate Character Animate Hair ODE World physics simulatio User-defined and named synchronization primitives System CPU Core 0 Core 1 Freq. GPU Graphics Compute 24

GAME DEVELOPERS Enable Game Developer to easily port to SHIELD/Android CodeWorks 1r4 NDK profiling Same Graphics Debugging and Profiling Tools as PC Support consumer devices No rooted OS requirement Basic feature set support on non-tegra Visual Studio support 25

NVIDIA NSIGHT TEGRA VISUAL STUDIO EDITION Android NDK/JDK application development Project Management Android Debugging Logcat Filtering 26

NEW WITH NSIGHT TEGRA 3.3 Android Marshmallow and ARMv8 AArch64 Support for non-tegra devices Tegra Graphics Debugger Attach SIGSEV Signal handler NDK r10e win64 (Link massive games + new options) CMake GDB 7.9 win64 CMake 3.1 support Visual Studio 2015 27

Android GDB debugging in Visual Studio Set breakpoints in both Java and Native (C/C++) Build Native Android projects in Visual Studio using vs-android, ndk-build or makefiles. Use the familiar Visual Studio Locals, Watches, Memory and Breakpoints windows. NVIDIA CONFIDENTIAL 28

AUTOMOTIVE Nsight Eclipse Edition NextGen Build, Debug and Profile CUDA applications Required Eclipse version 4.4 or later Developed based on Eclipse CDT/DSF framework. Using Eclipse remote system explorer plugins to connect to the remote devices. Nsight plugins delivered as archive file(zip) and installed using standard Eclipse Can co-exist with other Eclipse plugins in the user environment. 30

SCREENSHOTS Launch shortcuts for cuda-gdb Debug session views NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. 31

FUTURE Multi-process support PerfWorks Tegra System Profiler Multi-Node DrivePX2 with 2 SoCs LTTNg GPU event provider GPU process trace 32

Q&A https://developer.nvidia.com Monday 4/4 - S6111, NVIDIA CUDA Optimization with NVIDIA Nsight Eclipse Edition: A Case Study 9:00 - Room 211A Tuesday 4/5 - S6659, Perfworks: A Library for GPU Performance Analysis 15:00 - Room 211B Wednesday 4/6 - L6135A/B, Jetson Developer Tools Lab 13:30/15:30 Room 210C Thursday 4/7 - S6810, Optimizing App. Performance w/ CUDA Profiling Tools 10:00 - Room 211B 33