GPU Computing Master Clss. Development Tools

Similar documents
CUDA GPGPU Workshop CUDA/GPGPU Arch&Prog

Introduction to GPGPU and GPU-architectures

Device Memories and Matrix Multiplication

GPU Computing with CUDA

CUDA Development Using NVIDIA Nsight, Eclipse Edition. David Goodwin

NVIDIA Parallel Nsight. Jeff Kiel

Tesla GPU Computing A Revolution in High Performance Computing

GPU programming: CUDA. Sylvain Collange Inria Rennes Bretagne Atlantique

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Introduction to CUDA programming

PPAR: CUDA basics. Sylvain Collange Inria Rennes Bretagne Atlantique PPAR

GPU & High Performance Computing (by NVIDIA) CUDA. Compute Unified Device Architecture Florian Schornbaum

Lecture 7: Caffe: GPU Optimization. boris.

NSIGHT ECLIPSE EDITION

Introduction to GPU programming. Introduction to GPU programming p. 1/17

Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator

Hands-on CUDA exercises

Real-time Graphics 9. GPGPU

CUDA PROGRAMMING MODEL. Carlo Nardone Sr. Solution Architect, NVIDIA EMEA

Module Memory and Data Locality

Cooperative Multitasking for GPU-Accelerated Grid Systems

ME964 High Performance Computing for Engineering Applications

NSIGHT ECLIPSE EDITION

Parallel Programming and Debugging with CUDA C. Geoff Gerfin Sr. System Software Engineer

Tesla GPU Computing A Revolution in High Performance Computing

Real-time Graphics 9. GPGPU

CS : Many-core Computing with CUDA

CUDA Parallelism Model

Computer Graphics Presenting The Visual Future

Josef Pelikán, Jan Horáček CGG MFF UK Praha

CUDA C Programming Mark Harris NVIDIA Corporation

Advanced CUDA Optimization 1. Introduction

April 4-7, 2016 Silicon Valley

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

April 4-7, 2016 Silicon Valley. CUDA DEBUGGING TOOLS IN CUDA8 Vyas Venkataraman, Kudbudeen Jalaludeen, April 6, 2016

GPU Computing with CUDA

Introduction to CUDA

Technische Universität München. GPU Programming. Rüdiger Westermann Chair for Computer Graphics & Visualization. Faculty of Informatics

Hardware/Software Co-Design

Tesla Architecture, CUDA and Optimization Strategies

This is a draft chapter from an upcoming CUDA textbook by David Kirk from NVIDIA and Prof. Wen-mei Hwu from UIUC.

Dense Linear Algebra. HPC - Algorithms and Applications

COSC 6374 Parallel Computations Introduction to CUDA

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Computational Fluid Dynamics (CFD) using Graphics Processing Units

Lab 1 Part 1: Introduction to CUDA

GPU Programming. Performance Considerations. Miaoqing Huang University of Arkansas Fall / 60

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions.

Practical Introduction to CUDA and GPU

Debugging Your CUDA Applications With CUDA-GDB

Supporting Data Parallelism in Matcloud: Final Report

Seamless Compute and OpenGL Graphics Development in NVIDIA Nsight 3.0 Visual Studio Edition and Beyond 3/20/2013

ECE 574 Cluster Computing Lecture 17

How to use OpenMP within TiViPE

NSIGHT ECLIPSE EDITION

CS 470 Spring Other Architectures. Mike Lam, Professor. (with an aside on linear algebra)

Shaders. Slide credit to Prof. Zwicker

Register file. A single large register file (ex. 16K registers) is partitioned among the threads of the dispatched blocks.

Massively Parallel Architectures

GPU Programming. Alan Gray, James Perry EPCC The University of Edinburgh

Lecture 5. Performance Programming with CUDA

CUDA Programming. Week 1. Basic Programming Concepts Materials are copied from the reference list

NVIDIA Nsight Visual Studio Edition 4.0 A Fast-Forward of All the Greatness of the Latest Edition. Sébastien Dominé, NVIDIA

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

Outline 2011/10/8. Memory Management. Kernels. Matrix multiplication. CIS 565 Fall 2011 Qing Sun

Scientific discovery, analysis and prediction made possible through high performance computing.

What does Fusion mean for HPC?

Graph Partitioning. Standard problem in parallelization, partitioning sparse matrix in nearly independent blocks or discretization grids in FEM.

GPGPU, 4th Meeting Mordechai Butrashvily, CEO GASS Company for Advanced Supercomputing Solutions

Information Coding / Computer Graphics, ISY, LiTH. Introduction to CUDA. Ingemar Ragnemalm Information Coding, ISY

Lecture 3: Introduction to CUDA

Cooperative Multitasking for GPU-Accelerated Grid Systems

Enabling the Next Generation of Computational Graphics with NVIDIA Nsight Visual Studio Edition. Jeff Kiel Director, Graphics Developer Tools

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Introduction OpenCL Code Exercices. OpenCL. Tópicos em Arquiteturas Paralelas. Peter Frank Perroni. November 25, 2015

GPU Computing with CUDA

NEW DEVELOPER TOOLS FEATURES IN CUDA 8.0. Sanjiv Satoor

GPU programming. Dr. Bernhard Kainz

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Introduction to GPU Computing Using CUDA. Spring 2014 Westgid Seminar Series

Parallel Computing. Lecture 19: CUDA - I

ECE 574 Cluster Computing Lecture 15

MIGRATION OF LEGACY APPLICATIONS TO HETEROGENEOUS ARCHITECTURES Francois Bodin, CTO, CAPS Entreprise. June 2011

Compiling and Executing CUDA Programs in Emulation Mode. High Performance Scientific Computing II ICSI-541 Spring 2010

Introduction to GPU Computing Using CUDA. Spring 2014 Westgid Seminar Series

Introduction to Parallel Computing with CUDA. Oswald Haan

E6895 Advanced Big Data Analytics Lecture 8: GPU Examples and GPU on ios devices

GPU Programming. Lecture 2: CUDA C Basics. Miaoqing Huang University of Arkansas 1 / 34

Using a GPU in InSAR processing to improve performance

Lessons learned from a simple application

GPU programming: CUDA basics. Sylvain Collange Inria Rennes Bretagne Atlantique

Basics of CADA Programming - CUDA 4.0 and newer

GPGPU/CUDA/C Workshop 2012

Introduction to Parallel Programming

Information Coding / Computer Graphics, ISY, LiTH. Introduction to CUDA. Ingemar Ragnemalm Information Coding, ISY

CS179 GPU Programming Introduction to CUDA. Lecture originally by Luke Durant and Tamas Szalay

HPC with Multicore and GPUs

CUDA Memory Types All material not from online sources/textbook copyright Travis Desell, 2012

Vector Addition on the Device: main()

Data Parallel Execution Model

Introduction to CUDA Programming

Transcription:

GPU Computing Master Clss Development Tools

Generic CUDA debugger goals Support all standard debuggers across all OS Linux GDB, TotalView and DDD Windows Visual studio Mac - XCode Support CUDA runtime APIs Support CUDA driver APIs Unified debug environment Provide simultaneous access to CUDA and host variables Present CUDA threads same as host threads Present CUDA memory same as host memory Support single and multi GPU debugging Attach to a running process local or remote.

Linux debugger model User Application CUDA GDB User code GDB CUDA UMD CUI Driver API callbacks(client) IPC Debug APIs(server) Target layer KMD RM RM GPU/s

IPC Driver APIs to Debug APIs Driver API callbacks retrieves the fatbin/sass line information Debug APIs translates the SASS information to actual GPU address Debug APIs tracks all memory allocations Driver and Debug APIs report and manage kernel launch, execution and termination.

Current Status CUDA-GDB 3.0 Beta shipping to customers supporting: New CUDA Memory Checker Support for all the OpenCL features Early support for Fermi Architecture etc

Upcoming Roadmap Migrate to standard binary formats DWARF and ELF Performance Improvement New debug APIs to enable external customers Support source and assembly level debugging Support for Apps with multiple host threads using CUDA Support for Apps using multi-gpus

Windows Development Development Environment CPU Code System Rendering Tools HLSL Project Wizards Editor GPU Code Shader Shader Authoring Graphics Debugging Graphics Performance AI Sound Gameplay Debugger Performance Build System CUDA OpenCL CUDA Performance

NVIDIA Nexus Next generation development toolset Homogeneous development for CPU and GPU Seamless debugging, profiling, and visualization

The Nexus 1.0 Release A full Visual Studio integrated development environment Supporting CUDA C, DirectCompute, OpenCL, DirectX 11, DirectX 10, OpenGL equires Windows Vista or Windows 7, Visual Studio 2008 SP1

The Nexus Ecosystem Development Environment CPU Code System Rendering Tools HLSL Build System Editor Debugger GPU Code Shader Shader Authoring AI Performance CUDA Sound Gfx Analysis CUDA Analysis OpenCL* Gameplay PhysX Analysis

Code Authoring Strong integration with the Visual Studio Shell. Syntax highlighting Code completion Nexus toolbar, menu, and project wizards

Code Debugging Step through GPU code, and examine program values. extern "C" global void matrixmul( float* C, float* A, float* B, int wa, int wb) { // Block index int bx = blockidx.x; int by = blockidx.y; // Thread index dim3 threads(block_size, int tx = threadidx.x; BLOCK_SIZE); int ty = threadidx.y; dim3 grid(wc / threads.x, HC / threads.y); matrixmul<<< // Index grid, of threads the first >>>(d_c, sub-matrix d_a, d_b, of A processed by the block WA, WB); int abegin = wa * BLOCK_SIZE * by; // Index of the last sub-matrix of A processed by the block int aend = abegin + wa - 1; // Step size used to iterate through the sub-matrices of A int astep = BLOCK_SIZE; // Index of the first sub-matrix of B processed by the block int bbegin = BLOCK_SIZE * bx; // Step size used to iterate through the sub-matrices of B int bstep = BLOCK_SIZE * wb;

Code Debugging Power up your breakpoints with parallel aware conditionals.

Code Debugging Manage debugging across thousands of threads.

Graphics Drill down from examining a frame

Graphics to a draw call

Graphics to specific resources and GPU state.

Graphics See GPU usage on a per draw call basis.

Analysis View performance from higher level using correlated CPU, GPU, thread, and OS data.

Analysis Get detailed performance information for every kernel.

Nexus everywhere Debug, visualize, and optimize your code remotely. Visual Studio Target Application Nexus Network Nexus Monitor CUDA DirectX Host Target