OpenCL: A State of the Union

Neil Trevett
Khronos President
NVIDIA Vice President Developer Ecosystem
OpenCL Working Group Chair
ntrevett@nvidia.com, @neilt3d
Vienna, April 2016
Copyright Khronos Group 2016 - Page 1

Need for Heterogeneous Parallelism

[Figure: Flynn's Taxonomy (1966), alongside programming approaches: Languages / Directives, Explicit Kernels, Libraries, Threads and Messages]

"The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise" - Edsger Dijkstra

OpenCL Ecosystem

- Hardware Implementers: Desktop/Mobile/Embedded/FPGA
- Working Group Members: Apps/Tools/Tests/Courseware
- OpenCL 2.2: Top to Bottom C++
  - Single Source C++ Programming
  - Core API and Language Specs
  - Portable Kernel Intermediate Language

OpenCL 2.2

Provisional: seeking industry feedback before finalization at SIGGRAPH or SC16
- OpenCL C++ kernel language into core
- SPIR-V 1.1 adds OpenCL C++ support
- SYCL 2.2 fully leverages OpenCL 2.2 from a single source file
- Runs on any OpenCL 2.0-capable hardware

Specification timeline
- Dec08: OpenCL 1.0 Specification
- Jun10 (18 months later): OpenCL 1.1 Specification
  - 3-component vectors
  - Additional image formats
  - Multiple hosts and devices
  - Buffer region operations
  - Enhanced event-driven execution
  - Additional OpenCL C built-ins
  - Improved OpenGL data/event interop
- Nov11 (18 months later): OpenCL 1.2 Specification
  - Device partitioning
  - Separate compilation and linking
  - Enhanced image support
  - Built-in kernels / custom devices
  - Enhanced DX and OpenGL interop
- Nov13 (24 months later): OpenCL 2.0 Specification
  - Shared Virtual Memory
  - On-device dispatch
  - Generic Address Space
  - Enhanced image support
  - C11 atomics
  - Pipes
  - Android ICD
- Nov15 (24 months later): OpenCL 2.1 Specification
  - SPIR-V in core
  - Subgroups into core
  - Subgroup query operations
  - clCloneKernel
  - Low-latency device timer queries
- May16 (7 months later): OpenCL 2.2 PROVISIONAL
  - OpenCL C++ Kernel Language
  - SPIR-V 1.1 with C++ support
  - SYCL 2.2 for single source C++

OpenCL C++ Kernel Language

- The OpenCL C++ kernel language is a static subset of C++14
  - Frees developers from low-level coding details without sacrificing performance
- C++14 features removed from OpenCL C++ for parallel programming
  - Exceptions, allocating/releasing memory, virtual functions and abstract classes, function pointers, recursion and goto
- Classes, lambda functions, templates, operator overloading, etc.
  - Fast and elegant sharable code: reusable device libraries and containers
  - Templates enable meta-programming for highly adaptive software
  - Lambdas used to implement nested/dynamic parallelism
- Enhanced support for authoring libraries
  - Increased safety, reduced undefined behavior while accessing atomics, iterators, images, samplers, pipes, device queue built-in types and address spaces
- Safer, more adaptable, more reusable parallel software
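As a rough illustration of the coding style this slide describes, the sketch below uses ordinary host-side C++14, not OpenCL C++ kernel source. The per-item body relies only on a template and a lambda, and avoids exceptions, virtual functions, function pointers, recursion and allocation. The names `for_each_item`, `saxpy` and `saxpy_example` are hypothetical, and the sequential loop merely stands in for a parallel dispatch that a real runtime would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a parallel dispatch: invokes body(i) once per
// index. A real runtime would execute these invocations concurrently.
template <typename Body>
void for_each_item(std::size_t count, Body body) {
    for (std::size_t i = 0; i < count; ++i) {
        body(i);
    }
}

// Generic saxpy: y[i] = a * x[i] + y[i] for any arithmetic type T.
// The lambda captures its arguments by value, much like kernel arguments.
template <typename T>
void saxpy(T a, const std::vector<T>& x, std::vector<T>& y) {
    assert(x.size() == y.size());
    const T* xs = x.data();
    T* ys = y.data();
    for_each_item(y.size(), [=](std::size_t i) {
        ys[i] = a * xs[i] + ys[i];
    });
}

// Small fixed input so the result is easy to check by hand.
inline std::vector<float> saxpy_example() {
    std::vector<float> x{1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> y{10.0f, 20.0f, 30.0f, 40.0f};
    saxpy(2.0f, x, y);
    return y;
}
```

Because `saxpy` is a template, the same source serves `float`, `double` or integer element types without separate copies, which is the kind of reuse the slide attributes to templates.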

The Choice of SYCL 2.2 or OpenCL C++

Developer choice: the development of the two specifications is aligned so code can be easily shared between the two approaches.

- C++ Kernel Language: low-level control
  - "GPGPU"-style separation of device-side kernel source code and host code
- Single-source C++: programmer familiarity
  - Approach also taken by C++ AMP, OpenMP and the C++17 Parallel STL

SYCL is an important initiative to represent the OpenCL perspective as the industry as a whole figures out parallel programming from C++.

More OpenCL 2.2 with help from SPIR-V 1.1

- SPIR-V 1.1 adds full support for OpenCL C++
  - Initializer/finalizer function execution modes to support constructors/destructors
  - Enhances the expressiveness of kernel programs by supporting named barriers, subgroup execution, and program scope pipes
- SPIR-V specialization constants, previously available in Vulkan shaders
  - SPIR-V module can express a family of parameterized OpenCL kernel programs
  - Embedded compile-time settings can be specialized at runtime
  - Eliminates the need to ship or recompile multiple variants of a kernel
- Pipe storage device-side type, useful for FPGA implementations
  - Makes connectivity size and type known at compile time
  - Enables efficient device-scope communication between kernels
- Enhanced optimization of generated code
  - Query non-trivial constructors/destructors of program scope global objects
  - User callbacks can be set at program release time
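The specialization-constant idea can be pictured with a small conceptual model: one shipped module carries default values keyed by ID, and each specialization overrides some of them to yield a concrete variant. The sketch below is plain C++ and does not use the OpenCL or SPIR-V APIs. The types `KernelModule` and `SpecializedKernel` and the function `specialize` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model of a shipped module: each specialization constant has
// an ID and a default value baked in when the module was built.
struct KernelModule {
    std::map<std::uint32_t, std::int32_t> spec_defaults;
};

// One concrete variant of the module with every constant resolved.
struct SpecializedKernel {
    std::map<std::uint32_t, std::int32_t> constants;

    std::int32_t constant(std::uint32_t id) const { return constants.at(id); }
};

// Produce a variant by starting from the defaults and applying overrides.
// The module itself is left untouched, so many variants can share it.
inline SpecializedKernel specialize(
    const KernelModule& module,
    const std::map<std::uint32_t, std::int32_t>& overrides) {
    SpecializedKernel kernel{module.spec_defaults};
    for (const auto& entry : overrides) {
        auto it = kernel.constants.find(entry.first);
        // Only constants declared by the module may be overridden.
        assert(it != kernel.constants.end());
        it->second = entry.second;
    }
    return kernel;
}
```

In this model a single module with, say, a work-group size constant can yield a 64-wide and a 256-wide variant at run time, which mirrors the slide's point about not shipping or recompiling multiple kernel variants.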

SPIR-V Ecosystem

SPIR-V
- Khronos-defined and controlled cross-API intermediate language
- Native support for graphics and parallel constructs
- 32-bit word stream
- Extensible and easily parsed
- Retains data object and control flow information for effective code generation and translation

[Diagram: GLSL, OpenCL C, OpenCL C++ and third-party kernel and shader languages feed SPIR-V, which is consumed by IHV driver runtimes. Surrounding components: SPIR-V Tools (SPIR-V Validator, SPIR-V (Dis)Assembler), the LLVM to SPIR-V bi-directional translator, LLVM, and other intermediate forms.]

Diagram legend
- Khronos has open sourced these tools and translators
- Open source C++ front-end released: https://github.com/khronosgroup/spir/tree/spirv-1.1
- Khronos plans to open source these tools soon
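To make "32-bit word stream" and "easily parsed" concrete, here is a minimal sketch of walking a SPIR-V module. It follows the SPIR-V specification's physical layout: a five-word header of magic number, version, generator, ID bound and schema, followed by instructions whose first word holds the word count in the high 16 bits and the opcode in the low 16 bits. The function and type names are hypothetical, and the sketch assumes the words are already in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed values from the SPIR-V specification's physical layout.
constexpr std::uint32_t kSpirvMagic = 0x07230203u;
constexpr std::size_t kHeaderWords = 5;

struct SpirvHeader {
    bool valid = false;
    unsigned version_major = 0;
    unsigned version_minor = 0;
    std::uint32_t generator = 0;
    std::uint32_t id_bound = 0;
};

struct SpirvInstruction {
    std::uint16_t opcode;
    std::uint16_t word_count;
};

// Decode the five-word header; returns valid == false on a bad magic number.
inline SpirvHeader parse_header(const std::vector<std::uint32_t>& words) {
    SpirvHeader header;
    if (words.size() < kHeaderWords || words[0] != kSpirvMagic) {
        return header;
    }
    header.valid = true;
    header.version_major = (words[1] >> 16) & 0xFFu;  // byte layout: 0 | major | minor | 0
    header.version_minor = (words[1] >> 8) & 0xFFu;
    header.generator = words[2];
    header.id_bound = words[3];
    return header;
}

// Walk the instruction stream after the header without interpreting operands.
inline std::vector<SpirvInstruction> list_instructions(
    const std::vector<std::uint32_t>& words) {
    // Callers are expected to have validated the header first.
    assert(words.size() >= kHeaderWords && words[0] == kSpirvMagic);
    std::vector<SpirvInstruction> out;
    std::size_t i = kHeaderWords;
    while (i < words.size()) {
        const auto count = static_cast<std::uint16_t>(words[i] >> 16);
        const auto opcode = static_cast<std::uint16_t>(words[i] & 0xFFFFu);
        if (count == 0 || i + count > words.size()) {
            break;  // malformed stream: stop rather than read out of bounds
        }
        out.push_back({opcode, count});
        i += count;
    }
    return out;
}

// Tiny hand-built module: header plus two instructions.
inline std::vector<std::uint32_t> example_module() {
    return {
        kSpirvMagic,
        0x00010100u,               // version 1.1
        0u,                        // generator (unregistered)
        1u,                        // ID bound
        0u,                        // schema (reserved)
        (2u << 16) | 17u, 6u,      // OpCapability Kernel
        (3u << 16) | 14u, 2u, 2u,  // OpMemoryModel Physical64 OpenCL
    };
}
```

Because every instruction announces its own length, a tool can skip instructions it does not understand, which is one reason the format is described as extensible.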

Support for Both SPIR-V and LLVM

- LLVM is an SDK, not a formally defined standard
  - Khronos moved away from trying to use LLVM IR as a standard
  - Issues with versioning, metadata, etc.
- But LLVM is a treasure chest of useful transforms
  - SPIR-V tools can encapsulate and use LLVM to do useful SPIR-V transforms
- SPIR-V tools can all use different rules, and there will be lots of these
  - May be lossy and only support a SPIR-V subset
  - Internal form is not standardized
  - May hide LLVM version, metadata

[Diagram: HLSL, GLSL, OpenCL C and OpenCL C++ compile to SPIR-V, which passes through a tool-encapsulated LLVM transform tool on the way to the driver.]
- SPIR-V: rendezvous format for interchange; native expression of graphics and parallel functionality for Khronos APIs
- Transform tool examples: compression, optimization, stripping, linker/merger

OpenCL Implementations

[Figure: per-vendor timelines showing the first implementation of each spec generation, grouped into Desktop, Mobile, Embedded and FPGA segments, plotted against the specification releases: Dec08 OpenCL 1.0, Jun10 OpenCL 1.1, Nov11 OpenCL 1.2, Nov13 OpenCL 2.0, Nov15 OpenCL 2.1]

OpenCL at a Crossroads

General challenges
- Lack of tools
- Too complex to program
- Performance portability is hard

Desktop
- Use cases: Video and Image Processing, Gaming Compute
- Roadmap*: Vulkan interop, arbitrary precision for increased performance, pre-emption, collective programming and improved execution model
- Pressures: CUDA, NVIDIA shipping 1.2, Apple Metal

Mobile
- Use case: Photo and Vision Processing
- Roadmap*: arbitrary precision for inference engine and pixel processing efficiency, pre-emption and QoS scheduling for power efficiency
- Pressures: RenderScript confusion on Android, Apple Metal

HPC, SciViz, Datacenter
- Use case: Numerical Simulation, Virtualization
- Roadmap*: enhanced streaming processing, enhanced library support
- Pressures: CUDA, NVIDIA shipping 1.2, lack of libraries

FPGAs
- Use cases: Network and Stream Processing
- Roadmap*: enhanced execution model, self-synchronized and self-scheduled graphs, fine-grained synchronization between kernels, DSL in C++

Embedded
- Use cases: Signal and Pixel Processing
- Roadmap*: arbitrary precision for power efficiency, hard real-time scheduling, asynchronous DMA

* Roadmap topics in discussion

The Universal Struggle for Open Standards

- Platforms: idealized universe = total content lock. All commercially significant apps run on your platform and nowhere else
- Independent Hardware and Software Vendors: idealized universe = zero cost to monetize apps and processors across all platforms
- Proprietary Solution Providers: idealized universe = single viable solution. All platforms and applications use your solution and nothing else

Effective Open Standard Strategies
1. Create joint investment in a solution that is too expensive for any one company to develop themselves
2. Create enough momentum that companies gain more content than they lose by supporting an open standard

Vulkan Explicit GPU Control

Vulkan 1.0 provides access to OpenGL ES 3.1 / OpenGL 4.X-class GPU functionality but with increased performance and flexibility.

[Diagram: two driver stacks side by side]
- High-level driver stack
  - Application: single thread per context
  - High-level Driver Abstraction: context management, memory allocation, full GLSL compiler, error detection, layered GPU control
  - GPU
- Vulkan stack
  - Application: memory allocation, thread management, synchronization, multi-threaded generation of command buffers
  - Thin Driver: explicit GPU control
  - GPU
  - Language front-end compilers (initially GLSL) produce SPIR-V pre-compiled shaders
  - Loadable debug and validation layers

Vulkan Benefits
- Resource management in app code: fewer hitches and surprises
- Simpler drivers: improved efficiency/performance, reduced CPU bottlenecks, lower latency, increased portability
- Command buffers: command creation can be multi-threaded, so multiple CPU cores increase performance
- Graphics, compute and DMA queues: work dispatch flexibility
- SPIR-V pre-compiled shaders: no front-end compiler in driver, future shading language flexibility
- Loadable layers: no error handling overhead in production code

Vulkan Tools Architecture

- Layered design for cross-vendor tools innovation and flexibility
  - IHVs plug into a common, extensible architecture for code validation, debugging and profiling during development without impacting production performance
- Khronos Open Source Loader enables use of tools layers during debug
  - Finds and loads drivers, dispatches API calls to correct driver and layers

[Diagram: a Vulkan-based title calls Vulkan's Common Loader, which dispatches to the IHV's Installable Client Driver. On the production path (performance), calls go straight through. Debug layers, such as validation layers, can be installed during development and pass debug information to an interactive debugger via standardized API calls.]

Vulkan Feature Sets

- Vulkan supports hardware with a wide range of hardware capabilities
  - Mobile OpenGL ES 3.1 up to desktop OpenGL 4.5 and beyond
- One unified API framework for desktop, mobile, console, and embedded
  - No "Vulkan ES" or "Vulkan Desktop"
- Vulkan precisely defines a set of "fine-grained features"
  - Features are specifically enabled at device creation time (similar to extensions)
- Platform owners define a Feature Set for their platform
  - Vulkan provides the mechanism but does not mandate policy
  - Khronos will define Feature Sets for platforms where owner is not engaged
- Khronos will define feature sets for Windows and Linux
  - After initial developer feedback

Vulkan Genesis

- Khronos members from all segments of the graphics industry agree on the need for a new generation cross-platform GPU API
- 18 months: a high-energy working group effort
  - Including an unprecedented level of participation from game engine developers
  - Significant proposals, IP contributions and engineering effort from many working group members
- 16Feb16: Khronos' first API "hard launch"
  - Specification, Conformance Tests, SDKs: all open source
  - Reference Materials, Compiler front-ends, Samples
  - Multiple Conformant Drivers on multiple OS

[Figure: Vulkan Working Group Participants]

The Secret to Performance Portability

- Applications can use Vulkan directly for maximum flexibility and control
- Application uses utility libraries and layers to speed development
- Game engines fully optimized over Vulkan
  - Applications using game engines will automatically benefit from Vulkan's enhanced performance

Rich Area for Innovation
- Many utilities and layers will be in open source
- Layers to ease transition from OpenGL
- Domain specific flexibility

Performance across diverse hardware
- Similar ecosystem dynamic as WebGL
- A widely pervasive, powerful, flexible foundation layer enables diverse middleware tools and libraries

Add Compute to Vulkan? (In Discussion)

Vulkan Compute?
- Gaming Compute, Pixel Processing, Inference
- Fine grain graphics and compute (no interop needed)
- SPIR-V for shading language flexibility, C/C++
- Low-latency, fine grain run-time
- Google Android adoption
- Competes well with Metal (=C++/OpenCL 1.2)
- Roadmap: arbitrary precision, SVM, dynamic parallelism, pre-emption and QoS scheduling

Desktop
- Use cases: Video and Image Processing, Gaming Compute
- Roadmap: Vulkan interop, arbitrary precision for increased performance, pre-emption, collective programming and improved execution model

Mobile
- Use case: Photo and Vision Processing
- Roadmap: arbitrary precision for inference engine and pixel processing efficiency, pre-emption and QoS scheduling for power efficiency

HPC, SciViz, Datacenter
- Use case: Numerical Simulation, Virtualization
- Roadmap: enhanced streaming processing, enhanced library support

FPGAs
- Use cases: Network and Stream Processing
- Roadmap: enhanced execution model, self-synchronized and self-scheduled graphs, fine-grained synchronization between kernels, DSL in C++

Embedded
- Use cases: Signal and Pixel Processing
- Roadmap: arbitrary precision for power efficiency, hard real-time scheduling, asynchronous DMA

Vulkan Lessons
1. Engine developer insights were essential during design
2. Engine prototyping was essential during design
3. Open sourcing tests, tools, specs drives deeper community engagement
4. Explicit API supports strong middleware ecosystem

BUT it's just a GPU API: still need OpenCL!

Possible OpenCL Evolution

[Diagram: the evolution of OpenCL fills the gap between imprecise HLL and imperfect hardware]
- Increasing language expressiveness
- Guaranteeing degrees of forward progress
- Definitions of concurrency
- Increasing parallel hardware flexibility
- Execution and memory model enhancements
- Pre-emption, virtual memory, on-device dispatch, synchronization

Should OpenCL evolve to focus on the things that ONLY OpenCL can do?
1. Enable low-level, explicit access to heterogeneous hardware needed by languages and libraries
2. Provide efficient runtime coordination of tasks, resources, scheduling on target hardware
3. Leverage, synergize and co-exist with Vulkan compute and learn from Vulkan
4. Define feature sets so target hardware does not have to implement inappropriate functionality
5. Adopt layered tools architecture to drive tools momentum and decrease run-time overhead
6. Leave usability, portability and performance portability to higher levels in the ecosystem

Or what do YOU think?

Get Involved!

- OpenCL is driving to a new level of community engagement
  - Learning from the Vulkan experience
  - We need to know what you need from OpenCL
  - IWOCL is the perfect opportunity to find out!
- Any company or organization is welcome to join Khronos
  - For a voice and a vote in any of these standards
  - www.khronos.org
- If joining is not possible, ask about the OpenCL Advisory Panel
  - Free of charge; enables design reviews and contributions
- Neil Trevett, ntrevett@nvidia.com, @neilt3d