OpenCL Overview. Shanghai March Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group

Similar documents
Copyright Khronos Group 2012 Page 1. OpenCL 1.2. August 2012

OpenCL Press Conference

WebGL, WebCL and OpenCL

Copyright Khronos Group, Page 1. OpenCL. GDC, March 2010

Colin Riddell GPU Compiler Developer Codeplay Visit us at

Copyright Khronos Group, Page 1. OpenCL Overview. February 2010

Copyright Khronos Group Page 1. OpenCL BOF SIGGRAPH 2013

OpenCL Overview Benedict R. Gaster, AMD

Khronos Connects Software to Silicon

SIGGRAPH Briefing August 2014

Khronos and the Mobile Ecosystem

Neil Trevett Vice President, NVIDIA OpenCL Chair Khronos President. Copyright Khronos Group, Page 1

Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Mobile AR Hardware Futures

Copyright Khronos Group Page 1

Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1

Copyright Khronos Group Page 1

OpenCL The Open Standard for Heterogeneous Parallel Programming

The State of Gaming APIs

Copyright Khronos Group Page 1

Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem

General Purpose GPU Programming (1) Advanced Operating Systems Lecture 14

OpenCL on the GPU. San Jose, CA September 30, Neil Trevett and Cyril Zeller, NVIDIA

Accelerating Vision Processing

Copyright Khronos Group, Page 1. Khronos Overview. Taiwan, February 2012

Neil Trevett Vice President, NVIDIA OpenCL Chair Khronos President

OpenCL: History & Future. November 20, 2017

WebGL Meetup GDC Copyright Khronos Group, Page 1

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD

Copyright Khronos Group Page 1. Vulkan Overview. June 2015

Vulkan Launch Webinar 18 th February Copyright Khronos Group Page 1

OpenCL TM & OpenMP Offload on Sitara TM AM57x Processors

AR Standards Update Austin, March 2012

Graphics Technology Update

Open API Standards for Mobile Graphics, Compute and Vision Processing GTC, March 2014

Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017!

Getting Started with Intel SDK for OpenCL Applications

Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018

Parallel programming languages:

GDC 2014 Barthold Lichtenbelt OpenGL ARB chair

WebGL, WebCL and Beyond!

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function

Vulkan 1.1 March Copyright Khronos Group Page 1

OpenCL Base Course Ing. Marco Stefano Scroppo, PhD Student at University of Catania

CLICK TO EDIT MASTER TITLE STYLE. Click to edit Master text styles. Second level Third level Fourth level Fifth level

Modern Processor Architectures. L25: Modern Compiler Design

Ecosystem Overview Neil Trevett Khronos President NVIDIA Vice President Developer

OpenCL. Dr. David Brayford, LRZ, PRACE PATC: Intel MIC & GPU Programming Workshop

Copyright Khronos Group, Page 1 SYCL. SG14, February 2016

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design

HKG OpenCL Support by NNVM & TVM. Jammy Zhou - Linaro

CLU: Open Source API for OpenCL Prototyping

Heterogeneous Computing

HSA foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room A. Alario! 23 November, 2015!

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Request for Quotations

More performance options

SYCL for OpenCL in a Nutshell

The OpenVX Computer Vision and Neural Network Inference

Silicon Acceleration APIs

Lecture Topic: An Overview of OpenCL on Xeon Phi

Graph Partitioning. Standard problem in parallelization, partitioning sparse matrix in nearly independent blocks or discretization grids in FEM.

Overview and AR/VR Roadmap

From Application to Technology OpenCL Application Processors Chung-Ho Chen

Standards for Vision Processing and Neural Networks

Copyright Khronos Group Page 1

GPGPU COMPUTE ON AMD. Udeepta Bordoloi April 6, 2011

Martin Kruliš, v

THE PROGRAMMER S GUIDE TO THE APU GALAXY. Phil Rogers, Corporate Fellow AMD

SYCL for OpenCL May15. Copyright Khronos Group Page 1

Parallel Programming. Libraries and Implementations

OpenACC 2.6 Proposed Features

EECS 487: Interactive Computer Graphics

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013

Dave Shreiner, ARM March 2009

ECE 574 Cluster Computing Lecture 17

Parallel Programming Concepts. GPU Computing with OpenCL

Open Standard APIs for Augmented Reality

GPU programming. Dr. Bernhard Kainz

The Role of Standards in Heterogeneous Programming

OpenGL BOF Siggraph 2011

Heterogeneous Computing

OpenACC (Open Accelerators - Introduced in 2012)

Exotic Methods in Parallel Computing [GPU Computing]

Open Standards for Building Virtual and Augmented Realities. Neil Trevett Khronos President NVIDIA VP Developer Ecosystems

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group. Copyright Khronos Group Page 1

Khronos Overview The State of the Art in Open Standards for Visual Computing

OpenCL. Matt Sellitto Dana Schaa Northeastern University NUCAR

Khronos Updates GDC 2017 Neil Trevett Vice President Developer Ecosystem, NVIDIA President,

OpenCL C. Matt Sellitto Dana Schaa Northeastern University NUCAR

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university

Copyright Khronos Group Page 1. Introduction to SYCL. SYCL Tutorial IWOCL

Mobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair

Open Standards for Today s Gaming Industry

Visualization of OpenCL Application Execution on CPU-GPU Systems

Transcription:

Copyright Khronos Group, 2012 - Page 1 OpenCL Overview Shanghai March 2012 Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group

Copyright Khronos Group, 2012 - Page 2 Processor Parallelism CPUs Multiple cores driving performance increases Emerging Intersection GPUs Increasingly general purpose data-parallel computing Multiprocessor programming e.g. OpenMP Heterogeneous Computing Graphics APIs and Shading Languages OpenCL is a programming framework for heterogeneous compute resources

Copyright Khronos Group, 2012 - Page 3 The BIG Idea behind OpenCL OpenCL execution model - Define N-dimensional computation domain - Execute a kernel at each point in computation domain C Derivative to write kernels based on ISO C99 - APIs to discover devices in a system and distribute work to them Targeting many types of device - GPUs, CPUs, DSPs, embedded systems, mobile phones.. Even FPGAs Traditional loops void trad_mul(int n, const float *a, const float *b, float *c) { int i; for (i=0; i<n; i++) c[i] = a[i] * b[i]; } Data Parallel OpenCL kernel void dp_mul(global const float *a, global const float *b, global float *c) { int id = get_global_id(0); c[id] = a[id] * b[id]; } // execute over n work-items

Copyright Khronos Khronos 2009 Group, 2012 - Page 4 OpenCL Heterogeneous Computing Framework for programming diverse parallel computing resources in a system Platform Layer API - Query, select and initialize compute devices Kernel Language Specification - Subset of ISO C99 with language extensions Runtime API - Execute compute kernels gather results OpenCL has Embedded profile - No need for a separate ES spec

Copyright Khronos Group, 2012 - Page 5 OpenCL Working Group Members Diverse industry participation many industry experts - Processor vendors, system OEMs, middleware vendors, application developers - Academia and research labs, FPGA vendors NVIDIA is chair, Apple is specification editor Apple

Copyright Khronos Group, 2012 - Page 6 OpenCL Milestones Six months from proposal to released OpenCL 1.0 specification - Due to a strong initial proposal and a shared commercial incentive Multiple conformant implementations shipping - For CPUs and GPUs on multiple OS 18 month cadence between OpenCL 1.0, OpenCL 1.1 and now OpenCL 1.2 - Backwards compatibility protect software investment OpenCL working group formed Dec08 OpenCL 1.0 conformance tests released Jun10 OpenCL 1.2 Specification and conformance tests released! Jun08 OpenCL 1.0 released May09 OpenCL 1.1 Specification and conformance tests released Nov11

Copyright Khronos Group, 2012 - Page 7 OpenCL 1.2 Announced in December Significant updates - Khronos being responsive to developer requests - Updated OpenCL 1.2 conformance tests available - Multiple implementations underway Backward compatible upgrade to OpenCL 1.1 - OpenCL 1.2 will run any OpenCL 1.0 and OpenCL 1.1 programs - OpenCL 1.2 platform can contain 1.0, 1.1 and 1.2 devices - Maintains embedded profile for mobile and embedded devices

Copyright Khronos Group, 2012 - Page 8 Programming Kernels: OpenCL C Derived from ISO C99 - But without some C99 features such as standard C99 headers, function pointers, recursion, variable length arrays, and bit fields Language Features Added - Work-items and workgroups - Vector types - Synchronization - Address space qualifiers Also includes a large set of built-in functions - Image manipulation - Work-item manipulation, - Math functions, etc.

Copyright Khronos Group, 2012 - Page 9 OpenCL Platform Model One Host + one or more Compute Devices - Each Compute Device is composed of one or more Compute Units - Each Compute Unit is further divided into one or more Processing Elements

Copyright Khronos Group, 2012 - Page 10 OpenCL Execution Model OpenCL application runs on a host which submits work to the compute devices - Context: The environment within which work-items executes includes devices and their memories and command queues - Program: Collection of kernels and other functions (Analogous to a dynamic library) - Kernel: the code for a work item. Basically a C function - Work item: the basic unit of work on an OpenCL device GPU Queue Context CPU Queue Applications queue kernel execution - Executed in-order or out-of-order

Copyright Khronos Group, 2012 - Page 11 Creating an OpenCL Program CPU GPU DSP Context Programs Kernels Memory Objects Command Queue Kernel0 Images Programs Kernel1 Kernel2 Buffers In order & out of order Compile Create data and arguments Send for execution

1024 Copyright Khronos Group, 2012 - Page 12 An N-dimension domain of work-items Kernels executed across a global domain of work-items Work-items grouped into local workgroups Define the best N-dimensioned index space for your algorithm - Global Dimensions: 1024 x 1024 (whole problem space) - Local Dimensions: 128 x 128 (work group executes together) 1024 Synchronization between work-items possible only within workgroups: barriers and memory fences Cannot synchronize outside of a workgroup

Enqueue Kernel 1 Enqueue Kernel 2 Enqueue Kernel 1 Enqueue Kernel 2 Copyright Khronos Group, 2012 - Page 13 Synchronization: Queues & Events Events can be used to synchronize kernel executions between queues Example: 2 queues with 2 devices Kernel 2 starts before the results from Kernel 1 are ready Kernel 2 waits for an event from Kernel 1 and does not start until the results are ready CPU Kernel 2 CPU Kernel 2 GPU Kernel 1 GPU Kernel 1 Time Time

OpenCL Memory Model Private Memory Per work-item Local Memory Shared within a workgroup Global/Constant Memory Visible to all workgroups Host Memory On the CPU Private Memory Work-Item Workgroup Compute Device Work-Item Local Memory Private Memory Workgroup Global/Constant Memory Host Memory Private Memory Work-Item Local Memory Private Memory Work-Item Host Memory management is Explicit You must move data from host -> global -> local and back Copyright Khronos Group, 2012 - Page 14

Copyright Khronos Group, 2012 - Page 15 Partitioning Devices Devices can be partitioned into sub-devices - More control over how computation is assigned to compute units Sub-devices may be used just like a normal device - Create contexts, building programs, further partitioning and creating command-queues Three ways to partition a device - Split into equal-size groups - Provide list of group sizes - Group devices sharing a part of a cache hierarchy Compute Device Compute Device Host Compute Device Compute Unit Compute Unit Compute Unit Sub-device #1 Real-time processing tasks Compute Unit Compute Unit Compute Unit Sub-device #2 Mainline processing tasks

Copyright Khronos Group, 2012 - Page 16 Custom Devices and Built-in Kernels Embedded platforms often contain specialized hardware and firmware - That cannot support OpenCL C Built-in kernels can represent these hardware and firmware capabilities - Such as video encode/decode Hardware can be integrated and controlled from the OpenCL framework - Can enqueue built-in kernels to custom devices alongside OpenCL kernels FPGAs are one example of device that can expose built-in kernels - Latest FPGAs can support full OpenCL C as well OpenCL becomes a powerful coordinating framework for diverse resources - Programmable and non-programmable devices controlled by one run-time Built-in kernels enable control of specialized processors and hardware from OpenCL run-time

Copyright Khronos Group, 2012 - Page 17 Installable Client Driver Analogous to OpenGL ICDs in use for many years - Used to handle multiple OpenGL implementations installed on a system Optional extension - Platform vendor will choose whether to use ICD mechanisms Khronos OpenCL installable client driver loader - Exposes multiple separate vendor installable client drivers (Vendor ICDs) Application can access all vendor implementations - The ICD Loader acts as a de-multiplexor Application ICD Loader ICD Loader enables application to use any of the installed implementations Vendor #1 OpenCL Vendor #2 OpenCL Vendor #3 OpenCL ICD Loader ensures multiple implementations are installed cleanly

Copyright Khronos Group, 2012 - Page 18 Other Major New Features in OpenCL 1.2 Separate compilation and linking of objects - Provides the capabilities and flexibility of traditional compilers - Create a library of OpenCL programs that other programs can link to Enhanced Image Support - Added support for 1D images, 1D & 2D image arrays - OpenGL sharing extension now enables an OpenCL image to be created from an OpenGL 1D texture, 1D and 2D texture arrays DX9 Media Surface Sharing - Efficient sharing between OpenCL and DirectX 9 or DXVA media surfaces DX11 surface sharing - Efficient sharing between OpenCL and DirectX 11 surfaces And many other updates and additions..

Copyright Khronos Group, 2012 - Page 19 OpenCL Desktop Implementations http://developer.amd.com/zones/openclzone/ http://software.intel.com/en-us/articles/opencl-sdk/ http://developer.nvidia.com/opencl

Copyright Khronos Group, 2012 - Page 20 OpenCL Books Available Now! OpenCL Programming Guide - The Red Book of OpenCL - http://www.amazon.com/opencl-programming-guide-aaftab-munshi/dp/0321749642 OpenCL in Action - http://www.amazon.com/opencl-action-accelerate-graphics-computations/dp/1617290173/ Heterogeneous Computing with OpenCL - http://www.amazon.com/heterogeneous-computing-with-opencl-ebook/dp/b005jrhyus The OpenCL Programming Book - http://www.fixstars.com/en/opencl/book/

Copyright Khronos Group, 2012 - Page 21 Spec Translations Japanese OpenCL 1.1 spec translation available today - http://www.cutt.co.jp/book/978-4-87783-256-8.html - Valued partnership between Khronos and CUTT in Japan Working on OpenCL 1.2 specification translations - Japanese, Korean and Chinese

Copyright Khronos Group, 2012 - Page 22 Khronos OpenCL Resources OpenCL is 100% free for developers - Download drivers from your silicon vendor OpenCL Registry - www.khronos.org/registry/cl/ OpenCL 1.2 Reference Card - PDF version - http://www.khronos.org/files/opencl-1-2-quick-reference-card.pdf Online Man pages - http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/ OpenCL Developer Forums - Give us your feedback! - www.khronos.org/message_boards/

Copyright Khronos Group, 2012 - Page 23 Looking Forward OpenCL-HLM (High Level Model) Exploring high-level programming model, unifying host and device execution environments through language syntax for increased usability and broader optimization opportunities Long-term Core Roadmap Exploring enhanced memory and execution model flexibility to catalyze and expose emerging hardware capabilities WebCL Bring parallel computation to the Web through a JavaScript binding to OpenCL OpenCL-SPIR (Standard Parallel Intermediate Representation) Exploring low-level Intermediate Representation for code obfuscation/security and to provide target back-end for alternative high-level languages