OpenCL: History & Future. November 20, 2017

Similar documents
SYCL for OpenCL May15. Copyright Khronos Group Page 1

Copyright Khronos Group, Page 1 SYCL. SG14, February 2016

Copyright Khronos Group Page 1. Introduction to SYCL. SYCL Tutorial IWOCL

SYCL for OpenCL. in a nutshell. Maria Rovatsou, Codeplay s R&D Product Development Lead & Contributor to SYCL. IWOCL Conference May 2014

Vectorisation and Portable Programming using OpenCL

Copyright Khronos Group Page 1

OpenCL Overview. Shanghai March Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group

OpenCL 1.0 to 2.2 : a quick survey

SIGGRAPH Briefing August 2014

Copyright Khronos Group Page 1. Vulkan Overview. June 2015

Khronos Connects Software to Silicon

OpenCL Press Conference

Copyright Khronos Group 2012 Page 1. OpenCL 1.2. August 2012

The Role of Standards in Heterogeneous Programming

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017!

HSA foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room A. Alario! 23 November, 2015!

HKG OpenCL Support by NNVM & TVM. Jammy Zhou - Linaro

Using SYCL as an Implementation Framework for HPX.Compute

Single-source SYCL C++ on Xilinx FPGA. Xilinx Research Labs Khronos 2017/11/12 19

Accelerating Vision Processing

OpenCL C++ kernel language

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1

SYCL for OpenCL in a Nutshell

OpenCL Base Course Ing. Marco Stefano Scroppo, PhD Student at University of Catania

More performance options

Vulkan 1.1 March Copyright Khronos Group Page 1

Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design

trisycl Open Source C++17 & OpenMP-based OpenCL SYCL prototype Ronan Keryell 05/12/2015 IWOCL 2015 SYCL Tutorial Khronos OpenCL SYCL committee

From Application to Technology OpenCL Application Processors Chung-Ho Chen

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function

Programming Models for Multi- Threading. Brian Marshall, Advanced Research Computing

The Rise of Open Programming Frameworks. JC BARATAULT IWOCL May 2015

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec

Modern Processor Architectures. L25: Modern Compiler Design

How GPUs can find your next hit: Accelerating virtual screening with OpenCL. Simon Krige

OpenACC. Introduction and Evolutions Sebastien Deldon, GPU Compiler engineer

SIMULATOR AMD RESEARCH JUNE 14, 2015

Open Compute Stack (OpenCS) Overview. D.D. Nikolić Updated: 20 August 2018 DAE Tools Project,

Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem

Copyright Khronos Group Page 1

OpenCL TM & OpenMP Offload on Sitara TM AM57x Processors

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

OpenCL Vectorising Features. Andreas Beckmann

General Purpose GPU Programming (1) Advanced Operating Systems Lecture 14

Parallel Programming. Libraries and Implementations

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

Copyright Khronos Group Page 1

ECE 574 Cluster Computing Lecture 17

Comparison and analysis of parallel tasking performance for an irregular application

Colin Riddell GPU Compiler Developer Codeplay Visit us at

RUNTIME SUPPORT FOR ADAPTIVE SPATIAL PARTITIONING AND INTER-KERNEL COMMUNICATION ON GPUS

TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT

HPC future trends from a science perspective

AMD CORPORATE TEMPLATE AMD Radeon Open Compute Platform Felix Kuehling

Expressing Heterogeneous Parallelism in C++ with Intel Threading Building Blocks A full-day tutorial proposal for SC17

Getting Started with Intel SDK for OpenCL Applications

Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Alexei Katranov. IWOCL '16, April 21, 2016, Vienna, Austria

Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1

Addressing Heterogeneity in Manycore Applications

Martin Kruliš, v

Vulkan Launch Webinar 18 th February Copyright Khronos Group Page 1

Lecture Topic: An Overview of OpenCL on Xeon Phi

Copyright Khronos Group, Page 1. OpenCL. GDC, March 2010

CUDA Performance Optimization

OpenCL Common Runtime for Linux on x86 Architecture

Parallel Programming Libraries and implementations

Request for Quotations

THE HETEROGENEOUS SYSTEM ARCHITECTURE IT S BEYOND THE GPU

GPU programming. Dr. Bernhard Kainz

Dynamic Cuda with F# HPC GPU & F# Meetup. March 19. San Jose, California

Parallel and Distributed Computing

Evaluating OpenMP s Effectiveness in the Many-Core Era

Ecosystem Overview Neil Trevett Khronos President NVIDIA Vice President Developer

GPU Programming with Ateji PX June 8 th Ateji All rights reserved.

Auto-tunable GPU BLAS

DEVELOPER DAY. Vulkan Subgroup Explained Daniel Koch NVIDIA MONTRÉAL APRIL Copyright Khronos Group Page 1

Applying OpenCL. IWOCL, May Andrew Richards

Distributed & Heterogeneous Programming in C++ for HPC at SC17

SPOC : GPGPU programming through Stream Processing with OCaml

The OpenVX Computer Vision and Neural Network Inference

cuda-on-cl A compiler and runtime for running NVIDIA CUDA C++11 applications on OpenCL 1.2 devices Hugh Perkins (ASAPP)

Copyright Khronos Group, Page 1. OpenCL Overview. February 2010

Silicon Acceleration APIs

Neil Trevett Vice President, NVIDIA OpenCL Chair Khronos President

Neil Trevett Vice President, NVIDIA OpenCL Chair Khronos President. Copyright Khronos Group, Page 1

The Heterogeneous Programming Jungle. Service d Expérimentation et de développement Centre Inria Bordeaux Sud-Ouest

Why? High performance clusters: Fast interconnects Hundreds of nodes, with multiple cores per node Large storage systems Hardware accelerators

CUDA 5 and Beyond. Mark Ebersole. Original Slides: Mark Harris 2012 NVIDIA

Introduction to GPU hardware and to CUDA

AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016

Unified Memory. Notes on GPU Data Transfers. Andreas Herten, Forschungszentrum Jülich, 24 April Member of the Helmholtz Association

Hybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

OpenACC 2.6 Proposed Features

Transcription:

Mitglied der Helmholtz-Gemeinschaft OpenCL: History & Future November 20, 2017

OpenCL Portable Heterogeneous Computing 2 APIs and 2 kernel languages C Platform Layer API OpenCL C and C++ kernel language to write parallel code C runtime API to build and execute kernels across multiple devices one code tree can be executed on CPUs, GPUs, DSPs, FPGAs, and hardware dynamically balance work across available platforms November 20, 2017 Slide 2

OpenCL 1.x 1.0 : released with Apple Mac OS X Snow Leopard on August 28, 2009 support announced by many companies, e.g. AMD, NVIDIA, IBM 1.1 : released June 2010, major new features: new data types, e.g. 3-component vectors handling commands from multiple host threads processing buffers across multiple devices subbuffers operations on buffer regions enhanced use of events to drive command execution new built-in C functions, e.g. integer clamp, shuffle, and asynchronous strided copies improved OpenGL interoperability November 20, 2017 Slide 3

OpenCL 1.x 1.2 : released November 2011, major new features: device partitioning: the ability to partition a device into sub-devices; work assignments can be allocated to individual compute units separate compilation and linking of objects; functionality to compile OpenCL into external libraries enhanced image support: 1.2 adds support for 1D images and 1D/2D image arrays built-in kernels: custom devices that contain specific unique functionality are now integrated more closely into the OpenCL framework November 20, 2017 Slide 4

OpenCL 2.x 2.0 : released November 2014, major new features: device-side enqueue (dynamic parallelism): kernels can add new work to device-side queues; clcreatecommandqueuewithproperties (host), enqueue_kernel (device) SVM - shared virtual memory: host and OpenCL devices can share the same virtual address range; use shared pointers instead of copying buffers across devices; coarse-grain/fine-grain SVM; clsvmalloc, clsetkernelargsvmpointer passing data between kernels using pipes: clcreatepipe (host), read_pipe, write_pipe (device) new built-ins on workgroup/subgroup level (i.e. warps, wavefronts): e.g. work_group_all/any, _broadcast, _reduce generic address space: a pointer to it can reference data in the private, local, or global address spaces C11 atomics November 20, 2017 Slide 5

OpenCL 2.x 2.1 : released November 2015 subgroups supported in OpenCL core (no longer as extension) additional subgroup query operations evolving to include a C++ kernel language: new OpenCL C++ kernel language based on C++ 14 SYCL: single source programming model industry support for OpenCL 2.1: SPIR: Standard Portable Intermediate Representation SPIR-V: true cross-api, fully defined by Khronos 2.2 : released May 2017 OpenCL C++ kernel language static subset of C++ 14 new Khronos SPIR-V 1.2 intermediate language which fully supports the OpenCL C++ kernel language pipe storage is new device-side type in OpenCL 2.2 that is useful for FPGA implementations for the first time, released the full source of the OpenCL 2.2 specifications and conformance tests November 20, 2017 Slide 6

SYCL C++ Single-source Heterogeneous Programming for OpenCL SYCL single-source programming enables host and kernel code to be contained in the same source file using the same templates for both, with full OpenCL acceleration seamless integration with OpenCL programs, C/C++ libraries and frameworks such as OpenMP includes templates and lambda functions for higher-level application software that can be cleanly coded for optimized acceleration November 20, 2017 Slide 7

SYCL Example Code #include <CL/sycl.hpp> int main () {... // Device buffers buffer<float, 1 > buf_a(array_a, range<1>(count)); buffer<float, 1 > buf_b(array_b, range<1>(count)); buffer<float, 1 > buf_c(array_c, range<1>(count)); buffer<float, 1 > buf_r(array_r, range<1>(count)); queue myqueue; myqueue.submit([&](handler& cgh) { // Data accessors auto a = buf_a.get_access<access::read>(cgh); auto b = buf_b.get_access<access::read>(cgh); auto c = buf_c.get_access<access::read>(cgh); auto r = buf_r.get_access<access::write>(cgh); // Kernel cgh.parallel_for<class three_way_add>(count, [=](id<> i) { r[i] = a[i] + b[i] + c[i]; }) ); });... } November 20, 2017 Slide 8

SPIR - Standard Portable Intermediate Representation portable encoding of device programs; enable 3 rd party code generation targeting OpenCL platforms without going through OpenCL SPIR 1.2 is an encoding of OpenCL C device programs in LLVM IR SPIR-V - true cross-api standard that is fully defined by Khronos with native support for shader and kernel features November 20, 2017 Slide 9

AMD Boltzmann Initiative strategic investments in heterogeneous system architecture (HSA) suite of tools designed to ease development of high-performance, energy efficient heterogeneous computing systems new compiler for Heterogeneous Computing (HCC) Linux driver and runtime focused on the needs of HPC cluster-class computing HIP-ifying CUDA applications translating CUDA source to run on AMD GPUs availability an early access program for the "Boltzmann Initiative" tools is planned for Q1 2016 November 20, 2017 Slide 10

IWOCL 2017 Conference (Toronto, CA) Advanced Hands-On OpenCL Tutorial Simon McIntosh-Smith and James Price, University of Bristol Based on the 2-day OpenSource course at: https://handsonopencl.github.io Workshops and Talks Heterogeneous Computing Using Modern C++ with OpenCL Devices An Open Ecosystem for Software Programmers to Compute on FPGAs Harnessing the Power of FPGAs with the Intel FPGA SDK for OpenCL Towards an Asynchronous Data Flow Model for SYCL 2.2 KART A Runtime Compilation Library for Improving HPC Application Performance IWOCL 2018 14-16 May, 2018, Oxford, UK http://www.iwocl.org/ November 20, 2017 Slide 11

References Khronos OpenCL https://www.khronos.org/opencl IWOCL 2016 International Workshop on OpenCL 19-21 April 2016, Vienna www.iwocl.org Portable Performance with OpenCL on Intel Xeon Phi http://www.techenablement.com/portable-performanceopencl-intel-xeon-phi November 20, 2017 Slide 12