OpenCL Device Fission

Benedict R. Gaster, AMD, March 2011
Copyright Khronos Group

OpenCL Device Fission (cl_ext_device_fission)

Provides an interface for subdividing an OpenCL device into multiple sub-devices. Typically used to:
- Reserve part of the device for high-priority or latency-sensitive tasks
- Subdivide compute devices along some shared hardware feature, like a shared cache

Supported by CPU and Cell Broadband Engine devices:
- Multicore CPU devices (AMD and Intel)
- IBM Cell Broadband Engine

GPU support in the future too?

Developed by AMD, Apple, IBM, and Intel.

Simple OpenCL with Device Fission

OpenCL is a portable threading library:
- Cross platform: Windows, Linux, and OS X
- Threading features:
  - Asynchronous queues (i.e. back-ends for devices)
  - Events, dependency control

Example: class Parallel with the public interface

    class Parallel {
    public:
        Parallel();
        static unsigned int atomicAdd(unsigned int, volatile unsigned int *);
        bool parallelFor(int range, std::function<void (int i)>);
    };
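The slides never show the body of atomicAdd. A minimal sketch, assuming a GCC or Clang toolchain: the compiler builtin __atomic_fetch_add accepts a pointer to a volatile integer and returns the previous value. Returning the old value (rather than the new one) is an assumed convention; the slide only gives the signature.

```cpp
#include <cassert>

// Hypothetical body for Parallel::atomicAdd (not shown on the slides).
// Requires GCC or Clang for the __atomic_fetch_add builtin.
// Returns the value stored in *target *before* the addition (assumed convention).
static unsigned int atomicAdd(unsigned int value, volatile unsigned int * target)
{
    assert(target != nullptr);
    return __atomic_fetch_add(target, value, __ATOMIC_SEQ_CST);
}
```

Portable C++11 code would instead declare the counter as std::atomic<unsigned int> and call fetch_add; that changes the slide's signature, so it is mentioned only as an alternative.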

[Figure: 6-core x86 CPU with six L2 caches, a shared 6 MB L3 cache, memory controller, IO controller (HT), and DDR3 system memory (n GB)]

    #define USE_CL_DEVICE_FISSION 1
    #include <CL/cl.hpp>
    #include <cstdlib>     // exit
    #include <functional>  // std::function
    #include <string>
    #include <vector>

    class Parallel {
    private:
        cl::Context context_;
        std::vector<cl::Device> subDevices_;
        cl::CommandQueue queue_;
        std::vector<cl::CommandQueue> queues_;   // one queue per sub-device
        static void CL_CALLBACK wrapper(void * a);
    public:
        Parallel() {

[Figure: same 6-core x86 CPU diagram]

            std::vector<cl::Platform> platforms;
            cl::Platform::get(&platforms);

            // Index 1 assumes the CPU-capable platform is the second one
            // installed; adjust the index on other systems.
            cl_context_properties properties[] = {
                CL_CONTEXT_PLATFORM,
                (cl_context_properties)(platforms[1])(),
                0
            };
            context_ = cl::Context(CL_DEVICE_TYPE_CPU, properties);
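Hard-coding platforms[1] breaks on machines with a different set of installed platforms. One alternative is to pick the first platform whose name contains a wanted substring. It is sketched here without OpenCL so it runs anywhere: the names vector stands in for what platform.getInfo<CL_PLATFORM_NAME>() would return for each cl::Platform, and pickPlatform is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: index of the first platform whose name contains
// `wanted`, or -1 if none does. In real code, `names` would be filled from
// platform.getInfo<CL_PLATFORM_NAME>() for each entry of `platforms`.
int pickPlatform(const std::vector<std::string> & names, const std::string & wanted)
{
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i].find(wanted) != std::string::npos) {
            return static_cast<int>(i);
        }
    }
    return -1;  // caller decides how to handle "no matching platform"
}
```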

1 device mapped to 6 cores
[Figure: same 6-core x86 CPU diagram, with the single device spanning all six cores]

            std::vector<cl::Device> devices = context_.getInfo<CL_CONTEXT_DEVICES>();

For the CPU we know this always returns just a single device, i.e. that one device uses all cores to execute its work-groups.

6 devices mapped to 6 cores
1. Check that device fission is supported
[Figure: same 6-core x86 CPU diagram]

            if (devices[0].getInfo<CL_DEVICE_EXTENSIONS>().find(
                    "cl_ext_device_fission") == std::string::npos) {
                exit(-1);
            }
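Calling find() on the raw extension string also succeeds for any longer extension name that merely contains cl_ext_device_fission. Since the extensions query returns a space-separated list of names, a stricter check compares whole tokens. A small sketch; hasExtension is a hypothetical helper name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: true if `name` appears as a whole, space-separated
// token in `extensions` (the format of the CL_DEVICE_EXTENSIONS string).
bool hasExtension(const std::string & extensions, const std::string & name)
{
    assert(!name.empty());
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {   // operator>> splits on whitespace
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

With this helper, a made-up name such as cl_ext_device_fission_v2 no longer counts as support for cl_ext_device_fission.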

6 devices mapped to 6 cores
2. Create sub-devices
[Figure: same 6-core x86 CPU diagram]

            cl_device_partition_property_ext props[] = {
                CL_DEVICE_PARTITION_EQUALLY_EXT, 1,   // one compute unit per sub-device
                CL_PROPERTIES_LIST_END_EXT,
                0
            };
            std::vector<cl::Device> sdevices;
            devices[0].createSubDevices(props, &sdevices);
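With CL_DEVICE_PARTITION_EQUALLY_EXT and a value N, the extension asks for as many sub-devices as can be created with N compute units each; as we read the extension, compute units left over when N does not divide the total are not used. The arithmetic as a small sketch (equalPartition is a hypothetical helper, not part of the extension API):

```cpp
#include <cassert>

// Hypothetical helper mirroring the arithmetic of
// CL_DEVICE_PARTITION_EQUALLY_EXT: how many sub-devices of
// `unitsPerSubDevice` compute units fit into `totalUnits`, and how many
// compute units are left unused.
struct EqualPartition {
    unsigned int subDevices;
    unsigned int unusedUnits;
};

EqualPartition equalPartition(unsigned int totalUnits, unsigned int unitsPerSubDevice)
{
    assert(unitsPerSubDevice > 0);
    EqualPartition p;
    p.subDevices = totalUnits / unitsPerSubDevice;
    p.unusedUnits = totalUnits % unitsPerSubDevice;
    return p;
}
```

For the 6-core CPU on the slide, N = 1 gives six single-core sub-devices, matching "6 devices mapped to 6 cores"; N = 4 would give one sub-device and leave two cores unused.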

6 devices mapped to 6 cores
3. Create command queues
[Figure: same 6-core x86 CPU diagram]

            for (auto i = sdevices.begin(); i != sdevices.end(); i++) {
                queues_.push_back(cl::CommandQueue(context_, *i));
            }
        } // end of Parallel()

6 command queues mapped to 6 devices
[Figure: one context containing six CPU sub-devices, each with its own command queue]

6 command queues mapped to 6 devices
[Figure: as on the previous slide, with the first CPU (CPU1) highlighted]

Each command queue executes asynchronously!

Native Kernels

Enqueue C++ functions, compiled by the host compiler, to execute from within a command queue:

    cl_int clEnqueueNativeKernel(cl_command_queue command_queue,
                                 void (CL_CALLBACK *user_func)(void *),
                                 void *args,
                                 size_t cb_args,
                                 cl_uint num_mem_objects,
                                 const cl_mem *mem_list,
                                 const void **args_mem_loc,
                                 cl_uint num_events_in_wait_list,
                                 const cl_event *event_wait_list,
                                 cl_event *event)

The runtime copies cb_args bytes from args and passes user_func a pointer to that copy. There is no guarantee that the function will execute on the same thread that performed the enqueue, so be careful with thread-local storage.
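clEnqueueNativeKernel copies cb_args bytes from args at enqueue time and hands user_func a pointer to that copy, and user_func may run on a thread other than the one that enqueued it. Both behaviours can be imitated without OpenCL. In this sketch, fakeEnqueueNativeKernel is a hypothetical stand-in that copies the argument block and runs the callback on a std::thread; a real runtime uses its own worker threads and reports completion through events rather than a thread handle.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for clEnqueueNativeKernel's argument handling:
// copy cb_args bytes of args, then call user_func with a pointer to the
// copy on another thread. The caller joins the returned thread.
std::thread fakeEnqueueNativeKernel(void (*user_func)(void *), const void * args, std::size_t cb_args)
{
    const unsigned char * bytes = static_cast<const unsigned char *>(args);
    std::vector<unsigned char> copy(bytes, bytes + cb_args);
    // The worker owns the copy, so the caller may reuse args immediately.
    return std::thread([user_func, data = std::move(copy)]() mutable {
        user_func(data.data());
    });
}

// Results live in the caller; the argument block only carries a pointer.
struct Result {
    std::thread::id caller;     // thread that performed the enqueue
    int seen = -1;              // value the callback received
    bool ranElsewhere = false;  // true if the callback ran on another thread
};

struct Args {                   // trivially copyable, so a byte copy is valid
    int value;
    Result * out;
};

void probe(void * a)
{
    Args * args = static_cast<Args *>(a);
    args->out->seen = args->value;
    args->out->ranElsewhere = (std::this_thread::get_id() != args->out->caller);
}
```

Because of the copy, the caller can overwrite args right after the enqueue, which is exactly what parallelFor does with args[1].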

parallelFor

        bool parallelFor(int range, std::function<void (int i)> f)
        {
            std::vector<cl::Event> events;
            // args[0]: address of f; args[1]: iteration index.
            // The runtime copies args at enqueue time, so reusing it is safe.
            size_t args[2];
            args[0] = reinterpret_cast<size_t>(&f);
            int index = 0;
            // x advances inside the while loop, once per enqueued index.
            for (int x = 0; x < range; ) {
                int queueCount = static_cast<int>(queues_.size());
                int numQueues = range - x > queueCount ? queueCount : range - x;
                cl::Event event;
                while (numQueues > 0) {
                    args[1] = static_cast<size_t>(index++);
                    queues_[numQueues-1].enqueueNativeKernel(
                        wrapper,
                        std::make_pair(static_cast<void *>(args), sizeof(size_t)*2),
                        NULL, NULL, NULL, &event);
                    events.push_back(event);
                    numQueues--;
                    x++;
                }
            }
            // f must outlive every enqueued call, so wait before returning.
            cl::Event::waitForEvents(events);
            return true;
        }
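The batching in parallelFor is easy to get wrong, because x also advances inside the inner loop. A host-only sketch of the same loop structure, with each enqueue replaced by recording a (queue, index) pair, makes it checkable: every index in [0, range) should be issued exactly once. scheduleIndices is a hypothetical helper, and queueCount plays the role of queues_.size().

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper reproducing parallelFor's loop structure without
// OpenCL: returns the (queue, index) pairs in the order they would be
// enqueued. Each batch hands one index to each queue, highest queue first.
std::vector<std::pair<int, int>> scheduleIndices(int range, int queueCount)
{
    assert(queueCount > 0);
    std::vector<std::pair<int, int>> order;
    int index = 0;
    for (int x = 0; x < range; ) {   // no x++ here: x advances per enqueue
        int numQueues = range - x > queueCount ? queueCount : range - x;
        while (numQueues > 0) {
            order.push_back(std::make_pair(numQueues - 1, index++));
            numQueues--;
            x++;
        }
    }
    return order;
}
```

With range = 10 and six queues this yields ten entries for indices 0 to 9: the first batch goes to queues 5 down to 0, the second to queues 3 down to 0. An extra x++ in the for header would skip one index per batch, issuing only nine calls for range = 10.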

parallelFor - wrapper

    private:
        static void CL_CALLBACK wrapper(void * a)
        {
            // a points to the runtime's copy of args[2] from parallelFor
            size_t * args = static_cast<size_t *>(a);
            std::function<void (int i)> * f =
                reinterpret_cast<std::function<void (int i)> *>(args[0]);
            (*f)(static_cast<int>(args[1]));
        }
    }; // class Parallel

How many primes

    #include <math.h>   // sqrtf

    const unsigned int numNumbers = 1024 * 1024;

    int main(void)
    {
        volatile unsigned int numPrimes = 0;
        int * numbers = new int[numNumbers];
        // The slide leaves numbers uninitialised; as an example, fill it with 1..numNumbers.
        for (unsigned int i = 0; i < numNumbers; i++) {
            numbers[i] = static_cast<int>(i + 1);
        }

        Parallel parallel;
        parallel.parallelFor(numNumbers, [numbers, &numPrimes] (int x) {
            auto isPrime = [] (unsigned int n) -> bool {
                if (n < 2) { return false; }    // 0 and 1 are not prime
                if (n == 2) { return true; }
                if (n % 2 == 0) { return false; }
                for (unsigned int odd = 3;
                     odd <= static_cast<unsigned int>(sqrtf(static_cast<float>(n)));
                     odd += 2) {
                    if (n % odd == 0) { return false; }
                }
                return true;
            }; // isPrime
            if (isPrime(numbers[x])) {
                Parallel::atomicAdd(1, &numPrimes);
            }
        }); // parallelFor

        delete [] numbers;
        return 0;
    } // main
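A sequential reference count is handy for checking the parallel result. The sketch below uses a sieve of Eratosthenes, a different technique from the slide's trial division, and assumes the numbers array holds the consecutive values 1..limit; countPrimesUpTo is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Sequential reference: count the primes in [1, limit] with a sieve of
// Eratosthenes. Useful for validating a parallel trial-division count over
// an array filled with 1..limit.
unsigned int countPrimesUpTo(unsigned int limit)
{
    if (limit < 2) {
        return 0;
    }
    std::vector<bool> composite(limit + 1, false);
    unsigned int count = 0;
    for (unsigned int n = 2; n <= limit; ++n) {
        if (composite[n]) {
            continue;
        }
        ++count;
        // Mark multiples from n*n upward; 64-bit arithmetic avoids overflow.
        for (unsigned long long m = static_cast<unsigned long long>(n) * n; m <= limit; m += n) {
            composite[m] = true;
        }
    }
    return count;
}
```

Under that assumption, countPrimesUpTo(numNumbers) is the value numPrimes should reach once parallelFor returns.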


VirtualCL (VCL) Programmer and Administrator Guide and Manuals VirtualCL (VCL) Cluster Platform Programmer and Administrator Guide and Manuals Revised for VCL-1.25 Oct. 2017 ii Copyright c 2010-2017. All rights reserved. Preface This document presents the VirtualCL

More information

Operating systems fundamentals - B06

Operating systems fundamentals - B06 Operating systems fundamentals - B06 David Kendall Northumbria University David Kendall (Northumbria University) Operating systems fundamentals - B06 1 / 12 Introduction Introduction to threads Reminder

More information

Copyright Khronos Group, Page 1. OpenCL Overview. February 2010

Copyright Khronos Group, Page 1. OpenCL Overview. February 2010 Copyright Khronos Group, 2011 - Page 1 OpenCL Overview February 2010 Copyright Khronos Group, 2011 - Page 2 Khronos Vision Billions of devices increasing graphics, compute, video, imaging and audio capabilities

More information

Real-time Graphics 9. GPGPU

Real-time Graphics 9. GPGPU 9. GPGPU GPGPU GPU (Graphics Processing Unit) Flexible and powerful processor Programmability, precision, power Parallel processing CPU Increasing number of cores Parallel processing GPGPU general-purpose

More information

OpenCL. Matt Sellitto Dana Schaa Northeastern University NUCAR

OpenCL. Matt Sellitto Dana Schaa Northeastern University NUCAR OpenCL Matt Sellitto Dana Schaa Northeastern University NUCAR OpenCL Architecture Parallel computing for heterogenous devices CPUs, GPUs, other processors (Cell, DSPs, etc) Portable accelerated code Defined

More information

Towards Transparent and Efficient GPU Communication on InfiniBand Clusters. Sadaf Alam Jeffrey Poznanovic Kristopher Howard Hussein Nasser El-Harake

Towards Transparent and Efficient GPU Communication on InfiniBand Clusters. Sadaf Alam Jeffrey Poznanovic Kristopher Howard Hussein Nasser El-Harake Towards Transparent and Efficient GPU Communication on InfiniBand Clusters Sadaf Alam Jeffrey Poznanovic Kristopher Howard Hussein Nasser El-Harake MPI and I/O from GPU vs. CPU Traditional CPU point-of-view

More information

Real-time Graphics 9. GPGPU

Real-time Graphics 9. GPGPU Real-time Graphics 9. GPGPU GPGPU GPU (Graphics Processing Unit) Flexible and powerful processor Programmability, precision, power Parallel processing CPU Increasing number of cores Parallel processing

More information

Introduction à OpenCL

Introduction à OpenCL 1 1 UDS/IRMA Journée GPU Strasbourg, février 2010 Sommaire 1 OpenCL 2 3 GPU architecture A modern Graphics Processing Unit (GPU) is made of: Global memory (typically 1 Gb) Compute units (typically 27)

More information

Data Parallelism. CSCI 5828: Foundations of Software Engineering Lecture 28 12/01/2016

Data Parallelism. CSCI 5828: Foundations of Software Engineering Lecture 28 12/01/2016 Data Parallelism CSCI 5828: Foundations of Software Engineering Lecture 28 12/01/2016 1 Goals Cover the material in Chapter 7 of Seven Concurrency Models in Seven Weeks by Paul Butcher Data Parallelism

More information

CSC209H Lecture 11. Dan Zingaro. March 25, 2015

CSC209H Lecture 11. Dan Zingaro. March 25, 2015 CSC209H Lecture 11 Dan Zingaro March 25, 2015 Level- and Edge-Triggering (Kerrisk 63.1.1) When is an FD ready? Two answers: Level-triggered: when an operation will not block (e.g. read will not block),

More information

OPENCL WITH AMD FIREPRO W9100 GERMAN ANDRYEYEV MAY 20, 2015

OPENCL WITH AMD FIREPRO W9100 GERMAN ANDRYEYEV MAY 20, 2015 OPENCL WITH AMD FIREPRO W9100 GERMAN ANDRYEYEV MAY 20, 2015 Introducing AMD FirePro W9100 HW COMPARISON W9100(HAWAII) VS W9000(TAHITI) FirePro W9100 FirePro W9000 Improvement Notes Compute Units 44 32

More information

Copyright Khronos Group Page 1. OpenCL BOF SIGGRAPH 2013

Copyright Khronos Group Page 1. OpenCL BOF SIGGRAPH 2013 Copyright Khronos Group 2013 - Page 1 OpenCL BOF SIGGRAPH 2013 Copyright Khronos Group 2013 - Page 2 OpenCL Roadmap OpenCL-HLM (High Level Model) High-level programming model, unifying host and device

More information

Parallel and Distributed Computing

Parallel and Distributed Computing Parallel and Distributed Computing NUMA; OpenCL; MapReduce José Monteiro MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer Science and Engineering

More information

Guillimin HPC Users Meeting January 13, 2017

Guillimin HPC Users Meeting January 13, 2017 Guillimin HPC Users Meeting January 13, 2017 guillimin@calculquebec.ca McGill University / Calcul Québec / Compute Canada Montréal, QC Canada Please be kind to your fellow user meeting attendees Limit

More information

Rock em Graphic Cards

Rock em Graphic Cards Rock em Graphic Cards Agnes Meyder 27.12.2013, 16:00 1 / 61 Layout Motivation Parallelism Old Standards OpenMPI OpenMP Accelerator Cards CUDA OpenCL OpenACC Hardware C++AMP The End 2 / 61 Layout Motivation

More information

Section - Computer Science

Section - Computer Science Section - Computer Science 1. With respect to the C++ programming language, which is the parameter that is added to every non-static member function when it is called? (i) this pointer (ii) that pointer

More information

Massively Parallel Algorithms

Massively Parallel Algorithms Massively Parallel Algorithms Introduction to CUDA & Many Fundamental Concepts of Parallel Programming G. Zachmann University of Bremen, Germany cgvr.cs.uni-bremen.de Hybrid/Heterogeneous Computation/Architecture

More information

CS 677: Parallel Programming for Many-core Processors Lecture 12

CS 677: Parallel Programming for Many-core Processors Lecture 12 1 CS 677: Parallel Programming for Many-core Processors Lecture 12 Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu CS Department Project Poster

More information

Support Tools for Porting Legacy Applications to Multicore. Natsuki Kawai, Yuri Ardila, Takashi Nakamura, Yosuke Tamura

Support Tools for Porting Legacy Applications to Multicore. Natsuki Kawai, Yuri Ardila, Takashi Nakamura, Yosuke Tamura Support Tools for Porting Legacy Applications to Multicore Natsuki Kawai, Yuri Ardila, Takashi Nakamura, Yosuke Tamura Agenda Introduction PEMAP: Performance Estimator for MAny core Processors The overview

More information

Lecture Topics. Announcements. Today: Operating System Overview (Stallings, chapter , ) Next: Processes (Stallings, chapter

Lecture Topics. Announcements. Today: Operating System Overview (Stallings, chapter , ) Next: Processes (Stallings, chapter Lecture Topics Today: Operating System Overview (Stallings, chapter 2.1-2.4, 2.8-2.10) Next: Processes (Stallings, chapter 3.1-3.6) 1 Announcements Consulting hours posted Self-Study Exercise #3 posted

More information