GPGPU IGAD 2014/2015. Lecture 1. Jacco Bikker
1 GPGPU IGAD 2014/2015 Lecture 1 Jacco Bikker
2 Today: Course introduction GPGPU background Getting started Assignment
3 Introduction GPU History
4 History 3DO-FZ1 console 1991
5 History NVidia NV-1 (Diamond Edge 3D) 1995
6 History 3Dfx Diamond Monster 3D 1996
7 History Quake vs GLQuake 1997
8 History Fixed function pipeline vs Programmable pipeline 2007
9 History
10 History Source: Naffziger, AMD
11 History
12 History GPU - conveyor belt:
input = vertices + connectivity
step 1: transform
step 2: rasterize
step 3: shade
step 4: z-test
output = pixels
13 Introduction
void main(void)
{
    float t = iGlobalTime;
    vec2 uv = gl_FragCoord.xy / iResolution.y;
    float r = length(uv), a = atan(uv.y,uv.x);
    float i = floor(r*10);
    a *= floor(pow(128,i/10));
    a += 20.*sin(0.5*t) *i-100.*(r*i/10)*cos(0.5*t);
    r += ( *cos(a)) / 10;
    r = floor(n*r)/10;
    gl_FragColor = (1-r)*vec4(0.5,1,1.5,1);
}
14 Introduction Historically, the GPU is a co-processor. GPUs perform well because they have a constrained execution model, which is based on parallelism. GPU programming requires a very different way of expressing algorithms.
15 Introduction This course Teacher background Your role Learning objectives ECTS / lectures / homework / assessment
16 This course AGT6: 7 lectures We start at 10.00am Demo time Break half-way
17 Lecturer Me : dr. Jacco Bikker - CUDA Ray tracing Rendering
18 Your role You: Maybe a GPGPU / shader expert Use AGT6 to get further Or just pass with a 6
19 Objectives Objectives: Get feet wet Generic GPGPU concepts *not*: Detailed API knowledge
20 Details AGT6: 3 ECTS = ~80 hours Weekly homework, unverified Final assignment: free form
21 Background GPU architecture
22 GPU architecture CPU: Designed to run one thread as fast as possible.
Use large caches to minimize memory latency
Maximize cache usage using pipeline & branch prediction
Multi-core processing
Task parallelism
Interesting tricks:
SIMD
Hyperthreading
23 GPU architecture GPU: Designed to combat latency using many threads.
Hide latency by computation
Maximize parallelism
Streaming processing
Data parallelism
Interesting tricks:
SIMT
Use typical GPU hardware (filtering etc.)
Cache anyway
24 GPU architecture
CPU
Multiple tasks = multiple threads
Tasks run different instructions
10s of complex threads execute on a few cores
Thread execution managed explicitly
GPU
SIMD: same instructions on multiple data
Many light-weight threads on 100s of cores
Threads are managed and scheduled by hardware
25 GPU architecture
26 GPU architecture SIMT Thread execution:
Group 32 threads (vertices, pixels, primitives) into warps
All threads in a warp execute the same instruction
In case of latency, switch to a different warp (thus: switch out 32 threads for 32 different threads)
Flow control:
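To make the flow-control point concrete, here is a minimal CPU-side sketch in plain C (not GPU code). It assumes a warp of 32 lanes sharing one instruction stream and models a branch by issuing each side once, with the lanes that did not take that side masked off. The names `WARP_SIZE` and `warp_branch`, and the cost model of one "issue" per executed branch side, are illustrative assumptions rather than part of any API.

```c
#include <stdint.h>

#define WARP_SIZE 32

/* Emulates one warp executing:
 *     if (in[lane] is odd) out[lane] = in[lane] * 3;
 *     else                 out[lane] = in[lane] / 2;
 * All lanes share one instruction stream, so each side of the branch is
 * issued once if at least one lane needs it; lanes that did not take that
 * side are masked off. Returns the number of branch sides issued (1 or 2). */
int warp_branch(const int in[WARP_SIZE], int out[WARP_SIZE])
{
    /* Build the active mask: bit 'lane' is set if that lane takes the "then" side. */
    uint32_t taken = 0;
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if (in[lane] & 1) taken |= (uint32_t)1 << lane;

    int issued = 0;
    if (taken != 0) {                      /* "then" side, only if some lane needs it */
        for (int lane = 0; lane < WARP_SIZE; lane++)
            if (taken & ((uint32_t)1 << lane)) out[lane] = in[lane] * 3;
        issued++;
    }
    if (taken != 0xFFFFFFFFu) {            /* "else" side, only if some lane needs it */
        for (int lane = 0; lane < WARP_SIZE; lane++)
            if (!(taken & ((uint32_t)1 << lane))) out[lane] = in[lane] / 2;
        issued++;
    }
    return issued;
}
```

When all 32 inputs agree on the condition, only one side is issued; as soon as a single lane disagrees, the warp pays for both sides. That is the reason divergent branches are expensive in this execution model.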
27 GPU architecture
void main(void) // for each pixel
{
    float t = iGlobalTime;
    vec2 uv = gl_FragCoord.xy / iResolution.y;
    float r = length(uv), a = atan(uv.y,uv.x);
    float i = floor(r*10);
    a *= floor(pow(128,i/10));
    a += 20.*sin(0.5*t) *i-100.*(r*i/10)*cos(0.5*t);
    r += ( *cos(a)) / 10;
    r = floor(n*r)/10;
    gl_FragColor = (1-r)*vec4(0.5,1,1.5,1);
}
28 GPU architecture Easy to port to GPU:
Image postprocessing
Particle effects
Ray tracing
Actually, a lot of algorithms are not easy to port at all. Decades of legacy, or a fundamental problem?
29 Background Why GPGPU OpenCL vs Shaders vs CUDA
30 Why GPGPU Some tasks are more efficient on the GPU GPU has high theoretical peak performance Prevent wasting processing power
31 OpenCL vs shaders
No mapping to graphics context needed
Avoid thinking about various transformations of coordinates (world / screen / texture)
Access to memory levels that are implicit in OpenGL
OpenCL also runs on CPUs
32 OpenCL vs CUDA (but if you must: A Comprehensive Performance Comparison of CUDA and OpenCL, Fang et al., )
33 Getting Started Tools of the trade Template
34 Tools Get your development tools here: NVidia: AMD: Intel:
35 Template Template available from
36 Template
kernel void main( write_only image2d_t outimg )
{
    int column = get_global_id( 0 );
    int line = get_global_id( 1 );
    // calculate checkerboard pattern
    int tilex = column / 40;
    int tiley = line / 40;
    float color = (float)((tilex + tiley) & 1); // 0 or 1
    float4 white = (float4)( 1, 1, 1, 1 );
    write_imagef( outimg, (int2)(column, line), color * white );
}
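As a rough mental model of what the kernel above does, the following plain C sketch emulates the 2D range on the CPU: two nested loops stand in for the work items, and the loop counters play the role of `get_global_id( 0 )` and `get_global_id( 1 )`. The names `checker_pixel` and `checker_fill`, and the single-float-per-pixel output buffer, are assumptions made for illustration; the template itself writes a float4 to an image.

```c
#include <stddef.h>

#define TILE 40  /* tile size in pixels, as in the kernel */

/* Body of the kernel for one work item; column and line play the role of
 * get_global_id(0) and get_global_id(1). Returns 0.0f or 1.0f. */
static float checker_pixel(int column, int line)
{
    int tilex = column / TILE;
    int tiley = line / TILE;
    return (float)((tilex + tiley) & 1);
}

/* Emulates the 2D range: one call of the kernel body per pixel.
 * out must hold width * height floats (one grey value per pixel). */
void checker_fill(float *out, int width, int height)
{
    for (int line = 0; line < height; line++)
        for (int column = 0; column < width; column++)
            out[(size_t)line * width + column] = checker_pixel(column, line);
}
```

On the GPU the two loops disappear: every (column, line) pair becomes its own work item, and the order in which they run is up to the hardware. A CPU version like this is also handy for checking the device output pixel by pixel.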
37 Template
#version 330
uniform sampler2D color;
in vec2 P;
in vec2 uv;
out vec3 pixel;
void main()
{
    // retrieve input pixel
    pixel = texture( color, uv ).rgb;
    // darken towards edges
    float dx = P.x - 0.5, dy = P.y - 0.5;
    float distance = sqrt( dx * dx + dy * dy );
    float scale = 1 - max( 0, distance * );
    pixel *= scale;
}
38 Template
bool Game::Init()
{
    // load shader and texture
    cloutput = new Texture( SCRWIDTH, SCRHEIGHT, Texture::FLOAT );
    shader = new Shader( "shaders/checker.vert", "shaders/checker.frag" );
    // load OpenCL code
    kernel = new Kernel( "programs/program.cl", "main" );
    // link cl output texture as an OpenCL buffer
    outputbuffer = clCreateFromGLTexture2D( kernel->getcontext(), CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, cloutput->getid(), 0 );
    kernel->setargument( 0, &outputbuffer );
    // done
    return true;
}
39 Template
void Game::Tick()
{
    // run cl code to fill texture
    kernel->run( &outputbuffer );
    // run shader on cl-generated texture
    shader->bind();
    shader->setinputtexture( GL_TEXTURE0, "color", cloutput );
    shader->setinputmatrix( "view", mat4( 1 ) );
    DrawQuad();
}
40 Getting Started MyFirst OpenCL app OpenCL terminology
41 Terminology A few words you need to know the meaning of:
1. Device
2. Host
3. Context
4. Kernel
5. Program
6. Compute unit (CUDA: streaming multiprocessor)
7. Work item (CUDA: thread)
8. Command queue (synchronous, asynchronous)
42 MyFirst To execute an OpenCL program:
1. Query the host system for OpenCL devices
2. Create a context to associate the OpenCL devices
3. Create programs that will run on one or more associated devices
4. From the programs, select kernels to execute
5. Create memory objects on the host or on the device
6. Copy memory data to the device as needed
7. Provide arguments for the kernels
8. Submit the kernels to the command queue for execution
9. Copy the results from the device to the host.
clGetPlatformIDs( )
clGetDeviceIDs( )
clCreateContext( )
clCreateCommandQueue( )
clCreateProgramWithSource( )
clBuildProgram( )
clCreateKernel( )
clCreateBuffer( )
clEnqueueWriteBuffer( )
clSetKernelArg( )
clEnqueueNDRangeKernel( )
clFinish( )
clEnqueueReadBuffer( )
43 MyFirst
#include <stdio.h>
#include "CL/cl.h"

#define ITEMS 10

// kernel source: squares each input element
const char *KernelSource =
    "kernel void hello( global float *input, global float *output )\n"
    "{\n"
    "    size_t id = get_global_id(0);\n"
    "    output[id] = input[id] * input[id];\n"
    "}\n";

int main()
{
    cl_int err;
    cl_uint num_of_platforms = 0;
    cl_platform_id platform_id;
    cl_device_id device_id;
    cl_uint num_of_devices = 0;
    size_t global = ITEMS;
    float inputdata[ITEMS] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }, results[ITEMS] = { 0 };
    // find a platform and a GPU device
    clGetPlatformIDs( 1, &platform_id, &num_of_platforms );
    clGetDeviceIDs( platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &num_of_devices );
    // create a context and a command queue for the device
    cl_context_properties props[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)platform_id, 0 };
    cl_context context = clCreateContext( props, 1, &device_id, 0, 0, &err );
    cl_command_queue queue = clCreateCommandQueue( context, device_id, 0, &err );
    // build the program and select the kernel
    cl_program program = clCreateProgramWithSource( context, 1, (const char**)&KernelSource, 0, &err );
    clBuildProgram( program, 0, NULL, NULL, NULL, NULL );
    cl_kernel kernel = clCreateKernel( program, "hello", &err );
    // create device buffers and copy the input data to the device
    cl_mem input = clCreateBuffer( context, CL_MEM_READ_ONLY, sizeof( float ) * ITEMS, 0, 0 );
    cl_mem output = clCreateBuffer( context, CL_MEM_WRITE_ONLY, sizeof( float ) * ITEMS, 0, 0 );
    clEnqueueWriteBuffer( queue, input, CL_TRUE, 0, sizeof( float ) * ITEMS, inputdata, 0, 0, 0 );
    // set arguments, run one work item per element, wait for completion
    clSetKernelArg( kernel, 0, sizeof( cl_mem ), &input );
    clSetKernelArg( kernel, 1, sizeof( cl_mem ), &output );
    clEnqueueNDRangeKernel( queue, kernel, 1, 0, &global, 0, 0, 0, 0 );
    clFinish( queue );
    // copy the results back to the host and print them
    clEnqueueReadBuffer( queue, output, CL_TRUE, 0, sizeof( float ) * ITEMS, results, 0, 0, 0 );
    for( int i = 0; i < ITEMS; i++ ) printf( "%f ", results[i] );
    // clean up
    clReleaseMemObject( input );
    clReleaseMemObject( output );
    clReleaseProgram( program );
    clReleaseKernel( kernel );
    clReleaseCommandQueue( queue );
    clReleaseContext( context );
    return 0;
}
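The program above prints whatever comes back from the device without checking it. A small CPU reference makes it easy to confirm that the kernel actually ran. The sketch below is plain C with no OpenCL calls; the helper names `square_reference` and `results_match` and the relative tolerance are assumptions for illustration, not part of the lecture code.

```c
#include <stddef.h>

/* CPU version of the "hello" kernel: output[i] = input[i] * input[i]. */
void square_reference(const float *input, float *output, size_t count)
{
    for (size_t i = 0; i < count; i++)
        output[i] = input[i] * input[i];
}

/* Returns 1 if every device result lies within a relative tolerance eps of
 * the expected value (absolute tolerance for values below 1), else 0. */
int results_match(const float *device, const float *expected, size_t count, float eps)
{
    for (size_t i = 0; i < count; i++) {
        float diff = device[i] - expected[i];
        if (diff < 0.0f) diff = -diff;
        float scale = expected[i] < 0.0f ? -expected[i] : expected[i];
        if (scale < 1.0f) scale = 1.0f;
        if (diff > eps * scale) return 0;
    }
    return 1;
}
```

In the program above one could call `square_reference( inputdata, expected, ITEMS )` into a second array and then compare it with `results` after the read-back. An all-zero `results` array usually means the kernel failed to build or never ran.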
44 MyFirst
bool Kernel::InitCL()
{
    cl_platform_id platform;
    cl_device_id* devices;
    cl_uint devcount;
    cl_int error;
    ....
}
Like I said, I don't care much for API details. Just start with the template, and modify / replace it when the need arises.
45 Assignment Create an OpenCL program that calculates Voronoi noise for a 512x512 buffer and make it available to the CPU. Measure the performance gain compared to CPU-only. Reference:
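For the CPU-only side of the comparison, a baseline could look like the sketch below. It is one possible formulation, not the reference implementation: it assumes one feature point per grid cell, a cell size of 64 pixels, a simple integer hash, wrap-around at the image borders, and it stores the squared distance to the nearest feature point (taking the square root gives the usual distance value). All names (`CELL`, `hash2`, `voronoi_pixel`, `voronoi_fill`) are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

#define CELL 64   /* assumed cell size in pixels; image size must be a multiple */

/* Small integer hash (illustrative choice): maps a cell to pseudo-random bits. */
static uint32_t hash2(uint32_t cx, uint32_t cy)
{
    uint32_t h = cx * 374761393u + cy * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    return h ^ (h >> 16);
}

/* Squared distance (pixels^2) from pixel (x, y) to the nearest feature point.
 * Each cell holds one feature point; cells wrap around so the image tiles.
 * A 5x5 neighbourhood is searched so the nearest point cannot be missed. */
static float voronoi_pixel(int x, int y, int cells)
{
    int cx = x / CELL, cy = y / CELL;
    float best = 1e30f;
    for (int dy = -2; dy <= 2; dy++)
        for (int dx = -2; dx <= 2; dx++) {
            int nx = cx + dx, ny = cy + dy;          /* may lie outside the grid */
            int wx = ((nx % cells) + cells) % cells; /* wrapped cell coordinates */
            int wy = ((ny % cells) + cells) % cells;
            uint32_t h = hash2((uint32_t)wx, (uint32_t)wy);
            /* feature point: cell origin plus a hashed offset in [0, CELL) */
            float px = (float)(nx * CELL) + (float)(h & 0xFFFFu) * (CELL / 65536.0f);
            float py = (float)(ny * CELL) + (float)(h >> 16) * (CELL / 65536.0f);
            float ddx = px - (float)x, ddy = py - (float)y;
            float d2 = ddx * ddx + ddy * ddy;
            if (d2 < best) best = d2;
        }
    return best;
}

/* Fills a size x size buffer, one float per pixel. */
void voronoi_fill(float *out, int size)
{
    int cells = size / CELL;
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            out[(size_t)y * size + x] = voronoi_pixel(x, y, cells);
}
```

Calling `voronoi_fill( buffer, 512 )` produces the 512x512 buffer from the assignment. Since every pixel is computed independently, the body of `voronoi_pixel` maps naturally onto a kernel with one work item per pixel. Time both versions over several runs and include the transfer back to the host in the GPU measurement.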
46 Words of Advice WebGL != OpenCL
Can't pass by reference, use pointers instead
float3 parameter: (float3)(1, 1, 1)
fract requires second parameter
sinf doesn't exist, use sin
Also, see this helpful chart:
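The fract difference trips up most ports from GLSL, so here is a small plain C sketch of what the OpenCL C version computes. In OpenCL C, `fract( x, &iptr )` stores floor(x) through the pointer and returns the fractional part, clamped to the largest float below 1. The function name `fract_cl` is made up for this example, and floor is emulated with a cast so the sketch needs no math library, which restricts it to values that fit in a long.

```c
/* Mirrors OpenCL C's float fract(float x, float *iptr):
 * stores floor(x) in *iptr and returns x - floor(x), clamped below 1. */
float fract_cl(float x, float *iptr)
{
    /* floor via truncation; valid while x fits in a long */
    float fl = (float)(long)x;
    if (fl > x) fl -= 1.0f;      /* truncation rounds negative values up */
    *iptr = fl;
    float f = x - fl;
    const float below_one = 0x1.fffffep-1f;  /* largest float smaller than 1 */
    return f < below_one ? f : below_one;
}
```

A GLSL line such as `float f = fract( x );` therefore becomes `float whole; float f = fract( x, &whole );` in a kernel. The same pointer style replaces GLSL's `out` and `inout` parameters in general.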
47 The End (for now)
More informationThe Graphics Pipeline and OpenGL III: OpenGL Shading Language (GLSL 1.10)!
! The Graphics Pipeline and OpenGL III: OpenGL Shading Language (GLSL 1.10)! Gordon Wetzstein! Stanford University! EE 267 Virtual Reality! Lecture 4! stanford.edu/class/ee267/! Lecture Overview! Review
More informationUsing GPUs. Visualization of Complex Functions. September 26, 2012 Khaldoon Ghanem German Research School for Simulation Sciences
Visualization of Complex Functions Using GPUs September 26, 2012 Khaldoon Ghanem German Research School for Simulation Sciences Outline GPU in a Nutshell Fractals - A Simple Fragment Shader Domain Coloring
More informationCME 213 S PRING Eric Darve
CME 213 S PRING 2017 Eric Darve Summary of previous lectures Pthreads: low-level multi-threaded programming OpenMP: simplified interface based on #pragma, adapted to scientific computing OpenMP for and
More informationProgramming and Simulating Fused Devices. Part 2 Multi2Sim
Programming and Simulating Fused Devices Part 2 Multi2Sim Rafael Ubal Perhaad Mistry Northeastern University Boston, MA Conference title 1 Outline 1. Introduction 2. The x86 CPU Emulation 3. The Evergreen
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:
More informationThe Application Stage. The Game Loop, Resource Management and Renderer Design
1 The Application Stage The Game Loop, Resource Management and Renderer Design Application Stage Responsibilities 2 Set up the rendering pipeline Resource Management 3D meshes Textures etc. Prepare data
More informationA Case for Better Integration of Host and Target Compilation When Using OpenCL for FPGAs
A Case for Better Integration of Host and Target Compilation When Using OpenCL for FPGAs Taylor Lloyd, Artem Chikin, Erick Ochoa, Karim Ali, José Nelson Amaral University of Alberta Sept 7 FSP 2017 1 University
More informationGPU Fundamentals Jeff Larkin November 14, 2016
GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate
More informationMulti-Processors and GPU
Multi-Processors and GPU Philipp Koehn 7 December 2016 Predicted CPU Clock Speed 1 Clock speed 1971: 740 khz, 2016: 28.7 GHz Source: Horowitz "The Singularity is Near" (2005) Actual CPU Clock Speed 2 Clock
More informationECE 574 Cluster Computing Lecture 15
ECE 574 Cluster Computing Lecture 15 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 30 March 2017 HW#7 (MPI) posted. Project topics due. Update on the PAPI paper Announcements
More information