PyCUDA and PyUblas: Hybrid HPC in Python made easy

Size: px
Start display at page:

Download "PyCUDA and PyUblas: Hybrid HPC in Python made easy"

Transcription

1 PyCUDA and PyUblas: Hybrid HPC in Python made easy Applied Mathematics, Brown University March 5, 2009

2 Thanks Jan Hesthaven (Brown) Tim Warburton (Rice) Lucas Wilcox (UT Austin) Akil Narayan (Brown) PyCUDA contributors Nvidia Corporation

3 Outline 1 PyCUDA: What and Why? 2 PyCUDA: Overview 3 PyUblas 4 Conclusions

4 Scripting for GPUs Outline 1 PyCUDA: What and Why? Scripting for GPUs (Very) brief GPU PyCUDA: Overview 3 PyUblas 4 Conclusions

5 Scripting for GPUs PyCUDA: Why do Scripting for GPUs? GPUs are everything that Python is not. Highly parallel Very architecture-sensitive Built for maximum FP/memory throughput complement each other

6 Scripting for GPUs PyCUDA: Why do Scripting for GPUs? GPUs are everything that Python is not. Highly parallel Very architecture-sensitive Built for maximum FP/memory throughput complement each other CPU: largely restricted to control tasks ( 1000/sec) Python fast enough

7 Scripting for GPUs PyCUDA: Why do Scripting for GPUs? GPUs are everything that Python is not. Highly parallel Very architecture-sensitive Built for maximum FP/memory throughput complement each other CPU: largely restricted to control tasks ( 1000/sec) Python fast enough Realize a promise: Use Python... from first prototype to full-scale production code.

8 Scripting for GPUs Python: Speed Usual answer to the Speed Question : Hybrid ( mixed ) Code.

9 Scripting for GPUs Python: Speed Usual answer to the Speed Question : Hybrid ( mixed ) Code. Plays to the strengths of each language.

10 Scripting for GPUs Python: Speed Usual answer to the Speed Question : Hybrid ( mixed ) Code. Plays to the strengths of each language. But: Introduces (some) complexity.

11 Scripting for GPUs Python: Speed Usual answer to the Speed Question : Hybrid ( mixed ) Code. Plays to the strengths of each language. But: Introduces (some) complexity. Observation: GPU code is already hybrid.

12 Scripting for GPUs Python: Speed Usual answer to the Speed Question : Hybrid ( mixed ) Code. Plays to the strengths of each language. But: Introduces (some) complexity. Observation: GPU code is already hybrid. Consequence: No added complexity through hybrid code.

13 (Very) brief GPU 101 Outline 1 PyCUDA: What and Why? Scripting for GPUs (Very) brief GPU PyCUDA: Overview 3 PyUblas 4 Conclusions

14 (Very) brief GPU 101 GPUs: What? Design target for CPUs: Make a single thread very fast Hide latency through large caches Predict, speculate

15 (Very) brief GPU 101 GPUs: What? Design target for CPUs: Make a single thread very fast Hide latency through large caches Predict, speculate Design target for GPUs: Throughput matters single threads do not Hide latency through massive parallelism Let programmer deal with raw storage hierarchy

16 (Very) brief GPU 101 What is CUDA? CUDA is Nvidia s proprietary compute abstraction.

17 (Very) brief GPU 101 What is CUDA? CUDA is Nvidia s proprietary compute abstraction. Main merit: A well-balanced model of GPU computing. Abstract enough to not be hardware-specific. Concrete enough to expose most hardware features.

18 (Very) brief GPU 101 What is CUDA? CUDA is Nvidia s proprietary compute abstraction. Main merit: A well-balanced model of GPU computing. Abstract enough to not be hardware-specific. Concrete enough to expose most hardware features. (Very) close semantic relative of OpenCL.

19 (Very) brief GPU 101 What is CUDA? CUDA is Nvidia s proprietary compute abstraction. Main merit: A well-balanced model of GPU computing. Abstract enough to not be hardware-specific. Concrete enough to expose most hardware features. (Very) close semantic relative of OpenCL. Python + CUDA = PyCUDA.

20 (Very) brief GPU 101 GPUs: Execution Model Computational Grid Block (0, 1) Block (0, 0) Block (1, 1) Block (1, 0) Block (2, 1) Block (2, 0) Multi-tiered Parallelism Block Grid Block (1, 0) (0, 3) (1, 3) (2, 3) (3, 3) (0, 2) (1, 2) (2, 2) (3, 2) (0, 1) (0, 0) (1, 1) (1, 0) (2, 1) (2, 0) (3, 1) (3, 0) Image Credit: Johan S. Seland, Sintef

21 (Very) brief GPU 101 GPUs: Execution Model Computational Grid Block (0, 1) Block (0, 0) Block (1, 0) (0, 3) (0, 2) (0, 1) (0, 0) Block (1, 1) Block (1, 0) (1, 3) (1, 2) (1, 1) (1, 0) Block (2, 1) Block (2, 0) (2, 3) (2, 2) (2, 1) (2, 0) (3, 3) (3, 2) (3, 1) (3, 0) Multi-tiered Parallelism Block Grid Only threads within a block can communicate Image Credit: Johan S. Seland, Sintef

22 (Very) brief GPU 101 GPUs: Execution Model Computational Grid Block (0, 1) Block (0, 0) Block (1, 0) (0, 3) (0, 2) (0, 1) (0, 0) Block (1, 1) Block (1, 0) (1, 3) (1, 2) (1, 1) (1, 0) Block (2, 1) Block (2, 0) (2, 3) (2, 2) (2, 1) (2, 0) (3, 3) (3, 2) (3, 1) (3, 0) Multi-tiered Parallelism Block Grid Only threads within a block can communicate Each Block is assigned to physical execution unit. Image Credit: Johan S. Seland, Sintef

23 (Very) brief GPU 101 GPUs: Execution Model Computational Grid Block (0, 1) Block (0, 0) Block (1, 0) (0, 3) (0, 2) (0, 1) (0, 0) Block (1, 1) Block (1, 0) (1, 3) (1, 2) (1, 1) (1, 0) Block (2, 1) Block (2, 0) (2, 3) (2, 2) (2, 1) (2, 0) (3, 3) (3, 2) (3, 1) (3, 0) Multi-tiered Parallelism Block Grid Only threads within a block can communicate Each Block is assigned to physical execution unit. Grids and Blocks replace outer loops in an algorithm. Image Credit: Johan S. Seland, Sintef

24 Whetting your Appetite Outline 1 PyCUDA: What and Why? 2 PyCUDA: Overview Whetting your Appetite Working with PyCUDA 3 PyUblas 4 Conclusions

25 Whetting your Appetite Whetting your appetite 1 import pycuda.driver as cuda 2 import pycuda.autoinit 3 import numpy 4 5 a = numpy.random.randn(4,4).astype(numpy.float32) 6 a gpu = cuda.mem alloc(a.nbytes) 7 cuda.memcpy htod(a gpu, a) [This is examples/demo.py in the PyCUDA distribution.]

26 Whetting your Appetite Whetting your appetite 9 mod = cuda.sourcemodule( 10 global void doublify ( float a) 11 { 12 int idx = threadidx.x + threadidx.y 4; 13 a[ idx ] = 2; 14 } 15 ) func = mod.get function( doublify ) 18 func(a gpu, block=(4,4,1)) a doubled = numpy.empty like(a) 21 cuda.memcpy dtoh(a doubled, a gpu) 22 print a doubled 23 print a

27 Whetting your Appetite Whetting your appetite 9 mod = cuda.sourcemodule( 10 global void doublify ( float a) 11 { 12 int idx = threadidx.x + threadidx.y 4; 13 a[ idx ] = 2; 14 } 15 ) func = mod.get function( doublify ) 18 func(a gpu, block=(4,4,1)) a doubled = numpy.empty like(a) 21 cuda.memcpy dtoh(a doubled, a gpu) 22 print a doubled 23 print a Compute kernel

28 Whetting your Appetite Whetting your appetite, Part II Did somebody say Abstraction is good?

29 Whetting your Appetite Whetting your appetite, Part II 1 import numpy 2 import pycuda.autoinit 3 import pycuda.gpuarray as gpuarray 4 5 a gpu = gpuarray.to gpu( 6 numpy.random.randn(4,4).astype(numpy.float32)) 7 a doubled = (2 a gpu).get() 8 print a doubled 9 print a gpu

30 Working with PyCUDA Outline 1 PyCUDA: What and Why? 2 PyCUDA: Overview Whetting your Appetite Working with PyCUDA 3 PyUblas 4 Conclusions

31 Working with PyCUDA PyCUDA Philosophy Provide complete access

32 Working with PyCUDA PyCUDA Philosophy Provide complete access Automatically manage resources

33 Working with PyCUDA PyCUDA Philosophy Provide complete access Automatically manage resources Provide abstractions

34 Working with PyCUDA PyCUDA Philosophy Provide complete access Automatically manage resources Provide abstractions Allow interactive use

35 Working with PyCUDA PyCUDA Philosophy Provide complete access Automatically manage resources Provide abstractions Allow interactive use Check for and report errors automatically

36 Working with PyCUDA PyCUDA Philosophy Provide complete access Automatically manage resources Provide abstractions Allow interactive use Check for and report errors automatically Integrate tightly with numpy

37 Working with PyCUDA PyCUDA: Completeness PyCUDA exposes all of CUDA.

38 Working with PyCUDA PyCUDA: Completeness PyCUDA exposes all of CUDA. For example: Arrays and Textures Pagelocked host memory Memory transfers (asynchronous, structured) Streams and Events Device queries (GL Interop)

39 Working with PyCUDA PyCUDA: Completeness PyCUDA supports every OS that CUDA supports.

40 Working with PyCUDA PyCUDA: Completeness PyCUDA supports every OS that CUDA supports. Linux Windows OS X

41 Working with PyCUDA PyCUDA: Documentation

42 Working with PyCUDA PyCUDA: Workflow Edit Run

43 Working with PyCUDA PyCUDA: Workflow Edit Run SourceModule("...")

44 Working with PyCUDA PyCUDA: Workflow Edit Run SourceModule("...") PyCUDA

45 Working with PyCUDA PyCUDA: Workflow Edit Cache? Run SourceModule("...") PyCUDA

46 Working with PyCUDA PyCUDA: Workflow Edit Cache? Run nvcc SourceModule("...") PyCUDA

47 Working with PyCUDA PyCUDA: Workflow Edit Cache? Run nvcc.cubin SourceModule("...") PyCUDA

48 Working with PyCUDA PyCUDA: Workflow Edit Cache! Run nvcc.cubin SourceModule("...") PyCUDA

49 Working with PyCUDA PyCUDA: Workflow Edit Cache! Run nvcc.cubin SourceModule("...") Upload to GPU PyCUDA

50 Working with PyCUDA PyCUDA: Workflow Edit Cache! Run nvcc.cubin SourceModule("...") Run on GPU Upload to GPU PyCUDA

51 Working with PyCUDA Automatic Cleanup Reachable objects (memory, streams,... ) are never destroyed.

52 Working with PyCUDA Automatic Cleanup Reachable objects (memory, streams,... ) are never destroyed. Once unreachable, released at an unspecified future time.

53 Working with PyCUDA Automatic Cleanup Reachable objects (memory, streams,... ) are never destroyed. Once unreachable, released at an unspecified future time. Scarce resources (memory) can be explicitly freed. (obj.free())

54 Working with PyCUDA Automatic Cleanup Reachable objects (memory, streams,... ) are never destroyed. Once unreachable, released at an unspecified future time. Scarce resources (memory) can be explicitly freed. (obj.free()) Correctly deals with multiple contexts and dependencies.

55 Working with PyCUDA gpuarray: Simple Linear Algebra pycuda.gpuarray: Meant to look and feel just like numpy.

56 Working with PyCUDA gpuarray: Simple Linear Algebra pycuda.gpuarray: Meant to look and feel just like numpy. gpuarray.to gpu(numpy array) numpy array = gpuarray.get()

57 Working with PyCUDA gpuarray: Simple Linear Algebra pycuda.gpuarray: Meant to look and feel just like numpy. gpuarray.to gpu(numpy array) numpy array = gpuarray.get() No: indexing, slicing, etc. (yet)

58 Working with PyCUDA gpuarray: Simple Linear Algebra pycuda.gpuarray: Meant to look and feel just like numpy. gpuarray.to gpu(numpy array) numpy array = gpuarray.get() No: indexing, slicing, etc. (yet) Yes: +, -,, /, fill, sin, exp, rand, take,...

59 Working with PyCUDA gpuarray: Simple Linear Algebra pycuda.gpuarray: Meant to look and feel just like numpy. gpuarray.to gpu(numpy array) numpy array = gpuarray.get() No: indexing, slicing, etc. (yet) Yes: +, -,, /, fill, sin, exp, rand, take,... Mixed types (int32 + float32 = float64)

60 Working with PyCUDA gpuarray: Simple Linear Algebra pycuda.gpuarray: Meant to look and feel just like numpy. gpuarray.to gpu(numpy array) numpy array = gpuarray.get() No: indexing, slicing, etc. (yet) Yes: +, -,, /, fill, sin, exp, rand, take,... Mixed types (int32 + float32 = float64) print gpuarray for debugging.

61 Working with PyCUDA gpuarray: Simple Linear Algebra pycuda.gpuarray: Meant to look and feel just like numpy. gpuarray.to gpu(numpy array) numpy array = gpuarray.get() No: indexing, slicing, etc. (yet) Yes: +, -,, /, fill, sin, exp, rand, take,... Mixed types (int32 + float32 = float64) print gpuarray for debugging. Memory behind gpuarray available as.gpudata attribute. Use as kernel arguments, textures, etc.

62 Working with PyCUDA gpuarray: Elementwise expressions Avoiding extra store-fetch cycles for elementwise math: from pycuda.curandom import rand as curand a gpu = curand((50,)) b gpu = curand((50,)) from pycuda.elementwise import ElementwiseKernel lin comb = ElementwiseKernel( float a, float x, float b, float y, float z, z[ i ] = a x[i] + b y[i] ) c gpu = gpuarray.empty like(a gpu) lin comb(5, a gpu, 6, b gpu, c gpu) assert la. norm((c gpu (5 a gpu+6 b gpu)).get()) < 1e 5

63 Working with PyCUDA PyCUDA: Vital Information software/pycuda X Consortium License (no warranty, free for all use) Requires: numpy, Boost C++, Python Support via mailing list.

64 Working with PyCUDA PyCUDA: Metaprogramming In PyCUDA, GPU code does not need to be a compile-time constant.

65 Working with PyCUDA PyCUDA: Metaprogramming In PyCUDA, GPU code does not need to be a compile-time constant. (unlike native C interface)

66 Working with PyCUDA Shameless Plug How would you use this stuff to write a Discontinuous Galerkin solver? How well does that work? MS134: Large-Scale Computing with High-Order Mth. Tomorrow (Friday), 10:00 10:25, Concerto A

67 A Toolset for building Hybrid Codes Outline 1 PyCUDA: What and Why? 2 PyCUDA: Overview 3 PyUblas A Toolset for building Hybrid Codes 4 Conclusions

68 A Toolset for building Hybrid Codes Making C++ and Python talk to each other Making C++ and Python a workable combination: Python C++

69 A Toolset for building Hybrid Codes Making C++ and Python talk to each other Making C++ and Python a workable combination: Python Boost.Python C++

70 A Toolset for building Hybrid Codes Making C++ and Python talk to each other Making C++ and Python a workable combination: Python Numpy Boost.Python C++

71 A Toolset for building Hybrid Codes Making C++ and Python talk to each other Making C++ and Python a workable combination: Python Numpy Boost.Python C++ Boost.Ublas

72 A Toolset for building Hybrid Codes Making C++ and Python talk to each other Making C++ and Python a workable combination: Python Numpy Boost.Python PyUblas C++ Boost.Ublas

73 A Toolset for building Hybrid Codes PyUblas: Features Exploits pluggable storage backends in Ublas: numpy vector<t>, numpy matrix<t> Normal citizens of the Ublas universe (take part in ET etc.) Type-safe, zero-copy language transition OK anywhere: argument, return value, data member Strided storage: numpy strided vector<t> Not OK: non-contiguous storage

74 A Toolset for building Hybrid Codes Example C++: #include <pyublas/numpy.hpp> numpy vector<double> triple(numpy vector<double> x) { return 3 x; } BOOST PYTHON MODULE(sample ext) { boost :: python:: def( triple, triple ); } Python: import sample ext import pyublas vec = numpy.ones((5,), dtype=float) print sample ext. triple (vec)

75 Conclusions GPUs and Python: good combination +

76 Conclusions GPUs and Python: good combination Been considering playing with GPUs? PyCUDA makes that easier. +

77 Conclusions GPUs and Python: good combination Been considering playing with GPUs? PyCUDA makes that easier. Allows full access to low-level bits, if needed. +

78 Conclusions GPUs and Python: good combination Been considering playing with GPUs? PyCUDA makes that easier. Allows full access to low-level bits, if needed. Provides many nice abstractions to make life easier. +

79 Conclusions GPUs and Python: good combination Been considering playing with GPUs? PyCUDA makes that easier. Allows full access to low-level bits, if needed. Provides many nice abstractions to make life easier. Enables just-in-time GPU code generation. See tomorrow s talk +

80 Questions?? Thank you for your attention! MS134: Tomorrow (Friday), 10:00 10:25, Concerto A image credits

81 Image Credits C870 GPU: Nvidia Corp. Snail: flickr.com/hadi fooladi GT200 die shot: Nvidia Corp. Old Books: flickr.com/ppdigital Adding Machine: flickr.com/thomashawk Floppy disk: flickr.com/ethanhein

GPU Metaprogramming using PyCUDA: Methods & Applications

GPU Metaprogramming using PyCUDA: Methods & Applications GPU Metaprogramming using PyCUDA: Methods & Applications Division of Applied Mathematics Brown University Nvidia GTC October 2, 2009 Thanks Tim Warburton (Rice) Jan Hesthaven (Brown) Nicolas Pinto (MIT)

More information

PyCUDA. An Introduction

PyCUDA. An Introduction PyCUDA An Introduction Scripting GPUs with PyCUDA Why do Scripting for GPUs? GPUs are everything that scripting languages are not: Highly parallel Very architecture-sensitive Built for maximum FP/memory

More information

Programming GPUs with PyCuda

Programming GPUs with PyCuda Intro GPUs Scripting Hands-on Programming GPUs with PyCuda SciPy Conference 2009 / Advanced Tutorial http://conference.scipy.org/advanced_tutorials August 19, 2009 Intro GPUs Scripting Hands-on Thanks

More information

COURTESY A. KLOECKNER

COURTESY A. KLOECKNER GPU Metaprogramming applied to High Order DG and Loop Generation Division of Applied Mathematics Brown University August 19, 2009 Thanks Jan Hesthaven (Brown) Tim Warburton (Rice) Akil Narayan (Brown)

More information

Easy, Effective, Efficient: GPU Programming in Python with PyOpenCL and PyCUDA

Easy, Effective, Efficient: GPU Programming in Python with PyOpenCL and PyCUDA Intro Py{OpenCL,CUDA} Code Gen. Loo.py Conclusions Easy, Effective, Efficient: GPU Programming in Python with PyOpenCL and PyCUDA Courant Institute of Mathematical Sciences New York University May 21,

More information

PyCUDA. Continued...

PyCUDA. Continued... PyCUDA Continued... gpuarray Vector Types pycuda.gpuarray.vec All CUDA vector types are supported: float3, int3, long4, etc, Available as numpy data types Field names x, y, z, and w as in CUDA Construct

More information

GPU Programming Languages

GPU Programming Languages GPU Programming Languages Vilhelm Sjöberg April 5, 2010 What s wrong with CUDA? Low-level programs structured by kernels, not data flow. Limited metaprogramming features It s just not Haskell! (Or Python,

More information

CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction. Francesco Rossi University of Bologna and INFN

CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction. Francesco Rossi University of Bologna and INFN CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction Francesco Rossi University of Bologna and INFN * Using this terminology since you ve already heard of SIMD and SPMD at this school

More information

Easy, Effective, Efficient: GPU Programming in Python. GPU-Python with PyOpenCL and PyCUDA

Easy, Effective, Efficient: GPU Programming in Python. GPU-Python with PyOpenCL and PyCUDA Easy, Effective, Efficient: GPU Programming in Python with PyOpenCL and PyCUDA Courant Institute of Mathematical Sciences New York University PASI: The Challenge of Massive Parallelism Lecture 3 January

More information

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy

More information

CUDA PROGRAMMING MODEL. Carlo Nardone Sr. Solution Architect, NVIDIA EMEA

CUDA PROGRAMMING MODEL. Carlo Nardone Sr. Solution Architect, NVIDIA EMEA CUDA PROGRAMMING MODEL Carlo Nardone Sr. Solution Architect, NVIDIA EMEA CUDA: COMMON UNIFIED DEVICE ARCHITECTURE Parallel computing architecture and programming model GPU Computing Application Includes

More information

Preparing seismic codes for GPUs and other

Preparing seismic codes for GPUs and other Preparing seismic codes for GPUs and other many-core architectures Paulius Micikevicius Developer Technology Engineer, NVIDIA 2010 SEG Post-convention Workshop (W-3) High Performance Implementations of

More information

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Introduction to CUDA programming

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Introduction to CUDA programming KFUPM HPC Workshop April 29-30 2015 Mohamed Mekias HPC Solutions Consultant Introduction to CUDA programming 1 Agenda GPU Architecture Overview Tools of the Trade Introduction to CUDA C Patterns of Parallel

More information

AMath 483/583, Lecture 24, May 20, Notes: Notes: What s a GPU? Notes: Some GPU application areas

AMath 483/583, Lecture 24, May 20, Notes: Notes: What s a GPU? Notes: Some GPU application areas AMath 483/583 Lecture 24 May 20, 2011 Today: The Graphical Processing Unit (GPU) GPU Programming Today s lecture developed and presented by Grady Lemoine References: Andreas Kloeckner s High Performance

More information

An Introduction to GPGPU Pro g ra m m ing - CUDA Arc hitec ture

An Introduction to GPGPU Pro g ra m m ing - CUDA Arc hitec ture An Introduction to GPGPU Pro g ra m m ing - CUDA Arc hitec ture Rafia Inam Mälardalen Real-Time Research Centre Mälardalen University, Västerås, Sweden http://www.mrtc.mdh.se rafia.inam@mdh.se CONTENTS

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware

More information

Tesla Architecture, CUDA and Optimization Strategies

Tesla Architecture, CUDA and Optimization Strategies Tesla Architecture, CUDA and Optimization Strategies Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1 Outline Tesla Architecture & CUDA CUDA Programming Optimization

More information

Introduction to CUDA CME343 / ME May James Balfour [ NVIDIA Research

Introduction to CUDA CME343 / ME May James Balfour [ NVIDIA Research Introduction to CUDA CME343 / ME339 18 May 2011 James Balfour [ jbalfour@nvidia.com] NVIDIA Research CUDA Programing system for machines with GPUs Programming Language Compilers Runtime Environments Drivers

More information

Profiling & Tuning Applications. CUDA Course István Reguly

Profiling & Tuning Applications. CUDA Course István Reguly Profiling & Tuning Applications CUDA Course István Reguly Introduction Why is my application running slow? Work it out on paper Instrument code Profile it NVIDIA Visual Profiler Works with CUDA, needs

More information

Accelerating image registration on GPUs

Accelerating image registration on GPUs Accelerating image registration on GPUs Harald Köstler, Sunil Ramgopal Tatavarty SIAM Conference on Imaging Science (IS10) 13.4.2010 Contents Motivation: Image registration with FAIR GPU Programming Combining

More information

GPU Fundamentals Jeff Larkin November 14, 2016

GPU Fundamentals Jeff Larkin November 14, 2016 GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate

More information

MapReduce Locality Sensitive Hashing GPUs. NLP ML Web Andrew Rosenberg

MapReduce Locality Sensitive Hashing GPUs. NLP ML Web Andrew Rosenberg MapReduce Locality Sensitive Hashing GPUs NLP ML Web Andrew Rosenberg Big Data What is Big Data? Data Analysis based on more data than previously considered. Analysis that requires new or different processing

More information

CS 179 Lecture 4. GPU Compute Architecture

CS 179 Lecture 4. GPU Compute Architecture CS 179 Lecture 4 GPU Compute Architecture 1 This is my first lecture ever Tell me if I m not speaking loud enough, going too fast/slow, etc. Also feel free to give me lecture feedback over email or at

More information

CUDA OPTIMIZATIONS ISC 2011 Tutorial

CUDA OPTIMIZATIONS ISC 2011 Tutorial CUDA OPTIMIZATIONS ISC 2011 Tutorial Tim C. Schroeder, NVIDIA Corporation Outline Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control

More information

Fundamental CUDA Optimization. NVIDIA Corporation

Fundamental CUDA Optimization. NVIDIA Corporation Fundamental CUDA Optimization NVIDIA Corporation Outline Fermi/Kepler Architecture Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control

More information

Scientific Computing with Python and CUDA

Scientific Computing with Python and CUDA Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 1 / 55 Inhalt 1 A

More information

Fundamental CUDA Optimization. NVIDIA Corporation

Fundamental CUDA Optimization. NVIDIA Corporation Fundamental CUDA Optimization NVIDIA Corporation Outline! Fermi Architecture! Kernel optimizations! Launch configuration! Global memory throughput! Shared memory access! Instruction throughput / control

More information

ECE 574 Cluster Computing Lecture 15

ECE 574 Cluster Computing Lecture 15 ECE 574 Cluster Computing Lecture 15 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 30 March 2017 HW#7 (MPI) posted. Project topics due. Update on the PAPI paper Announcements

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

CSC573: TSHA Introduction to Accelerators

CSC573: TSHA Introduction to Accelerators CSC573: TSHA Introduction to Accelerators Sreepathi Pai September 5, 2017 URCS Outline Introduction to Accelerators GPU Architectures GPU Programming Models Outline Introduction to Accelerators GPU Architectures

More information

HPC COMPUTING WITH CUDA AND TESLA HARDWARE. Timothy Lanfear, NVIDIA

HPC COMPUTING WITH CUDA AND TESLA HARDWARE. Timothy Lanfear, NVIDIA HPC COMPUTING WITH CUDA AND TESLA HARDWARE Timothy Lanfear, NVIDIA WHAT IS GPU COMPUTING? What is GPU Computing? x86 PCIe bus GPU Computing with CPU + GPU Heterogeneous Computing Low Latency or High Throughput?

More information

S WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS. Jakob Progsch, Mathias Wagner GTC 2018

S WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS. Jakob Progsch, Mathias Wagner GTC 2018 S8630 - WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS Jakob Progsch, Mathias Wagner GTC 2018 1. Know your hardware BEFORE YOU START What are the target machines, how many nodes? Machine-specific

More information

CUDA Architecture & Programming Model

CUDA Architecture & Programming Model CUDA Architecture & Programming Model Course on Multi-core Architectures & Programming Oliver Taubmann May 9, 2012 Outline Introduction Architecture Generation Fermi A Brief Look Back At Tesla What s New

More information

Introduction to CUDA Programming

Introduction to CUDA Programming Introduction to CUDA Programming Steve Lantz Cornell University Center for Advanced Computing October 30, 2013 Based on materials developed by CAC and TACC Outline Motivation for GPUs and CUDA Overview

More information

Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control flow

Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control flow Fundamental Optimizations (GTC 2010) Paulius Micikevicius NVIDIA Outline Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control flow Optimization

More information

CUDA Optimizations WS Intelligent Robotics Seminar. Universität Hamburg WS Intelligent Robotics Seminar Praveen Kulkarni

CUDA Optimizations WS Intelligent Robotics Seminar. Universität Hamburg WS Intelligent Robotics Seminar Praveen Kulkarni CUDA Optimizations WS 2014-15 Intelligent Robotics Seminar 1 Table of content 1 Background information 2 Optimizations 3 Summary 2 Table of content 1 Background information 2 Optimizations 3 Summary 3

More information

CUDA Performance Optimization. Patrick Legresley

CUDA Performance Optimization. Patrick Legresley CUDA Performance Optimization Patrick Legresley Optimizations Kernel optimizations Maximizing global memory throughput Efficient use of shared memory Minimizing divergent warps Intrinsic instructions Optimizations

More information

Massively Parallel Computing with CUDA. Carlos Alberto Martínez Angeles Cinvestav-IPN

Massively Parallel Computing with CUDA. Carlos Alberto Martínez Angeles Cinvestav-IPN Massively Parallel Computing with CUDA Carlos Alberto Martínez Angeles Cinvestav-IPN What is a GPU? A graphics processing unit (GPU) The term GPU was popularized by Nvidia in 1999 marketed the GeForce

More information

Advanced CUDA Optimization 1. Introduction

Advanced CUDA Optimization 1. Introduction Advanced CUDA Optimization 1. Introduction Thomas Bradley Agenda CUDA Review Review of CUDA Architecture Programming & Memory Models Programming Environment Execution Performance Optimization Guidelines

More information

GPUs and Emerging Architectures

GPUs and Emerging Architectures GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs

More information

GPU Programming Introduction

GPU Programming Introduction GPU Programming Introduction DR. CHRISTOPH ANGERER, NVIDIA AGENDA Introduction to Heterogeneous Computing Using Accelerated Libraries GPU Programming Languages Introduction to CUDA Lunch What is Heterogeneous

More information

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions.

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions. Schedule CUDA Digging further into the programming manual Application Programming Interface (API) text only part, sorry Image utilities (simple CUDA examples) Performace considerations Matrix multiplication

More information

GPU programming CUDA C. GPU programming,ii. COMP528 Multi-Core Programming. Different ways:

GPU programming CUDA C. GPU programming,ii. COMP528 Multi-Core Programming. Different ways: COMP528 Multi-Core Programming GPU programming,ii www.csc.liv.ac.uk/~alexei/comp528 Alexei Lisitsa Dept of computer science University of Liverpool a.lisitsa@.liverpool.ac.uk Different ways: GPU programming

More information

Advanced Topics: Streams, Multi-GPU, Tools, Libraries, etc.

Advanced Topics: Streams, Multi-GPU, Tools, Libraries, etc. CSC 391/691: GPU Programming Fall 2011 Advanced Topics: Streams, Multi-GPU, Tools, Libraries, etc. Copyright 2011 Samuel S. Cho Streams Until now, we have largely focused on massively data-parallel execution

More information

Dynamic Cuda with F# HPC GPU & F# Meetup. March 19. San Jose, California

Dynamic Cuda with F# HPC GPU & F# Meetup. March 19. San Jose, California Dynamic Cuda with F# HPC GPU & F# Meetup March 19 San Jose, California Dr. Daniel Egloff daniel.egloff@quantalea.net +41 44 520 01 17 +41 79 430 03 61 About Us! Software development and consulting company!

More information

Introduction to Parallel Computing with CUDA. Oswald Haan

Introduction to Parallel Computing with CUDA. Oswald Haan Introduction to Parallel Computing with CUDA Oswald Haan ohaan@gwdg.de Schedule Introduction to Parallel Computing with CUDA Using CUDA CUDA Application Examples Using Multiple GPUs CUDA Application Libraries

More information

CSC266 Introduction to Parallel Computing using GPUs Introduction to CUDA

CSC266 Introduction to Parallel Computing using GPUs Introduction to CUDA CSC266 Introduction to Parallel Computing using GPUs Introduction to CUDA Sreepathi Pai October 18, 2017 URCS Outline Background Memory Code Execution Model Outline Background Memory Code Execution Model

More information

Python VSIP API: A first draft

Python VSIP API: A first draft Python VSIP API: A first draft Stefan Seefeld HPEC WG meeting, December 9, 2014 Goals Use cases: Promote VSIP standard to a wider audience (SciPy users) Add more hardware acceleration to SciPy Allow VSIP

More information

Fundamental Optimizations

Fundamental Optimizations Fundamental Optimizations Paulius Micikevicius NVIDIA Supercomputing, Tutorial S03 New Orleans, Nov 14, 2010 Outline Kernel optimizations Launch configuration Global memory throughput Shared memory access

More information

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD INTRODUCTION TO OPENCL TM A Beginner s Tutorial Udeepta Bordoloi AMD IT S A HETEROGENEOUS WORLD Heterogeneous computing The new normal CPU Many CPU s 2, 4, 8, Very many GPU processing elements 100 s Different

More information

Warps and Reduction Algorithms

Warps and Reduction Algorithms Warps and Reduction Algorithms 1 more on Thread Execution block partitioning into warps single-instruction, multiple-thread, and divergence 2 Parallel Reduction Algorithms computing the sum or the maximum

More information

Supercomputing, Tutorial S03 New Orleans, Nov 14, 2010

Supercomputing, Tutorial S03 New Orleans, Nov 14, 2010 Fundamental Optimizations Paulius Micikevicius NVIDIA Supercomputing, Tutorial S03 New Orleans, Nov 14, 2010 Outline Kernel optimizations Launch configuration Global memory throughput Shared memory access

More information

This is a draft chapter from an upcoming CUDA textbook by David Kirk from NVIDIA and Prof. Wen-mei Hwu from UIUC.

This is a draft chapter from an upcoming CUDA textbook by David Kirk from NVIDIA and Prof. Wen-mei Hwu from UIUC. David Kirk/NVIDIA and Wen-mei Hwu, 2006-2008 This is a draft chapter from an upcoming CUDA textbook by David Kirk from NVIDIA and Prof. Wen-mei Hwu from UIUC. Please send any comment to dkirk@nvidia.com

More information

Introduction to CUDA

Introduction to CUDA Introduction to CUDA Overview HW computational power Graphics API vs. CUDA CUDA glossary Memory model, HW implementation, execution Performance guidelines CUDA compiler C/C++ Language extensions Limitations

More information

Introduction to GPU Computing Using CUDA. Spring 2014 Westgid Seminar Series

Introduction to GPU Computing Using CUDA. Spring 2014 Westgid Seminar Series Introduction to GPU Computing Using CUDA Spring 2014 Westgid Seminar Series Scott Northrup SciNet www.scinethpc.ca March 13, 2014 Outline 1 Heterogeneous Computing 2 GPGPU - Overview Hardware Software

More information

CUDA Programming. Week 1. Basic Programming Concepts Materials are copied from the reference list

CUDA Programming. Week 1. Basic Programming Concepts Materials are copied from the reference list CUDA Programming Week 1. Basic Programming Concepts Materials are copied from the reference list G80/G92 Device SP: Streaming Processor (Thread Processors) SM: Streaming Multiprocessor 128 SP grouped into

More information

Introduction to GPU Computing Using CUDA. Spring 2014 Westgid Seminar Series

Introduction to GPU Computing Using CUDA. Spring 2014 Westgid Seminar Series Introduction to GPU Computing Using CUDA Spring 2014 Westgid Seminar Series Scott Northrup SciNet www.scinethpc.ca (Slides http://support.scinet.utoronto.ca/ northrup/westgrid CUDA.pdf) March 12, 2014

More information

CS179 GPU Programming Introduction to CUDA. Lecture originally by Luke Durant and Tamas Szalay

CS179 GPU Programming Introduction to CUDA. Lecture originally by Luke Durant and Tamas Szalay Introduction to CUDA Lecture originally by Luke Durant and Tamas Szalay Today CUDA - Why CUDA? - Overview of CUDA architecture - Dense matrix multiplication with CUDA 2 Shader GPGPU - Before current generation,

More information

Betty Holberton Multitasking Gives the illusion that a single processor system is running multiple programs simultaneously. Each process takes turns running. (time slice) After its time is up, it waits

More information

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1 Lecture 15: Introduction to GPU programming Lecture 15: Introduction to GPU programming p. 1 Overview Hardware features of GPGPU Principles of GPU programming A good reference: David B. Kirk and Wen-mei

More information

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27 1 / 27 GPU Programming Lecture 1: Introduction Miaoqing Huang University of Arkansas 2 / 27 Outline Course Introduction GPUs as Parallel Computers Trend and Design Philosophies Programming and Execution

More information

Performance potential for simulating spin models on GPU

Performance potential for simulating spin models on GPU Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational

More information

Scripting CUDA (using python, R and MATLAB)

Scripting CUDA (using python, R and MATLAB) Scripting CUDA (using python, R and MATLAB) Ferdinand Jamitzky jamitzky@lrz.de http://goo.gl/nkd8fy Why parallel programming? End of the free lunch Moore's law means no longer faster processors, only more

More information

Parallel Programming on Larrabee. Tim Foley Intel Corp

Parallel Programming on Larrabee. Tim Foley Intel Corp Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This

More information

Chapter 3 Parallel Software

Chapter 3 Parallel Software Chapter 3 Parallel Software Part I. Preliminaries Chapter 1. What Is Parallel Computing? Chapter 2. Parallel Hardware Chapter 3. Parallel Software Chapter 4. Parallel Applications Chapter 5. Supercomputers

More information

COSC 6374 Parallel Computations Introduction to CUDA

COSC 6374 Parallel Computations Introduction to CUDA COSC 6374 Parallel Computations Introduction to CUDA Edgar Gabriel Fall 2014 Disclaimer Material for this lecture has been adopted based on various sources Matt Heavener, CS, State Univ. of NY at Buffalo

More information

GPUfs: Integrating a file system with GPUs

GPUfs: Integrating a file system with GPUs GPUfs: Integrating a file system with GPUs Mark Silberstein (UT Austin/Technion) Bryan Ford (Yale), Idit Keidar (Technion) Emmett Witchel (UT Austin) 1 Traditional System Architecture Applications OS CPU

More information

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty

More information

Parallel Programming and Debugging with CUDA C. Geoff Gerfin Sr. System Software Engineer

Parallel Programming and Debugging with CUDA C. Geoff Gerfin Sr. System Software Engineer Parallel Programming and Debugging with CUDA C Geoff Gerfin Sr. System Software Engineer CUDA - NVIDIA s Architecture for GPU Computing Broad Adoption Over 250M installed CUDA-enabled GPUs GPU Computing

More information

Dense Linear Algebra. HPC - Algorithms and Applications

Dense Linear Algebra. HPC - Algorithms and Applications Dense Linear Algebra HPC - Algorithms and Applications Alexander Pöppl Technical University of Munich Chair of Scientific Computing November 6 th 2017 Last Tutorial CUDA Architecture thread hierarchy:

More information

Information Coding / Computer Graphics, ISY, LiTH. Introduction to CUDA. Ingemar Ragnemalm Information Coding, ISY

Information Coding / Computer Graphics, ISY, LiTH. Introduction to CUDA. Ingemar Ragnemalm Information Coding, ISY Introduction to CUDA Ingemar Ragnemalm Information Coding, ISY This lecture: Programming model and language Introduction to memory spaces and memory access Shared memory Matrix multiplication example Lecture

More information

iennacl GPU-accelerated Linear Algebra at the Convenience of the C++ Boost Libraries Karl Rupp

iennacl GPU-accelerated Linear Algebra at the Convenience of the C++ Boost Libraries Karl Rupp GPU-accelerated Linear Algebra at the Convenience of the C++ Boost Libraries Karl Rupp Mathematics and Computer Science Division Argonne National Laboratory based on previous work at Technische Universität

More information

High Performance and GPU Computing in MATLAB

High Performance and GPU Computing in MATLAB High Performance and GPU Computing in MATLAB Jan Houška houska@humusoft.cz http://www.humusoft.cz 1 About HUMUSOFT Company: Humusoft s.r.o. Founded: 1990 Number of employees: 18 Location: Praha 8, Pobřežní

More information

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods

More information

Supporting Data Parallelism in Matcloud: Final Report

Supporting Data Parallelism in Matcloud: Final Report Supporting Data Parallelism in Matcloud: Final Report Yongpeng Zhang, Xing Wu 1 Overview Matcloud is an on-line service to run Matlab-like script on client s web browser. Internally it is accelerated by

More information

GPU Computing: Development and Analysis. Part 1. Anton Wijs Muhammad Osama. Marieke Huisman Sebastiaan Joosten

GPU Computing: Development and Analysis. Part 1. Anton Wijs Muhammad Osama. Marieke Huisman Sebastiaan Joosten GPU Computing: Development and Analysis Part 1 Anton Wijs Muhammad Osama Marieke Huisman Sebastiaan Joosten NLeSC GPU Course Rob van Nieuwpoort & Ben van Werkhoven Who are we? Anton Wijs Assistant professor,

More information

CUDA Lecture 2. Manfred Liebmann. Technische Universität München Chair of Optimal Control Center for Mathematical Sciences, M17

CUDA Lecture 2. Manfred Liebmann. Technische Universität München Chair of Optimal Control Center for Mathematical Sciences, M17 CUDA Lecture 2 Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences, M17 manfred.liebmann@tum.de December 15, 2015 CUDA Programming Fundamentals CUDA

More information

GPU & High Performance Computing (by NVIDIA) CUDA. Compute Unified Device Architecture Florian Schornbaum

GPU & High Performance Computing (by NVIDIA) CUDA. Compute Unified Device Architecture Florian Schornbaum GPU & High Performance Computing (by NVIDIA) CUDA Compute Unified Device Architecture 29.02.2008 Florian Schornbaum GPU Computing Performance In the last few years the GPU has evolved into an absolute

More information

Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator

Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator What is CUDA? Programming language? Compiler? Classic car? Beer? Coffee? CUDA Parallel Computing Platform www.nvidia.com/getcuda Programming

More information

CUDA programming model. N. Cardoso & P. Bicudo. Física Computacional (FC5)

CUDA programming model. N. Cardoso & P. Bicudo. Física Computacional (FC5) CUDA programming model N. Cardoso & P. Bicudo Física Computacional (FC5) N. Cardoso & P. Bicudo CUDA programming model 1/23 Outline 1 CUDA qualifiers 2 CUDA Kernel Thread hierarchy Kernel, configuration

More information

ECE 574 Cluster Computing Lecture 17

ECE 574 Cluster Computing Lecture 17 ECE 574 Cluster Computing Lecture 17 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 March 2019 HW#8 (CUDA) posted. Project topics due. Announcements 1 CUDA installing On Linux

More information

Advanced CUDA Optimizations. Umar Arshad ArrayFire

Advanced CUDA Optimizations. Umar Arshad ArrayFire Advanced CUDA Optimizations Umar Arshad (@arshad_umar) ArrayFire (@arrayfire) ArrayFire World s leading GPU experts In the industry since 2007 NVIDIA Partner Deep experience working with thousands of customers

More information

NVIDIA Fermi Architecture

NVIDIA Fermi Architecture Administrivia NVIDIA Fermi Architecture Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Assignment 4 grades returned Project checkpoint on Monday Post an update on your blog beforehand Poster

More information

CUDA GPGPU Workshop CUDA/GPGPU Arch&Prog

CUDA GPGPU Workshop CUDA/GPGPU Arch&Prog CUDA GPGPU Workshop 2012 CUDA/GPGPU Arch&Prog Yip Wichita State University 7/11/2012 GPU-Hardware perspective GPU as PCI device Original PCI PCIe Inside GPU architecture GPU as PCI device Traditional PC

More information

Multi-Processors and GPU

Multi-Processors and GPU Multi-Processors and GPU Philipp Koehn 7 December 2016 Predicted CPU Clock Speed 1 Clock speed 1971: 740 khz, 2016: 28.7 GHz Source: Horowitz "The Singularity is Near" (2005) Actual CPU Clock Speed 2 Clock

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 30: GP-GPU Programming. Lecturer: Alan Christopher

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 30: GP-GPU Programming. Lecturer: Alan Christopher CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 30: GP-GPU Programming Lecturer: Alan Christopher Overview GP-GPU: What and why OpenCL, CUDA, and programming GPUs GPU Performance

More information

KSTAR tokamak. /

KSTAR tokamak. / KSTAR tokamak / spinhalf@nfri.re.kr !!! Data parallelism CUDA programming python! pycuda GPU Development tools Python 2.6+ Scientific libraries as python package for interactive computing (Numpy, Scipy..)

More information

Motivation Hardware Overview Programming model. GPU computing. Part 1: General introduction. Ch. Hoelbling. Wuppertal University

Motivation Hardware Overview Programming model. GPU computing. Part 1: General introduction. Ch. Hoelbling. Wuppertal University Part 1: General introduction Ch. Hoelbling Wuppertal University Lattice Practices 2011 Outline 1 Motivation 2 Hardware Overview History Present Capabilities 3 Programming model Past: OpenGL Present: CUDA

More information

GPU Programming for Mathematical and Scientific Computing

GPU Programming for Mathematical and Scientific Computing GPU Programming for Mathematical and Scientific Computing Ethan Kerzner and Timothy Urness Department of Mathematics and Computer Science Drake University Des Moines, IA 50311 ethan.kerzner@gmail.com timothy.urness@drake.edu

More information

Lecture 11: GPU programming

Lecture 11: GPU programming Lecture 11: GPU programming David Bindel 4 Oct 2011 Logistics Matrix multiply results are ready Summary on assignments page My version (and writeup) on CMS HW 2 due Thursday Still working on project 2!

More information

Josef Pelikán, Jan Horáček CGG MFF UK Praha

Josef Pelikán, Jan Horáček CGG MFF UK Praha GPGPU and CUDA 2012-2018 Josef Pelikán, Jan Horáček CGG MFF UK Praha pepca@cgg.mff.cuni.cz http://cgg.mff.cuni.cz/~pepca/ 1 / 41 Content advances in hardware multi-core vs. many-core general computing

More information

Introduction to CUDA (1 of n*)

Introduction to CUDA (1 of n*) Administrivia Introduction to CUDA (1 of n*) Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Paper presentation due Wednesday, 02/23 Topics first come, first serve Assignment 4 handed today

More information

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA Part 1: Hardware design and programming model Dirk Ribbrock Faculty of Mathematics, TU dortmund 2016 Table of Contents Why parallel

More information

GPU Computing: A VFX Plugin Developer's Perspective

GPU Computing: A VFX Plugin Developer's Perspective .. GPU Computing: A VFX Plugin Developer's Perspective Stephen Bash, GenArts Inc. GPU Technology Conference, March 19, 2015 GenArts Sapphire Plugins Sapphire launched in 1996 for Flame on IRIX, now works

More information

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host

More information

Memory. Lecture 2: different memory and variable types. Memory Hierarchy. CPU Memory Hierarchy. Main memory

Memory. Lecture 2: different memory and variable types. Memory Hierarchy. CPU Memory Hierarchy. Main memory Memory Lecture 2: different memory and variable types Prof. Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford e-research Centre Key challenge in modern computer architecture

More information

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS 1 Last time Each block is assigned to and executed on a single streaming multiprocessor (SM). Threads execute in groups of 32 called warps. Threads in

More information

Technische Universität München. GPU Programming. Rüdiger Westermann Chair for Computer Graphics & Visualization. Faculty of Informatics

Technische Universität München. GPU Programming. Rüdiger Westermann Chair for Computer Graphics & Visualization. Faculty of Informatics GPU Programming Rüdiger Westermann Chair for Computer Graphics & Visualization Faculty of Informatics Overview Programming interfaces and support libraries The CUDA programming abstraction An in-depth

More information

GPU Programming Using CUDA

GPU Programming Using CUDA GPU Programming Using CUDA Michael J. Schnieders Depts. of Biomedical Engineering & Biochemistry The University of Iowa & Gregory G. Howes Department of Physics and Astronomy The University of Iowa Iowa

More information

ECE 8823: GPU Architectures. Objectives

ECE 8823: GPU Architectures. Objectives ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading

More information