GPU & High Performance Computing (by NVIDIA)
CUDA: Compute Unified Device Architecture
29.02.2008, Florian Schornbaum
GPU Computing Performance
In the last few years the GPU has evolved into an absolute computing workhorse:
- programmable processor
- very high memory bandwidth
- multiple cores (high parallelism)
CPU vs. GPU
The GPU was originally specialized for math-intensive, highly parallel computation (exactly what graphics rendering is about). On the GPU (in contrast to the CPU), more transistors are devoted to data processing (ALUs) rather than to data caching and flow control.
Problem: GPGPU (general-purpose computation on GPUs)
GPGPU so far: program the GPU through a graphics API and trick the GPU into general-purpose computing by casting problems as graphics:
- turn data into images (texture maps)
- turn algorithms into image synthesis (rendering passes)
Promising results, but:
- tough learning curve, particularly for non-graphics experts
- potentially high overhead of an inadequate graphics API
- highly constrained memory layout & access model
Solution: GPU Computing with CUDA
Co-designed hardware & software for direct GPU computing:
- a new hardware and software architecture for issuing and managing computations on the GPU as a data-parallel computing device, without the need to map them to a graphics API (available for the GeForce 8 series, the Tesla platform and some Quadro solutions)
- two higher-level mathematical libraries in common use: CUFFT and CUBLAS
- the hardware has been designed to support lightweight driver and runtime layers, yielding high performance
Hardware Implementation
The device is implemented as a set of multiprocessors. Each multiprocessor has a Single Instruction, Multiple Data (SIMD) architecture: at any given clock cycle, each processor of the multiprocessor executes the same instruction, but operates on different data.
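This SIMD execution model can be illustrated with a minimal kernel (a sketch; the kernel name `scale` and its parameters are not from the slides): every thread runs the identical instruction stream, but each one indexes its own array element.

```cuda
// Minimal SIMD-style sketch (hypothetical kernel, for illustration only):
// all threads execute the same multiply instruction, each on a different
// element of the array.
__global__ void scale(float *data, float factor, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)               // guard threads that fall past the end
        data[idx] *= factor;   // same instruction, different data
}
```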
Programming Model: A Highly Multi-threaded Coprocessor
- the GPU is viewed as a compute device capable of executing a very high number of threads in parallel
- the GPU operates as a coprocessor to the main CPU (host): data-parallel, compute-intensive portions of applications running on the host are off-loaded onto the device
- a portion of an application that is executed many times, but on different data, can be isolated into a function that is executed on the device as many different threads
- the GPU needs thousands of threads for full efficiency (GPU threads are extremely lightweight and have very little creation overhead)
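To actually launch those thousands of lightweight threads, the host picks a block size and rounds the block count up so every data element gets a thread. A minimal sketch (the kernel name `my_kernel`, the pointer `d_data` and the block size 256 are illustrative placeholders, not from the slides):

```cuda
// Launch-configuration sketch (names and sizes are illustrative).
// Rounding up guarantees at least N threads in total.
int threadsPerBlock = 256;
int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;  // ceil(N / 256)
my_kernel<<<blocks, threadsPerBlock>>>(d_data, N);
```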
Application Programming Interface
- an extension to the C programming language
- function type qualifiers specify whether a function executes on the host or the device
- explicit GPU memory allocation returns pointers to GPU memory (cudaMalloc(), cudaFree())
- memory can be copied from host to device, device to host and device to device (cudaMemcpy())
- four built-in variables: grid & block dimensions, block & thread indices
- programming model: grid of thread blocks
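A typical host-side sequence using these API calls might look as follows (a sketch only; `kernel`, `dimGrid` and `dimBlock` are placeholders assumed to be defined elsewhere):

```cuda
// Host-side memory management sketch (illustrative, not from the slides).
float *h_data = (float *)malloc(N * sizeof(float));   // host buffer
float *d_data;
cudaMalloc((void **)&d_data, N * sizeof(float));      // allocate on the GPU

cudaMemcpy(d_data, h_data, N * sizeof(float),
           cudaMemcpyHostToDevice);                   // host -> device

kernel<<<dimGrid, dimBlock>>>(d_data, N);             // run on the device

cudaMemcpy(h_data, d_data, N * sizeof(float),
           cudaMemcpyDeviceToHost);                   // device -> host

cudaFree(d_data);                                     // release GPU memory
free(h_data);
```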
CPU C Program

void add_matrix_cpu(float *a, float *b, float *c, int N)
{
    int i, j, index;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            index = i + j * N;
            c[index] = a[index] + b[index];
        }
    }
}

int main()
{
    ...
    add_matrix_cpu(a, b, c, N);
    ...
}
CUDA C Program

__global__ void add_matrix_gpu(float *a, float *b, float *c, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int index = i + j * N;
    if (i < N && j < N)
        c[index] = a[index] + b[index];
}

int main()
{
    ...
    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
    dim3 dimGrid(N / dimBlock.x, N / dimBlock.y);
    add_matrix_gpu<<<dimGrid, dimBlock>>>(a, b, c, N);
    ...
}
CUDA Software Development Kit
CUDA vs. Standard CPU Performance (2007)
GPU Computing with CUDA: THE END

References:
- SUPERCOMPUTING 2007 Tutorial: High Performance Computing with CUDA, http://www.gpgpu.org/sc2007/
- CUDA Programming Guide 1.1, http://developer.download.nvidia.com/compute/cuda/1_1/nvidia_cuda_programming_guide_1.1.pdf