CUDA programming

Bedrich Benes, Ph.D.
Purdue University, Department of Computer Graphics

CUDA requirements
- A CUDA-capable GPU (NVIDIA)
- NVIDIA driver
- A CUDA SDK
- Standard C compiler

What do I have?

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        int n = 0;
        cudaGetDeviceCount(&n);  // how many GPUs?
        for (int i = 0; i < n; i++) {
            cudaGetDeviceProperties(&prop, i);
            // print the compute capability (major, minor) of device i
            printf("%i %i\n", prop.major, prop.minor);
        }
        return 0;
    }

Device 0: "GeForce GTX 780"
  CUDA Driver Version / Runtime Version: 6.5 / 6.5
  CUDA Capability Major/Minor version number: 3.5
  Total amount of global memory: 3072 MBytes ( bytes)
  (12) Multiprocessors, (192) CUDA Cores/MP: 2304 CUDA Cores
  GPU Clock rate: 902 MHz (0.90 GHz)
  Memory Clock rate: 3004 Mhz
  Memory Bus Width: 384-bit
  L2 Cache Size: bytes
  Maximum Texture Dimension Size (x,y,z): 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers: 1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers: 2D=(16384, 16384), 2048 layers
  Total amount of constant memory: bytes
  Total amount of shared memory per block: bytes
  Total number of registers available per block:
  Warp size: 32

CUDA query example
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): ( , 65535, 65535)
  Maximum memory pitch: bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 1 copy engine(s)
  Run time limit on kernels: Yes
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA): No
  Device PCI Bus ID / PCI location ID: 2 / 0
  Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce GTX 780

The CUDA instruction set is called Parallel Thread Execution (PTX). A kernel can be written directly in PTX, but doing so is tedious. The NVIDIA C compiler, nvcc, makes this task easier.

[Figure: compilation flow. CUDA & C source code (*.cu, *.cpp) and CUDA libraries (FFT, BLAS, etc.) are fed to nvcc.exe (NVIDIA compiler), which produces PTX (NVIDIA assembly code) and ASM (CPU host code). The PTX goes through the CUDA driver to the GPU; the host code goes through the C compiler and C libraries to the CPU.]

Offline compilation
- Compiles the device code into PTX assembly code
- PTX is device-independent object code
- PTX is then compiled for a particular device
- PTX can be executed on a device different from the one it was generated on
- cubin: CUDA binary
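The limits in the query listing interact: each thread-block dimension has its own maximum, (1024, 1024, 64), but the product of the three may not exceed the 1024 threads allowed per block. The following is a minimal host-side sketch of that check in plain C++ (no CUDA headers needed). The constant and function names are hypothetical helpers, not part of the CUDA runtime, and the numbers are simply copied from the GeForce GTX 780 listing; on another device they would come from cudaGetDeviceProperties.

```cpp
#include <cstddef>

// Limits copied from the deviceQuery listing for the GeForce GTX 780.
// (Hypothetical constants for illustration; query the device in real code.)
const int kMaxThreadsPerBlock = 1024;
const int kMaxBlockDimX = 1024;
const int kMaxBlockDimY = 1024;
const int kMaxBlockDimZ = 64;

// A block shape is usable only if every dimension is within its own limit
// AND the total thread count does not exceed the per-block maximum.
bool isValidBlock(int x, int y, int z) {
    if (x < 1 || y < 1 || z < 1) return false;
    if (x > kMaxBlockDimX || y > kMaxBlockDimY || z > kMaxBlockDimZ) return false;
    return static_cast<long long>(x) * y * z <= kMaxThreadsPerBlock;
}

// Number of blocks needed to cover n elements when each block has
// threadsPerBlock threads (ceiling division).
std::size_t blocksNeeded(std::size_t n, std::size_t threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}
```

For example, isValidBlock(32, 32, 1) holds because 32 x 32 = 1024, while isValidBlock(1024, 1024, 64) fails even though each dimension is individually within range. Covering 1,000,000 elements with 256-thread blocks takes blocksNeeded(1000000, 256) = 3907 blocks.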

Just-in-time compilation
- PTX is further compiled to binary by the PTX2Target compiler
- The binary code is cached in the compute cache

[Figure: virtual layer: CUDA & C source code (*.cu) is compiled by nvcc.exe (NVIDIA compiler) into PTX (NVIDIA assembly code). Physical layer: the PTX2Target compiler translates the PTX for GPU1, GPU2, ..., GPUn.]

CUDA C Runtime
The runtime is provided as a library:
- cudart.lib: static link
- cudart.dll: dynamic link

Build configurations
- nvcc <filename>.cu [-o executable]: release mode
- nvcc -g <filename>.cu: debug mode (for host code only, not for device code)
- nvcc -deviceemu <filename>.cu: device emulation mode; everything runs on the CPU, no debug symbols
- nvcc -deviceemu -g <filename>.cu: debug device emulation mode (everything runs on the CPU, with debug symbols)
Note: the -deviceemu option exists only in early CUDA toolkits; device emulation was removed in CUDA 3.1.
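The caching step described under just-in-time compilation above can be pictured as a lookup table: PTX is translated for a given target only the first time it is requested, and later requests reuse the stored binary. The sketch below is a toy model in plain C++ under that assumption. The class ToyComputeCache, its methods, and the "binary[...]" strings are all hypothetical; the real compute cache lives inside the CUDA driver and is not exposed through an interface like this.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy model only: illustrates "translate once, then reuse".
// It does not reflect how the CUDA driver actually stores binaries.
class ToyComputeCache {
public:
    // Returns the stand-in binary for this PTX text on this target,
    // translating only on the first request for the (PTX, target) pair.
    const std::string& getBinary(const std::string& ptx, const std::string& target) {
        const std::pair<std::string, std::string> key(ptx, target);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++translations_;  // cache miss: stand-in for the PTX-to-target compile
            it = cache_.insert(std::make_pair(key, "binary[" + target + "]")).first;
        }
        return it->second;
    }

    // How many times a translation was actually performed.
    int translations() const { return translations_; }

private:
    std::map<std::pair<std::string, std::string>, std::string> cache_;
    int translations_ = 0;
};
```

Requesting the same PTX twice for "GPU1" performs one translation; requesting it for "GPU2" performs a second, since the same PTX is compiled separately for each target.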

1. Create an empty project (Win32 console application)
2. Select Build Customizations
3. Select CUDA
4. Add your file.cu to the project
5. Select the source *.cpp or *.cu file and assign the custom compile to it

Multiple files in a project: *.cpp files can be used normally and will be linked.

.NET debugging
You can step into the code, read values, etc. using the standard .NET tools.

Reading
- NVIDIA CUDA Programming Guide
- Kirk, D.B., Hwu, W.W., Programming Massively Parallel Processors, NVIDIA, Morgan Kaufmann, 2010
- C:\ProgramData\NVIDIA Corporation\CUDA Samples\v6.5
