NVIDIA GPU CODING & COMPUTING

Meghan Gordon

1 NVIDIA GPU CODING & COMPUTING

2 WHY GPUs?

5 ARCHITECTURE & PROGRAM MODEL

6 CPU v. GPU

7 Multiprocessor Model

8 Memory Model

9 Memory Model: Thread Level

10 Programming Model: Logical Mapping of Threads

11 Programming Model: Another Look

12 PROGRAMMING MODEL: KIRCHHOFF MIGRATION EXAMPLE

13 Programming Model: Kirchhoff Migration Example. See Moton, S., Seismic Imaging on GPUs: Algorithms and Porting & Production Experiences, Hess Corporation, slides 15-19.

14 EXECUTION MODEL

15 Execution Model: Mapping of Thread Blocks to HW

16 Execution Model: Thread Level

18 C/C++ EXTENSIONS

19 C/C++ Extensions: Function Type Qualifiers: __device__, __global__, __host__. These have usage restrictions; see NVIDIA_CUDA_Programming_Guide_2.3.pdf, pages 105 and 106.
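A minimal sketch of what each function qualifier means, assuming a current CUDA toolkit and nvcc. The names `square`, `scaled_square`, `scaled_square_kernel`, and `run_scaled_square` are hypothetical, chosen only for illustration.

```cuda
#include <cuda_runtime.h>

// __host__ __device__: compiled for both CPU and GPU, so it is callable from either side.
__host__ __device__ float square(float x) { return x * x; }

// __device__: runs on the GPU and is callable only from device code.
__device__ float scaled_square(float x, float s) { return s * square(x); }

// __global__: a kernel. Launched from the host with <<<grid, block>>>,
// runs on the device, and must return void.
__global__ void scaled_square_kernel(const float* in, float* out, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = scaled_square(in[i], s);
}

// Hypothetical host wrapper: computes s * x^2 for n inputs on the GPU.
// Returns 0 on success, 1 on any CUDA error.
int run_scaled_square(const float* h_in, float* h_out, float s, int n) {
    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return 1; }
    cudaError_t err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 128;
        scaled_square_kernel<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, s, n);
        err = cudaGetLastError();  // catches launch errors
    }
    if (err == cudaSuccess) err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Because `square` is `__host__ __device__`, host code can call it directly as well, which is handy for checking device results on the CPU.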

20 C/C++ Extensions: Variable Type Qualifiers: __device__, __constant__, __shared__, volatile. These have usage restrictions too; see NVIDIA_CUDA_Programming_Guide_2.3.pdf, pages 106 and 107.
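A sketch of the three memory-space qualifiers in one kernel, assuming a current toolkit. `__constant__` and `__device__` variables are written from the host with `cudaMemcpyToSymbol` and read back with `cudaMemcpyFromSymbol`. The names `c_coeffs`, `d_evaluated`, `poly_kernel`, and `run_poly` are hypothetical; staging through shared memory here is only to show the `__shared__` qualifier, not an optimization.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256

__constant__ float c_coeffs[3];  // constant memory: read-only in kernels, written by the host
__device__ int d_evaluated;      // global memory variable that lives as long as the application

// Evaluates c0 + c1*x + c2*x^2 for each input element.
__global__ void poly_kernel(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK];  // one copy per block, visible to all of its threads
    int i = blockIdx.x * BLOCK + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    if (i < n) {
        float x = tile[threadIdx.x];
        out[i] = c_coeffs[0] + c_coeffs[1] * x + c_coeffs[2] * x * x;
        atomicAdd(&d_evaluated, 1);  // count evaluations in the __device__ variable
    }
}

// Hypothetical wrapper: returns how many elements were evaluated, or -1 on error.
int run_poly(const float coeffs[3], const float* h_in, float* h_out, int n) {
    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    int zero = 0, count = -1;
    cudaMemcpyToSymbol(c_coeffs, coeffs, 3 * sizeof(float));
    cudaMemcpyToSymbol(d_evaluated, &zero, sizeof(int));
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    poly_kernel<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    if (cudaGetLastError() == cudaSuccess)  // last error from any call above
        cudaMemcpyFromSymbol(&count, d_evaluated, sizeof(int));
    cudaFree(d_in);
    cudaFree(d_out);
    return count;
}
```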

21 C/C++ Extensions: Built-in Vector Types: char1, uchar1, char2, uchar2, ..., char4, uchar4; short1, ushort1, ..., short4, ushort4; int1, uint1, ..., int4, uint4; long1, ulong1, ..., long4, ulong4; longlong1, longlong2; float1, ..., float4; double1, double2; dim3 (like a struct with 3 unsigned ints; components left unspecified default to 1). See NVIDIA_CUDA_Programming_Guide_2.3.pdf, pages
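A short sketch of using a vector type and `dim3`, assuming a current toolkit (`make_float4` comes with `cuda_runtime.h`). The names `scale_float4`, `grid_for`, and `run_scale_float4` are hypothetical.

```cuda
#include <cuda_runtime.h>

// One thread per float4: scales the x, y, z, w components by s.
__global__ void scale_float4(float4* v, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float4 a = v[i];
        v[i] = make_float4(a.x * s, a.y * s, a.z * s, a.w * s);
    }
}

// Builds a 1-D launch grid; dim3 components that are not given default to 1.
dim3 grid_for(int n, int threads_per_block) {
    return dim3((n + threads_per_block - 1) / threads_per_block);
}

// Hypothetical wrapper: scales n float4 values in place. Returns 0 on success.
int run_scale_float4(float4* h_v, float s, int n) {
    float4* d_v = nullptr;
    size_t bytes = n * sizeof(float4);
    if (cudaMalloc(&d_v, bytes) != cudaSuccess) return 1;
    cudaMemcpy(d_v, h_v, bytes, cudaMemcpyHostToDevice);
    dim3 block(64);  // block.y == block.z == 1
    scale_float4<<<grid_for(n, block.x), block>>>(d_v, s, n);
    cudaError_t err = cudaMemcpy(h_v, d_v, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    return err == cudaSuccess ? 0 : 1;
}
```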

22 C/C++ Extensions: Built-in Variables: gridDim, blockIdx, blockDim, threadIdx, warpSize. These have restrictions too; see NVIDIA_CUDA_Programming_Guide_2.3.pdf, page 110.
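A sketch of how the built-in variables combine into a global thread index, written as a grid-stride loop so any launch size covers any `n`. It assumes a warp size of 32, which is what NVIDIA GPUs have used so far; the names `index_demo` and `run_index_demo` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Each element records its index and the warp (within its block) that wrote it.
__global__ void index_demo(int* idx, int* warp, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    int stride = gridDim.x * blockDim.x;              // total threads in the grid
    for (int i = tid; i < n; i += stride) {
        idx[i] = i;
        warp[i] = threadIdx.x / warpSize;             // warp number within the block
    }
}

// Hypothetical wrapper: launches 2 blocks of 64 threads over n elements.
int run_index_demo(int* h_idx, int* h_warp, int n) {
    int *d_idx = nullptr, *d_warp = nullptr;
    size_t bytes = n * sizeof(int);
    cudaMalloc(&d_idx, bytes);
    cudaMalloc(&d_warp, bytes);
    index_demo<<<2, 64>>>(d_idx, d_warp, n);
    cudaMemcpy(h_idx, d_idx, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_warp, d_warp, bytes, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaGetLastError();  // last error from any call above
    cudaFree(d_idx);
    cudaFree(d_warp);
    return err == cudaSuccess ? 0 : 1;
}
```

With 128 threads in total, element 100 is handled by block 1, thread 36 (warp 1), while element 130 wraps around to block 0, thread 2 (warp 0).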

23 C/C++ Extensions: Memory Fence Functions: __threadfence(), __threadfence_block(). See NVIDIA_CUDA_Programming_Guide_2.3.pdf, pages 110 and 111.
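A sketch of the "last block finishes the job" pattern, a common use of `__threadfence()` (the Programming Guide illustrates the fence with a similar example). Each block publishes a partial sum, fences so the write is visible device-wide, then takes a ticket with `atomicInc`; the block that draws the last ticket adds up the partials. As a simplification, thread 0 of each block sums its slice serially. The names `blocks_done`, `sum_with_fence`, and `run_sum` are hypothetical.

```cuda
#include <cuda_runtime.h>

__device__ unsigned int blocks_done = 0;  // number of blocks that have published a partial sum

__global__ void sum_with_fence(const float* in, float* partial, float* result, int n) {
    __shared__ bool is_last;
    if (threadIdx.x == 0) {
        float s = 0.0f;
        int begin = blockIdx.x * blockDim.x;
        for (int i = begin; i < begin + (int)blockDim.x && i < n; ++i) s += in[i];
        partial[blockIdx.x] = s;
        // Make the partial sum visible to all threads on the device before signaling.
        __threadfence();
        unsigned int ticket = atomicInc(&blocks_done, gridDim.x);
        is_last = (ticket == gridDim.x - 1);
    }
    __syncthreads();
    if (is_last && threadIdx.x == 0) {
        volatile const float* p = partial;  // volatile: force fresh loads of other blocks' writes
        float total = 0.0f;
        for (unsigned int b = 0; b < gridDim.x; ++b) total += p[b];
        *result = total;
        blocks_done = 0;  // reset the counter for the next launch
    }
}

// Hypothetical wrapper: sums n floats with 256-thread blocks. Returns -1.0f on error.
float run_sum(const float* h_in, int n) {
    const int threads = 256;
    int blocks = (n + threads - 1) / threads;
    float *d_in = nullptr, *d_partial = nullptr, *d_result = nullptr;
    float total = -1.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMalloc(&d_result, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    sum_with_fence<<<blocks, threads>>>(d_in, d_partial, d_result, n);
    cudaMemcpy(&total, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    if (cudaGetLastError() != cudaSuccess) total = -1.0f;
    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_result);
    return total;
}
```

Without the fence, another block could see the incremented counter before it sees the partial sum that was written first.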

24 C/C++ Extensions: Synchronization Function: __syncthreads(). See NVIDIA_CUDA_Programming_Guide_2.3.pdf, page 111.
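A minimal sketch of why `__syncthreads()` is needed: each thread writes one shared-memory slot and then reads a slot written by a different thread, so the barrier must separate the two phases. The names `reverse_chunks` and `run_reverse_chunks` are hypothetical.

```cuda
#include <cuda_runtime.h>

#define BLOCK 128

// Reverses each BLOCK-sized chunk of data in place using shared memory.
__global__ void reverse_chunks(int* data) {
    __shared__ int tile[BLOCK];
    int base = blockIdx.x * BLOCK;
    tile[threadIdx.x] = data[base + threadIdx.x];
    // Barrier: all writes to tile must complete before any thread
    // reads a slot that a different thread wrote.
    __syncthreads();
    data[base + threadIdx.x] = tile[BLOCK - 1 - threadIdx.x];
}

// Hypothetical wrapper; n must be a multiple of BLOCK. Returns 0 on success.
int run_reverse_chunks(int* h_data, int n) {
    int* d = nullptr;
    size_t bytes = n * sizeof(int);
    if (n % BLOCK != 0 || cudaMalloc(&d, bytes) != cudaSuccess) return 1;
    cudaMemcpy(d, h_data, bytes, cudaMemcpyHostToDevice);
    reverse_chunks<<<n / BLOCK, BLOCK>>>(d);
    cudaError_t err = cudaMemcpy(h_data, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

`__syncthreads()` only synchronizes threads within one block, which is why each block reverses its own chunk independently.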

25 C/C++ Extensions: Mathematical Functions. See Appendix C of NVIDIA_CUDA_Programming_Guide_2.3.pdf, beginning on page 119.

26 C/C++ Extensions: Other Functions: texture functions, time function, atomic functions, bitwise functions. See NVIDIA_CUDA_Programming_Guide_2.3.pdf, beginning on page 113.
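A sketch of one of the atomic functions: a byte histogram where many threads may increment the same bin at once, so each increment goes through `atomicAdd`. The names `histogram` and `run_histogram` are hypothetical.

```cuda
#include <cuda_runtime.h>

#define NUM_BINS 256

// Grid-stride loop; atomicAdd makes concurrent increments of the same bin safe.
__global__ void histogram(const unsigned char* in, unsigned int* bins, int n) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&bins[in[i]], 1u);
}

// Hypothetical wrapper: fills h_bins[NUM_BINS] with counts. Returns 0 on success.
int run_histogram(const unsigned char* h_in, int n, unsigned int* h_bins) {
    unsigned char* d_in = nullptr;
    unsigned int* d_bins = nullptr;
    cudaMalloc(&d_in, n);
    cudaMalloc(&d_bins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(d_in, h_in, n, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, NUM_BINS * sizeof(unsigned int));
    histogram<<<4, 128>>>(d_in, d_bins, n);
    cudaMemcpy(h_bins, d_bins, NUM_BINS * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaError_t err = cudaGetLastError();  // last error from any call above
    cudaFree(d_in);
    cudaFree(d_bins);
    return err == cudaSuccess ? 0 : 1;
}
```

A plain `bins[in[i]]++` here would be a read-modify-write race and would undercount.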

27 CUDA KERNEL EXAMPLES

28 CUDA Kernel Examples: See the CUDA SDK, Tim's Hello World examples, and the CUDA Programming Guide v. 2.3.

29 PERFORMANCE GUIDELINES: GLOBAL MEMORY ACCESS

30 CUDA RUNTIME API

31 CUDA Runtime API: allocate memory, free memory, copy memory, set memory, texture functions, etc. See the CUDA SDK examples and CudaReferenceManual.pdf for more.
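A sketch of the allocate/copy/set/free calls in the runtime API, assuming a current toolkit. Note that `cudaMemset` sets bytes, so it is only a convenient way to fill `int` buffers with 0. The function name `runtime_api_demo` is hypothetical.

```cuda
#include <cuda_runtime.h>

// Round trip through device memory: allocate, copy host->device, zero the first
// n_zero ints, copy device->host, free. Returns 0 on success, 1 on the first error.
int runtime_api_demo(const int* h_in, int* h_out, int n, int n_zero) {
    int* d = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return 1;                   // allocate
    cudaError_t err = cudaMemcpy(d, h_in, bytes, cudaMemcpyHostToDevice);  // copy in
    if (err == cudaSuccess) err = cudaMemset(d, 0, n_zero * sizeof(int));  // set (byte value 0)
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d, bytes, cudaMemcpyDeviceToHost);         // copy out
    cudaFree(d);                                                          // free
    return err == cudaSuccess ? 0 : 1;
}
```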

32 CUDA DRIVER API

33 CUDA Driver API: allocate memory, free memory, copy memory, set memory, texture functions, etc. See the CUDA SDK examples and CudaReferenceManual.pdf for more.
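The same kind of round trip with the driver API, as a sketch assuming a current toolkit and linking against the driver library (`-lcuda`). Unlike the runtime API, the driver API needs explicit initialization and a context; this sketch uses the device's primary context. The function name `driver_api_demo` is hypothetical.

```cuda
#include <cuda.h>

// Fills n 32-bit words on the device with `value`, then copies them back.
// Returns 0 on success, 1 on the first error.
int driver_api_demo(unsigned int* h_out, size_t n, unsigned int value) {
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr d = 0;
    if (cuInit(0) != CUDA_SUCCESS) return 1;                        // initialize the driver
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return 1;             // first GPU
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return 1;
    CUresult r = cuCtxSetCurrent(ctx);
    if (r == CUDA_SUCCESS) r = cuMemAlloc(&d, n * sizeof(unsigned int));            // allocate
    if (r == CUDA_SUCCESS) r = cuMemsetD32(d, value, n);                             // set 32-bit words
    if (r == CUDA_SUCCESS) r = cuMemcpyDtoH(h_out, d, n * sizeof(unsigned int));     // copy out
    if (d) cuMemFree(d);                                                             // free
    cuDevicePrimaryCtxRelease(dev);
    return r == CUDA_SUCCESS ? 0 : 1;
}
```

`cuMemsetD32` sets 32-bit words rather than bytes, so it can fill an `unsigned int` buffer with any value.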

34 PERFORMANCE GUIDELINES: GLOBAL MEMORY ACCESS

35 Global Memory Access: Coalesced Read/Write. Figure panels: Random, Single Coalesced Read/Write; Misaligned, Single Coalesced Read/Write; Two Coalesced Reads/Writes.
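Three copy kernels that differ only in their access pattern, as a sketch of what the coalescing figures contrast. Exact transaction counts depend on compute capability, so the comments describe tendencies rather than numbers. The names `copy_coalesced`, `copy_offset`, `copy_strided`, and `run_copy` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Coalesced: thread k of a warp touches element base + k, so the warp's
// accesses fall in as few aligned memory segments as possible.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Misaligned: the same contiguous pattern shifted by `offset` elements. A warp's
// range can straddle a segment boundary and cost an extra transaction.
__global__ void copy_offset(const float* in, float* out, int n, int offset) {
    int i = blockIdx.x * blockDim.x + threadIdx.x + offset;
    if (i < n) out[i] = in[i];
}

// Strided: neighboring threads are `stride` elements apart, so one warp's
// accesses are spread over many segments and coalesce poorly.
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];
}

// Hypothetical wrapper: mode 0 = coalesced, 1 = offset, 2 = strided; param is the
// offset or stride. Elements a kernel does not copy are left as 0.
int run_copy(int mode, const float* h_in, float* h_out, int n, int param) {
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, bytes);
    const int threads = 128;
    int blocks = (n + threads - 1) / threads;
    if (mode == 0) copy_coalesced<<<blocks, threads>>>(d_in, d_out, n);
    else if (mode == 1) copy_offset<<<blocks, threads>>>(d_in, d_out, n, param);
    else copy_strided<<<blocks, threads>>>(d_in, d_out, n, param);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaGetLastError();  // last error from any call above
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Timing the three kernels on the same large array (for example with the profiler) is a simple way to see the cost of each pattern on a given GPU.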

36 PERFORMANCE GUIDELINES: SHARED MEMORY ACCESS

37 Shared Memory Access: NO Bank Conflicts

38 Shared Memory Access: NO Bank Conflicts

39 Shared Memory Access: With Bank Conflicts

40 Shared Memory Access: Broadcasting (NO Conflict*). *Possible two-way bank conflict.
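A sketch of the standard padding trick for avoiding bank conflicts, shown on a matrix transpose. It assumes compute capability 2.0 or later: 32 banks of 4-byte words, conflicts counted per warp, and 1024-thread blocks allowed (the 2.3-era hardware had 16 banks, half-warps, and a 512-thread limit). Threads reading the same word are served by a broadcast and do not conflict. The names `transpose_padded` and `run_transpose` are hypothetical.

```cuda
#include <cuda_runtime.h>

#define TILE 32

// Transposes an n x n matrix (n a multiple of TILE) using TILE x TILE thread blocks.
__global__ void transpose_padded(const float* in, float* out, int n) {
    // One padding column: tile[r][c] is shared word r * 33 + c, so the 32 words of a
    // column (c fixed, r = 0..31) land in 32 different banks. Without padding they
    // would be r * 32 + c, all in bank c: a 32-way conflict.
    __shared__ float tile[TILE][TILE + 1];
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * n + x];   // each warp writes one row
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // each warp reads one column
}

// Hypothetical wrapper. Returns 0 on success.
int run_transpose(const float* h_in, float* h_out, int n) {
    if (n % TILE != 0) return 1;
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = (size_t)n * n * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    transpose_padded<<<dim3(n / TILE, n / TILE), dim3(TILE, TILE)>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaGetLastError();  // last error from any call above
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Both global-memory accesses stay coalesced as well: the shared tile absorbs the change from row-wise to column-wise order.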

41 PERFORMANCE GUIDELINES: OTHER STUFF

42 Performance Guidelines: Other Stuff: Try to avoid divergent warps. Trade precision for speed. Use bitwise tricks (e.g. x/4 <=> x>>2, which holds for unsigned or non-negative x only). See the CUDA Programming Guide v. 2.3 for more; see NVIDIA_CUDA_Programming_Guide_2.3.pdf, pages
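A sketch of the three guidelines, assuming a warp size of 32 and block sizes that are multiples of it. The two branch kernels compute different things and only illustrate branch structure; for bodies this short the compiler may use predication instead of a real branch. For signed negative `x`, `x / 4` rounds toward zero (`-5 / 4 == -1`) while the usual arithmetic shift rounds toward negative infinity (`-5 >> 2 == -2`), hence the unsigned helper. All names here are hypothetical.

```cuda
#include <cuda_runtime.h>

// Divergent: even and odd lanes of the same warp take different branches,
// so each warp executes both paths one after the other.
__global__ void branch_divergent(float* out) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 2 == 0) out[t] = 2.0f * t;
    else                      out[t] = 0.5f * t;
}

// Warp-uniform: the condition is the same for every thread of a warp
// (whole warps are even- or odd-numbered), so no warp diverges.
__global__ void branch_uniform(float* out) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if ((threadIdx.x / warpSize) % 2 == 0) out[t] = 2.0f * t;
    else                                    out[t] = 0.5f * t;
}

// Precision for speed: __sinf is a faster, less accurate intrinsic than sinf
// (nvcc's -use_fast_math applies such substitutions globally).
__global__ void fast_sine(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __sinf(in[i]);
}

// Shift instead of divide: identical to x / 4 because x is unsigned.
__host__ __device__ unsigned int div4(unsigned int x) { return x >> 2; }
```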

43 PERFORMANCE GUIDELINES: KIRCHHOFF MIGRATION EXAMPLE

44 Performance Guidelines: Kirchhoff Migration Example. See Moton, S., Seismic Imaging on GPUs: Algorithms and Porting & Production Experiences, Hess Corporation, slides

45 DEBUGGING

46 Debugging: device emulation mode; gdb; threads have unique IDs; CUDA API functions return error statuses. See NVIDIA_CUDA_Programming_Guide_2.3.pdf, pages 44 and 45.
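Since CUDA API functions return error statuses, a sketch of checking every one of them, including the two places kernel errors surface: `cudaGetLastError()` right after the launch (bad launch configuration) and `cudaDeviceSynchronize()` (errors raised while the kernel ran). Assumes a current toolkit, where blocks are limited to 1024 threads; the names `write_tid` and `checked_launch` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes its own threadIdx.x.
__global__ void write_tid(int* out) { out[threadIdx.x] = threadIdx.x; }

// Launches write_tid with `threads` threads in one block and checks every status.
// Returns the first error encountered (cudaSuccess if none).
cudaError_t checked_launch(int* h_out, int threads) {
    int* d = nullptr;
    cudaError_t err = cudaMalloc(&d, threads * sizeof(int));
    if (err == cudaSuccess) {
        write_tid<<<1, threads>>>(d);
        err = cudaGetLastError();                               // launch errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
        if (err == cudaSuccess)
            err = cudaMemcpy(h_out, d, threads * sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(d);
    }
    if (err != cudaSuccess)
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return err;
}
```

Asking for 2048 threads in one block is rejected at launch, and the check after the launch reports it instead of letting the program continue with an unwritten buffer.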

47 PROFILING

48 Profiling: $ export CUDA_PROFILE=1. CUDA GUI Profiler. CUDA simple text-based profiler: writes cuda_profile.log in cwd/exec-dir. See Cuda_Profiler_2.3.txt.

49 RELEVANT/IMPORTANT TOPICS THAT WERE NOT COVERED

50 Topics Not Covered (at least not in great detail): texture memory and fetching, CUDA APIs, asynchronous concurrent execution, streams, events, technical specifications, compute capability, intrinsic functions, and other stuff...

51 END
