GETTING STARTED WITH CUDA SDK SAMPLES

Similar documents
NVWMI VERSION 2.18 STANDALONE PACKAGE

QUADRO SYNC II FIRMWARE VERSION 2.02

NVIDIA CUDA C INSTALLATION AND VERIFICATION ON

GRID SOFTWARE FOR RED HAT ENTERPRISE LINUX WITH KVM VERSION /370.28

GRID SOFTWARE FOR MICROSOFT WINDOWS SERVER VERSION /370.12

NVIDIA Tesla Compute Cluster Driver for Windows

GPU LIBRARY ADVISOR. DA _v8.0 September Application Note

Tuning CUDA Applications for Fermi. Version 1.2

NVIDIA CUDA C GETTING STARTED GUIDE FOR MAC OS X

KEPLER COMPATIBILITY GUIDE FOR CUDA APPLICATIONS

MAXWELL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS

DU _v01. September User Guide

TESLA M2050 AND TESLA M2070/M2070Q DUAL-SLOT COMPUTING PROCESSOR MODULES

Getting Started. NVIDIA CUDA C Installation and Verification on Mac OS X

NVIDIA nforce 790i SLI Chipsets

TESLA 1U GPU COMPUTING SYSTEMS

PASCAL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS

Getting Started. NVIDIA CUDA Development Tools 2.2 Installation and Verification on Mac OS X. May 2009 DU _v01

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

NVIDIA CUDA. Fermi Compatibility Guide for CUDA Applications. Version 1.1

NVIDIA SLI Mosaic Mode

GPU Computing Workshop CSU Background and motivation. Garland Durham Quantos Analytics

Getting Started. NVIDIA CUDA Development Tools 2.3 Installation and Verification on Mac OS X

NVIDIA CAPTURE SDK 6.1 (WINDOWS)

NVIDIA CAPTURE SDK 6.0 (WINDOWS)

NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS

Getting Started. NVIDIA CUDA Development Tools 2.2 Installation and Verification on Microsoft Windows XP and Windows Vista

NVIDIA Release 197 Tesla Driver for Windows

CUDA TOOLKIT 3.2 READINESS FOR CUDA APPLICATIONS

GRID SOFTWARE MANAGEMENT SDK

VIRTUAL GPU LICENSE SERVER VERSION

VIRTUAL GPU SOFTWARE MANAGEMENT SDK

NVWMI VERSION 2.24 STANDALONE PACKAGE

VIRTUAL GPU MANAGEMENT PACK FOR VMWARE VREALIZE OPERATIONS

Release 270 Tesla Driver for Windows - Version

NVIDIA CAPTURE SDK 7.1 (WINDOWS)

SDK White Paper. Vertex Lighting Achieving fast lighting results

NSIGHT ECLIPSE PLUGINS INSTALLATION GUIDE

Application Note. NVIDIA Business Platform System Builder Certification Guide. September 2005 DA _v01

TUNING CUDA APPLICATIONS FOR MAXWELL

GRID SOFTWARE FOR HUAWEI UVP VERSION /370.12

TUNING CUDA APPLICATIONS FOR MAXWELL

MOSAIC CONTROL DISPLAYS

NSIGHT ECLIPSE EDITION

VIRTUAL GPU SOFTWARE R384 FOR RED HAT ENTERPRISE LINUX WITH KVM

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

VIRTUAL GPU SOFTWARE R384 FOR MICROSOFT WINDOWS SERVER

NVIDIA CUDA GETTING STARTED GUIDE FOR LINUX

SDK White Paper. Matrix Palette Skinning An Example

GRID VIRTUAL GPU FOR HUAWEI UVP Version ,

GRID SOFTWARE FOR HUAWEI UVP VERSION /370.28

NVBLAS LIBRARY. DU _v6.0 February User Guide

VIRTUAL GPU CLIENT LICENSING

User Guide. Vertex Texture Fetch Water

CUDA/OpenGL Fluid Simulation. Nolan Goodnight

NVIDIA CUDA INSTALLATION GUIDE FOR MAC OS X

NVIDIA GPU BOOST FOR TESLA

VIRTUAL GPU SOFTWARE R390 FOR RED HAT ENTERPRISE LINUX WITH KVM

VIRTUAL GPU SOFTWARE R384 FOR MICROSOFT WINDOWS SERVER

NSIGHT ECLIPSE EDITION

User Guide. GLExpert NVIDIA Performance Toolkit

VIRTUAL GPU SOFTWARE. QSG _v5.0 through 5.2 Revision 03 February Quick Start Guide

USING INLINE PTX ASSEMBLY IN CUDA

VIRTUAL GPU CLIENT LICENSING

RMA PROCESS. vr384 October RMA Process

Cg Toolkit. Cg 2.0 January 2008 Release Notes

GRID LICENSING. DU _v4.6 January User Guide

Technical Brief. LinkBoost Technology Faster Clocks Out-of-the-Box. May 2006 TB _v01

VIRTUAL GPU LICENSE SERVER VERSION AND 5.1.0

Why? High performance clusters: Fast interconnects Hundreds of nodes, with multiple cores per node Large storage systems Hardware accelerators

Tesla GPU Computing A Revolution in High Performance Computing

NVJPEG. DA _v0.2.0 October nvjpeg Libary Guide

NSIGHT ECLIPSE EDITION

Technical Report. GLSL Pseudo-Instancing

NVIDIA CUDA INSTALLATION GUIDE FOR MICROSOFT WINDOWS

GLExpert NVIDIA Performance Toolkit

Multi-View Soft Shadows. Louis Bavoil

Cg Toolkit. Cg 2.0 May 2008 Release Notes

CREATING AN NVIDIA QUADRO VIRTUAL WORKSTATION INSTANCE

GRID VIRTUAL GPU FOR HUAWEI UVP Version /

Android PerfHUD ES quick start guide

NVIDIA CUDA VIDEO DECODER

NVIDIA CUDA TM Developer Guide for NVIDIA Optimus Platforms. Version 1.0

Horizon-Based Ambient Occlusion using Compute Shaders. Louis Bavoil

NVJPEG. DA _v0.1.4 August nvjpeg Libary Guide

SDK White Paper. Occlusion Query Checking for Hidden Pixels

CUDA-GDB: The NVIDIA CUDA Debugger

NVIDIA CUDA GETTING STARTED GUIDE FOR LINUX

High Quality DXT Compression using OpenCL for CUDA. Ignacio Castaño

GRID SOFTWARE FOR VMWARE VSPHERE VERSION /370.12

VIRTUAL GPU CLIENT LICENSING

CUDA-MEMCHECK. DU _v03 February 17, User Manual

GRID SOFTWARE FOR VMWARE VSPHERE VERSION /370.21

GRID VGPU FOR VMWARE VSPHERE Version /

CUDA DEVELOPER GUIDE FOR NVIDIA OPTIMUS PLATFORMS

Technical Report. Anisotropic Lighting using HLSL

Cg Toolkit. Cg 1.3 Release Notes. December 2004

NVIDIA DEBUG MANAGER FOR ANDROID NDK - VERSION 8.0.1

CUDA DEVELOPER GUIDE FOR NVIDIA OPTIMUS PLATFORMS

GRID VGPU FOR VMWARE VSPHERE Version /

Transcription:

GETTING STARTED WITH CUDA SDK SAMPLES DA-05723-001_v01 January 2012 Application Note

TABLE OF CONTENTS Getting Started with CUDA SDK Samples... 1 Before You Begin... 2 Getting Started With SDK Samples... 2 Getting Started Samples... 2 matrixmul... 2 simpletemplates... 2 template... 2 template_runtime... 3 Simpler CUDA Samples... 3 BandwidthTest... 3 Clock... 3 cudaopenmp... 3 devicequery, devicequerydrv... 3 Ptxjit... 3 simpleatomicintrinsics... 3 simplecublas... 3 simplecufft... 4 simplempi... 4 simplemulticopy... 4 simplemultigpu... 4 simplepitchlineartexture... 4 simpleprintf... 4 simplestreams... 5 simplesurfacewrite... 5 simpletemplates... 5 simpletexture, simpletexturedrv... 5 simplevoteintrinsics... 5 simplezerocopy... 5 template... 5 template_runtime... 5 CUDA + Graphics Interoperability... 7 simpled3d9... 7 simpled3d9texture... 7 simpled3d10... 7 simpled3d10texture... 7 simpled3d11texture... 7 simplegl... 7 simpletexture3d... 7 Getting Started With CUDA SDK Samples DA-05723-001_v01 ii

GETTING STARTED WITH CUDA SDK SAMPLES NVIDIA CUDA TM is a general purpose parallel computing architecture introduced by NVIDIA. It includes the CUDA Instruction Set Architecture (ISA) and the parallel compute engine in the GPU. This document is intended to introduce to you a set of SDK samples that can be run as an introduction to CUDA. Most of these SDK samples use the CUDA runtime API except for ones explicitly noted that are CUDA Driver API. To run these SDK samples, you should have experience with C and/or C++. It is not required that you have any parallel programming experience to start out. The CUDA C SDK samples listed in this document are found in both the C and CUDALibraries default directories in the following folders: Windows: ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.1 Linux: ~/NVIDIA GPU Computing SDK Mac OSX: /Developer/GPU Computing Getting Started With CUDA SDK Samples DA-05723-001_v01 1

BEFORE YOU BEGIN This document assumes you have installed CUDA on your system. CUDA runs on Windows, Mac, and Linux environments. To install CUDA, refer to the CUDA Getting Started Guide available with the SDK and on the CUDA web site at: http://www.nvidia.com/cuda) GETTING STARTED WITH SDK SAMPLES The list of SDK samples is divided up into three categories: Getting started samples If you are new to CUDA, these are the best SDK samples to begin with. Simple CUDA samples Samples that demonstrate CUDA + Graphics interoperability Note: There are some overlaps between the three categories. Getting Started Samples matrixmul This sample implements matrix multiplication as a CUDA kernel. It has been written for clarity of exposition to illustrate various CUDA programming principles, but not with the goal of providing the best performance kernel for matrix multiplication. For performance, this SDK sample also uses the CUBLAS library to show high-performance matrix multiplication. simpletemplates This sample is a templatized version of the template project. It also shows how to correctly templatize dynamically-allocated shared memory arrays. template This sample is a basic template project that can be used as a starting point for creating new CUDA projects. Getting Started With CUDA SDK Samples DA-05723-001_v01 2

template_runtime This is a simple template project that can be used as a starting point to create a new CUDA project that does not use the cutil library. Simpler CUDA Samples BandwidthTest This is a test program to measure the memory copy bandwidth of the GPU. It currently is capable of measuring device-to-device copy bandwidth, host-to-device copy bandwidth for pageable and page-locked memory, and device-to-host copy bandwidth for pageable and page-locked memory. Clock This example shows how to use the clock function in CUDA kernels to measure the performance within a kernel accurately. cudaopenmp This is a sample application demonstrating how to use the OpenMP API for launching workloads across multiple GPUs. The binaries for this sample are not pre-built with the Windows SDK installer. devicequery, devicequerydrv These two SDK samples show how to enumerate properties of the CUDA devices present in the system (using the CUDA runtime API). The *Drv version has the same functions as the runtime sample, but uses the CUDA Driver API. Ptxjit This sample demonstrates JIT compilation of PTX code. This sample uses a PTX program embedded in a string array. The CUDA Driver API calls are used to compile and run a PTX program. simpleatomicintrinsics This is a simple demonstration of global memory atomic instructions. This sample requires Compute Capability 1.1 or higher. simplecublas This is a basic example demonstrating how to use the CUBLAS (CUDA Basic Linear Algebra) library. This sample can be found within the CUDALibraries folder. For more details on how to use the CUBLAS Library, refer to the CUBLAS_Library.pdf Programming Guide included with the CUDA Toolkit. Getting Started With CUDA SDK Samples DA-05723-001_v01 3

simplecufft This is a basic example demonstrating how to use the CUFFT (CUDA Fast Fourier Transform) Library. In this sample, CUFFT is used to compute the 1D-convolution of a signal. The signal is transformed to the frequency domain, multiplied together with a filter kernel, and then the signal is transformed back to time domain. This sample can be found within the CUDALibraries folder. For more details on how to use the CUFFT Library, refer to the CUBLAS_Library.pdf Programming Guide included with the CUDA Toolkit. simplempi This demonstrates how to use MPI in combination with CUDA to demonstrate how to launch workloads across multiple systems that have a GPU. This sample generates some random numbers on one node, dispatches to all nodes, then computes the square root on each node s GPU. Then the average results of the results are computed. The binary is not pre-built with the SDK installer. simplemulticopy Since Compute Capability 1.1, it is possible to overlap compute with one memcopy to/from the host. Compute Capability 2.0 with a Tesla or Quadro GPU improves on this by enabling a second parallel copy operation in the opposite direction at full speed (PCIe is symmetric). This sample illustrates the usage of CUDA streams to achieve overlapping of kernel execution with copying data to and from the device. simplemultigpu This application demonstrates how to use the CUDA API launch workloads across multiple GPUs. Starting with CUDA 4.0, there is a new API for CUDA context management and multi-threaded access. This greatly simplifies the way that CUDA kernels can be launched across multiple GPUs. For more details, refer to sections 3.2.4.1, 3.2.4.3, and 3.2.6 in the CUDA C Programming Guide and to the CUDA_4.0_Readiness_Tech_Brief.pdf about the new multi-device programming model. simplepitchlineartexture This sample demonstrates how to use 1D Pitch Linear Textures in a CUDA program. simpleprintf This CUDA Runtime API sample is a very basic sample that implements how to use the printf function in the device code. Specifically, for devices with compute capability less than 2.0, the function cuprintf is called; otherwise, printf can be used directly. Getting Started With CUDA SDK Samples DA-05723-001_v01 4

For more details, refer to Appendix B.14 in the CUDA C Programming Guide included with the CUDA Toolkit. simplestreams This sample uses CUDA streams to overlap kernel executions with memcopies between the device and the host. Starting with CUDA 4.0, this sample adds support to pin of generic host memory. This SDK sample requires Compute Capability 1.1 or higher. For more details, refer to sections 3.2.5.5 in the CUDA C Programming Guide and to the CUDA_4.0_Readiness_Tech_Brief.pdf included with the CUDA Toolkit. simplesurfacewrite This sample demonstrates the use of surface references, thus enabling write-to-texture. (This sample requires a Fermi-based GPU (Compute Capability 2.0)). simpletemplates This sample is a templatized version of the template project. It also shows how to correctly templatize dynamically-allocated shared memory arrays. simpletexture, simpletexturedrv This sample demonstrates how to use textures with CUDA (Runtime API and the Driver API versions). simplevoteintrinsics This is a simple program that demonstrates how to use the Vote (any, all) intrinsic instruction in a CUDA kernel. This sample requires Compute Capability 1.2 or higher. simplezerocopy This sample illustrates how to use Zero Memory Copy. Kernels can read directly from and write directly to pinned system memory. This sample requires GPUs that support this feature (MCP79, GT200, Fermi based GPUs). Refer to Section 3.1.3 in the CUDA_C_Best_Practices_Guide.pdf for more details. template This sample is a basic template project that can be used as a starting point for creating new CUDA projects. template_runtime This is a simple template project that can be used as a starting point to create a new CUDA project that does not use the cutil library. Getting Started With CUDA SDK Samples DA-05723-001_v01 5

Getting Started With CUDA SDK Samples DA-05723-001_v01 6

CUDA + Graphics Interoperability These SDK samples demonstrate interoperability between CUDA and graphics. simpled3d9 This program demonstrates the interoperability between CUDA and Direct3D9. The program modifies vertex positions with CUDA and uses Direct3D9 to render the geometry. A Direct3D capable device is required. simpled3d9texture This program demonstrates Direct3D9 texture interoperability with CUDA. The program creates a number of D3D9 textures (2D, 3D, and CubeMap) which are written to/from CUDA kernels. Direct3D then renders the results on the screen. A Direct3D capable device is required. simpled3d10 This program demonstrates the interoperability between CUDA and Direct3D10. The program modifies vertex positions with CUDA and uses Direct3D10 to render the geometry. A Direct3D Capable device is required. simpled3d10texture This program demonstrates Direct3D10 texture interoperability with CUDA. The program creates a number of D3D10 textures (2D, 3D, and CubeMap) which are written to from CUDA kernels. Direct3D then renders the results on the screen. A Direct3D Capable device is required. simpled3d11texture This program demonstrates Direct3D11 texture interoperability with CUDA. The program creates a number of D3D11 textures (2D, 3D, and CubeMap) which are written to from CUDA kernels. Direct3D then renders the results on the screen. A Direct3D Capable device is required. simplegl This program demonstrates interoperability between CUDA and OpenGL. The program modifies vertex positions with CUDA and uses OpenGL to render the geometry. simpletexture3d This program demonstrates the use of 3D textures in CUDA. Getting Started With CUDA SDK Samples DA-05723-001_v01 7

Multi-GPU Programming simplep2p This sample demonstrates how to use a new feature in CUDA 4.0 API for multi-device programming with UVA (Unified Virtual Addressing) and GPU Direct 2.0 peer to peer communications (copying of data and memory addressing). This sample requires two GPUs with peer to peer capability based on GF100 or GF110 or newer. Getting Started With CUDA SDK Samples DA-05723-001_v01 8

Notice ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, MATERIALS ) ARE BEING PROVIDED AS IS. NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication of otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation. HDMI HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of HDMI Licensing LLC. ROVI Compliance Statement NVIDIA Products that support Rovi Corporation s Revision 7.1.L1 Anti-Copy Process (ACP) encoding technology can only be sold or distributed to buyers with a valid and existing authorization from ROVI to purchase and incorporate the device into buyer s products. This device is protected by U.S. patent numbers 6,516,132; 5,583,936; 6,836,549; 7,050,698; and 7,492,896 and other intellectual property rights. The use of ROVI Corporation's copy protection technology in the device must be authorized by ROVI Corporation and is intended for home and other limited pay-per-view uses only, unless otherwise authorized in writing by ROVI Corporation. Reverse engineering or disassembly is prohibited. OpenCL OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc. Trademarks NVIDIA, the NVIDIA logo, and CUDA are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated. Copyright 2011-2012 NVIDIA Corporation. All rights reserved. www.nvidia.com