CUDA/OpenGL Fluid Simulation. Nolan Goodnight


Document Change History

Version  Date     Responsible                    Reason for Change
0.1      2/22/07  Nolan Goodnight                Initial draft
1.0      4/02/07  Nolan Goodnight / Mark Harris  Release version

Abstract

This document describes an NVIDIA CUDA implementation of a simple fluids solver for the Navier-Stokes equations for incompressible flow. The CUDA algorithms are based on Jos Stam's FFT-based Stable Fluids system [1], and we refer the reader to this paper for mathematical and algorithmic details. We present the basic steps in one time iteration of the solver, and we show how each kernel is implemented using the CUDA C programming language. The results are displayed using OpenGL to render a particle system within the fluid. Figure 1 shows a screenshot of the fluids after several iterations.

Figure 1. A screenshot of the CUDA fluids solver running on a 512x512 domain. The fluid is displayed using particles that move according to the velocity field.

Motivation

The core component of any fluid simulation is a numeric solver for the Navier-Stokes equations, represented over a uniform grid of spatial locations or grid cells. The solver works iteratively: at each iteration or time step, a set of discrete differential operators is applied to small groups of cells across the domain. These operators update fluid state, such as velocity and pressure, based on the grid values from the previous time step. For this implementation of Stable Fluids [1], one iteration consists of the following steps:

Forces: Fluid velocity evolves according to force inputs over time. This step updates the fluid velocity field by integrating forces at each grid cell.

Advection: To update the fluid velocity field, this step transfers velocity values between grid cells. The transfer distance depends on the magnitude of the velocity vectors.

Diffusion: All fluids have some viscosity, which is a measure of how force inputs diffuse across the velocity field over time. This step simulates viscosity by applying a diffusion operator to the velocity field.

Projection: For incompressible fluids, the velocity field must remain non-divergent, or mass conserving. Otherwise, portions of the fluid will disappear and the simulation will fail to produce correct results. This step forces the updated velocity field to be non-divergent.

The operators (or kernels) used in these steps access data in local neighborhoods around each grid cell. As a result, the algorithms are highly data-parallel and therefore a good match for the CUDA programming model. We parallelize the fluid solver by executing blocks of hundreds of threads that span the computational domain, where each thread applies these operators to a small number of grid cells. We present details in a later section.

In addition to presenting implementation details and kernel code, this sample provides examples of how to use several features of the CUDA runtime API, user libraries, and C language.
These features, which are explained in detail in the CUDA Programming Guide, include:

CUDA texture references: Most of the kernels in this example access GPU memory through texture. In CUDA, this is done using the texture reference type. (For details, see sections and of the programming guide.)

The CUDA Array type: The fluids solver computes results in a 2D grid. In many cases, it is fastest to use 2D Arrays for intermediate results because CUDA Arrays abstract the GPU's memory layout (with optimal 2D texture cache performance), while memory from cudaMalloc() is linear. (For details, see section of the programming guide.)

The CUFFT user library: This example implements the FFT-based version of the Stable Fluids algorithm. This approach performs velocity diffusion and mass conservation in the frequency domain, and we use the CUDA FFT library to perform Fourier transforms. (For details, see the CUFFT documentation.)

The CUDA/OpenGL interoperability API: To display the fluid, we move particles using the fluid velocity field, and we draw the particles using simple GL calls. The interoperability API allows us to share memory between CUDA and OpenGL, so we can update the particle system using CUDA and render from the same memory using OpenGL. (For details, see section of the programming guide.)

Note that the CUDA Developer SDK also includes fluidsd3d, a version of the fluids simulator that uses Direct3D for rendering and demonstrates the CUDA Direct3D interoperability API.

Implementation

This section describes in detail the CUDA C implementation of each kernel in the Stable Fluids algorithm. As is typical for data-parallel algorithms over 2D grids, we execute blocks of threads in which each thread is responsible for all computation within a small region of the fluid domain. For this example, most kernels use the following tile and thread block configuration:

Tile size: The fluid domain is divided into tiles of 64-by-64 cells. We tile the entire domain in this way, using partially full tiles if the domain dimensions don't divide evenly by 64.

Block size: 256 threads, divided logically into 64 threads in x times 4 threads in y. The threads are distributed over the tile such that each thread computes results for a vertical column of 16 grid cells.
This thread configuration gives us a good balance between the number of threads per block, the maximum number of blocks per multiprocessor in the chip, and the per-thread resources (e.g. local registers) available. Figure 1 illustrates this block/thread configuration. Because some thread blocks may extend past the edge of the fluid domain (Figure 1 b), we use the CUDA C code in Listing 1 to check that each thread is inside the domain boundaries. This code is used in all __global__ functions for the fluid solver, but the constant 16 may be replaced by a variable parameter if the tile size changes.

Figure 1: Block diagram of the thread configuration used for all kernels in this sample. A) We use blocks of 64 x 4 threads, where each thread computes results for a vertical column of 16 grid cells. B) If the domain dimensions (dotted line) are not integer multiples of the tile size, we overlap the domain with extra tiles.

    int gtidx = blockIdx.x * blockDim.x + threadIdx.x;
    int gtidy = blockIdx.y * (16 * blockDim.y) + threadIdx.y * 16;

    // gtidx is the domain location in x for this thread
    // dx is the domain size in x
    if (gtidx < dx) {
        for (int p = 0; p < 16; p++) {
            int fi = gtidy + p;
            // fi is the domain location in y for this thread
            // dy is the domain size in y
            if (fi < dy) {
                // Kernel for each step code goes here
            }
        } // For all cells in a vertical column of 16
    } // If this thread is inside the fluid domain

Listing 1: CUDA C code for testing whether a thread is inside the fluid domain.
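As a rough host-side illustration of this configuration, the plain C sketch below computes how many tiles are needed along one axis and replays the Listing 1 index arithmetic on the CPU to confirm that every cell is visited exactly once, even when the domain size is not a multiple of 64. The function names tiles_for and covers_domain_once are hypothetical and are not part of the sample; the constants come from the tile and block sizes stated in the text.

```c
#include <assert.h>
#include <stdlib.h>

/* Sizes from the text: 64x64-cell tiles, 64x4-thread blocks,
   16 rows of cells per thread. */
enum { TILE = 64, BLOCK_X = 64, BLOCK_Y = 4, ROWS_PER_THREAD = 16 };

/* Number of tiles needed to cover n cells along one axis, rounding up
   so that a partially full tile covers any remainder. */
int tiles_for(int n) {
    assert(n > 0);
    return (n + TILE - 1) / TILE;
}

/* Replays the Listing 1 index arithmetic for every block and thread and
   counts how often each cell is visited. Returns 1 if every cell of the
   dx-by-dy domain is visited exactly once, 0 otherwise. */
int covers_domain_once(int dx, int dy) {
    int *visits = calloc((size_t)dx * (size_t)dy, sizeof *visits);
    if (visits == NULL) return 0;

    int gx = tiles_for(dx), gy = tiles_for(dy);
    for (int by = 0; by < gy; by++)
    for (int bx = 0; bx < gx; bx++)
    for (int ty = 0; ty < BLOCK_Y; ty++)
    for (int tx = 0; tx < BLOCK_X; tx++) {
        int gtidx = bx * BLOCK_X + tx;
        int gtidy = by * (ROWS_PER_THREAD * BLOCK_Y) + ty * ROWS_PER_THREAD;
        if (gtidx < dx) {
            for (int p = 0; p < ROWS_PER_THREAD; p++) {
                int fi = gtidy + p;
                if (fi < dy) visits[fi * dx + gtidx]++;
            }
        }
    }

    int ok = 1;
    for (int i = 0; i < dx * dy; i++)
        if (visits[i] != 1) ok = 0;
    free(visits);
    return ok;
}
```

For example, a 100-by-70 domain needs a 2-by-2 grid of blocks under these assumptions, and the two bounds checks discard the threads and rows that fall outside the domain.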

Add Forces

We add forces to the velocity field by translating user mouse motion into 2D force vectors. We launch a single thread block on the GPU, and each thread in the block computes a force value based on its distance from the center cell. Threads at the far edges of the thread block tile add smaller force updates to the velocity than threads near the center grid cell. The following is CUDA C code for adding forces:

    // tx is the thread location in x
    // ty is the thread location in y
    // fj is the global thread position in the velocity array
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    cdata *fj = (cdata*)((char*)v + (ty + spy) * pitch) + tx + spx;

    // We offset the local thread position by r, which is the
    // radius of the force thread block
    // cdata is a float2 type
    cdata vterm = *fj;
    tx -= r;
    ty -= r;

    // For smoothness, we compute a 1/(1 + x^4 + y^4) force falloff
    float s = 1.f / (1.f + tx*tx*tx*tx + ty*ty*ty*ty);
    vterm.x += s * fx;
    vterm.y += s * fy;

    // Write the updated velocity back to the field
    *fj = vterm;

Note that we use a pitch value in the calculation of the global thread position in the velocity field. This is necessary because the velocity memory is allocated using cudaMalloc2D (see section of the CUDA Programming Guide for details). All remaining code examples in this document use a pitch value when calculating global input and output addresses.

Velocity Advection

For velocity advection, we need to compute how the fluid velocity is transported across grid cells. We use the implicit advection process proposed by Stam [1] and described in detail by Harris [2]. In this approach, we use the velocity vector at each grid cell to trace back in time to a previous location, and we replace the velocity at the starting grid cell with the value from that previous location.
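The backward trace can be illustrated on the CPU. The plain C sketch below is not the sample's kernel: it reads the cell velocity directly, steps backward by dt, and interpolates bilinearly by hand between the four surrounding cells. The names vec2, wrap_index, floor_to_int, sample_bilinear, and advect_cell are hypothetical. The periodic wrap at the domain edges is an assumption, chosen because an FFT-based solver implies a periodic domain, and velocities are assumed to be expressed in domain widths per unit time, so they are scaled by dx and dy to get a distance in cells.

```c
#include <assert.h>

typedef struct { float x, y; } vec2;   /* stands in for the float2-based cdata */

/* Wrap an integer index into [0, n). Assumes a periodic domain. */
int wrap_index(int i, int n) {
    assert(n > 0);
    int m = i % n;
    return (m < 0) ? m + n : m;
}

/* Largest integer not greater than f (written out to avoid needing libm). */
int floor_to_int(float f) {
    int i = (int)f;                     /* truncates toward zero */
    return ((float)i > f) ? i - 1 : i;
}

/* Bilinear sample of a dx-by-dy row-major field at (x, y), where cell
   (i, j) has its center at (i + 0.5, j + 0.5). */
vec2 sample_bilinear(const vec2 *v, int dx, int dy, float x, float y) {
    float fx = x - 0.5f, fy = y - 0.5f;
    int ix = floor_to_int(fx), iy = floor_to_int(fy);
    float ax = fx - (float)ix, ay = fy - (float)iy;   /* weights in [0, 1) */
    int x0 = wrap_index(ix, dx), x1 = wrap_index(ix + 1, dx);
    int y0 = wrap_index(iy, dy), y1 = wrap_index(iy + 1, dy);
    vec2 a = v[y0 * dx + x0], b = v[y0 * dx + x1];
    vec2 c = v[y1 * dx + x0], d = v[y1 * dx + x1];
    vec2 r;
    r.x = (1.0f - ay) * ((1.0f - ax) * a.x + ax * b.x)
        + ay * ((1.0f - ax) * c.x + ax * d.x);
    r.y = (1.0f - ay) * ((1.0f - ax) * a.y + ax * b.y)
        + ay * ((1.0f - ax) * c.y + ax * d.y);
    return r;
}

/* New velocity for cell (i, j): trace the cell's own velocity backward
   over one time step and sample the old field at that location. */
vec2 advect_cell(const vec2 *v, int dx, int dy, int i, int j, float dt) {
    vec2 vel = v[j * dx + i];
    float px = ((float)i + 0.5f) - dt * vel.x * (float)dx;
    float py = ((float)j + 0.5f) - dt * vel.y * (float)dy;
    return sample_bilinear(v, dx, dy, px, py);
}
```

Because each output cell only reads from the old field, every cell can be processed independently, which is what makes this step a natural fit for one-thread-per-column execution.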

    cdata vterm, ploc;
    int fj = fi * dx + gtidx;

    // Fetch the velocity value at the grid cell for the thread
    // location (gtidx, fi)
    vterm = texfetch(texref, (float)gtidx, (float)fi);

    // Trace the velocity vector backward to determine a new
    // location in the grid
    ploc.x = (gtidx + 0.5f) - (dt * vterm.x * dx);
    ploc.y = (fi + 0.5f) - (dt * vterm.y * dy);
    vterm = texfetch(texref, ploc.x, ploc.y);

The second texture fetch in the code above uses bilinear interpolation to get a smooth value. The texture reference object is configured with this type of filtering so that we don't have to manually interpolate velocity values in the CUDA C code and can take advantage of the GPU's built-in texture filtering hardware.

Velocity Diffusion and Projection

This sample implements velocity diffusion and projection in a single kernel. We can do this because both operations are performed on velocity coefficients in the frequency domain. For this step we do the following:

1. Forward FFT of the velocity field.
2. Frequency-space diffusion to simulate viscosity, and projection to make the velocity field non-divergent.
3. Inverse FFT of the velocity coefficients.

For both the forward and inverse FFT we use CUFFT, the CUDA FFT library. See the CUFFT documentation for details on how to create and use CUFFT plans. It's important to note that the current version of CUFFT only supports complex-to-complex transforms, while the velocity field is real-valued. Therefore, we allocate extra GPU memory to store complex-valued coefficients for the x and y velocity components.

The differential operator for diffusion in the spatial domain is simply a convolution, which can be implemented as a point-wise multiplication in the frequency domain. Therefore, we can simulate fluid viscosity by scaling the velocity coefficients such that high-frequency terms are damped more strongly than low-frequency terms.
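To make the frequency-space step concrete, here is a host-side C sketch of the arithmetic described above, applied to a single pair of coefficients. It is an illustration under stated assumptions rather than the sample's kernel: the names cplx, wavenumber, diffusion_factor, and diffuse_project are hypothetical, the damping factor is taken to be 1 / (1 + visc * dt * k^2), and the projection removes the component of the velocity coefficient that is parallel to the wave vector (kx, ky).

```c
#include <assert.h>

typedef struct { float x, y; } cplx;   /* x = real part, y = imaginary part */

/* Map FFT bin index i in [0, n) to a signed wavenumber: bins above n/2
   hold the negative frequencies. */
int wavenumber(int i, int n) {
    assert(n > 0 && i >= 0 && i < n);
    return (i > n / 2) ? i - n : i;
}

/* Damping factor for squared wavenumber kk. It equals 1 at kk = 0 and
   shrinks as kk grows, so high frequencies are damped the most. */
float diffusion_factor(float visc, float dt, float kk) {
    return 1.0f / (1.0f + visc * dt * kk);
}

/* Diffuse, then project, one pair of velocity coefficients in place.
   vx and vy are the coefficients of the x and y velocity components at
   wave vector (kx, ky). */
void diffuse_project(cplx *vx, cplx *vy, int kx, int ky,
                     float visc, float dt) {
    float kk = (float)(kx * kx + ky * ky);
    float d = diffusion_factor(visc, dt, kk);
    vx->x *= d; vx->y *= d;
    vy->x *= d; vy->y *= d;

    if (kk > 0.0f) {
        float rkk = 1.0f / kk;
        /* Dot product of (kx, ky) with the real and imaginary parts. */
        float rkp = (float)kx * vx->x + (float)ky * vy->x;
        float ikp = (float)kx * vx->y + (float)ky * vy->y;
        /* Subtract the component along (kx, ky). */
        vx->x -= rkk * rkp * (float)kx;
        vx->y -= rkk * ikp * (float)kx;
        vy->x -= rkk * rkp * (float)ky;
        vy->y -= rkk * ikp * (float)ky;
    }
}
```

After the projection, the dot product of (kx, ky) with the coefficient pair is zero in both the real and imaginary parts, which is the frequency-domain statement that the field is non-divergent.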

    int fj = fi * dx + gtidx;
    // Complex coefficients: .x is the real part, .y the imaginary part
    cdata xterm = vx[fj];
    cdata yterm = vy[fj];

    // Compute the index of the wavenumber based on the
    // data order produced by a standard NN FFT.
    int iix = (gtidx > dx / 2) ? (gtidx - (dx)) : gtidx;
    int iiy = (fi > dy / 2) ? (fi - (dy)) : fi;

    // Velocity diffusion
    float kk = (float)(iix * iix + iiy * iiy); // k^2
    float diff = 1.f / (1.f + visc * dt * kk);
    xterm.x *= diff; xterm.y *= diff;
    yterm.x *= diff; yterm.y *= diff;

    // Velocity projection
    if (kk > 0.f) {
        float rkk = 1.f / kk;
        // Real portion of velocity projection
        float rkp = (iix * xterm.x + iiy * yterm.x);
        // Imaginary portion of velocity projection
        float ikp = (iix * xterm.y + iiy * yterm.y);
        xterm.x -= rkk * rkp * iix;
        xterm.y -= rkk * ikp * iix;
        yterm.x -= rkk * rkp * iiy;
        yterm.y -= rkk * ikp * iiy;
    }

    vx[fj] = xterm;
    vy[fj] = yterm;

Display

The fluid simulation is displayed by rendering points in OpenGL. The positions of the points are initially randomized throughout the domain, and at each time step the velocity field is used to move the points forward. To do this, cudaGLMapBufferObject() is used to map the vertex buffer object of point positions to a CUDA device memory pointer. This allows it to be passed to a CUDA kernel that advects the point positions using the velocity field, as in the following code. The vertex buffer object is then unmapped from the device pointer so that it can be used in normal OpenGL rendering using glDrawArrays().

    int fj = fi * dx + gtidx;
    cdata pterm = part[fj];

    // Look up the velocity at the grid cell containing the particle
    // (particle positions are stored in the range [0, 1))
    int xvi = ((int)(pterm.x * dx));
    int yvi = ((int)(pterm.y * dy));
    cdata vterm = *((cdata*)((char*)v + yvi * pitch) + xvi);

    // Move the particle, then wrap its position back into [0, 1)
    pterm.x += dt * vterm.x;
    pterm.x = pterm.x - (int)pterm.x;
    pterm.x += 1.f;
    pterm.x = pterm.x - (int)pterm.x;
    pterm.y += dt * vterm.y;
    pterm.y = pterm.y - (int)pterm.y;
    pterm.y += 1.f;
    pterm.y = pterm.y - (int)pterm.y;

    part[fj] = pterm;

Bibliography

1. Stam, J. Stable Fluids. In Proceedings of SIGGRAPH.
2. Harris, Mark J. Fast Fluid Dynamics Simulation on the GPU. In GPU Gems, R. Fernando, Ed., ch. 38. Addison Wesley.

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED AS IS. NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA, the NVIDIA logo, GeForce, NVIDIA Quadro, and NVIDIA CUDA are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright 2007 NVIDIA Corporation. All rights reserved.

NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA


Adarsh Krishnamurthy (cs184-bb) Bela Stepanova (cs184-bs) OBJECTIVE FLUID SIMULATIONS Adarsh Krishnamurthy (cs184-bb) Bela Stepanova (cs184-bs) The basic objective of the project is the implementation of the paper Stable Fluids (Jos Stam, SIGGRAPH 99). The final

More information

Cg Toolkit. Cg 2.0 January 2008 Release Notes

Cg Toolkit. Cg 2.0 January 2008 Release Notes Cg Toolkit Cg 2.0 January 2008 Release Notes Cg Toolkit Release Notes The Cg Toolkit allows developers to write and run Cg programs using a wide variety of hardware and OS platforms and graphics APIs.

More information

Getting Started. NVIDIA CUDA Development Tools 2.2 Installation and Verification on Mac OS X. May 2009 DU _v01

Getting Started. NVIDIA CUDA Development Tools 2.2 Installation and Verification on Mac OS X. May 2009 DU _v01 Getting Started NVIDIA CUDA Development Tools 2.2 Installation and Verification on Mac OS X May 2009 DU-04264-001_v01 Getting Started with CUDA ii May 2009 DU-04264-001_v01 Table of Contents Chapter 1.

More information

Parallel Prefix Sum (Scan) with CUDA. Mark Harris

Parallel Prefix Sum (Scan) with CUDA. Mark Harris Parallel Prefix Sum (Scan) with CUDA Mark Harris mharris@nvidia.com March 2009 Document Change History Version Date Responsible Reason for Change February 14, 2007 Mark Harris Initial release March 25,

More information

QUADRO SYNC II FIRMWARE VERSION 2.02

QUADRO SYNC II FIRMWARE VERSION 2.02 QUADRO SYNC II FIRMWARE VERSION 2.02 RN-08989-002_v02 April 6, 2018 Release Notes DOCUMENT CHANGE HISTORY RN-08989-002_v02 Version Date Authors Description of Change 01 9/26/17 JK/DT/DK/CC Initial release

More information

NSIGHT ECLIPSE PLUGINS INSTALLATION GUIDE

NSIGHT ECLIPSE PLUGINS INSTALLATION GUIDE NSIGHT ECLIPSE PLUGINS INSTALLATION GUIDE DG-06450-001 _v9.0 June 2018 TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. Install Using Eclipse IDE... 1 1.2. Uninstall Using Eclipse IDE... 2 1.3. Install

More information

Technical Report. Mesh Instancing

Technical Report. Mesh Instancing Technical Report Mesh Instancing Abstract What is Mesh Instancing? Before we talk about instancing, let s briefly talk about the way that most D3D applications work. In order to draw a polygonal object

More information

GRID SOFTWARE FOR MICROSOFT WINDOWS SERVER VERSION /370.12

GRID SOFTWARE FOR MICROSOFT WINDOWS SERVER VERSION /370.12 GRID SOFTWARE FOR MICROSOFT WINDOWS SERVER VERSION RN-08686-001 _v4.4 (GRID) Revision 02 October 2017 Release Notes TABLE OF CONTENTS Chapter 1. Release Notes... 1 Chapter 2. Validated Platforms...2 2.1.

More information

NVIDIA CUDA DEBUGGER CUDA-GDB. User Manual

NVIDIA CUDA DEBUGGER CUDA-GDB. User Manual CUDA DEBUGGER CUDA-GDB User Manual PG-00000-004_V2.3 June, 2009 CUDA-GDB PG-00000-004_V2.3 Published by Corporation 2701 San Tomas Expressway Santa Clara, CA 95050 Notice ALL DESIGN SPECIFICATIONS, REFERENCE

More information

Cg Toolkit. Cg 2.0 May 2008 Release Notes

Cg Toolkit. Cg 2.0 May 2008 Release Notes Cg Toolkit Cg 2.0 May 2008 Release Notes Cg Toolkit Release Notes The Cg Toolkit allows developers to write and run Cg programs using a wide variety of hardware and OS platforms and graphics APIs. Originally

More information

Cg Toolkit. Cg 1.3 Release Notes. December 2004

Cg Toolkit. Cg 1.3 Release Notes. December 2004 Cg Toolkit Cg 1.3 Release Notes December 2004 Cg Toolkit Release Notes The Cg Toolkit allows developers to write and run Cg programs using a wide variety of hardware platforms and graphics APIs. Originally

More information

NVIDIA CUDA TM Developer Guide for NVIDIA Optimus Platforms. Version 1.0

NVIDIA CUDA TM Developer Guide for NVIDIA Optimus Platforms. Version 1.0 NVIDIA CUDA TM Developer Guide for NVIDIA Optimus Platforms Version 1.0 3/30/2010 Introduction NVIDIA Optimus is a revolutionary technology that delivers great battery life and great performance, in a

More information

Getting Started. NVIDIA CUDA Development Tools 2.3 Installation and Verification on Mac OS X

Getting Started. NVIDIA CUDA Development Tools 2.3 Installation and Verification on Mac OS X Getting Started NVIDIA CUDA Development Tools 2.3 Installation and Verification on Mac OS X July 2009 Getting Started with CUDA ii July 2009 Table of Contents Chapter 1. Introduction... 1 CUDA Supercomputing

More information

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Nikolai Sakharnykh - NVIDIA San Jose Convention Center, San Jose, CA September 21, 2010 Introduction Tridiagonal solvers very popular

More information

Computational Fluid Dynamics (CFD) using Graphics Processing Units

Computational Fluid Dynamics (CFD) using Graphics Processing Units Computational Fluid Dynamics (CFD) using Graphics Processing Units Aaron F. Shinn Mechanical Science and Engineering Dept., UIUC Accelerators for Science and Engineering Applications: GPUs and Multicores

More information

Cg Toolkit. Cg 1.4 rc 1 Release Notes

Cg Toolkit. Cg 1.4 rc 1 Release Notes Cg Toolkit Cg 1.4 rc 1 Release Notes Cg Toolkit Release Notes The Cg Toolkit allows developers to write and run Cg programs using a wide variety of hardware platforms and graphics APIs. Originally released

More information

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1 Markus Hadwiger, KAUST Reading Assignment #2 (until Feb. 17) Read (required): GLSL book, chapter 4 (The OpenGL Programmable

More information

Navier-Stokes & Flow Simulation

Navier-Stokes & Flow Simulation Last Time? Navier-Stokes & Flow Simulation Implicit Surfaces Marching Cubes/Tetras Collision Detection & Response Conservative Bounding Regions backtracking fixing Today Flow Simulations in Graphics Flow

More information

NVIDIA SDK. NVMeshMender Code Sample User Guide

NVIDIA SDK. NVMeshMender Code Sample User Guide NVIDIA SDK NVMeshMender Code Sample User Guide Introduction What Is This? The NVMeshMender library is designed to prepare meshes for per-pixel lighting, by generating normals, tangents and binormals. Some

More information

GRID VIRTUAL GPU FOR HUAWEI UVP Version ,

GRID VIRTUAL GPU FOR HUAWEI UVP Version , GRID VIRTUAL GPU FOR HUAWEI UVP Version 340.78, 341.44 RN-06927-001 February 23rd, 2015 Release Notes RN-06927-001 CONTENTS Release Notes... 1 Validated Platforms... 2 Known Issues... 3 Version 340.78,

More information

Cg Toolkit. Cg 2.1 October 2008 Release Notes

Cg Toolkit. Cg 2.1 October 2008 Release Notes Cg Toolkit Cg 2.1 October 2008 Release Notes Cg Toolkit Release Notes The Cg Toolkit allows developers to write and run Cg programs using a wide variety of hardware and OS platforms and graphics APIs.

More information

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v5.0 October 2012 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1 System Requirements... 1 1.2 About

More information

A Simple Fluid Solver based on the FFT

A Simple Fluid Solver based on the FFT A Simple Fluid Solver based on the FFT Jos Stam Alias wavefront 1218 Third Ave, 8th Floor, Seattle, WA 98101 Abstract This paper presents a very simple implementation of a fluid solver. Our solver is consistent

More information

NVIDIA Tesla Compute Cluster Driver for Windows

NVIDIA Tesla Compute Cluster Driver for Windows NVIDIA Tesla Compute Cluster Driver for Windows RN-05404-198_v198-17 July 2010 Release Notes 01 NVIDIA TESLA COMPUTE CLUSTER DRIVER FOR WINDOWS This edition of Release 198 Notes describes the Release 198

More information

GPGPU LAB. Case study: Finite-Difference Time- Domain Method on CUDA

GPGPU LAB. Case study: Finite-Difference Time- Domain Method on CUDA GPGPU LAB Case study: Finite-Difference Time- Domain Method on CUDA Ana Balevic IPVS 1 Finite-Difference Time-Domain Method Numerical computation of solutions to partial differential equations Explicit

More information

NVIDIA CUDA C INSTALLATION AND VERIFICATION ON

NVIDIA CUDA C INSTALLATION AND VERIFICATION ON NVIDIA CUDA C INSTALLATION AND VERIFICATION ON MICROSOFT XP, MICROSOFT VISTA, AND WINDOWS 7 SYSTEMS DU-80003-001_v01 April 2010 Getting Started DOCUMENT CHANGE HISTORY DU-80003-001_v01 Version Date Authors

More information

Cg Toolkit. Cg 2.1 beta August 2008 Release Notes

Cg Toolkit. Cg 2.1 beta August 2008 Release Notes Cg Toolkit Cg 2.1 beta August 2008 Release Notes Cg Toolkit Release Notes The Cg Toolkit allows developers to write and run Cg programs using a wide variety of hardware and OS platforms and graphics APIs.

More information

White Paper. Perlin Fire. February 2007 WP _v01

White Paper. Perlin Fire. February 2007 WP _v01 White Paper Perlin Fire February 2007 WP-03012-001_v01 Document Change History Version Date Responsible Reason for Change 01 AT, CK Initial release Go to sdkfeedback@nvidia.com to provide feedback on Perlin

More information

Getting Started. NVIDIA CUDA Development Tools 2.2 Installation and Verification on Microsoft Windows XP and Windows Vista

Getting Started. NVIDIA CUDA Development Tools 2.2 Installation and Verification on Microsoft Windows XP and Windows Vista Getting Started NVIDIA CUDA Development Tools 2.2 Installation and Verification on Microsoft Windows XP and Windows Vista May 2009 Getting Started with CUDA ii May 2009 Table of Contents Chapter 1. Introduction...1

More information

NVIDIA CUDA. Fermi Compatibility Guide for CUDA Applications. Version 1.1

NVIDIA CUDA. Fermi Compatibility Guide for CUDA Applications. Version 1.1 NVIDIA CUDA Fermi Compatibility Guide for CUDA Applications Version 1.1 4/19/2010 Table of Contents Software Requirements... 1 What Is This Document?... 1 1.1 Application Compatibility on Fermi... 1 1.2

More information

PASCAL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS

PASCAL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS PASCAL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS DA-08133-001_v9.1 April 2018 Application Note TABLE OF CONTENTS Chapter 1. Pascal Compatibility...1 1.1. About this Document...1 1.2. Application Compatibility

More information

CUDA Toolkit 4.2 CUFFT Library

CUDA Toolkit 4.2 CUFFT Library CUDA Toolkit 4.2 CUFFT Library PG-05327-040_v01 March 2012 Programming Guide Contents 1 Introduction 2 2 Using the CUFFT API 3 2.1 Data Layout..................................... 4 2.1.1 FFTW Compatibility

More information

CUDA Toolkit 5.0 CUFFT Library DRAFT. PG _v01 April Programming Guide

CUDA Toolkit 5.0 CUFFT Library DRAFT. PG _v01 April Programming Guide CUDA Toolkit 5.0 CUFFT Library PG-05327-050_v01 April 2012 Programming Guide Contents 1 Introduction 2 2 Using the CUFFT API 3 2.1 Data Layout..................................... 4 2.1.1 FFTW Compatibility

More information

MAXWELL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS

MAXWELL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS MAXWELL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS DA-07172-001_v7.0 March 2015 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Compatibility... 1 1.1. About this Document...1 1.2. Application Compatibility

More information

CGT 581 G Fluids. Overview. Some terms. Some terms

CGT 581 G Fluids. Overview. Some terms. Some terms CGT 581 G Fluids Bedřich Beneš, Ph.D. Purdue University Department of Computer Graphics Technology Overview Some terms Incompressible Navier-Stokes Boundary conditions Lagrange vs. Euler Eulerian approaches

More information

Cg Toolkit. Cg 2.2 April 2009 Release Notes

Cg Toolkit. Cg 2.2 April 2009 Release Notes Cg Toolkit Cg 2.2 April 2009 Release Notes Cg Toolkit Release Notes The Cg Toolkit allows developers to write and run Cg programs using a wide variety of hardware and OS platforms and graphics APIs. Originally

More information

GRID VIRTUAL GPU FOR HUAWEI UVP Version /

GRID VIRTUAL GPU FOR HUAWEI UVP Version / GRID VIRTUAL GPU FOR HUAWEI UVP Version 361.40 / 362.13 RN-07930-001 April 4 th, 2016 Release Notes RN-07930-001 CONTENTS Release Notes... 1 Validated Platforms... 2 Software Versions... 2 Known Product

More information

KEPLER COMPATIBILITY GUIDE FOR CUDA APPLICATIONS

KEPLER COMPATIBILITY GUIDE FOR CUDA APPLICATIONS KEPLER COMPATIBILITY GUIDE FOR CUDA APPLICATIONS DA-06287-001_v5.0 October 2012 Application Note TABLE OF CONTENTS Chapter 1. Kepler Compatibility... 1 1.1 About this Document... 1 1.2 Application Compatibility

More information

NSIGHT ECLIPSE EDITION

NSIGHT ECLIPSE EDITION NSIGHT ECLIPSE EDITION DG-06450-001 _v5.0 October 2012 Getting Started Guide TABLE OF CONTENTS Chapter 1. Introduction...1 1.1 About...1 Chapter 2. Using... 2 2.1 Installing... 2 2.1.1 Installing CUDA

More information