Horizon-Based Ambient Occlusion using Compute Shaders. Louis Bavoil

Similar documents
Multi-View Soft Shadows. Louis Bavoil

Constant-Memory Order-Independent Transparency Techniques

Order Independent Transparency with Dual Depth Peeling. Louis Bavoil, Kevin Myers

Deinterleaved Texturing for Cache-Efficient Interleaved Sampling. Louis Bavoil

User Guide. Vertex Texture Fetch Water

SDK White Paper. Vertex Lighting Achieving fast lighting results

Technical Report. Anisotropic Lighting using HLSL

SDK White Paper. Matrix Palette Skinning An Example

SDK White Paper. Occlusion Query Checking for Hidden Pixels

White Paper. Solid Wireframe. February 2007 WP _v01

Soft Particles. Tristan Lorach

GPU LIBRARY ADVISOR. DA _v8.0 September Application Note

User Guide. DU _v01f January 2004

SDK White Paper. Soft Shadows

NVIDIA nforce 790i SLI Chipsets

Android PerfHUD ES quick start guide

Sparkling Effect. February 2007 WP _v01

White Paper. Denoising. February 2007 WP _v01

Enthusiast System Architecture Certification Feature Requirements

Tuning CUDA Applications for Fermi. Version 1.2

Technical Brief. LinkBoost Technology Faster Clocks Out-of-the-Box. May 2006 TB _v01

SDK White Paper. HLSL Blood Shader Gravity Maps

GLExpert NVIDIA Performance Toolkit

Technical Brief. NVIDIA Quadro FX Rotated Grid Full-Scene Antialiasing (RG FSAA)

White Paper. Soft Shadows. February 2007 WP _v01

Screen Space Ambient Occlusion. Daniel Kvarfordt & Benjamin Lillandt

User Guide. GLExpert NVIDIA Performance Toolkit

High Quality DXT Compression using OpenCL for CUDA. Ignacio Castaño

NVIDIA CAPTURE SDK 6.1 (WINDOWS)

Cg Toolkit. Cg 2.0 January 2008 Release Notes

Application Note. NVIDIA Business Platform System Builder Certification Guide. September 2005 DA _v01

NVIDIA CAPTURE SDK 7.1 (WINDOWS)

Technical Brief. NVIDIA and Microsoft Windows Vista Getting the Most Out Of Microsoft Windows Vista

Cg Toolkit. Cg 2.0 May 2008 Release Notes

NVIDIA CAPTURE SDK 6.0 (WINDOWS)

User Guide. GPGPU Disease

Optical Flow Estimation with CUDA. Mikhail Smirnov

White Paper. Texture Arrays Terrain Rendering. February 2007 WP _v01

CUDA Particles. Simon Green

CUDA TOOLKIT 3.2 READINESS FOR CUDA APPLICATIONS

Cg Toolkit. Cg 2.1 beta August 2008 Release Notes

Cg Toolkit. Cg 1.4 rc 1 Release Notes

NSIGHT ECLIPSE PLUGINS INSTALLATION GUIDE

Histogram calculation in CUDA. Victor Podlozhnyuk

Technical Report. GLSL Pseudo-Instancing

User Guide. TexturePerformancePBO Demo

Cg Toolkit. Cg 2.1 October 2008 Release Notes

Cg Toolkit. Cg 1.2 Release Notes

SDK White Paper. Video Filtering on the GPU. Eric Young NVIDIA Corporation 2701 San Tomas Expressway Santa Clara, CA 95050

NVWMI VERSION 2.24 STANDALONE PACKAGE

Cg Toolkit. Cg 2.2 April 2009 Release Notes

White Paper. Perlin Fire. February 2007 WP _v01

NVWMI VERSION 2.18 STANDALONE PACKAGE

Technical Report. Mesh Instancing

Cg Toolkit. Cg 1.3 Release Notes. December 2004

Advanced Ambient Occlusion Methods for Modern Games

QUADRO SYNC II FIRMWARE VERSION 2.02

Histogram calculation in OpenCL. Victor Podlozhnyuk

GRID SOFTWARE FOR RED HAT ENTERPRISE LINUX WITH KVM VERSION /370.28

CUDA/OpenGL Fluid Simulation. Nolan Goodnight

Skinned Instancing. Bryan Dudash

Technical Brief. AGP 8X Evolving the Graphics Interface

Volumetric Particle Shadows. Simon Green

Cg Toolkit. Cg 2.2 February 2010 Release Notes

GRID SOFTWARE FOR MICROSOFT WINDOWS SERVER VERSION /370.12

CUDA-GDB: The NVIDIA CUDA Debugger

User Guide. Melody 1.2 Normal Map Creation & Multiple LOD Generation

Technical Report. Non-Power-of-Two Mipmap Creation

MOSAIC CONTROL DISPLAYS

Rain. Sarah Tariq

NVIDIA CUDA C GETTING STARTED GUIDE FOR MAC OS X

NVIDIA SDK. NVMeshMender Code Sample User Guide

Technical Report. SLI Best Practices

MAXWELL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

NSIGHT ECLIPSE EDITION

Getting Started. NVIDIA CUDA Development Tools 2.2 Installation and Verification on Mac OS X. May 2009 DU _v01

KEPLER COMPATIBILITY GUIDE FOR CUDA APPLICATIONS

Cg Toolkit. Cg 1.2 User s Manual Addendum

TUNING CUDA APPLICATIONS FOR MAXWELL

PASCAL COMPATIBILITY GUIDE FOR CUDA APPLICATIONS

Getting Started. NVIDIA CUDA C Installation and Verification on Mac OS X

TUNING CUDA APPLICATIONS FOR MAXWELL

NVIDIA CUDA DEBUGGER CUDA-GDB. User Manual

Technical Brief. Lumenex Engine The New Standard In GPU Image Quality

GETTING STARTED WITH CUDA SDK SAMPLES

QUADRO WORKSTATION APPLICATION NVIDIA QuadroView Release Notes. Based on Version

Getting Started. NVIDIA CUDA Development Tools 2.3 Installation and Verification on Mac OS X

NSIGHT COMPUTE. v March Customization Guide

NVIDIA CUDA. Fermi Compatibility Guide for CUDA Applications. Version 1.1

Advanced Post Processing

NVJPEG. DA _v0.1.4 August nvjpeg Libary Guide

VIRTUAL GPU LICENSE SERVER VERSION

NVBLAS LIBRARY. DU _v6.0 February User Guide

NVJPEG. DA _v0.2.0 October nvjpeg Libary Guide

NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

Technical Brief. FirstPacket Technology Improved System Performance. May 2006 TB _v01

Advanced Computer Graphics CS 563: Screen Space GI Techniques: Real Time

NVIDIA Tesla Compute Cluster Driver for Windows

Transcription:

Horizon-Based Ambient Occlusion using Compute Shaders Louis Bavoil lbavoil@nvidia.com

Document Change History Version Date Responsible Reason for Change 1 March 14, 2011 Louis Bavoil Initial release

Overview This DirectX 11 SDK sample renders Screen-Space Ambient Occlusion (SSAO) in constant time using compute shaders. As with all SSAO algorithms, the input data is the depth buffer of the scene being rendered. (See [Akenine-Moller et al. 08] for an introduction to SSAO). In this sample, the AO is rendered using a jitter-free approximation of the Horizon-Based Ambient Occlusion (HBAO) algorithm [Bavoil and Sainz 08]. Second, the AO result is blurred horizontally and vertically using a depth-aware filter. Both the HBAO and the blur passes are implemented in compute shaders using group-shared memory. As shown on Figure 1, the lack of jittering in the HBAO computation can be hidden by using a large blur kernel. To ease the integration of this technique in DirectX 11 applications, an open-source library is provided with functions for specifying the input depths (linear or not), the associated camera parameters, the output color buffer, and the HBAO parameters. Figure 1. Medusa scene [NVIDIA 2008] with and without HBAO, using 4x8 depth samples per pixel and a 33x33 blur kernel. NVIDIA Corporation 2701 San Tomas Expressway Santa Clara, CA 95050 www.nvidia.com

The NVSSAO Library Building with the Library The NVSSAO library can be integrated as an external library into any DirectX 11 application. The header file and the 32-bit lib files are located in the NVSSAO directory. There are multiple lib files, for different build configurations. The naming conventions are: - MT for multi-threaded runtime - MD for multi-threaded DLL runtime - MTd for multi-threaded debug runtime - MDd for multi-threaded DLL debug runtime Alternatively, the library can be integrated to the application by adding the contents of the NVSDK_D3D11_SSAO (including the Shaders subdirectory) to the source code and adding HBAO_DX11_LIB.cpp to the Visual Studio project. This is what the example SSAO11 application in this SDK sample does. In the two cases, the interface of the library is NVSDK_D3D11_SSAO.h. Only this header needs to be included in the source files that call the library s functions. Using NVSSAO Adding HBAO to a DirectX 11 application can be done simply by adding two function calls to the rendering loop, after the opaque objects of the scene have been rendered into the hardware depth buffer, and before the semi-transparent geometry is blended over. First, the input depth buffer must be specified by calling either NVSDK_D3D11_SetHardwareDepthsForAO or NVSDK_D3D11_SetViewSpaceDepthsForAO. Second, calling NVSDK_D3D11_RenderAO will render HBAO using default parameters and will output the HBAO to the render target view passed as argument to the function. The RenderAO should not modify the state of the application. It saves all the D3D state objects that it will alter at the beginning of the function and restores them at the end. The RenderAO function allocates all the D3D resources it needs on first use. To release these resources, NVSDK_D3D11_ReleaseAOResources should be added to the application where the screen-dependent resources are released. There is no need to call ReleaseAOResources when the buffers are resized though. RenderAO automatically resizes its render targets when it detects that the input depth buffer has changed size. Optionally, before calling the NVSDK_D3D11_RenderAO, the AO parameters may be configured by calling NVSDK_D3D11_SetAOParameters. If the useblending option is enabled, the AO is multiplied over the RGB colors from the output buffer while preserving destination alpha, otherwise the AO is overwritten.

Running the Sample The Show Colors checkbox toggles drawing the scene colors. The Show AO checkbox toggles rendering the HBAO and multiplying the AO over the colors. There are three scenes selectable from the GUI: The Medusa scenes are single-frame captures of the NVIDIA Medusa GTX 280 launch demo. For these scene, the camera cannot be moved and the resolution is assumed to be 1280x720 (default resolution). The AT-AT and the Leaves scenes are rendering meshes with a dynamic camera. For these scenes, the camera can be rotated by dragging the left button over the window and the mouse wheel can be used to zoom in and out. MSAA can be enabled by selecting one of the modes from the second combo box. The Sibenik scene is rendering a mesh with a first-person camera. For this scene, the camera can be moved with the A,D,W,S keys and dragging the mouse. The following HBAO parameters are exposed in the GUI: The Radius multiplier slider controls the scale factor that is applied to the AO radius of influence (view-space distance). Increasing the AO radius increases the distance over which objects are casting occlusion. The Num steps slider controls the number of uniform steps per direction. Increasing the number of steps increases the footprint of the HBAO kernel but may introduce under-sampling artifacts. The Angle bias slider can be used to remove undesired AO at concave edges of low-tessellated meshes, as well as for hiding precision artifacts. The Power exponent slider controls the exponent of the power function that is applied right before the HBAO is blended with the destination colors. The Blur width slider controls the size of the depth-aware blur kernel that is applied to the HBAO, in number of pixels. Increasing the blur width filters out banding artifacts and reduces temporal aliasing artifacts. The Blur sharpness slider scales the maximum depth delta between the current pixel and a given sample. This parameter should be increased if the blur filter is making the AO bleed across object silhouettes. The Compute-shader HBAO and Pixel-shader HBAO radio buttons are for comparing the compute-shader jitter-free HBAO with the original HBAO algorithm (ground truth). The Half-resolution and Full-resolution buttons control the resolution at which the unfiltered AO is computed. The blur passes are always performed in full resolution, regardless of this parameter. In the top-left part of the screen are displayed the average frame rate, the current resolution and GPU name, as well as the following GPU times: Total is the GPU time spent by NVSDK_D3D11_RenderAO, including: Z is for linearization the input depths (when using hardware depths as input), AO is for rendering the unfiltered HBAO, BlurX and BlurY are for the horizontal and vertical filter respectively, Comp is for blending the final AO to the output render target view.

Parameter Tuning AO Radius As shown on Figure 2, using a small AO radius multiplier, the output AO essentially darkens the concave areas of the surfaces. Using a larger radius multiplier makes the AO look more like a global illumination approximation. Figure 2. Unfiltered HBAO with R=0.2 (top) and R=4.0 (bottom).

Blur Size Figure 3 shows that the banding artifacts from using the 4 axis-aligned directions can be hidden by a 33x33 blur kernel. Figure 3. With no blur (top) and with a 33x33 blur kernel (bottom).

Code Organization The C++ code of the library is located in the NVSDK_D3D11_SSAO directory. The API is in NVSSAO\NVSDK_D3D11_SSAO.h and the implementation is in HBAO_DX11_LIB.h and HBAO_DX11_LIB.cpp. The HLSL source code of the shaders is located in the Shaders\Source\ directory. The associated HLSL byte-code files are in Shaders\Bin\ and are included in HBAO_DX11_LIB.cpp. When changing the HLSL source code, the byte code can be recompiled by launching Shaders\compile_all.bat. Performance Table 1 below presents an analysis of the NVSSAO GPU cost when running the SDK sample in 1900x1200 1xMSAA with the default settings (compute-shader HBAO). The HBAO is computed in full resolution with 4 directions and 8 steps per direction. The resulting AO is then filtered with depth-aware horizontal and vertical blur passes. For the Medusa scene, the input depths are set with NVSDK_D3D11_SetViewSpaceDepthsForAO, whereas for the other scenes the depths are passed with NVSDK_D3D11_SetHardwareDepthsForAO. Therefore, there is no GPU time spent on linearizing depths for the Medusa scene. Table 1. GPU times on GeForce GTX 580 in 1920x1200 1xMSAA. GPU Times (ms) Medusa AT-AT Leaves Linearization 0.0 0.1 0.1 AO 2.1 2.1 2.1 Horizontal Blur 0.6 0.6 0.6 Vertical Blur 0.6 0.6 0.6 Compositing 0.2 0.1 0.1 Total 3.5 3.5 3.5

Acknowledgments The AT-AT model used in this sample was created by Brad Blackburn and can be downloaded from http://www.scifi3d.com. The Medusa scene was captured from the NVIDIA Medusa demo [NVIDIA 2008]. References [Akenine-Moller et al. 08] Tomas Akenine-Möller, Eric Haines, Naty Hoffman. Real-Time Rendering - Third Edition. 2008. [Bavoil and Sainz 08] Louis Bavoil, Miguel Sainz. Image-Space Horizon-Based Ambient Occlusion. ShaderX7 - Advanced Rendering Techniques. 2008. [NVIDIA 2008] Eugene d'eon. Life on the Bleeding Edge: More Secrets of the NVIDIA Demo Team. NVISION08. 2008.

Notice ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, MATERIALS ) ARE BEING PROVIDED AS IS. NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation. Trademarks NVIDIA, the NVIDIA logo, GeForce, NVIDIA Quadro, and NVIDIA CUDA are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated. Copyright 2011 NVIDIA Corporation. All rights reserved.