
Soft Particles
Tristan Lorach
tlorach@nvidia.com
January 2007

Document Change History

Version 1, 01/17/07, Tristan Lorach: Initial release.

Abstract

Particle sprites are widely used in most games. They are used for a variety of semi-transparent effects such as fog, smoke, dust, magic spells and explosions. However, sprites commonly produce an artifact: unnaturally sharp edges where they intersect the rest of the scene. We present a simple way to fade out flat sprites against a 3D scene, softening these artificial edges. Two solutions are implemented: one uses the ability of DirectX10 to read the depth buffer as a texture; the other uses a more conventional second render target to store depth values. New features of DirectX10 are exploited: Geometry Shaders and the ability to read from the z-buffer.

Figure 1. Sharp, unnatural particle edges (above) and our improved soft particles (below).

Motivation

The cheapest way to display particles is to use sprites: polygons that are oriented to always face the camera. A major issue with sprites is that they are flat, yet they represent complex 3D objects. Figure 2 shows a typical sprite. It represents, say, a smoke cloud that looks roughly spherical in 3D space. However, this cloud is far from being a simple sphere: the small details are highly irregular. A single, flat, textured polygon is a poor model for such complexity.

Figure 2. A typical particle texture from a game.

When we place the particle in the 3D scene, it is z-buffered to obtain correct occlusion. Z-buffering produces an all-or-nothing result: a pixel is either visible or not. Again, this is a poor model for the complexity of our smoke cloud. The depth values of the particle are inconsistent with the true nature of the smoke cloud that we are attempting to represent. We can easily see that the particle is a flat sprite and the illusion breaks. One of the most disconcerting cases is using particles to draw dynamic fog with large sprites. This sample shows two simple approaches to improving the depth test by adding a transition zone where the sprite fades out against the background scene. Figure 3 shows an exaggerated example.

Figure 3. A particle rendered with a high-contrast, opaque texture to exaggerate the problem (above) and our result (below).

How Does It Work?

Creation of the Sprite

Each sprite is created in the Geometry Shader, so we only feed the GPU a set of 3D positions, one per sprite. The Geometry Shader expands each position into two triangles and orients them perpendicular to the camera view direction (a minimal sketch of this expansion appears at the end of this section).

Blending the Sprite with the Background

The native depth test in the hardware's z-buffering unit is not sufficient for soft particles because z-buffering produces a binary, all-or-nothing result, as explained above. So we access the depth values ourselves and perform additional processing when the sprite is close to the background scene. Considering the fragments processed by the pixel shader:

- The closer the sprite's fragment is to the scene, the more it fades out to show more of the background (d2 and d4 in Figure 4).
- The farther the sprite's fragment is from the scene, the more opaque it is (d3 in Figure 4).
- If the sprite's fragment is behind the background scene, we discard it entirely (d1 in Figure 4).

(Note: the opacity value described here does not account for the opacity contributed by the sprite's texture.)
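The following is a minimal sketch of such a geometry shader, not the sample's actual code; the identifiers (GSIn, PSIn, g_ViewProj, g_CamRight, g_CamUp, g_ParticleSize) are illustrative assumptions. It expands one point into a camera-facing quad and forwards the projection-space Z used later for the soft depth comparison:

    // Sketch only: expand a particle position into a camera-facing quad.
    struct GSIn { float3 worldPos : POSITION; };

    struct PSIn
    {
        float4 pos : SV_Position;
        float2 uv  : TEXCOORD0;
        float  z   : TEXCOORD1;   // projection-space Z, before the divide by w
    };

    cbuffer PerFrame
    {
        float4x4 g_ViewProj;
        float3   g_CamRight;      // camera right axis in world space
        float3   g_CamUp;         // camera up axis in world space
        float    g_ParticleSize;
    };

    [maxvertexcount(4)]
    void ParticleGS(point GSIn input[1], inout TriangleStream<PSIn> stream)
    {
        // Four corners of a quad lying in the plane perpendicular to the view direction
        const float2 corners[4] = { float2(-1,-1), float2(-1,1), float2(1,-1), float2(1,1) };

        for (int i = 0; i < 4; i++)
        {
            float3 wp = input[0].worldPos
                      + (corners[i].x * g_CamRight + corners[i].y * g_CamUp) * g_ParticleSize;
            PSIn o;
            o.pos = mul(float4(wp, 1.0), g_ViewProj);
            o.uv  = corners[i] * 0.5 + 0.5;
            o.z   = o.pos.z;      // forwarded to the pixel shader for the soft depth test
            stream.Append(o);
        }
        stream.RestartStrip();
    }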

Figure 4. d1 is negative, so we kill the fragment; at d2 and d4 the particle blends with the background; at d3 the fragment is opaque.

We use a function to fade out the particle against the background depending on the difference between its depth and the scene's depth. An easy approach would be to saturate the scaled difference:

D = saturate((Zscene - Zparticle) * scale)

But a simple ramp function is not enough: in some situations we can see artifacts related to the discontinuities introduced by the saturate() function. Figure 6 shows an example. We add a contrast function to keep the whole curve continuous and create a smooth fade (see Contrast()). Figure 5 shows the contrast function: it is a piecewise function based on the intrinsic pow function, made symmetric about the central point (0.5, 0.5):

float Output = 0.5 * pow(saturate(2 * ((Input > 0.5) ? 1 - Input : Input)), ContrastPower);
Output = (Input > 0.5) ? 1 - Output : Output;
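To make this concrete, here is a sketch of how the complete fade computation might look in the particle pixel shader, wrapping the two lines above into a Contrast() helper. The names (sceneZ, particleZ, scale, contrastPower, epsilon) are assumptions corresponding to the sample's Scale, Contrast and Epsilon Threshold parameters, not its actual identifiers:

    // Sketch only; variable names are assumptions.
    float Contrast(float Input, float ContrastPower)
    {
        bool  aboveHalf = Input > 0.5;
        float Output = 0.5 * pow(saturate(2.0 * (aboveHalf ? 1.0 - Input : Input)), ContrastPower);
        return aboveHalf ? 1.0 - Output : Output;
    }

    // sceneZ and particleZ must be expressed in the same (projection) space.
    float ComputeFade(float sceneZ, float particleZ, float scale, float contrastPower)
    {
        float depthDiff = saturate((sceneZ - particleZ) * scale);   // linear ramp
        return Contrast(depthDiff, contrastPower);                  // smooth, continuous fade
    }

    // In the particle pixel shader, the result modulates the sprite's own alpha,
    // and the fragment is discarded when it lies behind the scene (the d1 case):
    //     if (sceneZ - particleZ < epsilon) discard;
    //     color.a *= ComputeFade(sceneZ, particleZ, g_Scale, g_ContrastPower);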

Figure 5. The contrast function; the resulting fade value D depends on the contrast power.

Figure 6. A linear fade (left) still shows sharp edges. Compare with our smooth contrast curve (right).

Note that we also tried another smoothing curve, Fc, whose shape depends on a parameter c, for the same purpose. This function looks easier to implement than the previous one because it is not piecewise. However, compared to the previous curve, its lack of symmetry can lead to lower-quality smoothing. It is available in the code for comparison purposes (see the contrast() function).

Depth Values

The preceding contrast functions all require a depth value for the visible point in the scene. This is not automatically available to the pixel shader. We implemented two different ways to read the depth values.

Reading the Z-Buffer

A new feature of DirectX10 is the ability to read the z-buffer values in a pixel shader. We exploit this feature to provide the depth values for our fade function.

Note: DirectX10 does not permit you to bind a shader resource view for a multisampled (MSAA) depth buffer.

Figure 7. The first opaque pass populates the depth buffer. The second (particle) pass reads depth.

In pass #2, this version uses the depth buffer in the pixel shader as a texture. To use it as a texture, the depth buffer must not be bound as the current depth-stencil render target. Instead, we bind it as a Shader Resource View (SRV), and the comparison of depth values is done in the pixel shader.

Note: DirectX10 doesn't allow us to keep the depth-stencil texture bound as a render target and bind it at the same time as an SRV. Unfortunately, this is true even if you disable Z writes (one would expect DX10 to allow reading from a depth buffer we won't modify, but that is not the case).

We use the Load function to read texel values. Load doesn't use a sampler, which is convenient because we don't need any filtering. Furthermore, Load addresses the texture by its real width and height (unnormalized coordinates) instead of the usual normalized (u,v) coordinates. This simplifies the algorithm, because the width and height of the depth buffer match the window in which the pixel shader runs.

Each fragment Loads the depth value using its window coordinates as the texture coordinate, available through the system value SV_Position:

zbuf = texture_depth.Load(int4(p.x, p.y, 0, 0)).r;

In the pixel shader, we also need the depth value computed for the triangle we are rasterizing. We forward this value through an interpolator (TEXCOORD#) to transmit the projected Z value from the vertex shader down to the pixel shader. As an intermediate stage, the geometry shader passes this Z value down to the pixel shader as well:

output.z = mul(pw, viewproj).z;

So the pixel shader receives the Z value before it is divided by the w component: the projection-space Z value. The zbuf value returned from Load(), however, is expressed in normalized device coordinates (xyz/w) and needs to be converted back to projection space (before the divide by w).

Consider a standard perspective projection with near plane Zn, far plane Zf, and Q = Zf / (Zf - Zn). Transforming a view-space point with view-space depth z yields a projected z component of z*Q - Q*Zn and a w component equal to z. Dividing (x, y, z) by w to reach normalized device coordinates gives, for the z component:

Zbuf = (z*Q - Q*Zn) / z = Q - (Q*Zn) / z

Zbuf is the projected z value that the hardware stored in the depth buffer. If we want to compare consistent depth values, the fetched value Zbuf needs to be transformed back into projection space:

Zproj = (Q * Zn * Zbuf) / (Q - Zbuf)

Then we can compare this z value with the one sent through the interpolator down to the pixel shader.
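As a sketch of this comparison (again with assumed names: texture_depth is the depth buffer bound as an SRV, and g_Q, g_Zn come from a constant buffer), the conversion can be combined with the ComputeFade() helper sketched earlier:

    // Depth buffer bound as a shader resource view; Load needs no sampler.
    Texture2D<float> texture_depth;

    cbuffer ProjectionParams
    {
        float g_Q;     // Zf / (Zf - Zn)
        float g_Zn;    // near-plane distance
    };

    // Convert a value fetched from the depth buffer (NDC space) back to projection space.
    float DepthBufferToProjZ(float zbuf)
    {
        return (g_Q * g_Zn * zbuf) / (g_Q - zbuf);
    }

    // In the particle pixel shader:
    //     float zbuf   = texture_depth.Load(int3(input.pos.xy, 0)).r;
    //     float sceneZ = DepthBufferToProjZ(zbuf);
    //     float fade   = ComputeFade(sceneZ, input.z, g_Scale, g_ContrastPower);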

Using a Second Render Target

The second solution uses multiple render targets (MRT): while rendering the components of the scene in the first pass, a second FP32 render target is used to store the depth values. Note that every vertex shader of every object processed in the first pass has to provide the Z value in projection space (before dividing by w). Similarly, every pixel shader has to write this Z value to the second render target. In the second pass, when rendering the particles, we read this FP32 texture and use the values for a direct comparison (both are expressed in projection space). Here again, the depth value of the particle is passed by the vertex shader down to the pixel shader, where the comparison is done.

Running the Sample

The sample renders only two big particles with a cloudy animated texture (a 3D texture). Three techniques are exposed through a combo box:

- Using the Z-buffer as a texture for the particle depth test (default)
- Using an FP32 second render target as the depth texture to test against
- Basic mode with nothing more than the hardware depth test (no soft particles)

Some other parameters are available:

- MaxSz: modifies the size of the two particles.
- Scale: scales the difference between Zparticle and Zbackground, which increases or decreases how quickly the fade-out happens.
- Contrast: changes the curvature of the transition (see Figure 5).
- Epsilon Threshold: zero by default. The fragment is discarded when the depth difference falls below this epsilon.
- Animate Texture: freezes the texture animation.
- Use Debug Tex: displays a procedural checkerboard for debugging purposes (Figure 2).
- Debug Loop: the number of times the two particles are rendered; useful for performance comparisons.

Integration

The first implementation, using the depth buffer as a texture, is easy to integrate because we don't need to provide any additional information during the first pass. The second method, using an additional FP32 render target, is somewhat more painful to integrate: each vertex and pixel shader used during the first pass needs to be changed to write the depth values correctly to the FP32 render target (a rough sketch of this change follows).
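As a rough illustration of that change, and under assumed names rather than the sample's actual shaders, each first-pass shader pair might be extended like this:

    // Vertex-shader side: forward the projection-space Z (before the divide by w).
    struct SceneVSOut
    {
        float4 pos   : SV_Position;
        float  zProj : TEXCOORD7;   // extra interpolator carrying the depth value
        // ... the shader's other interpolators ...
    };

    //     In the vertex shader:  output.zProj = mul(worldPos, g_ViewProj).z;

    // Pixel-shader side: write color as usual, plus depth to the FP32 target.
    struct ScenePSOut
    {
        float4 color : SV_Target0;  // normal scene color
        float  depth : SV_Target1;  // second render target (e.g. DXGI_FORMAT_R32_FLOAT)
    };

    ScenePSOut ScenePS(SceneVSOut input)
    {
        ScenePSOut o;
        o.color = float4(1, 1, 1, 1);   // placeholder for the object's usual shading
        o.depth = input.zProj;
        return o;
    }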

Performance

The test was performed on an AMD Athlon 64 3500+ (2.21 GHz) with a GeForce 8800 GTX board. We changed the sample to loop 1,000 times over the two particles and set the particle size to 3.2. Note that this test is not visually interesting and doesn't reflect common situations; the purpose of the loop is to reduce the margin of error. Running each iteration repeatedly leads to low frame rates:

1. Simple hardware depth test: 34.6 fps
2. Technique using the Z-buffer as a texture: 13.3 fps (-60%)
3. Technique using the FP32 render target for depth values: 26.7 fps (-25%)

The fact that method 2 unbinds the depth buffer really does impact performance, as it also disables the various hardware depth-test optimizations. The third method, on the other hand, doesn't change the setup of the depth buffer. Its performance is better, but still lower than method 1; we can explain this by the extra pixel-shader processing required to create the smooth transition effect.

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA, the NVIDIA logo, GeForce, and NVIDIA Quadro are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright 2007 NVIDIA Corporation. All rights reserved.

NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA 95050
www.nvidia.com