
Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges, including visibility culling, animation, and level of detail (LOD) management.

These have traditionally been CPU-based tasks, trading some extra CPU work for a larger reduction in the GPU load, but the per-character cost can be a serious bottleneck. Furthermore, CPU-side scene management is difficult if objects are simulated and animated on the GPU. We present a practical solution that allows rendering of large crowds of characters, from a variety of viewpoints, with stable performance and excellent visual quality. Our system uses Direct3D 10 functionality to perform view-frustum culling, occlusion culling, and LOD selection entirely on the GPU, allowing thousands of GPU-simulated characters to be rendered with full shadows in arbitrary environments. To our knowledge, this is the first system presented that supports this functionality.


We start with a vertex buffer containing all of the per-instance data needed to render each character, such as character positions and orientations. In our case, this information is obtained from a GPU-based crowd simulation, but a CPU-based simulation or user input could also be used. In order to avoid a read-back, we wanted to perform all of the typical scene management tasks on the GPU. In particular, we wanted to perform frustum and occlusion culling of the characters, and to sort the visible characters into discrete LOD groups as well as shadow frustum groups.

The key idea behind our scene management approach is the use of geometry shaders that act as filters for a set of character instances.

A filtering shader works by taking a set of point primitives as input, where each point contains the per-instance data needed to render a given character (position, orientation, and animation state).

The filtering shader re-emits only those points which pass a particular test, while discarding the rest. The emitted points are streamed into a buffer which can then be re-bound as instance data and used to render the characters. Multiple filtering passes can be chained together by using successive DrawAuto calls, and different tests can be set up simply by using different shaders. In practice, we use a shared geometry shader to perform the actual filtering, and perform the different filtering tests in vertex shaders. Aside from providing more modular code, this approach can also provide performance benefits.
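To make the filtering idea concrete, here is a minimal CPU-side sketch of what one such pass computes. The struct fields and function names are illustrative assumptions, not identifiers from our implementation; on the GPU, the "emit" step is the geometry shader streaming the point out to a buffer.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical per-instance record; the field layout is illustrative.
struct Instance {
    float pos[3];       // character position
    float orientation;  // facing angle
    int   animState;    // animation state index
};

// One "filtering pass": emit only the instances whose test passes,
// mirroring a geometry shader that re-emits points to a stream-out buffer.
std::vector<Instance> filterPass(const std::vector<Instance>& in,
                                 const std::function<bool(const Instance&)>& test) {
    std::vector<Instance> out;
    for (const Instance& inst : in)
        if (test(inst))
            out.push_back(inst);  // "stream out" the surviving instance
    return out;
}
```

Chaining calls to `filterPass` with different predicates models the successive DrawAuto passes: the output buffer of one pass becomes the input of the next.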

Next, I will show how to construct filters for performing view frustum culling, occlusion culling, LOD selection, and shadow frustum selection.

For view-frustum culling, the vertex shader simply performs an intersection check between the character bounding volume and the view frustum.

If the test passes, then the corresponding character is visible, and its instance data is emitted from the geometry shader and streamed out. Otherwise, it is discarded and the character's instance data is removed from the stream.

The output of this process is a buffer containing instance data for all characters that fall inside the view frustum. This instance data can then be recirculated and used as input to subsequent filters.
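The intersection check itself is a standard sphere-versus-frustum test. The following sketch shows the math the vertex shader evaluates, assuming the six frustum planes are stored with unit-length, inward-facing normals; this is a generic formulation, not our shader code.

```cpp
// Frustum plane in the form n . p + d = 0, with n unit length and
// pointing toward the inside of the frustum.
struct Plane { float nx, ny, nz, d; };

// Conservative sphere-vs-frustum test: the sphere is culled only when it
// lies entirely on the outside of at least one of the six planes.
bool sphereInFrustum(const Plane planes[6],
                     float cx, float cy, float cz, float radius) {
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].nx * cx + planes[i].ny * cy
                   + planes[i].nz * cz + planes[i].d;
        if (dist < -radius)
            return false;  // fully outside this plane: cull
    }
    return true;           // inside or intersecting: keep
}
```

Because the test is conservative, a sphere that merely straddles the frustum boundary is kept, which is exactly what culling requires.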

Using this framework, we can also perform occlusion culling to avoid rendering characters which are completely occluded by mountains or structures. Because we are performing our character management on the GPU, we are able to perform occlusion culling in a novel way, by taking advantage of the depth information that exists in the hardware Z buffer.

This approach requires far less CPU overhead than an approach based on predicated rendering or occlusion queries, while still allowing culling against arbitrary, dynamic occluders. Our approach is similar in spirit to the hierarchical Z testing that is implemented in modern GPUs. Here our filter uses the hierarchical depth image to test if characters are fully occluded.

After rendering all of the occluders in the scene, we construct a hierarchical depth image from the Z buffer, which we will refer to as a Hi-Z map. The Hi-Z map is a mip-mapped, screen-resolution image, where each texel in a given mip level contains the maximum depth of all corresponding texels in the previous mip level. In the most detailed mip level, each texel simply contains the corresponding depth value from the Z buffer. This depth information can be collected during the main rendering pass for the occluding objects; a separate depth pass is not required to build the Hi-Z map.
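The reduction that builds each coarser level can be sketched as follows. This is a CPU-side illustration of the max-depth downsample (on the GPU it is a pixel shader pass per mip level); it assumes power-of-two dimensions for brevity, whereas a real implementation must also fold in the extra border texels of odd-sized levels to stay conservative.

```cpp
#include <algorithm>
#include <vector>

// Build one coarser Hi-Z mip level: each output texel holds the maximum
// depth of the 2x2 block it covers in the finer level.
std::vector<float> downsampleMaxDepth(const std::vector<float>& fine,
                                      int width, int height) {
    std::vector<float> coarse((width / 2) * (height / 2));
    for (int y = 0; y < height / 2; ++y) {
        for (int x = 0; x < width / 2; ++x) {
            float d0 = fine[(2 * y)     * width + 2 * x];
            float d1 = fine[(2 * y)     * width + 2 * x + 1];
            float d2 = fine[(2 * y + 1) * width + 2 * x];
            float d3 = fine[(2 * y + 1) * width + 2 * x + 1];
            // Max keeps the farthest occluder depth, which is the
            // conservative choice for occlusion culling.
            coarse[y * (width / 2) + x] = std::max(std::max(d0, d1),
                                                   std::max(d2, d3));
        }
    }
    return coarse;
}
```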

Each character's bounding sphere is projected into an appropriate level of the hierarchical depth image.

A conservative depth test is then performed against the texels in the chosen mip level. If the character's bounding sphere passes the test, the instance data is streamed out. Otherwise, the filter removes the character's instance data from the stream.
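The two decisions involved — which mip level to sample, and whether the sphere is occluded — can be sketched as below. The level-selection heuristic (footprint of at most a few texels) and the depth convention (depths in [0, 1], larger is farther) are stated assumptions, not the only possible choices.

```cpp
#include <cmath>

// Choose the Hi-Z mip level at which the sphere's screen-space footprint
// spans roughly one texel, so the test only needs a small, bounded
// neighborhood of samples.
int pickHiZLevel(float screenRadiusInPixels) {
    float diameter = 2.0f * screenRadiusInPixels;
    return (int)std::ceil(std::log2(std::max(diameter, 1.0f)));
}

// Conservative occlusion test: cull only if the *nearest* depth of the
// bounding sphere is still farther than the *maximum* occluder depth over
// its footprint (depths in [0,1], larger = farther).
bool sphereOccluded(float sphereNearestDepth, float maxOccluderDepth) {
    return sphereNearestDepth > maxOccluderDepth;
}
```

Taking the sphere's nearest depth against the footprint's maximum depth errs on the side of keeping characters, so no visible character is ever culled.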

LOD selection is also performed using a filtering scheme. Only those instances that have passed the previous frustum and occlusion filters are sent as input to this filter. The LOD selection filter tests visible characters' distance from the camera. This filter is executed once for each LOD group. During the first pass, characters that fall into the first LOD group are selected. Three passes are used to create three stream-out buffers that contain instanced characters for each LOD group. These groups can then be used to render instances of the three character LODs shown here.
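The per-pass test reduces to a distance-band check like the one below. The band edges here are made-up example values, not the thresholds we used; the point is only that pass `lodIndex` keeps exactly the instances inside its band, so the three passes partition the visible set.

```cpp
// One LOD-selection pass: an instance survives pass `lodIndex` if its
// camera distance falls inside that LOD's band. Thresholds are illustrative.
bool inLodGroup(float distance, int lodIndex) {
    const float thresholds[4] = {0.0f, 20.0f, 60.0f, 1e9f};  // assumed edges
    return distance >= thresholds[lodIndex] &&
           distance <  thresholds[lodIndex + 1];
}
```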

We use a very similar system for generating our shadow maps. We use a parallel-split shadow mapping scheme, where several shadow maps are used for different segments of the view frustum. Parallel-split shadow maps require that characters be, once again, sorted by their distance from the camera. This time we must construct groups of instance data for characters that fall into the various shadow map splits. This filter is a combination of the frustum culling filter and the LOD selection filter.
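A sketch of the split-assignment logic follows. The split distances are placeholder values, and the radius slack is one way (an assumption on my part, not a detail from the talk) to handle a character whose bounding sphere straddles a split boundary: such a character must be emitted to both adjacent splits so it casts a shadow in each map.

```cpp
#include <vector>

// Select which parallel-split shadow maps a character belongs to, based on
// its view-space depth. `splitFar` holds the far edge of each split.
std::vector<int> selectShadowSplits(float viewDepth, float radius,
                                    const std::vector<float>& splitFar) {
    std::vector<int> splits;
    float nearEdge = 0.0f;
    for (int i = 0; i < (int)splitFar.size(); ++i) {
        // Expand the test by the bounding radius so a straddling character
        // lands in every split it touches.
        if (viewDepth + radius >= nearEdge && viewDepth - radius < splitFar[i])
            splits.push_back(i);
        nearEdge = splitFar[i];
    }
    return splits;
}
```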

Once we have character instances grouped by shadow map splits, we can use the grouped instances to draw our characters into shadow maps. In our application we used the low-LOD mesh to draw characters into the first three levels of the shadow map, and used an even coarser mesh to draw characters into the final, furthest shadow map. As you can see in the image, this coarse mesh consists of just a few boxes. This is acceptable because this shadow map will only be used for characters that are very far away from the viewer.


Once we've determined the visible characters in each LOD, we would like to render all of the character instances in each given LOD. In order to issue the draw call for a given LOD, we need to know the instance count. Obtaining this instance count unfortunately requires the use of a stream-out statistics query. Like occlusion queries, stream-out statistics queries can cause significant stalls, and thus performance degradation, when the results are used in the same frame in which the query is issued, because the GPU may go idle while the application is processing the query results. However, an easy solution is to re-order draw calls to fill the gap between issuing the query and reading back its result. In our system, we are able to offset the GPU stall by interleaving scene management with the next frame's crowd movement simulation. This ensures that the GPU is kept busy while the CPU is reading the query result and issuing the character rendering commands.

Using the filtering techniques described earlier, we construct LOD groups of visible, frustum-culled instance data. The instance data is used to draw instanced character meshes, with an appropriate mesh used for each LOD group. As you would expect, the nearest LOD uses a very finely detailed mesh which uses hardware tessellation and displacement mapping to represent fine surface detail. Tessellation and displacement mapping are disabled for the medium LOD, and a coarse approximating mesh is used for the furthest, or lowest, LOD.

Because we are doing all of our crowd management on the GPU, we must also perform mesh animation on the GPU. We do this by storing bone animations in textures. A texture array is used to store all of the animations; each page of the texture array contains point-sampled curves for all the bones of that particular animation. Using linear interpolation, these textures are sampled to give a piecewise-linear reconstruction of the bone animation.
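The reconstruction amounts to a lerp between the two keyframes bracketing the sample time. The sketch below shows that math for a single scalar bone channel; the one-sample-per-frame layout and the function name are illustrative assumptions (in the actual system the samples live in texture texels and the lerp is done in the shader).

```cpp
#include <cmath>
#include <vector>

// Reconstruct a bone channel at time t (in frames) by linearly
// interpolating between the two nearest point-sampled keyframes,
// mimicking linear filtering of the animation texture.
float sampleBoneCurve(const std::vector<float>& keys, float t) {
    if (keys.empty()) return 0.0f;
    if (t <= 0.0f) return keys.front();
    if (t >= (float)(keys.size() - 1)) return keys.back();
    int   i = (int)std::floor(t);
    float f = t - (float)i;  // fractional part drives the lerp
    return keys[i] * (1.0f - f) + keys[i + 1] * f;
}
```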

I have presented a method for managing large crowds entirely on the GPU. The GPU is used to perform filtering operations on a stream of instance data, and these filters implement frustum and occlusion culling, LOD selection, and shadow frustum selection. Additionally, I have shown how the animation system can be moved to the GPU to support this approach.