A Rendering Pipeline for Real-time Crowds

Benjamín Hernández and Isaac Rudomin

In motion pictures, large crowds of computer-generated characters are routinely used to produce battle scenes of epic proportions. In video games, it is common to find crowds that make up the armies controlled by users (or by the AI) in real-time strategy games, or crowds of non-player characters (e.g., groups of spectators in a stadium). In virtual environments, it is common to find crowd simulations in which characters interact with each other and with their surrounding environment. In all cases, optimizations such as level of detail and culling should be performed to render the crowds efficiently. In this chapter, we propose a parallel approach, implemented on the GPU, for level-of-detail selection and view frustum culling that allows us to render crowds made of thousands of characters.

1.1 System Overview

Our rendering pipeline is outlined in figure 1.1. First, all necessary initializations are performed on the CPU. These include loading information stored on disk (e.g., animation frames and polygonal meshes) and information generated as a preprocess (e.g., character positions) or at runtime (e.g., camera parameter updates). This information is used on the GPU to calculate the characters' new positions, perform view frustum culling, assign a specific level of detail (LOD) to each character, sort the characters by level of detail, and render them. A brief description of each stage follows.

Populating the Virtual Environment and Behavior. In these stages we specify the initial positions of all the characters, how they will move through the virtual environment, and how they will interact with each other. The result is a set of updated character positions.

View Frustum Culling and Level of Detail Assignation. In this stage we use the characters' positions to identify which characters will be culled. Additionally, we assign a proper LOD identifier to the characters' positions inside the view frustum according to their distance to the camera.

Figure 1.1. Rendering pipeline for crowd visualization. Dashed arrows correspond to data transferred from main memory to GPU memory only once, at initialization.

Level of Detail Sorting. The output of the View Frustum Culling and Level of Detail Assignation stage is a mixture of positions with different LODs. In this stage we sort the positions according to their LOD identifiers into separate buffers, such that all the character positions in a given buffer share the same level of detail.

Animation and Draw Instanced. In this stage we use each sorted buffer to draw the appropriate LOD character mesh using instancing. Instancing allows us to translate the characters across the virtual environment and to add visual and geometric variety to the individuals that make up the crowd.

In the following sections we present a detailed description of how we implemented these stages.
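As a roadmap for the rest of the chapter, the per-frame flow can be summarized in host code. The following C sketch is purely illustrative; every function name in it is hypothetical and merely stands for one of the stages described above:

/* Hypothetical per-frame driver for the crowd pipeline described above.
 * None of these functions belong to a real API; each stands for a stage
 * implemented in the following sections with shaders and transform feedback. */
void render_crowd_frame(void)
{
    update_behavior();            /* Section 1.2: FSM fragment shader updates the agent texture */
    cull_and_assign_lod();        /* Section 1.3: geometry shader + transform feedback          */
    sort_by_lod();                /* Section 1.4: one transform feedback pass per LOD buffer    */
    animate_and_draw_instanced(); /* Section 1.5: glDrawElementsInstanced per LOD mesh          */
}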

1.2 Populating the Virtual Environment and Behavior

For simplicity, our virtual environment is a plane parallel to the plane formed by the x and z axes. The initial positions of the characters are calculated randomly and stored in a texture.

For behavior, we implemented finite state machines (FSMs) as fragment shaders. A finite state machine is used to update the characters' positions following [Rudomín et al. 05, Millán et al. 06], in which a character consults the value of a labeled world map and follows a very simple FSM that causes it to move right until it reaches the right edge of the map, at which point the agent changes state and starts moving left until it reaches the left edge. However, other GPU approaches such as [Erra et al. 09, Silva et al. 09] are also well suited to this pipeline: in addition to simulating the behavior of tens of thousands of characters efficiently, GPU-based methods eliminate the overhead of transferring the new character positions between the CPU and the GPU on every frame.

Implementing an FSM as a fragment shader requires three kinds of textures: a world-space texture, an agent texture, and an FSM table texture. World-space textures encode a value for each location in the virtual environment; this covers many types of maps: height maps, collision maps, interest-area maps, or action maps. We treat all of these as maps labeled with some value at each pixel. Agent textures have one pixel per character and encode the state s of the character and its position (x, z) in the world map. Finally, the finite state machine is represented as a texture where, given the current state of the character and a certain input, we obtain the new state and position of the character, following the basic algorithm shown in listing 1.1.

given agent i:
    state = agent[i].s;
    x     = agent[i].x;
    z     = agent[i].z;
    label = world[x, z];
    agent[i].s  = fsm[state, label].s;
    agent[i].x += fsm[state, label].delta_x;
    agent[i].z += fsm[state, label].delta_z;

Listing 1.1. Basic algorithm to implement an FSM as a fragment shader.
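To make the texture encodings concrete, the same update can be written as plain C over arrays. This is only a CPU reference sketch of listing 1.1 (the struct and field names are ours); in the pipeline this body runs once per agent in a fragment shader, with world, agent, and fsm stored as textures:

#define WORLD_W    512
#define WORLD_H    512
#define NUM_STATES 4
#define NUM_LABELS 8

typedef struct { int s; int x, z; } Agent;                /* one texel of the agent texture */
typedef struct { int s; int delta_x, delta_z; } FsmEntry; /* one texel of the FSM table     */

int world[WORLD_W][WORLD_H];          /* labeled world map                    */
FsmEntry fsm[NUM_STATES][NUM_LABELS]; /* new state + move per (state, label)  */

void update_agent(Agent *a)
{
    int label  = world[a->x][a->z];   /* read the label under the agent */
    FsmEntry e = fsm[a->s][label];    /* look up the transition         */
    a->s  = e.s;                      /* adopt the new state            */
    a->x += e.delta_x;                /* and move accordingly           */
    a->z += e.delta_z;
}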

1.3 View Frustum Culling

View frustum culling (VFC) consists of eliminating groups of objects that lie outside the camera's view frustum. The common approach to VFC is to test the intersection between the objects and the six view frustum planes, using the plane equations to determine the visibility of each object. In our case, we implement a simpler method called radar VFC [Puig Placeres 05]. Radar VFC is based on the camera's referential: the method tests whether objects are in the view range or not, so there is no need to calculate the six view frustum plane equations.

Objects tested against the view frustum are usually simplified using points or bounding volumes such as bounding boxes (oriented or axis-aligned) or spheres. In our case, we use points (the characters' positions) together with radar VFC, so only three tests are needed to determine a character's visibility. In addition, to avoid culling characters that are only partially inside the view frustum, we increase the view frustum size by δ units, where the value of δ is obtained by visually adjusting the view frustum (figure 1.2).

Figure 1.2. View frustum culling.

As mentioned earlier, radar VFC is based on the camera's referential. In other words, the camera has a referential based on the three unit vectors X, Y and Z, as shown in figure 1.3, where c is the position of the camera, n is the center of the near plane and f is the center of the far plane.

Figure 1.3. The camera's referential based on the three unit vectors X, Y, Z.

The idea behind radar VFC is that, once we have the character's position p to be tested against the view frustum, we find the coordinates of p in the camera's referential and then use this information to determine whether the point is inside or outside the view frustum.

The first step is to find the camera's referential. Let d be the camera's view direction and u the camera's up vector; then the unit vectors X, Y, Z that make up the referential are calculated using equations 1.1, 1.2 and 1.3:

    Z = d / |d| = d / sqrt(dx^2 + dy^2 + dz^2)    (1.1)
    X = Z × u                                     (1.2)
    Y = X × Z                                     (1.3)

Once we have calculated the referential, the next step is to compute the vector v that goes from the camera center c to the agent's position p, using equation 1.4:

    v = p − c                                     (1.4)

Next, the vector v is projected onto the camera referential, i.e., onto the X, Y, Z unit vectors. Note that the projection of a vector a onto a unit vector b is given by the dot product of the two vectors, i.e., proj_b a = a · b.

Radar VFC first tests v against the Z unit vector: v is outside the view frustum if its projection proj_Z v ∉ [Near_plane, Far_plane].

If proj_Z v ∈ [Near_plane, Far_plane], then v is tested against the Y unit vector: v is outside the view frustum if its projection proj_Y v ∉ (−(h/2 + δ), h/2 + δ), where h is the height of the view frustum at v's position and δ is the value used to increase the view frustum size, as shown in figure 1.2. The height h is calculated using equation 1.5:

    h = proj_Z v · 2 tan(fov / 2)                 (1.5)

where fov is the field-of-view angle.

If proj_Y v ∈ (−(h/2 + δ), h/2 + δ), then v is tested against the X unit vector: v is outside the view frustum if its projection proj_X v ∉ (−(w/2 + δ), w/2 + δ), where w is the width of the view frustum at v's position, given by equation 1.6:

    w = h · ratio                                 (1.6)

where ratio is the aspect ratio of the view frustum.
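Equations 1.1 through 1.3 and the derived quantities are naturally computed once per frame on the CPU and handed to the culling shader as uniforms (listing 1.2 below consumes them as camPos, X, Y, Z, tang, ratio and delta; the near and far planes are passed through directly). A minimal C sketch, assuming a small vec3 helper type of our own:

#include <math.h>

typedef struct { float x, y, z; } vec3;

static vec3 v3_cross(vec3 a, vec3 b) {
    vec3 r = { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
    return r;
}
static vec3 v3_normalize(vec3 a) {
    float len = sqrtf(a.x*a.x + a.y*a.y + a.z*a.z);
    vec3 r = { a.x/len, a.y/len, a.z/len };
    return r;
}

/* Camera referential and radar-VFC quantities (equations 1.1-1.3).
 * d = view direction, u = up vector, fov in radians. */
void compute_referential(vec3 d, vec3 u, float fov, float aspect,
                         vec3 *X, vec3 *Y, vec3 *Z,
                         float *tang, float *ratio)
{
    *Z = v3_normalize(d);                /* equation 1.1 */
    *X = v3_normalize(v3_cross(*Z, u)); /* equation 1.2 (normalized here in
                                            case u is not perpendicular to Z) */
    *Y = v3_cross(*X, *Z);               /* equation 1.3 */
    *tang  = tanf(fov * 0.5f);           /* the shader derives the half-height h from this */
    *ratio = aspect;                     /* frustum aspect ratio (equation 1.6) */
}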

The VFC and LOD assignation stages are performed in a geometry shader. This shader receives as input the agent texture that was updated in the behavior stage (section 1.2), and it emits the positions (x, z) that are inside the view frustum together with a LOD identifier. The resulting triplets (x, z, LOD_id) are stored in a vertex buffer object using the OpenGL transform feedback feature. Listing 1.2 shows the code that performs radar VFC in GLSL.

[vertex program]

void main(void)
{
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position    = gl_Vertex;
}

[geometry program]

#define INSIDE  true
#define OUTSIDE false

uniform sampler2DRect position;
uniform float nearPlane, farPlane, tang, ratio, delta;
uniform vec3 camPos, X, Y, Z;

bool pointInFrustum(vec3 point)
{
    // calculate v = p - c
    vec3 v = point - camPos;

    // projection of v onto the Z unit vector
    float pcz = dot(v, Z);

    // first test: against the Z unit vector
    if (pcz > farPlane || pcz < nearPlane)
        return OUTSIDE;

    // projection of v onto the Y unit vector
    float pcy = dot(v, Y);
    float h = pcz * tang;   // half-height of the frustum at v
    h = h + delta;

    // second test: against the Y unit vector
    if (pcy > h || pcy < -h)
        return OUTSIDE;

    // projection of v onto the X unit vector
    float pcx = dot(v, X);
    float w = h * ratio;    // half-width of the frustum at v
    w = w + delta;

    // third test: against the X unit vector
    if (pcx > w || pcx < -w)
        return OUTSIDE;

    return INSIDE;
}

void main(void)
{
    vec4 pos = texture2DRect(position, gl_TexCoordIn[0][0].st);
    if (pointInFrustum(pos.xyz))
    {
        gl_Position = pos;
        EmitVertex();
        EndPrimitive();
    }
}

Listing 1.2. Code for radar view frustum culling in GLSL.

1.3.1 Assigning Level of Detail

After determining which positions are inside the view frustum, the next step is to assign a LOD_id according to a given metric. In this case we use discrete LOD, which consists of creating the different LODs for each character as a preprocess; at runtime, the appropriate LOD of each character is rendered using its LOD_id. (It has been shown in [Millán et al. 06] that 2D representations such as impostors make it possible to render tens of thousands of similar animated characters, but 2D approaches need manual tuning and generate a huge amount of data if several animation sequences are present and/or geometric variety is considered.)

Metrics for assigning values to LOD_id can be based on distance to the camera, model size in screen space, eccentricity of the model with respect to the camera, or perceptual factors, among others. For performance and simplicity, we use the distance to the camera as our metric; we also rely on visual perception to reduce the popping effect.

The idea behind the distance-to-camera metric is to select (or, in our case, assign) the appropriate LOD based on the distance between the model and the viewpoint, i.e., coarser resolution for distant geometry. Nevertheless, instead of computing the Euclidean distance between the object and the viewpoint, we define the appropriate LOD as a function of the view range and the far plane, values we already have from the camera's referential. The camera's view range lies along the unit vector Z and is limited by the distance between the camera center c and the Far_plane value. Thus, we test the projection of v onto Z, proj_Z v, against fixed intervals of the view range to assign a value to LOD_id.

A common approach is to assign values to LOD_id using if statements, as shown in listing 1.3.

if projzv <= range0 then
    LODid = 0
else if projzv > range0 && projzv <= range1 then
    LODid = 1
else if projzv > range1 && projzv <= range2 then
    LODid = 2
...

Listing 1.3. Assigning LOD_id using if statements.

Nevertheless, we can reduce GPU branching by eliminating the if statements and using a sum of unit step functions instead (equation 1.7):

    LOD_id = Σ_{i=0}^{n−1} U(proj_Z v − Far_plane · τ_i)    (1.7)

where n is the number of LOD meshes per character, τ_i ∈ (0, 1] is a threshold that divides the view range isotropically or anisotropically and is calibrated visually to reduce popping effects, and U is the unit step function given by:

    U(t − t_0) = 1 if t ≥ t_0,
                 0 if t < t_0.

Notice that if n = 3 (three LOD meshes per character), then LOD_id can take three values: 0 when the character is near the camera (full detail), 1 when the character is at a medium distance from the camera (medium detail) and 2 when the character is far from the camera (low detail). For example, with Far_plane = 100 and thresholds τ = (0.25, 0.5, 1.0), a character at proj_Z v = 40 yields LOD_id = U(40 − 25) + U(40 − 50) + U(40 − 100) = 1, i.e., medium detail. Listing 1.4 shows the changes made to listing 1.2 to add the LOD_id calculation.

[geometry shader]
...
bool pointInFrustum(vec3 point, out float lod)
{
    ...
    // projection of v onto the Z unit vector
    float pcz = dot(v, Z);
    ...
    // for 3 LOD meshes:
    lod = step(farPlane * tao0, pcz)
        + step(farPlane * tao1, pcz)
        + step(farPlane * tao2, pcz);

    return INSIDE;
}

void main(void)
{
    float lod;
    vec4 pos = texture2DRect(position, gl_TexCoordIn[0][0].st);
    if (pointInFrustum(pos.xyz, lod))
    {
        gl_Position   = pos;
        gl_Position.w = lod;
        EmitVertex();
        EndPrimitive();
    }
}

Listing 1.4. Assigning LOD_id using step functions.

1.4 Level of Detail Sorting

The result of the VFC and LOD assignation stage is a vertex buffer object (VBO) filled with the positions of all the characters inside the camera's view frustum, each position carrying the LOD_id assigned according to its distance to the camera (figure 1.4.a). On the other hand, hardware instancing requires a single LOD mesh to draw several instances of that mesh, so we need to organize these positions according to their LOD_id in different VBOs.

Figure 1.4. a. Output VBO from the VFC and LOD assignation stage; b. Output of the LOD sorting stage.

Following [Park and Han 09], we sort the output VBO of the VFC and LOD assignation stage into separate VBOs (which we will call VBO_LOD) such that all the character positions in a given VBO have the same LOD_id (figure 1.4.b). Since we are using three LODs, we use three VBOs.

We use transform feedback to populate each VBO_LOD, performing three transform feedback passes in total. In addition, transform feedback allows us to query how many primitives were written into each VBO_LOD; the number of primitives written is used later when calling the draw-instanced routine. For each transform feedback pass, a geometry shader emits only the vertices with the matching LOD_id, as shown in listing 1.5. Notice that the uniform variable lod is updated on each pass: in our case it is set to 0 for the full-resolution mesh, 1 for the medium-resolution mesh and 2 for the low-resolution mesh.

[geometry shader]

uniform float lod;   // this variable is updated each pass

void main(void)
{
    vec4 pos = gl_PositionIn[0];
    if (lod == pos.w)
    {
        gl_Position = pos;
        EmitVertex();
        EndPrimitive();
    }
}

Listing 1.5. Geometry shader used to populate each VBO_LOD.

Figure 1.5 shows the output of this stage and of the VFC and LOD assignation stage. The characters' positions are rendered as points, with a specific color assigned to each VBO_LOD: red for VBO_LOD0, green for VBO_LOD1 and blue for VBO_LOD2.

Figure 1.5. Output of the level of detail sorting stage, 4096 characters rendered as points. LOD-0 is shown in red, LOD-1 in green and LOD-2 in blue. Left: main camera view. Right: auxiliary camera view; notice that only positions inside the view frustum are visible.
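On the host side, each sorting pass binds one VBO_LOD as the transform feedback target, draws the mixed VBO as points through the geometry shader of listing 1.5, and records how many primitives were written. The following is only a sketch of one way to drive the three passes (GL 3.x-style calls; the variable names are ours, attribute setup is elided, and sortProgram is assumed to have been linked with glTransformFeedbackVaryings so the emitted position is captured):

GLuint vboLod[3];      /* one VBO per level of detail                */
GLuint query[3];       /* primitives-written query per pass          */
GLuint primCount[3];   /* instance counts consumed by the draw stage */

void sort_by_lod(GLuint sortProgram, GLuint mixedVbo, GLsizei inCount)
{
    glEnable(GL_RASTERIZER_DISCARD);   /* no fragments are needed here */
    glUseProgram(sortProgram);

    for (int lod = 0; lod < 3; ++lod)
    {
        glUniform1f(glGetUniformLocation(sortProgram, "lod"), (float)lod);

        /* capture the filtered points into this LOD's buffer */
        glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, vboLod[lod]);

        glBeginQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, query[lod]);
        glBeginTransformFeedback(GL_POINTS);

        glBindBuffer(GL_ARRAY_BUFFER, mixedVbo);
        /* ... set up vertex attributes for the mixed buffer ... */
        glDrawArrays(GL_POINTS, 0, inCount);

        glEndTransformFeedback();
        glEndQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN);
    }

    glDisable(GL_RASTERIZER_DISCARD);

    for (int lod = 0; lod < 3; ++lod)
        glGetQueryObjectuiv(query[lod], GL_QUERY_RESULT, &primCount[lod]);
}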

1.5 Animation and Draw Instanced

As a preprocess, we load all of the character meshes and textures that make up the crowd. We also define several character groups according to their geometric appearance. Animation is performed with a technique that reuses one character's rig for the other characters in its group. In this method, the animation sequence is stored in a texture array where each layer stores one animation frame. Animation frames specify the rotation angles of the character's joints; these angles are used to pose a character, and by interpolating between them we perform character animation entirely on the GPU. Approaches such as AniTextures [Bahnassi 06] and Skinned Instancing [Dudash 07] can be used as alternatives.

At runtime, for each character group, we first render the high-resolution instances, using the positions stored in VBO_LOD0 to world-transform each instance; then we render the medium-resolution instances using VBO_LOD1, and finally the low-resolution instances using VBO_LOD2. In each call, we use the OpenGL function glDrawElementsInstanced. This function generates a unique per-instance value, gl_InstanceID, which is accessible in the vertex shader and is used as an index to fetch the instance's position, animation frame and visual characteristics.
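A sketch of the per-group draw loop, consuming the primCount values recorded during sorting (names carried over from the previous sketch; Mesh is a hypothetical struct, and binding the per-LOD position buffer where the vertex shader can index it by gl_InstanceID is elided):

typedef struct { GLsizei indexCount; /* plus VAO and buffer handles */ } Mesh;

extern GLuint primCount[3];          /* filled by sort_by_lod() above */

void animate_and_draw_instanced(const Mesh lodMesh[3])
{
    for (int lod = 0; lod < 3; ++lod)
    {
        if (primCount[lod] == 0)
            continue;                /* nothing visible at this LOD */

        /* ... bind lodMesh[lod]'s vertex/index buffers and expose
           vboLod[lod] to the vertex shader (e.g., as a texture) ... */

        glDrawElementsInstanced(GL_TRIANGLES,
                                lodMesh[lod].indexCount,
                                GL_UNSIGNED_INT,
                                0,                 /* index buffer offset          */
                                primCount[lod]);   /* instances written by sorting */
    }
}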

1.6 Results

We designed two tests to verify the performance of our pipeline. Both were performed under Windows Vista on an Nvidia 9800GX2 card with SLI disabled, using a viewport of 900x900 pixels. The goal of the first test is to determine the execution times of the Behavior, VFC & LOD assignation and LOD sorting stages. (We do not include the Animation & Draw Instanced stage here, since its timing results are larger by several orders of magnitude.) The goal of the second test is to determine the execution time of the complete pipeline.

The first test consisted of increasing the number of characters from 1K to 1M, each character with three LODs. Timing information was obtained using timer queries (GL_EXT_timer_query), which provide a mechanism to determine the amount of time (in nanoseconds) it takes to fully complete a set of OpenGL commands, without stalling the rendering pipeline. The results of this test are shown in figure 1.6 (timing values are in milliseconds); in addition, figure 1.5 shows a rendering snapshot for 4096 characters rendered as points. Notice that the elapsed times of the VFC & LOD assignation and LOD sorting stages remain almost constant. Also, since transform feedback does not require any of the subsequent pipeline stages, rasterization is disabled during these passes.

Figure 1.6. Test 1 results. Notice that timing results are in milliseconds.

The second test consists of rendering a crowd of different characters, each with three LODs: the LOD-0 mesh is made of 2500 vertices, the LOD-1 mesh of 1000 and the LOD-2 mesh of 300. The goal of this test is to determine the execution time of all the stages of our pipeline under two different camera perspectives. In perspective A (figure 1.7) almost all characters are visible, while in perspective B (figure 1.8) almost all characters are culled. The results are shown in table 1.1 for perspective A and in table 1.2 for perspective B. The first column of each table shows the number of rendered characters and the second the measured time in milliseconds; columns three to five show how many characters of each level of detail are rendered, column six shows the total number of vertices (in millions) transformed by our animation shader, and the last two columns show the percentages of characters that are visible or culled.
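For reference, the timing pattern behind these measurements: GL_EXT_timer_query extends occlusion-style queries with a GL_TIME_ELAPSED_EXT target and 64-bit results. A sketch of how one stage might be timed (our own wrapper, not code from the chapter; timerQuery is assumed to have been created with glGenQueries):

#include <stdio.h>

GLuint timerQuery;

void measure_stage(void (*stage)(void), const char *name)
{
    GLuint64EXT elapsedNs = 0;

    glBeginQuery(GL_TIME_ELAPSED_EXT, timerQuery);
    stage();                                 /* the GL commands being measured */
    glEndQuery(GL_TIME_ELAPSED_EXT);

    /* blocks until this query's result is available, without a full glFinish */
    glGetQueryObjectui64vEXT(timerQuery, GL_QUERY_RESULT, &elapsedNs);

    printf("%s: %.3f ms\n", name, elapsedNs / 1.0e6);
}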

Figure 1.7. Perspective A (8192 characters). Most of the characters are visible.

Figure 1.8. Perspective B (8192 characters). Most of the characters are culled.

Table 1.1. Results obtained in perspective A.

Agents   t (ms)   LOD0   LOD1   LOD2   Vertices (x10^6)   Visible (%)   Culled (%)
 1024     47.62    274    644     54        1.35               95             5
 2048     62.11    365    688    397        1.70               71            29
 4096     95.60    450   1388    943        2.80               68            32
 8192    159.24    483   1438   5085        4.17               86            14
12288    194.93    400   1905   7694        5.21               81            19
16384    261.78    476   2477  10388        6.78               81            19

Table 1.2. Results obtained in perspective B.

Agents   t (ms)   LOD0   LOD1   LOD2   Vertices (x10^6)   Visible (%)   Culled (%)
 1024     19.49    161      0      0        0.40               16            84
 2048     22.91    204      0      0        0.51               10            90
 4096     49.33    412    314      0        1.34               18            82
 8192     57.77    503    317      0        1.60               10            90
12288     72.99    471    995      0        2.20               12            88
16384     97.18    541   1546      0        2.90               13            87

1.7 Conclusions and Future Work

We have shown that performing optimization techniques such as view frustum culling and level of detail selection on the GPU incurs a very small time penalty. In our practical case, the stage that takes the most time to execute is Animation & Draw Instanced, which is expected. Moreover, the extra memory requirements do not exceed the amount needed to store a 32-bit floating-point texture of 512x512 pixels: for sixteen thousand characters we needed to allocate four floating-point vertex buffer objects of 128x128 elements, one auxiliary vertex buffer object to store the partial results obtained from the VFC & LOD assignation stage, and three vertex buffer objects to store the positions for each level of detail.

However, performance can be improved and memory requirements reduced by using the OpenGL 4.0 feature called multiple transform feedback, exposed through the ARB_transform_feedback3, ARB_gpu_shader5 and NV_gpu_program5 extensions, which allows a geometry shader to direct each vertex arbitrarily to a specified vertex stream. With it, only one transform feedback call would be required for LOD_id sorting, and by combining the VFC & LOD assignation and LOD sorting stages we could dispense with the auxiliary vertex buffer object used to store the partial results of the VFC & LOD assignation stage.

This pipeline can also be extended with an occlusion culling stage, which would benefit complex scenes such as those requiring crowds, or landscapes depicted with indigenous vegetation, human elements, buildings and structures. One approach is to perform occlusion culling via OpenGL occlusion queries.

Nevertheless, the number of queries needed might not be enough for complex scenes made of hundreds of thousands of elements; in addition, occlusion queries require synchronizing the CPU and the GPU, which might stall the pipeline. Another approach is to place extra cameras at the positions of big structures, perform radar view frustum culling (using the normalized version of the view frustum) and then take the complement of the visible set. This stage could be performed after the behavior stage and before the VFC & LOD assignation stage.

1.8 Acknowledgements

We wish to thank Nvidia for its kind donation of the GPU used in the experiments.

Bibliography

[Bahnassi 06] Wessam Bahnassi. AniTextures. In ShaderX4: Advanced Rendering Techniques. Charles River Media, 2006.

[Dudash 07] B. Dudash. Skinned Instancing. NVIDIA technical report, 2007.

[Erra et al. 09] Ugo Erra, Bernardino Frola, and Vittorio Scarano. A GPU-based Method for Massive Simulation of Distributed Behavioral Models with CUDA. In 22nd Annual Conference on Computer Animation and Social Agents (CASA 2009), 2009.

[Millán et al. 06] Erik Millán, Benjamín Hernández, and Isaac Rudomín. Large Crowds of Autonomous Animated Characters Using Fragment Shaders and Level of Detail. In ShaderX5: Advanced Rendering Techniques. Charles River Media, 2006.

[Park and Han 09] Hunki Park and Junghyun Han. Fast Rendering of Large Crowds Using GPU. In ICEC '08: Proceedings of the 7th International Conference on Entertainment Computing, pp. 197-202. Berlin, Heidelberg: Springer-Verlag, 2009.

[Puig Placeres 05] Frank Puig Placeres. Improved Frustum Culling. In Game Programming Gems 5. Charles River Media, 2005.

[Rudomín et al. 05] Isaac Rudomín, Erik Millán, and Benjamín Hernández. Fragment Shaders for Agent Animation Using Finite State Machines. Simulation Modelling Practice and Theory 13:8 (2005), 741-751.

[Silva et al. 09] Alessandro Ribeiro Da Silva, Wallace Santos Lages, and Luiz Chaimowicz. Boids That See: Using Self-occlusion for Simulating Large Groups on GPUs. Computers in Entertainment 7:4 (2009), 1-20.