Genetic Selection of Parametric Scenes Rendering


Bruce Merry (bmerry@cs.uct.ac.za)
Supervised by Dr. James Gain (jgain@cs.uct.ac.za)
Department of Computer Science, University of Cape Town
October 1, 2003

Abstract

We describe the rendering component of a system for creating virtual landscapes. The design of the system requires that four scenes be displayed simultaneously in real time, yet preprocessing must be kept to a few seconds. We use level-of-detail techniques to speed up rendering, and show how the traditionally off-line preprocessing can be made fast enough for our application. We also show how graphics hardware can be used to eliminate the popping effect often associated with level-of-detail schemes.

Contents

1 Introduction
2 Background
  2.1 Level of detail management
    2.1.1 Mesh simplification
    2.1.2 Continuous level of detail
    2.1.3 View-dependent simplification
    2.1.4 Terrain simplification
    2.1.5 Replacing geometry with texture
    2.1.6 Billboarding
    2.1.7 Instancing
  2.2 Triangle strips
  2.3 Shadows
    2.3.1 Preprocessing
    2.3.2 Projection shadows
    2.3.3 Stencilled shadow volumes
    2.3.4 Shadow maps
  2.4 Programmable hardware vertex processing
3 Terrain
  Representation
  Level of detail
  Optimisation
  ROAM-based algorithm
  Vertex program algorithm
  Advantages of ROAM over the vertex program algorithm
  Advantages of the vertex program algorithm over ROAM
  Our implementation
  Texturing and lighting
4 Trees
  Textures and normals
  Instancing
  Level of detail
  Creation
  A simple control scheme
  Our control scheme
  Rendering
  Triangle strips
  Lighting
5 Sky
  Clouds
  Far clip plane
6 Exploration of the environment
  Simultaneous exploration
  Collision detection
7 Testing
8 Results
  Preprocessing
  Terrain level-of-detail
  Level of detail for trees
  System tests
  User testing
9 Conclusions and future work
10 Acknowledgements
A Constraining placement in edge collapses
B Questionnaire on rendering

List of Figures

1 System design
2 The edge collapse operation for a progressive mesh
3 Vertex hierarchy in a progressive mesh
4 Triangle split process used by ROAM and Lindstrom algorithms
5 Typical triangulation produced by ROAM and Lindstrom algorithms
6 A bounding wedgie
7 Interpolations used by Vlietinck's algorithm
8 Projection shadow
9 Stencilled shadow
10 A programmable vertex pipeline
11 Preventing T-junctions
12 Screenshots of the ROAM-based terrain algorithm
13 Screenshots of the vertex program terrain algorithm
14 Producing a light map
15 Combining normals
16 Affine transforms of tree instances
17 The problem with free-form deformations
18 A sequence of distinguished meshes
19 Recomputations for memoryless metrics
20 A sky cube
21 Frame rate comparison of terrain LOD algorithms
22 Triangle counts for the terrain LOD algorithms
23 Quality of the terrain LOD algorithms
24 Popping in the terrain LOD algorithms
25 Frame rates for the tree LOD algorithms
26 Quality for the tree LOD algorithms
27 Popping in the tree LOD algorithms
28 System tests run on a number of scenes
29 Screenshots of test scenes
30 The gap between the base of a tree and its shadow, and the workaround

List of Tables

1 Breakdown of the preprocessing time for each tree mesh
2 Breakdown of the preprocessing time for terrain
3 User ratings of sudden and smooth changes, relative to sky

Glossary and acronyms

Alpha channel: A complement to the red, green and blue channels that is generally used to represent transparency. An alpha value of 0 corresponds to total transparency, a value of 1 to total opaqueness, and intermediate values to translucency.
Alpha test: An optional test in the rendering pipeline that culls new pixels whose alpha value fails some test. This is often used to cull transparent pixels.
Bump map: A special kind of normal map that encodes perturbations to a reference normal. Bump maps are generally used to create realistic lighting.
Cube map: A texture defined over the faces of a cube; generally used to define functions over the space of directions.
Geomorph: A geometric interpolation between two representations of an object.
Gouraud shading: A shading method in which colour values are computed at vertices and linearly interpolated across faces.
GPU: Graphics Processing Unit; the graphics hardware.
Heightfield: A sampling of the heights of a terrain on a regular grid.
Light map: A texture that encodes lighting information.
LOD: Level Of Detail; one of several representations of an object with differing complexity and quality.
Normal map: A texture that encodes normal vectors to a surface.
Phong shading: A shading method in which normals are linearly interpolated across faces, and lighting is computed at each pixel.
PM: Progressive Mesh; a sequence of meshes, each obtained from the previous by an edge collapse.
Shadow map: A texture that is the depth buffer of a scene as rendered from the point of view of a light source. This is a discrete representation of a shadow volume.
Shadow volume: The region of space which is cast into shadow by a particular light.
Stencil buffer: A buffer of the same size as the frame buffer that contains integer values.
Stencil test: An optional test in the rendering pipeline that applies a mask (or stencil) to limit the pixels that will be updated. The shape of the mask is determined by the stencil buffer.
Vertex program: A low-level program that the GPU runs on each model vertex to produce a fully transformed and lit output vertex.

1 Introduction

Computers have long aided graphics professionals in producing visually pleasing images. Artists and modellers use them to create compelling objects, from buildings to popular animated characters. But regardless of the advantages of using a computer, this sort of modelling is still manual labour. Generating and animating large numbers of objects such as flocks of birds, forests and armies is far too laborious. Even more difficult is creating physical phenomena such as clouds or firework displays.

The alternative to manually creating models is to have software automatically generate them, a technique known as procedural generation. For example, trees can be generated by simulating their growth patterns, and ground can be created by using noise functions for the height. The output is determined by a relatively small number of parameters which influence the generation, such as branching factors for trees or roughness for terrain. Procedural generation of Virtual Reality scenes has been reasonably successful in producing highly detailed 3-D environments [Deussen et al. 1998]. However, such parametric scenes may have a vast number of parameters, such as branching factors, density, colours and so on. Selecting and setting individual parameters is not always feasible.

Our system uses genetic algorithms to assist in selecting parameters for procedural generation. A user is presented with four scenes, and is able to walk through them in real time. The user evaluates the scenes in several areas (trees, terrain and sky) and indicates the best in each category. This feedback is used by a genetic algorithm to create four new scenes. The process is repeated until the user is satisfied with the scenes that are generated. The software is designed to run on a standard desktop PC with a consumer graphics card. Our tests used a 2.0GHz Pentium 4 and an NVIDIA GeForce4 Ti4200, but the system is still quite usable on lower-end systems.
There are three components to the system:

1. Scene generation
2. Rendering
3. Artificial Intelligence

They are related as indicated in figure 1. The scene generator defines a parameter space. Each parameter is described by the number of bits and the type of object controlled by the parameter (trees, terrain or sky). The scene generator uses other fields internally (such as the range to which to map the quantised parameter). The AI generates bitfields encoding the parameters, which are passed to the scene generator to decode. From the decoded parameters, the scene generator creates a scene and passes it to the renderer. The renderer displays the image on the screen and accepts keyboard and mouse events from the user interface to move the viewpoint. Finally, buttons in the user interface indicate user preferences to the AI component, which uses them to improve the parameters.

Unlike many other systems that use procedural generation, the generation is done on-line. This leads to constraints on the speed of the generation. We aim to produce each scene within a few seconds, which includes time for any rendering preprocessing. Despite the limit on preprocessing time, rendering must still be efficient to allow for an interactive walkthrough; we aim for a minimum of 15 frames per second.

[Figure 1: System design. The boxes represent components of the system while arrows represent information flows.]

I was responsible for implementing the rendering engine. This requires efficient rendering of complex outdoor scenes at interactive frame rates. In addition I implemented the user interface, with the exception of the components used to provide feedback to the artificial intelligence. The rendering engine needs to produce visually compelling output at an interactive frame rate (at least 15 frames per second).
In addition, the user is idle during any preprocessing; as the speed of user feedback is the limiting factor

in the genetic algorithm, the preprocessing must be kept to a minimum (about a second). The target platform for the renderer was a Pentium 4 processor with a GeForce 4 Ti graphics card [2]. Applying the features of the graphics hardware (particularly the programmable vertex pipeline) to speed up rendering formed part of the research of the project.

Section 2 provides background information on the techniques used for efficient rendering. Sections 3 through 5 describe how the terrain, trees and sky were implemented. Section 6 covers the walk-through engine. Finally, sections 7 and 8 deal with testing the system.

[2] The project proposal called for a GeForce 3; however, more money became available for the hardware after the proposal was submitted.

2 Background

Even with modern graphics hardware, procedurally generated scenes can contain far too much information to be rendered naïvely. In an outdoor scene, very little is hidden, making high-speed rendering especially challenging. However, outdoor scenes have several advantages:

1. The most important advantage is that a lot of what is visible is too far away to be seen in detail. This allows for techniques that use lower-quality representations of an object when it is far away. There are several approaches to this; Garland [1999] provides a survey of a number of these techniques.

2. In an indoor scene, a ray from the eye through a pixel will intersect the scene many times. This is termed a high depth complexity. Each intersection corresponds to a pixel that must be processed, unless high-level visibility culling is done. In contrast, such a ray in an outdoor scene is likely to strike only a few trees and the ground [3]. This reduces the need for sophisticated culling algorithms. In addition, the load on the pixel pipeline is small and hence some operations like lighting can be moved onto the pixel pipeline for improved accuracy, particularly in conjunction with level-of-detail techniques.

3. The lighting model can be simple. There is only a single light source (the sun), and for practical purposes it can be treated as being infinitely far away. In addition, there are almost no shiny surfaces outdoors (with the exception of water and wet surfaces), and so specular highlights are seldom required. This allows the lighting calculation to be extremely fast. It also allows precomputation of some lighting, since diffuse and ambient light are view-independent. There are, however, more advanced techniques that simulate the transport of light through the atmosphere, such as [Preetham et al. 1999].

These advantages and disadvantages guided the choice of techniques that were considered for this project.
2.1 Level of detail management

Level of detail refers to the quality with which a particular object is represented. Quality refers to the number of primitives (in our case, triangles) that are devoted to representing an object. Using more primitives will provide a more accurate representation of an object, while using fewer primitives will lead to artefacts. An object may have several representations with different levels of detail, and the most appropriate one is rendered. High detail representations provide accuracy for close-up viewing, while low detail representations save on rendering time for distant objects.

2.1.1 Mesh simplification

A highly-detailed triangle mesh can be replaced by a coarser triangle mesh with roughly the same shape. Hoppe [1996] proposes progressive meshes (PMs) as a simple way of performing such a simplification.

[Figure 2: The edge collapse operation for a progressive mesh]

PMs are based on the operation of collapsing edges (see figure 2). Repeated edge collapses produce a sequence of progressively simpler meshes. Progressive meshes are also designed to accommodate discrete per-face attributes (such as material parameters) as well as piecewise-continuous attributes (such as normals). The choice of which edges to collapse, and in what order, can be made in many ways, trading off quality against processing time. Hoppe [1996] describes a scheme that aims to minimise distortion of geometry and attributes, while preserving the topology of sharp edges. Sharp edges are edges across which there is a discontinuity (such as a crease, where there is a discontinuity in the normal field). This scheme is intended for preprocessing and is quite expensive, typically taking many minutes [Hoppe 1996] on a 150MHz Indigo2. As such it is unsuitable for this project. Garland and Heckbert [1997] provide a simpler scheme using quadric functions. Associated with each vertex V is a quadric function Q_V.
The value of Q_V(p) (where p is a position in space) estimates the sum of the squares of the distances of p from each face incident on V. If an edge joining V1 and V2 is collapsed to a new vertex at position p, then the cost associated with this collapse is Q_V1(p) + Q_V2(p). Finding an optimal placement for p given V1 and V2 requires solving a linear system.

[3] This will not be true in a dense forest with many thousands of trees, but that is not our goal.
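The quadric cost of a collapse can be sketched directly from this definition. The sketch below stores each vertex's quadric as the list of its incident face planes and sums squared point-plane distances; a real implementation, including the one in Garland and Heckbert [1997], folds this sum into a single 4x4 matrix per vertex so that evaluation is O(1). Function names here are illustrative, not taken from any published code.

```python
# Sketch of the quadric error metric of Garland and Heckbert [1997].
# A vertex's quadric is built from the planes of its incident faces; the cost
# of collapsing an edge (V1, V2) to a point p is Q_V1(p) + Q_V2(p).

def plane(p0, p1, p2):
    """Unit-normal plane (a, b, c, d) through three points, with a*x+b*y+c*z+d = 0."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    nx, ny, nz = nx / length, ny / length, nz / length
    d = -(nx * p0[0] + ny * p0[1] + nz * p0[2])
    return (nx, ny, nz, d)

def quadric_error(planes, p):
    """Q(p): sum of squared distances from p to each plane."""
    return sum((a * p[0] + b * p[1] + c * p[2] + d) ** 2 for (a, b, c, d) in planes)

def collapse_cost(planes_v1, planes_v2, p):
    """Cost of collapsing the edge (V1, V2) to position p."""
    return quadric_error(planes_v1, p) + quadric_error(planes_v2, p)
```

Note that collapsing within a locally flat region costs nothing, which is exactly why flat areas simplify first; choosing the p that minimises this cost is the linear system mentioned above.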

This quadric scheme does not take into account attributes such as texture. Garland and Heckbert [1998] propose an extension of the scheme that treats position and attributes as part of an n-element vector, and applies the same technique in n dimensions. Hoppe [1999] further improves on this with a modified scheme that involves a sparse quadric function, and which scales linearly (rather than quadratically) with the number of attributes. This scheme also takes into account sharp edges.

There are many other ways to assign costs to edge collapses. Fei and Wu [1999] use a very simple scheme involving only edge lengths and local curvature. Lindstrom and Turk [1998] use a quadric-based approach that penalises changes in volume, changes in surface area near a boundary, and long edges. Garland and Shaffer [2002] attack the problem of simplifying enormous models using a two-phase quadric-based approach, with the first stage operating out-of-core. Lindstrom and Turk [2000] compare rendered images of the original and simplified mesh, which automatically considers preservation of colour and texture. Zelinka and Garland [2002] propose permission grids, a mechanism to enhance any existing algorithm with a guaranteed error bound. For more detail, refer to Garland's survey of the field [Garland 1999].

2.1.2 Continuous level of detail

One of the problems of simple level of detail schemes is that popping occurs when moving from one level to another [Southern and Gain 2003]. PMs are well suited to morphing between levels of detail [Hoppe 1996]. Southern and Gain [2003] describe a variation on the standard progressive mesh, called the g-mesh, that is suitable for hardware-accelerated morphing between levels of detail. Unlike a standard PM algorithm, the output is a small number of representations at different levels of detail. The algorithm is based on producing a lower detail mesh from a higher detail one.
The final output is then the result of iteratively applying this simplification process, starting with the original mesh. The simplification consists of identifying a maximal set of edge collapses that do not interfere with each other (i.e. have disjoint neighbourhoods), and applying these collapses. Since the number of legal collapses is proportional to the complexity of the mesh, the mesh complexity will decay exponentially with the number of iterations. Conversely, the number of output meshes will be logarithmic with respect to the initial mesh complexity. In fact Southern and Gain [2003] show that each mesh will have approximately half the number of vertices of the previous one. Continuous level of detail is provided by morphing between an output mesh M and the next output mesh M'. Since M' is the result of applying independent edge collapses to M, it is simple to identify a single vertex of M' with every vertex of M. The hardware can then be programmed to interpolate vertex coordinates between these representations.

2.1.3 View-dependent simplification

The method used by Southern and Gain [2003] renders an entire model at the same level of detail. It is possible to get better results by refining parts of a model (such as those closer to the viewer) more than others (especially back-facing areas). This is particularly necessary for terrain meshes, as there will always be some parts close to the viewer and other parts much further away. Xia and Varshney [1996] and Hoppe [1997] both propose extensions to progressive meshes that handle view-dependent simplification. Both algorithms are based on identifying a vertex hierarchy within a progressive mesh. If the edge-collapse process is reversed it becomes a vertex-split process, where a vertex is replaced by a pair of vertices (figure 2 read from right to left). This process defines a binary forest on the vertices. Any particular representation corresponds to a horizontal cut through the tree (see figure 3).
The algorithms differ in how they choose a cut that obeys additional constraints needed to ensure that the neighbourhood of a chosen vertex is also at the correct level of detail.

[Figure 3: Vertex hierarchy in a progressive mesh. The black dots indicate a horizontal cut of vertices that could form a view-dependent refinement.]

View-dependent algorithms are less amenable to hardware morphing, because the morphing factor must vary over the model and hence needs to be calculated per vertex and per frame. NVIDIA recommends that one does not use view-dependent progressive meshes with their hardware, as the added CPU cost outweighs the gain on the GPU [Dietrich 2000].
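For contrast with the view-dependent case, the uniform hardware morphing of the continuous-LOD scheme amounts to a single linear interpolation per vertex, with one morph factor for the whole model. A minimal sketch in Python rather than a vertex program; `parent` is a hypothetical mapping from each fine vertex to the coarse position it collapses to, not a structure from the g-mesh paper:

```python
# Sketch of geomorph interpolation for continuous LOD: every vertex of the
# finer mesh M is identified with a vertex of the coarser mesh M', and each
# position is blended towards its coarse parent by a morph factor t.

def geomorph(fine_positions, parent, t):
    """Blend each fine vertex towards its coarse parent; t=0 is fine, t=1 is coarse."""
    out = []
    for i, p in enumerate(fine_positions):
        q = parent[i]  # position this vertex collapses to in M'
        out.append(tuple((1.0 - t) * pc + t * qc for pc, qc in zip(p, q)))
    return out
```

On the GPU the same blend is one multiply-add per coordinate, which is why uniform morphing is cheap while a per-vertex, per-frame morph factor is not.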

[Figure 4: Triangle split process used by ROAM and Lindstrom algorithms]

[Figure 5: Typical triangulation produced by ROAM and Lindstrom algorithms]

2.1.4 Terrain simplification

Terrain is often represented as a regular field of heights. This constrained representation allows for special-case simplification algorithms that perform better than a more general algorithm. However, since terrain usually has a large extent, a view-dependent algorithm is essential. Several existing algorithms were considered for the project:

1. An algorithm published by Lindstrom et al. [1996], based on recursive subdivision of right triangles (see figures 4 and 5). Vertices are either activated or deactivated, depending on a quality metric. The metric bounds the screen-space projection of the line joining the actual position of a vertex to the interpolated position it would have if omitted from the mesh. This metric does not provide a global error bound [Duchaineau et al. 1997], but is nevertheless a good approximation. Activating a vertex involves splitting the triangle(s) of which this vertex is the midpoint of the base. If these triangles do not exist in the current triangulation, they must be forced to exist by splitting higher-level triangles. This forced splitting prevents cracks in the mesh due to T-junctions. This basic approach requires re-evaluating every vertex on every frame, which would be prohibitively slow. The algorithm decomposes the heightfield into large square blocks. A bounding interval for the error metric across each block is calculated, and based on this a block may be replaced with a lower or higher resolution version (by halving or doubling the sampling rate). Each block is then triangulated as before, taking care to prevent cracks where blocks join. In addition, blocks that lie completely outside of the view frustum can be culled.

2. Real-time Optimally Adapting Meshes (ROAM) [Duchaineau et al. 1997], an algorithm that aims to improve on the one above by exploiting frame-to-frame coherence. It uses the same subdivision strategy but uses a different error metric and a different algorithm to derive the optimal representation. The error metric is based on the screen-space projection of a "wedgie": a bounding triangular prism for a triangular portion of the heightfield (see figure 6). This metric has the advantage of being monotonic: the error metric for a parent triangle is at least as great as that of the sub-triangles.

[Figure 6: A wedgie used to bound the terrain in the ROAM algorithm.]
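The split operation shared by these two algorithms (figure 4) is simple enough to sketch: each right triangle (apex, left, right) is bisected at the midpoint of its base edge, yielding two smaller right triangles. This sketch refines uniformly and omits the forced splits that prevent T-junctions; it is an illustration of the subdivision, not either paper's implementation.

```python
# Sketch of the recursive right-triangle (bintree) split used by the
# Lindstrom and ROAM terrain schemes.  Triangles are (apex, left, right)
# tuples of 2D vertices; the base is the left-right edge.

def split(tri):
    apex, left, right = tri
    mid = tuple((l + r) / 2.0 for l, r in zip(left, right))  # base midpoint
    # The split vertex becomes the apex of each child triangle.
    return (mid, apex, left), (mid, right, apex)

def refine(tri, depth):
    """Uniformly refine to a given depth, returning the leaf triangles."""
    if depth == 0:
        return [tri]
    a, b = split(tri)
    return refine(a, depth - 1) + refine(b, depth - 1)
```

Adaptive versions split only where the error metric demands it, and force-split neighbours so that adjacent triangles never differ by more than one level across a shared edge.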

ROAM creates each triangulation from the previous one by a sequence of split and merge operations. Two priority queues (ordered by error metric) hold the potential splits and merges, and these are updated on the fly. In each frame, the priority queues are processed (by splitting or merging) until the desired error bound is obtained. ROAM has the additional advantage that other criteria can be used to halt processing, such as a fixed triangle count or a fixed frame rate [Duchaineau et al. 1997]. As described, this algorithm must re-evaluate the error metric for every potential split or merge on every frame. While this is a relatively small set compared to the total set of vertices, this update is still expensive. The implementation described by Duchaineau et al. [1997] uses a velocity bound on the viewpoint to defer re-evaluation until it is possible that a fresh value is necessary for correctness.

3. An algorithm [Vlietinck 2003] specifically designed for a hardware programmable vertex pipeline, as found in the GeForce 4 Ti. Every vertex is assigned a level of detail number based on its depth in screen space. Integer values correspond to uniform triangulations (such as those on the left and right in figure 7). Fractional values correspond to interpolations between the closest two integer values. Figure 7 shows how the morphing is done for the two cases.

[Figure 7: Interpolations used by Vlietinck's algorithm]

Conceptually one can consider all vertices to be rendered. Those vertices whose assigned level of detail number is lower than the level of detail at which that vertex first appears are collapsed into parent vertices. This would of course be very inefficient, so the algorithm recursively subdivides the heightfield into square blocks until each block has all level of detail numbers lying in the interval [i - 1, i + 1] for some integer i. Each block is then rendered with only the geometry for level of detail i.

2.1.5 Replacing geometry with texture

Highly detailed geometry can be simplified with little loss in visual quality if it is replaced with a discrete texture map. For example, a grassy plain may be rendered with very little geometry, but with a grass texture to simulate the appearance of grass. This has been standard practice for many years. More recent improvements include bump mapping [Angel 2000], which ensures correct lighting.

2.1.6 Billboarding

A distant object can often be replaced by a picture of that object, drawn onto a surface that faces the viewer. An alpha channel is used to distinguish between pixels that are part of the image and those that are transparent. OpenGL supports an alpha test mode in which pixels with an opacity below a threshold are culled. Alpha blending alone is insufficient because transparent pixels would nevertheless update the depth buffer, causing the supposedly transparent pixels to obscure objects behind them.

Billboarding is often used in games to represent trees: a single image of the tree is drawn on a billboard that rotates about the vertical axis to face the viewer. For objects that do not have rotational symmetry, billboards can also be updated as the viewer moves to prevent foreshortening. Dobashi et al. [2000] present a technique for drawing realistic clouds that employs billboards. Billboards can also be used for parts of an object: leaves on trees are commonly drawn with a quadrilateral and a leaf texture. The difference between using billboards and textures is that a billboard replaces an entire object. For example, a texture-mapped tree would still contain geometry describing the trunk and the individual branches (with a bark texture), while a billboarded tree is simply a 2D image.
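The axis-rotated billboard described above reduces to choosing a rotation about the vertical axis from the billboard's centre and the eye position. A minimal sketch, assuming the quad's normal points along +z when unrotated; the names and the convention are illustrative, not from this project's renderer:

```python
import math

# Sketch of axis-aligned billboarding: rotate a quad about the vertical (y)
# axis so that its +z normal points back towards the viewer.

def billboard_y_rotation(center, eye):
    """Rotation angle (radians) about y turning the quad's +z normal toward the eye."""
    dx = eye[0] - center[0]
    dz = eye[2] - center[2]
    return math.atan2(dx, dz)

def rotate_y(p, angle):
    """Rotate a point about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x + s * z, y, -s * x + c * z)
```

In practice the rotation is folded into each billboard's model matrix (or computed in a vertex program), and only the horizontal direction to the eye matters, so the tree never tilts.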

2.1.7 Instancing

The memory requirements of a complete forest are enormous if every tree is unique. However, these requirements can be lowered dramatically if a few template trees are created, and multiple instances are rendered. To prevent all instances of the same template from looking identical, a unique affine transform can be applied to each. Deussen et al. [1998] describe a system for generating and rendering plant ecosystems that employs instancing. They start with a wide range of plants, described as parameter vectors, and cluster together plants that are close together in the parameter space.

2.2 Triangle strips

A potentially limiting factor in rendering triangle meshes is the bandwidth required to transmit connectivity information. A triangle strip is a list of indices (v_1, v_2, ..., v_k) that represents the triangles {v_i, v_{i-1}, v_{i-2}}. This corresponds to a walk along the triangles of the mesh, alternately turning left and right. If long strips are used, then only a third as much data is required to encode the connectivity. In addition, caching the transformed and lit vertices from the previous triangle means that only one vertex needs to be processed per triangle. Generalised triangle strips allow following triangles to connect to either edge of the previous triangle. They are not directly supported by OpenGL but can be implemented by inserting degenerate triangles (with two vertices the same). Finding an optimal stripification of a mesh (i.e. one using the fewest strips) has been shown to be NP-complete [Estkowski et al. 2002]. Most implementations use heuristic approaches. For example, Evans et al. [1996] use a heuristic algorithm that

1. identifies patches (regions divided regularly into quads) and creates strips specifically for these regions; and
2. greedily creates strips, using one of several cost heuristics.

2.3 Shadows

There are several techniques for creating shadows. Each has some advantages and disadvantages.
In the overview below, the source is the object casting a shadow and the target is the object being shadowed.

2.3.1 Preprocessing

This refers to any technique that computes shadow information for a static scene in advance. This information could be stored as a texture or by splitting polygons into shadowed and non-shadowed regions. This approach is often used by games, which have the benefit of being able to perform preprocessing for hours. These techniques were not considered for the project, as only a second or two of preprocessing time is practical.

2.3.2 Projection shadows

This is the simplest technique, and involves projecting the source onto the plane of the target, from the light source (see figure 8). The projection is done with a 4x4 matrix, similar to the way objects are projected from world space onto the screen in normal rendering. The time complexity is the product of the complexities of the source and target, and so this technique is only practical if one or the other is extremely simple (such as a plane or a box).

[Figure 8: A sphere is projected onto a plane to form an elliptic shadow. The light is at L.]
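The 4x4 projection matrix mentioned above has a standard closed form: for a plane P (with P.x = 0 for points x on the plane) and a homogeneous light position L, the matrix (P.L)I - L P^T sends any point to its shadow on the plane along a ray from the light. The sketch below follows this classic OpenGL-era construction, not code from the project.

```python
# Sketch of the planar projection-shadow matrix.  `plane` is (a, b, c, d) with
# a*x + b*y + c*z + d = 0; `light` is a homogeneous position (w=1 for a point
# light, w=0 for a directional light).

def shadow_matrix(plane, light):
    """M = (P.L) I - L P^T, which flattens geometry onto the plane."""
    dot = sum(p * l for p, l in zip(plane, light))
    return [[dot * (1 if i == j else 0) - light[i] * plane[j]
             for j in range(4)] for i in range(4)]

def project(m, p):
    """Apply the 4x4 matrix to a homogeneous point and dehomogenise."""
    out = [sum(m[i][j] * p[j] for j in range(4)) for i in range(4)]
    w = out[3]
    return tuple(c / w for c in out[:3])
```

Rendering the source geometry a second time through this matrix, in a dark colour, produces the flattened shadow of figure 8; a small offset or stencil mask is usually needed to avoid z-fighting with the receiving plane.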

2.3.3 Stencilled shadow volumes

This technique produces excellent results, but can be expensive. Initially the depth buffer is populated with the scene geometry. Next the shadow volume (the volume of space in which objects will be in shadow) is rendered, updating the stencil buffer whenever the depth test succeeds (indicating that the shadow volume lies in front of the scene at this point). The shadow volume is actually rendered in two passes. In one pass the front-facing polygons are rendered, incrementing the stencil buffer. In the next pass the back-facing polygons are rendered, decrementing the stencil buffer. The resulting stencil value for a pixel is the number of times a ray shot from the eye to that pixel enters the shadow volume, less the number of times that ray leaves the volume. In the simplest case, the stencil value will be non-zero precisely for pixels in shadow. A lighting pass can then add light to pixels with zero stencil value, leaving pixels with non-zero stencil values in darkness. In practice there are several problems, such as the near clip plane slicing open the shadow volume. Everitt and Kilgard [2002] describe a number of modifications to produce a robust shadow volume algorithm. The algorithmic complexity is at worst linear. However, the multiple passes lead to a high constant factor.

[Figure 9: Stencilled shadow. A light L creates a shadow volume (SV) behind a source S. A target T passes through this volume, causing the portion inside to be shadowed. Three rays from a viewpoint V are shown. Ray (a) never intersects the shadow volume, (b) only enters it, and (c) both enters and leaves it. Only (b) is in shadow.]

2.3.4 Shadow maps

A shadow map applies the same idea as a shadow volume. However, the shadow volume is represented with a texture rather than geometry. For each of a finite number of rays leaving the light, the distance from the light to the closest part of the scene is computed and stored in a texture.
For each pixel of the target that is to be rendered, the distance from the light to this point on the target is computed. If the distance to the target is greater than the value found in the texture, then the target must be occluded by some part of the scene that is closer to the light and lies in the same direction. The texture map can be created by rendering the scene from the point of view of the light source. The depth buffer then contains the distances required (the depth values are not actually distances, but they have the same sorting property). Since this is a texture-based approach, it suffers from aliasing problems (which appear as a jagged shadow). However, it works well with both complex sources and complex targets, and it is easier to implement a robust shadow map algorithm than a robust stencilled shadow algorithm. In static scenes, it is also possible for the shadow computation to be done as a preprocess (a stencilled shadow volume can be preprocessed, but the creation of the stencil must be done at runtime, which potentially involves rendering a large amount of extra geometry per frame).

Shadow mapping works best with hardware support, which the GeForce 3 and later video cards provide. In the absence of special-purpose hardware support, the same effect can be achieved by using alpha test hardware. During the preprocessing, linear texture coordinate generation copies the depth values into a texture coordinate, which is passed through as an alpha value to the destination alpha plane (using a 1D ramp texture to implement the identity function if necessary). This alpha plane is then saved to a depth map texture, as the depth buffer normally would be. During rendering, linear texture coordinate generation transforms the rendered point to the light's coordinate system and produces an alpha value as before. A shader program computes the difference between the incoming

alpha value and the value stored in the texture. Finally, alpha test hardware culls those pixels for which the incoming alpha value exceeds the value in the texture (indicating that the pixel is further from the light than some occluder). The disadvantage of this approach is that alpha usually has much lower precision than depth (8-bit versus 24-bit is common). In addition, special-purpose shadow hardware will perform texture filtering after the shadow comparison rather than before it, allowing for soft shadows.

2.4 Programmable hardware vertex processing

Figure 10: A programmable vertex pipeline. Vertices either pass through the traditional fixed-pipe functions on the top, or are processed by the user-supplied vertex program on the bottom. The fully transformed and lit vertices are processed in the same way once they enter the triangle setup stage.

Conceptually, any graphics card has two main pipelines. The first is the vertex pipeline, which performs the following functions (among others):

1. Transform the position and normal using the model-view and projection matrices.
2. Compute lighting at vertices.
3. Compute texture coordinates at vertices, if this is not a simple assignment.
4. Group individual vertices into primitives (usually triangles).

The second is the pixel pipeline, which rasterises primitives, interpolates attributes across faces and performs other per-pixel operations such as texture mapping, depth testing and alpha blending. Starting with the NVIDIA GeForce 3, modern consumer graphics cards support a user-programmable vertex engine [Lindholm et al. 2001]. The standard components of the vertex pipeline (other than primitive assembly) are replaced with a general-purpose user program. This program contains assembly-level instructions to specify how each vertex should be processed.
Figure 10 shows the two forms of the vertex pipeline (the pixel pipeline is represented by a single block). This allows techniques that have previously been done in software to be moved into hardware. As mentioned before, one application is morphing between two representations of an object. Other uses include perturbing vertices (for example, to simulate grass waving in the wind) and performing custom lighting calculations (such as refraction). Moving this processing onto the graphics card is particularly fast for several reasons:

- The instruction set is designed for vertex processing. The registers are real 4-element vectors, and there are instructions for common operations such as dot products and reciprocal square roots.
- Vertex processing is highly parallelisable (since the processing of one vertex has no effect on the processing of another), so performance scales almost directly with the number of pipelines.
- Processed vertices are produced where they are needed, namely on the graphics card. Vertices that are processed in software must be sent across a relatively narrow bus.
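As a software analogue of such a vertex program, the per-vertex work (here, a morph between two representations followed by a matrix transform) can be sketched as below. The structure and names are illustrative only; a real vertex program would be written in the card's assembly-level instruction set.

```python
def process_vertex(pos_a, pos_b, alpha, matrix):
    """Per-vertex routine: blend two representations of a vertex
    (e.g. for geomorphing), then transform by a 4x4 row-major
    matrix.  Each vertex is processed independently of the others,
    which is what makes the hardware pipeline trivially parallel.
    """
    blended = [a + alpha * (b - a) for a, b in zip(pos_a, pos_b)] + [1.0]
    return [sum(row[i] * blended[i] for i in range(4)) for row in matrix]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Halfway morph between two positions, with no further transformation:
assert process_vertex((0.0, 0.0, 0.0), (2.0, 4.0, 6.0), 0.5, identity) == [1.0, 2.0, 3.0, 1.0]
```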

3 Terrain

3.1 Representation

The terrain was implemented as a uniformly sampled heightfield. This is a simple, easy-to-use representation. It does have some limitations:

1. It cannot represent vertical cliffs or overhangs.
2. It does not adapt the sampling frequency to the roughness or slope of the terrain.

Since we intended to produce forest scenes that one could walk through, rather than imposing mountains, the first limitation was not an issue. The second limitation is handled by sampling at the maximum required frequency, and then relying on a level of detail algorithm to reduce the sampling of the final rendering, as needed.

3.2 Level of detail

3.2.1 Optimisation

We hypothesise that recent advances in video card technology have created new problems for level-of-detail algorithms. Earlier work took the approach of investing a large amount of CPU time to reduce the number of triangles sent to the hardware. We aim to show that this CPU time is becoming the bottleneck in such rendering systems. We compare a classical algorithm (ROAM) to an algorithm designed for a modern GPU. The latter algorithm is far simpler and must render more triangles to achieve the same quality, but places less demand on the CPU. The algorithm of Lindstrom et al. [1996] was not tested due to time constraints.

3.2.2 ROAM-based algorithm

The first algorithm used to implement level of detail is an adaptation of ROAM [Duchaineau et al. 1997]. It improves on the efficiency of ROAM by discarding the ability to adapt to metrics such as polygon count. The original ROAM algorithm explicitly computes the cost of each candidate operation and uses priority queues to order them. Our algorithm instead computes whether each candidate exceeds the threshold cost, without actually computing the cost. This allows the computation to be optimised (for example, by squaring out square roots and multiplying out denominators).
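The kind of optimisation meant here can be sketched as follows. The function shape and names are hypothetical, not taken from the implementation; the point is that for non-negative quantities the comparison can be decided without evaluating the square root or the division.

```python
import math

def exceeds_threshold(num, den, threshold):
    """Decide whether num / sqrt(den) > threshold without evaluating
    the square root or the division.

    Assumes num >= 0, den > 0 and threshold >= 0, which holds for
    screen-space error bounds.  Squaring both sides and multiplying
    out the denominator turns the test into a single comparison.
    """
    return num * num > threshold * threshold * den

# The optimised test agrees with the naive one:
naive = lambda num, den, t: num / math.sqrt(den) > t
for num, den, t in [(3.0, 4.0, 1.0), (1.0, 9.0, 0.5), (2.0, 2.0, 1.4)]:
    assert exceeds_threshold(num, den, t) == naive(num, den, t)
```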
The algorithm was further modified to work top-down rather than computing the current frame incrementally from the previous one. While this approach ignores frame-to-frame coherence, it has several advantages:

- No queues need to be maintained, thus reducing the memory footprint and the complexity of the code.
- Only the split operation needs to be implemented. Splits simply propagate up the hierarchy, while merges require more complex management.

Although an implementation using queues was never fully written, some initial testing (implementing only the split queue) indicated that queues would provide no speed advantage. Since the bottleneck in a queued approach is the re-evaluation of the metric, deferment techniques [Duchaineau et al. 1997] may change the situation.

The first stage in rendering is determining which vertices and triangles to draw. The determination stage is recursive, and begins with four triangles making up the heightfield (joining the corners to the centre). If the current triangle lies outside the view frustum, it is kept but not simplified any further. Otherwise the bounding wedgie for the current triangle is projected into screen space. If at any point the projection of a vertical line segment through the wedgie exceeds a threshold, the triangle is split. We use the same conservative estimate as Duchaineau et al. [1997]. If a triangle is split, then:

- The triangle with which it shares a hypotenuse is also split, to prevent T-junctions in the mesh. This may force higher-level splits to occur as well, to ensure that the dual triangle exists (see figure 11).
- The two sub-triangles are processed recursively.

Splits continue until the desired quality is reached or no more heightfield samples are available (see figures 5 and 12 for examples). To begin the next frame, all of the enabled flags must be reset. Rather than explicitly clearing all the flags, we use a nonce-based approach.
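A minimal sketch of such nonce-based flag clearing (an illustrative class; the names are not from the original implementation):

```python
class FlagArray:
    """Boolean flags over n items, clearable in O(1) via a nonce.

    A flag is considered set only if its stored integer equals the
    current nonce; bumping the nonce to a fresh value clears every
    flag at once, with no per-flag work.
    """

    def __init__(self, n):
        self.values = [0] * n
        self.nonce = 1          # never 0, so all flags start clear

    def set(self, i):
        self.values[i] = self.nonce

    def get(self, i):
        return self.values[i] == self.nonce

    def clear_all(self):
        self.nonce += 1         # O(1): no stored value equals the new nonce

flags = FlagArray(4)
flags.set(2)
assert flags.get(2) and not flags.get(0)
flags.clear_all()
assert not flags.get(2)
```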
Every flag is in fact an integer, and is deemed to be true if equal to a particular value (the nonce). Incrementing the nonce (to a value it has never had) effectively clears all the flags in O(1) time.

Lindstrom et al. [1996] show that the triangulated mesh can be rendered using a single generalised triangle strip. This is achieved by rendering the triangles in the order that they appear in a walk of the vertex split hierarchy (note that this depends on triangles outside the view frustum being rendered; this adds little penalty since these triangles are not subdivided). Every vertex has an associated level, which is the lowest level of detail at which it is used. The

algorithm simplifies construction of the strip by noting that the vertices of a common edge between neighbouring triangles have opposite parity. We use the same algorithm in our implementation.

Figure 11: Preventing T-junctions. (a) Splitting triangle A would create a T-junction (circled). (b) Triangle B is split twice and C is split once to prevent any T-junctions from forming.

Figure 12: Screenshots of the ROAM-based terrain algorithm. (a) Eye view. (b) Overhead view, with the viewpoint on the left of the image.

Duchaineau et al. [1997] use an adaptive approach which modifies the triangulation used in the previous frame. Since we do not exploit frame-to-frame coherence in generating the triangulation, this is not a practical approach for our implementation.

Figure 12 shows two views created using the ROAM algorithm. Figure 12(a) is taken from the camera position, while 12(b) is an overhead shot of the same triangulation (with the camera on the left). The large triangles in the top-left and top-right corners lie outside of the view frustum, so are not refined. There is also a region just in front of the camera that is not refined, as it lies below the view frustum.

3.2.3 Vertex program algorithm

The second algorithm employs a vertex program [Lindholm et al. 2001] to produce continuous level of detail. Briefly, a block (a 2^k × 2^k region of the heightfield) is rendered either by recursively rendering four sub-blocks or by directly rendering a uniform triangulation of the block. The recursive routine is implemented as shown below (the starting block is the entire heightfield):

1. Determine whether this block is visible. If not, return.
2. Determine the depth range of the block, and convert this to a LOD number range.
3. If the LOD number range is in the interval [i − 1, i + 1] for some integer i, and the block is below a certain size, render the block at level i + 1.
Otherwise, split the block into four sub-blocks and recursively render them.

Visibility determination is performed using a bounding box test. The bounding heights for each block are computed as a preprocess. The same bounding box is used to estimate the depth range of the block. Depth is used rather than

distance because only the corners of the bounding box need to be examined to bound the depth range, as opposed to projecting the viewpoint onto each side of the box. The metric must also be computed for each vertex in the vertex program, and depth is cheaper to compute than distance.

Figure 13: Screenshots of the vertex program terrain algorithm. (a) Eye view. (b) Overhead view, with the viewpoint on the left of the image.

Figure 13 shows the same views as figure 12, but using the vertex program algorithm. Unlike the ROAM-based algorithm, the regions that lie outside the view frustum are completely culled. It can also be seen that the transitions between levels are smoother than with ROAM.

A maximum size is imposed on rendered blocks to allow a greater area to be visibility culled. This is particularly important when the camera is pointed directly at the ground. In this case the camera projection plane will largely coincide with the ground, and so depth values (which are distances from this plane) will be small. In the worst case this can lead to the entire terrain being rendered using the highest level of detail, which would be disastrous for performance.

The memory requirements of this algorithm are rather large. Apart from the vertex data stored for any algorithm, one must also store a displacement vector and a LOD indicator for each vertex, and a height range for each potential block. If IEEE single precision (32-bit) floating point is used for all parameters, the storage comes to 7.33MB for the heightfield. Our implementation is somewhat wasteful, and uses 9MB. For comparison, our ROAM implementation (which is also slightly wasteful) uses 6MB. Memory usage can be improved through quantisation. X and Y positions are naturally quantised, and if heights are also quantised to 16 bits then this requirement can be halved.
Furthermore, height ranges are only stored for optimisation purposes, in that it is expensive to recompute the height range of a large block. However, 3/4 of all blocks that may be rendered are 2 × 2, and almost all are no bigger than 4 × 4. Performing on-the-fly computation of height ranges for small blocks would add little overhead while yielding large memory savings.

3.2.4 Advantages of ROAM over the vertex program algorithm

- ROAM is portable to essentially any graphics hardware, while not all current hardware supports vertex programs. The vertex program could instead be implemented in software, but this would significantly reduce the speed.
- ROAM adapts to the curvature of the terrain. In a terrain with some flat areas and some rough areas, the vertex program algorithm must render the entire terrain at the level of detail required to make the rough areas look correct.
- Although it was not implemented as such in this project, ROAM is designed to be able to work with criteria other than the error metric itself (e.g. fixed polygon count, fixed frame rate).

3.2.5 Advantages of the vertex program algorithm over ROAM

- It provides continuous level of detail. This allows a lower-detail rendering to be used, since there is no popping to draw the user's attention to the fact that a LOD mechanism is being used.

- It places a much lighter load on the CPU (less than half in some cases; see section 8.2). This allows the CPU to be used for other tasks, such as game AI or physics simulation.

3.2.6 Our implementation

The results show that the disadvantages of the vertex program algorithm do not prevent it from being used for our purposes. It is at least as fast as ROAM, even on flattish terrain where ROAM uses far fewer triangles. To make our program portable, both algorithms are implemented, and ROAM is used only on hardware that does not support vertex programs.

3.3 Texturing and lighting

There are many textures that can be applied to the terrain: a ground texture, such as grass or mud; a normal map, encoding the normal at each point on the ground; a light map, representing the light intensity at each point on the ground; a shadow map, used to cast shadows. Since the scene is static, it is also possible to combine some textures together during the preprocessing stage. The viewer is on the ground, so ground textures must be high-detail, but can be repeated to save memory. In contrast, the other textures can be lower frequency (which simply softens the lighting), but cannot be repeated.

Figure 14: Producing a light map. (a) The light map with no shadows. (b) The shadow map. (c) The light map with shadows added.

The final implementation uses two textures: a light map and a ground texture. The light map takes shadows into account; since there is a light map, there is no need for a normal map. The light map is produced in two stages. Initially an unshadowed light map (figure 14a) is computed on the CPU, using the position of the sun and a normal on the terrain computed from finite differences. A shadow map is also created for self-shadowing of the terrain and shadows cast by the trees (figure 14b). The terrain is then rendered into an off-screen buffer at a high resolution, applying the shadow map and the light map but not the ground texture.
It is rendered with a top-down orthogonal projection. This rendering is then scaled down and used as the final light map (figure 14c). Starting with a high-resolution light map and down-sampling serves to anti-alias the shadows, which are otherwise very sharp and jagged. Figure 29d shows the final result.

Another possibility is to use several ground textures (for mud, grass, gravel, etc.) and blend between them. The blend factor could be randomly generated or could be a function of the terrain (such as height or slope). This is left for future work.
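The anti-aliasing effect of the down-sampling step can be illustrated with a simple box filter. This is an illustrative stand-in for the off-screen GPU pass, not the project's code:

```python
def downsample_2x(img):
    """Box-filter a 2D greyscale image (list of lists, even
    dimensions) down by a factor of two, averaging each 2x2 block.

    Rendering the light map at several times the target resolution
    and then halving it like this softens hard shadow edges.
    """
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] +
              img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

# A hard 0/1 shadow edge becomes a soft ramp after down-sampling:
hi_res = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
          [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
assert downsample_2x(hi_res) == [[0.0, 0.5, 1.0]]
```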

4 Trees

The scene generator exports trees purely as a triangle mesh of geometry. The renderer must provide texturing and lighting. To achieve interactive frame rates, it must also implement a level-of-detail scheme, and it must deal with the problem of storing data for hundreds of trees.

4.1 Textures and normals

The scene generator does not provide textures or normals. This is a shortcoming in the scene generator that would need to be addressed in a production system. To provide a reasonable appearance for trees, the renderer automatically generates texture coordinates and normals from the geometry. Texture coordinates are generated simply as a linear function of position. The function is oriented so that most of the trunk appears to be correctly textured. However, at two opposite sides of the trunk the texture is stretched, and the grain of the bark completely fails to follow the shape of the tree.

Computing normals requires that genuine creases be distinguished from smooth joins. For example, the base of a cylinder forms a crease with the side, while the polygons making up the side form smooth joins. To deal with this, normals are associated with (face, vertex) pairs rather than vertices. The normal assigned to the pair (f, v) is the average of the face normals of all faces incident on v that do not differ from the face normal of f by more than a threshold angle. Figure 15 illustrates this idea in two dimensions: at the three sharp corners, the normals differ widely and are left as is to create the sharp crease. However, at the remaining vertices the normals are close together and will be averaged to create the appearance of a smooth join.

Figure 15: Combining normals. The corners labelled (a), (b) and (c) are sharp corners, while the remaining vertices are smooth and will have their normals averaged together.

4.2 Instancing

To reduce memory requirements, only a relatively small number of tree models are used.
These are replicated to provide as many trees as desired. To prevent all instances of the same tree appearing identical, an arbitrary affine transform can be applied to each instance. Figure 16 shows some of the results that may be obtained. Rotation about the vertical axis and shear are also possible.

A more general transform, such as a free-form deformation [Sederberg and Parry 1986], would have provided a greater range of instances. A free-form deformation (FFD) maps a cuboid of space into a Bézier hyper-patch. Objects contained within the cuboid are warped as well. This maps straight lines to Bézier curves and planes to Bézier surfaces. A simple FFD would have been within the capabilities of a GPU vertex program. However, an FFD has a number of disadvantages:

1. Even a simple FFD (with 4 × 4 × 4 control points) requires a lengthy vertex program. Since this program is applied to every vertex of every tree in every frame, rendering performance will suffer heavily.

Figure 16: Affine transforms of tree instances. (a) The original tree. (b) Uniform scaling. (c) Stretching. (d) Rotation.

2. Unlike an affine transform, an FFD will map planes to curved surfaces. This requires that the original model be well tessellated in order for the resulting model to appear accurately curved. It also prevents level of detail management from being used (see figure 17 for an illustration of the problem).

Figure 17: The problem with FFDs. (a) The original object. (b) The object after an FFD. (c) The object after level of detail decimation. (d) The object after LOD decimation followed by an FFD.

4.3 Level of detail

4.3.1 Creation

The level of detail for the trees in the scene is based on the progressive mesh [Hoppe 1996], with hardware-accelerated geomorphing as described by Southern and Gain [2003]. However, the batched hierarchy of Southern and Gain is not used, as it imposes unnecessary restrictions. The batched hierarchy attempts to apply maximal sets of independent edge collapses. The maximality constraint forces every part of the model to be simplified on each iteration, even though some areas may be better suited to decimation than others. As an example, a triangle mesh created with the Marching Cubes algorithm [Lorensen and Cline 1987] is tessellated to the same degree in areas of high curvature as in planar regions, and the latter will be far better suited to decimation.

The batched hierarchy does have the advantage that the output is a small set of representations that allow for simple geomorphing. However, Hoppe [1996] shows that a geomorph can be constructed between any two meshes in a progressive mesh sequence. Our approach is to select a small number of the meshes from a progressive mesh sequence and morph between them. The selected meshes are named distinguished meshes (DMs), and are labelled D_0, D_1, ..., D_k, where D_0 is the original mesh and D_(i+1) is created from D_i by a sequence of edge collapses. Figure 18 shows a sequence of DMs. Geomorphs are constructed between D_i and D_(i+1), and are termed representations.
Such a geomorph in which D_(i+1) has weight α (α ∈ [0, 1]) is assigned a LOD number of i + α. This definition is chosen so that representations are a continuous function of LOD numbers. LOD numbers are always clamped to the interval [0, k].

The selection of DMs is a trade-off between memory and speed. One extreme is to use the entire PM sequence, requiring O(n²) memory. At the other extreme, one uses only one DM (the original mesh), thus requiring that many more vertices be rendered than necessary to maintain quality. We choose to use half as many vertices in each DM as in the previous one. This results in a linear memory requirement, while never requiring that more than double the original number of vertices be stored. Hoppe [1996] also reports good results using an exponential decay.

As the progressive mesh implementation is required to be extremely fast, the quadric approach of Garland and Heckbert [1997] is used (described in section 2.1.1). Hoppe [1999] has experimented with a memoryless version of this error metric, and found that it produced better results (memoryless meaning that the quadric associated with a vertex is determined from the faces currently incident on it, rather than those which were incident in the original mesh). However, a memoryless error metric requires recomputation of the quadrics in the neighbourhood of a collapse, which would slow down the implementation. In addition it would require re-evaluation of the metric in a larger neighbourhood, causing further loss of performance (see figure 19).

The implementation supports discontinuous attributes, as in [Hoppe 1996]. The particular attributes used are normals and texture coordinates. For efficiency, these attributes are not used in computing the error metric; they are simply interpolated to the new position. This has been found to produce noticeable texture sliding when the placement scheme is unconstrained.
Restricting newly created vertices to lie on the original edge significantly reduces the problem.
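A geomorph between consecutive DMs amounts to a per-vertex linear interpolation, sketched below. The names are hypothetical; in practice the correspondence between fine and coarse vertex positions comes from the recorded edge collapses.

```python
def geomorph_position(fine_pos, coarse_pos, lod_number, i):
    """Blend a vertex between DM i (fine) and DM i+1 (coarse).

    lod_number lies in [i, i + 1]; its fractional part alpha is the
    weight of the coarser mesh D_(i+1).  Illustrative sketch only.
    """
    alpha = lod_number - i
    return tuple(f + alpha * (c - f) for f, c in zip(fine_pos, coarse_pos))

# Halfway between two DMs the vertex sits at the midpoint:
assert geomorph_position((0.0, 0.0, 2.0), (1.0, 1.0, 1.0), 3.5, 3) == (0.5, 0.5, 1.5)
```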

Figure 18: A sequence of distinguished meshes (D_0, D_1, D_2, D_3).

Figure 19: Recomputations for memoryless metrics. When the thick edge is collapsed, the shaded faces are updated. With a non-memoryless metric, the edges incident on the new vertex (dashed) must be updated. With a memoryless metric, the edges incident on any shaded face must be updated (both the dash-dotted and dotted edges).
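The quadric metric of Garland and Heckbert, mentioned in section 4.3.1, can be sketched as follows: each plane contributes a symmetric quadric, quadrics sum, and the error at a candidate vertex position is a cheap quadratic form. This is a minimal illustration with hypothetical structure, not the project's implementation.

```python
# Each plane is (a, b, c, d) with a^2 + b^2 + c^2 = 1; the squared
# distance of point (x, y, z) to it is (a*x + b*y + c*z + d)^2, which
# expands to a quadratic form that can be summed over planes.

def plane_quadric(a, b, c, d):
    """4x4 symmetric quadric K = q q^T for plane q = (a, b, c, d)."""
    q = (a, b, c, d)
    return [[q[i] * q[j] for j in range(4)] for i in range(4)]

def add_quadrics(K1, K2):
    return [[K1[i][j] + K2[i][j] for j in range(4)] for i in range(4)]

def quadric_error(K, p):
    """Evaluate v^T K v for homogeneous point v = (x, y, z, 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return sum(v[i] * K[i][j] * v[j] for i in range(4) for j in range(4))

# Two planes: z = 0 and x = 1.  The summed quadric measures the total
# squared distance to both planes.
K = add_quadrics(plane_quadric(0, 0, 1, 0), plane_quadric(1, 0, 0, -1))
assert abs(quadric_error(K, (1.0, 5.0, 0.0)) - 0.0) < 1e-12   # on both planes
assert abs(quadric_error(K, (2.0, 0.0, 3.0)) - 10.0) < 1e-12  # 1^2 + 3^2
```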

4.3.2 A simple control scheme

During rendering of the scene, it is necessary to assign a particular representation to each camera position for each object in the scene. We consider first the problem of choosing a representation for a single tree, based on distance. We define a function f that maps the distance of the camera from the tree to a LOD number. In addition we define a quality function q that maps LOD numbers to some view-independent error metric that measures a linear quantity. We would like our function f to have the following properties:

1. f should be continuous, monotone, and reasonably smooth;
2. q(f(d)) should be directly proportional to d.

A simple approach is to aim for a fixed polygon density in the rendered image. In this case q should be chosen to be proportional to the square root of the average polygon area in a representation: doubling the distance from the camera and quadrupling the average model-space polygon area will keep the screen-space density constant. Assuming that the total surface area does not change significantly (i.e. by an order of magnitude) across levels of detail, polygon area is roughly proportional to the inverse of the number of faces. Trees have low genus 5, so the number of polygons is close to double the number of vertices (using Euler's formula). If there are n vertices in the original mesh, then |D_i| = n·2^(-i), where |D_i| is the number of vertices in D_i. Hence, we can define q(i) = |D_i|^(-1/2) = 2^(i/2)/√n. We extend this formula to cover non-integer LOD numbers as well, since the extension is continuous and infinitely differentiable. We can now satisfy the properties above by defining f(d) = q^(-1)(αd) = 2 log2(αd√n) = 2 log2 d + β, where α is a quality control factor and β = 2 log2 α + log2 n.

4.3.3 Our control scheme

The approach above fails to consider that quality may not scale with polygon count 6. This is visible in Figure 18, where the first two DMs have the same shape.
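For concreteness, the simple scheme of section 4.3.2 can be evaluated numerically. This is a sketch; the values of α, n and k below are arbitrary, not the project's tuning.

```python
import math

def lod_number(d, alpha, n, k):
    """LOD number for camera distance d under the simple scheme:
    f(d) = 2*log2(alpha * d * sqrt(n)), clamped to [0, k].
    alpha is the quality control factor, n the original vertex count.
    """
    f = 2.0 * math.log2(alpha * d * math.sqrt(n))
    return min(max(f, 0.0), k)

# Doubling the distance raises the LOD number by exactly 2, i.e. one
# quarter of the vertices (half per DM step, two steps):
a, n, k = 0.01, 4096, 8
assert abs(lod_number(20.0, a, n, k) - lod_number(10.0, a, n, k) - 2.0) < 1e-9
```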
An alternative approach is to explicitly associate a quality number with each DM, based on the error metric of the LOD creation algorithm, and to interpolate for non-integer LOD numbers. Since the number of vertices scales exponentially with LOD number, we choose to perform the interpolation on the logarithm of the error metric. We define r(l) = log q(l) and linearly interpolate r. As before, we define f(d) to be q^(-1)(αd). Hence f(d) = r^(-1)(log(αd)) = r^(-1)(log d + β), where β = log α.

We assign the quality values for the distinguished meshes by considering the largest cost of any edge collapse used to produce the mesh from the original. The cost is raised to an appropriate power to correct for the dimensionality of the metric. This approach, unfortunately, breaks down for the initial mesh, as no edge collapses have been performed. We chose to simply hard-code a value, but this is an area that needs further work.

4.4 Rendering

4.4.1 Triangle strips

Our algorithm is similar to the local algorithm of Evans et al. [1996]. To keep preprocessing time to a minimum, no global analysis (patch-finding) is performed. The first strip is started at an arbitrary location and direction. Strips are extended as pure strips as far as possible; when they cannot be extended in this way, a degenerate triangle is inserted to turn a corner, and extension as a pure strip is continued. When a strip cannot be grown any further (even by turning a corner), a new strip is started. Rather than starting at an arbitrary location, the already rendered triangles are searched backwards for one with an unrendered neighbour, and the strip is started there. If the graphics system has a vertex cache, then this approach will improve the cache hit rate by reusing recently used vertices.

5 Ideally they are genus 0, but for efficiency the scene generator produces several inter-penetrating cylinders rather than a single connected mesh.
6 In particular, if the original model is over-tessellated then the first few DMs will show negligible quality loss.

4.4.2 Lighting

Since level-of-detail decimation may produce large triangles, the errors introduced by Gouraud shading may be large. To compensate, per-pixel lighting (Phong shading) is used. Results still show some artefacts due to the normal being linearly interpolated. Better results could be achieved using a normal map texture; however, parameterising a texture over the surface of a tree is a difficult problem.

The light model includes only an infinitely distant diffuse component (the sun), an ambient component, and shadows. Ignoring shadows for the present, this means that the light level at a point is entirely determined by the normal at that point. A cube-map texture that maps normals to light levels is precomputed.

Shadowing is done with a shadow map, as described in section 2.3.4. The OpenGL shadow extension uses a shadow map to return a 0 or 1 value for each pixel. This value is used as the interpolation parameter to the GL_INTERPOLATE shader function, to interpolate between the lighting level returned from the cube map and the constant ambient value. Since the shadow value is always 0 or 1, it acts as a multiplexer between ambient + diffuse and ambient.

Level of detail causes some problems for shadow mapping. As the geometry of the tree changes, the shadow map does not. Hence, some parts of the tree may move into or out of the shadow volume, producing artefacts such as shadows on the sun-facing side of a tree. To counteract this, the shadow map is created by letting the back-facing polygons define the boundary, rather than the front-facing polygons [Wang and Molnar 1994]. The effect of this is that the volume inside the tree is not considered to be in shadow. Sun-facing regions that move slightly will thus still not be in shadow. Back-facing regions may still move into the unshadowed area inside the tree; however, this does not produce artefacts, because those areas have zero diffuse component by virtue of being back-facing.
In fact, a bias is used that causes the entire trunk to be considered unshadowed, as this eliminates some sampling artefacts. This approach introduces a small gap between the base of a tree and its shadow on the ground (figure 30a). This is partly due to the bias and partly due to the filtering that is done on the terrain light map. To prevent this, the light map for the terrain is created with a different shadow map. This shadow map is created with front-facing polygons, causing the circle of ground under the tree to be shadowed. Figure 30b shows that the output appears correct.
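The normal-to-light-level lookup described in section 4.4.2 can be sketched as a function of direction; in the renderer this function is baked into a cube-map texture indexed by the normal, while here it is evaluated directly. The ambient and diffuse values are arbitrary illustrative constants, not the project's settings.

```python
def light_level(normal, sun_dir, ambient=0.2, diffuse=0.8):
    """Light level for a unit normal under one infinitely distant
    light: ambient plus clamped Lambertian diffuse term.
    """
    n_dot_l = sum(n * l for n, l in zip(normal, sun_dir))
    return ambient + diffuse * max(0.0, n_dot_l)

sun = (0.0, 0.0, 1.0)                                    # sun directly overhead
assert abs(light_level((0.0, 0.0, 1.0), sun) - 1.0) < 1e-12   # facing the sun
assert abs(light_level((0.0, 0.0, -1.0), sun) - 0.2) < 1e-12  # back-facing: ambient only
```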

5 Sky

One possible approach to the sky is to tessellate a hemisphere and assign colours to the vertices. The colours on the faces are interpolated, and no textures are used. This approach makes it simple to dynamically update the sky (for example, to simulate different times of day). However, we did not intend for the sky to change during the short interval that the user would spend evaluating each scene. This system is otherwise rather inflexible: unless a very fine tessellation is used, it can only represent gradual changes in colour, and it would be difficult to paint on clouds or the sun.

We chose instead to represent the sky as a cube. The cube is textured, which allows a lot of control over what appears in the sky. The scene generator is responsible for generating the texture, so it is not discussed here. Figure 20a shows a 2D version of the sky cube, with clouds projected onto it, and 20b shows the corner of the sky cube in a scene.

Figure 20: A sky cube. (a) Clouds are projected onto the top (A), sides (B) and corners (C). The thick lines show the areas that would be painted on the sky texture. (b) One corner of the sky cube in a scene.

The sky is also modelled as being infinitely far away. This allows every texture sample to be treated as a direction, and the colour to be computed as a function of direction. It also prevents the user from detecting that the sky is a cube by standing close to one face. Actually placing the sky at infinity would introduce problems with frustum clipping, so instead the sky box is moved to always be centred on the user.

5.1 Clouds

Although it was noted above that clouds can be painted onto the sky texture, this causes difficulties for the scene generation. The scene generator would need to implement a simple ray-tracer, in which a ray is fired through the clouds to determine the final colour that should be used in the texture. As an alternative, clouds are drawn as textures onto a number of rectangular planes.
These textures include an alpha (transparency) channel, to allow higher clouds and the sky box to show through gaps. This would also allow clouds to drift over time, but that was not implemented. If one is willing to lose the ability to have clouds move, it would be possible to pre-render the clouds and merge them into the sky map as a preprocess; we chose to keep the system simple and flexible.

A problem that arises with the current implementation is that the cloud planes have very distinct edges. In addition, we do not account for the curvature of the Earth, so clouds will never appear to meet the land even if extended infinitely far. The best solution would be to have the scene generator paint clouds onto the sky, although this simply shifts the problem onto the scene generator. An alternative approach would be to allow non-horizontal cloud planes. Unfortunately, joining these planes up without gaps or overlaps is again a non-trivial task. We found that if the cloud planes are placed sufficiently low, the quality is not ideal but is not too distracting, and opted for this simpler approach.

5.2 Far clip plane

The clouds and sky must be placed sufficiently far away that they do not intersect with large hills or with trees. They must also be sufficiently separated that no artefacts arise due to finite depth buffer precision. This requires that the far clip plane be placed quite far away.

Kilgard and Everitt [2002] show how the far clip plane may be placed at infinity using the projection matrix. The formula arises as the limit of the standard projection matrix as the far clip distance tends to infinity, and is

    [ 2n/(r-l)      0       (r+l)/(r-l)    0  ]
    [     0      2n/(t-b)   (t+b)/(t-b)    0  ]
    [     0         0           -1        -2n ]
    [     0         0           -1         0  ]

where n is the near clip distance and the view frustum passes through (l, b, n) and (r, t, n). They further show that the loss of depth precision is marginal unless the near and far clip planes were originally very close together, which is unusual. We place the far clip plane at infinity so that the clouds and the sky box can be positioned without concern for the far clip plane.
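To make the limit concrete, the sketch below (our own illustration, not code from the report; the function names are ours) builds both the standard frustum matrix and its infinite-far-plane limit so that the two can be compared numerically:

```python
def frustum(l, r, b, t, n, f):
    """Standard OpenGL-style perspective matrix for a frustum with
    near clip distance n and far clip distance f (row-major lists)."""
    return [
        [2*n/(r-l), 0.0,       (r+l)/(r-l),  0.0],
        [0.0,       2*n/(t-b), (t+b)/(t-b),  0.0],
        [0.0,       0.0,       -(f+n)/(f-n), -2*f*n/(f-n)],
        [0.0,       0.0,       -1.0,         0.0],
    ]

def infinite_frustum(l, r, b, t, n):
    """Limit of frustum() as the far clip distance f tends to infinity:
    -(f+n)/(f-n) -> -1 and -2fn/(f-n) -> -2n."""
    return [
        [2*n/(r-l), 0.0,       (r+l)/(r-l), 0.0],
        [0.0,       2*n/(t-b), (t+b)/(t-b), 0.0],
        [0.0,       0.0,       -1.0,        -2*n],
        [0.0,       0.0,       -1.0,        0.0],
    ]
```

With a very distant far plane the two matrices agree to within floating-point noise, which is the sense in which the second matrix is a limit of the first.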

6 Exploration of the environment

Several possibilities were considered for the user's interaction with the environment:

1. A walk-through, where the camera follows the terrain. This allows the user to inspect detail close up, and also allows the user to interact with the environment as it would appear in a movie or game.

2. A fly-over, where the camera stays above the tree-tops. This provides the simplest case for collision detection, but does not allow the user to examine detail.

3. A fixed-path fly-through, where the camera follows some predefined track through the environment. This may allow the user to examine some detail, but without any control. It is also not clear how collision detection should be implemented.

We chose to implement a walk-through, as we felt it was critical that scenes are evaluated in the same way they would be experienced in real use.

Terrain following was initially very discontinuous. This was found to be due to the camera height being set relative to the height of the nearest heightfield sample. Switching to bilinear interpolation between the four surrounding heightfield samples produced a smooth walk-through.

6.1 Simultaneous exploration

The performance of the system is heavily limited by the rate at which the user can evaluate scenes. We chose to present four scenes simultaneously, and to have camera movement synchronised across all four. The AI operates on the best elements from a set of scenes, and showing an entire set in the same window allows for a side-by-side comparison.

The choice of the number of scenes is a compromise. The AI is expected to do better with more scenes per iteration. However, the rendering engine cannot maintain interactive frame rates with too many scenes, and the user may struggle to select from too many choices.

We found that once a user is comfortable with the system, he/she will often move around very little, preferring to evaluate scenes simply by looking around.
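The terrain-following fix described above, bilinear interpolation of the four surrounding heightfield samples, can be sketched as follows (our own minimal illustration; the unit grid spacing and function name are assumptions, not the report's code):

```python
import math

def terrain_height(heights, x, z):
    """Bilinearly interpolate a heightfield at (x, z).

    heights[row][col] holds samples on a unit grid; (x, z) must lie inside
    the grid so that all four surrounding samples exist."""
    x0, z0 = int(math.floor(x)), int(math.floor(z))
    fx, fz = x - x0, z - z0
    h00 = heights[z0][x0]            # the four surrounding samples
    h10 = heights[z0][x0 + 1]
    h01 = heights[z0 + 1][x0]
    h11 = heights[z0 + 1][x0 + 1]
    # interpolate along x on both rows, then along z
    return (h00 * (1 - fx) + h10 * fx) * (1 - fz) + (h01 * (1 - fx) + h11 * fx) * fz
```

The camera height is then the interpolated terrain height plus a fixed eye height, which varies continuously as the user walks.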
To facilitate this, the camera is initially placed at the highest point in the scene that is within a certain distance of the centre. This generally allows the user to gain an impression of the whole scene, which would be impossible if the camera were placed in a valley.

The controls were selected to be similar to those used in most current PC-based 3D shooter games, as we have found this scheme to be very effective for controlling motion. Using standard controls also makes it easier for users to operate the system. The keyboard is used to move forwards, backwards and sideways without rotation, and the mouse is used to control the camera direction (the two mouse axes controlling pitch and yaw).

6.2 Collision detection

A walk-through creates the most difficult case for run-time collision detection. A concern was that an overly conservative collision-detection algorithm might make user movement almost impossible in a dense forest. This was resolved by using an aggressive algorithm that in some cases allows the user to move through part of a tree.

Collision detection is based on bounding cylinders. Associated with each tree is an infinite vertical cylinder that contains the trunk. This in theory allows the camera to move through branches; however, in a walk-through the camera is constrained to be close to ground level and so will not encounter many branches. The choice of a bounding cylinder over a bounding box has two advantages. Firstly, a cylinder more accurately reflects the shape of the tree itself. Secondly, attempting to walk directly into a tree is an unstable equilibrium, since any slight offset makes the camera slide around the cylinder. This prevents any one of the simultaneous views from becoming stuck against a tree for long.
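The cylinder test can be sketched as follows (our own illustration; the report gives no code, and the names and the radial push-out behaviour shown are assumptions). A proposed camera position inside a tree's cylinder is pushed radially back to the cylinder wall, which is what makes walking straight into a tree an unstable equilibrium:

```python
import math

def resolve_collisions(pos, trees):
    """Push a proposed camera position out of any tree bounding cylinder.

    pos is (x, z); trees is a list of (cx, cz, radius). The cylinders are
    treated as infinite vertically, so camera height is ignored."""
    x, z = pos
    for cx, cz, r in trees:
        dx, dz = x - cx, z - cz
        d = math.hypot(dx, dz)
        if d < r:
            if d == 0.0:
                dx, d = 1.0, 1.0   # exactly on the axis: pick an arbitrary direction
            # move radially outward to the cylinder wall
            x, z = cx + dx / d * r, cz + dz / d * r
    return x, z
```

Because the correction is purely radial, any sideways component of the player's motion is preserved, so the camera slides around the trunk instead of sticking to it.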

7 Testing

The goals of the renderer are efficiency and image quality. Unlike many other rendering systems, efficiency is important both for preprocessing and for rendering. Images are compared for quality against reference frames (produced with no level of detail) and between successive frames, to determine the effect of popping.

The possible tests have three degrees of freedom: objects, LOD scheme and scene. Objects refers to the set of objects that are actually rendered, for example only trees or only terrain. LOD refers to the level-of-detail scheme in use: either no LOD, discrete LOD or continuous LOD. In the case of trees, a second continuous LOD test is done with the simpler control strategy described in the earlier section on the simple control scheme. A range of scenes is used, with variation in the number of trees and the shape of the terrain and clouds.

The camera follows a fixed path, which moves forward and then backwards across the middle of the terrain. The sun is behind and to the side of the camera to get good illumination while preserving shadows. Since the path is symmetric, quality is only tested on half of the path.

Popping is tested in a novel way. The popping that occurs between frames f1 and f2 is tested as follows. The geometry from frame f1 is combined with the camera position of f2 to produce a test image. This is compared to the reference image, namely frame f2. Any sudden changes in geometry between the two frames will appear as a difference in the image, without the distortions introduced by camera movement. For the purpose of the popping tests, view frustum culling was disabled, as otherwise holes may appear in the test image (due to the view frustum being different from the frustum used for culling). For our tests we chose f1 and f2 to be 20 frames apart, every 20 frames (so frames 0 and 20, 20 and 40, 40 and 60, etc.)7. All image comparisons are done with the L2 norm (the square root of the mean of the squared errors).
The L2 norm is generally better suited to image comparisons than the L1 norm (average error) and the L∞ norm (maximum error). While comparing pixel errors is not necessarily the best approach (as it does not consider spatial shifts), it is useful provided that there is good contrast between background and foreground and there are no global shifts. Lindstrom and Turk [2000] also report good results using the L2 norm for level-of-detail work.

All the tests were performed on an Intel Pentium 4 2.0GHz processor with 512MB of memory, and a GeForce4 Ti 4200 graphics card. The resolution was 400 × 300, which is the resolution used for each of the four display windows when the user uses the system. Hence, any frame rates reported are expected to be roughly four times those of the final system, and our experience confirms this. Testing at a higher resolution showed little change in efficiency.

User tests were conducted to determine whether users were aware of any problems in the renderer. Ten users were given a number of tasks to complete using the system. Afterwards, they were asked a number of questions, including questions about rendering. The rendering questions are listed in Appendix B. Although no level of detail is used for the sky, users were asked if they noticed any sudden or smooth changes in the sky. This separates level-of-detail effects from outside influences. Users were not warned in advance to look for level-of-detail artefacts.

7 The choice of 20-frame gaps is a compromise. A comparison every frame would have been preferable, but the storage requirements were prohibitive. Using a smaller range of frames would have made the storage manageable, but would very likely have missed certain transitions altogether.
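The image metric is simple enough to state in code. A minimal greyscale version follows (our own sketch; the report does not give implementation details such as per-channel handling):

```python
import math

def l2_error(img_a, img_b):
    """L2 norm between two equal-sized images: the square root of the mean
    squared pixel difference. Images are flat sequences of intensities."""
    assert len(img_a) == len(img_b) and len(img_a) > 0
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a))
```

The L1 and L∞ norms mentioned above would replace the mean of squares with the mean of absolute differences or with their maximum, respectively.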

8 Results

8.1 Preprocessing

The preprocessing steps performed by the renderer can be split into several categories:

1. rendering the shadow map;
2. terrain preprocessing, which creates data structures that are used for rendering terrain;
3. mesh preprocessing, which is performed on each tree template.

There is some preprocessing that does not fall into any of these categories, but it is on the order of a few milliseconds and is ignored.

Tests were run on 48 scenes, created using the normal user interface (four at a time). The first set of four was not used as it contained outliers. In each case the top left scene was chosen for the purposes of the AI, to prevent any systematic bias in the generated scenes.

Rendering the shadow map takes 0.119 ± 0.085 seconds, and at most 0.413 seconds over all our tests. The time depends on the number of trees, which is why the variation is so large and non-Gaussian. Tables 1 and 2 summarise the time taken for various subtasks. Each scene contains 5 meshes (templates); total times are per-scene. The total preprocessing time for a scene ranges from 2 seconds to 4 seconds, with a mean of 3.12 seconds. This means that for four scenes, the average waiting time will be around 12 seconds.

Task                              Time per mesh (s)   Std dev (s)   Total time (s)   Std dev (s)
Creating initial data structures
Creating a progressive mesh
Creating triangle strips
Other
Total

Table 1: Breakdown of the preprocessing time for each tree mesh

8.2 Terrain level-of-detail

Figure 21 compares frame rates for the two level-of-detail algorithms for terrain (ROAM and the vertex program algorithm). Figures 29a and 29b show the test scenes. The raw data contains some low-amplitude noise that makes the graphs somewhat difficult to read; this has been smoothed out. The column on the left represents fairly flat terrain, which is where ROAM excels (as it adapts to the curvature of the terrain).
In this case ROAM produces few triangles compared to the vertex program algorithm (see figure 22), yet has performance that is essentially the same as the vertex program algorithm. Figure 21c shows that despite the low triangle count, ROAM produces a CPU load comparable to that of the vertex program algorithm. In contrast, the right-hand column represents a scene with large hills. Here ROAM and the vertex program algorithm have triangle counts of the same order of magnitude (with fewer triangles for ROAM, as expected), and the vertex program algorithm heavily outperforms ROAM. Figures 21c and 21d show that in all cases the CPU is the bottleneck. This explains why ROAM is unable to gain much advantage from its lower triangle count.

Figure 23 compares screenshots taken every 5 frames against reference screenshots, using the L2 norm. It shows that the quality of the vertex program algorithm equals or exceeds that of ROAM. The high quality of the vertex program algorithm in figure 23a is expected, as the algorithm produces many more triangles.

Task                                              Time (s)   Std dev (s)
Creating general data structures
Merging the light and shadow maps                   .5          .3
Preprocessing for the vertex program LOD scheme     .94         .5
Total

Table 2: Breakdown of the preprocessing time for terrain

8.3 Level of detail for trees

Trees are tested in a similar manner to terrain, using the scenes shown in figures 29c and 29d. The terrain is not drawn, so that terrain effects will not interfere with the comparison. Clouds are not drawn for the same reason. In the graphs below, the simple scheme is continuous level of detail using the naïve control method described in

[Graphs: frames per second vs. frame number for the 'Gentle hills' and 'Big hills' scenes, with curves for No LOD, ROAM and the vertex program algorithm.]

Figure 21: Frame rate comparison of terrain LOD algorithms for two scenes. The column on the left is a flat scene, while the one on the right has large hills. Graphs (a) and (b) show the normal frame rate, while graphs (c) and (d) show the CPU time. The latter is produced by doing all the work except sending the primitives to OpenGL for rendering.

[Graphs: triangle counts and L2 errors vs. frame number for ROAM and the vertex program algorithm, for the 'Gentle hills' and 'Big hills' scenes.]

Figure 22: Triangle counts for the terrain LOD algorithms

Figure 23: Quality of the terrain LOD algorithms

Figure 24: Popping in the terrain LOD algorithms

the earlier section on the simple control scheme. It can be seen from figure 25 that this scheme produces significantly lower frame rates (the plateau in the left-hand graph is an area where no trees are visible). The simple scheme has a lower frame rate because it must produce more triangles to obtain the same quality. Despite this, the quality graphs (figure 26) show that its quality is still lower.

The continuous scheme can be seen to be an advantage over the discrete scheme. It has similar frame rate and quality, but does not produce the severe popping that the discrete scheme does. The spikes in figure 27 are almost certainly even worse than they appear: they correspond to the popping in a single frame, while the changes for the continuous scheme are morphed over 20 frames.

[Graphs: frames per second and L2 error vs. frame number for the 'Few trees, gentle hills, cumulus' and 'Many trees, big hills' scenes, with curves for the No LOD, discrete, continuous and simple schemes.]

Figure 25: Frame rates for the tree LOD algorithms

Figure 26: Quality for the tree LOD algorithms

The quality of the level-of-detail scheme for trees is an order of magnitude lower than that of terrain, which is disappointing. Better results could be achieved at the expense of frame rate. The cause of the low quality is that the radii of the trunk and branches decrease as simplification progresses.

8.4 System tests

The entire system is also tested, to ensure that frame rates and quality are sufficiently high. The scenes shown in figures 29a, 29c, 29d and 29e are used. We aim for a minimum frame rate of 15fps in the user system, which corresponds to 60fps for a single scene. The results (figure 28) show that the target frame rate was easily achieved. The quality is somewhat lower than desirable (due largely to the trees), but the use of continuous level of detail prevents the user from being aware of this.
8.5 User testing

To account for outside influences, ratings are taken relative to those for the sky (which does not use LOD). Several users reported noticing large sudden changes in the sky, which indicates that outside influences were present (possibly

[Graphs: L2 error vs. frame number for the tree LOD schemes; frame rate and quality for the full-system tests on the 'Gentle hills', 'Few trees, gentle hills, cumulus', 'Medium trees, small hills, stratus' and 'Many trees, big hills' scenes.]

Figure 27: Popping in the tree LOD algorithms

Figure 28: System tests run on a number of scenes. (a) Frame rates. (b) Quality metric.

poor mouse resolution, which led several users to complain of discontinuous motion). All tests were done using the continuous level-of-detail schemes.

Table 3 shows the average results once the bias for the sky has been subtracted (raw ratings are in the range 1-5, with 5 indicating a large error).

Question                   Mean   Standard deviation   Standard deviation of the mean
Sudden change in ground
Smooth change in ground
Sudden change in trees
Smooth change in trees

Table 3: User ratings of sudden and smooth changes, relative to sky

Curiously, other than for smooth changes in the ground, the results all show that there was less distortion in the ground and trees than in the sky. None of these results is significant enough to indicate that level-of-detail effects were noticeable to users.

Of the ten users, three were aware of texture sliding on the trees. Our own experience shows that the sliding occurs at the points where the texture is stretched, due to the way texture coordinates are assigned (see section 4.1). Implementing proper texturing without extreme stretching would most likely resolve the sliding. No other specific problems with the renderer were reported. Two users reported problems that were not related to rendering, and one user reported that there were other problems but did not fill in the field to indicate what the problem was.
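For illustration, the relative statistics in Table 3 can be derived as follows (our own sketch; pairing each user's rating with their own sky rating is an assumption, as the report does not describe the exact computation):

```python
import math
import statistics

def relative_ratings(item_ratings, sky_ratings):
    """Mean, sample standard deviation, and standard deviation of the mean
    for ratings taken relative to each user's rating of the sky."""
    diffs = [item - sky for item, sky in zip(item_ratings, sky_ratings)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    sem = sd / math.sqrt(len(diffs))    # standard deviation of the mean
    return mean, sd, sem
```

The standard deviation of the mean shrinks with the square root of the number of users, which is why ten users give only a rough significance test.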

9 Conclusions and future work

The renderer was extremely successful, attaining far higher frame rates than required. Furthermore, it achieves this with very little loss of quality, as demonstrated by the user testing. However, there are still areas that are open to future research.

The way in which the trees are rendered has a few problems with regard to the interaction between lighting and level of detail (see section 4.4). These problems could be alleviated by defining a light map over the surface of the tree. Unfortunately, automatic parameterisation is a difficult problem. Sander et al. [2001] describe a scheme for parameterising a model that is designed to work with progressive meshes. Like most parameterisation schemes, it breaks the mesh up into a number of simple and roughly flat regions, called charts. Each chart is then mapped into a portion of the texture domain. Since the boundaries of adjacent charts do not join up in the texture, this introduces problems with mipmapping.

The preprocessing of trees is sufficiently fast to be usable, but is still much slower than one would like. The trees also tend to lose volume and disappear as they are simplified. This may be solved by a volume-preserving metric for progressive meshes, such as that of Lindstrom and Turk [1998]. It remains to be seen whether such a technique could be made fast enough.

The results show that the vertex program approach to continuous level of detail is efficient enough to be practical. In the case of terrain, the algorithm we used has the disadvantage of not adapting to the curvature of the terrain. In low-curvature terrains, ROAM is able to compete with the vertex program algorithm on frame rate, and would be faster on lower-end graphics hardware (due to its low triangle count). Further work is needed to allow the vertex program algorithm to adapt to curvature without causing the recursive subdivision to proceed too far.
Interesting things could be achieved by more tightly integrating the scene generator with the renderer. For example, the terrain is generated by sampling a function. Instead, the function could be kept inside the renderer and used to generate coordinates on the fly. This would allow for infinite-resolution and infinite-extent landscapes. The scene generator could also guide the simplification process using knowledge of the structure.

10 Acknowledgements

The implementation of the renderer relies on several libraries. wxWindows is the user interface toolkit; libpng, zlib and glpng combine to provide image loading and saving; and libgm is used for vector manipulation. The bark texture is from 3DCafe, and the grass texture was created by Dr. M. Suzuki. Funding for the research was provided by the National Research Foundation (NRF) and the University Council.

A Constraining placement in edge collapses

Garland and Heckbert [1997] use a quadric function to determine the optimal placement for the new vertex in an edge collapse. The quadric also determines the cost of the collapse. To be more specific, suppose vertices v1 and v2 have associated quadric functions E1 = x^T Q1 x + x^T B1 + C1 and E2 = x^T Q2 x + x^T B2 + C2. The cost of collapsing an edge joining v1 and v2 is the minimum possible value of E = E1 + E2 over all values of x, and the new vertex is placed at the value of x that minimises E. If we let Q = Q1 + Q2, B = B1 + B2 and C = C1 + C2 then we obtain

    E = x^T Q x + x^T B + C                                             (1)
    ∇E = 2Qx + B.                                                       (2)

Minimising E corresponds to solving ∇E = 0, which gives the solution x = -Q^(-1) B / 2.

For reasons outlined in the body, we choose to constrain x to lie on the line joining v1 and v2. To do this, we parameterise x as x = v1 + λD, where D = v2 - v1. We can now rewrite equation (1) as follows:

    E = (v1 + λD)^T [Q(v1 + λD) + B] + C                                (3)
      = v1^T Q v1 + 2λ D^T Q v1 + λ² D^T Q D + v1^T B + λ D^T B + C     (4)
      = (D^T Q D)λ² + (2 D^T Q v1 + D^T B)λ + (v1^T Q v1 + v1^T B + C)  (5)
      = aλ² + bλ + c,                                                   (6)

where

    a = D^T Q D,                                                        (7)
    b = D^T (2Q v1 + B), and                                            (8)
    c = v1^T Q v1 + v1^T B + C.                                         (9)

Equation (4) is obtained by noting that the matrix Q is always symmetric. Minimising E is now a trivial scalar problem:

    λ = -b / 2a,                                                        (10)
    E = c - b² / 4a.                                                    (11)

However, we also choose to constrain λ to the interval [0, 1] (corresponding to the line segment joining v1 and v2). If λ lies outside this range we clamp it, and recompute E accordingly.
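The clamped solution above can be sketched directly in code (our own illustration, not the report's implementation; Q is the summed 3×3 quadric matrix, B the summed linear term, C the summed constant, and the helper names are ours):

```python
def constrained_collapse(v1, v2, Q, B, C):
    """Place the collapsed vertex on the segment v1-v2, minimising
    E(x) = x^T Q x + x^T B + C with the parameter lam clamped to [0, 1].
    Returns (lam, x, E)."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    mat_vec = lambda M, v: [dot(row, v) for row in M]
    D = [q - p for p, q in zip(v1, v2)]                  # D = v2 - v1
    Qv1, QD = mat_vec(Q, v1), mat_vec(Q, D)
    a = dot(D, QD)                                       # a = D^T Q D
    b = dot(D, [2 * q + bi for q, bi in zip(Qv1, B)])    # b = D^T (2 Q v1 + B)
    c = dot(v1, Qv1) + dot(v1, B) + C                    # constant term
    lam = -b / (2 * a) if a > 0 else 0.0                 # minimum of a*lam^2 + b*lam + c
    lam = min(1.0, max(0.0, lam))                        # clamp to the segment
    x = [p + lam * d for p, d in zip(v1, D)]
    return lam, x, a * lam * lam + b * lam + c
```

For a quadric measuring squared distance to the origin (Q the identity, B and C zero), collapsing the edge from (-1, 0, 0) to (1, 0, 0) places the vertex at the midpoint with zero error, as expected.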

B Questionnaire on rendering

1. How smooth did you find walking through the scenes? Circle all that apply.
   Very smooth ... Very jerky

2. Were you aware of any sudden changes in the ground (pieces appearing or disappearing in an instant)? If so, how noticeable was it?
   None at all ... Extremely noticeable

3. Were you aware of any smooth changes in the ground (pieces gradually fading in or out)? If so, how noticeable was it?
   None at all ... Extremely noticeable

4. Were you aware of any sudden changes in the sky?
   None at all ... Extremely noticeable

5. Were you aware of any smooth changes in the sky?
   None at all ... Extremely noticeable

6. Were you aware of any sudden changes in the trees?
   None at all ... Extremely noticeable

7. Were you aware of any smooth changes in the trees?
   None at all ... Extremely noticeable

8. At any point did you notice the bark on the trees shifting?
   Yes / No

9. Did you notice any other problems with the way the scene was displayed? (Do not worry about the realism of the scene.)
   Yes / No

References

Angel, E. 2000. Interactive Computer Graphics: A Top-Down Approach with OpenGL. Addison Wesley Longman.

Deussen, O., Hanrahan, P., Lintermann, B., Měch, R., Pharr, M., and Prusinkiewicz, P. 1998. Realistic modeling and rendering of plant ecosystems. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques, ACM Press.

Dietrich, S. 2000. Optimizing for hardware transform and lighting. Available from developer.nvidia.com.

Dobashi, Y., Kaneda, K., Yamashita, H., Okita, T., and Nishita, T. 2000. A simple, efficient method for realistic animation of clouds. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co.

Duchaineau, M. A., Wolinsky, M., Sigeti, D. E., Miller, M. C., Aldrich, C., and Mineev-Weinstein, M. B. 1997. ROAMing terrain: real-time optimally adapting meshes. In IEEE Visualization.

Estkowski, R., Mitchell, J. S. B., and Xiang, X. 2002. Optimal decomposition of polygonal models into triangle strips. In Proceedings of the eighteenth annual symposium on Computational geometry, ACM Press.

Evans, F., Skiena, S. S., and Varshney, A. 1996. Optimizing triangle strips for fast rendering. In IEEE Visualization 96, R. Yagel and G. M. Nielson, Eds.

Everitt, C., and Kilgard, M. J. 2002. Practical and robust stenciled shadow volumes for hardware-accelerated rendering. Available from developer.nvidia.com.

Fei, G., and Wu, E. A real-time generation algorithm for progressive meshes in dynamic environments. In Proceedings of the ACM symposium on Virtual reality software and technology, ACM Press.

Garland, M., and Heckbert, P. S. 1997. Surface simplification using quadric error metrics. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co.

Garland, M., and Heckbert, P. S. 1998. Simplifying surfaces with color and texture using quadric error metrics. In Proceedings of the conference on Visualization 98, IEEE Computer Society Press.

Garland, M., and Shaffer, E. 2002. A multiphase approach to efficient surface simplification. In Proceedings of the conference on Visualization 2002.

Garland, M. 1999. Multiresolution modeling: Survey and future opportunities. In Eurographics 99 State of the Art Reports.

Hoppe, H. 1996. Progressive meshes. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, ACM Press.

Hoppe, H. 1997. View-dependent refinement of progressive meshes. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co.

Hoppe, H. 1999. New quadric metric for simplifying meshes with appearance attributes. In Proceedings of the conference on Visualization 99, IEEE Computer Society Press.

Lindholm, E., Kilgard, M. J., and Moreton, H. 2001. A user-programmable vertex engine. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, ACM Press.

Lindstrom, P., and Turk, G. 1998. Fast and memory efficient polygonal simplification. In Proceedings of the conference on Visualization 98, IEEE Computer Society Press.

Lindstrom, P., and Turk, G. 2000. Image-driven simplification. ACM Transactions on Graphics (TOG) 19, 3.

Lindstrom, P., Koller, D., Ribarsky, W., Hodges, L. F., Faust, N., and Turner, G. A. 1996. Real-time, continuous level of detail rendering of height fields. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, ACM Press.

Lorensen, W. E., and Cline, H. E. 1987. Marching cubes: A high resolution 3D surface construction algorithm. In Proceedings of the 14th annual conference on Computer graphics and interactive techniques, ACM Press.

Preetham, A. J., Shirley, P., and Smits, B. 1999. A practical analytic model for daylight. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co.

Sander, P. V., Snyder, J., Gortler, S. J., and Hoppe, H. 2001. Texture mapping progressive meshes. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, ACM Press.

Sederberg, T. W., and Parry, S. R. 1986. Free-form deformation of solid geometric models. In Proceedings of the 13th annual conference on Computer graphics and interactive techniques, ACM Press.

Southern, R., and Gain, J. 2003. Creation and control of real-time continuous level of detail on programmable graphics hardware. Computer Graphics Forum 22, 1 (March).

Vlietinck, J. 2003. Trilinear displacement mapping of a flat surface with a v1.1 vertex shader.

Wang, Y., and Molnar, S. 1994. Second-depth shadow mapping. Tech. Rep. TR94-19, UNC-CS.

Xia, J. C., and Varshney, A. 1996. Dynamic view-dependent simplification for polygonal models. In Proceedings of the conference on Visualization 96, IEEE Computer Society Press.

Zelinka, S., and Garland, M. 2002. Permission grids: practical, error-bounded simplification. ACM Transactions on Graphics (TOG) 21, 2.

Figure 29: Screenshots of test scenes. (a) Flat terrain. (b) Big hills. (c) Few trees and flat terrain. (d) Many trees and big hills. (e) Medium trees and small hills. (f) Close-up of a tree.

Figure 30: The gap between the base of a tree and its shadow, and the workaround


More information

Volume Illumination, Contouring

Volume Illumination, Contouring Volume Illumination, Contouring Computer Animation and Visualisation Lecture 0 tkomura@inf.ed.ac.uk Institute for Perception, Action & Behaviour School of Informatics Contouring Scaler Data Overview -

More information

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016 Ray Tracing Computer Graphics CMU 15-462/15-662, Fall 2016 Primitive-partitioning vs. space-partitioning acceleration structures Primitive partitioning (bounding volume hierarchy): partitions node s primitives

More information

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline sequence of operations to generate an image using object-order processing primitives processed one-at-a-time

More information

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline sequence of operations to generate an image using object-order processing primitives processed one-at-a-time

More information

Robust Stencil Shadow Volumes. CEDEC 2001 Tokyo, Japan

Robust Stencil Shadow Volumes. CEDEC 2001 Tokyo, Japan Robust Stencil Shadow Volumes CEDEC 2001 Tokyo, Japan Mark J. Kilgard Graphics Software Engineer NVIDIA Corporation 2 Games Begin to Embrace Robust Shadows 3 John Carmack s new Doom engine leads the way

More information

CHAPTER 1 Graphics Systems and Models 3

CHAPTER 1 Graphics Systems and Models 3 ?????? 1 CHAPTER 1 Graphics Systems and Models 3 1.1 Applications of Computer Graphics 4 1.1.1 Display of Information............. 4 1.1.2 Design.................... 5 1.1.3 Simulation and Animation...........

More information

Terrain rendering (part 1) Due: Monday, March 10, 10pm

Terrain rendering (part 1) Due: Monday, March 10, 10pm CMSC 3700 Winter 014 Introduction to Computer Graphics Project 4 February 5 Terrain rendering (part 1) Due: Monday, March 10, 10pm 1 Summary The final two projects involves rendering large-scale outdoor

More information

A Developer s Survey of Polygonal Simplification algorithms. CS 563 Advanced Topics in Computer Graphics Fan Wu Mar. 31, 2005

A Developer s Survey of Polygonal Simplification algorithms. CS 563 Advanced Topics in Computer Graphics Fan Wu Mar. 31, 2005 A Developer s Survey of Polygonal Simplification algorithms CS 563 Advanced Topics in Computer Graphics Fan Wu Mar. 31, 2005 Some questions to ask Why simplification? What are my models like? What matters

More information

Computer Graphics I Lecture 11

Computer Graphics I Lecture 11 15-462 Computer Graphics I Lecture 11 Midterm Review Assignment 3 Movie Midterm Review Midterm Preview February 26, 2002 Frank Pfenning Carnegie Mellon University http://www.cs.cmu.edu/~fp/courses/graphics/

More information

CS 563 Advanced Topics in Computer Graphics QSplat. by Matt Maziarz

CS 563 Advanced Topics in Computer Graphics QSplat. by Matt Maziarz CS 563 Advanced Topics in Computer Graphics QSplat by Matt Maziarz Outline Previous work in area Background Overview In-depth look File structure Performance Future Point Rendering To save on setup and

More information

Pipeline Operations. CS 4620 Lecture 10

Pipeline Operations. CS 4620 Lecture 10 Pipeline Operations CS 4620 Lecture 10 2008 Steve Marschner 1 Hidden surface elimination Goal is to figure out which color to make the pixels based on what s in front of what. Hidden surface elimination

More information

Graphics for VEs. Ruth Aylett

Graphics for VEs. Ruth Aylett Graphics for VEs Ruth Aylett Overview VE Software Graphics for VEs The graphics pipeline Projections Lighting Shading VR software Two main types of software used: off-line authoring or modelling packages

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Last Time? The Traditional Graphics Pipeline Reading for Today A Practical Model for Subsurface Light Transport, Jensen, Marschner, Levoy, & Hanrahan, SIGGRAPH 2001 Participating Media Measuring BRDFs

More information

Terrain Rendering (Part 1) Due: Thursday November 30 at 10pm

Terrain Rendering (Part 1) Due: Thursday November 30 at 10pm CMSC 23700 Autumn 2017 Introduction to Computer Graphics Project 5 November 16, 2015 Terrain Rendering (Part 1) Due: Thursday November 30 at 10pm 1 Summary The final project involves rendering large-scale

More information

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11 Pipeline Operations CS 4620 Lecture 11 1 Pipeline you are here APPLICATION COMMAND STREAM 3D transformations; shading VERTEX PROCESSING TRANSFORMED GEOMETRY conversion of primitives to pixels RASTERIZATION

More information

Rendering Grass Terrains in Real-Time with Dynamic Lighting. Kévin Boulanger, Sumanta Pattanaik, Kadi Bouatouch August 1st 2006

Rendering Grass Terrains in Real-Time with Dynamic Lighting. Kévin Boulanger, Sumanta Pattanaik, Kadi Bouatouch August 1st 2006 Rendering Grass Terrains in Real-Time with Dynamic Lighting Kévin Boulanger, Sumanta Pattanaik, Kadi Bouatouch August 1st 2006 Goal Rendering millions of grass blades, at any distance, in real-time, with:

More information

Texture. Texture Mapping. Texture Mapping. CS 475 / CS 675 Computer Graphics. Lecture 11 : Texture

Texture. Texture Mapping. Texture Mapping. CS 475 / CS 675 Computer Graphics. Lecture 11 : Texture Texture CS 475 / CS 675 Computer Graphics Add surface detail Paste a photograph over a surface to provide detail. Texture can change surface colour or modulate surface colour. Lecture 11 : Texture http://en.wikipedia.org/wiki/uv_mapping

More information

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology Point Cloud Filtering using Ray Casting by Eric Jensen 01 The Basic Methodology Ray tracing in standard graphics study is a method of following the path of a photon from the light source to the camera,

More information

CS 475 / CS 675 Computer Graphics. Lecture 11 : Texture

CS 475 / CS 675 Computer Graphics. Lecture 11 : Texture CS 475 / CS 675 Computer Graphics Lecture 11 : Texture Texture Add surface detail Paste a photograph over a surface to provide detail. Texture can change surface colour or modulate surface colour. http://en.wikipedia.org/wiki/uv_mapping

More information

Direct Rendering of Trimmed NURBS Surfaces

Direct Rendering of Trimmed NURBS Surfaces Direct Rendering of Trimmed NURBS Surfaces Hardware Graphics Pipeline 2/ 81 Hardware Graphics Pipeline GPU Video Memory CPU Vertex Processor Raster Unit Fragment Processor Render Target Screen Extended

More information

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON Deferred Rendering in Killzone 2 Michal Valient Senior Programmer, Guerrilla Talk Outline Forward & Deferred Rendering Overview G-Buffer Layout Shader Creation Deferred Rendering in Detail Rendering Passes

More information

Dominic Filion, Senior Engineer Blizzard Entertainment. Rob McNaughton, Lead Technical Artist Blizzard Entertainment

Dominic Filion, Senior Engineer Blizzard Entertainment. Rob McNaughton, Lead Technical Artist Blizzard Entertainment Dominic Filion, Senior Engineer Blizzard Entertainment Rob McNaughton, Lead Technical Artist Blizzard Entertainment Screen-space techniques Deferred rendering Screen-space ambient occlusion Depth of Field

More information

Graphics for VEs. Ruth Aylett

Graphics for VEs. Ruth Aylett Graphics for VEs Ruth Aylett Overview VE Software Graphics for VEs The graphics pipeline Projections Lighting Shading Runtime VR systems Two major parts: initialisation and update loop. Initialisation

More information

3D Rasterization II COS 426

3D Rasterization II COS 426 3D Rasterization II COS 426 3D Rendering Pipeline (for direct illumination) 3D Primitives Modeling Transformation Lighting Viewing Transformation Projection Transformation Clipping Viewport Transformation

More information

TDA362/DIT223 Computer Graphics EXAM (Same exam for both CTH- and GU students)

TDA362/DIT223 Computer Graphics EXAM (Same exam for both CTH- and GU students) TDA362/DIT223 Computer Graphics EXAM (Same exam for both CTH- and GU students) Saturday, January 13 th, 2018, 08:30-12:30 Examiner Ulf Assarsson, tel. 031-772 1775 Permitted Technical Aids None, except

More information

Terrain Rendering Research for Games. Jonathan Blow Bolt Action Software

Terrain Rendering Research for Games. Jonathan Blow Bolt Action Software Terrain Rendering Research for Games Jonathan Blow Bolt Action Software jon@bolt-action.com Lecture Agenda Introduction to the problem Survey of established algorithms Problems with established algorithms

More information

Hardware Displacement Mapping

Hardware Displacement Mapping Matrox's revolutionary new surface generation technology, (HDM), equates a giant leap in the pursuit of 3D realism. Matrox is the first to develop a hardware implementation of displacement mapping and

More information

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane Rendering Pipeline Rendering Converting a 3D scene to a 2D image Rendering Light Camera 3D Model View Plane Rendering Converting a 3D scene to a 2D image Basic rendering tasks: Modeling: creating the world

More information

Pipeline Operations. CS 4620 Lecture 14

Pipeline Operations. CS 4620 Lecture 14 Pipeline Operations CS 4620 Lecture 14 2014 Steve Marschner 1 Pipeline you are here APPLICATION COMMAND STREAM 3D transformations; shading VERTEX PROCESSING TRANSFORMED GEOMETRY conversion of primitives

More information

Introduction to Visualization and Computer Graphics

Introduction to Visualization and Computer Graphics Introduction to Visualization and Computer Graphics DH2320, Fall 2015 Prof. Dr. Tino Weinkauf Introduction to Visualization and Computer Graphics Visibility Shading 3D Rendering Geometric Model Color Perspective

More information

Shadow Techniques. Sim Dietrich NVIDIA Corporation

Shadow Techniques. Sim Dietrich NVIDIA Corporation Shadow Techniques Sim Dietrich NVIDIA Corporation sim.dietrich@nvidia.com Lighting & Shadows The shadowing solution you choose can greatly influence the engine decisions you make This talk will outline

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Final Projects Proposals due Thursday 4/8 Proposed project summary At least 3 related papers (read & summarized) Description of series of test cases Timeline & initial task assignment The Traditional Graphics

More information

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology Graphics Performance Optimisation John Spitzer Director of European Developer Technology Overview Understand the stages of the graphics pipeline Cherchez la bottleneck Once found, either eliminate or balance

More information

Computing Visibility. Backface Culling for General Visibility. One More Trick with Planes. BSP Trees Ray Casting Depth Buffering Quiz

Computing Visibility. Backface Culling for General Visibility. One More Trick with Planes. BSP Trees Ray Casting Depth Buffering Quiz Computing Visibility BSP Trees Ray Casting Depth Buffering Quiz Power of Plane Equations We ve gotten a lot of mileage out of one simple equation. Basis for D outcode-clipping Basis for plane-at-a-time

More information

Applications of Explicit Early-Z Culling

Applications of Explicit Early-Z Culling Applications of Explicit Early-Z Culling Jason L. Mitchell ATI Research Pedro V. Sander ATI Research Introduction In past years, in the SIGGRAPH Real-Time Shading course, we have covered the details of

More information

Computer Graphics. Shadows

Computer Graphics. Shadows Computer Graphics Lecture 10 Shadows Taku Komura Today Shadows Overview Projective shadows Shadow texture Shadow volume Shadow map Soft shadows Why Shadows? Shadows tell us about the relative locations

More information

Per-pixel Rendering of Terrain Data

Per-pixel Rendering of Terrain Data Per-pixel Rendering of Terrain Data Taek Sang Jeong and JungHyun Han Department of Computer Science and Engineering, Korea University, Korea Abstract. This paper presents a novel approach to terrain rendering,

More information

https://ilearn.marist.edu/xsl-portal/tool/d4e4fd3a-a3...

https://ilearn.marist.edu/xsl-portal/tool/d4e4fd3a-a3... Assessment Preview - This is an example student view of this assessment done Exam 2 Part 1 of 5 - Modern Graphics Pipeline Question 1 of 27 Match each stage in the graphics pipeline with a description

More information

Blue colour text questions Black colour text sample answers Red colour text further explanation or references for the sample answers

Blue colour text questions Black colour text sample answers Red colour text further explanation or references for the sample answers Blue colour text questions Black colour text sample answers Red colour text further explanation or references for the sample answers Question 1. a) (5 marks) Explain the OpenGL synthetic camera model,

More information

CEng 477 Introduction to Computer Graphics Fall 2007

CEng 477 Introduction to Computer Graphics Fall 2007 Visible Surface Detection CEng 477 Introduction to Computer Graphics Fall 2007 Visible Surface Detection Visible surface detection or hidden surface removal. Realistic scenes: closer objects occludes the

More information

Real Time Rendering of Expensive Small Environments Colin Branch Stetson University

Real Time Rendering of Expensive Small Environments Colin Branch Stetson University Real Time Rendering of Expensive Small Environments Colin Branch Stetson University Abstract One of the major goals of computer graphics is the rendering of realistic environments in real-time. One approach

More information

Interactive Computer Graphics A TOP-DOWN APPROACH WITH SHADER-BASED OPENGL

Interactive Computer Graphics A TOP-DOWN APPROACH WITH SHADER-BASED OPENGL International Edition Interactive Computer Graphics A TOP-DOWN APPROACH WITH SHADER-BASED OPENGL Sixth Edition Edward Angel Dave Shreiner Interactive Computer Graphics: A Top-Down Approach with Shader-Based

More information

Computer Graphics. Bing-Yu Chen National Taiwan University

Computer Graphics. Bing-Yu Chen National Taiwan University Computer Graphics Bing-Yu Chen National Taiwan University Visible-Surface Determination Back-Face Culling The Depth-Sort Algorithm Binary Space-Partitioning Trees The z-buffer Algorithm Scan-Line Algorithm

More information

CS 4620 Program 3: Pipeline

CS 4620 Program 3: Pipeline CS 4620 Program 3: Pipeline out: Wednesday 14 October 2009 due: Friday 30 October 2009 1 Introduction In this assignment, you will implement several types of shading in a simple software graphics pipeline.

More information

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

Building scalable 3D applications. Ville Miettinen Hybrid Graphics Building scalable 3D applications Ville Miettinen Hybrid Graphics What s going to happen... (1/2) Mass market: 3D apps will become a huge success on low-end and mid-tier cell phones Retro-gaming New game

More information

Lets assume each object has a defined colour. Hence our illumination model is looks unrealistic.

Lets assume each object has a defined colour. Hence our illumination model is looks unrealistic. Shading Models There are two main types of rendering that we cover, polygon rendering ray tracing Polygon rendering is used to apply illumination models to polygons, whereas ray tracing applies to arbitrary

More information

TSBK03 Screen-Space Ambient Occlusion

TSBK03 Screen-Space Ambient Occlusion TSBK03 Screen-Space Ambient Occlusion Joakim Gebart, Jimmy Liikala December 15, 2013 Contents 1 Abstract 1 2 History 2 2.1 Crysis method..................................... 2 3 Chosen method 2 3.1 Algorithm

More information

Progressive Mesh. Reddy Sambavaram Insomniac Games

Progressive Mesh. Reddy Sambavaram Insomniac Games Progressive Mesh Reddy Sambavaram Insomniac Games LOD Schemes Artist made LODs (time consuming, old but effective way) ViewDependentMesh (usually used for very large complicated meshes. CAD apps. Probably

More information

Simple Silhouettes for Complex Surfaces

Simple Silhouettes for Complex Surfaces Eurographics Symposium on Geometry Processing(2003) L. Kobbelt, P. Schröder, H. Hoppe (Editors) Simple Silhouettes for Complex Surfaces D. Kirsanov, P. V. Sander, and S. J. Gortler Harvard University Abstract

More information

Visible-Surface Detection Methods. Chapter? Intro. to Computer Graphics Spring 2008, Y. G. Shin

Visible-Surface Detection Methods. Chapter? Intro. to Computer Graphics Spring 2008, Y. G. Shin Visible-Surface Detection Methods Chapter? Intro. to Computer Graphics Spring 2008, Y. G. Shin The Visibility Problem [Problem Statement] GIVEN: a set of 3-D surfaces, a projection from 3-D to 2-D screen,

More information

Advanced Lighting Techniques Due: Monday November 2 at 10pm

Advanced Lighting Techniques Due: Monday November 2 at 10pm CMSC 23700 Autumn 2015 Introduction to Computer Graphics Project 3 October 20, 2015 Advanced Lighting Techniques Due: Monday November 2 at 10pm 1 Introduction This assignment is the third and final part

More information

Soft shadows. Steve Marschner Cornell University CS 569 Spring 2008, 21 February

Soft shadows. Steve Marschner Cornell University CS 569 Spring 2008, 21 February Soft shadows Steve Marschner Cornell University CS 569 Spring 2008, 21 February Soft shadows are what we normally see in the real world. If you are near a bare halogen bulb, a stage spotlight, or other

More information

Volume Illumination & Vector Field Visualisation

Volume Illumination & Vector Field Visualisation Volume Illumination & Vector Field Visualisation Visualisation Lecture 11 Institute for Perception, Action & Behaviour School of Informatics Volume Illumination & Vector Vis. 1 Previously : Volume Rendering

More information

Graphics (Output) Primitives. Chapters 3 & 4

Graphics (Output) Primitives. Chapters 3 & 4 Graphics (Output) Primitives Chapters 3 & 4 Graphic Output and Input Pipeline Scan conversion converts primitives such as lines, circles, etc. into pixel values geometric description a finite scene area

More information

CS4620/5620: Lecture 14 Pipeline

CS4620/5620: Lecture 14 Pipeline CS4620/5620: Lecture 14 Pipeline 1 Rasterizing triangles Summary 1! evaluation of linear functions on pixel grid 2! functions defined by parameter values at vertices 3! using extra parameters to determine

More information

Scene Management. Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development

Scene Management. Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development Chap. 5 Scene Management Overview Scene Management vs Rendering This chapter is about rendering

More information

Introduction Rasterization Z-buffering Shading. Graphics 2012/2013, 4th quarter. Lecture 09: graphics pipeline (rasterization and shading)

Introduction Rasterization Z-buffering Shading. Graphics 2012/2013, 4th quarter. Lecture 09: graphics pipeline (rasterization and shading) Lecture 9 Graphics pipeline (rasterization and shading) Graphics pipeline - part 1 (recap) Perspective projection by matrix multiplication: x pixel y pixel z canonical 1 x = M vpm per M cam y z 1 This

More information

Mesh Simplification. Mesh Simplification. Mesh Simplification Goals. Mesh Simplification Motivation. Vertex Clustering. Mesh Simplification Overview

Mesh Simplification. Mesh Simplification. Mesh Simplification Goals. Mesh Simplification Motivation. Vertex Clustering. Mesh Simplification Overview Mesh Simplification Mesh Simplification Adam Finkelstein Princeton University COS 56, Fall 008 Slides from: Funkhouser Division, Viewpoint, Cohen Mesh Simplification Motivation Interactive visualization

More information

POWERVR MBX. Technology Overview

POWERVR MBX. Technology Overview POWERVR MBX Technology Overview Copyright 2009, Imagination Technologies Ltd. All Rights Reserved. This publication contains proprietary information which is subject to change without notice and is supplied

More information

Physically-Based Laser Simulation

Physically-Based Laser Simulation Physically-Based Laser Simulation Greg Reshko Carnegie Mellon University reshko@cs.cmu.edu Dave Mowatt Carnegie Mellon University dmowatt@andrew.cmu.edu Abstract In this paper, we describe our work on

More information

Multiresolution model generation of. texture-geometry for the real-time rendering 1

Multiresolution model generation of. texture-geometry for the real-time rendering 1 Multiresolution model generation of texture-geometry for the real-time rendering 1 Contents Contents...i Figures...iv 1. Introduction...1 1.1. Real-time rendering for complex object...1 1.2. Background...3

More information

Abstract. 2 Description of the Effects Used. 1 Introduction Phong Illumination Bump Mapping

Abstract. 2 Description of the Effects Used. 1 Introduction Phong Illumination Bump Mapping Developing a Real-Time Renderer With Optimized Shadow Volumes Mátyás Premecz (email: pmat@freemail.hu) Department of Control Engineering and Information Technology, Budapest University of Technolgy Hungary

More information

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T Copyright 2018 Sung-eui Yoon, KAIST freely available on the internet http://sglab.kaist.ac.kr/~sungeui/render

More information

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015 Enhancing Traditional Rasterization Graphics with Ray Tracing March 2015 Introductions James Rumble Developer Technology Engineer Ray Tracing Support Justin DeCell Software Design Engineer Ray Tracing

More information

CS452/552; EE465/505. Clipping & Scan Conversion

CS452/552; EE465/505. Clipping & Scan Conversion CS452/552; EE465/505 Clipping & Scan Conversion 3-31 15 Outline! From Geometry to Pixels: Overview Clipping (continued) Scan conversion Read: Angel, Chapter 8, 8.1-8.9 Project#1 due: this week Lab4 due:

More information

Computer Graphics Lecture 11

Computer Graphics Lecture 11 1 / 14 Computer Graphics Lecture 11 Dr. Marc Eduard Frîncu West University of Timisoara May 15th 2012 2 / 14 Outline 1 Introduction 2 Transparency 3 Reflection 4 Recap 3 / 14 Introduction light = local

More information

Ray Tracer Due date: April 27, 2011

Ray Tracer Due date: April 27, 2011 Computer graphics Assignment 4 1 Overview Ray Tracer Due date: April 27, 2011 In this assignment you will implement the camera and several primitive objects for a ray tracer, and a basic ray tracing algorithm.

More information

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Executive Summary The NVIDIA Quadro2 line of workstation graphics solutions is the first of its kind to feature hardware support for

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

CSE 167: Introduction to Computer Graphics Lecture #18: More Effects. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2016

CSE 167: Introduction to Computer Graphics Lecture #18: More Effects. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2016 CSE 167: Introduction to Computer Graphics Lecture #18: More Effects Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2016 Announcements TA evaluations CAPE Final project blog

More information

Point based global illumination is now a standard tool for film quality renderers. Since it started out as a real time technique it is only natural

Point based global illumination is now a standard tool for film quality renderers. Since it started out as a real time technique it is only natural 1 Point based global illumination is now a standard tool for film quality renderers. Since it started out as a real time technique it is only natural to consider using it in video games too. 2 I hope that

More information

Computer Science 426 Midterm 3/11/04, 1:30PM-2:50PM

Computer Science 426 Midterm 3/11/04, 1:30PM-2:50PM NAME: Login name: Computer Science 46 Midterm 3//4, :3PM-:5PM This test is 5 questions, of equal weight. Do all of your work on these pages (use the back for scratch space), giving the answer in the space

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo Computer Graphics Bing-Yu Chen National Taiwan University The University of Tokyo Hidden-Surface Removal Back-Face Culling The Depth-Sort Algorithm Binary Space-Partitioning Trees The z-buffer Algorithm

More information

3D Programming. 3D Programming Concepts. Outline. 3D Concepts. 3D Concepts -- Coordinate Systems. 3D Concepts Displaying 3D Models

3D Programming. 3D Programming Concepts. Outline. 3D Concepts. 3D Concepts -- Coordinate Systems. 3D Concepts Displaying 3D Models 3D Programming Concepts Outline 3D Concepts Displaying 3D Models 3D Programming CS 4390 3D Computer 1 2 3D Concepts 3D Model is a 3D simulation of an object. Coordinate Systems 3D Models 3D Shapes 3D Concepts

More information

C P S C 314 S H A D E R S, O P E N G L, & J S RENDERING PIPELINE. Mikhail Bessmeltsev

C P S C 314 S H A D E R S, O P E N G L, & J S RENDERING PIPELINE. Mikhail Bessmeltsev C P S C 314 S H A D E R S, O P E N G L, & J S RENDERING PIPELINE UGRAD.CS.UBC.C A/~CS314 Mikhail Bessmeltsev 1 WHAT IS RENDERING? Generating image from a 3D scene 2 WHAT IS RENDERING? Generating image

More information

CS 130 Final. Fall 2015

CS 130 Final. Fall 2015 CS 130 Final Fall 2015 Name Student ID Signature You may not ask any questions during the test. If you believe that there is something wrong with a question, write down what you think the question is trying

More information

Optimisation. CS7GV3 Real-time Rendering

Optimisation. CS7GV3 Real-time Rendering Optimisation CS7GV3 Real-time Rendering Introduction Talk about lower-level optimization Higher-level optimization is better algorithms Example: not using a spatial data structure vs. using one After that

More information

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology Spatial Data Structures and Speed-Up Techniques Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology Spatial data structures What is it? Data structure that organizes

More information

CSE 167: Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

CSE 167: Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012 CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012 Announcements Homework project #2 due this Friday, October

More information

Graphics and Interaction Rendering pipeline & object modelling

Graphics and Interaction Rendering pipeline & object modelling 433-324 Graphics and Interaction Rendering pipeline & object modelling Department of Computer Science and Software Engineering The Lecture outline Introduction to Modelling Polygonal geometry The rendering

More information

9. Illumination and Shading

9. Illumination and Shading 9. Illumination and Shading Approaches for visual realism: - Remove hidden surfaces - Shade visible surfaces and reproduce shadows - Reproduce surface properties Texture Degree of transparency Roughness,

More information