Practical logarithmic rasterization for low-error shadow maps

Graphics Harware (2007) Timo Aila an Mark Segal (Eitors) Practical logarithmic rasterization for low-error shaow maps D. Branon Lloy 1 Naga K. Govinaraju 2 Steven E. Molnar Dinesh Manocha 1 1 University of North Carolina at Chapel Hill 2 Microsoft Corporation NVIDIA Corporation Abstract Logarithmic shaow maps can eliver the same quality as competing shaow map algorithms with substantially less storage an banwith. We show how current GPU architectures can be moifie incrementally to support renering of logarithmic shaow maps at current GPU fill rates. Specifically, we moify the rasterizer to support renering to a nonuniform gri with the same watertight rasterization properties as current rasterizers. We also escribe a epth compression scheme to hanle the nonlinear primitives prouce by logarithmic rasterization. Our propose architecture enhancements align with current trens of ecreasing cost for on-chip computation relative to off-chip banwith an storage. For a moest increase in computation, logarithmic rasterization can greatly reuce shaow map banwith an storage costs. Categories an Subject Descriptors (accoring to ACM CCS): I.. [Computer Graphics]: Harware Architecture- Graphics processors, I..7 [Computer Graphics]: Color, shaing, shaowing, an texture 1. Introuction Shaows are an important element in interactive D applications. Unfortunately, shaow renering can require consierable computation resources. In some cases shaow renering can take up to one half of the time to rener a frame [Sho0]. The importance of shaows has le venors to introuce harware features an optimizations specifically targete to reuce the cost of shaow renering [NVI04]. In this paper we present a harware enhancement to support high-quality shaow map renering. Shaows in toay s interactive applications are typically renere using stencil shaow volumes [Cro77] or shaow maps [Wil78]. These algorithms capitalize on the enormous fill rates of current GPUs. For example, the GeForce 8800 generates 192 samples per clock for z-only renering [NVI06]. At the GPU s clock rate of 575 MHz this amounts to an astouning fill rate of 110.4 billion pixels per secon. These fill rates are obtaine by using rasterization in conjunction with epth buffer compression. Rasterization makes heavy use of parallelism an exploits a high egree of coherence both in computation an memory access. Compression provies more efficient use of available memory banwith. Current GPUs also have high-spee memory interfaces. Nevertheless, memory banwith can still be a limiting factor for the performance of shaow volumes an shaow maps. Shaow maps are an attractive algorithm because they are a flexible, image-base approach an can easily hanle complex, ynamic scenes. However, shaow maps must use high resolution to avoi aliasing artifacts. High resolution shaow maps limit performance because of banwith bottlenecks an increase contention for limite GPU memory. In this paper, we buil on recent work that highlights the importance of a logarithmic shaow map parameterization for reucing aliasing [WSP04, LTYM06, ZSXL06]. The recently propose logarithmic perspective shaow map (Log- PSM) algorithm [LGQ 07] combines a logarithmic parameterization with a perspective projection. Potentially, Log- PSMs coul have the same goo performance as competing algorithms, while requiring significantly lower banwith an storage. The main rawback is that the logarithmic rasterization require to support LogPSMs is not available on current GPUs an simulating it is too slow for interactive applications. Main results: We propose the following incremental enhancements to graphics harware to support logarithmic rasterization at fill rates comparable to the linear rasterization on current GPUs: Rasterization on a nonuniform gri: We exten exist- Copyright c 2007 by the Association for Computing Machinery, Inc. Permission to make igital or har copies of part or all of this work for personal or classroom use is grante without fee provie that copies are not mae or istribute for commercial avantage an that copies bear this notice an the full citation on the first page. Copyrights for components of this work owne by others than ACM must be honore. Abstracting with creit is permitte. To copy otherwise, to republish, to post on servers, or to reistribute to lists, requires prior specific permission an/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail permissions@acm.org. GH 2007, San Diego, California, August 04-05, 2007 c 2007 ACM 978-1-5959-625-7/07/0008 $ 5.00

B. Lloy, N. Govinaraju, S. Molnar, & D. Manocha / Practical logarithmic rasterization for low-error shaow maps ing rasterizers that sample on a uniform regular gri to rasterize on a regular gri with nonuniform spacing in one irection. Generalize polygon offset: Polygon offset is use with shaow maps to avoi self-shaowing artifacts. The polygon offset on current GPUs is constant over a primitive. For logarithmic rasterization the offset varies linearly in one irection. A new epth compression scheme: Depth compression is important for reucing memory banwith. Existing epth compression techniques exploit the fact that primitives are planar. These techniques fail for logarithmic rasterization, which causes planar primitives to become curve. We present a epth compression scheme that is better suite for logarithmic rasterization, which, in practice, also works well for linear rasterization. The primary avantage of our propose enhancements is that they capitalize on existing harware esigns. High performance GPUs are the prouct of years of careful tuning an optimization. Our low-cost enhancements enable significant improvement over existing shaow map algorithms without compromising previous capabilities. Logarithmic rasterization requires a moest amount of aitional computational power an prouces significant banwith an storage savings. Therefore these enhancements align well with current harware trens of ecreasing cost for on-chip computation an a relatively high cost for off-chip banwith. The rest of the paper is organize as follows. Section 2 escribes relate work. Section provies an overview of the LogPSM algorithm. Section 4 escribes the harware enhancements require to support logarithmic rasterization. Section 5 presents results an analysis, an conclues with ieas for future work. 2. Relate work The unerlying cause of shaow map aliasing is a mismatch between the sample locations use for renering the image from the eye an the samples use for renering the shaow map from the light [JMB04, AL04]. Ray tracing avois this problem completely because eye an light samples coincie exactly. Despite consierable recent progress, interactive ray tracing at high resolution for eformable scenes remains a challenge. Harware accelerators for ray tracing have been propose [WSS05], but no mass-market solution is currently available. The irregular z-buffer architecture [JLBM05] can also rener a shaow map with samples corresponing to those taken by the eye. However, irregular sampling requires a spatial ata structure to ientify nearest samples. The spatial ata structure increases storage costs, increases memory banwith when accessing or upating samples, an ecreases the locality of memory accesses. Implementing the algorithm with performance comparable to other shaow map algorithms is challenging. Several techniques that can be use on toay s GPUs reuce aliasing by creating a better match between the light an eye sampling rates. These techniques reistribute shaow map samples either by partitioning the shaow map into a collection of smaller, inepenent shaow maps aapte to the local sampling rate [TQJN99, FFBG01, Arv04, ZSXL06, LSO07, GW07], or by applying a global warping function [SD02, WSP04, MT04]. The partitioning techniques ten to be slow because of the large number of rener passes they require. Warping techniques are fast an fairly simple, but because the warping is performe with projective transformations that poorly approximate the eye sample istribution, they still require high resolution to control the resulting errors. Shaow volumes [Cro77] can robustly rener error-free har shaows on toay s GPUs, but they consume an enormous amount of fill rate an o not scale well with geometric complexity.. Logarithmic perspective shaow maps In this section we provie an overview of the logarithmic perspective shaow map (LogPSM) algorithm an highlight its avantages over existing techniques. We refer the intereste reaer to [LGQ 07] for a etaile erivation, escription, an analysis of the algorithm..1. Algorithm LogPSMs use a small number of warpe shaow maps to prouce low-error shaow maps for both point an irectional light sources. The LogPSM algorithm procees as follows: 1. Face partitioning an perspective warping. This step is essentially equivalent to Kozlov s perspective warpe cube map [Koz04]. In the camera s post-perspective space, the view frustum becomes a cube. Kozlov fits a shaow map to each face of this cube that faces away from the light. In worl space this correspons to a perspective warping of the shaow maps for the sies of the view frustum. The shaow map is oriente so that its y axis is aligne with the irection along the length of the sie face (as in Figure 1). 2. Balance resolution allocation. The available shaow map resolution is allocate across the partitions using an aliasing error metric so that maximum errors of the partitions are equalize an the overall maximum error is minimize.. Shaow map renering. Each shaow map is renere separately. The perspective portion of the LogPSM is applie using a stanar 4 4 matrix. The logarithmic part is applie only to the y coorinate of the sie faces using the following equation: c 0 = y = F(y) = c 0 log(c 1 y+1) (1) 1 log( f/n), c 1 = 1 (f/n) ( f/n) where the y coorinate lies in the range [0,1], an n an f are the near an far plane istances of the view frustum. 4. Image renering. During image renering a fragment program is use to sample from the appropriate shaow map for each fragment. (2)

B. Lloy, N. Govinaraju, S. Molnar, & D. Manocha / Practical logarithmic rasterization for low-error shaow maps vertex processor y clipping x (a) (b) (c) rasterizer setup fragment processor.2. Avantages of LogPSMs over other algorithms Of the many shaow map algorithms, those that achieve high performance on toay s harware are base on perspective warping [SD02, WSP04, MT04, Koz04] an/or are variants of cascae shaow maps [TQJN99, ZSXL06, Eng07]. Perspective warping algorithms have two main rawbacks that are aresse by LogPSMs. The first rawback is that a perspective parameterization still leas to high error. Analysis by Lloy et al. [LTYM06] showe that the error for a perspective parameterization is O( f/n). This is an improvement over the O(( f/n) 2 ) error of a stanar shaow map, but not nearly as goo as the O(log( f/n)) of a logarithmic parameterization. The secon rawback is that the single shaow map algorithms [SD02, WSP04, MT04] revert back to stanar shaow maps when the light irection approaches the optical axis of the view frustum. This leas to a huge variation in error as the light moves. The face partitioning of Kozlov s algorithm [Koz04] solves this problem but can occasionally cause shearing artifacts on shaow eges. Log- PSMs correct for these by splitting sie faces that have a large angle between their eges (as seen from the light) an by rotating the perspective projection applie to each half (see [LGQ 07] for more etails). The cascae shaow map (CSM) approaches partition the view frustum along its optical axis. These partitions are a iscrete approximation of a logarithmic parameterization. To approach the same quality as LogPSMs in scenes with a high epth ratio, CSMs require a large number of partitions. For omniirectional point lights, multiple sets of CSMs are require, leaing to even more partitions. Renering a large number of partitions egraes performance. LogPSMs, on the other han, require an average of about 4 partitions for irectional lights an 6 for point lights insie the view frusmemory interface () (e) (f) alpha, stencil, & epth tests blening epth compression color compression Figure 1: Comparison of various parameterizations. (a) In this view from an overhea irectional light we see that the spacing between eye samples in the view frustum increases linearly in y. (b) A stanar shaow map prouces a poor match for the eye sample istribution near the viewer. (c) Perspective projections can obtain a goo fit in x at the expense of poor fit in y [SD02], or () a better fit in y at the expense of the fit in x [WSP04]. (e) Cascae shaow maps are a iscrete approximation of (f) the continuous logarithmic perspective parameterization. The main ifferences between the stanar shaow maps an the LogPSM algorithm are the parameterization an the use of more than one shaow map. Figure 2: Graphics pipeline. We propose moifications to the rasterizer an epth compression units. Note that both compression units perform ecompression as well. tum. For the same number of shaow maps as LogPSMs, CSMs have significantly higher error. (See Figure 7 for comparison of LogPSMs with other algorithms). 4. Harware enhancements for logarithmic rasterization The perspective part of the LogPSM parameterization can be hanle using the stanar graphics pipeline. The logarithmic part, however, requires logarithmic rasterization which causes planar primitives to become curve. In this section we escribe the incremental harware enhancements that allow LogPSMs to be renere with the same fill rates as competing shaow map algorithms, but with the benefits of reuce storage an banwith. We moify just two fixe-function harware components the rasterizer an the epth compression. After briefly reviewing rasterization base on ege equations, we escribe our moifications. 4.1. Rasterizing with ege equations The rasterizer performs two main functions. First, it etermines which pixels are covere by a primitive. Secon, it linearly interpolates attributes from the vertices to the covere pixels. Coverage etermination on moern, highperformance graphics harware is typically performe using implicit ege equations of the general form: E(x,y) = Ax+By+C. () The sign of E at a point (x,y) inicates on which sie of the ege the point lies. Points insie a convex primitive lie on the positive sie of all its eges. One of the main avantages of rasterizing with ege equations is that the evaluation of E over many pixels is easy to parallelize. Current GPUs evaluate ege equations for a tile of samples (e.g. a 4 4 block) in parallel. To make the most efficient use of harware, GPUs perform a hierarchical traversal of the image. A coarse stage ientifies tiles that are at least partially covere by a primitive. The tiles are traverse in a manner that is favorable for page memories an maximizes cache coherence [Pin88, MM00, MWM01]. A fine stage then computes coverage for iniviual samples within a tile. Attributes for the pixels, such as color, epth, an texture coorinates, can be linearly interpolate from the vertices using an equation of the same form as Equation. Olano an Greer [OG97] show

B. Lloy, N. Govinaraju, S. Molnar, & D. Manocha / Practical logarithmic rasterization for low-error shaow maps Figure : Transformation pipeline for LogPSMs. (Left) A worl space view of a line (blue) an a sie face of the view frustum (black). (Mile) After the perspective transformation the sie face has been become a square. (Right) The logarithmic transformation causes the line to become curve. The uniform sample gri (re) in the log warpe space correspons to a nonuniform rectilinear gri in the linear post-perspective space. how to compute the coefficients for these equations. The coefficients are compute in the setup phase of the rasterizer. The ege equations are typically compute in fixe point. The conversion of vertex positions from floating point to fixe point can be thought of as snapping the vertices to iscrete coorinates on a uniform gri. With iscrete coorinates the ege equations can be evaluate at gri locations exactly by using a sufficient number of bits to avoi truncation. This provies a water-tight rasterization without the ouble-hit or missing pixels that can result from numerical robustness problems. 4.2. Logarithmic rasterization Logarithmic rasterization can be thought of as rasterizing curve primitives on a uniform gri, or as rasterizing linear primitives on a gri that is nonuniform in only one imension (see Figure ). We aopt the latter view to facilitate the explanation of how linear rasterization can be extene to support logarithmic rasterization. Our approach is to snap the nonuniform gri samples to positions on an unerlying uniform gri that is fine enough to istinctly represent the samples. The ege equations can then be evaluate exactly using fixe-point arithmetic to provie water-tight rasterization. We will now escribe our moifie ege equations, how to compute coverage with them, the amount of precision require to evaluate them, an a generalize polygon offset for hanling self shaowing artifacts. 4.2.1. Moifie ege an interpolation equations The ege equations for logarithmic rasterization can be compute by transforming y in the warpe space back to y in the linear space using the inverse of Eq. 1 an composing the result with Eq. : E (x,y ) = E(x,G(y )) = Ax+BG(y )+C (4) G(y ) = F 1 (y ) = e(y /c 0) 1 c 1 = ( f/n)(1 (f/n) y ). ( f/n) 1 (5) The interpolation equations are hanle similarly. Note that the coefficients of these equations are base on the vertex positions in linear space resulting from the perspective part Figure 4: Stanar an logarithmic shaow maps. A uniform gri is superimpose on a stanar shaow map (left) to show the warping from the logarithmic parameterization (right). Note that straight lines become curve of the LogPSM parameterization. Rasterization algorithms that initialize tile traversal using vertex positions can apply Equation 1 to the y coorinate of the vertices uring triangle setup. Several algorithms use uring rasterization rely on the fact that quantities compute at tile corners provie conservative bouns for the those quantities over the whole tile. The values of the ege equations at the tile corners are use to steer tile traversal. Conservative epth bouns obtaine from tile corners are use for culling optimizations such as the hierarchical z-buffer or z-min an z-max culling. The warpe ege an interpolation equations remain monotonic so values at tile corners still provie conservative bouns. This means that algorithms that rely on this property can be use for logarithmic rasterization without moification. 4.2.2. Coverage etermination Coverage for a tile sample coul be compute brute-force by an array of ege equation evaluators for each sample in the tile. We conserve ie area, however, by exploiting the linearity of E in the x imension to compute the ege equations incrementally: E (x 0 + x,y ) = E (x 0,y )+A x. (6) We first perform a full evaluation of the ege equations in parallel for the samples in the first column of the tile. We then compute the values for the remaining columns in parallel by simply aing a constant to the first column. These calculations can be pipeline, so that a sustaine rate of one 4 4 tile of samples can be compute per clock. A few cycles of ownstream buffering allow the coverage bits for each tile to be aggregate together an presente broasie to ownstream units. The extra latency incurre by this approach is small compare to the epth of current GPU pipelines. Because the logarithmic rasterization moe is expecte to be use only for shaow maps, issues regaring sample placement for multisample antialiasing are not a concern. Several other possibilities exist for evaluating the ege equations. One possibility is to also compute the G(y ) term incrementally along the first column. After a full evaluation of G(y ) for the first row, y = y 0, the values on subsequent rows k coul be compute in parallel with a multiply-a

B. Lloy, N. Govinaraju, S. Molnar, & D. Manocha / Practical logarithmic rasterization for low-error shaow maps operation: G(y 0 + k y ) = G(y 0)P k + Q k (7) P k = ( f/n) k y, Q k = ( f/n) (1 (f/n) k y ). (8) ( f/n) 1 The constants P k an Q k epen only on the parameterization. Computing G(y ) for the first column incrementally is somewhat more involve because it requires an extra step an creates extra latency. Another possibility is to precompute the values of G(y ) at each scanline an store them in a table. These values epen only on the parameterization an coul be use for all primitives. The table coul possibly be constructe using floating point harware alreay available for vertex/fragment processors. The evaluation of the ege equations for the first column in a tile woul then involve only a table look-up an a ot prouct. Because tiles are traverse in a preefine orer, table entries use by a tile coul be prefetche. The table woul have to be quite large for a high resolution buffer. The size coul be reuce somewhat by precomputing G(y ) only for the first row of a tile an computing the other rows incrementally. 4.2.. Precision requirements To evaluate the ege equations exactly we require fixepoint G(y ) values. We must use at least enough fixepoint precision that any two subsequent values of G(y ) an G(y + y ) can be represente istinctly. In other wors, the least significant bit shoul represent a quantity no larger than G min = min y G(y ) G(y + y ). Given that the range of G is [0,1], the minimum number of bits require is: b min = log 2 (1/ G min ). (9) Because G min epens on the far to near plane epth ratio ( f/n), using a fixe number of bits will place constraints on the range of epth ratios that can be renere accurately. Suppose we esire to rasterize a 2 bx 2 by buffer. The istance between uniform samples in y [0,1] is y = 1/2 by = 2 by. From Figure b we can see that the resulting nonuniform spacing in y = G(y ) is minimal at y = 1. Thus G min = G(1) G(1 2 by ). Plugging this value into Equation 9 yiels an expression that can be boune fairly tightly for ( f/n) [1,10 5 ] as: b min < b y + 0.8log 2 ( f/n). (10) The vertices use to setup the ege equations are limite to the 24 bit precision of a 2-bit floating point mantissa. Therefore we choose to represent fixe-point y coorinates an G(y ) values with 24 bits. Plugging b min = 24 in Equation 10 we get an expression for the largest ( f/n) that can be hanle accurately: ( f/n) max = 2 (24 by) 0.8. (11) For a 4K 4K buffer b y = 12 an ( f/n) max. 10 4. This epth ratio is quite large. For view frusta with even higher epth ratios the sie faces may be split to reuce the epth ratios of the iniviual pieces. Stanar rasterization typically reners to a sub-pixel gri for increase accuracy. Aing sub-pixel resolution in y effectively increases b y, which woul impose further limitations on ( f/n) max. However, this may not be necessary because our approach alreay provies sub-pixel resolution in y. G increases linearly from G min at y = 1 to ( f/n) G min at y = 0. When the epth ratio is equal to ( f/n) max, the regions near the viewer enjoy an ample amount of sub-pixel resolution. While less sub-pixel resolution is available for the istant regions, this may not be a problem because the istant regions ten to be less important. For low epth ratios, sub-pixel resolution is available for both the near an istant regions. 4.2.4. Attribute interpolation Attribute interpolation uses planar equations of the same form as E. The interpolation equations can be evaluate in the same way as the ege equations. The only ifference is that they nee not be evaluate exactly, so floating point or lower precision fixe-point representations may be use. We moify the epth interpolation slightly to hanle polygon offset as escribe in the following section. 4.2.5. Generalize polygon offset Polygon offset is often use to generate a epth bias use to prevent false self-shaowing. The stanar offset is sm z + u, where s is a scale factor, m z is the epth slope of the polygon, an u is a constant. The OpenGL Specification [SA06] efines an approximation of the epth slope m z as: ( ) z m z = max x, z y. (12) The polygon offset can be compute in setup an fole into the epth interpolation equation. The epth slope is compute in linear space so we compute the value of the warpe space z y in linear space using the chain rule: z y = z y y y = z y c 1y+1 c 0 c. (1) 1 The (c 1 y+1)/(c 0 c 1 ) term ecreases with y. Unlike linear rasterization, m z may switch from z/ y to z/ x over a polygon. One way to hanle this is to use two epth interpolation equations, each combine with the polygon offset for z/ x an z/ y, respectively. The max operator coul then be performe per pixel. Another possibility is to split the polygon along the scan line where the switch occurs. This can be one by aing one more ege equation. If ege equations are use to implement the viewport or scissor operations, one of these may be use. Each polygon half is rasterize with the appropriate epth interpolation equation. The simplest approach, however, is to compute m z0, the value of m z at the lower y-boun of the polygon y 0, an m z1, the value of m z at the upper boun y 1, an use a single equation m z(y) that varies linearly between them: m z(y) = m z0(y1 y)+m z1 (y y0) (y 1 y 0 ) (14)

B. Lloy, N. Govinaraju, S. Molnar, & D. Manocha / Practical logarithmic rasterization for low-error shaow maps z 0 z 0 a 0 (a) (b) (c) Figure 5: Depth buffer compression algorithm for logarithmic rasterization. (a) First-orer ifferentials are compute in y an x by subtracting the epth value of the neighbor immeiately above or to the left, respectively. (b) This is followe by a variation of anchor encoing. For the pink region, the anchor a 0 an its y offset form a line use as a preictor, an is a correction term. In the green region a 1, x, an y values form a preicting plane. A ifferent y is use for encoing each row. (c) Bit allocation for each quantity. The offset compute using this equation can be fole into a single epth interpolation equation. When no switch occurs on the polygon, Eq. 14 gives the correct result. When there is a switch, it provies a conservative approximation. The maximum relative ifference in the approximation is largest when y 1 y 0 = 1 with a value of about 1, but for shorter extents the ifference is far less. Note that for paths through our test scenes a switch actually occurs for less than 1% of the polygons. For those polygons with a switch the mean of the maximum relative ifference from using Eq. 14 is less than.001. 4.. Depth buffer compression Depth buffer compression is an important optimization for reucing the memory banwith requirements of rasterization. We refer the reaer to Hasselgren an Möller [HA06] for an excellent survey an etaile explanations of existing epth compression schemes. Many of the current algorithms exploit the planarity of a tile s epth values to achieve fast, lossless compression. Because logarithmic rasterization causes planar primitives to become curve, many of the existing algorithms fail to achieve goo compression ratios. We compose two existing first-orer schemes to prouce a secon orer compression scheme that is better able to capture the curvature in the y irection. The algorithm is summarize in Figure 5. First, we store the base value in the upper-left corner of the tile at full 24-bit precision. We compute first orer ifferentials for the remaining values [DMFW02]. For the green 4 block on the right, the ifferentials shoul be fairly small since the ege equations are linear in x. We then use a variation of an anchor encoing scheme [VM05]. Delta offsets are compute with respect to anchor values. Together the elta offsets an anchor values form a line or plane that is use to preict values for the rest of the locations. Correction terms for the preicte values are also compute. If the values of the anchors, elta offsets, an correction terms all fit within their allotte bit buget the tile can be compresse. Otherwise, it is store uncompresse. Decompression simply reverses the process. Our scheme uses a 128-bit allocation to achieve lossless compression with a : 1 compression ratio. The results are shown in the next section. a 1 24 10 18 10 6 16 10 9 4.4. Summary To summarize, logarithmic rasterization requires the following moifications: A moest increase in bit with in the rasterizer to support the 24-bit fixe-point y coorinates. Evaluators to compute the exponentials in the ege an interpolation equations. Application of the logarithmic parameterization to y coorinates in setup. Computation of generalize polygon offset in setup. A new epth compression unit. The rest of the graphics pipeline remains the same. 5. Results In this section we escribe our logarithmic rasterization simulator an compare the results of LogPSMs with several competing approaches. We also emonstrate the effectiveness of our epth compression an iscuss the overall feasibility of our approach. We have implemente a simulator that uses a fragment program to perform logarithmic rasterization. The simulator performs the logarithmic transformation on triangle vertices an reners the axis-aligne bouning qua that encloses them. The fragment program then evaluates the triangle ege equations an iscars fragments that fall outsie the triangle. It computes epth for the remaining fragments using the epth interpolation equation (which inclues the generalize polygon offset). This simulator runs at about 2 frames per secon on a GeForce 6800 GT on our test moels far below what a native harware implementation coul o, but fast enough to be interactive. We have also implemente several competing shaow map algorithms on the same GPU. Figure 6 shows a comparison of LogPSMs with each of these other algorithms on a town scene. As the figure shows, for the same shaow map resolution LogPSMs eliver lower, more uniform error than competing algorithms. (See [LGQ 07] for a more extensive comparison with other algorithms). 5.1. Depth compression We evaluate our epth compression scheme in a way similar to that of Hasselgren an Möller [HA06]. We capture ata for camera paths through three moels of varying complexity using 4 ifferent resolutions. We also use some of the same compression algorithms [VM05, DMFW02, OPS 05, HA06]. We coul not test the epth offset algorithm because this requires the z-min an z-max values, which are not supplie by our simulator. For each frame we split the finishe epth buffer into 4 4 tiles (excluing tiles untouche uring scene rasterization) an ran them through the compression algorithms. A more etaile simulation woul capture the banwith of partially complete tiles. The results shown in Figure 6 are intene to inicate the relative performance of the algorithms. On epth buffers renere with linear rasterization, our compression scheme is on par with some of the other algorithms. For logarithmic rasterization the other algorithms

B. Lloy, N. Govinaraju, S. Molnar, & D. Manocha / Practical logarithmic rasterization for low-error shaow maps 100 Linear Rasterization 100 Logarithmic Rasterization Compresse size (%) 80 60 40 20 0 Our algorithm Anchor [VM05] DDPCM [DMFW02] Plane & epth offset [OPS*05] Hasselgren an Möller [HA06] 256 512 1024 2048 Resolution Compresse size (%) 80 60 40 20 0 Our algorithm Anchor [VM05] DDPCM [DMFW02] Plane & epth offset [OPS*05] Hasselgren an Möller [HA06] 256 512 1024 2048 Resolution Figure 6: Depth compression (Top row) Benchmark scenes: Town (58K triangles), Robots (95K triangles), an Power Plant (1.7M triangles) The full Power Plant has 1M triangles, but we left out internal pipes. (Bottom row) The two graphs show the average epth compression for several algorithms for shaow maps renere with both linear an logarithmic rasterization. Our epth compression algorithm provies reasonable compression for both. On the right is the logarithmic shaow map from Figure 4 color coe with re for compresse tiles an blue for tiles where the sample epths are all 1 because they are untouche or all 0 ue to epth clamping. fare poorly. Our compression scheme is still able to achieve fairly goo compression ratios, especially at higher resolutions, although the compression ratios are somewhat lower than those for linear rasterization. 5.2. Limitations One of the limitations of our approach to logarithmic rasterization is the limit on the epth ratio. Both LogPSMs an current techniques such as cascae shaow maps hanle high epth ratios by partitioning the view frustum an using more shaow maps. Cascae shaow maps must split to control the error. LogPSMs split ue to precision limitations. Fortunately, LogPSMs require few splits. For example, a single LogPSM can accurately rener shaows on receivers from 1 m to km away. A single split squares this range to over 1000 km. Another limitation is that like all other global warping algorithms, LogPSMs o not take the orientation of surfaces in the scene into account an may not perform as well when the surfaces in the scene are nearly parallel to the light irection. 5.. Feasibility We believe that the enhancements we propose to support logarithmic rasterization are feasible because they are incremental an leverage existing harware esigns. The moifications are isolate to only two fixe-function elements of the GPU pipeline. The changes to the rasterizer are incremental. A new epth compression unit using our algorithm woul have complexity comparable to other algorithms. If esire, the epth compression unit coul sit alongsie the existing one use for linear rasterization. Because our algorithm also provies comparable compression ratios for linear rasterization, our epth compression unit coul feasibly replace the existing unit, thereby enabling a single unit to hanle both linear an logarithmic rasterization. Our enhancements require only a moest increase in onchip calculations but eliver significant reuctions in banwith an storage requirements. Using increase computation to save banwith aligns well with current harware trens because for the same cost, computational power continues to increase rapily while banwith lags consierably behin. Therefore, our propose harware enhancements provie a goo balance between banwith reuction an implementation cost. Conclusion We have shown how logarithmic rasterization can be implemente on GPUs by extening existing harware to rasterize to a nonuniform gri. We have also emonstrate a epth compression scheme that prouces reasonable results for logarithmic rasterization. Using a simulator, we have also shown that logarithmic rasterization can prouce shaow maps with lower error than competing algorithms. For future work we woul like to investigate further improvements to our epth compression scheme. We woul also like to explore the topic of user programmable rasterization. LogPSMs are one example of the benefits a programmable rasterizer coul provie. Rasterization is an obvious next step for increasing user programmability of the graphics pipeline.

B. Lloy, N. Govinaraju, S. Molnar, & D. Manocha / Practical logarithmic rasterization for low-error shaow maps Stanar FP+LSPSM CSM+LSPSM LogPSM Figure 7: Benefits of logarithmic rasterization. The viewer is positione below a tree in a town scene. Both the image an total shaow map resolution are 512 512. ( f /n) = 1000. (Top row) Gri lines for every 5 texels projecte onto the scene. (Bottom row) Color encoing of aliasing error m: green (m = 1), yellow (m = ), re (m > 10). m measures the maximum extent of the projection of shaow map texels in the image. Black pixels occur at partition bounaries. Stanar shaow map have extremely high error. FP + LSPSM combines face partitioning with perspective warping using the light-space perspective shaow maps (LSPSM) [WSP04] algorithm. The cascae shaow map (CSM) uses perspective warping (LSPSM) on the iniviual shaow maps for aitional error reuctions. LogPSMs use logarithmic rasterization to eliver low error both close to the viewer an far away. 5 shaow maps are use for CSM + LSPSM while only are use for FP + LSPSM an LogPSM. Acknowlegements We woul like to thank Cory Quammen for help with the vieo, Ben Clowar for the robot moel, Aaron Lefohn an Taylor Holliay for the use of the town moel, an the anonymous reviewers for their helpful comments. Special thanks go to Jon Hasselgren an Thomas Akenine-Möller for the use of their epth compression coe. This work was supporte in part by an NSF Grauate Fellowship, an NVIDIA Fellowship, ARO Contracts DAAD19-02-1-090 an W911NF-04-1-0088, NSF awars 040014, 042958 an 0404088, DARPA/RDECOM Contract N619-04-C004 an the Disruptive Technology Office. References [AL04] A ILA T., L AINE S.: Alias-free shaow maps. In Proceeings of Eurographics Symposium on Renering 2004 (2004), Eurographics Association, pp. 161 166. 2 [Arv04] A RVO J.: Tile shaow maps. In Computer Graphics International (2004), pp. 240 247. 2 [Cro77] C ROW F. C.: Shaow algorithms for computer graphics. ACM Computer Graphics 11, (1977), 242 248. 1, 2 [DMFW02] D E ROO J., M OREIN S., FAVELA B., W RIGHT M.: Metho an apparatus for compressing parameter values for pixels in a isplay frame. US Patent 6,476,811 (2002). 6 [Eng07] E NGEL W.: Cascae shaow maps. In ShaerX5, Engel W., (E.). Charles River Meia, 2007, pp. 197 206. [FFBG01] F ERNANDO R., F ERNANDEZ S., BALA K., G REENBERG D.: Aaptive shaow maps. In Proceeings of ACM SIGGRAPH 2001 (2001), pp. 87 90. 2 [GW07] G IEGL M., W IMMER M.: Querie virtual shaow maps. In Proceeings of ACM SIGGRAPH 2007 Symposium on Interactive D Graphics an Games (2007), ACM Press, pp. 65 72. 2 [HA06] H ASSELGREN J., A KENINE -M ÖLLER T.: Efficient epth buffer compression. In Proceeings of Graphics Harware 2006 (2006), Eurographics Association, pp. 10 110. 6 [JLBM05] J OHNSON G. S., L EE J., B URNS C. A., M ARK W. R.: The irregular z-buffer: Harware acceleration for irregular ata structures. ACM Trans. Graph. 24, 4 (2005), 1462 1482. 2 [JMB04] J OHNSON G., M ARK W., B URNS C.: The irregular z-buffer an its application to shaow mapping. In The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-04-09 (2004). 2 [Koz04] KOZLOV S.: Perspective shaow maps: Care an feeing. In GPU Gems, Fernano R., (E.). AisonWesley, 2004, pp. 214 244. 2, c Association for Computing Machinery, Inc. 2007.

B. Lloy, N. Govinaraju, S. Molnar, & D. Manocha / Practical logarithmic rasterization for low-error shaow maps [LGQ 07] LLOYD D. B., GOVINDARAJU N. K., QUAM- MEN C., MOLNAR S., MANOCHA D.: Logarithmic perspective shaow maps. Tech. Rep. TR07-005, University of North Carolina at Chapel Hill, 2007. (available at http://gamma.cs.unc.eu/logpsm). 1, 2,, 6 [LSO07] LEFOHN A. E., SENGUPTA S., OWENS J.: Resolution-matche shaow maps. ACM Trans. Graph. (2007). To appear. 2 [LTYM06] LLOYD B., TUFT D., YOON S., MANOCHA D.: Warping an partitioning for low error shaow maps. In Proceeings of the Eurographics Symposium on Renering 2006 (2006), Eurographics Association, pp. 215 226. 1, [MM00] MCCORMACK J., MCNAMARA R.: Tile polygon traversal using half-plane ege functions. In Proceeings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics harware (2000), pp. 15 21. [MT04] MARTIN T., TAN T.-S.: Anti-aliasing an continuity with trapezoial shaow maps. In Proceeings of the Eurographics Symposium on Renering (2004), Eurographics Association, pp. 15 160. 2, [MWM01] MCCOOL M., WALES C., MOULE K.: Incremental an hierarchical hilbert orer ege equation using polygon rasteization. Eurographics Workshop on Graphics Harware (2001), 65 72. [NVI04] NVIDIA CORP.: UltraShaowII: Accelerate shaow calculations. Technical Brief (2004). 1 [NVI06] NVIDIA CORP.: NVIDIA GeForce 8800 GPU architecture overview. Technical Brief (2006). 1 [OG97] OLANO M., GREER T.: Triangle scan conversion using 2 homogeneous coorinates. Eurographics Workshop on Graphics Harware (1997), 89 95. [OPS 05] ORNSTEIN D., PELED G., SPERBER Z., CO- HEN E., MALKA G.: Z-compression mechanism. US Patent 6,580,427 (2005). 6 [Pin88] PINEDA J.: A parallel algorithm for polygon rasterization. In Computer Graphics (SIGGRAPH 88 Proceeings) (Aug. 1988), Dill J., (E.), vol. 22, pp. 17 20. [SA06] SEGAL M., AKELEY K.: The OpenGL graphics system: A specification. http://www.opengl.org/ (2006). 5 [SD02] STAMMINGER M., DRETTAKIS G.: Perspective shaow maps. In Proceeings of ACM SIGGRAPH 2002 (2002), pp. 557 562. 2, [Sho0] SHOEMAKER B.: John Carmack speaks on oom. GameSpot, http://www.gamespot.com/ pc/action/oom/news.html?si=6027989 (200). 1 [TQJN99] TADAMURA K., QIN X., JIAO G., NAKAMAE E.: Renering optimal solar shaows using plural sunlight epth buffers. In Computer Graphics International 1999 (1999), p. 166. 2, [VM05] VAN DYKE J., MARGESON J.: Metho an apparatus for managing an accessing epth ata in a computer graphics system. US Patent 6,961,057 (2005). 6 [Wil78] WILLIAMS L.: Casting curve shaows on curve surfaces. In Computer Graphics (SIGGRAPH 78 Proceeings) (1978), vol. 12, pp. 270 274. 1 [WSP04] WIMMER M., SCHERZER D., PURGATHOFER W.: Light space perspective shaow maps. In Proceeings of the Eurographics Symposium on Renering (2004), Eurographics Association, pp. 14 152. 1, 2,, 8 [WSS05] WOOP S., SCHMITTLER J., SLUSALLEK P.: RPU: a programmable ray processing unit for realtime ray tracing. ACM Trans. Graph. 24, (2005), 44 444. 2 [ZSXL06] ZHANG F., SUN H., XU L., LUN L. K.: Parallel-split shaow maps for large-scale virtual environments. In Proceeings of ACM International Conference on Virtual Reality Continuum an Its Applications 2006 (2006), ACM SIGGRAPH, pp. 11 18. 1, 2,