JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.3, NO. 4, DECEMBER, 23 175 An Adapive Spaial Deph Filer for 3D Rendering IP Chang-Hyo Yu and Lee-Sup Kim Absrac In his paper, we presen a new mehod for early deph es for a 3D rendering engine. We add a filer sage o he raserizer in he 3D rendering engine, in an aemp o idenify and avoid he occluded pixels. This filering block deermines if a pixel is hidden by a cerain plane. If a pixel is hidden by he plane, i can be removed. The simulaion resuls show ha he filer reduces he number of pixels o he nex sage up o 71.7%. As a resul, 67% of memory bandwidh is saved wih simple exra hardware. Index Terms graphics, 3D rendering, deph es, Z buffer, early z es I. INTRODUCTION 3D graphics rendering process needs los of memory accesses which are Texure Mapping, Sencil Tes, Deph Tes and Color Blending. Because of he large memory accesses, he performance of he 3D rendering engine is highly depended on he bandwidh of memory sysem. Among he operaions of he 3D rendering engine, he exure mapping is he mos memory inensive operaion. In convenional high-performance rendering processors, exure mapping is performed before he z-es [4], [6], [7]. This archiecure properly suppors he semanics of he sandard APIs such as OpenGL [9], bu he major disadvanage is unnecessary exure mapping for invisible pixels; which causes a wase of he memory bandwidh. In order o remove such unnecessary Manuscrip received November 5, 23; revised November 26, 23. KAIST 373-1 Guseong-dong, Yuseong-gu, Daejeon, 35-71, Republic of Korea+82-42-869-4442 Email: {mosmov,lskim}@ mvlsi.kais.ac.kr operaions, exure mapping should be performed afer he z-es. However, a wider fragmen queue for he pipeline execuion is required. Moreover, he wide separaion beween reading and wriing a z-value makes mainaining he frame memory consisency more difficul because more han one fragmen may have he same pixel address. Hyper-Z echnology [3], ATI s archiecure for heir commercial produc, keeps a reduced resoluion of he z- buffer on he hierarchical z-buffer [8] and removes he fragmens wih z-es failures as early as possible; hey are removed from he pipeline before exure mapping. Afer his sep, exure mapping and he final z-es wih he full screen frame memory are performed. Wih his scheme, a considerable amoun (6 7% on he average) of fragmens wih z-es failures are deeced and hen discarded from he pipeline. However, he hierarchical z- buffer requires a very large daa srucure because i is consruced wih he z-pyramid [8]. Mainaining he hierarchical z-buffer for every frame memory updae may also bring an excessive compuaional burden. Mid-exuring [5] used wo-sage z-es operaion. They used he 1s z-es before he exuring and 2nd z- es afer he exuring. Their wo-sage z-es shares he z-buffer in he frame buffer. Unlike he Hyper-Z [3], i needs no an auxiliary z-buffer for he 1s z-es bu sill needs he overhead for he memory bandwidh o read a z-value for he 1s z-es. In his paper, we presen a simple soluion o perform he deph es parially before he exuring like Hyper-Z bu we need only 1 bi or 2 bis for a fragmen. And he daa is managed by independenly o he z-buffer of he frame buffer. Due o hese feaures, he overhead from he early deph es (like he 1s z-es [5]) is very smaller han previous mehods. We previously presened he basic Deph Filer algorihm [1] concepually. In his paper, we complee our algorihm such ha i is more reliable by using a newly added algorihm for adapive updaing while
176 CHANG-HYO YU e al : AN ADAPTIVE SPATIAL DEPTH FILTER FOR 3D RENDERING IP skipping unnecessary deph-read operaions. The remainder of he paper is arranged as follows. Secion II shows he advanced algorihm of he basic deph filer. Secion III shows he adapaion mehod and secion IV shows he simulaion environmen and resuls. Secion V discusses he resuls while he las secion concludes his paper. II. DEPTH FILTER We use a hierarchical concep wherein a deph filer performs an early-deph-es in he raserizer. This newly added block uses only 1 bi or 2 bis, compared o 24 bis or 32 bis precision in he ATI s Hyper-Z [3] mehod. By his simple es, we rejec a large porion of pixels. Fig.1 shows he block diagram of he modified rendering engine. Since our previous basic deph filer [1] has a fixed filer posiion, i could no be used in acual siuaion. However, an adapaion block is newly added o he previous deph filer. I makes a signal o he deph filer so ha he deph filer updaes is filer o he opimal posiion. In case of he 3-plane sysem of he basic deph filer, we already know ha he deph filer has beer performance due o a fine resoluion of filering, bu if we use he hird plane of 3-plane sysem as a special purpose mask plane, he deph filer has good resul even in a less deph complexiy cases. Fig. 2 shows he modified 3-plane sysem, called 2-plane Skipping Deph Buffer Reading (SDBR) sysem. Triangle Daa Raserizer Deph Filer Ex. MEMORY in Framebuffer Texure Fog, Sencil, Alpha Tes Deph Tes Adapaion To. Per-pixel operaions Fig. 1. A block diagram of he proposed archiecure of 3D rendering Engine. y. NP x DF1 DF Values DF2 1. SDBR mask z DF3 : Encoded Deph Filer Fig. 2. A concepual picure of 3-plane sysem or 2-plane SDBR sysem. : 1 : 1 : 11 A convenional z-es needs a z value of a deph buffer. When a pixel is rendered o a cerain coordinae for he firs ime, i is no necessary o read he z value from he deph buffer because he value of he deph buffer would be 1. (defaul seing). To remove (or skip) his needless read operaion, he deph filer provides perinen informaion. In Fig. 2, if we move DF3 o he Far plane, DF3 is used for soring wheher he pixel has been rendered before. In oher words, his plane holds he informaion wheher he deph in his coordinae is valid or no. If he value of his plane is 1, he curren coordinae had been updaed before. Therefore is deph informaion is valid. When he value of his plane is, we only wrie a deph value o he deph buffer in he frame memory wihou reading he curren deph value. The deph filer sends fragmens o a pixel processor wih his informaion, usually a 1 bi signal. The deph es block can save memory bandwidh wased on reading useless deph. III. ADAPTIVE UPDATING OF THE FILTER POSITION The deph filer uses he mask plane in he z- coordinae o remove he pixels. This plane should be locaed a an opimal posiion so as o yield a maximum rejeced pixels. Therefore he posiion of he deph filer is one of he mos imporan aspecs in our sysem. A. Characerisics of pixels To find an opimum posiion of he filer, we focus on he pixel characerisics. A 3D applicaion s objec daa is disribued by a user a any range of z-coordinaes which are from he near-plane o he far-plane [., 1.]. However, here is no general closed form for he disribuion of pixels. Since he pixels have only nondeerminisic disribuions in he general 3D applicaion s objec daa, we use simple closed form of disribuion funcion only o know and approximae he characerisics of pixel disribuion. The simple models of disribuion funcions are a uniform disribuion funcion, a riangle-shape funcion and a more complex polynomial funcion. These hree ypes of pixel disribuions are shown in Fig. 3.
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.3, NO. 4, DECEMBER, 23 177 p(z) 4.5 4. p 3.5 3. 2.5 2. 1.5 1..5 Normalized Disribuions.45544.45544 z) = + 2 ( z.26) +.1 ( z.8) +.4) ( 3 2.68316 4z, z <.5 p ( z = 2 ) 4 4z,.5 z 1 p1 ( z) = 1.1.2.3.4.5.6.7.8.9 1 z Fig. 3. Normalized disribuions of he hree ypes simplified models. Near plane Deph Filer FP RP SP Far plane Fig. 4. FP and SP are passed o he nex pipeline and RP is removed. Le be an arbirary posiion of z-coordinaes, FP ( = p( z) dz ( = 1 BP p( z) dz (1) where p(z) is he normalized densiy funcion of he pixel daa. From hese equaions, FP((Pixels in Fron of and BP( (Pixels Behind of always have a monoonic funcion of because here is no negaive value of p(z). If he deph filer is on, pixels can be divided ino hree pars as shown in Fig. 4. The pixels, farher han (BP(), are divided ino wo classes by he deph filer. The firs is he sum of survived pixels, SP. These pixels have been esed by he deph filer bu could no be rejeced. The second is he sum of rejeced pixels, RP, by he deph filer. B. An opimal posiion of deph filer To find a maximum RP, we use anoher convenien form of RP. RP can be wrien as RP = BP SP = ( ToalPixels FP) SP (2) ( FP SP) = ToalPixel s + where FP + BP = ToalPixels, SP = BP RP By equaion (2), he maximum value of RP can be obained by he minimum value of (FP + SP). We find a minimum value of (FP + SP) insead of finding an opimal posiion of he deph filer. NRP is defined as (FP + SP). NRP is he negaion of RP, which has a minimum value when RP has a maximum value. This is he firs propery o find an opimal posiion of he deph filer. And he following wo properies are derived from he naural characerisics of he deph filer. Propery 1: The deph filer has a maximum rejecion raio when NRP has a minimum. Propery 2: The deph filer has a larger probabiliy o rejec pixels as he posiion of he deph filer moves o 1. in he z-coordinae because here are more pixels ha can mask he deph filer. Propery 3: The deph filer has a larger probabiliy o rejec pixels as he posiion of he deph filer moves o. in he z-coordinae because here are more pixels ha can be rejeced by he deph filer. The RP( can be approximaed simply o α 1 by propery 2 and α 2 (1 by propery 3. α 1 and α 2 are consans and and from hese properies. We hen rewrie RP( ( α1 ( ) ( α 2 (1 ) = β(1 1 are proporional facors, α = β( 1 (3) whereα is he proporion of BP(. α has a value from o 1. If RP( is, α is and if RP( is he same as BP(, α is 1. We assumed α o be.75 a he maximum (he simulaion resul shows abou 72% (RP(/(FP(+BP() =.72) of oal pixels are rejeced, hen he proporion of α (RP(/BP() is larger han.72, in es model (a)). Therefore he range of α is α.75. Then we se he consan, β, o be 3 because he value of ( 1 is.25 a he maximum. Since he pixels of FP( are no changed by he deph filer, RP( only has a proporion of BP(. The funcion of SP( is also monoonic because SP( is he value of BP( RP( and none of funcions have negaive values.
178 CHANG-HYO YU e al : AN ADAPTIVE SPATIAL DEPTH FILTER FOR 3D RENDERING IP 1..8.6.4.2. (a) p ( ) (b) p 2 ( z) (c) p 3 ( z) NRP( 1 z FP( SP(. FP( RP( NRP( BP( SP(.2.4.6.8 1. NRP( FP( SP( Fig. 5. FP( and SP( wih heir sum, NRP(. RP( has a maximum when FP( = SP(. The curve of SP( has he same value as BP( a boh sides and is lowered by RP(. In oher words, he curve of SP( is a seeper version of he BP( curve. We rewrie he equaions, RP( = BP( 3(1 ( 1 3(1 )) SP( = BP( RP( = BP( (4) ( 1 3(1 )) NRP( = FP( + SP( = FP( + BP( The approximaed opimal posiion, deermined graphically, is obained when FP( = SP(. Since SP( is also a monoone decreasing funcion, his approximaion is accepable. Simulaed NRP(s of he models are shown in Fig.5. Our simplified models are well conduced o find an approximaed opimum posiion. There can be an error a specific cases, for example, all objecs are rendered in back-o-fron order. However, he general 3D models have similar characerisics as shown here. From his algorihm, we place an adapaion block ino a convenional deph es block o adjus he filer posiion. The adapaion block makes a signal, which racks he filer posiion o he cross poin of FP and SP, o he deph filer a every frame. The hardware overhead of our adapaion algorihm is very small; which is composed of wo couners (for FP and SP) and leading-one deecors (for finding deference beween he FP and SP) and some oher operaors. Therefore i is no burden o implemen ino a convenional deph es block. IV. SIMULATION AND RESULTS A. Simulaion environmen For simulaion, we used Graphics Archiecure Tesing Environmen (GATE) [2] based on Microsof visual C++. GATE models overall graphics hardware archiecure hrough a modular approach, suppors OpenGL, and offers easy modificaion and rapid esing of archiecure. I also gahers compuaional saisics. The performance and oher saisical daa can be obained from GATE. We added he deph filer block ino he raserizer of he GATE and he adapaion block ino he convenional deph es block. A single bi of he SDBR signal is fed o he convenional pipelines beween he deph filer and he adapaion block. B. Resuls The es models are shown in Fig. 6. Model (a) is composed of 1 hexagonal columns generaed randomly. Model (b) and (c) are general 3D objecs. Model (d) is an image by exure mapping for esing he 2-plane SDBR sysem. I has 1. of deph complexiy and is used for he wors case of he deph filer. All ess are conduced on a 512 x 512 screen and Back face culling is disabled. We use a replicaed basic objec so as o have high deph complexiy. Fig. 6. Tes models wih differen deph complexiy. ( ) represens a model s deph complexiy. Rejecion Raio Filer posiion.8.7.6.5.4.3.2.1.9.8.7.6.5.4.3.2 (a) Scene (a) Scene (b) Scene (c).1 (c) 1 2 3 4 5 6 7 8 9 1.7.6.5.4.3.2.1 (b) 1-Plane 2-Plane SDBR 3-Plane.6.5.4.3.2.1 (d) 1 2 3 4 5 6 7 8 9 1 Frames Fig. 7. Rejecion raios of he models and each sysem. Graph (a) shows he resuls of he hree kinds of es models in he 3- plane sysem. Graph (b) shows he resul of he hree ypes of sysems for es model (b). Graph (c) is he corresponding filer posiion of graph (a) and Graph (d) is ha of graph (b) Rejecion Raio Filer posiion
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.3, NO. 4, DECEMBER, 23 179 Table 1. RR (Rejecion raio), ERR (Effecive rejecion raio) of he es models. RR ( ERR ) Model (a) Model (b) Model (c) 1-Plane 2-Plane SDBR 3-Plane 63 (79.6) 7.9 (81.9) 71.7 (88.9) 57.9 (79.6) 63.2 (87.) 63.6 (87.4) 3.6 (64.3) 35.3 (74.2) 4.4 (77.2) Fig.7 shows he rejecion raio of he deph filer, (RP(/Toal Pixels). All he sysems and es models have large amouns of rejeced pixels and all converge o he approximaed maximum rejecion in hree frames. The 3-plane sysem racks faser han he ohers and has he bes rejecion raio. Alhough he 2-plane SDBR sysem is slower han he 3-plane sysem, is opimal rejecion raio is similar o he 3-plane sysem. The converged rejecion raios of he es models are shown in able 1. V. DISCUSSIONS We define a required memory bandwidh o compare our sysems wih a convenional sysem. The convenional mehod is no one of he oher ypes of hierarchical mehods such as he Hyper-Z [3], ec. Raher i is he convenional Z buffer mehod. We canno compare wih commercial mehods because hey are no in public. sysem is o be 2 2.6G.57 64 1 bi = 2.371GB/sec in he es model (b). We used he specificaions of Radeon97pro [4], which has a 2.6gigapixels/sec fill rae. As shown in Fig. 8, he 2-Plane SDBR sysem has he bes performance, no only for he complex model (model (a)) bu also for he less complex model (model (d), a exure). In Fig. 8 (d), since es model (d) has 1. of deph complexiy, here is no rejeced pixel. Therefore he required memory bandwidh of he 1-Plane and he 3-Plane sysem is larger han ha of convenional bu even hough he overhead of he deph filer, he 2-Plane SDBR sysem is always beer han he convenional deph es. The effec obained by skipping an unnecessary read operaion is more efficien han he effec obained by adding one plane from a 2-plane o a 3-plane sysem. Hyper-Z [3] shows an effecive rejecion raio of 44 ~ 93% in he es bench bu hey need large inernal memory and synchronizaion of he inernal memory and he deph buffer of he frame buffer. Our algorihm shows an effecive rejecion raio of 64~88% wih only small inernal memory (or regiser) by he deph filer using he adapive mehod. VI. CONCLUSIONS Required Memory Bandwidh = (FR RR B/pixel) + (FR SR 16 B/pixel) +FR (1 RR SR) 2 B/pixel + Deph Filer Overhead Deph Filer Overhead = (Read, Wrie) FR MR Transferring Daa Size FR: fill rae, RR: rejecion raio, SR: SDBR rae and MR: miss rae We assumed he enire es models need he memory operaions which are Read-Z, Wrie-Z, Read-Color, Wrie-Color and Texure-Read. We assume ha he all operaions need 4 byes. This calculaion is based on he specificaions of curren convenional hardware and considered wihou any oher overheads. For example, he overhead of he deph filer in case of he 1-plane As more complex models become commonplace in 3D compuer graphics, i becomes increasingly imporan o remove unnecessary operaions ha wase memory bandwidh. In his paper, we presen an algorihm ha saves memory bandwidh in he rendering engine of 3D graphics hardware. GB/s 5 4 3 2 1 Required Memory Bandwidh A B C D Tes Scenes 1-Plane 2-Plane SDBR 3-Plane Convenional Fig. 8. Comparisons of he required memory bandwidh of all our sysems and a convenional sysem.
18 CHANG-HYO YU e al : AN ADAPTIVE SPATIAL DEPTH FILTER FOR 3D RENDERING IP Deph Filer is a filering mehod ha removes invisible pixels ahead of per-pixel pipelines. As a resul, i avoids unnecessary per-pixel operaions by adding exra hardware ino a raserizer placed in fron of he perpixel pipeline and a simple adapive block ino he convenional deph es block. Up o 71.7% of oal pixels were removed by his mehod for he complex es models as well as non-complex es models. The maximum bandwidh gain obained by he deph filer was 3.3. A novel adapaion algorihm makes he deph filer possible o use in acual siuaion; herefore i can be a useful mehod o alleviae memory bolenecks in 3D rendering engines. pipeline archiecure for 3D rendering processors, IEEE Proceedings of Inernaional Conference on Applicaion- Specific Sysems, Archiecures and Processors, pp 173-182, July, 22. [6] J. McCormack, R. McNamara, C. Gianos, L. Seiler, N. Jouppi, K. Correll, Neon: a single-chip 3D worksaion graphics acceleraor, Graphics Hardware Workshop 1998, pp 123-132, Augus, 1998. [7] L. Garber, The wild world of 3D graphics chips, IEEE Compuer, vol. 33, no. 9, pp 12 16, Sep, 2. [8] N. Greene, M. Kass and G. Miller, Hierarchical z-buffer visibiliy, ACM Proceedings of SIGGRAPH, pp 231 238, Aug, 1993. [9] R. Kempf and C. Frazier, OpenGL reference manual, Addison Wesley, 1996. ACKNOWLEDGMENT This research was sponsored by SAMSUNG elecronics and KOSEF hrough he MICROS a KAIST, Korea. Chang-Hyo Yu received he B.S. and M.S. degrees in Elecrical Engineering and Compuer Science from KAIST, Korea, in 21 and 23 respecively, and now he is a Ph.D. candidae in Elecrical Engineering and Compuer Science from KAIST. His research ineress include 3D graphics hardware design and mulimedia programmable processor design. REFERENCES [1] C.H. Yu and L.S. Kim, A hierarchical deph buffer for minimizing memory bandwidh in 3d rendering engine: deph filer, IEEE Proceedings of he 23 Inernaional Symposium on Circuis and Sysems, Vol. 2, pp 724 727, May, 23. [2] Inho. Lee, e al, A hardware-like highlevel language based environmen for 3d graphics archiecure exploraion, IEEE Proceedings of he 23 Inernaional Symposium on Circuis and Sysems, Vol. 2, pp 512 515, May, 23. [3] S. Morein, ATI Radeon Hyper-Z echnology, In Ho 3D Proceedings, Graphics Hardware Workshop 2, Inerlaken, Swizerland, Augus, 2. [4] ATI Corporaion, Radeon97pro, 22, hp://www.ai.c om /producs/pc/radeon97pro/index.hml and hp://ww w.ai.com /developer/echpapers.hml. [5] W.C. Park, e al, A mid-exuring pixel raserizaion Lee-Sup Kim received he B.S. degree in Elecronics Engineering from Seoul Naional Universiy, Korea, in 1982 and he M.S. and Ph.D. degrees in Elecrical Engineering from Sanford Universiy, Sanford, CA, in 1986 and 199, respecively. He was a pos-docoral fellow a he Toshiba Corporaion, Kawasaki, Japan, during 199-1993, where he was involved in he design of he high performance DSP and single chip MPEG2 decoder. Since March 1993, he has been a KAIST. In November 22, he became a Full Professor. During he year of 1998, he was on he sabbaical leave wih Chromaic Research and SandCraf Inc. in Silicon Valley. His research ineress are 3D graphics hardware design, LCD display conroller design, Mulimedia Programmable Processor design, and High speed and Low power digial IC design.