An Adaptive Spatial Depth Filter for 3D Rendering IP


JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL. 3, NO. 4, DECEMBER, 2003

Chang-Hyo Yu and Lee-Sup Kim

Abstract: In this paper, we present a new method for an early depth test in a 3D rendering engine. We add a filter stage to the rasterizer of the 3D rendering engine in an attempt to identify and avoid occluded pixels. This filtering block determines whether a pixel is hidden by a certain plane. If a pixel is hidden by the plane, it can be removed. The simulation results show that the filter reduces the number of pixels sent to the next stage by up to 71.7%. As a result, 67% of memory bandwidth is saved with simple extra hardware.

Index Terms: graphics, 3D rendering, depth test, Z buffer, early z test

Manuscript received November 5, 2003; revised November 26, 2003. KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Republic of Korea. +82-42-869-4442. Email: {mosmov,lskim}@mvlsi.kaist.ac.kr

I. INTRODUCTION

The 3D graphics rendering process needs many memory accesses, for Texture Mapping, Stencil Test, Depth Test and Color Blending. Because of the large number of memory accesses, the performance of the 3D rendering engine depends heavily on the bandwidth of the memory system. Among the operations of the 3D rendering engine, texture mapping is the most memory-intensive. In conventional high-performance rendering processors, texture mapping is performed before the z-test [4], [6], [7]. This architecture properly supports the semantics of standard APIs such as OpenGL [9], but its major disadvantage is unnecessary texture mapping for invisible pixels, which wastes memory bandwidth. In order to remove such unnecessary operations, texture mapping should be performed after the z-test. However, a wider fragment queue for the pipeline execution is then required. Moreover, the wide separation between reading and writing a z-value makes maintaining frame memory consistency more difficult, because more than one fragment may have the same pixel address.
Hyper-Z technology [3], ATI's architecture for their commercial product, keeps a reduced-resolution copy of the z-buffer in a hierarchical z-buffer [8] and removes fragments that fail the z-test as early as possible; they are removed from the pipeline before texture mapping. After this step, texture mapping and the final z-test against the full-screen frame memory are performed. With this scheme, a considerable amount (60 to 70% on average) of the fragments with z-test failures are detected and then discarded from the pipeline. However, the hierarchical z-buffer requires a very large data structure because it is constructed with the z-pyramid [8]. Maintaining the hierarchical z-buffer for every frame memory update may also bring an excessive computational burden.

Mid-texturing [5] uses a two-stage z-test operation: a 1st z-test before texturing and a 2nd z-test after texturing. The two stages share the z-buffer in the frame buffer. Unlike Hyper-Z [3], it does not need an auxiliary z-buffer for the 1st z-test, but it still incurs memory bandwidth overhead to read a z-value for the 1st z-test.

In this paper, we present a simple solution that performs the depth test partially before texturing, like Hyper-Z, but needs only 1 bit or 2 bits per fragment. This data is managed independently of the z-buffer in the frame buffer. Due to these features, the overhead of the early depth test (like the 1st z-test [5]) is much smaller than in previous methods. We previously presented the basic Depth Filter algorithm [1] conceptually. In this paper, we complete our algorithm so that it is more reliable, using a newly added algorithm for adaptive updating while skipping unnecessary depth-read operations.

The remainder of the paper is arranged as follows. Section II shows the advanced algorithm of the basic depth filter. Section III shows the adaptation method and Section IV shows the simulation environment and results. Section V discusses the results, and the last section concludes this paper.

II. DEPTH FILTER

We use a hierarchical concept wherein a depth filter performs an early depth test in the rasterizer. This newly added block uses only 1 bit or 2 bits, compared to the 24-bit or 32-bit precision of ATI's Hyper-Z [3] method. With this simple test, we reject a large portion of pixels. Fig. 1 shows the block diagram of the modified rendering engine. Since our previous basic depth filter [1] had a fixed filter position, it could not be used in actual situations. Here, an adaptation block is newly added to the previous depth filter. It sends a signal to the depth filter so that the depth filter updates its filter to the optimal position. In the case of the 3-plane system of the basic depth filter, we already know that the depth filter performs better due to its finer filtering resolution, but if we use the third plane of the 3-plane system as a special-purpose mask plane, the depth filter gives good results even in cases of low depth complexity. Fig. 2 shows the modified 3-plane system, called the 2-plane Skipping Depth Buffer Reading (SDBR) system.

Fig. 1. A block diagram of the proposed architecture of the 3D rendering engine. (Blocks: Triangle Data, Rasterizer, Depth Filter, Texture, Fog/Stencil/Alpha Test, Depth Test, Adaptation, per-pixel operations, external memory holding the frame buffer.)

Fig. 2. A conceptual picture of the 3-plane system or 2-plane SDBR system. (Labels: NP, filter planes DF1, DF2 and DF3, SDBR mask, encoded depth filter values; axes x, y, z.)

A conventional z-test needs the z value from the depth buffer. When a pixel is rendered to a certain coordinate for the first time, it is not necessary to read the z value from the depth buffer because the value in the depth buffer would be 1.0 (the default setting).
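The first-write observation above can be illustrated with a minimal Python sketch. It assumes a "less than" depth test, a depth buffer cleared to 1.0, and a hypothetical per-pixel `valid` bit that records whether a pixel has been written before; the class and function names are illustrative and not taken from the paper.

```python
class DepthBuffer:
    """Toy depth buffer that counts external-memory reads and writes."""

    def __init__(self, size, clear=1.0):
        self.z = [clear] * size   # cleared to the far value (default setting)
        self.reads = 0
        self.writes = 0

    def read(self, i):
        self.reads += 1
        return self.z[i]

    def write(self, i, value):
        self.writes += 1
        self.z[i] = value


def depth_test(buf, fragments, valid=None):
    """Run a 'less than' depth test over (pixel, z) fragments.

    If `valid` (one bit per pixel, a hypothetical name) is given, the first
    fragment that reaches a pixel is written without reading the buffer.
    """
    for pixel, z in fragments:
        if valid is not None and not valid[pixel]:
            valid[pixel] = True
            buf.write(pixel, z)      # stored depth is still the clear value
            continue
        if z < buf.read(pixel):
            buf.write(pixel, z)
    return buf


fragments = [(0, 0.6), (1, 0.3), (0, 0.4), (1, 0.9), (2, 0.5)]
plain = depth_test(DepthBuffer(4), fragments)
skip = depth_test(DepthBuffer(4), fragments, valid=[False] * 4)
print(plain.reads, skip.reads)   # depth reads with and without the valid bit
```

Both runs leave the same depth values in the buffer; only the number of depth reads differs, which is the bandwidth the skipped reads would otherwise consume.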
To remove (or skip) this needless read operation, the depth filter provides the pertinent information. In Fig. 2, if we move DF3 to the far plane, DF3 is used to store whether the pixel has been rendered before. In other words, this plane holds the information on whether the depth at this coordinate is valid or not. If the value of this plane is 1, the current coordinate has been updated before, and therefore its depth information is valid. When the value of this plane is 0, we only write a depth value to the depth buffer in the frame memory, without reading the current depth value. The depth filter sends fragments to the pixel processor along with this information, usually as a 1-bit signal. The depth test block can thus save the memory bandwidth otherwise wasted on reading useless depth values.

III. ADAPTIVE UPDATING OF THE FILTER POSITION

The depth filter uses a mask plane in the z-coordinate to remove pixels. This plane should be located at an optimal position so as to reject the maximum number of pixels. Therefore the position of the depth filter is one of the most important aspects of our system.

A. Characteristics of pixels

To find an optimum position for the filter, we focus on the pixel characteristics. A 3D application's object data is distributed by the user over any range of z-coordinates from the near plane to the far plane, [0.0, 1.0]. However, there is no general closed form for the distribution of pixels. Since the pixels have only nondeterministic distributions in a general 3D application's object data, we use simple closed-form distribution functions only to learn and approximate the characteristics of the pixel distribution. The simple models are a uniform distribution function, a triangle-shaped function and a more complex polynomial function. These three types of pixel distributions are shown in Fig. 3.
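As a small illustration of the first two models, the sketch below takes the uniform model as p1(z) = 1 and the triangle model as p2(z) = 4z for z < 0.5 and 4 - 4z otherwise (the forms given with Fig. 3), and checks numerically that each is a normalized density on [0.0, 1.0]. The helper `integrate` is a hypothetical name introduced here; the third model is omitted.

```python
def p1(z):
    """Uniform model: constant density on [0, 1]."""
    return 1.0


def p2(z):
    """Triangle model: rises to 2.0 at z = 0.5, then falls back to 0."""
    return 4.0 * z if z < 0.5 else 4.0 - 4.0 * z


def integrate(p, a=0.0, b=1.0, n=10_000):
    """Midpoint-rule integral of p over [a, b] (illustrative helper)."""
    h = (b - a) / n
    return sum(p(a + (i + 0.5) * h) for i in range(n)) * h


area_uniform = integrate(p1)
area_triangle = integrate(p2)
front_half = integrate(p2, 0.0, 0.5)   # share of pixels nearer than z = 0.5
print(area_uniform, area_triangle, front_half)
```

Under the triangle model half of the pixels lie in front of z = 0.5, so a filter plane placed there would see as many pixels in front of it as behind it.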

Fig. 3. Normalized distributions of the three types of simplified models, plotted as p(z) against z: the uniform model p_1(z), the triangle model p_2(z) and a more complex model p_3(z), where

$$p_1(z) = 1, \qquad p_2(z) = \begin{cases} 4z, & z < 0.5 \\ 4 - 4z, & 0.5 \le z \le 1 \end{cases}$$

Fig. 4. FP and SP are passed to the next pipeline stage and RP is removed. (Regions between the near plane and the far plane: FP in front of the depth filter; SP and RP behind it.)

Let t be an arbitrary position on the z-coordinate. Then

$$FP(t) = \int_0^t p(z)\,dz, \qquad BP(t) = 1 - \int_0^t p(z)\,dz \tag{1}$$

where p(z) is the normalized density function of the pixel data. From these equations, FP(t) (Pixels in Front of t) and BP(t) (Pixels Behind t) are always monotonic functions of t because p(z) has no negative values. If the depth filter is at t, the pixels can be divided into three parts as shown in Fig. 4. The pixels farther than t (BP(t)) are divided into two classes by the depth filter. The first is the sum of survived pixels, SP. These pixels have been tested by the depth filter but could not be rejected. The second is the sum of pixels rejected by the depth filter, RP.

B. An optimal position of the depth filter

To find the maximum RP, we use another convenient form of RP. RP can be written as

$$RP = BP - SP = (TotalPixels - FP) - SP = TotalPixels - (FP + SP) \tag{2}$$

where FP + BP = TotalPixels and SP = BP - RP. By equation (2), the maximum value of RP is obtained at the minimum value of (FP + SP). We therefore find the minimum of (FP + SP) instead of directly finding the optimal position of the depth filter. NRP is defined as (FP + SP). NRP is the negation of RP, and it has its minimum when RP has its maximum. This is the first property used to find an optimal position of the depth filter. The following two properties are derived from the natural characteristics of the depth filter.

Property 1: The depth filter has the maximum rejection ratio when NRP is at its minimum.

Property 2: The depth filter has a larger probability of rejecting pixels as the position of the depth filter moves toward 1.0 in the z-coordinate, because there are more pixels that can mask the depth filter.
Property 3: The depth filter has a larger probability of rejecting pixels as the position of the depth filter moves toward 0.0 in the z-coordinate, because there are more pixels that can be rejected by the depth filter.

RP(t) can be approximated simply as proportional to α1·t by Property 2 and to α2·(1 - t) by Property 3, where α1 and α2 are constants and t and (1 - t) are the proportional factors from these properties. We then rewrite

$$RP(t) \approx (\alpha_1 t)\,(\alpha_2 (1 - t))\,BP(t) = \beta\,t(1 - t)\,BP(t), \qquad \alpha = \beta\,t(1 - t) \tag{3}$$

where α is the proportion of BP(t) that is rejected. α has a value from 0 to 1: if RP(t) is 0, α is 0, and if RP(t) is the same as BP(t), α is 1. We assumed α to be 0.75 at the maximum (the simulation result shows that about 72% of total pixels are rejected in test model (a), that is, RP(t)/(FP(t) + BP(t)) = 0.72, so the proportion α = RP(t)/BP(t) is larger than 0.72). Therefore the range of α is 0 ≤ α ≤ 0.75. We then set the constant β to 3 because the value of t(1 - t) is 0.25 at the maximum. Since the pixels of FP(t) are not changed by the depth filter, RP(t) is only a proportion of BP(t). The function SP(t) is also monotonic because SP(t) is the value of BP(t) - RP(t) and none of the functions have negative values.
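The model above can be evaluated numerically. The sketch below is a minimal illustration, assuming the uniform density p(z) = 1, the definitions FP(t) = integral of p from 0 to t, BP(t) = 1 - FP(t), RP(t) = β·t(1 - t)·BP(t) with β = 3, SP(t) = BP(t) - RP(t) and NRP(t) = FP(t) + SP(t). The function names and the grid search are illustrative choices, not part of the paper.

```python
def p_uniform(z):
    """Uniform pixel density on [0, 1]."""
    return 1.0


def cumulative(p, t, n=400):
    """FP(t): midpoint-rule integral of the density p from 0 to t."""
    h = t / n
    return sum(p((i + 0.5) * h) for i in range(n)) * h


def model(p, t, beta=3.0):
    """Evaluate FP, BP, RP, SP and NRP at filter position t."""
    fp = cumulative(p, t)
    bp = 1.0 - fp
    rp = beta * t * (1.0 - t) * bp   # approximation of the rejected pixels
    sp = bp - rp
    return {"FP": fp, "BP": bp, "RP": rp, "SP": sp, "NRP": fp + sp}


# Search a grid of filter positions for the one that minimizes NRP,
# which by equation (2) is the position that maximizes RP.
grid = [i / 1000 for i in range(1001)]
best_t = min(grid, key=lambda t: model(p_uniform, t)["NRP"])
best = model(p_uniform, best_t)
print(best_t, best["RP"])
```

For the uniform density the model reduces to NRP(t) = 1 - 3t(1 - t)^2, whose minimum lies at t = 1/3 with RP = 4/9; the grid search lands on the nearest grid point.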

Fig. 5. FP(t) and SP(t) with their sum, NRP(t), for (a) p_1(z), (b) p_2(z) and (c) p_3(z), plotted against t from 0.0 to 1.0 together with BP(t) and RP(t). RP(t) has a maximum when FP(t) = SP(t).

The curve of SP(t) has the same value as BP(t) at both ends and is lowered by RP(t) in between. In other words, the curve of SP(t) is a steeper version of the BP(t) curve. We rewrite the equations as

$$\begin{aligned} RP(t) &= BP(t)\cdot 3t(1 - t) \\ SP(t) &= BP(t) - RP(t) = BP(t)\,\bigl(1 - 3t(1 - t)\bigr) \\ NRP(t) &= FP(t) + SP(t) = FP(t) + BP(t)\,\bigl(1 - 3t(1 - t)\bigr) \end{aligned} \tag{4}$$

The approximated optimal position, determined graphically, is obtained when FP(t) = SP(t). Since SP(t) is also a monotone decreasing function, this approximation is acceptable. The simulated NRP(t) curves of the models are shown in Fig. 5. Our simplified models are well suited to finding an approximate optimum position. There can be errors in specific cases, for example when all objects are rendered in back-to-front order. However, general 3D models have characteristics similar to those shown here.

Based on this algorithm, we place an adaptation block in the conventional depth test block to adjust the filter position. At every frame, the adaptation block sends the depth filter a signal that makes the filter position track the crossing point of FP and SP. The hardware overhead of our adaptation algorithm is very small: it is composed of two counters (for FP and SP), leading-one detectors (for finding the difference between FP and SP) and a few other operators. Therefore it is no burden to implement in a conventional depth test block.
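A per-frame tracking loop of this kind might look like the Python sketch below. It is a toy model under stated assumptions, not the paper's hardware: the 1-plane filter logic (a 1-bit mask per pixel, set by fragments in front of the plane) is inferred from the description of FP, SP and RP, the update rule and its damping shift of 3 are invented for illustration, `int.bit_length()` stands in for a leading-one detector, and the scene is random synthetic data.

```python
import random


def render_frame(fragments, n_pixels, t):
    """Pass one frame through a toy 1-plane depth filter at position t.

    Returns the FP and SP counts the adaptation step needs, plus RP.
    """
    mask = [False] * n_pixels        # 1 bit per pixel, cleared every frame
    fp = sp = rp = 0
    for pixel, z in fragments:
        if z < t:                    # in front of the plane: always passes
            mask[pixel] = True
            fp += 1
        elif mask[pixel]:            # behind the plane and already covered
            rp += 1
        else:                        # behind the plane, nothing in front yet
            sp += 1
    return fp, sp, rp


def adapt(t, fp, sp):
    """Move t toward the FP = SP crossing with a power-of-two step.

    The step size 2**(lead(|FP - SP|) - lead(FP + SP) - 3) is an assumed
    rule; only the use of two counters and leading-one detection comes
    from the text.
    """
    diff = fp - sp
    total = fp + sp
    if diff == 0 or total == 0:
        return t
    step = 2.0 ** (abs(diff).bit_length() - total.bit_length() - 3)
    t = t - step if diff > 0 else t + step   # FP grows with t, SP shrinks
    return min(1.0, max(0.0, t))


rng = random.Random(2003)
N_PIXELS, DEPTH_COMPLEXITY = 256, 8
# Static synthetic scene: random pixel addresses, depths uniform in [0, 1).
scene = [(int(rng.random() * N_PIXELS), rng.random())
         for _ in range(N_PIXELS * DEPTH_COMPLEXITY)]

t = 0.5
for frame in range(40):
    fp, sp, rp = render_frame(scene, N_PIXELS, t)
    t = adapt(t, fp, sp)

fp, sp, rp = render_frame(scene, N_PIXELS, t)
print(t, fp, sp, rp)
```

The power-of-two step keeps the update to shifts and an add or subtract, which is in the spirit of the small hardware overhead described above; a real design would choose the damping to match its own convergence target.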
A single bit of the SDBR signal is fed through the conventional pipeline between the depth filter and the adaptation block.

B. Results

The test models are shown in Fig. 6. Model (a) is composed of randomly generated hexagonal columns. Models (b) and (c) are general 3D objects. Model (d) is an image drawn by texture mapping, used for testing the 2-plane SDBR system. It has a depth complexity of 1.0 and represents the worst case for the depth filter. All tests are conducted on a 512 x 512 screen, and back-face culling is disabled. We replicate a basic object so as to obtain high depth complexity.

Fig. 6. Test models with different depth complexity. ( ) represents a model's depth complexity.

Fig. 7. Rejection ratios of the models and each system. Graph (a) shows the results of the three kinds of test models, scenes (a), (b) and (c), in the 3-plane system. Graph (b) shows the results of the three types of systems (1-Plane, 2-Plane SDBR, 3-Plane) for test model (b). Graph (c) is the corresponding filter position of graph (a) and graph (d) is that of graph (b). (Horizontal axes: frames 1 to 10; vertical axes: rejection ratio and filter position.)

Table 1. RR (rejection ratio) and ERR (effective rejection ratio) of the test models, in percent, given as RR (ERR).

             1-Plane        2-Plane SDBR   3-Plane
Model (a)    63.0 (79.6)    70.9 (81.9)    71.7 (88.9)
Model (b)    57.9 (79.6)    63.2 (87.0)    63.6 (87.4)
Model (c)    30.6 (64.3)    35.3 (74.2)    40.4 (77.2)

Fig. 7 shows the rejection ratio of the depth filter, RP(t)/TotalPixels. All the systems and test models have large amounts of rejected pixels, and all converge to the approximated maximum rejection within three frames. The 3-plane system tracks faster than the others and has the best rejection ratio. Although the 2-plane SDBR system is slower than the 3-plane system, its optimal rejection ratio is similar to that of the 3-plane system. The converged rejection ratios of the test models are shown in Table 1.

V. DISCUSSIONS

We define a required memory bandwidth to compare our systems with a conventional system. The conventional method here is not one of the hierarchical methods such as Hyper-Z [3]; rather, it is the conventional Z-buffer method. We cannot compare with commercial methods because their details are not public.

$$\text{Required Memory Bandwidth} = (FR \times RR \times 0\ \text{B/pixel}) + (FR \times SR \times 16\ \text{B/pixel}) + FR \times (1 - RR - SR) \times 20\ \text{B/pixel} + \text{Depth Filter Overhead}$$

$$\text{Depth Filter Overhead} = (\text{Read}, \text{Write}) \times FR \times MR \times \text{Transferring Data Size}$$

Here FR is the fill rate, RR the rejection ratio, SR the SDBR rate and MR the miss rate. We assumed that all the test models need the memory operations Read-Z, Write-Z, Read-Color, Write-Color and Texture-Read, and that each operation needs 4 bytes. A fully processed pixel therefore costs 20 bytes, a pixel that skips the depth read costs 16 bytes, and a rejected pixel costs none. This calculation is based on the specifications of current conventional hardware and does not consider any other overheads. For example, the overhead of the depth filter in the case of the 1-plane system is 2 × 2.6G × 0.057 × 64 × 1 bit = 2.371 GB/sec in test model (b). We used the specifications of the Radeon 9700 Pro [4], which has a fill rate of 2.6 gigapixels/sec.

Fig. 8. Comparison of the required memory bandwidth (GB/s) of all our systems (1-Plane, 2-Plane SDBR, 3-Plane) and a conventional system for test scenes A to D.

As shown in Fig. 8, the 2-plane SDBR system has the best performance, not only for the complex model (model (a)) but also for the less complex model (model (d), a texture). In Fig. 8 (d), since test model (d) has a depth complexity of 1.0, no pixels are rejected. Therefore the required memory bandwidth of the 1-plane and 3-plane systems is larger than that of the conventional system, but despite the overhead of the depth filter, the 2-plane SDBR system is always better than the conventional depth test. The gain obtained by skipping an unnecessary read operation is greater than the gain obtained by adding one plane to go from a 2-plane to a 3-plane system.

Hyper-Z [3] shows an effective rejection ratio of 44 to 93% in the test bench, but it needs a large internal memory and synchronization between the internal memory and the depth buffer in the frame buffer. Our algorithm shows an effective rejection ratio of 64 to 88% with only a small internal memory (or register), by using the depth filter with the adaptive method.

VI. CONCLUSIONS

As more complex models become commonplace in 3D computer graphics, it becomes increasingly important to remove unnecessary operations that waste memory bandwidth. In this paper, we present an algorithm that saves memory bandwidth in the rendering engine of 3D graphics hardware.

The Depth Filter is a filtering method that removes invisible pixels ahead of the per-pixel pipelines. It avoids unnecessary per-pixel operations by adding extra hardware to the rasterizer placed in front of the per-pixel pipeline and a simple adaptive block to the conventional depth test block. Up to 71.7% of total pixels were removed by this method, for complex test models as well as non-complex test models. The maximum bandwidth gain obtained by the depth filter was 3.3. A novel adaptation algorithm makes the depth filter usable in actual situations; therefore it can be a useful method to alleviate memory bottlenecks in 3D rendering engines.

ACKNOWLEDGMENT

This research was sponsored by SAMSUNG Electronics and KOSEF through the MICROS at KAIST, Korea.

REFERENCES

[1] C.H. Yu and L.S. Kim, A hierarchical depth buffer for minimizing memory bandwidth in 3D rendering engine: depth filter, IEEE Proceedings of the 2003 International Symposium on Circuits and Systems, Vol. 2, pp 724-727, May, 2003.
[2] Inho Lee, et al, A hardware-like high-level language based environment for 3D graphics architecture exploration, IEEE Proceedings of the 2003 International Symposium on Circuits and Systems, Vol. 2, pp 512-515, May, 2003.
[3] S. Morein, ATI Radeon Hyper-Z technology, In Hot 3D Proceedings, Graphics Hardware Workshop 2000, Interlaken, Switzerland, August, 2000.
[4] ATI Corporation, Radeon 9700 Pro, 2002, http://www.ati.com/products/pc/radeon9700pro/index.html and http://www.ati.com/developer/techpapers.html.
[5] W.C. Park, et al, A mid-texturing pixel rasterization pipeline architecture for 3D rendering processors, IEEE Proceedings of International Conference on Application-Specific Systems, Architectures and Processors, pp 173-182, July, 2002.
[6] J. McCormack, R. McNamara, C. Gianos, L. Seiler, N. Jouppi, K. Correll, Neon: a single-chip 3D workstation graphics accelerator, Graphics Hardware Workshop 1998, pp 123-132, August, 1998.
[7] L. Garber, The wild world of 3D graphics chips, IEEE Computer, vol. 33, no. 9, pp 12-16, Sep, 2000.
[8] N. Greene, M. Kass and G. Miller, Hierarchical z-buffer visibility, ACM Proceedings of SIGGRAPH, pp 231-238, Aug, 1993.
[9] R. Kempf and C. Frazier, OpenGL reference manual, Addison Wesley, 1996.

Chang-Hyo Yu received the B.S. and M.S. degrees in Electrical Engineering and Computer Science from KAIST, Korea, in 2001 and 2003 respectively, and is now a Ph.D. candidate in Electrical Engineering and Computer Science at KAIST. His research interests include 3D graphics hardware design and multimedia programmable processor design.

Lee-Sup Kim received the B.S. degree in Electronics Engineering from Seoul National University, Korea, in 1982 and the M.S. and Ph.D. degrees in Electrical Engineering from Stanford University, Stanford, CA, in 1986 and 1990, respectively. He was a post-doctoral fellow at the Toshiba Corporation, Kawasaki, Japan, during 1990-1993, where he was involved in the design of a high-performance DSP and a single-chip MPEG2 decoder. Since March 1993, he has been at KAIST. In November 2002, he became a Full Professor. During 1998, he was on sabbatical leave with Chromatic Research and SandCraft Inc. in Silicon Valley. His research interests are 3D graphics hardware design, LCD display controller design, Multimedia Programmable Processor design, and High speed and Low power digital IC design.