Sheared Interpolation and Gradient Estimation for Real-Time Volume Rendering. Hanspeter Pfister, Frank Wessels, and Arie Kaufman


Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY 11794-4400, U.S.A.
Authors' email: pfister@cs.sunysb.edu, frankw@cs.sunysb.edu, ari@cs.sunysb.edu
Appeared in Eurographics Hardware Workshop, Oslo, September 94.

Abstract

In this paper we present a technique for the interactive control and display of static and dynamic 3D datasets. We describe novel ways of tri-linear interpolation and gradient estimation for a real-time volume rendering system, using coherency between rays. We show simulation results that compare the proposed methods to traditional algorithms and present them in the context of Cube-3, a special-purpose architecture capable of rendering $512^3$ 16-bit per voxel datasets at over 20 frames per second.

1. Introduction

Numerous scientific applications, including biomedical and geophysical analysis, computational fluid dynamics and finite element models, require the rapid display of dynamically acquired or computer generated 3D datasets. Real-time visualization of dynamic volume data, called 4D (spatial-temporal) visualization, permits observation of 3D data changes, such as the study of fluid flow in rocks or the study of a beating heart. In order to reveal the internal structure of the data, direct volume rendering methods have to be employed that generate an image without pre-processing and allow for the interactive control of viewing parameters [6].

The massive computational resources necessary to achieve 4D visualization at high frame rates place hard-to-meet requirements on sequential implementations and general-purpose computers. Only parallelism among a dedicated set of processors can achieve the necessary high memory bandwidth and arithmetic performance [4, 7, 12, 15] [6, Chapter 6]. While relatively fast algorithms exist for the display of static datasets on massively parallel architectures [17, 18], very little attention has been paid to the real-time visualization of dynamically changing high-resolution 3D data. This is the main objective of Cube-3, a special-purpose architecture capable of rendering $512^3$ 16-bit per voxel datasets at over 20 frames per second [16].

Cube-3 implements ray-casting, a powerful volume rendering technique that offers high image quality while allowing for algorithmic optimizations which significantly reduce image generation times [6, 13, 14]. Rays are cast from the viewing position into the volume data. At evenly spaced locations along each ray, the data is tri-linearly interpolated using values of surrounding voxels. Central differences of voxels around the sample point yield a gradient which is used as a surface normal approximation. Using the gradient and the interpolated sample value, a local shading model is applied and a sample opacity is assigned. Finally, ray samples along the ray are composited into pixel values to produce an image [11].

An important problem of ray-casting is the non-uniform mapping of samples onto voxels, since voxels may contain more than one ray sample or may be involved in multiple gradient calculations. This leads to redundant data accesses and irregular interprocessor communication that affect the performance. In Cube-3 we use a ray-casting approach that transforms the volume into an intermediate coordinate system for which there is a mapping of ray samples onto the volume that is one-to-one. This allows for efficient projections onto a face of the volume, and the distorted image is then warped (2D transformed and projected) onto the view plane.

Using a similar approach, Yagel and Kaufman [20] describe a template based ray-casting scheme to simplify path generation for rays through the volume, and Schröder and Stoll [17] have implemented this method on a Princeton Engine of 1024 processors and have achieved sub-second rendering times for a $128^3$ dataset. Cameron and Underill [3] efficiently use an intermediate volume transformation to reduce data communication in a SIMD parallel processor. Lacroute and Levoy [10] recently reported on a fast implementation using a shear-warp transformation and were able to achieve interactive rendering times for $256^3$ datasets on a graphics workstation. All these implementations require a pre-processing step to calculate the gradient field or to generate color and opacity volumes and are therefore not suitable for 4D visualization.

This paper presents two new methods that allow for real-time tri-linear interpolation and gradient estimation without pre-computation. They are suitable for 4D visualization and lead to an efficient implementation in hardware. Section 2 describes the underlying real-time ray-casting approach that transforms the volume into an intermediate sheared coordinate space. Section 3 discusses the problems associated with performing interpolation in this sheared space and introduces sheared tri-linear interpolation as an effective solution. We then present a new way of gradient approximation using coherency between rays in Section 4. Section 5 describes the main architectural features of Cube-3 and Section 6 gives results on the proposed interpolation and shading methods.

2. Real-Time Ray-Casting

Our real-time ray-casting algorithm assumes that the volume is sampled on a rectilinear grid. A distorted intermediate image is projected onto the volume face that is most perpendicular to the viewing direction. Using a term by Yagel and Kaufman [20] we call this face the base-plane. A 2D warp of the base-plane projection produces the final image.

The first step is to transform the volume into an intermediate coordinate system for which there is a simple mapping of voxels onto base-plane pixels. In a recent approach, Lacroute and Levoy [10] use a shear-warp factorization of the viewing transform and project the volume in a slice-parallel fashion onto the base-plane. The volume is treated as a set of 2D slices which are subject to a 2D shear-scale and resampling operation according to the viewing transform. Each slice is treated independently without computing individual rays, and the resulting base-plane image is warped onto the viewing plane. Other approaches [20] operate in a ray-parallel fashion, where resampling and compositing operations take place on rays cast from each pixel of the base-plane. In both approaches the 3D volume is traversed only once per projection. The algorithms involve one resampling of the volume and an inexpensive 2D image warp.
In Cube-3 we adopted the ray-parallel approach because it allows for efficient parallel implementations of compositing along rays. Using a technique by Yagel and Kaufman [20], we generate lookup tables or templates to cast discrete rays from the base-plane into the volume. Figure 1 shows an example of a parallel and perspective projection. 26-connected discrete lines are pre-generated using a 3D variation [8] [6, pp. 280-301] of Bresenham's algorithm modified for non-integer endpoints. This algorithm guarantees constant stepping by a distance of one along the major axis (the Z-axis in Figure 1). The stepping along the two other axes (the X- and Y-axes in Figure 1) is stored in two templates. For parallel projections, where neighboring rays follow the exact same path through the volume, the templates store $n$ positions for an $n^3$ volume. For perspective projections they are of size $n^2$ each (see Figure 1).

[Figure 1: X/Y-Templates for Discrete Rays. (a) Parallel Projection. (b) Perspective Projection.]

Figure 2 schematically shows how the algorithm proceeds. All the discrete rays belonging to the same scanline of the base-plane image reside on the same plane inside the volume, called the Projection Ray Plane (PRP). By fetching all voxels on a PRP and transforming them accordingly into a 2D buffer, all discrete rays can be aligned along a direction parallel to an axis, e.g. horizontal. If we define beams to be rays parallel to a main axis of the Cubic Frame Buffer (CFB), then for parallel projections this transformation is simply a shear of beams to the left or right (see Figure 2). For perspective projections each voxel belonging to a discrete ray has to be shifted by a different amount. We refer to this process as de-fanning, since diverging rays are stored adjacent to each other in the 2D buffer.

[Figure 2: Real-Time Ray-Casting. Stages: fetch and shear of orthogonal beams from the Cubic Frame Buffer (CFB) into the above, current and below buffers; tri-linear interpolation; gradient estimation and shading; compositing and 2D warp onto the image plane.]
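The template and shear steps for a parallel projection can be sketched in a few lines. This is only an illustration under stated assumptions: the names `make_template` and `shear_plane` are hypothetical, a simple rounding rule stands in for the 3D Bresenham variant cited above, and the PRP is modelled as a plain list of beams.

```python
import math

def make_template(n, dx):
    """Hypothetical X-template: integer displacement of a discrete ray after
    each of n unit steps along the major axis (rounding, not Bresenham)."""
    return [math.floor(k * dx + 0.5) for k in range(n)]

def shear_plane(plane, template, fill=0):
    """Shift beam k of a PRP by template[k] so that every discrete ray ends
    up in a single column of the returned 2D buffer."""
    width = len(plane[0])
    sheared = []
    for beam, shift in zip(plane, template):
        sheared.append([beam[x + shift] if 0 <= x + shift < width else fill
                        for x in range(width)])
    return sheared

# A ray that moves half a voxel in X per unit step along the major axis.
template = make_template(5, 0.5)
plane = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]]
buffer_2d = shear_plane(plane, template[:3])
```

After the shear, column `x` of `buffer_2d` holds the voxels of the discrete ray that starts at base-plane position `x`, which is the alignment that the interpolation stage relies on. A perspective projection would need a per-voxel shift instead of one shift per beam.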
As soon as two PRPs are stored in two 2D buffers (referred to as the above and current buffers in Figure 2), a tri-linear interpolation is performed to generate sample points on continuous rays using the voxels of four discrete rays as input data (see Section 3). The two 2D buffers generate one interpolated plane of continuous rays. Three such planes, above, below and current, are needed for local gradient approximations using neighboring rays (see Section 4). The samples of the rays are shaded and opacities are assigned using a user controllable transfer function. The shaded rays are composited into a final pixel color using a parallel implementation of the front-to-back (or back-to-front) compositing:

$$C' = C_L + (1 - \alpha_L)\,C_R, \qquad \alpha' = \alpha_L + (1 - \alpha_L)\,\alpha_R \tag{1}$$

Here the subscripts $L$ and $R$ indicate sample color $C$ or opacity $\alpha$ from left or right children of the binary tree, respectively. Other parallel projection schemes such as first or last opaque projection, maximum or minimum voxel value and weighted summation can also be employed. The next section discusses the issues of tri-linear interpolation between discrete rays to generate continuous rays, and Section 4 shows how to compute the local gradient at each continuous sample point.

3. Sheared Tri-Linear Interpolation

Tri-linear interpolation generates a value at non-integer locations by fetching the eight surrounding voxels and interpolating as follows:

$$\begin{aligned}
P_{abc} ={}& P_{000}(1-a)(1-b)(1-c) + P_{100}\,a(1-b)(1-c) \\
&+ P_{010}(1-a)\,b\,(1-c) + P_{001}(1-a)(1-b)\,c \\
&+ P_{101}\,a(1-b)\,c + P_{011}(1-a)\,b\,c \\
&+ P_{110}\,a\,b\,(1-c) + P_{111}\,a\,b\,c.
\end{aligned} \tag{2}$$

Here the relative 3D coordinate of a sample point within a cube with respect to the corner voxel closest to the origin is $\langle a, b, c \rangle$, the data values associated with the corner voxels of the cube are $P_{ijk}$, where $i, j, k = 0$ or $1$, and the interpolated data value associated with the sample point is $P_{abc}$. Different optimizations aim at reducing the arithmetic complexity of this operation [9, 16], but the arbitrary memory access to fetch eight neighboring voxels for each sample point makes this one of the most time consuming operations during volume rendering.
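As a point of reference for the sheared variant that follows, Equation 2 can be written out directly. The sketch below is a minimal illustration; the function name `trilinear` and the nested-list layout `P[i][j][k]` are assumptions made for the example, not part of Cube-3.

```python
def trilinear(P, a, b, c):
    """Evaluate Equation 2. P[i][j][k] is the value at cube corner (i, j, k);
    a, b, c are the relative coordinates of the sample inside the cube."""
    total = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                weight = ((a if i else 1 - a)
                          * (b if j else 1 - b)
                          * (c if k else 1 - c))
                total += P[i][j][k] * weight
    return total

# Corner values taken from the linear field f(i, j, k) = i + 2j + 4k.
corners = [[[i + 2 * j + 4 * k for k in (0, 1)] for j in (0, 1)] for i in (0, 1)]
value = trilinear(corners, 0.25, 0.5, 0.75)
```

Because tri-linear interpolation reproduces linear fields exactly, `value` equals 0.25 + 2(0.5) + 4(0.75) = 4.25. Each call reads eight corner values, which is the memory access cost the text identifies as the bottleneck.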
By transforming discrete rays from the PRP so that they are aligned and storing them in two 2D buffers (see Figure 2), we can greatly reduce this data access and communication cost. Instead of fetching the eight-neighborhood of each resampling location, four discrete rays are fetched from the buffer, two from each of the above and below planes. In parallel implementations, neighboring rays reside in adjacent interpolation modules, requiring only a local shift operation of one voxel unit between neighbors.

[Figure 3: Problems with Discrete Ray Interpolation. (a) Parallel Projection. (b) Perspective Projection. Legend: discrete ray A, discrete ray B, missing voxels; samples S1 to S5.]

However, there is a problem intrinsic to interpolation between discrete rays. Figure 3 illustrates this in 2D. The samples on the continuous ray have to be interpolated using bi-linear interpolation between samples of the discrete rays A (white) and B (black). Sample S1 can be correctly interpolated using four voxels from A and B, since they form a rectangle, i.e., the rays do not make a discrete step to the left or right. As soon as the discrete rays step to the left or right, as is the case for samples S2 and S4, the neighboring voxels form a parallelogram, and a straightforward bi-linear interpolation would produce the wrong sample values. The grey shaded square voxels in Figure 3a would be needed to yield the correct result, but they reside on rays two units apart from ray B.

This problem is exacerbated for perspective projections (Figure 3b). The discrete rays diverge, and the correct neighboring voxels are not even stored in the 2D plane buffers. For example, only two voxels of ray A contribute to the correct interpolation at sample point S3. In the 3D case as many as six voxels may be missing in the immediate neighborhood of a sample point for perspective projections. The solution is to perform a sheared tri-linear interpolation by factoring it into four linear and one bi-linear interpolation.
Instead of specifying the sample location with respect to a corner voxel closest to the origin, each 3D coordinate along the ray consists of relative weights for linear interpolations along each axis in possibly sheared voxel neighborhoods. These weights can be pre-computed and stored in the X/Y-templates discussed in Section 2. Figure 4 shows the necessary interpolation steps in 3D. First we perform four linear interpolations in direction of the major axis (the Z-axis in Figure 4) using eight voxels of four neighboring discrete rays inside the 2D buffers. These eight voxels are the vertices of an oblique parallelepiped for parallel projections (see Figure 4a) or of a frustum of a pyramid for perspective projections (see Figure 4b). Four voxels each reside on two separate planes one unit apart, which we call the front or the back plane depending on when it is encountered during ray traversal in the direction of the major axis. Therefore, only one weight factor has to be stored, corresponding to the distance between the front plane and the position of the ray sample point. The resulting four interpolated values form a rectangle and can be bi-linearly interpolated to yield the final sample value. We split this bi-linear interpolation into two linear interpolations between the corner values and a final linear interpolation between the edge values. At the bottom of Figure 4 this is shown as two interpolations in X-direction followed by one interpolation in Y-direction.

[Figure 4: Sheared Tri-Linear Interpolation. (a) Parallel Projection. (b) Perspective Projection. Legend: discrete ray A, discrete ray B, linearly interpolated samples, ray sample, front planes, back planes.]

[Figure 5: Variable Ray Offsets in Major Direction. (a) No Offset. (b) Offset in Range. (c) Offset out of Range. Legend: discrete ray A, discrete ray B, out-of-range samples.]

The sample points corresponding to the continuous rays have to be inside the polyhedron defined by the voxels on the four surrounding discrete rays. When constructing the discrete rays, all continuous rays start at integer positions of the base-plane, i.e., they coincide with voxels of the first slice of the volume dataset. However, as Figure 5a shows, using these rays during ray-casting effectively reduces the tri-linear interpolation to a bi-linear interpolation, because all sample points along the ray fall onto the front planes of the parallelepipeds or pyramid frustum. Using X and Y integer positions on the base-plane we can allow an offset from the base-plane in major direction as a degree of freedom and are able to perform sheared tri-linear interpolations (Figure 5b).
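The factorization described above (four linear interpolations along the major axis, then two in X and one in Y) can be sketched as follows. The names `lerp` and `sheared_trilinear`, and the argument layout `front[j][i]` / `back[j][i]` for the four discrete rays, are assumptions for illustration; in Cube-3 the weights would come from the pre-computed templates.

```python
def lerp(p, q, t):
    """Linear interpolation between p (t = 0) and q (t = 1)."""
    return p + t * (q - p)

def sheared_trilinear(front, back, w, u, v):
    """front[j][i] and back[j][i] hold the voxels of the four neighbouring
    discrete rays on the front and back planes. w is the distance of the
    sample from the front plane; u and v are the X and Y weights."""
    # Step 1: four linear interpolations along the major axis, one per ray.
    edge = [[lerp(front[j][i], back[j][i], w) for i in (0, 1)] for j in (0, 1)]
    # Step 2: two linear interpolations in the X-direction.
    lower = lerp(edge[0][0], edge[0][1], u)
    upper = lerp(edge[1][0], edge[1][1], u)
    # Step 3: one final linear interpolation in the Y-direction.
    return lerp(lower, upper, v)

# Unsheared special case: the voxels form a unit cube of f = x + 2y + 4z.
front = [[0, 1], [2, 3]]
back = [[4, 5], [6, 7]]
sample = sheared_trilinear(front, back, 0.75, 0.25, 0.5)
```

In the unsheared case the result coincides with Equation 2 (here 4.25). When the discrete rays step sideways between the front and back plane, the same seven linear interpolations are applied to the voxels of the oblique parallelepiped, which is what makes the scheme independent of where the steps occur.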
But for offsets in major direction that are too big, as shown in Figure 5c, some of the samples along the rays may fall outside the bounding box defined by the discrete rays. In order to get an upper bound for admissible offsets we have to understand how steps in non-major direction along discrete rays occur. Figure 6 shows the situation in 2D. The view vector is split into a $dx$ component along the X-axis ($dx$ and $dy$ in 3D) and a unit vector in direction of the major axis (the Y-axis in Figure 6). Stepping in direction of the major axis, we add the viewing vector to the current sample position at $S_n$ in order to get the new sample position at $S_{n+1}$.

[Figure 6: Maximum Offset Estimation. Labels: view vector with components $dx$ and $dy = 1$, 45° line, rectangle of size $dx$ by 1, in-range samples.]

Suppose that the addition of $dx$ at point $S_n$ leads to a step of the discrete rays in x direction. This step can only occur if $S_n$ has a relative x offset with respect to the lower left corner voxel of more than $1 - dx$ for positive $dx$ (or less than $1 + dx$ for negative $dx$). In other words, sample $S_n$ was inside the rectangle of size $dx$ by 1 shown in Figure 6. However, only the shaded region of this rectangle contains sample positions inside the parallelepiped defined by the corner voxels. Taking the smallest side in major axis as the worst case, this means that in-range samples have a maximal relative y offset of no more than $1 - dx$ for positive $dx$ (no less than $1 + dx$ for negative $dx$). Since we step with a unit vector in the direction of the major axis, all relative offsets along the ray are determined by the offsets of the first ray samples from the base-plane. The above argument easily extends to 3D, making the maximum allowed offset in direction of the major axis:

$$\begin{cases}
\min(1 - dx,\; 1 - dy), & dx,\, dy \ge 0 \\
\min(1 + dx,\; 1 - dy), & dx < 0,\; dy \ge 0 \\
\min(1 - dx,\; 1 + dy), & dx \ge 0,\; dy < 0 \\
\min(1 + dx,\; 1 + dy), & dx,\, dy < 0
\end{cases} \tag{3}$$

where $dx$ and $dy$ are the components of the viewing vector in x and y direction, respectively. Notice that for a 45° viewing angle $dx$ and $dy$ are 1, yielding an offset of 0 and bi-linear interpolation as in Figure 5a. This fact will be of importance when discussing the results in Section 6.

In our implementation we cast a single ray from the origin of the image plane onto the base-plane using uniform distance between samples and choose the offset in major direction of the first sample after penetration of the base-plane. If necessary the offset is iteratively reduced until it satisfies the above condition. This leads to view dependent offsets in major direction and to varying resampling of the dataset. The variation of resampling points according to viewing direction is an advantage for interactive visualization, because more of the internal data structure can be revealed.

Each discrete ray consists of $n$ voxels, independent of the viewing direction. Since the maximum viewing angle difference with the major axis is not more than 45 degrees, the volume sample rate is defined by the diagonal through the cube and is by a factor of $\sqrt{3}$ higher for orthographic viewing. We found that for ray-compositing this is not an important consideration due to the averaging nature of the compositing operator.

A more severe problem is the varying size of the sample neighborhood (see Figure 4). For parallel projections, the eight voxels surrounding the sample point either form a cube with sides of length one or an oblique parallelepiped as in Figure 4a. For perspective projections, however, the surrounding voxels may form the frustum of a pyramid with parallel front and back planes as in Figure 4b. Due to the divergence of rays towards the back of the dataset, the volume spanned by this frustum increases, thereby reducing the precision of the tri-linear interpolation.
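The four cases of Equation 3 differ only in the signs of dx and dy, so they collapse into a single expression in the absolute values. The sketch below shows that reading; the names `max_offset` and `is_admissible` are hypothetical, and the admissibility test is an assumption about how the condition might be checked, since the text does not specify how the offset is iteratively reduced.

```python
def max_offset(dx, dy):
    """Largest admissible offset along the major axis (Equation 3), for
    view-vector components dx, dy with |dx|, |dy| <= 1."""
    return min(1 - abs(dx), 1 - abs(dy))

def is_admissible(offset, dx, dy):
    """True if a first-sample offset keeps all ray samples in range."""
    return 0 <= offset <= max_offset(dx, dy)

bound_oblique = max_offset(-0.25, 0.5)   # case dx < 0, dy >= 0
bound_diagonal = max_offset(1.0, 1.0)    # 45 degree viewing angle
```

For the oblique view the bound is min(1 + dx, 1 - dy) = min(0.75, 0.5) = 0.5. For the 45 degree view it is 0, which matches the observation that the interpolation then degenerates to the bi-linear case.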
However, we found that the distance between neighboring discrete rays at the end of the volume never exceeded two voxels for a $256^3$ dataset while still achieving a high amount of perspectivity. Furthermore, in typical datasets the samples at the back of the volume have little influence on the final pixel color due to compositing along the ray.

The center of projection C and the field-of-view (FOV) in perspective projections also influence the sampling rate (see Figure 7). The discrete line algorithm casts exactly one ray per pixel of the base-plane, or a maximum of $2n$ rays per scanline. In cases where the FOV extends across the dataset (Figure 7a) this guarantees better sampling than regular image order ray-casting, which would cast rays spanning the FOV and send wasteful rays that miss the dataset. However, for a small FOV the discrete line stepping yields undersampling in the active regions of the base-plane (Figure 7b). Figure 7c shows a case where two base-plane images contribute to the final view image. The worst case in 3D is the generation of three base-plane projections for a single perspective image.

[Figure 7: Sampling for Perspective Projections. (a) Correct Sampling. (b) Undersampling. (c) Two Base-Plane Projections. Labels: center of projection C, field-of-view FOV.]

Section 6 presents comparisons between image order ray-casting using a view independent sampling rate along the rays, tri-linear interpolation employing Equation 2 using the correct voxels, and the proposed sheared tri-linear interpolation among discrete rays. The next section describes methods for gradient estimation using samples on neighboring rays.

4. ABC Gradient Estimation

To approximate the surface normals necessary for shading and classification we use the gray-level gradient, which is computed from the differences between the values of the current sample and its immediate neighbors [5]. In order to evaluate the gradient at a particular point, we form central differences between the tri-linearly interpolated values of rays on the immediate left, right, above and below, as well as the values of the current ray.
Since this amounts to storing three consecutive planes of ray samples, we call this method ABC gradient estimation for the above, below, and current ray sample buffers.

The simplest approach, shown in Figure 8 for 2D, is to use the 6-neighborhood gradient, which uses the differences of neighboring sample values along the ray, $P_{(n,m+1)} - P_{(n,m-1)}$ in base-plane direction and $P_{(n+1,m-1)} - P_{(n-1,m+1)}$ in the ray direction. Although the left, right, above and below ray samples are in the same plane and orthogonal to each other, the samples in the ray direction may be slanted. A more critical problem occurs during a switch of base-plane. Figure 8a shows the situation for almost 45° viewing direction, where an image is projected onto the horizontal base-plane. For any angle greater than 45° a switch of base-planes occurs, and the values of $P_{(n+1,m)} - P_{(n-1,m)}$ are used instead to calculate the gradient in the base-plane direction. This leads to intolerable temporal aliasing.

[Figure 8: 6-neighborhood Gradient. (a) 6-neighborhood, Horizontal Base-Plane. (b) 6-neighborhood, Vertical Base-Plane. Axes: $n-2 \ldots n+2$ and $m-2 \ldots m+2$.]

We also simulated the use of a 26-neighborhood gradient (Figure 9). Instead of fetching sample values from four neighboring rays, 26 interpolated samples from 8 neighboring rays are fetched. Each sample is assigned a weight factor corresponding to the inverse Manhattan distance in the interpolated buffer to the center sample. For example, sample $P_{(n,m-1)}$ in Figure 9a has a weight of 1, whereas sample $P_{(n+1,m-2)}$ has a weight of $\frac{1}{2}$. In 3D we also get weight factors of $\frac{1}{3}$ for the corner samples of the 26-neighborhood. However, to simplify the arithmetic we use powers of 2, so that these samples are multiplied by a weight of $\frac{1}{4}$. The gradient is estimated by taking weighted sums of ray samples and differences between opposite sample planes. For the 2D example in Figure 9a this corresponds to:

$$\begin{aligned}
G_{base} ={}& \left[\tfrac{1}{2} P_{(n+1,m)} + P_{(n,m+1)} + \tfrac{1}{2} P_{(n-1,m+2)}\right] - \left[\tfrac{1}{2} P_{(n+1,m-2)} + P_{(n,m-1)} + \tfrac{1}{2} P_{(n-1,m)}\right] \\
G_{ray} ={}& \left[\tfrac{1}{2} P_{(n+1,m-2)} + P_{(n+1,m-1)} + \tfrac{1}{2} P_{(n+1,m)}\right] - \left[\tfrac{1}{2} P_{(n-1,m)} + P_{(n-1,m+1)} + \tfrac{1}{2} P_{(n-1,m+2)}\right]
\end{aligned} \tag{4}$$

This method leads to better overall image quality when compared to the 6-neighborhood gradient, but the switching of major axis is still noticeable (compare Figure 9a and 9b).

[Figure 9: 26-neighborhood Gradient. (a) 26-neighborhood, Horizontal Base-Plane. (b) 26-neighborhood, Vertical Base-Plane. Axes: $n-2 \ldots n+2$ and $m-2 \ldots m+2$.]

To circumvent this problem we take a similar approach to the 6-neighborhood method but use an additional linear interpolation step to resample the rays on correct orthogonal positions. Figure 10 shows how the round samples on the left and right ray are used to linearly interpolate the correct square samples. We call this approach the 10-neighborhood gradient estimation for the 3D case, since 10 voxels participate in the computation. It adequately solves the problem of switching the major axis during object rotations and yields high image quality. The linear interpolation weights are constant along a ray and correspond to a shift of all samples in the viewing direction.
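The 2D forms of the 6-neighborhood differences and of Equation 4 can be compared side by side in code. This is a sketch under assumptions: the samples are held in a nested list `S[n][m]`, the ray through `S[n][m]` is taken to pass through `S[n+1][m-1]` and `S[n-1][m+1]` as in the near-45° case of Figure 9a, and the function names are hypothetical.

```python
def gradient_6(S, n, m):
    """2D form of the 6-neighborhood gradient: plain central differences
    in the base-plane direction and along the slanted ray."""
    g_base = S[n][m + 1] - S[n][m - 1]
    g_ray = S[n + 1][m - 1] - S[n - 1][m + 1]
    return g_base, g_ray

def gradient_26(S, n, m):
    """2D form of Equation 4: weighted sums over opposite sample planes,
    with weight 1 for direct neighbours and 1/2 for the diagonal ones."""
    right = 0.5 * S[n + 1][m] + S[n][m + 1] + 0.5 * S[n - 1][m + 2]
    left = 0.5 * S[n + 1][m - 2] + S[n][m - 1] + 0.5 * S[n - 1][m]
    ahead = 0.5 * S[n + 1][m - 2] + S[n + 1][m - 1] + 0.5 * S[n + 1][m]
    behind = 0.5 * S[n - 1][m] + S[n - 1][m + 1] + 0.5 * S[n - 1][m + 2]
    return right - left, ahead - behind

# Test field S[n][m] = 10 n + m on a 5 x 5 grid, evaluated at the centre.
S = [[10 * n + m for m in range(5)] for n in range(5)]
g6 = gradient_6(S, 2, 2)
g26 = gradient_26(S, 2, 2)
```

On this linear field the weighted version returns exactly twice the 6-neighborhood result, because its weights sum to 2 on each side; the direction of the estimated gradient is unchanged. Neither function resamples onto orthogonal positions, which is the extra step that the 10-neighborhood method adds.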
Section 6 presents a direct comparison between the 6-, 10- and 26-neighborhood gradient methods.

[Figure 10: 10-neighborhood Gradient. (a) 10-neighborhood, Horizontal Base-Plane. (b) 10-neighborhood, Vertical Base-Plane. Axes: $n-2 \ldots n+2$ and $m-2 \ldots m+2$.]

In the case of perspective projections, the front of each PRP is uniformly sampled with rays one unit apart. As the rays diverge towards the back of the volume, the distance between rays increases, and the gradient estimation becomes less accurate. However, because of the usually small distance between rays and due to the averaging nature of shading, classification and compositing, these effects do not influence image quality for typical datasets.

With the gradient estimation and light vector directions, the sample intensity can be generated using a variety of shading methods (e.g., using lookup tables [10]). Opacity values for compositing are generated using a transfer function represented as a 2D lookup table indexed by sample density and gradient magnitude [11]. The next section shows how the presented sheared tri-linear interpolation and ABC gradient estimation are supported in the Cube-3 architecture in order to achieve real-time 4D visualization.

5. Cube-3 Architecture

Cube-3 is a special-purpose real-time volume visualization system that allows for the display of high-resolution $512^3$ 16-bit per voxel datasets at frame rates over 20 Hz. It contains a large CFB memory to hold the volumetric dataset and performs base-plane projections according to user controlled parameters. A host computer, connected to Cube-3 and containing the frame buffer for the final image display, runs the user interface software and performs the final 2D image warp onto the viewing plane. Real-time acquisition devices such as a confocal microscope, microtomograph, ultrasound, or a computer running a simulation model are tightly coupled to the Cube-3 memory using high-bandwidth optical links for the input of dynamically changing 3D datasets.

The Cube-3 architecture is highly parallel and pipelined [16]. Figure 11 shows a block diagram of the overall dataflow. The CFB is a 3D memory organized in $n$ dual-access memory modules, each storing $n^2$ voxels. A special 3D skewed organization enables the conflict-free access to any beam of $n$ voxels [7]. PRPs are fetched as a sequence of voxel beams and stored in consecutive 2D Skewed Buffers (2DSB). A high-bandwidth interconnection network, the Fast Bus, allows the alignment of the discrete rays on the PRP parallel to a main axis in the 2DSB modules.

[Figure 11: Cube-3 System Overview. Blocks: Cubic Frame Buffer (CFB), parallel beam fetch of a PRP, Fast Bus, 2D Skewed Buffers (2DSB), discrete ray fetch, TRILIN tri-linear interpolation, ABC Shading Units, Ray Projection Cone (RPC), projection, 2D warping, frame buffer.]

Three 2DSBs are used in a pipelined fashion to support sheared tri-linear interpolation. Aligned discrete rays from 2DSBs are fetched conflict-free and placed into special purpose Tri-Linear Interpolation (TRILIN) units. The resulting continuous projection rays are placed onto ABC Shading Units, where the gradients are estimated and each ray sample is converted into both an intensity and an associated opacity value according to lighting and data segmentation parameters. These intensity/opacity ray samples are fed into the leaves of a Ray Projection Cone (RPC). The RPC is a folded binary tree that generates in parallel and in a pipelined fashion the final pixel value using a variety of projection schemes on the cone nodes. The resulting base-plane pixel is transmitted to the host where it is post-processed (e.g., post-shaded or splatted) and 2D transformed (warped) onto the viewing plane. The result is stored in the 2D frame-buffer.
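The compositing rule of Equation 1 is associative, which is what allows a binary tree such as the RPC to combine the samples of a ray pairwise instead of strictly front to back. The following sketch illustrates that property in software; it is not a model of the folded hardware tree, the names `over` and `composite_tree` are hypothetical, and samples are assumed to be (color, opacity) pairs in front-to-back order with opacity-weighted colors.

```python
from functools import reduce

def over(left, right):
    """Equation 1: combine a left (front) and right (back) sample."""
    c_l, a_l = left
    c_r, a_r = right
    return (c_l + (1 - a_l) * c_r, a_l + (1 - a_l) * a_r)

def composite_tree(samples):
    """Pairwise reduction of a non-empty, front-to-back ordered list of
    (color, opacity) samples, one tree level per loop iteration."""
    level = list(samples)
    while len(level) > 1:
        merged = [over(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd sample passes to the next level
            merged.append(level[-1])
        level = merged
    return level[0]

ray = [(0.5, 0.5), (0.25, 0.5), (0.5, 1.0)]
pixel = composite_tree(ray)
sequential = reduce(over, ray)
```

Both evaluation orders give the same pixel, (0.75, 1.0) for this ray. A tree over n samples has depth of about log2(n), so in a pipelined implementation the latency per ray grows only logarithmically with the ray length.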
The parallel conflict-free memory architecture of Cube-3 reduces the memory access bottleneck from $O(n^3)$ per projection to $O(n^2)$ and allows for very high data throughput. For a dataset size of $512^3$ 16-bit voxels we estimate a performance of up to 30 frames per second. Such a system would require 8 boards and a custom fabricated backplane. Cube-3 is a scalable and flexible architecture that allows the user to interactively control the following parameters: viewing angle from any parallel and perspective direction, control over shading and projection (e.g., first opaque, maximum value, x-ray, compositing), color segmentation and thresholding, control over translucency, sectioning and slicing. It will provide a rendering performance that is an order of magnitude higher than that of previously reported systems and thereby revolutionize the way scientists conduct their studies.

6. Results

We implemented the different interpolation and gradient estimation methods in software and conducted several experiments. The first program, VolRen, implements traditional image order volume rendering. Rays are cast from the image plane into the volume and sampled at uniform steps. The tri-linear interpolation is performed according to Equation 2 using the correct 8-neighborhood around sample points. The gradient is estimated using central differences of tri-linear interpolated values in a 6-neighborhood around each sample. The second program, True3D, uses our real-time discrete ray-casting method, but instead of performing sheared tri-linear interpolation it fetches the exact 8-neighborhood around each sample point. The last program, Sheared3D, implements the same algorithm but with the proposed sheared tri-linear interpolation. Both True3D and Sheared3D can use any of the 6-, 26- or 10-neighborhood gradient methods for comparison purposes. For the implementation of these algorithms we used the VolVis volume visualization system, developed at the State University of New York at Stony Brook [2, 1]. (The source code of VolVis is freely available by sending email to volvis@cs.sunysb.edu.)

6.1. Tri-Linear Interpolation Comparison

First we compare images resulting from Sheared3D to results obtained from VolRen and True3D. The gradient approximation method used for Sheared3D and True3D was the proposed 10-neighborhood gradient estimation. The dataset, a CT study of a cadaver head of size 256 × 256 × 225 voxels at 8 bits per voxel, was taken on a General Electric CT Scanner and provided courtesy of North Carolina Memorial Hospital. All programs use the same shading model and an opacity transfer function that maps voxel values below 80 to $\alpha = 0$, has a linear ramp for $\alpha$ from 0 to 0.75 for values between 80 and 100, and assigns $\alpha = 0.75$ to values above 100. We chose this particular transfer function to classify bone in the dataset as opaque in order to try to maximize the display of aliasing effects on the forehead of the CT skull.

Figure 12: Dataset rendered with sheared tri-linear interpolation (left) and the difference image to traditional volume rendering (right) for 45° rotation angle. This is the worst case for sheared tri-linear interpolation.

For the experiments we rotated the dataset by 70° around the horizontal axis with respect to the world coordinate system, and during animations we rotated it around a vertical axis between 0° and 90° in steps of 5°. As error measure between the resulting images we use the average Euclidean distance of RGB values between corresponding pixels. Figure 12 shows the dataset rotated by 45° around the vertical axis. The left image was generated using Sheared3D and the image on the right is the difference image, mapped to gray-scale, comparing the corresponding Sheared3D and VolRen images for this rotation angle.

Figure 13 shows the relative Euclidean error in percent between images from Sheared3D and VolRen and between Sheared3D and True3D, respectively. The comparison with VolRen (top curve) shows how the error rises towards 45° rotation angle and reaches a minimum at 0° and 90°. The peak at 45° is due to the different sampling distance along the ray, which is larger by a factor of $\sqrt{3}$ for discrete line stepping (see Section 3). Furthermore, due to the offset considerations explained in Section 3, our algorithm performs only bi-linear interpolation as opposed to the tri-linear interpolation in VolRen. The comparison to True3D shows zero error for 45° because both algorithms perform bi-linear interpolation and use the same gradient estimation technique. The relative error in percent compared to VolRen stays below 1.3%, and compared to True3D it stays below 0.3%.

Figure 13: Sheared Interpolation Percentage Error. (Plot of the percent error of sheared tri-linear interpolation against VolRen and True3D versus rotation angle in degrees.)

6.2. ABC Gradient Estimation Comparison

For the comparison of the different ABC gradient estimation techniques we use a voxelized model of a sphere as dataset. The sphere is scan-converted using the volume sampling method described in [19]. The surface intersection points are obtained by thresholding, i.e., as soon as a certain voxel value is exceeded we calculate the gradient at that point. Each gradient is compared to the true geometric surface normal. As error measure we use the magnitude of the angular difference between the two vectors. All differences are accumulated and averaged over all surface intersection points.
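The error measure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' experiment: the sphere is a hypothetical analytic density with a linear fall-off across its surface rather than the volume-sampled model of [19], surface points are taken to be all voxels within one unit of the surface rather than the first threshold crossing along a ray, the gradient is a plain 6-neighborhood central difference on the voxel grid rather than one of the ABC methods, and all function names and parameters are made up for the example.

```python
import math


def angular_error_deg(estimate, reference):
    """Magnitude of the angle, in degrees, between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(estimate, reference))
    norm_e = math.sqrt(sum(a * a for a in estimate))
    norm_r = math.sqrt(sum(b * b for b in reference))
    if norm_e == 0.0 or norm_r == 0.0:
        raise ValueError("a zero-length vector has no direction")
    cosine = max(-1.0, min(1.0, dot / (norm_e * norm_r)))  # guard against rounding
    return math.degrees(math.acos(cosine))


def sphere_density(x, y, z, radius, falloff=6.0):
    """Assumed density: 1 deep inside, 0 far outside, linear ramp of width `falloff`."""
    distance = math.sqrt(x * x + y * y + z * z)
    return max(0.0, min(1.0, 0.5 + (radius - distance) / falloff))


def central_difference(x, y, z, radius):
    """6-neighborhood central-difference gradient at an integer voxel position."""
    return (
        (sphere_density(x + 1, y, z, radius) - sphere_density(x - 1, y, z, radius)) / 2.0,
        (sphere_density(x, y + 1, z, radius) - sphere_density(x, y - 1, z, radius)) / 2.0,
        (sphere_density(x, y, z + 1, radius) - sphere_density(x, y, z - 1, radius)) / 2.0,
    )


def mean_angular_error(radius=10.0):
    """Average angular error over all voxels within one unit of the sphere surface."""
    errors = []
    extent = int(radius) + 1
    for x in range(-extent, extent + 1):
        for y in range(-extent, extent + 1):
            for z in range(-extent, extent + 1):
                distance = math.sqrt(x * x + y * y + z * z)
                if distance == 0.0 or abs(distance - radius) >= 1.0:
                    continue
                gx, gy, gz = central_difference(x, y, z, radius)
                # The density decreases outwards, so the outward normal is -gradient.
                # The analytic normal of a centered sphere points along (x, y, z).
                errors.append(angular_error_deg((-gx, -gy, -gz), (x, y, z)))
    return sum(errors) / len(errors)
```

Calling `mean_angular_error()` accumulates the angular differences and averages them, as in the text. Replacing `central_difference` with another estimator would allow different gradient methods to be compared under the same measure.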

Figure 14: Error magnitude of comparing surface normals of 10- (top) and 26-neighborhood gradients (bottom) to the true analytic normal of the voxelized sphere. Notice the jump of regions of high error for the 26-neighborhood gradient between 45° and 50° rotation angle. Dark: 0° ≤ |e| < 8.5°, Medium: 8.5° ≤ |e| < 20°, Light: 20° ≤ |e| < 31.5°, White: |e| ≥ 31.5°. Rotation angles (left to right): 30°, 35°, 40°, 45°, 50°, 55°, 60°.

Figure 15 shows the results of rotating the sphere around a vertical axis between 0° and 90° in steps of 5°. The top two curves compare the analytic normal with the 26- and the 6-neighborhood gradient, respectively. The error increases towards 45° rotation angle due to the non-orthogonality of the gradient directions, which reaches a maximum at 45°. Although the 26-gradient shows a slightly higher error magnitude, the difference between these two methods is not significant.

Figure 15: Average Error Magnitude for ABC Gradient Estimations Compared to the Analytic Normal. (Plot of average error magnitude in degrees versus rotation in degrees for the 26-, 6- and 10-neighborhood gradients.)

The curve on the bottom in Figure 15 shows the comparison of the analytic normal with the 10-neighborhood gradient estimation. The error magnitude is significantly smaller than for the other gradient methods. The error also increases towards 45° rotation angle. This is due to the different distances between samples that are used for the gradient calculations in the three orthogonal directions.

Figure 14 shows how the error propagates around the sphere for rotation angles from 30° to 60° in steps of 5°. Dark shaded regions indicate regions of low error magnitude, light shaded regions indicate higher error magnitudes. The top row shows the 10-neighborhood gradient method with a fairly regular error transition from left to right during a switch of base-planes at 45° (center sphere). The bottom row, depicting the 26-neighborhood gradient method, shows a generally larger error magnitude.
Additionally, the region of largest error jumps from the right side of the sphere to the left during the switch of base-planes. This jump leads to noticeable changes in image intensity during object rotation, an effect that we described as temporal aliasing in Section 4.

7. Conclusions

In order to achieve the goal of real-time visualization of dynamic datasets we developed Cube-3, a scalable architecture that exploits parallelism and pipelining. In this paper we presented the underlying real-time ray-casting approach that allows for a mapping of ray samples onto voxels that is one-to-one. Using templates and shearing/de-fanning of beams, we fetch 2D planes from the volume dataset and perform sheared tri-linear interpolation between discrete neighboring rays. Using the resulting interpolated ray samples from above, current and below planes, we described novel ways of gradient estimation using coherency between rays.

Using software simulations we compared the proposed methods to traditional image order ray-casting. The error of using sheared tri-linear interpolation instead of performing image order ray-casting is below 1.3% relative difference in Euclidean distance of the resulting image pixels. We showed that use of the proposed 10-neighborhood instead of a 6- or 26-neighborhood gradient approach reduces both the average error compared to analytically computed normals and the temporal aliasing that arises from switching base-planes during object rotations. We presented both methods in the context of Cube-3, a special purpose architecture aimed at real-time 4D visualization of high-resolution volumetric datasets.

8. Acknowledgments

This work has been supported by the National Science Foundation under grant CCR-9205047. We would like to thank Lisa Sobierajski and Rick Avila for their helpful suggestions during the development of these methods. Sidney Wang provided us with the sphere dataset and helped with the generation of Figure 14. A discussion with Claudio Silva gave us the insight into the various error metrics we used. We also thank Patrick Tonra for helpful system administration during the more hectic moments in the development of this project.

References

1. Avila, R., He, T., Hong, L., Kaufman, A., Pfister, H., Silva, C., Sobierajski, L., and Wang, S. VolVis: A diversified system for volume visualization research and development. To appear in Proceedings of Visualization '94 (Washington, DC, Oct. 1994).
2. Avila, R., Sobierajski, L., and Kaufman, A. Towards a comprehensive volume visualization system. In Proceedings of Visualization '92 (Boston, MA, Oct. 1992), IEEE Computer Society Press, pp. 13–20.
3. Cameron, G., and Underill, P. E. Rendering volumetric medical image data on a SIMD architecture computer. In Proceedings of Third Eurographics Workshop on Rendering (May 1992).
4. Fuchs, H., Poulton, J., Eyles, J., Greer, T., Goldfeather, J., Ellsworth, D., Molnar, S., Turk, G., Tebbs, B., and Israel, L. Pixel-Planes 5: A heterogeneous multiprocessor graphics system using processor-enhanced memories. Computer Graphics 23, 3 (July 1989), 79–88.
5. Höhne, K. H., and Bernstein, R. Shading 3D-images from CT using gray-level gradients. IEEE Transactions on Medical Imaging MI-5, 1 (Mar. 1986), 45–47.
6. Kaufman, A. Volume Visualization. IEEE CS Press Tutorial, Los Alamitos, CA, 1991.
7. Kaufman, A., and Bakalash, R. Memory and processing architecture for 3D voxel-based imagery. IEEE Computer Graphics & Applications 8, 6 (Nov. 1988), 10–23.
8. Kaufman, A., and Shimony, E. 3D scan-conversion algorithms for voxel-based graphics. In ACM Workshop on Interactive 3D Graphics (Chapel Hill, NC, Oct. 1986), pp. 45–76.
9. Knittel, G. VERVE: Voxel engine for real-time visualization and examination. In Computer Graphics Forum (September 1993), vol. 12, No. 3, pp. C-37–C-48.
10. Lacroute, P., and Levoy, M. Fast volume rendering using a shear-warp factorization of the viewing transform. Computer Graphics, Proceedings of SIGGRAPH '94 (July 1994), 451–457.
11. Levoy, M. Display of surfaces from volume data. IEEE Computer Graphics & Applications 8, 5 (May 1988), 29–37.
12. Levoy, M. Design for real-time high-quality volume rendering workstation. In 1989 Workshop on Volume Visualization (Chapel Hill, NC, May 1989), pp. 85–90.
13. Levoy, M. Efficient ray tracing of volume data. ACM Transactions on Graphics 9, 3 (July 1990), 245–261.
14. Levoy, M. Volume rendering by adaptive refinement. The Visual Computer (July 1990), 2–7.
15. Molnar, S., Eyles, J., and Poulton, J. PixelFlow: High-speed rendering using image composition. Computer Graphics 26, 2 (July 1992), 231–240.
16. Pfister, H., Kaufman, A., and Chiueh, T. Cube-3: A Real-Time Architecture for High-Resolution Volume Visualization. To appear in 1994 Workshop on Volume Visualization (Washington, DC, Oct. 1994).
17. Schröder, P., and Stoll, G. Data parallel volume rendering as line drawing. In 1992 Workshop on Volume Visualization (Boston, MA, Oct. 1992), pp. 25–31.
18. Vezina, G., Fletcher, P., and Robertson, P. Volume rendering on the MasPar MP-1. In 1992 Workshop on Volume Visualization (Boston, MA, Oct. 1992), pp. 3–8.
19. Wang, S., and Kaufman, A. Volume sampled voxelization of geometric primitives. In Proceedings of Visualization '93 (San Jose, CA, Oct. 1993), IEEE Computer Society Press, pp. 78–84.
20. Yagel, R., and Kaufman, A. Template-based volume viewing. Computer Graphics Forum 11, 3 (Sept. 1992), 153–167.

Appeared in Eurographics Hardware Workshop, Oslo, September 94.