An array based design for Real-Time Volume Rendering


Michael Doggett*
School of Computer Science and Engineering
The University of New South Wales†‡

Abstract

This paper describes a new algorithm and hardware design for the generation of two dimensional images from volume data using the ray casting technique. The algorithm is part of an image generation system that is broken down into three subsystems. The first subsystem stores the input data in a buffered memory using a rearrangement of the original address value. The second subsystem reads data points from the buffered memory and shifts the data to computational elements in order to complete the viewing calculations for the image synthesis process. The final stage takes the results of the viewing calculations combined with the original input data to complete the surface rendering and pixel compositing to create the final image. This paper focusses on the second subsystem, which consists of two, two dimensional arrays of processing elements. The first array performs a limited angle, single dimension rotation by shifting the data. The second array performs a two dimensional ray casting operation where viewing rays are assigned to each processing element. The first stage is outlined in this paper and the final rendering stages are the subject of previous work. The hardware design associated with these algorithms is described and tested. It is estimated that this architecture is capable of producing 384 x 384 pixel images at speeds of 15 frames per second for 256³ data sets. Real time generation of images of volume data is important in scientific applications of volume visualization and computer graphics applications which use volume graphics.

Additional Key Words and Phrases: volume visualization, graphics hardware, image generation, volume graphics

1 Introduction

New custom architectures for graphics systems are required to process the large amounts of data associated with volume data at video rates [10].
Custom hardware architectures have been proposed that are capable of generating real-time images from volume data [8, 16, 5, 7, 11, 15]. Volume data is a large data space made up of discrete points which typically lie on a three dimensional grid where each discrete point holds a data value commonly called a voxel. The synthesis of video rate images from large volume data is used in various applications of visualization such as medical imaging, and video rate image generation is essential in computer interface technologies such as virtual reality.

The generation of two dimensional images from volume data using transparency techniques, such as volume rendering, is essential for Volume Visualization. Examples of volume data which are sampled include magnetic resonance imaging (MRI) and computed tomography (CT) data. Volume data can also be synthesized from traditional computer graphics primitives such as triangles. The generation of images from synthetic data is referred to as Volume Graphics. As hardware processing power increases Volume Graphics has the potential to replace polygon based graphics and place real-time 3D computer graphics on every computer [9]. This can be achieved by replacing two dimensional frame buffers with three dimensional frame buffers that can work with voxelized polygons stored as volume data.

*Email: miked@vast.unsw.edu.au
†Sydney 2052, AUSTRALIA
‡www: http://www.vast.unsw.edu.au/~miked/index.html

2 Previous Work

Previous work on the system [2, 1] has concentrated on shading, which occurs in the final stages of image generation. This paper details the design of part of the front end, being two arrays that shift input data to perform a partial rotation and processing elements that calculate the path of viewing rays through the volume data. The memory subsystem that enters the data into the first array is briefly outlined in this paper and is the subject of continuing investigation. The design presented in this paper aims to describe a scalable and adaptable architecture for real-time volume visualization systems.
To achieve this, a rotation and ray casting algorithm which uses arrays of processing elements to create a parallel pipelined subsystem was designed. The complete system uses an initial subsystem for volume data storage and a final subsystem for rendering. The array based algorithm was implemented and tested in software. A hardware description language was used to describe, simulate and verify the hardware functionality and calculate performance estimates for the array processing elements. The hardware description was then translated into the LSI gate array design environment in order to estimate the characteristics of a gate array implementation. Using the gate array results, the performance is estimated to achieve a video rate of 15 frames per second (f/s) for images of volume data.

3 Related Work

The Cube-3 architecture [15] is estimated to be capable of processing a 512³ data set at 30 f/s. The Cube architecture uses a bus between input volume data and a set of buffers where data is stored before the image generation process begins. The transfer of data using this bus involves a template of values being calculated which determine the correct storage addresses for the data held in the buffers. The buffers store the data for the next stage of processing. The purpose of the bus transfer operation is to transfer data representing one row of one slice of a volume data cube in one transfer operation. The bus transfer operation is the projection calculation for parallel or perspective viewing of the data set. The array based design reported in this paper differs from the Cube-3 architecture by having a simpler parallel transfer method between memory and the second subsystem for processing. The simplified addressing then requires the introduction of the two array designs to calculate arbitrary screen projections.

The Knittel system [11] is able to render images of 256³ data sets at 2.5 f/s, and uses a VLSI pipeline for the calculation of surface normals and Phong shading. The performance of this architecture is improved by using a distributed volume data memory and sending packets between processing elements to represent the traversal of rays through the data space. This parallel implementation is capable of 20 f/s for 512³ data sets using a 64 processing element array. The architecture presented here eliminates the need for a network of processing elements and packets containing ray traversal information. The ray casting operation is performed within one array that aligns the volume data with the correct ray for image generation.

Results from algorithms capable of speeds of a few frames per second for data sets of 128³ to 256³ using supercomputers [17] and standard workstations [12] have also been reported.
These real time results are dependent on either large supercomputer systems or sparse data sets. For higher frame rates and larger data sets, increased processing power is required.

4 Image Generation from Volume Data

4.1 Introduction

The synthesis of images from volume data is referred to as volume rendering [13, 3]. There are several approaches to this rendering process, one being ray casting [13]. Ray casting is similar to ray tracing, except that in ray casting rays travel through the data set once without creating reflection rays. The ray casting used in this system is based on discrete increments along each ray, similar to discrete ray tracing [18]. As the rays travel through the data, sample points are extracted and used to calculate local gradients surrounding the sample point, which are used in opacity calculations to combine the sample points to create the final pixel value. The shading of the surface is calculated using either the diffuse shading or Phong shading equations [4]. Ray casting and ray tracing are examples of image synthesis that is based on a pixel by pixel calculation.

The standard ray casting algorithm allows rays to travel from any viewpoint through the data space. This requires that all data points in the memory are available to each processor that calculates the traversal of a ray. This creates a bottleneck at the input memory for a single processor calculating all ray traversals. A solution for parallel systems is to pass data packets containing ray traversal information from one parallel processing element to another. This passing of ray packets requires a network and adds overheads to the ray casting algorithm. The objective of the ray casting algorithm presented in this paper is to create a parallel algorithm that passes data in the correct order to a set of processing elements without the memory or network bottlenecks. To achieve this the algorithm presented uses a limited range of possible viewpoints for the ray casting algorithm. Arbitrary viewing is accomplished by adding two preprocessing stages.
The first stage is called coordinate swapping, and the second, X axis rotation. The final stage requires a modified ray casting algorithm in the X, Z plane. The data that enters the first stage are voxel values, Va, which are represented using a right handed coordinate system called the world coordinate system. In the same coordinate system a viewing direction is specified using a cylindrical coordinate system represented by the values θ and φ. The coordinate value for one of the eight corners of the volume data set is equal to the origin of the world coordinate system and the data set increases in size along the positive x, y and z axes.

4.2 Coordinate Swapping

The coordinate swapping stage of the image generation process involves the rearrangement of coordinate values for each voxel, which reorders the input data in terms of the new coordinate system. The world coordinate system used for input voxel data is changed to the new coordinate system, called the limited view coordinate system, which changes all possible viewing angles in the range (0 < θ < 360, 0 < φ < 180) to (225 < θ < 315, 90 < φ < 180). This mapping is accomplished by swapping and/or inverting the x, y, z coordinates of the input voxels depending on the viewpoint represented by the cylindrical coordinate values, θ and φ. The new coordinates are used in the X axis rotation. The required coordinate swapping for each view angle is shown in Table 1.
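
The swap-and-invert idea of Section 4.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `swap_coordinates`, the spec string format (for example "Z-,Y-,X+"), and the convention that an inverted axis maps an index c to n - 1 - c are all hypothetical and are not taken from the paper.

```python
def swap_coordinates(voxel, spec, n):
    """Map a world-coordinate voxel index into the limited view system.

    voxel : (x, y, z) integer indices in the range 0 .. n-1
    spec  : hypothetical axes-change string such as "Z-,Y-,X+"; each entry
            names the source axis and whether it is kept (+) or inverted (-)
    n     : size of the data set along each axis
    """
    source = dict(zip("XYZ", voxel))
    result = []
    for entry in spec.split(","):
        entry = entry.strip()
        axis, sign = entry[0].upper(), entry[1]
        value = source[axis]
        # Assumed inversion convention: mirror the index inside the cube.
        result.append(value if sign == "+" else n - 1 - value)
    return tuple(result)
```

For example, `swap_coordinates((10, 20, 30), "X+,Y+,Z+", 256)` leaves the voxel unchanged, while a spec that swaps and inverts axes sends each voxel to a different, unique position, so the whole data set is reordered without loss.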

Table 1: Input data rearrangements required for ranges of viewing angles. Columns: Region (1 to 8), Viewing Angles, Axes Change. Axes Change entries as transcribed: X+,Y+,Z+; Z-,Y-,X; Y+,Z-,X; Z+,Y+,X; Y-,Z+,X; X+,Y-,Z; X+,Z-,Y+; X+,Y+,Z+; X+,Z+,Y-.

4.3 X axis rotation

The second stage of the image generation process involves a two dimensional array which effectively performs a small rotation of the voxel data values. The rotation is about the X axis of the limited view coordinate system. The complete rotation calculation involves a translation to the origin, a three dimensional rotation, and a translation back to the original position. Using a homogeneous coordinate system [4], the translation and rotation operations result in the following equations:

\[ R_x = x \]
\[ R_y = y\cos(\theta_x) - z\sin(\theta_x) - \frac{n}{2}\cos(\theta_x) + \frac{n}{2}\sin(\theta_x) + \frac{n}{2} \]
\[ R_z = y\sin(\theta_x) + z\cos(\theta_x) - \frac{n}{2}\sin(\theta_x) - \frac{n}{2}\cos(\theta_x) + \frac{n}{2} \]

where
  x, y, z = world coordinate values
  R_x, R_y, R_z = rotated coordinate values in the limited view coordinate system
  θ_x = angle of rotation about the X axis calculated from the θ view point angle
  n = size of the data set

To reduce the calculation complexity involved in performing this calculation for every coordinate an incremental calculation is used. The coordinates of a data point in the first plane and the last plane of the volume data set are rotated using the above equations. The rotated coordinates of the first plane point are used as the starting point and incremental values are added for each new point. The incremental value is calculated using the following equations:

\[ Y_{inc} = \frac{Y_{z=n} - Y_{z=0}}{n} \]
\[ Z_{inc} = \frac{Z_{z=n} - Z_{z=0}}{n} \]

where
  Y_inc, Z_inc = incremental values for y, z coordinates respectively
  Y_{z=n}, Z_{z=n} = rotated y, z coordinates for points on the z = n plane
  Y_{z=0}, Z_{z=0} = rotated y, z coordinates for points on the z = 0 plane
  n = size of the data set

Once all initial and incremental values for coordinate rotation are calculated they are used in an algorithm for the X axis rotation. The algorithm is based on a two dimensional array data structure W(x, y). Each element of the array stores the voxel data Va, the voxel y value Vy and three boolean values eq, gt and lt. A comparison is made between the Vy value and the y value stored in each array element and the results are stored in the three boolean values. The voxel data is also shifted along the x axis of the array. As data is shifted along the x axis the boolean values indicate whether the data moves to the adjacent array element or a row above or below the adjacent element. The x dimension of the array is n and the y dimension is 3n/2 in order to handle up to 45 degree rotations.

The input into the array is from the volume data and proceeds in a plane-by-plane, column-by-column processing order. For each column of each plane the data is input and rotated using the following algorithm:

    for (y = 1 to n) {
        W(n, y).Vy = Y_{z=0} + (y × Y_inc)
        W(n, y).Va = V{xc, y, zc}
        for (x = 1 to n) {
            W(x, y).eq := (W(x, y).Vy equal to W(x, y).Ype)
            W(x, y).gt := (W(x, y).Vy greater than W(x, y).Ype)
            W(x, y).lt := (W(x, y).Vy less than W(x, y).Ype)
            if W(x, y).eq then W(x - 1, y) = W(x, y)
            if W(x, y).gt then W(x - 1, y) = W(x, y - 1)
            if W(x, y).lt then W(x - 1, y) = W(x, y + 1)
        }
    }

where
  eq = boolean value for equal y coordinates
  gt = boolean value for voxel y coordinate greater than array y coordinate
  lt = boolean value for voxel y coordinate less than array y coordinate
  xc = current column
  zc = current plane

4.4 Ray Casting

The final stage of the new image generation process is a modified ray casting algorithm that uses an array of rays that traverse through the volume data in the X, Z plane. A two dimensional array, R(x, y), stores voxel values, voxel coordinates, and ray coordinates in each array element. The ray coordinates, Rx and Rz, represent the current location of the ray in the x, z plane. The dimensions of the R(x, y) array are 3n/2 in both x and y dimensions. The ray at each array location moves through the data in an incremental fashion, detecting intersections with data values as it progresses.

The starting point and incremental values for each ray need to be calculated for each image. The viewing ray is determined by taking an initial value at the screen plane and a terminating value on the opposite side of the volume data. A line is created between the two values and the initial point of intersection with the volume data set is found. The incremental values are found using these initial and final values for each ray.

The view plane is the x, y plane at z = -n/4 in the view coordinate system. This view plane has to be translated to the same coordinate system that the X axis rotation uses so that coordinate intersections can be found between rays and the voxel values stored in the W(x, y) array. The transformation of the ray coordinate values to the correct coordinate system requires a translation to the origin, then rotation, and then another translation back to volume data coordinates. The equations for the calculation of x, z values in the X axis rotation coordinate space are:

\[ R_x = x\cos(\theta_y) + z\sin(\theta_y) + \frac{n}{2} - \frac{n}{2}\cos(\theta_y) - \frac{n}{2}\sin(\theta_y) \]
\[ R_z = z\cos(\theta_y) - x\sin(\theta_y) + \frac{n}{2} - \frac{n}{2}\cos(\theta_y) + \frac{n}{2}\sin(\theta_y) \]

where
  x, z = original coordinate values
  R_x, R_z = rotated coordinate values
  θ_y = angle of rotation about the Y axis calculated from the φ view point angle
  n = size of the data set

The points on the viewing plane at z = -n/4 are transformed to the correct coordinate space with the viewing angle φ. Another x, y plane at z = 5n/4, which represents the plane where rays terminate, is also transformed. Using the two planes as initial and final values for each ray the intersection point between the ray and the voxel data space is calculated.
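
The ray coordinate equations above describe a rotation about the Y axis around the centre of the data set. The following Python sketch, which is an illustration and not the author's implementation, evaluates the expanded form and compares it with the translate, rotate, translate-back form it was derived from. The function names are hypothetical and the angle is assumed to be in radians.

```python
import math


def rotate_ray_point(x, z, theta_y, n):
    """Expanded form of the R_x, R_z equations (angle in radians)."""
    c, s = math.cos(theta_y), math.sin(theta_y)
    half = n / 2.0
    rx = x * c + z * s + half - half * c - half * s
    rz = z * c - x * s + half - half * c + half * s
    return rx, rz


def rotate_about_centre(x, z, theta_y, n):
    """Same rotation written as translate, rotate, translate back."""
    c, s = math.cos(theta_y), math.sin(theta_y)
    half = n / 2.0
    dx, dz = x - half, z - half          # translate the centre to the origin
    return dx * c + dz * s + half, dz * c - dx * s + half
```

Both functions should agree up to floating point rounding, and the centre of the data set (n/2, n/2) should map to itself for any angle, which gives a quick sanity check on the signs of the translation terms.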
This intersection is found by finding the intersection between the line connecting the corresponding point on the two rotated planes and the faces of the cube which defines the voxel data space. This intersection point is then the initial value for each ray. The increment values for each ray are calculated using the corresponding points of the two planes and the following formula:

\[ X_{inc} = \frac{R_{x,z=5n/4} - R_{x,z=-n/4}}{3n/2} \]
\[ Z_{inc} = \frac{R_{z,z=5n/4} - R_{z,z=-n/4}}{3n/2} \]

where
  X_inc, Z_inc = incremental values for x, z ray coordinates respectively
  R_{x,z=5n/4}, R_{z,z=5n/4} = rotated x, z coordinates for points on the z = 5n/4 plane
  R_{x,z=-n/4}, R_{z,z=-n/4} = rotated x, z coordinates for points on the z = -n/4 plane
  n = size of the data set

A two dimensional array, R(x, y), stores at each array element three voxel values, the coordinates for the current plane voxel value, and the coordinates of the ray associated with the array element. The array element also stores a series of boolean values. The x = 1 column of the W(x, y) array contains the voxel values and associated coordinates which are assigned to the x = 3n/2 column of the R(x, y) array. The columns of data effectively move across the W(x, y) array and into the R(x, y) array. The R(x, y) array is driven by the same data processing as the W(x, y) array and uses the following algorithm to complete the ray casting operation:

    for (y = 1 to 3n/2) {
        R(3n/2, y).Vx = xc
        R(3n/2, y).Vz = Z_{z=0} + (y × Z_inc)
        R(3n/2, y).Va = W(1, y).Va
        R(3n/2, y).Vb = R(1, y).Va
        R(3n/2, y).Vc = R(1, y).Vb
        for (x = 1 to 3n/2) {
            R(x, y).eq = (R(x, y).Rx equal to R(x, y).Vx) and (R(x, y).Rz equal to R(x, y).Vz)
            R(x, y).Va = R(x + 1, y).Va
            R(x, y).Vb = R(x + 1, y).Vb
            R(x, y).Vc = R(x + 1, y).Vc
            R(x, y).Vx = R(x + 1, y).Vx
            R(x, y).Vz = R(x + 1, y).Vz
            R(x, y).v1 = R(x + 1, y).v1
            if R(x, y).eq then
                if (R(x, y).v1 used) then
                    R(x, y).pv2 = R(x, y).Rxf, R(x, y).Rzf
                    R(x, y).Rx = R(x, y).Rx + R(x, y).Xinc
                    R(x, y).Rz = R(x, y).Rz + R(x, y).Zinc
                else
                    R(x, y).pv1 = R(x, y).Rx, R(x, y).Rz
                    R(x, y).v1 = true
        }
    }

where
  eq = boolean for ray intersection with data value
  v1 = boolean for when first passed value is used
  pv1 = first set of passed ray intersection values
  pv2 = second set of passed ray intersection values
  Vb = previously processed plane of voxel values
  Vc = plane of voxels processed before the Vb plane

The output from the R(x, y) array is the intersection points and associated voxel values which are used in surface shading and image compositing. When an intersection is found the fractional parts of the current ray location are passed through the array to the x = 1 column where they are used in the shading calculations. If more than one intersection is found at a particular voxel location a second set of intersection values is passed for shading calculation. When a ray intersects a voxel value, the coordinates of the ray are incremented and the ray isn't incremented again until the next intersection. The distance between the sample points that a ray takes is equal to the distance between voxel values in the data set to ensure that no more than two samples are ever taken at one voxel location.

4.5 Surface shading and image compositing

The voxel values in the column R(1, y) are used to calculate values for the pixels in the final image. The y value of the voxel in the R(x, y) array is the screen y value for the pixel. The x pixel value is stored with the voxel in the R(x, y) array. A window of 3³ voxel values is taken from the R(1, y) column, one 3 x 3 plane for each iteration of the complete algorithm. The gradient at the centre of the window of values is calculated and used as a surface gradient for the centre point. The surface gradient is used in a diffuse lighting equation to calculate the light intensity and hence pixel value at the location of the centre voxel in the window. The complete description for surface shading and image compositing used in this system is described in [2, 1]. Alternative algorithms for surface shading and image compositing, that are implemented in real-time systems, are outlined in [11, 15].
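
The gradient and diffuse lighting step can be illustrated with a short Python sketch. The exact equations used by the system are given in [2, 1] and are not reproduced in this paper, so the choices below are assumptions: a central-difference gradient over a 3 x 3 x 3 window, a Lambertian diffuse term, and a small ambient constant. The function names and the `window[plane][row][column]` layout are hypothetical.

```python
import math


def central_gradient(window):
    """Central-difference gradient at the centre of a 3x3x3 window.

    window[k][j][i] holds the voxel value at plane k, row j, column i.
    """
    gx = (window[1][1][2] - window[1][1][0]) / 2.0
    gy = (window[1][2][1] - window[1][0][1]) / 2.0
    gz = (window[2][1][1] - window[0][1][1]) / 2.0
    return gx, gy, gz


def diffuse_intensity(gradient, light_dir, ambient=0.1):
    """Lambertian term using the normalised gradient as the surface normal."""
    g_len = math.sqrt(sum(g * g for g in gradient))
    l_len = math.sqrt(sum(l * l for l in light_dir))
    if g_len == 0.0 or l_len == 0.0:
        return ambient  # flat region: no usable surface normal
    cos_angle = sum(g * l for g, l in zip(gradient, light_dir)) / (g_len * l_len)
    # Surfaces facing away from the light receive only the ambient term.
    return ambient + (1.0 - ambient) * max(0.0, cos_angle)
```

In this sketch the three planes of the window would correspond to the Va, Vb and Vc values held for the current and two previous planes, so the gradient is centred on the Vb plane.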
5 Hardware Design

The architecture of the system into which the new image generation algorithm fits is shown in Figure 1. The system is broken into several major components including two custom hardware arrays of processing elements based on the algorithms described in the previous section outlining the new image generation algorithm. The system is capable of accepting input from a real-time data acquisition device and storing it in the memory system. The specialised memory represents the first stage and is a double buffered memory with each buffer holding one copy of the volume data. As data is entered into the double buffer the coordinate swapping operation is performed. The output from the memory subsystem is connected to the warp array, which performs the X axis rotation described in Section 4.3. The warp array function is similar to a shear warp operation, but is a rotation and not a shear. The output from the warp array is connected to the input of the ray array, which performs the modified ray casting algorithm described in Section 4.4. The last stage performs the surface shading and image compositing and is broken into several rendering pipelines and a screen buffer.

Figure 1: System level organisation (blocks: Input, Double Buffered Memory, Warp Array, Ray Array, Rendering Pipeline, Frame Buffer, Screen)

5.1 Double buffered memory

The first stage of the system is a double buffered memory which performs the coordinate swapping operation described in Section 4.2. A double buffered memory is used so that data can enter the system at the same time as data is read into the warp array without conflicting accesses. The address of voxel data is altered as it is written into the double buffered memory. An incrementor starting at the origin point of the voxel values and incrementing to the last row and column of the final plane of the voxel data is used to calculate the address of input data. The incrementor value represents the X, Y and Z coordinates with 8 bits for each coordinate to accommodate a 256³ data set.
The coordinate values are then recalculated according to the view point with coordinate swapping described in Section 4.2. Three multiplexors and inverters are used to create the new address where the data is stored in the active buffer for input data. Once the incrementor reaches the maximum value the buffer switch is changed and the process repeats. The buffer switch is used to select between the input data buffer and the buffer where values are read into the warp array. The design of the double buffer is shown in Figure 2. The input data value addresses are calculated sequentially. To improve the performance of the address calculation process multiple address calculation units can run in parallel. Data set sizes of 512³ and 1024³ would require parallel address calculation units.
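
The address generation described in Section 5.1 can be modelled at the bit level. The Python sketch below is a behavioural illustration only: the field order inside the 24-bit counter (X in the low 8 bits), the `select` and `invert` arguments standing in for the multiplexor select lines and inverter enables, and all names are assumptions made for this example.

```python
COORD_BITS = 8
COORD_MASK = (1 << COORD_BITS) - 1


def unpack(counter):
    """Split the 24-bit incrementor value into 8-bit X, Y and Z fields."""
    x = counter & COORD_MASK
    y = (counter >> COORD_BITS) & COORD_MASK
    z = (counter >> (2 * COORD_BITS)) & COORD_MASK
    return x, y, z


def remap_address(counter, select, invert):
    """Build the write address for the active buffer.

    select : three source indices (0 = X, 1 = Y, 2 = Z), one per output
             field, playing the role of the multiplexor select lines
    invert : three booleans, one per output field, enabling the inverter
    """
    coords = unpack(counter)
    fields = []
    for src, inv in zip(select, invert):
        value = coords[src]
        if inv:
            value ^= COORD_MASK  # bitwise inversion of 8 bits, i.e. 255 - value
        fields.append(value)
    return fields[0] | (fields[1] << COORD_BITS) | (fields[2] << (2 * COORD_BITS))
```

Because each output field is a permuted and possibly inverted copy of one input field, every counter value maps to a distinct address, so a full pass of the incrementor fills the buffer exactly once before the buffer switch is changed.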

Figure 2: Input data double buffer for data rearrangement (labels: Input data, Incrementor, Va, 8 bit address)

Figure 3: The Warp Array organization

5.2 The Warp Array

The warp array is an array of processing elements which perform the viewing rotation described in Section 4.3. The organization of the warp array is shown in Figure 3. Initialisation calculations are required for the coordinate data used in the warp array. The incremental calculations require one addition operation for each voxel value. The projection initialisation operations are more complex and require a dedicated digital signal processor (DSP) to calculate the initial values once per frame. The initial value for each row and an incremental value are loaded into two registers which, combined with an adder, can calculate the Y value input for each row of the warp array.

Figure 4: Warp Array processing element

Table 2: Notation for Warp and Ray Array processing elements.
  Va            Original voxel data value
  Vx, Vy, Vz    X, Y and Z coordinates of data value after X axis rotation
  Ype           Y-position of processing element from top of warp array
  Rxi, Rxf      Integer and fractional components of the X coordinate of the ray
  Rzi, Rzf      Integer and fractional components of the Z coordinate of the ray

5.3 The Warp Array processing element

The warp array processing element takes one set of inputs from three possible sets and sends the input to one of three possible outputs depending on the data's associated Y value. The entire array acts as a large shift register where data is stored and shifted towards its correct position within the array. Each processing element has a Ype register which holds the y value of the array element. The Ype value and the Vy value are compared and the voxel data and y value are passed to a processing element in the next column which is either above, below or adjacent to the current element.
The processing element uses two flip flops as registers to store the voxel value and its Y coordinate. The input to the registers is selected from the outputs of the adjacent processing elements. There are two select lines to the multiplexers which are driven by the comparator output of the adjacent processing elements. The outputs of the registers are passed to the adjacent processing elements on the output side. A comparator is used for the Y coordinate comparison. The layout for the warp processing element can be seen in Figure 4 and the values are described in Table 2. The output of the final column of the warp array is connected to the input of the first column of the ray array.
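
The compare-and-shift behaviour of the warp array can be sketched as a small behavioural simulation in Python. It is not a model of the hardware described above: it tracks one input column as it crosses the array, assumes that a larger target value means a higher row index, and resolves collisions by simple overwriting. The function name `warp_shift` and its arguments are hypothetical.

```python
def warp_shift(column, num_columns, num_rows):
    """Shift one input column of (value, target_row) pairs through the array.

    column      : list of (value, target_row) pairs; entry r enters at row r
    num_columns : number of processing-element columns the data crosses
    num_rows    : number of rows in the array
    Returns a list of length num_rows holding the value that leaves each row
    (None where no data arrived).
    """
    # Contents of the tracked column: row -> (value, target_row)
    current = {row: item for row, item in enumerate(column)}
    for _ in range(num_columns):
        moved = {}
        for row, (value, target) in current.items():
            if target > row:
                new_row = row + 1      # "gt": pass to the neighbouring higher row
            elif target < row:
                new_row = row - 1      # "lt": pass to the neighbouring lower row
            else:
                new_row = row          # "eq": pass straight across
            if not 0 <= new_row < num_rows:
                continue               # data shifted outside the array is dropped
            # Simplification for this sketch: a later arrival overwrites an earlier one.
            moved[new_row] = (value, target)
        current = moved
    output = [None] * num_rows
    for row, (value, _) in current.items():
        output[row] = value
    return output
```

Since a value moves by at most one row per column, a value crossing n columns can be displaced by at most n rows, which is consistent with the stated limit of rotations up to 45 degrees.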

.>, -DOfC}:I[J;f" OCI-.. + --, == 61i[\.......", r-------.. > Figure 5: Ray Array orgaizatio Figure 6: Ray Array processig elemet 5.5 The Ray Array processig elemet 5.4 The Ray Array The ray array is a two dimesioal array of processig elemets which perform the operatio of ray castig as described i Sectio 4.4. The iitial values for the ray coordiates are calculated by a dedicated digital sigal processor (DSP) chip ad passed through the ray array to the correct processig elemet. The DSP chip calculates the iitialisatios described i Sectio 4.4, which are performed oce per frame. The coordiates ofthe ray are stored i each processig elemet usig a register with both a 8-bit iteger ad 8-bit fractioal compoet. The iteger compoet of the ray ad the coordiates ofthe voxel data are compared to determie itersectios with the data ad whe a itersectio occurs the fractioal compoet ofthe itersectio is passed through the remaider ofthe array. Each ray array processig elemet accepts a set ofvalues that cotai itersectio iformatio from previous processig elemets. After the itersectio calculatio is performed the result is combied with the iput itersectio data ad set to the followig processig elemet. The output data is used by the shadig operatio i the surface rederig ad image compositig stage. The orgaizatio of the ray array is show i Figure 5. The voxel data value Va from the warp array is placed directly ito the adjacet processig elemets i the ray array. The voxel y coordiate, V y is ot passed o to the ray array as it is o loger eeded for computatio. Each ray array elemet stores a voxel value from the curret plae. Va ad the two previous plaes, Vb ad Ye. Iitialisatio values are loaded ito each ray array processig elemet usig the a load bit ad the registers that pass the itersectio data. The load bit is used to switch multiplexors that iput the iitialisatio data ito the appropriate registers i the processig elemet. 
Each ray array processing element stores data using flip flops as data registers. The ray coordinate is incremented through the array using two 16-bit carry look ahead adders. An intersection is detected using a comparator that signals when the coordinates of a voxel and a ray are equal. At each intersection point, the fractional components of the ray location, RzJ and RzJ, are output to the next processing element. Once an intersection is detected by the comparator, the intersection data in the input register is checked to see if a value already exists; if so, the intersection data from the current processing element is placed in the second location. The layout for a ray processing element is shown in Figure 6, with the values explained in Table 2.

6 Results

The viewing and shading operations of the system were implemented in software and results were obtained. The processing elements in the warp and ray arrays were implemented in a hardware description language (HDL) and tested for functionality and performance. The viewing and shading operations were tested on both artificial and real data sets. Only the real data set results are discussed in this paper. The artificial data sets showed that all arbitrary viewing angles worked correctly.
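As a software illustration of the ray array processing element described in Section 5.5, the following Python sketch models the 8.8 fixed-point ray register, the 16-bit adder step, the integer comparison and the two intersection locations. It is a minimal sketch under stated assumptions, not the HDL used in the paper: all names (`to_fixed`, `advance`, `ray_element`) are hypothetical, the ray is assumed to carry two coordinates (one per adder), and a hit is assumed to require both integer components to match the voxel coordinates.

```python
from typing import Optional, Tuple

FRAC_BITS = 8        # 8-bit fractional component
WORD_MASK = 0xFFFF   # 16-bit register: 8-bit integer plus 8-bit fraction

Hit = Tuple[int, int]                        # fractional parts of the two ray coordinates
Slots = Tuple[Optional[Hit], Optional[Hit]]  # first and second intersection locations


def to_fixed(value: float) -> int:
    """Encode a non-negative value as 8.8 fixed point."""
    return int(round(value * (1 << FRAC_BITS))) & WORD_MASK


def advance(coord: int, delta: int) -> int:
    """One adder step; the sum wraps to 16 bits like a hardware register."""
    return (coord + delta) & WORD_MASK


def integer_part(coord: int) -> int:
    return coord >> FRAC_BITS


def fraction_part(coord: int) -> int:
    return coord & ((1 << FRAC_BITS) - 1)


def ray_element(ray: Tuple[int, int], voxel: Tuple[int, int], slots_in: Slots) -> Slots:
    """Compare ray and voxel coordinates and merge any hit into the incoming slots.

    Assumption: a hit means both integer components equal the voxel coordinates.
    """
    first, second = slots_in
    hit = integer_part(ray[0]) == voxel[0] and integer_part(ray[1]) == voxel[1]
    if not hit:
        return first, second          # pass the incoming data through unchanged
    fractions = (fraction_part(ray[0]), fraction_part(ray[1]))
    if first is None:
        return fractions, second      # first location is free
    return first, fractions           # a value already exists, use the second location
```

For example, a ray at (3.25, 7.5) meeting a voxel at (3, 7) with empty incoming slots yields fractional components (64, 128) in the first location; if the first location is already occupied, the same hit is placed in the second.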

6.1 Software Simulation

The results from the software simulation for a 40^3 sized data set containing several spheres are shown in Figure 7(a). In Figure 7(a) both viewing angles are set to 45 degrees, showing the performance of both the ray and warp arrays. To test the two arrays on real sampled data, an MRI of a human heart was used. Figure 7(b) shows the data set with no rotation.

6.2 Hardware Simulation

To test hardware functionality and to estimate the processing speeds of the warp and ray arrays, the processing elements were defined in an HDL and simulated using a switch level simulator [6]. The simulator takes account of gate delays and fan-in and fan-out conditions. Both processing elements functioned as described, with the warp array processing element operating at 38MHz and the ray array processing element at 25MHz.

6.3 System Performance

To estimate the performance of the complete system, the results from the hardware simulations were used. A conservative clock rate of 20MHz will produce a pixel result every 100ns. To estimate the requirements of a real time system, an example data set of size 256^3 was used to generate 384^2 sized images. This was found to require 32MB of 8-bit data in the double buffered memory. Assuming 10ns memory is used, then 3 parallel read and write lines are required to perform at 15 f/s. The warp array requires an array of size 256 x 384 and the ray array requires an array of 384 x 384.

The technology used for simulating the gate array layout is LSI Logic's 0.6-micron LCA300K [14]. The maximum number of gates on one chip is 300,000. The hardware description was translated to the description language used by the LSI toolset and a gate array schematic generated to determine the number of gates that each processing element required. Using these details, the warp array requires a set of 25 chips configured in a 5 x 5 array. The larger ray array is implementable using a similar set of 5 x 5 chips, but then only contains a 384 x 40 ray array.
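The sizing figures quoted in this section can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the function names are hypothetical, "MB" is assumed to mean 2^20 bytes, and the full 384 x 384 ray array is assumed to be covered by repeated passes of the 384 x 40 array that fits on the chip set.

```python
import math


def double_buffered_megabytes(side: int, bytes_per_voxel: int = 1) -> float:
    """Storage for two copies of a side^3 volume, in units of 2**20 bytes."""
    return 2 * side ** 3 * bytes_per_voxel / 2 ** 20


def passes_needed(full_extent: int, available_extent: int) -> int:
    """Whole passes of the reduced array needed to cover the full extent."""
    return math.ceil(full_extent / available_extent)


print(double_buffered_megabytes(256))  # 32.0
print(passes_needed(384, 40))          # 10
```

Two 8-bit copies of a 256^3 volume occupy 32MB, and 384 / 40 is 9.6, so ten passes of the reduced array cover the full extent.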
This reduced size requires that the ray array be reused approximately 10 times per frame. The following calculation estimates the time required to generate one frame in this system:

100ns x 256 x 256 x 10 = 65ms

The frame time translates to a performance of 15 f/s.

7 Conclusion

This paper has presented a new algorithm and hardware design for the visualization of volume data. The warp array and ray array store the data as it is processed and perform the viewing and ray casting operations required for volume visualization. The reduction of the three dimensional ray casting algorithm to two dimensions is a direct result of the coordinate swapping process and the X axis rotation. The separation of ray casting from data storage and rendering allows these aspects to be customized for particular applications. The system is designed to allow flexibility in the sizing of both the warp and ray arrays to suit cost and performance considerations. The system utilises a high level of both pipelining and parallelism to provide real time frame rates for volume data sets. The system design is scalable and can therefore be used to process larger data sets at higher frame rates.

8 Acknowledgements

The author would like to thank Professor Graham Hellestrand for his supervision and essential feedback, Dr Jayasooriah, Mr Gunter Knittel and Mr Stephen Avery for their helpful discussions with regard to this work, and Dr Jürgen Hesser for the MRI data set of a human heart.

References

[1] DOGGETT, M., AND HELLESTRAND, G. A hardware architecture for video rate shading of volume data. In International Symposium on Circuits and Systems (May 1995), IEEE.
[2] DOGGETT, M. C., AND HELLESTRAND, G. R. A hardware architecture for video rate smooth shading of volume data. In EuroGraphics Workshop on Graphics Hardware (September 1994), EuroGraphics, pp. 95-102.
[3] DREBIN, R., CARPENTER, L., AND HANRAHAN, P. Volume rendering. Computer Graphics 22, 4 (August 1988), 51-58.
[4] FOLEY, J. D., VAN DAM, A., FEINER, S. K., AND HUGHES, J. F. Computer Graphics: Principles and Practice. Addison Wesley, 1989.
[5] GUNTHER, T., POLIWODA, C., REINHART, C., HESSER, J., MANNER, R., MEINZER, H.-P., AND BAUR, H.-J. Virim: A massively parallel processor for real-time volume visualization in medicine. In Eurographics Workshop on Graphics Hardware (September 1994), pp. 103-108.
[6] HELLESTRAND, G. R. Modal: A system for digital hardware description and simulation. Journal of Digital Systems 4, 3 (1980), 241-303.
[7] JUSKIW, S., AND DURDLE, N. G. Interactive rendering of volumetric data sets. In Eurographics Workshop on Graphics Hardware (September 1994), pp. 86-94.
[8] KAUFMAN, A., AND BAKALASH, R. Memory and processing architecture for 3D voxel-based imagery. IEEE Computer Graphics and Applications 8, 11 (November 1988), 10-23.
[9] KAUFMAN, A., COHEN, D., AND YAGEL, R. Volume graphics. IEEE Computer 26, 7 (July 1993), 51-64.
[10] KAUFMAN, A., HOHNE, K. H., KRUGER, W., ROSENBLUM, L., AND SCHROEDER, P. Research issues in volume visualization. IEEE Computer Graphics and Applications 14, 2 (March 1994), 63-67.
[11] KNITTEL, G. A scalable architecture for volume rendering. In Eurographics Workshop on Graphics Hardware (September 1994), pp. 58-69.
[12] LACROUTE, P., AND LEVOY, M. Fast volume rendering using a shear-warp factorization of the viewing transformation. In Computer Graphics (July 1994), ACM SIGGRAPH, pp. 451-458.
[13] LEVOY, M. Display of surfaces from volume data. IEEE Computer Graphics and Applications 8, 5 (May 1988), 29-37.
[14] LSI LOGIC CORPORATION. LCA300K Gate Array 5 Volt Series Products Databook, October 1993.
[15] PFISTER, H., KAUFMAN, A., AND CHIUEH, T.-C. Cube-3: A real-time architecture for high resolution volume visualization. In ACM/IEEE Symposium on Volume Visualization (October 1994).
[16] STYTZ, M. R., AND FRIEDER, O. Volume-primitive based three-dimensional medical image rendering: Customized architectural approaches. Computers and Graphics 16, 1 (1992), 85-100.
[17] VEZINA, G., FLETCHER, P. A., AND ROBERTSON, P. K. Volume rendering on the MasPar MP-1. In ACM Workshop on Volume Visualization (October 1992), pp. 3-8.
[18] YAGEL, R., COHEN, D., AND KAUFMAN, A. Discrete ray tracing. IEEE Computer Graphics and Applications 12, 5 (September 1992), 19-28.

Figure 7: (a) A data set comprised of spheres at an arbitrary view point. (b) MRI heart scan with no rotation.