Design of a Real Time FPGA-based Three Dimensional Positioning Algorithm

Similar documents
APPLICATION OF A COMPUTATIONALLY EFFICIENT GEOSTATISTICAL APPROACH TO CHARACTERIZING VARIABLY SPACED WATER-TABLE DATA

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

A Binarization Algorithm specialized on Document Images and Photos

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

Parallelism for Nested Loops with Non-uniform and Flow Dependences

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

Wishing you all a Total Quality New Year!

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

S1 Note. Basis functions.

TN348: Openlab Module - Colocalization

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

X- Chart Using ANOM Approach

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

CS 534: Computer Vision Model Fitting

User Authentication Based On Behavioral Mouse Dynamics Biometrics

Reducing Frame Rate for Object Tracking

An Entropy-Based Approach to Integrated Information Needs Assessment

Cluster Analysis of Electrical Behavior

Mathematics 256 a course in differential equations for engineering students

Edge Detection in Noisy Images Using the Support Vector Machines

Module Management Tool in Software Development Organizations

Active Contours/Snakes

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Multi-view 3D Position Estimation of Sports Players

Feature Reduction and Selection

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Load Balancing for Hex-Cell Interconnection Network

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

Simulation Based Analysis of FAST TCP using OMNET++

Fitting: Deformable contours April 26 th, 2018

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

An Optimal Algorithm for Prufer Codes *

A fault tree analysis strategy using binary decision diagrams

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Programming in Fortran 90 : 2017/2018

The Codesign Challenge

An Image Fusion Approach Based on Segmentation Region

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

Problem Set 3 Solutions

Lecture 5: Multilayer Perceptrons

Pictures at an Exhibition

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

y and the total sum of

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Biostatistics 615/815

Performance Evaluation of Information Retrieval Systems

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

Design of a Real Time FPGA-based Three Dimensional Positioning Algorithm

Meta-heuristics for Multidimensional Knapsack Problems

Analysis of Continuous Beams in General

Lecture #15 Lecture Notes

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

A Robust Method for Estimating the Fundamental Matrix

GSLM Operations Research II Fall 13/14

Support Vector Machines

Analysis of Collaborative Distributed Admission Control in x Networks

AP PHYSICS B 2008 SCORING GUIDELINES

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Optimizing Document Scoring for Query Retrieval

A Statistical Model Selection Strategy Applied to Neural Networks

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

CSE 326: Data Structures Quicksort Comparison Sorting Bound

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Fusion Performance Model for Distributed Tracking and Classification

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp

Private Information Retrieval (PIR)

Electrical analysis of light-weight, triangular weave reflector antennas

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

Lecture 13: High-dimensional Images

Simplification of 3D Meshes

High-Boost Mesh Filtering for 3-D Shape Enhancement

Intelligent Information Acquisition for Improved Clustering

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Advanced Computer Networks

CMPS 10 Introduction to Computer Science Lecture Notes

ELEC 377 Operating Systems. Week 6 Class 3

Wavefront Reconstructor

Accounting for the Use of Different Length Scale Factors in x, y and z Directions

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Classification Based Mode Decisions for Video over Networks

Wireless Sensor Network Localization Research

Efficient Distributed File System (EDFS)

Transcription:

Desgn of a Real Tme FPGA-based Three Dmensonal Postonng Algorthm Nathan G. Johnson-Wllams, Student Member IEEE, Robert S. Myaoka, Member IEEE, Xaol L, Student Member IEEE, Tom K. Lewellen, Fellow IEEE, and Scott Hauck, Senor Member IEEE Abstract We report on the mplementaton and hardware platform of a real tme Statstcs-Based Processng (SBP) method wth depth of nteracton processng for contnuous mnature crystal element (cmce) detectors usng a sensor on the entrance surface desgn. Our group prevously reported on a Feld Programmable Gate Array (FPGA) SBP mplementaton that provded a two dmensonal (2D) soluton of the detector s ntrnsc spatal resoluton. Ths new mplementaton extends that work to take advantage of three dmensonal (3D) look up tables to provde a 3D postonng soluton that mproves ntrnsc spatal resoluton. Resoluton s most mproved along the edges of the crystal, an area where the 2D algorthm s performance suffers. The algorthm allows an ntrnsc spatal resoluton of ~0.90 mm FWHM n X and Y and a resoluton of ~1.90 mm FWHM n Z (.e., the depth of the crystal) based upon DETECT2000 smulaton results that nclude the effects of Compton scatter n the crystal. A ppelned FPGA mplementaton s able to process events n excess of 220k events per second, whch s greater than the maxmum expected concdence rate for an ndvdual detector. In contrast to all detectors beng processed at a centralzed host, as n the current system, a separate FPGA s avalable at each detector, thus dvdng the computatonal load. A prototype desgn has been mplemented and tested usng a reduced word sze due to memory lmtatons of our commercal prototypng board. A I. INTRODUCTION contnuous mnature crystal element (cmce) detector s a low cost alternatve to dscrete crystal detector modules that have tradtonally been used to acheve hgh spatal resoluton for small anmal postron emsson tomography (PET) scanners. A key to the magng performance of the cmce detector s the use of a statstcs based postonng (SBP) algorthm [1,2]. SBP s used to mprove the resoluton of the monolthc sensor module and ncrease the effectve mage area of the sensor compared to Anger postonng. The bass of SBP nvolves the calculaton of the most lkely locaton of an event gven the sensor outputs. We prevously reported on a feld programmable gate array (FPGA) mplementaton of our SBP method for a two dmensonal (2D) soluton (.e., X and Y postonng) for a cmce detector [4]. Whle the majorty of events occur near the surface of the module crystal, the depth of nteracton does vary for each event. Accurate calculaton of the depth of nteracton permts more accurate calculatons of the concdence lne of response (LOR) and yelds better ntrnsc spatal resoluton performance for our cmce detectors. A three dmensonal (3D) SBP algorthm provdes a soluton that ncludes the X and Y dmensons and the depth of nteracton. Whle the prmary goal of fndng the depth of nteracton s for use wth the LOR, the three dmensonal algorthm also provdes better X and Y resoluton. The use of depth of nteracton n calculatons allows a closer match between sensor outputs and characterzaton tables. The advantage of properly matched characterzaton tables s most evdent near the edges and corners of the crystal where the resoluton of 2D SBP usually suffers. II. DETECTOR DESIGN AND METHODS A. cmce Detector Module The cmce detector module s modeled as beng composed of a 49.6 mm by 49.6 mm by 15 mm LYSO crystal coupled to a 2D array of sensors [Fg. 1]. The module uses a novel sensor on the entrance surface (SES) desgn whch places the sensors on the entrance of the crystal [3]. The advantage of placng the sensor array on the entrance surface of the crystal s that t places the sensors closer to where the majorty of events Manuscrpt receved November 13, 2009. Ths work was supported n part by Zecotech, Altera, DOE DE-FG02-08ER64676, and NIH grants EB001563 and EB002117. N. Johnson-Wllams, and S. Hauck are wth the Unversty of Washngton Department of Electrcal Engneerng, Seattle, WA USA (e-mal: NathanJW@u.washngton.edu, Hauck@ee.washngton.edu). R. Myaoka, X. L, and T. Lewellen are wth Unversty of Washngton Department of Radology, Seattle, WA USA. occur. Fg. 1. Illustraton of detector module wth sensor on the entrance surface desgn. The sensor array contans 256 (.e., 16x16) MAPD elements.

The sensor array conssts of a 16 by 16 array of mcro-pxel avalanche photododes (MAPD). MAPDs are one vendor s verson of the Geger mode APD. To reduce the computatonal load of handlng 256 ndvdual sensors outputs row-column summng s used and reduces the 16 by 16 array to 32 outputs. Smulaton tools were used to determne the lght response functon of the descrbed cmce detector. To summarze, DETECT2000 [5,6] s used to determne the probablty that a lght photon generated at a specfc (X, Y, Z) poston n the crystal s detected by a specfc photosensor. GEANT [7] s used to track the gamma nteractons (both Compton and photoelectrc) wthn the crystal. For each nteracton, the number of lght photons produced by the scntllator crystal s determned. That number s adjusted for the nonproportonalty of LSO accordng to the tables reported by Rooney [8]. Posson nose s then added to the number of lght photons produced. B. Statstcs-based Postonng Method (SBP) Suppose the dstrbutons of the 32 row-column sum outputs, M = m1, m2,, m32 for scntllaton poston X,Y are ndependent normal dstrbutons wth mean {μ(x)} and standard devaton {σ(x)}. The lkelhood functon for makng any sngle observaton m gven x s: n 2 1 ( m ( )) μ x L[ m = x] exp 2 = ( ) 1 σ ( x) 2π 2σ x (1) The maxmum lkelhood estmator of the event poston x s gven by: 2 ( m ( )) μ x arg mn + ln( σ ( x) ) 2( ) x 2σ x (2) The mean and varance of the lght probablty densty functon (PDF) are created by characterzng the detector module. For the two dmensonal method the module s characterzed at a resoluton of 127x127 (X,Y). Each pont s spaced.3875mm apart gvng a unform dstrbuton across the 49.6mm X 49.6mm crystal. For more detaled explanatons about the SBP mplementaton refer to Joung and Lng [1,2]. C. Three Dmensonal SBP The same prncples of the two dmensonal SBP can be appled to a 3D search. Instead of two dmensons, the lght response functon s characterzed n 3D, gvng μ and σ as a functon of X, Y and Z. The module s characterzed for 127x127x15 postons, wth 127x127 representng the X and Y dmensons wth each X and Y poston characterzed for 15 depths. As wth the 2D mplementaton, each XY poston s on a grd wth each pont separated by.3875mm. The 15 depths, however, are not equally dstrbuted over the 15mm deep crystal; depth s characterzed at a hgher resoluton near the entrance of the crystal where events are most lkely to occur [Table 1, Fg. 2]. In total the module s characterzed for 241,935 dfferent postons, 15 tmes more than the 2D mplementaton. The lkelhood functon of 3D SBP s seen n Equaton 3. Fg. 2. Illustraton of the dstrbuton of depth levels Start Depth End Depth Level 1 0.0000 0.8275 Level 2 0.8275 1.4131 Level 3 1.4131 2.0290 Level 4 2.0290 2.6784 Level 5 2.6784 3.3653 Level 6 3.3653 4.0943 Level 7 4.0943 4.8709 Level 8 4.8709 5.7016 Level 9 5.7016 6.5948 Level 10 6.5948 7.5604 Level 11 7.5604 8.6113 Level 12 8.6113 9.7640 Level 13 9.7640 11.0404 Level 14 11.0404 12.4703 Level 15 12.4703 15.0000 Table 1. Locaton of depth levels n the 15 mm thck scntallator crystal. D. Herarchcal Search A herarchcal search can be used to produce a SBP soluton n many fewer teratons than an exhaustve search. A herarchcal search begns wth a coarse graned search, and through each stage becomes more refned, untl the maxmum resoluton s reached n the fnal stage [Fg. 3]. A sx stage, two dmensonal, herarchcal search has been mplemented and tested for prevous versons of the cmce detector module [4]. Each search stage compares nne dfferent ponts equally dstrbuted across the search area. The lkelhood of each pont s calculated and the most lkely of the nne ponts s selected as the center pont for the next more refned search [Fgure 9]. Comparng 9 dfferent ponts n each search stage allows the herarchcal SBP algorthm to solve for the postonng of a 127x127 grd n sx stages. In total, 54 data pont computatons are requred, whch s a drastc reducton from the 16,129 data pont calculatons needed for an exhaustve search. (3)

Fg. 3. Example of a sx stage herarchcal soluton for 2D SBP. E. Two Dmensonal SBP Implementaton A two dmensonal herarchcal SBP algorthm has been desgned and mplemented on an FPGA development board [4]. The mplementaton produced smlar accuracy to exhaustve methods wth a drastcally reduced number of calculatons. The sx stages of the herarchcal search were ppelned n the FPGA mplementaton [Fg. 4]. In ths arrangement the algorthm could produce a soluton wth a much hgher throughput, the tme t takes to solve a sngle stage. The FPGA mplementaton was capable of processng data n real tme, n excess of 200,000 events per second. Fg. 4. Ppelned mplementaton of Herarchcal 2D SBP III. ALGORITHM DESIGN A. Hardware Lmtatons The lmtng factor n mplementng the SBP algorthm s the large data tables requred for the fne graned search stages toward the end of the ppelne. In the fnal stage of the 2D mplementaton the sxth search stage requred a data table over 2.75MB. Snce 3D SBP characterzes the lght response functon for 15 depths t s expected that the requred tables wll be sgnfcantly larger. Custom hardware s currently under desgn for the applcaton. The custom hardware wll feature Altera's Stratx III FPGA wth three large banks of external memory wth suffcent capacty to store the large data tables requred for 3D SBP. Wth the custom hardware n progress a development board was used to ad n the desgn and mplementaton of the 3D algorthm. The development board used s the Stratx II DSP Development Board Professonal Edton whch features the Stratx II FGPA. Of partcular nterest to ths applcaton s the avalable ~8Mb of on chp memory, 32Mb of off-chp SRAM memory, and 128Mb of off-chp SDRAM memory. B. Algorthm Dervaton Many of the same deas and approaches from the 2D mplementaton were carred over to the 3D mplementaton. As a basc approach, the herarchcal search method was used to reduce the number of calculatons requred to produce a soluton. Snce the 3D mplementaton adds an extra dmenson and the detector module s slghtly dfferent some changes were needed. The smplest way to extend the 2D herarchcal search to a 3D herarchcal search s to expand some of the stages n the prevous 2D algorthm to 3D stages. The 3D search treats the depth of nteracton as an ndependent varable (just lke the X and Y dmensons), and adds the depth varable to the search algorthm. A 3D stage uses a volumetrc search, performng the same basc comparson operaton as a 2D stage but comparng a dstrbuton of ponts across the X, Y, and Z dmensons n place of just the X and Y dmensons [Fg. 5]. The 3D stage fnds the best ft combnaton of X, Y, and, Z wthn the stage s dstrbuton of ponts. Fg. 5. Groupng of voxels n a 2D search (left) and groupng of voxels n a 3D search (rght). Functonally, a 3D stage works smlar to a 2D stage wth one major dfference; a 3D stage uses a three dmensonal characterzaton table n place of a two dmensonal characterzaton table. In a 3D table, values are dependent on all three dmensons rather than n a 2D table where X and Y values are averaged accordng to the expected dstrbuton of depth. The 3D table allows a 3D stage to examne ndependent ponts across all three dmensons and select the best ft pont n X, Y, and Z. The computaton process n a 3D stage s the same as n a 2D stage, the lkelhood of each pont s calculated wth the best ft becomng the center pont n the more refned search carred out n the next stage. Snce a 3D search compares the same XY ponts of a 2D search at multple depths, a 3D search wll consder a greater total number of ponts than the comparable 2D search. For the gven system, the depth of nteracton s defned by 15 depths. As wth the XY planar search, the depth search can be dvded nto a herarchcal search, reducng the number of computatons requred to produce a soluton. Dvdng the 15 levels equally nto a herarchcal search would allow the depth soluton to be found n only 3 stages, wth each stage comparng three depths [Fg. 6]. Snce the exstng ppelne has sx stages, only half of the stages wll be converted to 3D stages. Stages not converted to a three dmensonal search wll reman two dmensonal to complete the XY search. Wth only half of the search stages requred to be volumetrc, the order of those stages n the ppelne becomes a varable for optmzaton of accuracy and memory requrements.

soluton early enough to ncrease the accuracy of the fnergraned XY searches. The best combnaton tested places a volumetrc search thrd, fourth, and ffth n the ppelne. Fg. 6: Outlne of herarchcal search to fnd depth among 15 levels. Planar searches followng any volumetrc stages n the ppelne wll also use three dmensonal characterzaton tables. The depth of nteracton found n the precedng stages s used as a constant wth the 3D characterzaton table producng a 2D dmensonal characterzaton table matched to the calculated depth of nteracton. The matched table offers the advantage that characterzaton values from the table are based on a specfc depth, whch s more accurate than a 2D table where the values are based on an average depth. The prncple downsde to the volumetrc search stage approach s the ncreased number of computatons requred. As outlned above, each volumetrc search n the ppelne performs a 3x3x3 search, 3 tmes as many calculatons as the orgnal 2D search. Wth memory bandwdth lmtatons restrctng each stage to operatng on one pont at a tme, the 3D stage has a crtcal path of 27 calculatons. Replacng a 2D stage wth a 3D stage ncreases the crtcal path length n the ppelne to that of the 3D stage, reducng throughput by a factor of three. The memory requrement for a 3D search s also drastcally ncreased. Each volumetrc search and each 2D stage followng a volumetrc search requres a 3D table. The sze ncrease of a 3D table over a 2D table s dependant on the number of levels ncluded. For example, a 3D table wth all 15 depths requres 15 tmes the memory of the correspondng 2D table. Snce the depth search s organzed as a herarchcal search not every volumetrc search requres a full 3D table. The table for the frst of the three volumetrc stages s ncreased by a factor three, the second by a factor of seven, and the thrd by a factor of ffteen. Any 2D stage followng a 3D stage s requred to have at least as many depths as the precedng 3D stage. Wth the 2D sxth stage table already requrng close to 3MB, convertng to 3D tables s a sgnfcant ncrease n capacty requrements. To evaluate tradeoffs the algorthm was tested wth the volumetrc search stages placed n dfferent locatons n the ppelne to study the effects of volumetrc search placement compared to system accuracy [Table 2]. The results show that placng the depth search too early or too late n the ppelne s detrmental to system accuracy across all dmensons. When the volumetrc stages were placed early, the X and Y resoluton proved to be too coarse/naccurate to provde a good bass for a depth search. When placed late, results ndcated that many of the calculatons early n the ppelne were naccurate due to a lack of nformaton on the actual depth of nteracton. Placng the volumetrc searches toward the mddle of the ppelne allowed a suffcently accurate XY bass for calculatng the depth whle producng a depth Locaton of Volumetrc Searches FWHM-XY FHWM-Z Stages 1,2,3 1.03 3.49 Stages 2,3,4 0.95 1.95 Stages 3,4,5 0.91 1.87 Stages 4,5,6 0.92 1.88 Table 2. Accuracy of SBP Algorthm wth volumetrc search stages placed at dfferent locatons n the ppelne. Placng a volumetrc search thrd, fourth, and ffth n the ppelne produces very good results; however there are several hardware lmtatons whch prohbt ts use. The added memory requrement of the volumetrc searches would exceed the capacty of the on-chp memory currently avalable, whch would move the later volumetrc stage s table off-chp where memory bandwdth s much more lmted. Total memory capacty s also a problem snce the sxth stage table s requred to be ffteen tmes larger than the orgnal 2D table. The number of computatons requred for each volumetrc search s also prohbtve. Gven the current hardware desgn, there s nsuffcent memory bandwdth to perform the 27 calculatons requred for a volumetrc search and meet the throughput requrements of the system. The exstng hardware wll only permt the 27 calculatons necessary n a volumetrc stage to be computed n approxmately 500 clock cycles, yeldng a throughput of 140,000 events per second. Ths s 0.7x the requred throughput. To ncrease the throughput of the system, the volumetrc search can be splt nto two ndependent stages: an XY planar search followed by a lnear Z search. Usng ths method the algorthm wll alternate between a 3x3 planar search and a 3 depth lnear search. The two stages combned do not perform the same operaton as the volumetrc search; n place of comparng all possble ponts (X,Y,Z) the algorthm frst solves for XY, and then Z [Fg. 7]. Ths change dramatcally ncreases the throughput of the algorthm, reducng the 27 calculaton operatons to 9 operatons. Fg. 7. Comparson between a volumetrc search stage (left) and a planar/lnear search stage (rght). Both the planar search and the lnear search wll use 3D characterzaton tables. The planar search wll use the depth nformaton found n the precedng stages as a constant, solvng for the best ft XY at the gven depth. The lnear search wll use the XY nformaton found n the prevous stage as a constant, solvng for the best depth ft at the gven XY. The addtonal 3 ndependent depth search stages sgnfcantly

ncreases hardware requrements over the sngle volumetrc search stage. By havng both the planar and lnear searches act ndependently as separate stages, the memory bandwdth and capacty requrements are sgnfcantly ncreased compared to the volumetrc search, as each stage would requre ts own characterzaton tables and perform ther calculaton ndependently. At the cost of some throughput, the ncrease n requred memory bandwdth and capacty requrements can be addressed. The ndependent planar search and lnear search can be grouped as a sngle stage performng two operatons. In ths confguraton the stage frst performs a planar search, followed by the lnear search. Ths allows the planar search and lnear search to share one memory table. The cost of ths s an ncrease n the crtcal path caused by the addton of the 3 lnear search calculatons to the planar stage. Compared to the orgnal volumetrc search approach frst descrbed, ths varaton allows an algorthm wth no addtonal memory bandwdth or memory capacty requrements but wth a greater than 2X ncrease n throughput. Usng the same test setup as the volumetrc search algorthm, several orderngs of the planar/lnear stages were compared [Table 3]. As was seen n Table 1 there s a sgnfcant qualty gan when performng the depth searches after the frst stage and lttle to no gan n begnnng the depth searches after the second stage. The best opton s placng a hybrd search stage second, thrd and fourth n the ppelne. Compared wth the best opton from table 1, the hybrd search stage performs smlarly, wth slghtly mproved depth resoluton and slghtly poorer XY resoluton. Locaton of Lnear/Planar Stages FWHM-XY FHWM-Z Stages 1,2,3 1.20 3.52 Stages 2,3,4 0.95 1.95 Stages 3,4,5 0.90 1.87 Stages 4,5,6 1.02 1.90 Table 3. Accuracy of SBP Algorthm wth lnear/planar search stages placed at dfferent locatons n the ppelne. Another opton avalable to ncrease throughput s placng the depth search by tself n a sngle stage, performng the entre depth search at once. Usng the prevously outlned herarchcal depth search, the DOI can be found n a sngle stage n 9 calculatons. Gven that a herarchcal search can fnd the depth n 9 calculatons, placng the depth search n a sngle stage mantans the orgnal 2D search s crtcal path length to 9 calculatons. Ths change offers several other benefts wthout much cost. By compressng all of the depth searches nto a sngle stage the memory tables needed for the coarser graned depth searches can be elmnated, reducng the capacty requrement of on-chp memory. Table 4 shows the smulated results of the algorthm wth the lnear depth search placed at dfferent locatons n the ppelne. Compared wth results from table 2, t can be seen that performng the entre depth search n a sngle stage has lttle effect on the accuracy of the system. Based on the results shown n Table 3 there s no beneft n performng the depth search after the thrd stage of planar XY search. Snce the depth search characterzaton table sze ncreases by a factor of 4 for each planar search precedng t, the depth search s deally placed after the thrd XY search, where a hgh degree of accuracy s acheved wthout requrng a larger memory than necessary. Locaton of Lnear Depth Search FWHM-XY FHWM-Z After Stage 1 1.25 3.82 After Stage 2 1.00 2.36 After Stage 3 0.90 2.02 After Stage 4 0.91 1.89 After Stage 5 1.14 1.92 After Stage 6 1.09 1.91 Table 4. Accuracy of SBP Algorthm wth sngle herarchcal search stages placed at dfferent locatons n the ppelne. A common problem to all the prevously descrbed methods s the large memory capacty requred. The full data tables for the later stages of the algorthm are qute large, wth the sxth stage 15-depth table requrng 45MB of storage space. The total memory requrement for the system exceeds 60MB, an amount greater than the development board capacty. Whle not requred, an XY planar search can beneft from the use of a three dmensonal characterzaton table over a two dmensonal one. A three dmensonal characterzaton table allows a better match between sensor outputs and table values based on the known depth rather than an average or assumed depth. However, not all 15 depths are requred to take advantage of ths. In the case of stage 4, 5 and 6 the full data table s not requred and a smplfed three dmensonal table can be used whch has a reduced number of levels. Ths reducton saves memory at the potental cost of lower accuracy. In general, the greater number of levels that can be used, the greater the accuracy. However, the results ndcate that the number of depths used s subject to dmnshng returns, where sgnfcant mprovements n accuracy correlate wth larger tables up to a certan pont when accuracy s not ganed as readly. In the best case of hgh accuracy wth a reduced table sze, 4 depths are used n stage 4, 8 n stage 5, and 15 n stage 6. Ths corresponds to the followng memory requrements: 768KB for the 4 th stage, 6MB for the 5 th stage, and 45MB for the 6 th stage. However, the development board cannot support such a large memory requrement. The general trend s that the more depths used, the greater the accuracy. Followng ths general trend, the best combnaton wll be the most detaled characterzaton tables that can ft on the development board. After allocatng some of the on-chp memory for the frst three stages and the depth stage, roughly.2mb remans for the fourth stage. Ths s only suffcent space for a sngle depth 4 th stage table. The off chp SRAM has 32KB of addressng, allowng at most a 2 depth 5 th stage table. The 500KB of addressng for the SDRAM memory s suffcent to allow an 8 depth 6 th stage table. Although an 8 depth table can be used for the sxth stage, a 4 depth table produces slghtly better results. The recommended number of depths used for the development board mplementaton uses 1 depth for stage 4, 2

depths for stage 5, and 4 depths for stage 6 (hghlghted n Table 5). Whle ths soluton s not deal, t s a sgnfcant mprovement over usng a sngle depth for the fnal stages and also represents a sgnfcant savngs n memory. Reducng stage 4 from a 15 depth table to a 1 depth table saves 2.625MB, whle the reducton n stages 5 and 6 save 9.75MB and 24MB respectvely. The reductons amount to a total 58% lower memory capacty requrement for the system. Combnng observatons from the prevous tests gves the basc framework of the three dmensonal algorthm [Fg. 8]. The frst three stages are XY planar searches, smlar to ones used n the orgnal two dmensonal algorthm. The depth of nteracton s found n a lnear search after the frst three stages and the depth s then used as part of the planar searches n stages 5 and 6. The depth remans unchanged through the last three stages of the ppelne. Fg. 8. Basc framework of 3D SBP herarchcal search. C. Dynamc Breadth Search In the deal case the soluton set forms a hyperbolc parabalod wth a smooth and consstent gradent to the most lkely soluton. In ths deal case a herarchcal search wll consstently produce the same result as an exhaustve search, as any pont closer to the absolute mnmum has a greater lkelhood, allowng the algorthm to always converge to the correct soluton. However, n practce the soluton set s rarely deal and often has local mnma or an uneven gradent. There are many reasons behnd the unbalanced gradent and local mnma: reflectons from the edges of the crystal, truncaton of the sgnals due to the edges of the crystal, and nose n the system. Reflectons from the edge of the crystal are most detrmental to system accuracy. In ths case lght can reflect from the edge of the crystal, gvng the appearance of an event n a dfferent locaton. Lastly, as wth any system, nose can have a detrmental effect to the system s accuracy. Most detrmental to system accuracy are local mnma. A local mnmum s a porton of the soluton set whch breaks from the general gradent of the set and, when vewed n a lmted scope, appears to be the absolute mnmum of the system. As a result of a local mnmum, a herarchcal search can ncorrectly select a pont farther from the absolute mnma n coarse graned stages and eventually converge to the local mnma as a soluton n place of the true mnmum. Generally, the local mnma that can trap the herarchcal search are located near the true mnmum. Local mnma and nose that have a great effect on system accuracy have the hghest occurrence on the outer edges of the crystal where reflectons are most lkely to occur. Examnng the results of data sets across a range of postons on the crystal show that events occurrng wthn 5mm of the crystal edge produce sgnfcantly poorer results than postons located closer to the center of the crystal. A hgh occurrence of the same local mnma leads to a tendency to produce ncorrect solutons n the same locaton wth a hgher than normal frequency. When vewng a large number of samples for a data pont wth local mnma, the dstrbuton of the results shows multple peaks: often the correct soluton and a peak occurrng at each locaton of a local mnmum [Fg. 9]. Fg. 9. Example of multple peakng (left) found wth a 2D herarchcal search compared to the desred soluton (rght) found wth an exhaustve search. Addtonal peaks, even those close to the correct soluton, have a great effect on the usefulness of the PET system. When constructng an mage from postonng data t s mpossble to determne the dfference between a peak caused by a correct event or by a local mnmum. Incorrectly dentfyng a peak caused by a local mnmum as a true poston leads to mage ghostng, an object that appears n the produced mage but does not actually exst. Inversely, gnorng closely grouped peaks that mght be caused by local mnma but are actually dfferent data ponts can elmnate valuable nformaton. Snce mage reconstructon cannot dfferentate between the two, t s mportant to elmnate the problem at the source and ensure that the herarchcal search s not affected by local mnma n the soluton set. Examnaton of data sets wth local mnma shows that local mnma have a tendency to occur near the absolute mnma. Snce the local mnma have a tendency to be located near the true mnma a broader fne gran search becomes a vable correcton opton. A broader 5x5 search ncreases the range of the search by.3875mm n all drectons. Expandng the fnal search stage from a 3x3 to a 5x5 reduces the occurrence of multple peakng by 41% and decreases average error by.011 mm over the entre range of postons. A 7x7 search ncreases the range of the search by.775 mm. Whle a 7x7 doesn t further decrease the rate of dual peakng, expandng the fnal search stage to a 7x7 search reduces the average error by an addtonal.001mm compared to the 5x5. When expanded to a 9x9 search there s lttle beneft over a 7x7, wth only a slght decrease n average error. Further analyss shows that postons located n the corner of the crystal are more greatly affected than those that are near only one edge. Nearly every data pont wthn 5mm of the corner had nstances of multple peakng. If the data pont s located along a sngle edge the occurrence s generally much lower and less severe. When focusng on an area 5mm from a corner a 5x5 search reduces the nstances of dual peakng by 36%, a 7x7 reduces the rate by 68%, and a 9x9 reduces the rate by a comparable amount. Smlar performance gans are also seen n average error: a 5x5 reduces average error by.09mm, a 7x7 reduces average error by.13mm, and a 9x9 reduces average error by.15mm. The cost of the expanded search s ncreased computatonal requrements. Compared to a 3x3 search, a 5x5 search requres

2.7 tmes as many calculatons. A 7x7 search requres 5.4 tmes as many calculatons as a 3x3, and a 9x9 search requres 9 tmes as many calculatons. A broader fnal search stage, whle addressng the multple peak ssue, has a great effect on system throughput. Replacng the fnal stage wth a broader search reduces throughput by 2.7X for a 5x5, 5.4 for a 7x7, and 9 for a 9x9. Snce 9x9 search requres a sgnfcantly greater number of calculatons, wth an only slght beneft over the 7x7, t s not consdered. The majorty of crystal postons receve lttle to no beneft from usng an expanded search. Examnng system accuracy and occurrences of multple peaks shows that only postons wthn 5mm of the crystal edge are benefted by a broader search. Approxmately 37% of the crystal area falls wthn 5mm of an edge. Gven an equal dstrbuton of events across the crystal ths corresponds to 37% requrng a broader search. If only events wthn 5mm of the edge of the crystal nvoke a 7x7, throughput s reduced to.38x compared to a 3x3 fnal stage. Further mprovements can be made usng a 5x5 search for data located near only one edge of the crystal and a 7x7 search for data located n the corner of the crystal [Fg. 10]. Ths modfcaton results n an ncrease of throughput of 1.5X over only usng a 7x7 search, reducng throughput to.57x when compared to the 3x3 search. the prevous stage ndcates that the poston s near the edge the algorthm enters the 5x5 search state, and f the prevous stage ndcates a corner poston the algorthm enters the 7x7 search state. Note that the state of the fnal stage has no effect on the operatons of the precedng stages. IV. RESULTS The new three dmensonal SBP algorthm produces hgher qualty results than the prevous two dmensonal SBP for our new detector module. The dfference between the two algorthms s greatest at the edge of the crystal where the three dmensonal method produces greatly mproved results and the two dmensonal method typcally performs poorly (Table 5). Another advantage that s not captured by the FWHM results s that when usng a 2D look up table there s a sgnfcant problem wth dual peakng of the postonng estmate. Ths has been corrected n part by usng 3D SBP and further mprovement was ganed by the use of the dynamc breadth search. The 3D SPB algorthm was ppelned and has been mplemented on a commercally avalable FPGA development board. The mplementaton can process events n excess of 220k events per second, whch s greater than the maxmum expected concdence rate for an ndvdual detector. In comparson to an exhaustve search, whch compares the lkelhood of every pont, the mplemented algorthm produces a soluton n much fewer calculatons and wth a smlar accuracy [Table 6]. Fg. 10: Search breadth regons A 7x7 search s computatonal ntensve, requrng more than fve tmes the calculatons of a 3x3 search. Whle only a porton of data ponts are expected to nvoke a 7x7 search, the computatonal ntensty of those select data ponts have a profound mpact. Whle multple fnal stage data tables allowng parallel calculatons would be deal, the sze of a full table makes t mpractcal. Alternatvely, n the event of a 7x7 search the ppelne s stalled whle the 7x7 search performs the extra calculatons; n ths stuaton the characterzaton tables of the prevous stage are unused for much of the tme. In an effort to ncrease the throughput of a 7x7 search the ffth stage table can be used to perform some of the calculatons. 9 of the ponts n a 7x7 search are avalable n the ffth stage table; performng those calculatons n parallel ncreases the throughput of a 7x7 search by 22%. Overall ths change results n an expected 1.05X speed up averaged across all data ponts compared to usng the sxth stage table by tself. The decson of search breadth s based on the poston found n the precedng stage. When the precedng stage ndcates that the locaton of the event s wthn 5mm of the edge of the crystal the algorthm s adjusted accordngly. If Dstance From Edge of Crystal FWHM-XY 2D Algorthm 3D Algorthm 10.46 0.8 0.94 8.91 0.81 0.67 7.36 0.9 0.41 5.81 1.4 0.94 4.26 1.58 0.74 Table 5. Comparson between two and three dmensonal SBP methods FWHM-XY FHWM-Z Method Exhaustve SBP 0.9 1.88 3D Herarchcal SBP 1.02 2.07 Table 6. Comparson between exhaustve and three dmensonal SBP methods. REFERENCES [1] J Joung, et al., CMce: a hgh resoluton anmal PET usng contnuous LSO wth a statstcs based postonng scheme, NIM. Phys. Res. A, vol. 489, no. 1-3, pp. 584-589, Aug. 2002. [2] T. Lng, et al., Performance comparsons of contnuous mnature crystal elements (cmce) detectors, IEEE TNS, vol. 53, pp. 2513-2518, 2006. [3] X. L, et al., A hgh resoluton, monolthc crystal PET/MRI detector wth DOI postonng capablty IEEE EMBS, pp. 2287-2290, 2008 [4] D. DeWtt, Master s Thess, Department of Electrcal Engneerng, Unversty of Washngton, 2008.

[5] G. Tsang, et al., A smulaton to model poston encodng multcrystal PET detectors, IEEE TNS, vol. 42, pp. 2236-2243, 1995. [6] G. F. Knoll, et al., Lght collecton n scntllaton detector compostes for neutron detecton, IEEE TNS, vol. 35, pp. 872-875, 1988. [7] S. Agostnell, et al., "Geant4 a smulaton toolkt," NIM Secton A, vol. 506, pp. 250-303, 2003. [8] B. D. Rooney, et al., Scntllator lght yeld nonproportonalty: calculatng photon response usng measure electron response, IEEE TNS, vol. 44, pp. 509-516, 1997.