
A Real-Time Foveated Sensor with Overlapping Receptive Fields

Marc Bolduc and Martin D. Levine
Centre for Intelligent Machines, McGill University
3480 University St., Montréal, Québec, Canada H3A 2A7
email: {bolduc,levine}@cim.mcgill.edu

TR-CIM-95-06, March 1995
Accepted for publication, Real-Time Imaging, May 1996

Postal Address: 3480 University Street, Montréal, Québec, Canada H3A 2A7
Telephone: (514) 398-6319  Telex: 05 268510  FAX: (514) 398-7348
Email: cim@cim.mcgill.ca

Abstract

Vision systems for autonomous robots sometimes require high resolution, sometimes a wide field-of-view, and always fast processing. To achieve these same goals, the primate retina performs nonlinear "image" data reduction which ultimately produces a relatively small output. Such a data reduction scheme provides a compromise between the requirements of field-of-view, resolution, and speed of processing. An overlapping receptive field (RF) data reduction model, as proposed in the system presented here, is based on the retina, and offers flexibility in the selection of RF averaging masks. This flexibility is illustrated with a set of outputs produced using uniform, Gaussian, difference-of-Gaussians, and edge detection masks. However, such models are more computationally expensive than their non-overlapping counterparts. To compute the requisite mapping, an adapted scan-line algorithm is used due to its efficiency with respect to memory and speed. To achieve at least 10 frames per second, we employ MIMD parallel processing using six TMS320C40 digital signal processors. An image is captured by one processor, and the data is distributed line by line to up to four other mapping nodes. Each of the mapping nodes produces partial results that are combined by the last node. Throughput measurements show that the minimum median throughput is 11.9 frames per second for useful model parameter combinations. Measurements using one to four mapping nodes show that the speedup is linear with respect to the number of mapping nodes. The output of this data reduction system has been employed to compute points of interest in the field-of-view, which are then used to alter the camera gaze.
Résumé

Visual systems for autonomous mobile robots sometimes need a wide field of view, high image resolution, and very fast processing all at once. To meet these same constraints, the primate visual system relies on a nonlinear data reduction technique to limit the amount of data to be processed. This technique provides a compromise between the constraints of field of view, resolution, and processing speed. An image reduction model based on the retina, with overlapping receptive fields, allows great flexibility in the type of masks used for data reduction. This flexibility is demonstrated using uniform or Gaussian averaging masks, difference-of-Gaussians masks, and edge detection masks. On the other hand, such a model requires more computation than models in which the receptive fields do not overlap. To obtain a system that computes reduced images from conventional images, a scan-line algorithm was chosen and adapted for its processing speed and its lower memory requirements. A parallel MIMD network built from Texas Instruments TMS320C40 processors performs the image reduction at rates above 10 images per second. A processor dedicated to image capture acquires the input image and distributes the image lines to a set of mapping processors. Each of these mapping processors computes a partial result of the data reduction. The results are then combined by another processor. Throughput measurements show an average rate of 11.9 images per second over the useful set of reduction model parameters. Measurements for configurations of one to four mapping processors also show that the rate depends linearly on the number of processors used. The images produced by the system are used to extract points of interest that serve as references for camera movements.

Acknowledgements

This research is partially supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada and by the National Centres of Excellence Program through IRIS. MB would like to personally thank NSERC for their scholarship support. MDL would like to thank CIAR and PRECARN for their support.

1. Introduction

If an autonomous mobile robot is to use a vision system for navigation and target recognition, it must support a variety of requirements. It needs high (enough) resolution for obtaining detail in the regions of interest. It also needs a wide field-of-view for avoiding collisions, detecting looming objects, and determining points of interest. To be useful, the system requires a short, i.e. real-time, response. A uniform image sensor would have high resolution and a wide field-of-view, implying a very large output image. Processing such large amounts of data requires intensive computation. For example, it is not atypical for algorithms running on standard workstations to take on the order of seconds to minutes to complete. For autonomous robots these long response times are clearly inadequate.

Studies of primate visual systems reveal that there is a compromise which simultaneously provides a wide field-of-view, high spatial resolution in the region of interest, and a small fast-to-process output. The basis of this compromise is the use of a variable resolution sensory system as an image data reduction scheme (see [1] or [2] for details of the biological facts behind this scheme). This technique produces two output images, as illustrated in Figure 1. One of the outputs, called the fovea, contains the central region of the input at maximal resolution (Figure 1(c)). The other output image, the periphery, is based on a log-polar coordinate system (Figure 1(d)). It produces a reduction of the input data surrounding the fovea. This reduction is proportional to the distance from the center of the fovea. Figure 1(b) contains the result of an inverse mapping of the output images back to the input domain. It illustrates the radial nature of the data reduction and should be compared with the original camera input in Figure 1(a).

Techniques using such a data reduction method have appeared in the literature.
For example, the method has been proposed as an image compression scheme for visual telecommunication over standard telephone lines [3] and for remote telemetric driving [4]. Systems for object recognition have been suggested which are based on the property that rotating or scaling an object about the center of the fovea corresponds to translating the corresponding object in the periphery [5, 6]. This so-called scale and rotation invariance property is also useful for solutions to optic flow estimation, time-to-collision determination, and stereo disparity measurements [7, 8, 9]. Other work has dealt with techniques for tracking and docking spacecraft [10, 11, 12]. This whole body of research indicates that using a foveated sensor not only provides a good compromise, but also that it possesses interesting features for machine vision.

A variety of approaches have been used for building data reduction systems. One is the development of a sensor based on VLSI technology [13, 14]. Another is the design of a lens system which provides the nonuniform resolution by redirecting light [15, 16, 17]. Both of these methods yield systems with fast data reduction throughputs, and in the case of the lens, a very wide field-of-view. However, a major drawback of these approaches is that the data reduction parameters of the physical design necessarily remain fixed. A third method is to use a standard video camera with some digital processing hardware to compute the foveated retinal output [18, 19]. Although this is not as fast a solution for large images, it does provide the most flexibility since the data reduction parameters can be varied at will. Therefore, this is the preferred approach for a foveated sensor.

The guidelines for the design of the retinal data reduction system presented in this paper are that it has to (i) map a standard gray-scale video image (> 200 000 pixels), (ii) have a fast throughput (at least ten frames per second), and (iii) provide the possibility of being embedded on a mobile robot.

Figure 1. Example of Retinal Data Reduction. (a) A region of a 484 by 484 pixel video input image used for the data reduction mapping. The gaze is at the center of the image. (b) The inverse mapping image showing the two output images in the same input plane. The pixels in this image are of the identical size as those in (a) in order to illustrate how the peripheral image data are compressed. (c) Enlarged foveal output image (the central disc of the input image with 50 pixel radius). (d) Enlarged periphery output produced by retinal data reduction (31 by 126 pixels).

There are three steps involved in the design of a foveated sensor which satisfies the above guidelines. The first is the selection of a data reduction model. Section 2 presents an overview of existing models and outlines the overlapping receptive field (RF) model used in our prototype. The second step is the selection of an algorithm for computing the data reduction mapping. In section 3, an algorithm based on a class of computer graphics algorithms is explained. The last step is concerned with the implementation of this technique in order to obtain the desired throughput. This is attained using a parallel network of TMS320C40¹ digital signal processors. Section 4 presents a detailed description of how the implementation is achieved using such a network. In section 5, the system performance as it relates to throughput and speedup is discussed. Section 6 illustrates the flexibility of using an overlapping RF model for machine vision by showing a variety of output images based on different RF averaging masks. Lastly, section 7 concludes this paper with a recapitulation of the key points presented here, and mentions current developments using this system.

2. Retinal Mapping Model

The retinal mapping model dictates how to convert input image pixels to an output representation. In this section, a brief overview of the two classes of retinal models found in the literature is presented. The two classes are called here conformal mapping models and overlapping circular receptive field models. Both types have the property of scale and rotation invariance in their peripheral outputs [20, 21, 22] and assume square input pixels. This overview leads to a presentation of the mapping model used in our system and the output representation selected.

The two conformal retinal mapping models are based on the complex logarithm function: $w = \log z$ [23] and $w = \log(z + a)$ [21]. Complex variables $z$ and $w$ represent pixel coordinates in the input and output images, respectively. For the log(z) model, the input image mapping template consists of a log-polar grid: a set of rays emanating from the center of the image at regular angular increments, and a set of circles where the ratio of the radii of adjacent circles is constant. Figure 2(a) shows a mapping template based on the log(z) model. Adjacent circles and adjacent rays bound rings (annuli) and sectors, respectively, in the input image.
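The log-polar grid just described can be illustrated with a minimal Python sketch that assigns a pixel to a ring and a sector through $w = \log z$. The function name `log_polar_cell` and the parameters `r_min` (radius of the innermost circle), `k` (ratio of adjacent circle radii) and `n_sectors` are illustrative assumptions, not names taken from the paper.

```python
import cmath
import math

def log_polar_cell(x, y, r_min, k, n_sectors):
    """Return (ring, sector) for pixel (x, y), with coordinates measured from
    the image center, or None if the pixel lies inside the innermost circle.

    Hypothetical helper: r_min, k and n_sectors are assumed grid parameters.
    """
    z = complex(x, y)
    if abs(z) < r_min:
        return None  # central disc, where log z approaches its singularity
    w = cmath.log(z)  # w = log|z| + i*arg(z)
    # Log-radial component: circles are spaced by a constant ratio k.
    ring = int((w.real - math.log(r_min)) / math.log(k))
    # Angular component: rays are spaced at regular angular increments.
    angle = w.imag % (2.0 * math.pi)
    sector = int(angle / (2.0 * math.pi / n_sectors))
    return ring, sector

if __name__ == "__main__":
    print(log_polar_cell(30, 10, r_min=10, k=2, n_sectors=8))
    # Scaling the point by k moves it one ring outward, same sector.
    print(log_polar_cell(60, 20, r_min=10, k=2, n_sectors=8))
```

In this sketch, scaling a point about the center by the factor `k` changes only the ring index, and rotating it by one ray spacing changes only the sector index, which is the scale and rotation invariance property mentioned above.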
Pixels within a given ring and sector make up an RF, and are uniformly averaged to produce an output value. These values are arranged in a rectangular periphery output grid, shown in Figure 2(d). Here one axis corresponds to the index of the sector in which the RF lies (angular component), and the other, to the index of the ring in which the RF lies (log-radial component). The result is a log-polar coordinate system. Since the log(z) function has a singularity at $z = 0$, a second output image is produced: the fovea shown in Figure 2(c). It consists of a disc containing pixels from the center of the original input image.

An alternative for dealing with the singularity is the log(z + a) input image template (not shown in the figure). It can be created by cutting a vertical slice of width $2a$ from the middle of the log(z) template and bringing the two remaining pieces together [24]. The shape of the RF's is the same as those in the log(z) template except along the vertical midline, and the pixels within each RF are also uniformly averaged. The output values are arranged in a butterfly image: each of the left and right halves of the template corresponds to a horizontal parabolic shape, and the vertices of the two parabolas touch. The horizontal axis corresponds to the radial (ring) component of the RF, and the vertical axis, to the angular (ray) component. Adjacent RF's on each side of the vertical midline are no longer adjacent in the output image. No foveal disc is needed for this version of the mapping since the singularity problem no longer exists.

Both of these models simulate the mapping between photoreceptor areas on the retina and their arrangement in the visual cortex [25, 26]. In fact, the log(z + a) model is a better approximation to the physiological data obtained on this mapping [26]. For both of these conformal mapping models, the RF's generated do not overlap, and their contents are uniformly averaged. Each input image pixel contributes to only one RF value.

¹ Trademark of Texas Instruments Inc.

Figure 2. (a) Input image mapping template for the log(z) model. (b) Overlapping circular RF template constructed using our variation of the model ($\gamma = 0.28$, $\omega = 0.3$). The dashed lines represent the log-polar grid on which the RF's are centered. (c) The fovea: a copy of the pixels in the central disc of either mapping template. (d) The periphery output in log-polar coordinates. Each pixel in this image corresponds to one RF in either of the templates.

By contrast, overlapping circular RF models permit each pixel to contribute to more than one RF, and not all pixels contribute to the same number of RF's. A mapping template of such a model is shown in Figure 2(b). The resulting fovea and periphery are displayed in Figures 2(c) and 2(d), respectively. The circular RF's are a better approximation of the actual averaging area of photoreceptors observed on the retina [2]. Notwithstanding the increase in mapping complexity, overlapping circular RF models are attractive for a mobile robot visual system because a variety of RF masks (e.g., Gaussian, difference-of-Gaussians) can be easily implemented. Hence we have selected this type of model for our system.

Wilson [22] has proposed such a model and defined biologically-based parameters to constrain the placement of the RF's on the image template. These are the foveal radius $\rho_f$, the RF overlap factor $\omega$, and the RF size-to-eccentricity ratio $\gamma$ (symbols ours). In this context, RF size is the diameter of the RF circle, and the eccentricity of an RF is the radial distance from the center of the RF to the center of the image. The RF overlap factor indicates the fraction of the diameter of a receptive field that is overlapped by one of its neighbors. Based on neurophysiological data, Wilson fixed the RF overlap $\omega$ to 50%, and the RF diameter to radial distance ratio $\gamma$ to 0.25. Yamamoto et al. [27] have extended Wilson's model to permit variable $\gamma$ and $\omega$. The mathematical expressions for positioning the circular RF's based on the parameters in both Wilson's and Yamamoto's models contain a small-angle approximation. We have investigated another set of RF positioning expressions, also with variable $\gamma$ and $\omega$, where this approximation has been removed. The template of Figure 2(b) was obtained using our version of the RF positioning equations.

The RF centers are arranged along rays originating at the center of the image, and along circles also centered on the center of the image. In fact, the RF centers are at the intersections of the circles and rays of a log-polar grid. To constrain the placement of these centers, we need to derive two positioning parameters from the three model parameters, namely, the angle $\theta$ between adjacent rays and the ratio $k$ of radii of adjacent circles. The ray spacing angle is given by

(1)  $\theta = \cos^{-1}\left(1 - \frac{\gamma^2}{2}(1 - \omega)^2\right)$

As for the radii ratio, we used an adjacent circle spacing constraint which ensures that the average of the RF overlap of an RF on a circle $i$ by its neighbors on circles $i - 1$ and $i + 1$ is the desired overlap $\omega$. Given that the first ring of RF's is at the foveal radius $\rho_f$ (where $i = 0$), this constraint forces the circles to have radii $\rho_i$, given by

(2)  $\rho_i = \rho_f \, k^i$,

where

(3)  $k = \dfrac{-\gamma(1 - 2\omega) - \left(4 + \gamma^2(1 - 2\omega)^2 - \gamma^2\right)^{1/2}}{\gamma - 2}$

The difference between these positioning parameters and Yamamoto's version is very small (see [1] for the derivations of $k$ and $\theta$ and details of this comparison).

There are three versions of the overlapping RF model which provide similar templates, but differ on how pixels within the RF's are averaged, how the central area of the input image (the fovea) is treated, and how the output values are organized into output images. In Wilson's model, the central area of the template (the area within the first RF circle) also contains RF's, and the size of all foveal RF's is the same as those found at the fovea boundary; this produces a fixed number of foveal output pixels regardless of the foveal radius. Uniform averaging of pixels within RF's is used, and the output values are organized in a manner similar to Schwartz's butterfly image.

Yamamoto et al. have implemented their version of Wilson's model in a system called Fovia [27] which uses a MasPar (SIMD) machine. Each RF is assigned to a processor, and RF values are the (Gaussian) averaged intensities of a fixed number of input pixels, regardless of the RF size². Like Wilson's model, the fovea contains RF's and the foveal output size is independent of the foveal diameter. Unlike Wilson's model, and much like the log(z) model, two output images are produced: a fovea with a Cartesian coordinate system and a (rectangular) periphery with a log-polar coordinate system.

Based on Yamamoto et al.'s implementation, Baron and Levine have recently redesigned this retinal mapping to make use of all the pixels within the RF circles, while employing a circularly symmetric Gaussian mask to weight the pixels within the RF's [28]. Also, the fovea no longer contains RF's, and the foveal output is just a copy of the central disc of the input image.

As for the system discussed in this paper, it was decided to produce two output images as in [28]. Hence, the input image is processed using the mapping template of Figure 2(b), and the output images illustrated by Figures 2(c) and 2(d) are produced. The RF averaging mask is left as a selectable system parameter. This permits the computation of a variety of filter templates as is shown in section 6.

3. Mapping Algorithm

Given a retinal data reduction mapping model, the next step in the creation of a real-time system to perform this mapping is the design of an efficient algorithm. This is the topic of this section. We started by looking at a class of computer graphics algorithms. Scan-line (SL) algorithms are used for fast region filling and hidden surface removal on raster scan displays. The latter have one point in common with standard video cameras: they are line-by-line serial devices.
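As a numerical aside on the template geometry of Section 2, the two positioning parameters can be evaluated and checked in a few lines of Python. This is only a sketch under stated assumptions: the symbols `gamma` (RF size-to-eccentricity ratio) and `omega` (overlap factor) follow the definitions given with the model, the closed form for `k` is the one obtained by solving the average-overlap constraint for neighbors compared along a common ray, and the helper `mean_radial_overlap` is a hypothetical function added here purely to verify that constraint.

```python
import math

def ray_angle(gamma, omega):
    """Angle between adjacent rays (no small-angle approximation)."""
    return math.acos(1.0 - 0.5 * gamma**2 * (1.0 - omega)**2)

def ring_ratio(gamma, omega):
    """Ratio k of the radii of adjacent circles of RF centers."""
    b = gamma * (1.0 - 2.0 * omega)
    root = math.sqrt(4.0 + b * b - gamma**2)
    return (-b - root) / (gamma - 2.0)

def mean_radial_overlap(gamma, k):
    """Average fraction of an RF diameter overlapped by its two radial
    neighbors (hypothetical check of the spacing constraint)."""
    outer = (0.5 * gamma * (1.0 + k) - (k - 1.0)) / gamma
    inner = (0.5 * gamma * (1.0 + 1.0 / k) - (1.0 - 1.0 / k)) / gamma
    return 0.5 * (outer + inner)

if __name__ == "__main__":
    gamma, omega = 0.28, 0.3  # values quoted for the template of Figure 2(b)
    k = ring_ratio(gamma, omega)
    print("theta (rad):", ray_angle(gamma, omega))
    print("k:", k)
    print("recovered overlap:", mean_radial_overlap(gamma, k))
```

Under these assumptions the recovered overlap equals the requested `omega`, and the chord between adjacent RF centers on a circle, `2 * sin(theta / 2)` times the circle radius, equals `gamma * (1 - omega)` times that radius.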
The original motivation behind our use of an SL algorithm was to determine whether it was possible to perform the retino-cortical mapping as fast as the video data was received from the camera. The high output rate of the camera makes this very difficult. Yet analysis shows that using the scan-line approach requires fewer computations to compute the mapping than a straightforward table look-up approach [1].

The SL algorithm is described here in the context of computing the retino-cortical mapping. Readers are referred to [29] for details of the technique as it pertains to computer graphics problems. The basic principle behind this algorithm is that on any given line, pixels that contribute to a given RF form a span of adjacent pixels. This span coherence principle is illustrated in Figure 3. Computing the data reduction mapping entails averaging the set of pixels within each RF. To do so, each pixel within a given RF is multiplied by a weight factor which is determined by the type of averaging mask used and the location of the pixel in the RF (see section 6 for examples). All of the weighted pixel values are then added to produce the final RF output value.

² This is done for computational reasons and involves a subsampling strategy.

Figure 3. Receptive Field Span. The intersection of a scan line (dotted) with a particular mapping template. The solid line represents the pixel span in a receptive field for the given scan line.

The scan-line algorithm processes the input image line by line, and RF span by RF span. For a given RF span on a given line, the pixel intensities are weighted and the products are accumulated. Hence, the contribution of the RF span to the RF value is computed, and this contribution is then added to the value currently stored for the RF (the RF value is initialized to zero before the mapping begins). After the last line of the input image is processed, each RF value contains the sum of all of its spans' contributions, i.e. the weighted sum of pixel intensities within the RF. This algorithm can be summarized as follows:

ASL Algorithm
    ∀ input image line l do
        ∀ span s on line l do
            contributions ← 0
            ∀ pixel p in span s of RF z do
                contributions ← contributions + w(p, z) I(p)
            end do
            value(z) ← value(z) + contributions
        end do
    end do

where w(p, z) is the weight of input pixel p in RF z, and I(p) is the intensity value of input pixel p.

From the above, it is clear that updating the values of RF's which intersect a given line occurs only once per line. In typical SL algorithms, a list of span information is maintained from one line to the next. Here, this would be done by updating the span endpoints and weight lists for RF's still intersected by the new line, eliminating list entries for RF's which no longer intersect the new line, and adding new list entries for RF's which begin on the new line. Updating the span information for every line is a fair amount of work. At the cost of additional memory, the span information for every RF and line intersection can be precomputed and stored in a look-up table, called here a span information table (SIT). Hence, a running pointer to the SIT data can be maintained by the mapping algorithm. We call this version of the algorithm the Adapted-Scan-Line (ASL) algorithm.

The information needed for each span from the SIT consists of a reference to the beginning of the span data, a reference to the beginning of the weight list for the span, the size of the span, and the location of the peripheral image output value. A copy of the span weight lists could be stored for each span, but since many averaging masks are circularly symmetric (e.g., uniform, Gaussian, difference-of-Gaussians), corresponding spans in RF's of the same size can use the same pixel weighting factors. Therefore, the RF weight factors need only be stored once for each RF size. A theoretical comparison between the ASL algorithm and one based on a per pixel/per RF look-up table technique shows that the ASL algorithm requires 65% less memory and 58% fewer processing cycles [1].

4. Sensor Implementation

Even with the efficient ASL algorithm just described, the implementation of retinal data reduction can benefit from speed increases due to parallelism. As mentioned in the introduction, our implementation of the retinal data reduction system is based on a network of TMS320C40 DSP microcomputers. Ideally, when using $n$ processors to perform an algorithm, the computational time required would decrease by a factor of $n$. In other words, we would observe a speed-up of $n$. However, as Amdahl's law states, the best attainable speed-up ($n \to \infty$) has an upper bound [30]. Even if the ideal speed-up is unattainable, the goal in developing a parallel implementation is to maximize speed-up by maximizing processor utilization and system throughput. Implementing an algorithm on a parallel processing system entails partitioning the work into tasks and the data among these tasks.
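The bound referred to above can be made concrete with the usual statement of Amdahl's law. The sketch below is a generic illustration, not a model of the C40 network: `parallel_fraction` is an assumed quantity (the share of single-processor run time that can be spread over the processors), and the value 0.9 in the demo is arbitrary, since the paper gives no such figure at this point.

```python
def amdahl_speedup(parallel_fraction, n):
    """Speed-up on n processors when only parallel_fraction of the
    single-processor run time can be divided among them."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)

def amdahl_limit(parallel_fraction):
    """Upper bound on the speed-up as n grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)

if __name__ == "__main__":
    p = 0.9  # hypothetical parallel fraction, chosen only for illustration
    for n in (1, 2, 4, 8):
        print(n, amdahl_speedup(p, n))
    print("bound:", amdahl_limit(p))
```

The closer the serial share is to zero, the closer the speed-up stays to the ideal value of `n` for a small number of processors.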
In this section, the network required to perform the data reduction mapping is derived based on the constraints imposed by the microcomputer technology. Then each of the components of the network is described in turn in terms of the work it must perform. More details about each component can be found in [1].

4.1. Data and Task Partition. The hardware used in the system consists of decoupled nodes of C40 processors which force some requirements on the resulting processor configuration. Specifically, since the C40 nodes share no memory, and more than one node is needed to perform the ASL mapping algorithm, a mechanism for distributing the input data to the mapping nodes is necessary.

The first question to answer is how to divide the input image data among the mapping nodes. We note that (i) image data are captured line by line, (ii) the ASL algorithm is line oriented, and (iii) the captured data lines are stored sequentially in the frame buffer. Thus the natural answer is to distribute the input image line by line from the node which captures the image. Each of the $n$ mapping processors receives one line out of $n$, and no two processors obtain the same data. Figure 4 illustrates the proposed data distribution technique.

For most lines the template intersects only a portion of the line's video data. Therefore it is only useful to transfer this restricted segment of the data to the mapping node. Also, the scan lines intersecting the fovea contain a segment of pixels which is unused by the data reduction mapping. The foveal data can be transferred directly to the target system. Thus to minimize the quantity of data transferred, the image data on lines intersecting the fovea are divided into two groups, corresponding to the left and right segments of the template. The two data segments on these lines are then transferred separately to the same mapping node.

Figure 4. Input Data Distribution Scheme (successive lines are assigned to mapping nodes 1, 2, 3, 4, 1, and so on). The data is transmitted line by line to whichever node performs the mapping for the line. Only the data necessary for the mapping are transferred. For lines intersecting the fovea, the mapping data are divided into two segments which are transferred separately to the same mapping node.

We note that although there is no overlap between the data received by each mapping node, the RF's actually do overlap. Thus, except in the case where a single mapping node is used, none of the mapping nodes will receive all of the data for any given RF unless the RF has a diameter of one pixel. These cases aside, a mapping node $k$ computes only part of the value $p(i, j)$ of the RF on ring $i$ and ray $j$. Therefore

(4)  $p_k(i, j) = \sum_{\forall (x, y) \in L(i, j, k) \subset A(i, j)} w(x, y, i, j) \, I(x, y)$,

where $A(i, j)$ is the set of input pixels within the boundary of the RF with coordinates $(i, j)$, and $L(i, j, k)$ is the subset of pixels of $A(i, j)$ which intersect the input line segments mapped by processor $k$. To obtain the total RF value, some means of combining the partial results produced by the mapping nodes is needed. This is achieved by an additional node, called the combination node. Given the partial results from each of the $P$ mapping nodes, this processor effectively computes $p(i, j)$ for all RF $(i, j)$:

(5)  $p(i, j) = \sum_{k=1}^{P} p_k(i, j)$.
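The line-interleaved partition and the combination step of equations (4) and (5) can be sketched in Python as follows. This is a simplified single-process illustration under several assumptions: spans are computed on the fly by the hypothetical helper `circle_span` rather than read from a precomputed span information table, all weights are 1 (an unnormalized uniform mask), and RF's are described by made-up `(cx, cy, radius)` tuples rather than by ring and ray indices.

```python
import math

def circle_span(cx, cy, radius, y):
    """Inclusive pixel span (x0, x1) of a circular RF on scan line y, or None."""
    dy = y - cy
    if abs(dy) > radius:
        return None
    half = math.sqrt(radius * radius - dy * dy)
    x0, x1 = math.ceil(cx - half), math.floor(cx + half)
    return (x0, x1) if x0 <= x1 else None

def map_lines(image, rfs, lines):
    """Partial RF sums p_k over the scan lines given to one mapping node."""
    partial = [0.0] * len(rfs)
    for y in lines:
        row = image[y]
        for idx, (cx, cy, radius) in enumerate(rfs):
            span = circle_span(cx, cy, radius, y)
            if span is None:
                continue
            x0, x1 = max(span[0], 0), min(span[1], len(row) - 1)
            if x0 > x1:
                continue  # span lies entirely outside the image
            partial[idx] += sum(row[x0:x1 + 1])  # weight w = 1 for every pixel
    return partial

def combine(partials):
    """Combination node: add the partial results RF by RF, as in equation (5)."""
    return [sum(values) for values in zip(*partials)]

if __name__ == "__main__":
    height = width = 16
    image = [[(7 * x + 3 * y) % 11 for x in range(width)] for y in range(height)]
    rfs = [(5.0, 6.0, 3.0), (8.0, 7.0, 4.0), (8.0, 8.0, 0.5)]
    n_nodes = 4
    partials = [map_lines(image, rfs, range(k, height, n_nodes))
                for k in range(n_nodes)]
    assert combine(partials) == map_lines(image, rfs, range(height))
```

In the demo, the third RF covers a single pixel, so exactly one node holds its complete value, while the two larger RF's are spread over all four nodes and only become complete after combination.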

Figure 5. Retinal Data Reduction Processor Network. This figure illustrates the partition of tasks needed for computing the foveated images. (Image data enter the image data distribution node, which sends the fovea directly to the target system and the remaining data to the mapping nodes running the ASL algorithm; their partial results go to the combination node, which sends the periphery to the target system.)

To summarize, when computing the data reduction mapping using multiple mapping nodes, the lack of shared memory forces the use of a distribution node. Likewise, an additional node is needed to combine the partial results produced by the mapping nodes. Figure 5 illustrates the resulting C40 network. One feature introduced in this parallel implementation of the ASL algorithm is scalability. Theoretically, mapping nodes can be added to the network as long as the distribution and combination nodes can support them. However, given the operating system used for development³ and the task partition just described, it is not possible to configure more than four mapping nodes.

4.2. Distribution Node. The video camera which is connected to the distribution node produces images of 512 by 484 pixels. Since the data reduction template (see Figure 2(b)) must fit within these 484 lines, only pixels within a central 484 by 484 square are effectively used. The tasks performed by this node are exclusively data transfers, all of which are done using the DMA coprocessor. Two sets of image data transfers are required. Since the foveal output of the system is a direct copy of the central region of the input image, one set of transfers consists of sending the foveal output data to the target system. The other set of transfers consists of distributing image data, line by line, to the mapping nodes.

In the distribution node, there is no explicit verification of image capture completion. Intuitively, one would expect the system to wait for the beginning of the capture of a new frame before beginning the data transfers. Also, a check should be made to ensure that a line has been captured before transferring it.
Otherwise, some image data might be sent out from the frame store before it is updated.

If the data transfers and subsequent processing take longer than a single frame time, synchronizing them with image capture would reduce the achievable system throughput. As illustrated in Figure 6(a), if the transfers and subsequent processing take more than one frame time but less than two, image capture would stop after the first frame. Once the mapping was complete, capture would be enabled again and would effectively start at the beginning of the third frame. Therefore, the throughput would be limited to half the video rate (fifteen frames per second). Similarly, if the mapping time is greater than $n - 1$ frame times, but less than $n$ frame times, synchronizing the data distribution with capture would limit the throughput to $30/n$ frames per second.

Instead, by letting the frame capture run freely, the subsequent retinal data reduction processing can proceed as fast as it can. If the transfer and processing time is greater than the frame capture time, then image capture will periodically catch up with and pass the point where data transfers occur. Therefore the data processed on a given line is always at most 33 milliseconds old. This method is illustrated in Figure 6(b); it is the approach used in our implementation. The throughput of the processing system is dependent solely on the work involved in computing the data reduction mapping.

³ The operating system used is Helios-C40 from Perihelion Inc.

Figure 6. Capture Synchronization. Both timelines show capture and data transfers starting at 0 ms, frame 1 captured by 33 ms, and processing ending at 50 ms. (a) In this example the processing time is longer than the frame capture time, and data transfers are synchronized with the beginning of image capture. Capture is halted at 33 ms, frame 2 is skipped, and the system idles until capture and data transfers start again at 66 ms. The system processes one out of every two frames. The throughput is therefore limited to 15 frames per second. (b) If the data transfers are not synchronized with the capture, capture continues and the next data transfers start at 50 ms using the most recent frame data, so the processing throughput is as fast as the processing can allow. In this example, the throughput is 20 frames per second.

There is an apparent problem with this method.
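Before turning to that caveat, the throughput limits quoted above can be reproduced with a short calculation. The following Python sketch assumes a 30 frames per second video rate, as in the text; the function names are illustrative and the 50 ms processing time is the example value of Figure 6.

```python
import math

VIDEO_RATE_FPS = 30.0
FRAME_TIME_MS = 1000.0 / VIDEO_RATE_FPS  # about 33 ms per frame

def synchronized_fps(processing_ms):
    """Throughput when transfers wait for the start of a frame capture:
    processing that needs between n - 1 and n frame times yields 30 / n."""
    n = max(1, math.ceil(processing_ms / FRAME_TIME_MS))
    return VIDEO_RATE_FPS / n

def free_running_fps(processing_ms):
    """Throughput when capture runs freely and processing restarts at once."""
    return 1000.0 / processing_ms

if __name__ == "__main__":
    t = 50.0  # ms, the processing time used in Figure 6
    print("synchronized:", synchronized_fps(t))
    print("free running:", free_running_fps(t))
```

For the 50 ms example this gives 15 frames per second with synchronization and 20 frames per second without it, matching the two cases of Figure 6.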
If an object in the input image moves at a speed where its position in the image changes significantly during the mapping time, there is a possibility that the object may appear in more RF's than it would if it were stationary. However, this problem is also observed in conventional cameras. If an object moves significantly during the frame time, its position on the odd lines will be different from that on the even lines as a result of interlacing⁴.

4.3. Mapping Nodes. The mapping nodes are at the heart of the data reduction system. They compute the ASL algorithm described in section 3 on a subset of lines of the input image. Given the network configuration in Figure 5, three tasks are required from the mapping node. First, each one must obtain image data for mapping. The image data are received in data blocks which correspond to a segment of pixels of a given line intersecting the mapping template⁵. Second, these nodes compute the retinal mapping for the newly received data block using the ASL algorithm. Third, upon completion of the mapping, each node must transfer its partial results to the combination node.

These three tasks can actually be performed in parallel with the help of the DMA coprocessor. To do this, two buffers are assigned for data input, and two more for data output. The DMA coprocessor transfers input data into one of the buffers while the CPU performs the mapping on the other. Since the input data block size is at most 484 pixels, the two on-chip RAM memory banks are used as input buffers. Similarly, the CPU places the mapping results in one output buffer while the DMA coprocessor transfers the results for the previous frame from the other. These buffers are fairly large and hence are placed alongside the span information table (SIT) in SRAM memory.

4.4. Combination Node. This last node in the network completes the data reduction mapping by obtaining the partial results of each of the mapping nodes and combining them. The final result, the complete periphery, is then transferred out to the target system. Here again input transfers, computation, and output transfers can be performed in parallel. The method used to obtain the final periphery consists of processing the partial results from each mapping node line by line. A line of periphery data corresponds to the RF values of an RF ring.
Similar to the technique used in the mapping nodes, two combination buffers are employed. One contains the periphery line being combined, and the other holds the previous periphery line being transferred out. Some additional input buffers are also used to hold the partial results received.

5. Performance of the Foveated Sensor

In this section, we present results on the performance of the foveated sensor design described above. To secure performance data, the system was run with a target system consisting of processes which measure the time between the reception of consecutive output images. To obtain data meaningful for evaluating the performance, some practical considerations must be addressed first. Following this, two aspects of the performance will be discussed. The first is system throughput, with which the randomness of the measurements obtained is illustrated. The other is system speed-up. The goal of this section is to obtain an idea of how well the design performs. Therefore, the variance in the data is not analyzed in depth. Processor usage and latency for this system are discussed in detail in [1].

⁴ This may not be true if the camera uses a mechanical or electronic shutter.
⁵ Recall that lines intersecting the fovea produce two such data blocks.

5.1. Practical Considerations. The first issue is the selection of a set of parameter values to use in the experiments. Based on the practical parameter ranges described in [1], the values chosen are:

$$\rho_f \in \{5, 50, 100\} \tag{6}$$
$$\alpha \in \{0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0\} \tag{7}$$
$$\omega \in \{0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5\} \tag{8}$$

The values for the size/eccentricity ratio $\alpha$ and the overlap factor $\omega$ are taken at regular intervals⁶. The foveal radius $\rho_f = 5$ represents a case where the fovea is practically nonexistent. A typical situation where the periphery covers most of the input image is provided by $\rho_f = 50$. The value $\rho_f = 100$ produces a very large fovea. In terms of the performance of the system, larger foveae produce smaller periphery images (fewer RF rings), which consequently take less time to compute. For this reason, larger foveae were not tested.

The second consideration concerns the periphery size. Even with the selected foveal radii, for many combinations of the three parameter values, the periphery image produced was very small. Thus, the measured throughput of the data reduction system in these cases would be quite large when compared to that with useful parameter combinations. For this reason, in the experimentation which follows, the data presented correspond to situations where the periphery image produced contains at least 100 pixels. Thus the mapping templates include at least 100 receptive fields.

Another issue pertains to the smallest RF size. The diameter of the smallest RF's, located on the foveal boundary, is given by

$$d_0 = \alpha \rho_f. \tag{9}$$

For $\rho_f = 5$ and $\alpha = 0.05$ and $0.1$, the diameter $d_0$ is less than a pixel. Although it is conceivable that the mapping table and weights could be adapted to support RF's of sub-pixel size, these cases were not considered in the current implementation. Hence, the parameter combinations for which the smallest RF's are less than a pixel in diameter were not used.

The last practical consideration concerns available memory.
For some parameter values, the memory required by the mapping nodes may exceed that available. In these circumstances, it is not possible to obtain performance data.

Some additional constraints are therefore imposed on the combinations of these parameters used in the experiments. The data presented in the following subsections are obtained using only such valid parameter combinations. A combination of parameter values is valid if: (i) the periphery contains at least 100 pixels, (ii) the smallest RF's have a diameter of at least one pixel, and (iii) the mapping nodes have enough memory to perform the mapping for the given parameter values.

All considerations taken into account, the size of the periphery images using valid parameter values ranged from 100 to 15813 pixels ($\rho_f = 50$, $\alpha = 0.05$, $\omega = 0.5$), given an input image of dimension 484 by 484. The data reduction ratio, i.e., the ratio of the input image size to the total size of the fovea and periphery, ranged from 2,037 to 14.7 [1].

⁶ This is true except for the smallest $\alpha$ value, which is set to the minimum size/eccentricity ratio. See [1] for details on the selection of the parameter ranges.
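As an illustration of how the parameter grid is pruned, the following sketch enumerates the combinations of equations (6) to (8) and applies criterion (ii) only, reading equation (9) as the product of the size/eccentricity ratio and the foveal radius. Criteria (i) and (iii) depend on the mapping template and on the memory of the mapping nodes, so they are not modelled here; the function names are hypothetical.

```python
from itertools import product

FOVEAL_RADII = [5, 50, 100]                      # equation (6), in pixels
ALPHAS = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
          0.6, 0.7, 0.8, 0.9, 1.0]               # equation (7)
OMEGAS = [0.0, 0.05, 0.1, 0.15, 0.2, 0.25,
          0.3, 0.35, 0.4, 0.45, 0.5]             # equation (8)


def smallest_rf_diameter(rho_f, alpha):
    """Diameter of the RF's on the foveal boundary (equation (9), as read here)."""
    return alpha * rho_f


def passes_subpixel_test(rho_f, alpha):
    """Criterion (ii): the smallest RF's must span at least one pixel."""
    return smallest_rf_diameter(rho_f, alpha) >= 1.0 - 1e-9


# Combinations that survive criterion (ii); (i) and (iii) are not checked.
candidates = [(rho_f, alpha, omega)
              for rho_f, alpha, omega in product(FOVEAL_RADII, ALPHAS, OMEGAS)
              if passes_subpixel_test(rho_f, alpha)]
```

Only the two smallest ratios at the smallest foveal radius are rejected by this test; the remaining candidates would still have to pass the periphery-size and memory checks.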

5.2. System Throughput. The first performance measure of interest is the throughput of the retinal mapping system, that is, the number of frames per second. It is important to remember that since data mapping and image capture are not synchronized (see section 4.2), the frame rates shown here can exceed the image capture rate of thirty frames per second.

The experimental setup requires the addition of a target system for measuring the mapping time. This system consists of two timer nodes which receive the resulting fovea from the capture node and the periphery from the combination node. This permits us to measure the time $t_m$ between consecutive outputs. The throughput $\tau$ is given by the reciprocal of this measured time,

$$\tau = \frac{1}{t_m}. \tag{10}$$

There is some randomness in the measurements obtained. For each valid combination of parameter values ($\omega$, $\alpha$ and $\rho_f$), the time required per frame was measured for 28 frames⁷. Analysis has shown that the percent standard deviation of the mapping time $t_m$ is very small (< 1%) for almost all parameter combinations [1]. For the few points where the deviation is larger, it can reach 10%. The major factor which influences this deviation is that each C40 node runs a small operating system (OS) which takes some time away from the mapping process at regular intervals⁸. The amount of time required by the OS varies. Occasionally, it needs a larger amount of time, and as a result, it slows down the mapping for that particular frame. This produces an outlier in the deviation graph. Thus if the experiment is repeated, the peaks in the deviation surface would not necessarily be at the same points, if they occurred at all. See [1] for a discussion of this phenomenon.

To eliminate the random behavior, the OS could be disabled. This is not practical because doing so would entail restarting the processor network for each parameter combination. Instead, the median value of the measured throughputs was obtained for each parameter combination⁹.
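The effect of taking the median can be seen in a small sketch. It is an illustration only, not the timing code used in the experiments: the frame times are invented, the helper name `median_throughput` is hypothetical, and the lower median (an actual sample rather than an interpolated value) is assumed for an even number of frames.

```python
import statistics


def median_throughput(frame_times_s):
    """Median of the per-frame throughputs 1/t_m, in frames per second."""
    throughputs = [1.0 / t for t in frame_times_s]
    return statistics.median_low(throughputs)


# Hypothetical run of 28 frames: a steady 0.05 s per frame, plus one frame
# slowed down to 0.10 s (for instance by operating system activity).
times = [0.05] * 27 + [0.10]
median_rate = median_throughput(times)
mean_rate = statistics.mean(1.0 / t for t in times)
```

In this example the single slow frame lowers the mean throughput but leaves the median at the steady-state rate, which is the reason for preferring the median here.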
These median values are the basis for the results presented from here on.

Figure 7 contains the throughput plots for the valid parameter combinations. Each plot contains four surfaces, and each one represents the throughput for a different number $k$ of mapping nodes. It is clear from the figure that the throughput varies with the parameters $\alpha$, $\omega$, and $\rho_f$. As intuitively expected, in each instance, the throughput is higher for a larger $k$. However, the change in throughput for increasing $\alpha$ and $\omega$ is far from being smooth. The reasons for this can be easily understood. For example, consider the effect of reducing the size/eccentricity ratio $\alpha$. A decrease in $\alpha$ causes the RF's to become smaller, and hence, the total RF area and the cost of the mapping decrease. This remains true until there is enough space between the last RF ring and the image boundary to add a new RF ring. At this point the mapping cost jumps up and the throughput decreases.

Table 1 gives the maximum and minimum throughputs for each foveal radius for the case of four mapping nodes. Evidently a minimum rate of ten frames per second can be achieved. As for the maximum frame rates, they occur for very small periphery sizes ($\approx 100$ pixels). When running this system with a display system instead of timing nodes, these same maximum frame rates are measured.

[Figure 7: three surface plots, titled "Median Throughput for $\rho_f = 5$", "Median Throughput for $\rho_f = 50$" and "Median Throughput for $\rho_f = 100$"; each shows the frame rate (frames/s, 0 to 100) against $\omega$ (0 to 0.5) and $\alpha$ (0 to 1) for $k = 1, 2, 3, 4$.]

Figure 7. System Throughput. The medians of the measured throughputs are shown for the valid parameter combinations. These ranges of values are illustrated by the shadow of the surfaces on the $\alpha\omega$-planes. (a) $\rho_f = 5$, (b) $\rho_f = 50$, (c) $\rho_f = 100$.

⁷ This number of frames was chosen arbitrarily.
⁸ The operating system used is the Helios Parallel Operating System™ for the C40, purchased from Perihelion Software.
⁹ Given a set of values $\{v_i \mid i = 1, 2, \ldots, n\}$ organized in ascending order, the median value is defined as $v_{n/2}$. If $n$ is even, then either $v_{\lfloor n/2 \rfloor}$ or $v_{\lceil n/2 \rceil}$ can be used.

5.3. Speed-Up. Speed-up is defined as the ratio of the processing time $t(1)$ when using one processor to the time $t(k)$ using $k$ processors [30]. It is a measure of speed increase and indicates whether adding processing nodes provides a worthwhile improvement in throughput. Since the processing time is the reciprocal of throughput, the speed-up using $k$ processors when compared to one is given by

$$S_k = \frac{\tau(k)}{\tau(1)}. \tag{11}$$

Here, $\tau(1)$ corresponds to the case where a single mapping node is used in conjunction with the capture and combination nodes; the latter simply acts as a data relay in this situation.

Frames per Second
$\rho_f$   min.   max.
5        26.1   83.5
50       11.9   67.2
100      15.1   65.7

Table 1. Minimum and Maximum Median Throughputs.

Speed-Up
           $S_2$          $S_3$          $S_4$
$\rho_f$   min.  max.   min.  max.   min.  max.
5        1.98  2.01   2.95  3.01   3.90  4.02
50       1.98  2.01   2.94  3.02   3.91  4.01
100      1.98  2.01   2.96  3.01   3.93  4.01

Table 2. Minimum and Maximum Speed-Ups for Each Foveal Radius.

The speed-up surfaces obtained using equation (11) are shown in Figure 8. These surfaces are surprisingly flat. This means that the shape of the throughput curves of Figure 7 for a given foveal radius $\rho_f$ is the same regardless of the number of mapping nodes. Thus, for up to four mapping nodes, the decrease in mapping time is linearly related to the number of nodes. In other words, the speed-up is linear. This indicates that the upper bound for speed-up has not been reached. This also shows that adding mapping nodes might result in a further decrease in mapping times and thereby increase the throughput.

Table 2 contains the minimum and maximum speed-up observed for each of the foveal radii. In all cases, the maximum speed-up $S_k$ exceeds the ideal value $k$ by a small amount. This is probably due to the statistical variance inherent in the data (see the discussion of the deviation in section 5.2). As for the minimum speed-up, as $k$ increases, so does the magnitude of the difference $S_k - k$. This indicates that for some parameter combinations, the speed-up obtained by adding a fourth mapping node is no longer linear. This behavior would be more apparent if the experiments were performed using additional mapping nodes. Unfortunately, the existing network configuration is already at the practical limit. Five of the six available C40 communication ports of the capture and combination nodes are being used. In both cases, the sixth communication port is employed by the operating system for management purposes.

[Figure 8: three surface plots, titled "Speed-Up for $\rho_f = 5$", "Speed-Up for $\rho_f = 50$" and "Speed-Up for $\rho_f = 100$"; each shows the speed-up (0 to 5) against $\omega$ (0 to 0.5) and $\alpha$ (0 to 1) for $S_2$, $S_3$ and $S_4$.]

Figure 8. Speed-Up. Surfaces showing speed-up using 2, 3, and 4 processors for the valid parameter combinations. (a) $\rho_f = 5$, (b) $\rho_f = 50$, (c) $\rho_f = 100$.

6. Receptive Field Mask Examples

It has already been mentioned that one of the advantages of using a circular overlapping receptive field model is its flexibility in selecting different RF averaging masks. A given averaging mask dictates the weights used for calculating the pixel contributions for each receptive field. Recall that the value $z$ of a periphery pixel $(i, j)$ is the weighted sum of all pixels within the RF circle and is given by

$$z(i, j) = \sum_{\forall (x, y) \in A(i, j)} w(x, y, i, j)\, I(x, y), \tag{12}$$

where $A(i, j)$ is the set of input image pixels with coordinates $(x, y)$ within an RF, $I(x, y)$ is the intensity of the input pixel, and $w(x, y, i, j)$ is the weight of the input pixel contribution to the RF.

In this section, typical outputs for four kinds of averaging masks are presented. Also, the methods for calculating the pixel contribution weights based on the profile of the masks are outlined. Three of the masks are circularly symmetric. These produce uniform averaging, Gaussian averaging, and difference-of-Gaussians averaging. In addition, a method for using RF averaging for edge detection is proposed. Employing any of the masks does not affect the speed of the system since only the weights change. However, for the difference-of-Gaussians and edge detection masks some post-combination processing is required.

Figures 9(a) and 10(a) show the input images used. To compute the mapping, the foveal radius $\rho_f$ is set to 50 pixels, the overlap factor $\omega$ to 0.5, and the size/eccentricity ratio $\alpha$ to 0.1. The foveae produced are shown in Figures 9(b) and 10(b).

6.1. Uniform Averaging. Figures 9(c) and 10(c) contain periphery images computed using a uniform averaging mask. In this case, the pixels within a given RF contribute equally to the RF output value. Hence the weighting factors are the same for each pixel contribution. In preparing the span information table (SIT), the number of pixels $\eta_i$ within a receptive field in ring $i$ is first calculated¹⁰. The boundary of the circular RF's is approximated using a discretized-circle generation algorithm (see [29] for details). Given this boundary, $\eta_i$ can be tallied. The weights used in equation (12) are given by

$$w(x, y, i, j) = \frac{1}{\eta_i}. \tag{13}$$

6.2. Gaussian Averaging. Figures 9(d) and 10(d) show periphery outputs using a Gaussian averaging mask. The weight of a pixel contribution is a function of the distance $\rho$ between the pixel and the center of the RF. It is given by

$$w(x, y, i, j) = A e^{-\rho^2 / \sigma_i^2}. \tag{14}$$

An illustration of this profile function is shown in Figure 11. For $\rho = \sigma_i$, the weight is approximately one third of the maximal value $A$ (see Figure 11). Therefore, the distance at which weights become very small can be set by choosing $\sigma_i$. If $\sigma_i$ is small compared to the RF radius $\rho_i$, only pixels near the center of the RF will contribute significantly to the RF output.
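The weighted sum of equation (12) with the uniform and Gaussian weights can be sketched as follows. Several simplifications are assumed: the RF is taken to be the set of pixels whose centres lie within the RF radius (a plain distance test, not the discretized-circle algorithm of [29]), the Gaussian profile is read as an amplitude times exp(-rho^2 / sigma^2), which agrees with the stated drop to about one third at a distance of one sigma, and all function names are hypothetical.

```python
import math


def disc_offsets(radius):
    """Offsets (dx, dy) of the pixels lying within `radius` of the RF centre."""
    r = int(math.floor(radius))
    return [(dx, dy)
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)
            if dx * dx + dy * dy <= radius * radius]


def uniform_weights(offsets):
    """Equation (13): each pixel gets 1 / (number of pixels in the RF)."""
    return {o: 1.0 / len(offsets) for o in offsets}


def gaussian_weights(offsets, sigma, amplitude=1.0):
    """Equation (14), as read here: A * exp(-rho^2 / sigma^2)."""
    return {(dx, dy): amplitude * math.exp(-(dx * dx + dy * dy) / sigma ** 2)
            for dx, dy in offsets}


def rf_value(image, cx, cy, weights):
    """Equation (12): weighted sum of the input pixels covered by the RF."""
    return sum(w * image[cy + dy][cx + dx] for (dx, dy), w in weights.items())


# A flat 9 x 9 test image and an RF of radius 3 centred on it.
image = [[10.0] * 9 for _ in range(9)]
offsets = disc_offsets(3.0)
flat = rf_value(image, 4, 4, uniform_weights(offsets))
edge_weight = gaussian_weights(offsets, sigma=3.0)[(3, 0)]
```

With uniform weights a flat image is reproduced unchanged, and with the amplitude left at one the Gaussian weight at a distance of one sigma is exp(-1), roughly 0.37.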
To obtain the outputs shown, the amplitude constant $A$ is chosen so that the sum of all the weights for an RF is unity. The value of $\sigma_i$ used depends on the size of the RF. For each size, it was decided that this one-third drop-off point should occur at $\rho_i / 2$.

6.3. Difference-of-Gaussians Averaging. The third set of periphery outputs, shown in Figures 9(e) and 10(e), is obtained using a difference-of-Gaussians (DOG) averaging mask. This type of mask models the response of retinal ganglion cells, each of which drives one of the $10^6$ optic nerve fibers. The profile function used in this case is

$$w(x, y, i, j) = A_1 e^{-\rho^2 / \sigma_{i1}^2} - A_2 e^{-\rho^2 / \sigma_{i2}^2}, \tag{15}$$

and is illustrated in Figure 12(a). The parameters $A_1$, $A_2$, $\sigma_{i1}$, $\sigma_{i2}$ are chosen to ensure that there is a point $\rho_0$ where the function is zero. Figure 12(b) depicts an RF using a DOG mask. The central region (white) contains only positive weights, while the surround (black) has only negative weights.

¹⁰ Recall that all RF's on ring $i$ have the same size and contain the same number of pixels.
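The zero-crossing condition on the DOG profile can be written out explicitly. The short derivation below assumes that equation (15) is a difference of two Gaussians of the form amplitude times exp(-rho^2 / sigma^2), with the narrower Gaussian carrying the larger amplitude; it is a worked consequence of that reading, not a formula quoted from the text.

```latex
% Setting the profile of equation (15) to zero at \rho = \rho_0:
A_1 e^{-\rho_0^2/\sigma_{i1}^2} = A_2 e^{-\rho_0^2/\sigma_{i2}^2}
\quad\Longrightarrow\quad
\frac{A_1}{A_2} = \exp\!\left[\rho_0^2\left(\frac{1}{\sigma_{i1}^2}
                  - \frac{1}{\sigma_{i2}^2}\right)\right],
% and, solving for the zero crossing,
\rho_0^2 = \frac{\sigma_{i1}^2\,\sigma_{i2}^2}{\sigma_{i2}^2 - \sigma_{i1}^2}
           \,\ln\frac{A_1}{A_2}.
```

A real zero crossing therefore requires $\sigma_{i1} < \sigma_{i2}$ together with $A_1 > A_2 > 0$, in which case the profile is positive inside $\rho_0$ and negative outside it.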

[Figure 9: image panels (a) to (g); the periphery panels (c) to (g) are plotted against the angle $\theta$.]

Figure 9. Sample Output Images 1. The model parameters used to obtain the data reduction are $\rho_f = 50$, $\alpha = 0.1$ and $\omega = 0.5$. (a) Input image. (b) Enlarged fovea (101 by 101). (c) Periphery using uniform RF averaging (all peripheries are 126 by 21 and are enlarged). (d) Periphery using a Gaussian RF averaging mask. (e) Periphery using a difference-of-Gaussians RF averaging mask. The RF values are scaled and offset to produce gray scale values between 0 (black pixels; off-center response) and 255 (white pixels; on-center response). (f) Thinned and thresholded magnitude response of the periphery using the edge RF mask. (g) Angle response of the periphery using the edge RF mask. Angle values, ranging from 0 to 360 degrees, are scaled to the gray-scale values of 0 (black) to 127 (gray).

[Figure 10: image panels (a) to (g), laid out as in Figure 9.]

Figure 10. Sample Output Images 2. See caption for Figure 9.

For the outputs shown here, the values for the four profile parameters were set according to the RF size in order to place the point $\rho_0$ at approximately $\rho_i / 2$. Also, they were chosen so that the sums of the positive and the negative weights were each unity in magnitude. If all of the pixels within an RF had the same intensity (no contrast), this would ensure that the RF value was zero. If the central region of the RF were brighter than its surround, then the RF value would be positive (on-center response). Conversely, if the central region were darker than its surround, then the RF value would be negative (off-center response)¹¹.

¹¹ There are also two systems in the primate retina; these are the so-called ON and OFF systems [2].
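The behaviour described above can be checked with a small sketch of a DOG mask. It is built on assumptions that go beyond the text: the two Gaussian widths are arbitrary illustrative values, the zero crossing is placed at exactly half the RF radius, and the positive and negative weights are rescaled separately so that each sums to unity in magnitude, which is only one possible way of meeting the stated normalization. The names `dog_weights` and `rf_value` are hypothetical.

```python
import math


def dog_weights(radius, sigma1, sigma2):
    """DOG weights with a zero crossing at radius / 2 (assumes sigma1 < sigma2)."""
    rho0 = radius / 2.0
    # With A1 = 1, this A2 makes the raw profile vanish at rho0.
    a2 = math.exp(rho0 ** 2 * (1.0 / sigma2 ** 2 - 1.0 / sigma1 ** 2))
    r = int(math.floor(radius))
    raw = {}
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            d2 = dx * dx + dy * dy
            if d2 <= radius * radius:
                raw[(dx, dy)] = (math.exp(-d2 / sigma1 ** 2)
                                 - a2 * math.exp(-d2 / sigma2 ** 2))
    pos = sum(w for w in raw.values() if w > 0)
    neg = -sum(w for w in raw.values() if w < 0)
    # Rescale centre and surround separately: sums become +1 and -1.
    return {o: (w / pos if w > 0 else w / neg) for o, w in raw.items()}


def rf_value(image, cx, cy, weights):
    """Weighted sum of the input pixels covered by the RF (equation (12))."""
    return sum(w * image[cy + dy][cx + dx] for (dx, dy), w in weights.items())


weights = dog_weights(radius=6.0, sigma1=2.0, sigma2=4.0)
size, c = 13, 6
flat = [[100.0] * size for _ in range(size)]
bright_centre = [[200.0 if (x - c) ** 2 + (y - c) ** 2 < 9 else 50.0
                  for x in range(size)] for y in range(size)]
dark_centre = [[50.0 if (x - c) ** 2 + (y - c) ** 2 < 9 else 200.0
                for x in range(size)] for y in range(size)]
flat_response = rf_value(flat, c, c, weights)
on_response = rf_value(bright_centre, c, c, weights)
off_response = rf_value(dark_centre, c, c, weights)
```

Under these assumptions a flat patch gives a response of zero up to rounding, a bright centre gives a positive response and a dark centre a negative one, matching the on-center and off-center cases described above.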