Efficient Hough transform on the FPGA using DSP slices and block RAMs


Xin Zhou, Norihiro Tomagou, Yasuaki Ito, and Koji Nakano
Department of Information Engineering, Hiroshima University
Kagamiyama 1-4-1, Higashi-Hiroshima, Japan

Abstract

The main contribution of this paper is a new FPGA architecture for the Hough transform that identifies straight lines in a binary image. Recent FPGAs have hundreds of embedded DSP slices and block RAMs. For example, Xilinx Virtex-6 Family FPGAs have DSP48E1 slices. Each slice is a configurable logic block equipped with fast multipliers, adders, pipeline registers, and so on. These FPGAs also have 18Kbit dual-port memories as block RAMs. One of the most important techniques for accelerating computation on FPGAs is the efficient use of DSP slices and block RAMs. Our new architecture for the Hough transform uses 178 DSP48E1 slices and 180 18Kbit block RAMs that work in parallel. As far as we know, no previously published work fully utilizes DSP slices and block RAMs for the Hough transform. Roughly speaking, a conventional sequential implementation performs 180m voting operations for m edge points. Our architecture performs the voting operations in parallel and outputs the identified straight lines in m + 97 clock cycles. Since the 180m voting operations are performed using 178 DSP48E1 slices, the lower bound of the computing time is m clock cycles. Hence our implementation is close to optimal. The implementation results show that the Hough transform for an image with 333 edge points can be done in only … µs.

Keywords: Image processing, Line detection, Hough transform, FPGA, Embedded DSP slices, Embedded block RAMs

I. INTRODUCTION

The Hough transform is a technique for finding shapes in images [1]. In particular, it has been used to extract lines, circles, ellipses, and arbitrary shapes. The Hough transform defines a mapping from an image into a parameter space, which is represented by an accumulator array. The parameter space is defined by parameterizing the shapes to be detected. For each edge point of the image, the mapping adds a vote to the corresponding elements of the accumulator array. Each incremented element represents a set of shape parameters, so the elements that receive many votes correspond to the parameters of shapes in the image space.

The Hough transform can be used to extract straight lines in a binary image [2]. The method exploits the duality between the points on a line and the parameters of that line. A point in the image is represented by a curve in the parameter space, and the curves of collinear points intersect at one point in the parameter space. These intersections are counted in an array of accumulators that quantizes the parameter space appropriately. In the following, we call this counting into the accumulators voting. More specifically, for each edge point (x, y) in a 2-dimensional image, the voting is performed along the curve ρ = x cos θ + y sin θ (0 ≤ θ < 180). Possible lines can be detected by searching for points that receive many votes.

Figure 1 shows an example of straight line detection using the Hough transform. For an input image (Figure 1(a)), an edge detector such as the Sobel filter produces the binary edge image (Figure 1(b)). Figure 2 shows the result of voting in the parameter space. In this figure, darker points received more votes, that is, they represent probable lines. The principal lines are detected according to the result of voting (Figure 1(c)).

Figure 2. Hough parameter space

A Field Programmable Gate Array (FPGA) is a programmable logic device. The customer or designer configures it after manufacturing, using a hardware description language. The most common FPGA architecture consists of an array of logic blocks, I/O pads, block RAMs, and routing channels. Furthermore, recent FPGAs have embedded DSP slices, which provide higher performance and broaden the range of applications. The Xilinx Virtex-6 series FPGAs have DSP48E1 slices that are equipped with a multiplier, adders, logic operators, etc. [3].
More specifically, the DSP48E1 slice has a two-input multiplier followed by multiplexers and a three-input adder/subtractor/accumulator. The DSP48E1 multiplier can multiply an 18-bit and a 25-bit two's complement number and produces a 48-bit two's complement result. Programmable pipelining of input operands, intermediate products, and accumulator outputs enhances throughput and improves the clock frequency. The DSP48E1 also has pipeline registers between operators to reduce the delay.

Figure 1. Example of straight line detection using the Hough transform: (a) input image, (b) binary edge image by Sobel filter, (c) line detection using the Hough transform

The block RAM in the Virtex-6 FPGA is an embedded memory that supports synchronous read and write operations. In the Virtex-6 FPGA, it can be configured as a 36Kbit dual-port block RAM, as a FIFO, or as two 18Kbit dual-port RAMs. In our architecture, it is used as a 1K × 18bit dual-port RAM.

Because FPGA chips are relatively inexpensive and programmable, they are widely used in fields where architectures or functions must be updated frequently, such as communication and education. They are also widely used in consumer and industrial products for accelerating processor-intensive algorithms [4], [5], [6], [7], [8], [9], [10], [11], [12].

The main contribution of this paper is a new FPGA architecture for the Hough transform that fully utilizes embedded DSP slices and block RAMs. Our new ideas include:

Voting Space Partitioning: The polar coordinate voting space (θ, ρ) is partitioned and arranged into block RAMs. This enables us to perform the voting operations in parallel. Also, the dual-port function of the block RAMs is fully used to accumulate the voting values instantly.

Efficient Usage of DSP slices: DSP slices are used to compute x cos θ and y sin θ in parallel for each edge pixel (x, y). We compute x cos θ and y sin θ only for 0 ≤ θ ≤ 90, instead of computing them for all θ with 0 ≤ θ < 180. Also, we avoid computing the values of cos θ and sin θ by pre-loading them into the DSP slices.
Fully Pipelined Architecture: We take into account the layout of DSP slices and block RAMs in the Virtex-6 FPGA architecture, and we design our Hough transform architecture as a fully pipelined one. For example, the Virtex-6 FPGA XC6VLX240T has 768 DSP48E1 slices arranged in 8 columns of 96 adjacent DSP48E1 slices. Neighboring DSP48E1 slices are connected directly through pipeline registers. Our Hough transform architecture uses one column each to compute x cos θ and y sin θ. It also uses a pipeline technique to maximize the clock frequency.

Using these ideas, our new architecture for the Hough transform uses 178 DSP48E1 slices and 180 18Kbit block RAMs that work in parallel. One of the most important techniques for accelerating computation on FPGAs is the efficient use of DSP slices and block RAMs. Nevertheless, as far as we know, no previously published work fully utilizes DSP slices and block RAMs for the Hough transform. Roughly speaking, a conventional sequential implementation performs 180m voting operations for m edge points. Our architecture performs the voting operations in parallel and outputs the identified straight lines in m + 97 clock cycles. Since the 180m voting operations are performed using 178 DSP48E1 slices, the lower bound of the computing time is m clock cycles. Hence our implementation is close to optimal. We have implemented our new architecture on a Virtex-6 family FPGA XC6VLX240T-1. The circuit runs at … MHz and outputs the identified straight lines in m + 97 cycles. For example, Figure 1(b) includes 333 edge points. Therefore, the circuit can perform the Hough transform in … µs.

Many hardware algorithms for FPGA implementation of the Hough transform for lines have been proposed in the past. As far as we know, however, no published hardware algorithm uses the embedded DSP slices or multipliers of the FPGA. Instead of multiplication circuits built from DSP slices or multipliers, existing research has applied the incremental Hough transform [13], [14], [15], CORDIC [16], [17], and hybrid-log arithmetic [18] to the computation of the Hough transform. Most recent FPGAs from the principal vendors are equipped with embedded DSP blocks [19], [20], [21]. For this reason, the efficient use of DSP slices and block RAMs is one of the most important techniques for accelerating computation on FPGAs.

This paper is organized as follows. Section II introduces the Hough transform algorithms for lines. Section III presents our FPGA architecture for the Hough transform. Section IV shows the experimental results. Finally, Section V concludes the paper.

II. HOUGH TRANSFORM

The main purpose of this section is to review the Hough transform algorithms for straight lines. Suppose that we have an image of size n × n. We assume that the pixels are arranged in a two-dimensional xy-space with the origin at the center of the image, as illustrated in Figure 3. Hence, both coordinates x and y take integer values in the range [−n/2 + 1, n/2]. A pixel (x, y) (−n/2 + 1 ≤ x, y ≤ n/2) in the xy-space is converted to a curve in the θρ-space by the following formula:

$$\rho = x\cos\theta + y\sin\theta \qquad (0 \le \theta < 180) \tag{1}$$

Since |ρ| = |x cos θ + y sin θ| ≤ √(x² + y²) ≤ n/√2, the double inequality −n/√2 ≤ ρ ≤ n/√2 is satisfied. The values of θ and ρ can also be obtained geometrically. Suppose that we draw a line through the origin with angle θ, as illustrated in Figure 3. We can then draw the line through the pixel (x, y) that is orthogonal to it. The value of ρ is the signed distance from the origin to this orthogonal line. In other words, a point (θ, ρ) of θρ-space corresponds to a line of xy-space. The key idea of the Hough transform is to vote in θρ-space for every pixel in the xy-space. Let (x_0, y_0), (x_1, y_1), ..., (x_{k−1}, y_{k−1}) be the k pixels in xy-space.
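The following minimal Python sketch gives a small numerical check of the duality described above and of the bound on ρ. It evaluates Formula (1) for a few pixels. The helper names `rho`, `curve_cells`, `shared_cells`, and `max_abs_rho` are hypothetical. Rounding ρ with Python's `round()` is an assumption, since the text only states that ρ is rounded to an integer. Collinear pixels should share at least one cell (θ, ρ), and |ρ| should never exceed n/√2.

```python
import math

def rho(x, y, theta_deg):
    """Formula (1): rho = x cos(theta) + y sin(theta), with theta in degrees."""
    t = math.radians(theta_deg)
    return x * math.cos(t) + y * math.sin(t)

def curve_cells(x, y):
    """Cells (theta, rho) voted for by pixel (x, y); rho rounded with round() (assumption)."""
    return {(theta, round(rho(x, y, theta))) for theta in range(180)}

def shared_cells(points):
    """Cells that every point votes for, i.e. where the quantized curves meet."""
    cells = curve_cells(*points[0])
    for x, y in points[1:]:
        cells &= curve_cells(x, y)
    return cells

def max_abs_rho(n):
    """Brute-force maximum of |rho| over an n x n image centred as in Figure 3."""
    coords = range(-n // 2 + 1, n // 2 + 1)
    return max(abs(rho(x, y, th)) for x in coords for y in coords for th in range(180))

# Points on the vertical line x = 5 all vote for the cell (theta, rho) = (0, 5).
print((0, 5) in shared_cells([(5, 0), (5, 3), (5, -7)]))           # True
# Points on x + y = 4 meet at theta = 45, rho = round(4 / sqrt(2)) = 3.
print((45, 3) in shared_cells([(4, 0), (2, 2), (0, 4), (-1, 5)]))  # True
# Bound check for a small image, and the bound for n = 512.
print(max_abs_rho(16) <= 16 / math.sqrt(2) + 1e-9, 512 / math.sqrt(2))
```

For n = 512 the bound is about 362. This value fits in the 10-bit two's complement range [−512, 511] that Section III.B uses for ρ.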
For the k pixels (x_0, y_0), ..., (x_{k−1}, y_{k−1}), the Hough transform is spelled out as follows:

[Straight Forward Hough Transform]
    for i ← 0 to k − 1 do
      for θ ← 0 to 179 do
      begin
        ρ ← x_i cos θ + y_i sin θ
        v[θ][ρ] ← v[θ][ρ] + 1
      end
    for θ ← 0 to 179 do in parallel
      for ρ ← −n/√2 to n/√2 do in parallel
        output (θ, ρ) if v[θ][ρ] ≥ threshold

For simplicity, we assume that the value of ρ is automatically rounded to an integer. In the Straight Forward Hough Transform, the values of x_i cos θ and y_i sin θ are computed for each point (x_i, y_i) and for θ = 0, 1, ..., 179. If v[θ][ρ] stores a large value, many of the k input pixels lie on the line in xy-space that corresponds to the point (θ, ρ) in θρ-space.

We will show that it is sufficient to compute these values for θ = 0, 1, ..., 90. From the addition theorem of trigonometric functions, the value of ρ for the angle 180 − θ is

$$\rho = x_k\cos(180-\theta) + y_k\sin(180-\theta) = -x_k\cos\theta + y_k\sin\theta. \tag{2}$$

Using Formula (2), the Hough transform can also be done by partitioning the range [0, 179] of θ into the two ranges [0, 89] and [90, 179]. We also avoid scanning array v for elements that are at least the threshold. Instead, each pair (θ, ρ) is output at the moment its vote count reaches the threshold. Thus, our new Hough transform, called the Circuit-oriented Hough Transform, can be spelled out as follows:

[Circuit-oriented Hough Transform]
    for i ← 0 to k − 1 do
    begin
      for θ ← 0 to 89 do
      begin
        ρ ← x_i cos θ + y_i sin θ
        v[θ][ρ] ← v[θ][ρ] + 1
        output (θ, ρ) if v[θ][ρ] = threshold
      end
      for θ ← 1 to 90 do
      begin
        ρ ← −x_i cos θ + y_i sin θ
        v[180 − θ][ρ] ← v[180 − θ][ρ] + 1
        output (180 − θ, ρ) if v[180 − θ][ρ] = threshold
      end
    end

In the following section, we show an efficient implementation of the Circuit-oriented Hough Transform.

III. OUR FPGA ARCHITECTURE FOR THE HOUGH TRANSFORM

This section describes our FPGA architecture for the Hough transform, which uses the DSP slices and block RAMs of the Xilinx Virtex-6 FPGA. We use the Xilinx Virtex-6 Family FPGA XC6VLX240T-1 as the target device [22].

A. Structure of our architecture for the Hough transform

Figure 4 illustrates our architecture for the Hough transform. We use 178 DSP slices X_1, X_2, ..., X_89 and Y_1, Y_2, ..., Y_89. For each θ (1 ≤ θ ≤ 89), X_θ and Y_θ compute x_k cos θ and y_k sin θ, respectively, for given x_k and y_k.
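The two algorithms of Section II can be compared in software. The minimal Python sketch below runs both on the same points and checks that they report the same cells (θ, ρ). It uses fixed-point constants with 14 fraction bits, following the cos θ/sin θ format of Section III.B. It rounds ρ to the nearest integer with ties toward +∞. That rounding rule and all function names are illustrative assumptions, not taken from the paper. The comparison relies on the quantized tables keeping the symmetry cos(180 − θ) = −cos θ and sin(180 − θ) = sin θ, which is why the circuit-oriented loop needs only the constants for θ = 0, ..., 90.

```python
import math

FRAC_BITS = 14  # 16-bit fixed point: 1 sign, 1 integer and 14 fraction bits (Section III.B)

def to_fixed(value):
    """Quantize a real constant to the fixed-point format, returned as an int."""
    return round(value * (1 << FRAC_BITS))

# Pre-computed constants (the hardware keeps cos/sin fixed inside each DSP slice).
COS = [to_fixed(math.cos(math.radians(t))) for t in range(180)]
SIN = [to_fixed(math.sin(math.radians(t))) for t in range(180)]

def to_rho(acc):
    """Round a fixed-point sum to the nearest integer, ties toward +infinity."""
    return (acc + (1 << (FRAC_BITS - 1))) >> FRAC_BITS

def straight_forward(points, threshold):
    """Vote for theta = 0..179, then scan the accumulator for counts >= threshold."""
    v = {}
    for x, y in points:
        for t in range(180):
            cell = (t, to_rho(x * COS[t] + y * SIN[t]))
            v[cell] = v.get(cell, 0) + 1
    return {cell for cell, count in v.items() if count >= threshold}

def circuit_oriented(points, threshold):
    """Use constants for theta = 0..90 only; report a cell when it reaches threshold."""
    v, found = {}, set()

    def vote(cell):
        v[cell] = v.get(cell, 0) + 1
        if v[cell] == threshold:  # reported exactly once, so no final scan is needed
            found.add(cell)

    for x, y in points:
        for t in range(0, 90):  # rho_theta = x cos t + y sin t
            vote((t, to_rho(x * COS[t] + y * SIN[t])))
        for t in range(1, 91):  # rho_(180 - t) = -x cos t + y sin t, Formula (2)
            vote((180 - t, to_rho(-x * COS[t] + y * SIN[t])))
    return found

# 21 points on the line y = 2x + 3 plus two outliers.
points = [(i, 2 * i + 3) for i in range(-10, 11)] + [(7, -9), (-12, 4)]
print(straight_forward(points, 15) == circuit_oriented(points, 15))
```

The normal of the line y = 2x + 3 points at about 153.4°, and the cell (153, 1) is among the cells that both functions report.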
Since x_k cos 0 = x_k, x_k cos 90 = 0, y_k sin 0 = 0, and y_k sin 90 = y_k, the DSP slices X_0, X_90, Y_0, and Y_90 are not necessary. An adder and a subtractor for each pair X_θ and Y_θ compute ρ_θ = x_k cos θ + y_k sin θ and ρ_{180−θ} = −x_k cos θ + y_k sin θ. We also use 180 block RAMs V_0, V_1, ..., V_179 to store the voting values. Address ρ of each V_θ (0 ≤ θ ≤ 179) stores the value of v[θ][ρ].

Figure 3. Two-dimensional spaces xy and θρ used in the Hough transform

To minimize the delay between registers, DSP slices X_1, ..., X_89 are connected in a pipeline fashion, as illustrated in Figure 4. Each X_θ has a register that stores the value of x_k. In every clock cycle, the value is transferred from X_θ to X_{θ+1}. Similarly, DSP slices Y_1, ..., Y_89 are connected in a pipeline fashion. Figure 5 illustrates two DSP slices X_θ and Y_θ with an adder and a subtractor that compute ρ. In X_θ, the value of x_k is loaded into an internal register. The value of cos θ is pre-computed and is fixed for each X_θ. The multiplier of the DSP slice X_θ computes the product of x_k and cos θ. Similarly, the value of sin θ used in Y_θ is fixed, and the multiplier of the DSP slice Y_θ computes the product of y_k and sin θ.

Figure 5. Two DSP slices X_θ and Y_θ with an adder and a subtractor to compute ρ

In the Virtex-6 FPGA XC6VLX240T, which is our target device, DSP48E1 slices are arranged in 8 columns of 96 adjacent DSP48E1 slices. Neighboring DSP48E1 slices are connected directly through pipeline registers. Our Hough transform architecture uses one column each to compute x_k cos θ and y_k sin θ. It also uses a pipeline technique to maximize the clock frequency (Figure 6).

Figure 7 illustrates the architecture of V_θ, which uses a block RAM. A block RAM in the FPGA has a dual-port architecture. The Xilinx Virtex-6 Family has 18Kbit dual-port block RAMs with two sets of ports that operate independently. The two sets of ports are:
Port Set A: ADDRA (ADDRess A), DOA (Data Output A), DIA (Data Input A), and
Port Set B: ADDRB (ADDRess B), DOB (Data Output B), DIB (Data Input B).
Let M[i] denote the data at address i of the block RAM. In a read operation of Port Set A, M[ADDRA] is output from DOA after the rising clock edge.
In a write operation of Port Set A, the data given to DIA is written to M[ADDRA] at the rising clock edge. Read and write operations of Port Set B work in the same way as those of Port Set A, and the two port sets work independently. In the block RAMs of our target device, read/write operations can be configured in either RF (Read First) mode or WF (Write First) mode. In the RF mode, if a read and a write are performed on the same address, the read is performed before the write. Hence the read returns the data as it was before the write. In the WF mode, on the other hand, the write is performed before the read, so the read returns the updated data. However, when both ports are used, there is a restriction: if the two ports perform a read and a write on the same address, the block RAMs must be set to the RF mode [23].

Figure 4. The outline of our FPGA architecture for the Hough transform

Figure 6. Pipeline architecture to compute x_k cos θ and y_k sin θ with DSP slices

We use the block RAM V_θ to store the values of v[θ][ρ] (−n/√2 ≤ ρ ≤ n/√2). Let v_θ[i] denote the data at address i in block RAM V_θ. Since ρ is given to ADDRA, v_θ[ρ] is output from DOA after the rising clock edge, as illustrated in Figure 7. After that, v_θ[ρ] + 1 is computed and given to DIB. Since ρ is given to ADDRB, v_θ[ρ] + 1 is written to v_θ[ρ]. In other words, the update v_θ[ρ] ← v_θ[ρ] + 1 is performed. Because the same value of ρ may be input in consecutive clock cycles, the restriction stated above forces the block RAMs into the RF mode. Consequently, when the same value of ρ is input in consecutive cycles, the read returns the count from before the preceding vote, and that vote would be lost. To avoid this situation, we use an additional register that stores the latest voted value. If the same value of ρ is input in consecutive cycles, the stored value is used instead of the value read from the block RAM.

At the same time, a comparator determines whether v_θ[ρ] + 1 = threshold. If so, the value of ρ is written to a register. After that, the pair (θ, ρ), which represents a probable line, is written to the next register. This pair then moves toward the output of the circuit one stage at a time through a series of shift registers, as shown in Figure 4. To reduce the number of clock cycles needed to move data to the output, we use two series of shift registers.
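A cycle-level Python model of a single voting unit V_θ can illustrate the read-first hazard and the bypass register described above. This is a simplified sketch with several assumptions. One ρ arrives per cycle. The port-A read and the port-B write of the previous vote happen at the same clock edge. The real 4-cycle pipeline is collapsed into a one-cycle distance between read and write. The function name `simulate_bram_votes` and the mapping of ρ to an address are hypothetical. In this model, back-to-back votes for the same ρ lose counts without the bypass. With the bypass, every vote is counted and each cell reaches the threshold exactly once.

```python
def simulate_bram_votes(rhos, threshold, depth=1024, bypass=True):
    """Simplified cycle-level model of one voting unit V_theta (hypothetical).

    One rho arrives per clock cycle. Port A reads the current count in
    read-first mode, while at the same clock edge port B writes the count
    incremented in the previous cycle. Returns (memory, rhos reaching threshold).
    """
    mem = [0] * depth           # 1K x 18bit block RAM contents
    pending = None              # (address, value) written through port B this cycle
    detected = []
    for rho in rhos:
        addr = rho & (depth - 1)  # assumption: two's complement rho used as the address
        read_value = mem[addr]    # read-first: sees memory before this cycle's write
        if pending is not None:
            w_addr, w_value = pending
            mem[w_addr] = w_value         # port B completes the previous vote
            if bypass and w_addr == addr:
                read_value = w_value      # latest-vote register replaces the stale read
        new_value = read_value + 1
        if new_value == threshold:        # comparator: each cell is reported once
            detected.append(rho)
        pending = (addr, new_value)
    if pending is not None:               # let the final write complete
        mem[pending[0]] = pending[1]
    return mem, detected

stream = [5, 5, 5, 7, 5]
mem, hits = simulate_bram_votes(stream, threshold=3)
print(mem[5], mem[7], hits)               # 4 1 [5]
mem_nb, _ = simulate_bram_votes(stream, threshold=3, bypass=False)
print(mem_nb[5])                          # 3: one vote is lost without the bypass register
```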
Of the two series of shift registers, one carries the output data of V_0, ..., V_89, and the other carries the output data of V_90, ..., V_179. Therefore, moving data to the output takes at most 90 clock cycles.

B. Data representation

The choice of data precision is guided by the implementation cost in terms of area, simplicity of design, speed, and power consumption. Higher precision leads to less quantization error in the final implementation. On the other hand, lower precision produces more compact and faster designs with lower power consumption. A trade-off must be made depending on the given application and the available FPGA resources. In our work, we use short fixed-point representations of numbers to minimize chip area and computation time. Considering the structure of DSP slices and block RAMs, we choose the following data representation for our implementation. The inputs, that is, the coordinates x_k and y_k, are each 10-bit two's complement integers. The values of cos θ and sin θ are 16-bit two's complement fixed-point numbers, consisting of a 1-bit sign, a 1-bit integer part, and a 14-bit fraction. The value of ρ is a 10-bit two's complement integer. The voted value is an 18-bit integer, so each accumulator can count up to 2^18 − 1 votes. Since θ ranges from 0 to 179, it is represented as an 8-bit integer.

Figure 7. A block RAM V_θ to store v[θ][ρ]

IV. EXPERIMENTAL RESULTS

We have implemented and evaluated our proposed method for the Hough transform on the FPGA, using Xilinx ISE for the implementation. In the implementation, some pipeline registers are inserted between circuit elements to reduce the delay of the circuit. Computing the values of ρ for given x_k and y_k takes 3 clock cycles. Outputting a pair (θ, ρ) that represents a probable line takes 4 more clock cycles. Moving data to the output then takes at most 90 clock cycles. Therefore, this circuit can output the identified straight lines, represented by pairs (θ, ρ), in m + 97 cycles. This is (m + 97)/f µs, where f is the clock frequency in MHz. For example, Figure 1(b) includes 333 edge points. Therefore, the circuit can perform the Hough transform in … µs. Also, if all the points of an image of size 512 × 512 (= 262144) are edge points, outputting the results takes … µs. Of course, it is not possible for all points to be edge points. However, this worst case guarantees that our Hough transform implementation terminates in less than … µs for any image.

Table I
PERFORMANCE EVALUATION OF THE PROPOSED ARCHITECTURE FOR THE HOUGH TRANSFORM

Resource                         | Used
DSP48E1 slices (out of 768)      | 178 (23.2%)
18Kbit block RAMs (out of 832)   | 180 (21.6%)
Slices (out of …)                | … (4.81%)
Clock frequency [MHz]            | …
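The cycle count m + 97 stated above is one cycle per edge point plus the fixed pipeline latencies 3 + 4 + 90. The following Python sketch reproduces this arithmetic and compares it with the m-cycle lower bound. For the lower bound, it assumes that each of the 180 block RAMs accepts one vote per cycle. The clock frequency f stays a parameter because its value is not available here. The function names are hypothetical.

```python
# Pipeline latencies quoted in Section IV, in clock cycles.
RHO_CYCLES = 3     # compute rho for a given (x_k, y_k)
VOTE_CYCLES = 4    # vote in V_theta and emit a pair (theta, rho)
DRAIN_CYCLES = 90  # worst case through the two shift-register chains

def total_cycles(m):
    """m edge points enter one per cycle; the last one then needs the fixed latency."""
    return m + RHO_CYCLES + VOTE_CYCLES + DRAIN_CYCLES

def lower_bound_cycles(m, votes_per_point=180, votes_per_cycle=180):
    """180 votes per point, assuming at most one vote per block RAM per cycle."""
    return -(-m * votes_per_point // votes_per_cycle)  # ceiling division

def time_us(m, f_mhz):
    """Computing time in microseconds for a clock frequency of f_mhz MHz."""
    return total_cycles(m) / f_mhz

for m in (333, 512 * 512):
    ratio = lower_bound_cycles(m) / total_cycles(m)
    print(m, total_cycles(m), lower_bound_cycles(m), round(ratio, 4))
```

For m = 333 the circuit needs 430 cycles against a bound of 333, a ratio of about 0.77. For the worst case m = 262144 the ratio exceeds 0.999. The design is close to optimal in this sense.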
To estimate the speed-up of our implementation, we have also implemented a conventional software version of the Hough transform using GNU C. We ran this sequential algorithm on an Intel Xeon X7460 running at 2.66GHz with 18GB of memory. For the image shown in Figure 1(b), which includes 333 edge points, the software implementation performs the Hough transform in 37.10ms. In the worst case for computing time, all the points of a 512 × 512 (= 262144) image are edge points, and the software takes 359.7ms to output the results.

In the evaluation, we have used the Xilinx Virtex-6 FPGA XC6VLX240T-1. Table I shows the experimental results. Table II shows the computing time of the Hough transform for Figures 1(b), 8(a), and 9(a). According to the table, the computing time of both the CPU implementation and the FPGA implementation depends mostly on the number of edge points, not on the size of the image.

A number of FPGA implementations of the Hough transform for lines have been reported, as discussed in Section I. Table III compares their device, logic blocks, DSP slices, frequency, and throughput. A direct comparison with other works is difficult because the FPGAs used and the supported image sizes differ. In terms of throughput, however, our FPGA implementation outperforms the other works.

Table III
COMPARISON WITH RELATED WORKS FOR THE HOUGH TRANSFORM USING FPGAS

              | Karabernou [16] | Deng [17]    | Lee [18]      | This work
Device        | XC4010EPC84     | XC4010XL     | Virtex 4      | XC6VLX240T-1
Logic blocks  | …05 CLBs        | 333 CLBs     | 314 CLBs      | … Slices
DSP slices    | -               | -            | -             | 178 DSP48E1s
Frequency     | 3.166MHz        | 40MHz        | 13MHz         | … MHz
Throughput    | … Mpixel/s      | 0.63Mpixel/s | 3.768Mpixel/s | 45.48Mpixel/s

Figure 8. Example of straight line detection using the Hough transform (image size …, 393 edge points): (a) input binary edge image, (b) line detection using the Hough transform

Figure 9. Example of straight line detection using the Hough transform (image size …, 8009 edge points): (a) input binary edge image, (b) line detection using the Hough transform

Table II
COMPUTING TIME OF THE HOUGH TRANSFORM

Image        | Image size | # edge points | Time (FPGA) | Time (CPU) | Speed-up
Figure 1(b)  | …          | 333           | … µs        | 37.10ms    | 73.3
Figure 8(a)  | …          | 393           | … µs        | 7.47ms     | 88.3
Figure 9(a)  | …          | 8009          | … µs        | 11.64ms    | 37.4

V. CONCLUSIONS

We have presented a new FPGA implementation of the Hough transform that identifies straight lines in a binary image. Our basic idea is to partition the voting space and to perform the voting operations in parallel. In our implementation, we utilize the DSP slices and block RAMs of the Virtex-6 Family FPGA. By partitioning the parameter space for voting, the 180 voting operations are performed in parallel with 178 DSP48E1 slices and 180 18Kbit block RAMs. We have implemented our architecture on the Virtex-6 Family FPGA XC6VLX240T-1. The experimental results show that this implementation runs at … MHz. Given the coordinates of m edge points, it can output the identified straight lines in m + 97 cycles, that is, (m + 97)/f µs, where f is the clock frequency in MHz. The implementation results show that the Hough transform for an image with 333 edge points can be done in … µs on the FPGA.

REFERENCES

[1] P. V. C. Hough, "Method and means for recognizing complex patterns," U.S. Patent 3,069,654, 1962.
[2] R. O. Duda and P. E. Hart, "Use of the Hough transformation to detect lines and curves in pictures," Communications of the ACM, vol. 15, no. 1, 1972.
[3] Xilinx Inc., Virtex-6 FPGA DSP48E1 Slice User Guide (v1.3), 2011.
[4] J. L. Bordim, Y. Ito, and K. Nakano, "Accelerating the CKY parsing using FPGAs," IEICE Transactions on Information and Systems, vol. E86-D, no. 5, May 2003.
[5] J. L. Bordim, Y. Ito, and K. Nakano, "Instance-specific solutions to accelerate the CKY parsing for large context-free grammars," International Journal on Foundations of Computer Science, 2004.

[6] Y. Ito and K. Nakano, "Low-latency connected component labeling using an FPGA," International Journal on Foundations of Computer Science, 2010.
[7] Y. Ago, Y. Ito, and K. Nakano, "An FPGA implementation for neural networks with the FDFM processor core approach," International Journal of Parallel, Emergent and Distributed Systems, pp. 1-13, 2012.
[8] Y. Ito and K. Nakano, "A new FM screening method to generate cluster-dot binary images using the local exhaustive search with FPGA acceleration," International Journal on Foundations of Computer Science, 2008.
[9] Y. Ito and K. Nakano, "Efficient exhaustive verification of the Collatz conjecture using DSP blocks of Xilinx FPGAs," International Journal of Networking and Computing, vol. 1, no. 1, pp. 49-62, 2011.
[10] Y. Ito, K. Nakano, and S. Bo, "The parallel FDFM processor core approach for CRT-based RSA decryption," International Journal of Networking and Computing, vol. 2, no. 1, 2012.
[11] K. Nakano and E. Takamichi, "An image retrieval system using FPGAs," IEICE Transactions on Information and Systems, vol. E86-D, no. 5, May 2003.
[12] K. Nakano and Y. Yamagishi, "Hardware n choose k counters with applications to the partial exhaustive search," IEICE Transactions on Information and Systems, 2005.
[13] S. Tagzout, K. Achour, and O. Djekoune, "Hough transform algorithm for FPGA implementation," Signal Processing, vol. 81, no. 6, 2001.
[14] H. Bessalah, S. Seddiki, F. Alim, and M. Bencherif, "On line mode incremental Hough transform implementation on Xilinx FPGAs," in Proc. of the 8th Conference on Signal, Speech and Image Processing, 2008.
[15] O. Djekoune and K. Achour, "Incremental Hough transform: an improved algorithm for digital device implementation," Real-Time Imaging, vol. 10, no. 6, 2004.
[16] S. M. Karabernou and F. Terranti, "Real-time FPGA implementation of Hough transform using gradient and CORDIC algorithm," Image and Vision Computing, vol. 23, no. 11, 2005.
[17] D. D. S. Deng and H. ElGindy, "High-speed parameterisable Hough transform using reconfigurable hardware," in Proc. of the Pan-Sydney Area Workshop on Visual Information Processing, vol. 11, 2001.
[18] P. Lee and A. Evagelos, "An implementation of a multiplierless Hough transform on an FPGA platform using hybrid-log arithmetic," in Proc. of Real-Time Image Processing 2008, vol. 6811, 2008.
[19] Xilinx Inc., Virtex-4 FPGA User Guide (v2.6), 2008.
[20] Xilinx Inc., Virtex-5 FPGA User Guide (v5.2), 2009.
[21] Altera Corp., Stratix V Device Handbook, 2012.
[22] Xilinx Inc., Virtex-6 Family Overview (v2.4), 2012.
[23] Xilinx Inc., Virtex-6 FPGA Memory Resources User Guide (v1.6), 2011.
