A fast and area-efficient FPGA-based architecture for high accuracy logarithm approximation

Size: px

Start display at page:

Download "A fast and area-efficient FPGA-based architecture for high accuracy logarithm approximation"

Josephine Barker
5 years ago
Views:

1 A ast and area-eicient FPGA-based architecture or high accuracy logarithm approximation Dimitris Bariamis, Dimitris Maroulis, Dimitris K. Iakovidis Department o Inormatics and Telecommunications University o Athens rtsimage@di.uoa.gr Abstract. This paper presents a novel hardware architecture or the approximation o the base- logarithm o integers at a high accuracy. It is based on a generalized piecewise linear approximation approach, which allows a large number o linear segments to approximate the logarithm unction. Contrary to state o the art architectures that utilize up to six linear segments or the approximation, the implementation o the proposed architecture employs up to 04 linear segments, while retaining low FPGA resource requirements and high requency. The exploitation o the capabilities o modern FPGAs results in low resource utilization. More speciically, up to 39 slices, one BlockRAM and one multiplier are utilized on a Xilinx Virtex-5 FPGA. The approximation error depends on the number o segments used and is lower than the comparable architectures by ive orders o magnitude. Thus, the proposed architecture is suitable or use as a component in systems that require ast, accurate and resource-eicient logarithm calculation.. Introduction Accurate high throughput computation o the logarithm is a undamental requirement or a variety o applications including signal and image processing, telecommunications, biomedical and industrial applications. For example in [], massive logarithmic transormations are required or the pre-processing o thousands o microarray data samples. In this case inaccurate computations o the logarithm can lead to erroneous medical decisions. As another example we could consider real-time pattern recognition systems [], where inaccurate computations o eatures based on the logarithm can lead to a deteriorated recognition perormance. Hardware architectures that have been proposed or the approximation o the base- logarithm include implementations o several amilies o algorithms, including power series [3] and polynomial methods [4], high-radix algorithms [5], the CORDIC (Coordinate Rotation Digital Computer) algorithm [6] and piecewise linear approximation methods [7-0]. The iterative CORDIC algorithm has commonly been used or logarithm approximation; however implementing it as a pipelined circuit in an area-eicient way on FPGA remains a challenge. A vast number o implementations based on table lookup and polynomial approximation are examined in [], but possible optimizations based on the available FPGA resources are not

2 taken into account. Alternatively, many hardware architectures aiming at crude, however ast approximation o the logarithm, implement the piecewise linear approximation approach based on Mitchell s method [7]. According to this approach the logarithm unction is approximated by a small number o consecutive linear segments. In [] and [8] this method was implemented by using only two segments; in [9] our segments were used; whereas in [0] a VLSI architecture that uses two, three and six segments was proposed. Inspired by Mitchell s method, and motivated by the need o combining both computational eiciency and approximation accuracy, we propose a novel architecture or the approximation o the base- logarithm in a ast and area-eicient way, exploiting the capabilities o modern FPGAs. The proposed architecture implements a generalized piecewise linear approximation o the logarithm unction that allows or a large number o linear approximation segments, providing up to ive orders o magnitude lower approximation error than the other piecewise methods. The achieved approximation accuracy depends on the number o segments used, which aects the size o a ROM that is used or storing the parameters that control the computation. The implementation o the ROM using a single FPGA BlockRAM minimizes both the slice and the BlockRAM requirements, resulting in a small circuit that occupies less than 0.5% o the total slices on a Xilinx XC5VLX0T FPGA. The rest o this paper is organized in three sections. Section describes the logarithm approximation approach and the FPGA implementation o the proposed architecture. The results o the experiments conducted are presented in Section 3. The conclusions o this study are summarized in Section 4.. Logarithm Estimation via Piecewise Linear Approximation A piecewise linear approximation approach that involves three steps is considered. It involves the calculation o the integral part o the logarithm, the linear approximation o the ractional part o the logarithm by one segment and the piecewise linear approximation o the ractional part by several segments. The piecewise linear approximation o the third step is perormed by splitting the linear approximation produced in the second step into s consecutive linear segments. The steps are described in detail below:.. Step The integral part o the logarithm, l ( x ) log( x i = ), is determined by the position o the Most Signiicant Bit (MSB) o the input number x, where n n+ x < l i = n ( ()

3 5 Logarithm (,) (, ) ( 3,3) ( 4, 4) log( Step Step Input number x Fig.. Steps and o logarithm approximation B c seg(+ l ( A c seg( 0 0 seg(/s l ( (seg(+)/s Fig.. l as a unction o l.. Step The ractional part o the logarithm, i.e. l = log ( l i (, is estimated by linear approximation between the points ( n,n) and ( n+,n+), n=,,, o the unction log ( resulting in Eq.. The approximated ractional part is li ( x x l ( = = () li ( + li ( li (

4 .3. Step 3 In the third step, a new approximation l ( o the ractional part o the logarithm is derived as a unction o l (. It is obtained by piecewise linear approximation between the points ( n,n) and ( n+,n+) using s linear segments o equal length. The ractional part l ( determines the segment seg( to which x belongs, according to Eq. 3. seg( = s l ( (3) As illustrated in Fig., or an input number x, the endpoints A and B o the segment seg( are elevated by c seg( and c seg(+ and their coordinates are derived by Eqs. 4 and 5 respectively. The equation that deines the linear segment AB is simpliied to the equivalent Eq. 6 producing the equation used or the calculation o l (. It is worth noting that the resulting approximation is continuous, since the consecutive segments are not elevated independently, but share an endpoint. In particular, the right endpoint o a segment seg( is elevated by the same amount c seg(+ as the let endpoint o segment seg(+, thereore ensuring the continuity o the resulting approximation. seg( seg( A, + c seg ( (4) s s seg( + seg( + B, + c seg ( + (5) s s ( c c ) ( s l ( seg( ) l ( = l ( + cseg ( + seg ( + seg ( (6) The optimal parameters c i can be determined by minimizing the approximation error E (Eq. 7), which represents the sum o the relative dierences between the actual and approximated values o the logarithm. E = ( l i ( + l ( ) x log ( log ( (7).4. Implementation The proposed architecture is implemented as a ully pipelined circuit, in order to achieve a throughput o one result per clock cycle. It consists o 6 pipeline stages and one dual-ported ROM. The ROM stores the parameters c i in a ixed point representation and is implemented on a single FPGA BlockRAM. This representation consists o a b-bit wide mantissa, whereas the exponent e is common or all parameters c i and is not stored in the ROM. Figure 3 illustrates the structure o the pipelined circuit. In the irst pipeline stage, a priority encoder is used to locate the

5 x 6 priority encoder l shiter i ( l ( log s + log s 5 4 Dual-port ROM s b 3 b b 4 b 5-log s 5 5-log s+b + 5-log s+b prepend e leading zeroes 6 5-log s+b+e + 5-log s+b+e l ( log ( Fig. 3. The implementation o the logarithm approximation architecture MSB o the input number x. The output o the priority encoder is the integral part l i ( o log (. In the second stage a shiter is used to isolate the ractional part l (. For the calculation o l (, we have regarded s as a power o two. Thus, the quantities seg( and s l ( seg(, needed or the calculation o Eq. 6, can be calculated rom the ixed-point representation o l ( by separating its bits into two parts, without perorming any arithmetic or logic operations. In order to proceed with the calculation o l (, seg( and s l ( seg( are calculated at the end o the second stage. The third stage involves two concurrent

6 Table. Error E o piecewise linear approximation methods Method Segments Error E Mitchell et al. [7] 3.99E-03 SanGregory et al. [8].89E-03 Hall et al. [9] 4.65E-04 Abed et al. [0] Proposed architecture.e E E E E E-09 ROM lookups at addresses seg( and seg(+, retrieving the parameters c seg( and c seg(+. In the ourth stage, c seg( is subtracted rom c seg(+ and in the ith stage the result is multiplied by s l ( seg( using one o the on-chip multipliers. In the last stage, the results o the ourth and ith stage added to l (, producing l (, which is concatenated to l i ( in order to produce the inal log ( approximation. 3. Results Experiments were conducted to evaluate the perormance o the proposed architecture. The results are accompanied with comparisons with state o the art architectures implementing piecewise linear approaches or logarithm approximation. The proposed architecture or the approximation o the base- logarithm o 6-bit integers was implemented on a Xilinx Virtex-5 FPGA (XC5VLX0T-3) which eatures 690 slice lip-lops and LUTs, 64 DSP48E multiplier blocks and 48 36kbit BlockRAMs. The number o linear segments s and the width b o the parameters c i are adjusted so that the ROM resides on a single 36kbit BlockRAM and the multiplier is implemented on a single 5 8 bit multiplier o a DSP48E block. Thereore s ranges rom to 04, whereas b ranges rom to 4 bits. The logarithm approximation output is a ixed-point value that utilizes 4 bits or the representation o the integral part, whereas the ractional part occupies up to 3 bits. In order to evaluate the proposed architecture compared to the other piecewise linear approximation approaches, the obtained approximation error E was calculated or each method, as shown in Table. The method proposed by Hall et al. [9] surpasses the other approaches and achieves E= These piecewise linear approximation approaches [8-0] aim to provide a ast approximation o the logarithm that requires ew hardware resources. Due to their design considerations, they employ hardwired logic in order to increase the accuracy o [7]. In contrast to the aorementioned methods, the proposed architecture yields considerable gains in accuracy by eiciently using the available FPGA resources, instead o relying on hardwired error-correcting logic. We have implemented Hall s

7 Table. Comparison o the proposed architecture to Hall et al. [9] Hall et al. [9] Proposed architecture Segments s Bits b Slices Frequency 80MHz 38MHz 303MHz 88MHz BlockRAMs 0 Multipliers Error E.65E-04.37E E-07.83E-09 approach [9] on the same FPGA and compared it to the proposed architecture. The results are presented in Table and reveal that the error E can be decreased by ive orders o magnitude (E= ) when using the proposed architecture, by utilizing one BlockRAM and only moderately increasing the occupied slices. It is important to note that the 39 slices required by the proposed architecture represent less than 0.5% o the total FPGA area. In order to reach similar accuracy using more complex methods, such as polynomial [4] or CORDIC [6], signiicantly more FPGA resources would be required. We have synthesized the loating point logarithm approximation architecture presented in [4] on the same FPGA. It achieves a lowest approximation error in the order o E=0-8 (or a 3-bit mantissa), which is higher than that o the proposed architecture, while requiring a similar number o slices and 6 times more multipliers. Also, the CORDIC architecture, which produces ixed point output, does not take advantage o BlockRAMs or multipliers and occupies nearly 8 times more slices than the proposed architecture or E in the order o Conclusions We presented an FPGA-based architecture or ast and area-eicient approximation o the base- logarithm on FPGA devices. The novel eatures o this architecture can be summarized in the ollowing: It implements a piecewise linear approximation o the logarithm unction using a large number o linear segments, in order to achieve up to ive orders o magnitude higher accuracy than the comparable methods. In contrast to the state o the art implementations, it is designed to exploit the available FPGA resources, such as BlockRAMs and multipliers, in order to provide low FPGA area utilization and high requency. The irst eature allows the proposed architecture to be used in a variety o FPGA designs as a generic component that provides highly accurate logarithm calculation. The second eature enables a signiicant reduction in FPGA slices by using only one BlockRAM or the implementation o the dual-ported ROM and one FPGA multiplier, also allowing the architecture to reach a high requency potential, thus rendering the proposed architecture suitable or high-throughput applications.

8 Other logarithm approximation architectures implementing piecewise linear approximation [8-0] use up to a maximum o 6 segments, whereas they cannot be reconigured to urther increase the number o segments, as the number o segments is hardwired. The proposed architecture overcomes this limitation, consequently increasing the accuracy, while retaining the low FPGA area requirements and the requency advantages o these methods. Thus, it can be embedded into any FPGAbased application that requires ast and accurate logarithm approximation or throughput-sensitive and real-time applications. Acknowledgement This work was realized under the ramework o the Reinorcement Program o Human Research Manpower ( PENED ED34), co-unded 5% by the General Secretariat or Research and Technology, Greece, and 75% by the European Social Fund. Reerences [] D.M. Rocke and B. Durbin, Approximate variance-stabilizing transormations or geneexpression microarray data, Bioinormatics, Vol. 9 no. 8, pages , 003. [] D. Bariamis, D.K. Iakovidis, D. Maroulis, Dedicated hardware or real-time computation o second-order statistical eatures or high resolution images, Lect. Notes in Comp. Science, Volume 479 LNCS, Pages 67-77, 006 [3] D. M. Mandelbaum, and S. G. Mandelbaum, A Fast, Eicient Parallel-Acting Method o Generating Functions Deined by Power Series, Including Logarithm, Exponential, and Sine, Cosine, IEEE Trans. on Parallel and Distributed Systems, vol. 7, no., pp , Jan [4] J. Detrey, F. de Dinechin, Parametrized loating-point logarithm and exponential unctions or FPGA, Micropr. and Microsys., Vol. 3, Issue 8, pp , Dec [5] J.-A. Pineiro, M.D. Ercegovac, J. D. Bruguerax, High Radix Logarithm with Selection by Rounding, Proceedings o the IEEE Int. Con. on Application-Speciic Systems, Architectures, and Processors, pp. 0-0, July 00 [6] J. E. Volder, The CORDIC Trigonometric Computing Technique, IRE Trans. Electronic Comp., 8: , 959. [7] J.N. Mitchell Jr., Computer Multiplication and Division Using Binary Logarithms, IRE Trans. Electronic Comp., :5-57, 96. [8] S.L. SanGregory, R.E. Sierd, C. Brother, and D. Gallagher, A Fast, Low-Power Logarithm Approximation with CMOS VLSI Implementation, Proc. IEEE Midwest Symp. Circ. and Syst., Aug [9] E.L. Hall, D.D. Lynch, and S.J. Dwyer III, Generation o Products and Quotients Using Approximate Binary Logarithms or Digital Filtering Applications, IEEE Trans. Comp., vol. 9, pp , Feb [0] K. H. Abed and R. E. Sierd CMOS VLSI Implementation o a Low-Power Logarithmic Converter, IEEE Trans. Comp., vol. 5, pp , Dec 003. [] D. Lee, A. Abdul Gaar, O. Mencer and W. Luk, Optimizing hardware unction evaluation, IEEE Trans. Comp., vol. 54, no., pp , Dec 005.

Computations Of Elementary Functions Based On Table Lookup And Interpolation

RESEARCH ARTICLE OPEN ACCESS Computations Of Elementary Functions Based On Table Lookup And Interpolation Syed Aliasgar 1, Dr.V.Thrimurthulu 2, G Dillirani 3 1 Assistant Professor, Dept.of ECE, CREC, Tirupathi,