Implementation and Comparative Analysis between Devices for Double Precision Interval Arithmetic Radix 4 Wallace Tree Multiplication

Krutika Ranjankumar Bhagwat #1, Dr. Tejas V. Shah *2, Prof. Deepali H. Shah #3
# Instrumentation & Control Engineering Department, L. D. College of Engineering, Ahmedabad-380015, Gujarat, India
* S.S College of Engineering, Bhavnagar-364060, Gujarat, India
ISSN: 2231-5381 Page 340

Abstract
This paper presents a comparative analysis between two devices for the design of a radix 4 Wallace tree multiplier that performs interval multiplication. This 64 bit multiplier requires Booth partial product selection logic and a [27,2] compressor. Interval arithmetic gives a reliable result because the rounding error of the floating point multiplier is accounted for by the computed bounds. It requires slightly more area than a conventional floating point unit. There is a definite performance improvement over a software approach, because function calls, error checking, range checking and similar overheads are not present in dedicated hardware logic. The input and output registers are each 64 bits wide, and two multiplexers with control signals $t_x$ and $t_y$ are used [10].

Keywords
Double Precision, Interval Multiplication, Booth partial product selection logic.

I. INTRODUCTION
The IEEE 754 standard defines double precision as 1 sign bit, 11 exponent bits, and 53 bits of significand precision, of which 52 are explicitly stored. The significand has an implicit integer bit of value 1, unless the stored exponent is all zeros, and only the 52 fraction bits of the significand appear in the memory format. The total precision is therefore 53 bits, which corresponds to about 16 decimal digits, since $53 \log_{10}(2) \approx 15.955$ [4].

II. INTERVAL MULTIPLICATION
Multiplication of the intervals $x = [x_l, x_u]$ and $y = [y_l, y_u]$ is defined as:

$$Z = x \times y = [\min(x_l y_l, x_l y_u, x_u y_l, x_u y_u),\ \max(x_l y_l, x_l y_u, x_u y_l, x_u y_u)]$$

Fig. 1 Interval multiplier

The interval multiplier shown in Figure 1 has input and output registers, sign logic, an exponent adder, and a significand multiplier with rounding and normalization logic. The sign logic computes the sign of the result by performing the exclusive-OR of the sign bits of the input operands. The exponent adder performs an 11 bit addition of the two exponents and subtracts the exponent bias of 1023 [10].

III. RADIX 4 WALLACE TREE MULTIPLIER
The significand multiplier performs a 53 bit by 53 bit radix 4 Wallace tree multiplication. If the most significant bit of the product is one, the normalization logic shifts the product right by one bit and increments the exponent. The rounding logic rounds the product to 53 bits based on a rounding mode ($r_m$), which is round to nearest even [10].
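The data path described above (sign exclusive-OR, exponent addition with bias removal, 53 bit by 53 bit significand product, normalization, rounding) and the interval product of Section II can be sketched in software as follows. This is only an illustrative sketch and not the hardware design of this paper: the function names `fields`, `fp_mul` and `interval_mul` are hypothetical, only normal (nonzero, finite) operands are handled, and the directed rounding modes `"down"` and `"up"` are an assumption added here so that the computed bounds enclose the exact products; the text itself mentions only round to nearest even.

```python
import struct

BIAS = 1023        # IEEE 754 double precision exponent bias
FRAC_BITS = 52     # explicitly stored fraction bits


def fields(x):
    """Split a double into (sign, biased exponent, 53 bit significand)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exp = (bits >> FRAC_BITS) & 0x7FF
    frac = bits & ((1 << FRAC_BITS) - 1)
    if exp == 0 or exp == 0x7FF:
        raise ValueError("this sketch handles normal numbers only")
    return sign, exp, (1 << FRAC_BITS) | frac   # restore the implicit 1


def fp_mul(a, b, mode="nearest"):
    """Multiply two normal doubles field by field (hypothetical helper)."""
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    sign = sa ^ sb                 # sign logic: exclusive-OR of the sign bits
    exp = ea + eb - BIAS           # exponent adder, bias subtracted once
    prod = ma * mb                 # 53 x 53 bit product, 105 or 106 bits

    # Normalization: if the top bit (bit 105) is set, shift one place
    # further to the right and increment the exponent.
    shift = FRAC_BITS
    if prod >> 105:
        shift += 1
        exp += 1
    sig = prod >> shift                    # 53 bit result significand
    rem = prod & ((1 << shift) - 1)        # discarded low-order bits
    half = 1 << (shift - 1)

    if mode == "nearest":                  # round to nearest, ties to even
        round_up = rem > half or (rem == half and (sig & 1) == 1)
    elif mode == "down":                   # toward minus infinity (assumption)
        round_up = sign == 1 and rem != 0
    elif mode == "up":                     # toward plus infinity (assumption)
        round_up = sign == 0 and rem != 0
    else:
        raise ValueError("unknown rounding mode")

    if round_up:
        sig += 1
        if sig >> 53:                      # rounding carried out of 53 bits
            sig >>= 1
            exp += 1
    if not 0 < exp < 0x7FF:
        raise OverflowError("result is outside the normal range")

    bits = (sign << 63) | (exp << FRAC_BITS) | (sig & ((1 << FRAC_BITS) - 1))
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def interval_mul(x, y):
    """Interval product: min and max over the four endpoint products."""
    lo = min(fp_mul(p, q, "down") for p in x for q in y)
    hi = max(fp_mul(p, q, "up") for p in x for q in y)
    return lo, hi
```

For example, `interval_mul((1.0, 2.0), (3.0, 4.0))` evaluates the four endpoint products 3, 4, 6 and 8 and returns their minimum and maximum. With `mode="nearest"`, `fp_mul` is expected to agree with the native double precision product for normal operands.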
A. Booth multiplication
Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied. Instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, only every second column is taken, and the multiplicand is multiplied by ±1, ±2, or 0 to obtain the same result [16]. This halves the number of partial products, which gives a considerable performance advantage.

To Booth-recode the multiplier term, the bits are considered in blocks of three such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier, with a zero assumed to the right of the LSB [16].

Fig. 2 Grouping of bits from the multiplier term

Figure 2 shows the grouping of bits from the multiplier term for use in modified Booth encoding [16]. Each block is decoded to generate the correct partial product. The encoding of the multiplier Y using the modified Booth algorithm generates five signed digits: -2, -1, 0, +1, +2. Each encoded digit in the multiplier performs a certain operation on the multiplicand X, as listed in Table I. The Booth partial product selector logic is shown in Figure 3. Booth recoding reduces the number of partial products from 53 to 27.

TABLE I
OPERATION ON THE MULTIPLICAND X
Block   Recoded digit   Operation on X
000      0               0X
001     +1              +1X
010     +1              +1X
011     +2              +2X
100     -2              -2X
101     -1              -1X
110     -1              -1X
111      0               0X

Fig. 3 Booth partial product selector logic

B. [27,2] Compressor
The required [27,2] Wallace tree is implemented to add the 27 partial products [16]. A [27,2] compressor is made of three [9,2] blocks and one [6,2] block, as shown in Figure 4. A [9,2] compressor is made of three [3,2] blocks and one [6,2] block, as shown in Figure 5. A [6,2] compressor is made of two [3,2] blocks and one [4,2] block, as shown in Figure 6. A [4,2] compressor is made of two full adders, as shown in Figure 7.

Since 27 is a multiple of 9, the [9,2] building block gives a very simple global routing structure for this multiplier, but it would not be appropriate for a 16 or 32 bit multiplier [13].
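The recoding of Table I and the reduction of the resulting partial products to two rows can be illustrated with a short software model. The sketch below is a hedged illustration under stated assumptions, not the circuit of Figures 3 to 7: the function names are hypothetical, the multiplier is treated as an unsigned value, and the reduction uses a generic tree of [3,2] compressors (full adders applied bitwise) rather than the hierarchical [9,2], [6,2] and [4,2] arrangement described above.

```python
# Table I: block (b[2i+1], b[2i], b[2i-1]) -> recoded digit
BOOTH_TABLE = {
    0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_digits(y, n_bits):
    """Recode an unsigned n_bits multiplier into radix 4 signed digits."""
    n_digits = n_bits // 2 + 1      # 27 digits for a 53 bit multiplier
    y2 = y << 1                     # zero assumed to the right of the LSB
    # Overlapping 3 bit blocks, stepping two bit positions at a time.
    return [BOOTH_TABLE[(y2 >> (2 * i)) & 0b111] for i in range(n_digits)]


def compress_3_2(a, b, c, mask):
    """[3,2] compressor: three rows in, sum row and carry row out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & mask, carry & mask


def wallace_reduce(rows, mask):
    """Reduce the rows to two using levels of [3,2] compressors."""
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2], mask))
        # Rows that do not fill a group of three pass to the next level.
        nxt.extend(rows[len(rows) - len(rows) % 3:])
        rows = nxt
    return rows


def booth_wallace_multiply(x, y, n_bits=53):
    """Unsigned n_bits x n_bits product via Booth recoding and a 3:2 tree."""
    mask = (1 << (2 * n_bits)) - 1  # product width, two's complement wrap
    rows = [((d * x) << (2 * i)) & mask
            for i, d in enumerate(booth_radix4_digits(y, n_bits))]
    s, c = wallace_reduce(rows, mask)
    return (s + c) & mask           # final carry-propagate addition
```

Negative partial products (digits -1 and -2) are stored in two's complement within the product width, so the final addition of the sum and carry rows, taken modulo the product width, recovers the unsigned product. For a 53 bit multiplier the recoding yields 27 digits, which matches the number of partial products stated above.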
Fig. 4 [27,2] Compressor
Fig. 5 [9,2] Compressor
Fig. 6 [6,2] Compressor
Fig. 7 [4,2] Compressor

TABLE II. POWER ANALYSIS
xc3s1400an-5-fgg676                  I(mA)   P(mW)
Total estimated power consumption    0       97
Quiescent Vccint 1.20V               31      37
Quiescent Vccaux 3.30V               18      60
Quiescent Vcco25 2.50V               0       1

v3200efg1156-8                       I(mA)   P(mW)
Total estimated power consumption    0       367
Quiescent Vccint 1.80V               200     360
Quiescent Vcco33 3.30V               2       7

Fig. 8 Area Analysis (bar chart: total memory usage in kilobytes for v3200efg1156-8 and xc3s1400an-5-fgg676)

IV. COMPARATIVE ANALYSIS
Tables II and III give a comparative analysis of interval arithmetic double precision radix 4 Wallace tree multiplication between the Virtex-E family and the Spartan-3A and Spartan-3AN family, using devices XCV3200E and XC3S1400AN in packages FG1156 and FGG676, respectively.
TABLE III. COMPARATIVE ANALYSIS
Device comparison for 64 bit interval arithmetic based radix 4 Wallace tree multiplier

Parameter                                  v3200efg1156-8      xc3s1400an-5-fgg676

AREA ANALYSIS
Number of Slices                           333                 317
Number of Slice Flip Flops                 143                 117
Number of 4 input LUTs                     648                 608
Number of bonded IOBs                      498                 608
Number of IOs                              498                 498
IOB Flip Flops                             214                 214
Number of GCLKs                            2                   2
Total memory usage                         237984 kilobytes    183188 kilobytes

SPEED ANALYSIS
Minimum period                             9.069 ns            8.295 ns
Maximum frequency                          110.266 MHz         120.548 MHz
Minimum input arrival time before clock    14.289 ns           11.534 ns
Maximum output required time after clock   11.919 ns           8.200 ns
Maximum combinational path delay           19.712 ns           13.469 ns

TIMING CONSTRAINTS
Worst case slack (hold)                    1.632 ns            1.324 ns
Best case achievable (set up)              9.948 ns            9.265 ns
Total REAL time to Xst completion          16.00 secs          21.00 secs

THERMAL SUMMARY
Estimated junction temperature             30 C                27 C
Ambient temperature                        25 C                25 C
Case temperature                           29 C                26 C
Theta J-A range                            13 C/W              18 C/W

CLOCK REPORT
Fanout le                                  214                 214
Fanout clk                                 126                 112
Net skew le (ns)                           0.263               0.140
Net skew clk (ns)                          0.139               0.176
Max delay le (ns)                          1.435               1.06
Max delay clk (ns)                         1.167               1.063

CLOCK SIGNAL (LOAD)
clk                                        143                 117
le                                         214                 214
Fig. 9 Timing Constraints Analysis (bar chart: worst case slack (hold) and best case achievable (set up), in ns, for v3200efg1156-8 and xc3s1400an-5-fgg676)
Fig. 10 Power Analysis
Fig. 11 Speed Analysis (bar chart: minimum period, minimum input arrival time before clock, maximum output required time after clock, and maximum combinational path delay, in ns, for v3200efg1156-8 and xc3s1400an-5-fgg676)

V. CONCLUSION
Interval arithmetic provides reliability and accuracy by computing a lower and an upper bound between which the result is guaranteed to reside. A carry look-ahead scheme is used for the 11 bit exponent adder, which reduces its delay. The radix 4 Wallace tree interval arithmetic multiplier implemented on Virtex-E uses a larger number of gates and has a longer delay than the implementation on Spartan-3A and Spartan-3AN. The Virtex-E implementation requires 237984 kilobytes of memory and 367 mW of power, with a maximum combinational path delay of 19.712 ns, whereas the Spartan-3A and Spartan-3AN implementation requires only 183188 kilobytes of memory and 97 mW of power, with a maximum combinational path delay of 13.469 ns.

REFERENCES
[1] Josh Milthorpe and Alistair Rendell, "Learning to live with errors: A fresh look at floating-point computation," Australian National University, Computing Conference, 2005.
[2] Ruchir Gupte, "Interval arithmetic logic unit for DSP and control applications," Electrical and Computer Engineering, Raleigh, 2006.
[3] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, ISBN 81-297-0092-1, © 2003 Sun Microsystems.
[4] IEEE Standard 754 for Binary Floating Point Arithmetic, ANSI/IEEE Standard No. 754, American National Standards Institute, Washington DC, 1985.
[5] Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 2nd Edn, Oxford, 2011.
[6] C. N. Marimuthu and P. Thangaraj, "Low Power High Performance Multiplier," Anna University, Tamil Nadu, India, ICGST-PDCS, Volume 8, Issue 1, December 2008.
[7] Michael J. Schulte and Earl E. Swartzlander Jr., "A Performance Comparison Study on Multiplier Designs," IEEE Transactions on Computers, May 2000.
[8] Yong Dou, S. Vassiliadis, G. K. Kuzmanov, and G. N. Gaydadjiev, "64-bit Floating-Point FPGA Matrix Multiplication," National Laboratory for Computer Engineering, FPGA 05, Monterey, California, USA, February 2005.
[9] Anane Nadjia, Anane Mohamed, Bessalah Hamid, Issad Mohamed, and Messaoudi Khadidja, "Hardware Algorithm for Variable Precision Multiplication on FPGA," IEEE, 2009.
[10] James E. Stine and Michael J. Schulte, "A Combined Interval and Floating Point Multiplier," Computer Architecture and Arithmetic Laboratory, Electrical Engineering and Computer Science Department, Lehigh University, Bethlehem, PA 18015.
[11] SPARC Architecture Manual.
[12] Prof. Loh, CS3220 Processor Design, "Carry-Save Addition," Spring, February 2005.
[13] "Carry Save Adder Trees in Multipliers," ECEN 6263 Advanced VLSI Design, November 3, 1999.
[14] C. N. Marimuthu and P. Thangaraj, "Low Power High Performance Multiplier," Anna University, Tamil Nadu, India.
[15] Steve Kilts, Advanced FPGA Design: Architecture, Implementation, and Optimization, Wiley Interscience, John Wiley & Sons, ISBN 978-0-470-05437-6, © 2007 IEEE.
[16] P. Assady, "A New Multiplication Algorithm Using High-Speed Counters," Islamic Azad University, Varameen Branch, Iran, EuroJournals Publishing, Inc., 2009.