ISSN Vol.03,Issue.05, July-2015, Pages:

Size: px
Start display at page:

Download "ISSN Vol.03,Issue.05, July-2015, Pages:"

Transcription

1 ISSN Vol.03,Issue.05, July-2015, Pages: Evaluation of Fast Radix-10 Multiplication using Redundant BCD Codes for Speedup and Simplify Applications B. NAGARJUN SINGH 1, T. VENKATARAMARAO 2 1 Assoc Prof & HOD, Dept of ECE, Sarada Institute Technology and Science, India, nagarjun.singh24@gmail.com. 2 PG Scholar, Dept of ECE, Sarada Institute Technology and Science, TS, India, venkataramarao321@gmail.com. Abstract: We present the algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speed up its computation: the redundant BCD excess-3 code (XS-3), and the overloaded BCD representation (ODDS). In addition, new techniques are developed to reduce significantly the latency and area of previous representative high performance implementations. Partial products are generated in parallel using a signed-digit radix-10encoding of the BCD multiplier with the digit set [-5, 5], and a set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3. Hardware implementations normally use BCD instead of binary to manipulate decimal fixed-point operands and integer significant of DFP numbers for easy conversion between machine and user representations. However, BCD is less efficient for encoding integers than binary, since codes 10 to 15 are unused. Moreover, the implementation of BCD arithmetic has more complications than binary, which lead to area and delay penalties in the resulting arithmetic units. The proposed decimal multiplier uses internally a redundant BCD arithmetic to speed up and simplify the implementation. In this paper we have presented the algorithm and architecture of a new BCD parallel multiplier. The area and delay figures estimated from both a theoretical model and synthesis show that our BCD multiplier presents 20-35% less area than the existing designs. Keywords: Parallel Multiplication, Decimal Hardware, Overloaded BCD Representation, Redundant Excess-3 Code, Redundant Arithmetic. I. INTRODUCTION Decimal fixed-point and floating-point formats are important in financial, commercial, and user-oriented computing. Since area and power dissipation are critical design factors in state-of-the-art DFPUs, multiplication and division are performed iteratively by means of digit-by digit algorithms and therefore they present low performance. Moreover, the aggressive cycle time of these processors puts an additional constraint on the use of parallel techniques for reducing the latency of DFP multiplication in high performance DFPUs. The improvement of parallel decimal multiplication by exploiting the redundancy of two decimal representations: the ODDS and the redundant BCD excess-3 (XS-3) representation, a self complementing code with the digit set. The general redundant BCD arithmetic is used to (that includes the ODDS, XS-3 and BCD representations to accelerate parallel BCD multiplication in two ways such as Partial Product Generation and Partial Product Reduction. The design of area efficient high speed with reasonable power consumption the decimal multiplication is one of the most important decimal arithmetic operations which have a growing demand in the area of commercial, financial, and scientific computing. The reasons behind the use of binary data for doing the arithmetic operations in almost all the computer systems is the speed and simplicity of binary arithmetic, efficiency in storing the binary data. But, for the Digital Signal processing and commercial applications, the use of decimal arithmetic is still relevant. But the speed of the operation major concern for the decimal software. Moreover, the commercial databases contain more decimal data than binary data. For the purpose of processing, these decimal data are converted into binary data. And, once the processing is completed, those are again converted back into the decimal format. Cause some delay binary Coded Decimal (BCD) on-going research which is carried out in almost everywhere using BCD with reduced area and delay. The computer arithmetic literature includes articles on decimal arithmetic hardware such as BCD adders and multi operand BCD adders, sequential BCD multipliers and dividers. However, some single arithmetic operations (e.g., division or square root), and also function evaluation circuits (e.g., radix-10 exponentiation or logarithm), are often implemented by a sequence of simpler operations including several multiplications. Therefore, hardware implementation of such operations and functions would call for high-speed parallel decimal multiplication. Such multiplication schemes are generally implemented as a sequence of three steps: partial product generation (PPG), partial product reduction (PPR), and the final carry- propagating addition. Parallel binary multipliers are used extensively in most of the binary floating point units for high performance. However, decimal multiplication is more difficult to implement due to the complexity in the generation of multiplicand multiples and the inefficiency of representing decimal values in system 2015 IJIT. All rights reserved.

2 based on binary signals. These issues complicate the generation and reduction of partial products. The first implementation of a parallel decimal multiplier is described. Several different parallel decimal multiplier architectures are proposed, which use new techniques for partial product generation and reduction. Furthermore, some of these architectures were extended to support binary multiplication. Some concepts were applied to design decimal 4:2 compressor trees. All of the previous designs are combinational fixed point architectures. A pipelined IEEE compliant DFP multiplier based on architecture was presented. The work is the major extension of the previous paper, which presented a new family of high performance parallel decimal multipliers. In this paper, we deal with fully combinational decimal fixed-point architecture. We describe in some detail the methods for partial product generation and reductions proposed and introduce new techniques to reduce the latency and the hardware complexity of the previous designs. The main objectives to improve the performance of BCD multiplication other objectives are given below. To avoid long carry-propagations in the generation of decimal positive multiplicand multiples To obtain the negative multiples from the corresponding positive ones easily To simplify conversion of the partial products generated in XS-3 to the ODDS representation for efficient partial product reduction. B. NAGARJUN SINGH, T. VENKATARAMARAO The partial products array PP (i) depicted in Fig.1 are reduced by CSA techniques; each column of M+1 operands is condensed into two ones. The strategy adopted to tackle and simplify the design is to divide the partial products into subsets, as is observed in Fig.3. In order to carry out the reduction, this paper proposes decimal Q: 2CS as which reduce Q digits to two ones. These reductions are implemented in different versions, for Q=3, 4,5,8,9 decimal operands. This will be further detailed in Section 3. It must be pointed out that the processing is based on the (4221) coding. Several decimal additions are applied over two reduced digits. The fast carry-chain adder described is a suitable alternative due to its high performance in terms of area-delay. II. AN OVERVIEW OFBCDMULTIPLIER A generic N-digit by M-digit multiplication P=AxB is indicated in Fig.1, where A and B are the multiplicand and multiplier respectively. The operation is made of the following modules: generation of partial products, reduction of partial products tree, and fast decimal adders. Decimal multiplication coded in BCD-8421(8421) presents a more complex implementation than binary multiplication due to the presence of invalid (8421) digits between {A,B F}. These need to be corrected generating an extra cost in computation as well as additional multiplicand multiples that must be implemented by the multiplier. The generation of decimal partial products is based on the techniques described. First of all, decimal digits of A are represented in BCD-4221(4221) to prevent the corrections previously mentioned. This reduces the computation complexity of multiplicand multiples coded into (4221). ABCD-5211(5211) format is introduced in this Section. Both of these formats are necessary to determine these to multiples. For example, the multiple 2 A is computed as follows: each (8421) digit is first recoded to (5421) decimal as it is shown in Fig.2a; straight away a1-bit left shift is carried out, obtaining the 2A multiple in (8421). Then, the 2A multiple is easily recoded from (8421) to (4221). Fig.1.Decimal SdRadix-10 pipe lined multiplier diagram. The rest of multiples are Computed according to Fig.2a. The multiplier B is recoded into signed digit (SD) radix-10. The (4221) or (5211) coding is convenient because speed-up the CSA reduction, whose 9 s complement is generated by simple bit inversion. In order to simplify the computational process, in this paper a reduced non-redundant decimal code (4221) is adopted. The reduced coding is exhibited in Fig.2b. (a) Multiplicand Multiples Generation

3 Evaluation of Fast Radix-10 Multiplication Using Redundant BCD Codes For Speedup and Simplify Applications Each block stage carries out M+3+ (M/#blocks) CSA reductions. At the same time, the reductions in the current pipelined stage are processed. The 4221 Adder Recoding unit transforms the two 8-bit out puts 2H (i), S(i) of the previous unit into two 4-bit digits D1(i), D0(i) coded in (4221). Index (i) denotes the ith CSA reduction. The early block is based on a (4221) addition and a circuit to detect special cases. Straight away, the (4221) outputs are recoded into (8421). Each mentioned stage pipeline executes a decimal addition which is based on carry-chain techniques. Finally, the result from the last pipe lined stage shows the multiplication N+Mdigit AxB. (b) ReducedBCD-4221 coding Fig.2.Multiples for SD radix-10 encoding. III. NXMBCDMULTIPLIERIMPLMENTATION A general overview of NxM-digit BCD multiplier architecture is described below and depicted in Fig.1. Dashed blocks indicate the main modules of the design, and the dotted line indicates the pipe lined stages. AN-digit multiplicand A and a M-digit multiplier B as unsigned decimal are assumed. The (8421) multiplication is processed as follows: the multiplicand multiples generation unit takes the operand A and computes a set of N+4-digit multiplicand multiples {1A, 2A, 3A, 4A, 5A} coded into (4221). In parallel, the sd-radix- 10 unit recodes each digit (8421) of B between {-5,-4,-3, +3,+4,+5} that is represented with a1-bit sign and 5-bit magnitude format. The recoded B is multiplied digit-by-digit by the previously computed multiplicand multiples. It is important to emphasize that the (8421) coding introduces a computational cost due to the corrections in the decimal reduction CSA. An alternative to prevent this drawback is to use (4221) coding. Immediately after, the partial product generation unit takes the outputs from the two modules previously mentioned and generates M+1 partial product which depends on each re-coded digit of B, represented in signed-digitradix-10 format. Each partial product coded in (4221) is at most of N+3- digit length. All of partial products are taken as inputs in the decimal Q: 2CS A reduction tree module, this simplifies Q digit into two decimal digits coded in(4221). The basic scheme with regard to decimal 3:2CSA reduction fulfills the relation A(i)+B(i)+C(i)=2H(i)+S(i), i {0,1,2, M-1} (Fig.3a). The 4-bit inputs A(i), B(i), and C(i) are simplified to 4-bit S and H by means of 4 full-adder cells. It can be noted that the 2H (i) is produced by the block2x. Therefore, the considered outputs will require at least 5-bit operands. The proposed decimal Q:2 reduction presents two outputs:2h and S of 8 bits each one. This architecture can be extended to different versions of decimal Q:2 compressors. To validate and carry out the evaluations, several decimal Q: 2CS as between 3:2 and 9:2 compressors were tested. The proposed decimal Q: 2CSA is base on several basic binary CSAs (Fig.3b, 3c, 3d, 3e) and these equally are made up of full or half-adders as basic cells. The partial products are split in # blocks which provide different pipeline stages (Fig.4). In, a performance comparison between 8x8 and 16x16 multiplications implemented in several pipelined levels is presented. Fig.3. Proposed Decimal 3:2CSA and several Binary Adder schemes. A. Signed-DigitRadix-10 recoding/ Multiplicand Multiples Generation The block diagram is depicted in the Fig.2a; the recoding is applied to each digit of B. This transforms a (8421) set {0, 1, 2..,9} into the signed-digit set{-5,-4 +4,+5} made up of a

4 sign signal SX (i) and a5-bit magnitude X(i). Each digit X (i) selects the corresponding multiplicand multiples coded in (4221). The sign SX verifies if a negative multiple is produced. It is important to highlight that a negative version of a partial product is obtained simply by inverting the bits of the positive version using arrays of XOR gates manipulated by SX (i). The above circuit requires 2 slices and6luts. The following section discusses Multiplicand Multiples Generation (Fig.2a), this set of multiples are coded into (4221). Multiples 2A and 5A are generated with recoding (8421 to 4221, 5211 to 4221) and carrying out a left shifting process. Multiple 4A is computed as 2x2 A and 3A comes from a decimal addition between A and 2A. Each partial product has N+3 digits. B. NAGARJUN SINGH, T. VENKATARAMARAO As an example the decimal 8:2CSA will be described: This implementation requires 4binary modules 8:4CSA, 2 binary 3:2CSA and seven 2x blocks. Each binary 8:4CSA is made up of one half and 4 full adders and has an area of 7LUT 4s and 3LUT6s. Table I exhibits the area contribution of several decimal CSA implementations. TABLE I: Area of Several Proposed Decimal CSAS B. Reduction Partial Product After generating the M+1 partial products, coded into (4221), the reduction of partial products is developed by means of decimal Q:2 carry save additions (CSA). First, the partial products are aligned and divided into blocks as it is shown in 3:2 Carry save adder Fig.4. A circuit based on decimal 3:2, 4:2, and 8:2CSA compressors is proposed, which entail to utilize block sizes of (M+3+M/#blocks) xq decimal operands. A decimal 3:2CSA compressor (Fig.3a) adds three numbers A, B, C each one of size 4-bit (coded in 4221) to get a decimal S and a carry H, such that A+B+C=S+2H as it was explained in Section 3. The 2X module observed in Fig.3a is composed of a (4221 to 5211) coding operation and a one bit left shift operation. The decimal 3:2CSA is implemented by 4 binaries 3:2CSA, each one utilizing 2LUT3s, and a 2X block. The hardware cost of different decimal Q: 2 versions are shown in the Table I. The above basic binary 3:2 (full adder) circuit is utilized as basic cell in different CSA compressor schemes. Fig.4.Example of partial products reduction architecture. C. BCD-4221 Adder Recoding After processing the earlier stage, the (4221) Adder Recoding unit in Fig.5 takes as inputs the outputs of each Q:2 reduction and codes the min to two decimal digits (4221) of 4-bit size. This implementation is based on a (4221) addition and a circuit to detect special cases. A sit is specified in the Section III, a (4221) decimal addition does not require correction as usually occurs in the (8421) decimal adder. As soon as the circuit receives the two outputs of 8 bits of the early implementation, a2-digit decimal addition in (4221) format is carried out. The function is represented as: 2H(i)(7..0)+S(i)(7..0)=D1(i)(3..0)&D0(i)(3..0), where (i) represents the i-th Q:2 reduction process. In Fig.5 the implemented architecture can be observed. One hand, the Q: 2 compressors are limited to a reduced group of (4221) code. Thus lead store code the outputs of the compressors to guarantee that 2H and S are belonging to the above code. On the other hand, there are certain decimal digits which match with the previously reduced codes that do not develop a correctly addition in (4221) format, therefore it is necessary to implement a detection circuit to overcome these cases (exhibited in Fig.5 as correction circuit). The (4221) adder is a completely combinational design and utilizes an area of: 32LUTs, and 10 Slices. D. Decimal Addition Design After finishing the previous stage and as soon as its outputs are computed, a (8421) decimal addition is carried out. First of all, an aligned process on the vectors D0 and D1 is developed, just before both of the mare recoded in to (8421). As it is pointed out at the beginning of this section, various levels of pipeline=log2 (M) +1 are implemented. Several decimal Additions are processed at each pipelined stage (Fig.4), carrying out σ additions of (M+3+M/#blocks) digits. The proposed addition is based on 10 s complement BCD numbers, and the carry-chains techniques to carry out the full design. The carry-chain addition consists in computing beforehand all propagating carry and carry generating conditions, in order to reduce the overall execution time. The implemented adder utilizes an area of 10x (M+3+M/#blocks) LUTs. It has been considered the highperformance adder proposed.

5 Evaluation of Fast Radix-10 Multiplication Using Redundant BCD Codes For Speedup and Simplify Applications that is, the delay of a 1 inverter with a fan out of four inverters. The hardware complexity is given as the number of equivalent minimum size NAND2 gates. We do not expect this rough model to give absolute area-delay figures, due to the high wiring complexity of parallel multipliers. However, based on our experience this model is good enough for making design decisions at gate level and it provides reasonable accuracy of area and delay ratios to compare different designs. The delay, input capacitance (L in ) and area of the main building blocks used in the BCD multipliers. The input capacitance is normalized to the input capacitance of the 1 inverter. The L out parameter represents the normalized output load connected to the gate. The XOR2 gate is implemented with CMOS transmission gates. To evaluate our architectures, gates with the drive strength of the minimum sized (1 ) inverter have been assumed, and buffers have been inserted to deal with high loads. The critical path delay in every stage of the multiplier has been estimated as the sum of Fig.5. BCD-4221 Adder recoding architecture. the delays of the gates on this critical path. The area and delay figures obtained for the digit and digit architectures are shown in Table 2. IV. EVALUATION AND COMPARISON The proposed combinational architectures for BCD digit and digit multipliers are evaluated and compared with other representative BCD multipliers. The area and delay figures of our architectures were obtained from an areadelay evaluation model for static CMOS gates, and validated with the synthesis of verified RTL models coded in VHDL. This evaluation is detailed in Section 4.1. TABLE II: Area and Delay (LE-Based Model) for the Proposed Mults Synthesis Results: The designs have been synthesized using Synopsys Design Compiler B SP3 and a 90 nm CMOS standard cell library [12]. The FO4 delay for this library is 49 ps under typical conditions (1 V, 25 C). The area-delay curves of Fig. 6 have been obtained with the constraint C out = C in =4C inv, where C inv is the input capacitance of a 1 inverter of the library. Finally, the most representative sequential and parallel decimal multipliers have been compared with our architecture. The results of the comparison are summarized in SectionIV. A. Evaluation As stated above, the evaluation has been performed in two steps. First, a technological independent evaluation using a model for static CMOS circuits based on Logical Effort (LE) has been carried out, and then the results obtained with this model have been validated with the synthesis and functional verification of the RTL model. Technological Independent Evaluation: Our technological independent evaluation model allows us to obtain a rough estimation of the area and delay figures for the architecture being evaluated. It takes into account the different input and output gate loads, but neither interconnections nor gate sizing optimizations are modeled. The delay is given in FO4 units, Fig.6. Area-delay space obtained from synthesis. We also include in Fig. 6 the area-delay points estimated from the LE-based model evaluation. We have kept the hierarchy of the design in the synthesis process as described in Sections 3 to 6 (no top level architecture optimization options). Nevertheless, some specific structures have been optimized internally to reduce the overall delay. To ensure the correctness of the designs we have simulated the RTL models of the digit and digit multipliers using the Synopsys VCS-MX tool and a comprehensive set of random test vectors. B. Comparison Table 3 shows the area and delay estimations obtained from synthesis for some representative BCD sequential and combinational multipliers. As far as we know, the most

6 representative high-performance BCD multipliers are digits combinational and sequential [10], [11] implementations. The area and delay figures shown in Table 3 correspond to the minimum delay point of each implementation, and were obtained from the synthesis results provided in the respective reference, except for the two multipliers of reference, which correspond to an estimation obtained by their authors using a LE-based model. The comparison ratios are given with respect to the area and delay figures of a 53-bit binary Booth radix-4 multiplier extracted. The PPG of multipliers [8] is based on a SD radix-5 scheme that generates 32 BCD partial products for a 16-digit multiplier. Though it only requires simple constant time delay BCD multiplicand multiples, the 9 s complement operation for obtaining the negative multiples is more complex than a simple bit inversion. The partial product reduction implemented is a BCD carry-save adder tree build of BCD digit adders. On the other hand, the BCD partial products are reduced in [8] by using counters that compute the binary sum of each column of digits sum, and subsequent binary to decimal conversions [8]. B. NAGARJUN SINGH, T. VENKATARAMARAO The BCD multipliers in use either the SD radix-5 PPG scheme or a SD radix-10 PPG scheme The last one has the advantage that practically it halves the number of partial products generated by the former (17 against 32 for digit multiplications). However, it has the disadvantage of a significant latency overhead due to the generation of the complex multiple 3X. The latency and area of prior-art multipliers are improved by representing the partial products in (4221) or (5211) decimal codes, which allow them to implement PPR using a very regular and compact tree of 4-bit binary carry-save adders (built of 3:2 or 4:2 compressors) and decimal digit doublers. The most recent implementation is presented. It also uses a SD radix-10 PPG scheme to reduce the number of partial products generated to 17, and subsequently, the area of the PPR tree. To avoid the latency overhead of the 3 multiple generation, the partial products are coded in are dundant SD representation. The BCD multiplier pre-computes all the positive decimal multiplicand multiples {0X;... ; 9X}. The delay of PPG is reduced by representing the complex operands (3X; 6X; 7X; 8X; 9X) as the sum of two simpler multiples. The number of partial products generated is therefore equivalent to that of the SD radix-5 scheme. The PPR tree is implemented with BCD digit adders. This has the disadvantage of a large area compared to the other BCD multipliers analyzed. The two digit BCD multipliers implement an easy-multiple PPG [10] (only pre-computes {2X; 4X; 5X}) that produces 32 BCD partial products. The intermediate decimal partial product sums are computed in overloaded BCD to speed up the PPR evaluation. The delay-improved design uses a tree structure built of five levels overloaded BCD digit adders, while the area-improved design replaces two levels of these custom designed adders by three levels of 4:2 compressors and a binary counter. This reduces the area consumption but at the cost of introducing a significant latency penalty. TABLE III: Synthesis Results for Fixed-Point Multipliers Fig.7. Area-delay space for the fastest digit mults. Sequential digit (Decimal64) BCD multipliers are about two times smaller than equivalent parallel implementations, but have higher latency and reduced throughput (one mult issued every 17 cycles). For example, the proposed multiplier is about seven times faster than the best sequential implementation proposed in [10], but requires 2.5 times more area. To compare the high hardware cost of a combinational Decimal128 implementation, we also include in Table 3 the area and delay figures obtained for our digit BCD multiplier. Due to the tight area and power consumption constraints of current DFUs [6], a sequential architecture seems a more realistic solution than a fully pipelined implementation for a commercial Decimal128 multiplier. Finally, we present a more detailed comparison of the fastest BCD digit combinational multipliers (SD Radix-5 and SD Radix-10, and the proposed one) in terms of latency and area. The corresponding are a delay synthesis values are shown in Fig. 7. We have directly introduced in the figure the area-delay curves of referenced multipliers as provided by their authors, since all of them were synthesized using 90 nm CMOS standard cell libraries and similar conditions. The area-delay points for the two multipliers of reference correspond to an estimation obtained by their authors using a LE-based model. From the area-delay space represented in Fig. 7, we observe that our proposed decimal multiplier has an area

7 Evaluation of Fast Radix-10 Multiplication Using Redundant BCD Codes For Speedup and Simplify Applications improvement roughly in the range percent depending on the target delay. On the other hand, for the minimum delay point (44FO4), the proposed multiplier is still competitive with the fastest design. More recently, the authors of reference have presented in a comparison study between their delay improved multiplier and the multiplier of reference based on synthesis results using a TSMC 130 nm standard CMOS process under typical conditions (1.2 V, 25 C). They show that for the minimum delay point of each one of the two area-delay curves obtained, the delay-improved multiplier is 20 percent faster and has 10 percent less area compared to the design. Therefore, according to the curve corresponding to the design presented should be to the left of the area-delay points corresponding to the delay-improved design presented. V. CONCLUSION This paper has reassessed several implementations of N-x M-digit multiplications on Xilinx FPGAs. This work presents the design of several BCD multipliers and their implementations on Virtex-6FPGA. Previous techniques and designs proposed are analyzed to carry out performance comparison in terms of area-delay as it was seen in Section 4. The proposed pipe lined multiplication is based on a multiplier operand coded in to SDradix-10 recoding. A parallel generation of decimal partial products is reduced by carry save adder techniques and decimal adders. The proposed circuit presents a high-performance design of decimal multiplier. Our proposal shows encouraging results: one hand, its figures are comparable and outperformed than other multipliers. The other hand, a considerable number of cores can be fit in to large FPGA. Future work plans to include the proposed multiplier in operations based on decimal floating point. Also, a good alternative would be to code the multiplier operand in to SDradix-5 and SDradix-4 formats and to compare the performance results. Finally, a Xilinx Micro blaze Embedded processor can be utilized to execute multiplications on SW and HW, in order to estimate the SW-HW performance-cost. VI. REFERENCES [1] Alvaro V_azquez, Member, IEEE, Elisardo Antelo, and Javier D. Bruguera, Member, IEEE, Fast Radix-10 Multiplication Using Redundant BCD Codes, IEEE Transactions on Computers, Vol. 63, No. 8, August [2] A. Aswal, M. G. Perumal, and G. N. S. Prasanna, On basic financial decimal operations on binary machines, IEEE Trans. Comput., vol. 61, no. 8, pp , Aug [3] M. F. Cowlishaw, E. M. Schwarz, R. M. Smith, and C. F. Webb, A decimal floating-point specification, in Proc. 15th IEEE Symp. Comput. Arithmetic, Jun. 2001, pp [4] M. F. Cowlishaw, Decimal floating-point: Algorism for computers, in Proc. 16th IEEE Symp. Comput. Arithmetic, Jul. 2003, pp [5] S. Carlough and E. Schwarz, Power6 decimal divide, in Proc. 18 th IEEE Symp. Appl.-Specific Syst., Arch., Process, Jul. 2007, pp [6] S. Carlough, S. Mueller, A. Collura, and M. Kroener, The IBM zenterprise-196 decimal floating point accelerator, in Proc. 20 th IEEE Symp. Comput. Arithmetic, Jul. 2011, pp [7] L. Dadda, Multi-operand parallel decimal adder: A mixed binary and BCD approach, IEEE Trans. Comput., vol. 56, no. 10, pp , Oct [8] L. Dadda and A. Nannarelli, A variant of a Radix-10 combinational multiplier, in Proc. IEEE Int. Symp. Circuits Syst., May 2008, pp [9] L. Eisen, J. W. Ward, H.-W. Tast, N. Mading, J. Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M. Schwarz, and S. R. Carlough, IBM POWER6 accelerators: VMX and DFU, IBM J. Res. Dev., vol. 51, no. 6, pp , Nov [10] M. A. Erle and M. J. Schulte, Decimal multiplication via carry save addition, in Proc. IEEE Int. Conf Appl.- Specific Syst., Arch., Process., Jun. 2003, pp [11] M. A. Erle, E. M. Schwarz, and M. J. Schulte, Decimal multiplication with efficient partial product generation, in Proc. 17th IEEE Symp. Comput. Arithmetic, Jun. 2005, pp [12] Faraday Tech. Corp. (2004). 90nm UMC L90 standard performance low-k library (RVT). [Online]. Available: Author s Profile: B. NAGARJUN SINGH, Assoc Prof & HOD, Dept of ECE, Sarada Institute Technology and Science, India, nagarjun.singh24@gmail.com. T. VENKATARAMARAO, PG Scholar, Dept of ECE, Sarada Institute Technology and Science, TS, India, venkataramarao321@gmail.com

High Throughput Radix-D Multiplication Using BCD

High Throughput Radix-D Multiplication Using BCD High Throughput Radix-D Multiplication Using BCD Y.Raj Kumar PG Scholar, VLSI&ES, Dept of ECE, Vidya Bharathi Institute of Technology, Janagaon, Warangal, Telangana. Dharavath Jagan, M.Tech Associate Professor,

More information

Efficient Radix-10 Multiplication Using BCD Codes

Efficient Radix-10 Multiplication Using BCD Codes Efficient Radix-10 Multiplication Using BCD Codes P.Ranjith Kumar Reddy M.Tech VLSI, Department of ECE, CMR Institute of Technology. P.Navitha Assistant Professor, Department of ECE, CMR Institute of Technology.

More information

Improved Design of High Performance Radix-10 Multiplication Using BCD Codes

Improved Design of High Performance Radix-10 Multiplication Using BCD Codes International OPEN ACCESS Journal ISSN: 2249-6645 Of Modern Engineering Research (IJMER) Improved Design of High Performance Radix-10 Multiplication Using BCD Codes 1 A. Anusha, 2 C.Ashok Kumar 1 M.Tech

More information

High Speed Multiplication Using BCD Codes For DSP Applications

High Speed Multiplication Using BCD Codes For DSP Applications High Speed Multiplication Using BCD Codes For DSP Applications Balasundaram 1, Dr. R. Vijayabhasker 2 PG Scholar, Dept. Electronics & Communication Engineering, Anna University Regional Centre, Coimbatore,

More information

Design of BCD Parrell Multiplier Using Redundant BCD Codes

Design of BCD Parrell Multiplier Using Redundant BCD Codes International Journal of Emerging Engineering Research and Technology Volume 3, Issue 9, September, 2015, PP 68-76 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Design of BCD Parrell Multiplier Using

More information

DESIGN AND IMPLEMENTATION OF FAST DECIMAL MULTIPLIER USING SMSD ENCODING TECHNIQUE

DESIGN AND IMPLEMENTATION OF FAST DECIMAL MULTIPLIER USING SMSD ENCODING TECHNIQUE RESEARCH ARTICLE OPEN ACCESS DESIGN AND IMPLEMENTATION OF FAST DECIMAL MULTIPLIER USING SMSD ENCODING TECHNIQUE S.Sirisha PG Scholar Department of Electronics and Communication Engineering AITS, Kadapa,

More information

A New Family of High Performance Parallel Decimal Multipliers

A New Family of High Performance Parallel Decimal Multipliers A New Family of High Performance Parallel Decimal Multipliers Alvaro Vázquez, Elisardo Antelo University of Santiago de Compostela Dept. of Electronic and Computer Science 15782 Santiago de Compostela,

More information

1902 IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 8, AUGUST Fast Radix-10 Multiplication Using Redundant BCD Codes

1902 IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 8, AUGUST Fast Radix-10 Multiplication Using Redundant BCD Codes 1902 IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 8, AUGUST 2014 Fast Radix-10 Multiplication Using Redundant BCD Codes Alvaro Vazquez, Member, IEEE, Elisardo Antelo, and Javier D. Bruguera, Member, IEEE

More information

I. Introduction. India; 2 Assistant Professor, Department of Electronics & Communication Engineering, SRIT, Jabalpur (M.P.

I. Introduction. India; 2 Assistant Professor, Department of Electronics & Communication Engineering, SRIT, Jabalpur (M.P. A Decimal / Binary Multi-operand Adder using a Fast Binary to Decimal Converter-A Review Ruchi Bhatt, Divyanshu Rao, Ravi Mohan 1 M. Tech Scholar, Department of Electronics & Communication Engineering,

More information

Improved Combined Binary/Decimal Fixed-Point Multipliers

Improved Combined Binary/Decimal Fixed-Point Multipliers Improved Combined Binary/Decimal Fixed-Point Multipliers Brian Hickmann and Michael Schulte University of Wisconsin - Madison Dept. of Electrical and Computer Engineering Madison, WI 53706 brian@hickmann.us

More information

II. MOTIVATION AND IMPLEMENTATION

II. MOTIVATION AND IMPLEMENTATION An Efficient Design of Modified Booth Recoder for Fused Add-Multiply operator Dhanalakshmi.G Applied Electronics PSN College of Engineering and Technology Tirunelveli dhanamgovind20@gmail.com Prof.V.Gopi

More information

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator A.Sindhu 1, K.PriyaMeenakshi 2 PG Student [VLSI], Dept. of ECE, Muthayammal Engineering College, Rasipuram, Tamil Nadu,

More information

International Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018

International Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018 RESEARCH ARTICLE DESIGN AND ANALYSIS OF RADIX-16 BOOTH PARTIAL PRODUCT GENERATOR FOR 64-BIT BINARY MULTIPLIERS K.Deepthi 1, Dr.T.Lalith Kumar 2 OPEN ACCESS 1 PG Scholar,Dept. Of ECE,Annamacharya Institute

More information

Low Power Floating-Point Multiplier Based On Vedic Mathematics

Low Power Floating-Point Multiplier Based On Vedic Mathematics Low Power Floating-Point Multiplier Based On Vedic Mathematics K.Prashant Gokul, M.E(VLSI Design), Sri Ramanujar Engineering College, Chennai Prof.S.Murugeswari., Supervisor,Prof.&Head,ECE.,SREC.,Chennai-600

More information

A Radix-10 SRT Divider Based on Alternative BCD Codings

A Radix-10 SRT Divider Based on Alternative BCD Codings A Radix-10 SRT Divider Based on Alternative BCD Codings Alvaro Vázquez, Elisardo Antelo University of Santiago de Compostela Dept. of Electronic and Computer Science 782 Santiago de Compostela, Spain alvaro@dec.usc.es,

More information

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

Sum to Modified Booth Recoding Techniques For Efficient Design of the Fused Add-Multiply Operator

Sum to Modified Booth Recoding Techniques For Efficient Design of the Fused Add-Multiply Operator Sum to Modified Booth Recoding Techniques For Efficient Design of the Fused Add-Multiply Operator D.S. Vanaja 1, S. Sandeep 2 1 M. Tech scholar in VLSI System Design, Department of ECE, Sri VenkatesaPerumal

More information

Analysis of Different Multiplication Algorithms & FPGA Implementation

Analysis of Different Multiplication Algorithms & FPGA Implementation IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 4, Issue 2, Ver. I (Mar-Apr. 2014), PP 29-35 e-issn: 2319 4200, p-issn No. : 2319 4197 Analysis of Different Multiplication Algorithms & FPGA

More information

Design and Implementation of IEEE-754 Decimal Floating Point Adder, Subtractor and Multiplier

Design and Implementation of IEEE-754 Decimal Floating Point Adder, Subtractor and Multiplier International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 8958, Volume-4 Issue 1, October 2014 Design and Implementation of IEEE-754 Decimal Floating Point Adder, Subtractor and Multiplier

More information

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3 Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number Chapter 3 Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number Chapter 3 3.1 Introduction The various sections

More information

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems. Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems. K. Ram Prakash 1, A.V.Sanju 2 1 Professor, 2 PG scholar, Department of Electronics

More information

A Novel Carry-look ahead approach to an Unified BCD and Binary Adder/Subtractor

A Novel Carry-look ahead approach to an Unified BCD and Binary Adder/Subtractor A Novel Carry-look ahead approach to an Unified BCD and Binary Adder/Subtractor Abstract Increasing prominence of commercial, financial and internet-based applications, which process decimal data, there

More information

An IEEE Decimal Parallel and Pipelined FPGA Floating Point Multiplier

An IEEE Decimal Parallel and Pipelined FPGA Floating Point Multiplier An IEEE 754 2008 Decimal Parallel and Pipelined FPGA Floating Point Multiplier Malte Baesler, Sven Ole Voigt, Thomas Teufel Institute for Reliable Computing Hamburg University of Technology September 1st,

More information

Reducing Computational Time using Radix-4 in 2 s Complement Rectangular Multipliers

Reducing Computational Time using Radix-4 in 2 s Complement Rectangular Multipliers Reducing Computational Time using Radix-4 in 2 s Complement Rectangular Multipliers Y. Latha Post Graduate Scholar, Indur institute of Engineering & Technology, Siddipet K.Padmavathi Associate. Professor,

More information

16-BIT DECIMAL CONVERTER FOR DECIMAL / BINARY MULTI-OPERAND ADDER

16-BIT DECIMAL CONVERTER FOR DECIMAL / BINARY MULTI-OPERAND ADDER 16-BIT DECIMAL CONVERTER FOR DECIMAL / BINARY MULTI-OPERAND ADDER #1 SATYA KUSUMA NAIDU, M.Tech Student, #2 D.JHANSI LAKSHMI, Assistant Professor, Dept of EEE, KAKINADA INSTITUTE OF TECHNOLOGICAL SCIENCES,

More information

32-bit Signed and Unsigned Advanced Modified Booth Multiplication using Radix-4 Encoding Algorithm

32-bit Signed and Unsigned Advanced Modified Booth Multiplication using Radix-4 Encoding Algorithm 2016 IJSRSET Volume 2 Issue 3 Print ISSN : 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology 32-bit Signed and Unsigned Advanced Modified Booth Multiplication using Radix-4 Encoding

More information

A Simple Method to Improve the throughput of A Multiplier

A Simple Method to Improve the throughput of A Multiplier International Journal of Electronics and Communication Engineering. ISSN 0974-2166 Volume 6, Number 1 (2013), pp. 9-16 International Research Publication House http://www.irphouse.com A Simple Method to

More information

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE 754-2008 Standard M. Shyamsi, M. I. Ibrahimy, S. M. A. Motakabber and M. R. Ahsan Dept. of Electrical and Computer Engineering

More information

High Performance and Area Efficient DSP Architecture using Dadda Multiplier

High Performance and Area Efficient DSP Architecture using Dadda Multiplier 2017 IJSRST Volume 3 Issue 6 Print ISSN: 2395-6011 Online ISSN: 2395-602X Themed Section: Science and Technology High Performance and Area Efficient DSP Architecture using Dadda Multiplier V.Kiran Kumar

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK DESIGN OF QUATERNARY ADDER FOR HIGH SPEED APPLICATIONS MS. PRITI S. KAPSE 1, DR.

More information

AN IMPROVED FUSED FLOATING-POINT THREE-TERM ADDER. Mohyiddin K, Nithin Jose, Mitha Raj, Muhamed Jasim TK, Bijith PS, Mohamed Waseem P

AN IMPROVED FUSED FLOATING-POINT THREE-TERM ADDER. Mohyiddin K, Nithin Jose, Mitha Raj, Muhamed Jasim TK, Bijith PS, Mohamed Waseem P AN IMPROVED FUSED FLOATING-POINT THREE-TERM ADDER Mohyiddin K, Nithin Jose, Mitha Raj, Muhamed Jasim TK, Bijith PS, Mohamed Waseem P ABSTRACT A fused floating-point three term adder performs two additions

More information

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER. OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER. A.Anusha 1 R.Basavaraju 2 anusha201093@gmail.com 1 basava430@gmail.com 2 1 PG Scholar, VLSI, Bharath Institute of Engineering

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 6b High-Speed Multiplication - II Israel Koren ECE666/Koren Part.6b.1 Accumulating the Partial

More information

Digital Computer Arithmetic

Digital Computer Arithmetic Digital Computer Arithmetic Part 6 High-Speed Multiplication Soo-Ik Chae Spring 2010 Koren Chap.6.1 Speeding Up Multiplication Multiplication involves 2 basic operations generation of partial products

More information

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 6b High-Speed Multiplication - II Spring 2017 Koren Part.6b.1 Accumulating the Partial Products After generating partial

More information

DESIGN OF QUATERNARY ADDER FOR HIGH SPEED APPLICATIONS

DESIGN OF QUATERNARY ADDER FOR HIGH SPEED APPLICATIONS DESIGN OF QUATERNARY ADDER FOR HIGH SPEED APPLICATIONS Ms. Priti S. Kapse 1, Dr. S. L. Haridas 2 1 Student, M. Tech. Department of Electronics, VLSI, GHRACET, Nagpur, (India) 2 H.O.D. of Electronics and

More information

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 3 DLD P VIDYA SAGAR

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 3 DLD P VIDYA SAGAR DLD UNIT III Combinational Circuits (CC), Analysis procedure, Design Procedure, Combinational circuit for different code converters and other problems, Binary Adder- Subtractor, Decimal Adder, Binary Multiplier,

More information

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017 VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier 1 Katakam Hemalatha,(M.Tech),Email Id: hema.spark2011@gmail.com 2 Kundurthi Ravi Kumar, M.Tech,Email Id: kundurthi.ravikumar@gmail.com

More information

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

An Efficient Carry Select Adder with Less Delay and Reduced Area Application An Efficient Carry Select Adder with Less Delay and Reduced Area Application Pandu Ranga Rao #1 Priyanka Halle #2 # Associate Professor Department of ECE Sreyas Institute of Engineering and Technology,

More information

Area Delay Power Efficient Carry-Select Adder

Area Delay Power Efficient Carry-Select Adder Area Delay Power Efficient Carry-Select Adder Pooja Vasant Tayade Electronics and Telecommunication, S.N.D COE and Research Centre, Maharashtra, India ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

High Performance Architecture for Reciprocal Function Evaluation on Virtex II FPGA

High Performance Architecture for Reciprocal Function Evaluation on Virtex II FPGA EurAsia-ICT 00, Shiraz-Iran, 9-31 Oct. High Performance Architecture for Reciprocal Function Evaluation on Virtex II FPGA M. Anane, H. Bessalah and N. Anane Centre de Développement des Technologies Avancées

More information

HIGH PERFORMANCE FUSED ADD MULTIPLY OPERATOR

HIGH PERFORMANCE FUSED ADD MULTIPLY OPERATOR HIGH PERFORMANCE FUSED ADD MULTIPLY OPERATOR R. Alwin [1] S. Anbu Vallal [2] I. Angel [3] B. Benhar Silvan [4] V. Jai Ganesh [5] 1 Assistant Professor, 2,3,4,5 Student Members Department of Electronics

More information

Design of a Floating-Point Fused Add-Subtract Unit Using Verilog

Design of a Floating-Point Fused Add-Subtract Unit Using Verilog International Journal of Electronics and Computer Science Engineering 1007 Available Online at www.ijecse.org ISSN- 2277-1956 Design of a Floating-Point Fused Add-Subtract Unit Using Verilog Mayank Sharma,

More information

HIGH SPEED SINGLE PRECISION FLOATING POINT UNIT IMPLEMENTATION USING VERILOG

HIGH SPEED SINGLE PRECISION FLOATING POINT UNIT IMPLEMENTATION USING VERILOG HIGH SPEED SINGLE PRECISION FLOATING POINT UNIT IMPLEMENTATION USING VERILOG 1 C.RAMI REDDY, 2 O.HOMA KESAV, 3 A.MAHESWARA REDDY 1 PG Scholar, Dept of ECE, AITS, Kadapa, AP-INDIA. 2 Asst Prof, Dept of

More information

Area Delay Power Efficient Carry-Select Adder

Area Delay Power Efficient Carry-Select Adder Area Delay Power Efficient Carry-Select Adder B.Radhika MTech Student VLSI & Embedded Design, Vijaya Engineering College Khammam, India. T.V.Suresh Kumar, M.Tech,(Ph.D) Guide VLSI & Embedded Design, Vijaya

More information

Week 7: Assignment Solutions

Week 7: Assignment Solutions Week 7: Assignment Solutions 1. In 6-bit 2 s complement representation, when we subtract the decimal number +6 from +3, the result (in binary) will be: a. 111101 b. 000011 c. 100011 d. 111110 Correct answer

More information

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers International Journal of Research in Computer Science ISSN 2249-8257 Volume 1 Issue 1 (2011) pp. 1-7 White Globe Publications www.ijorcs.org IEEE-754 compliant Algorithms for Fast Multiplication of Double

More information

High Speed Radix 8 CORDIC Processor

High Speed Radix 8 CORDIC Processor High Speed Radix 8 CORDIC Processor Smt. J.M.Rudagi 1, Dr. Smt. S.S ubbaraman 2 1 Associate Professor, K.L.E CET, Chikodi, karnataka, India. 2 Professor, W C E Sangli, Maharashtra. 1 js_itti@yahoo.co.in

More information

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER Bhuvaneswaran.M 1, Elamathi.K 2 Assistant Professor, Muthayammal Engineering college, Rasipuram, Tamil Nadu, India 1 Assistant Professor, Muthayammal

More information

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography 118 IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.1, January 2013 An Optimized Montgomery Modular Multiplication Algorithm for Cryptography G.Narmadha 1 Asst.Prof /ECE,

More information

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE Design and Implementation of Optimized Floating Point Matrix Multiplier Based on FPGA Maruti L. Doddamani IV Semester, M.Tech (Digital Electronics), Department

More information

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator M.Chitra Evangelin Christina Associate Professor Department of Electronics and Communication Engineering Francis Xavier

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Arithmetic Unit 10122011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Fixed Point Arithmetic Addition/Subtraction

More information

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications , Vol 7(4S), 34 39, April 204 ISSN (Print): 0974-6846 ISSN (Online) : 0974-5645 Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications B. Vignesh *, K. P. Sridhar

More information

A High-Performance Significand BCD Adder with IEEE Decimal Rounding

A High-Performance Significand BCD Adder with IEEE Decimal Rounding 2009 9th IEEE International Symposium on Computer Arithmetic A High-Performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding Alvaro Vázquez and Elisardo Antelo Department of Electronic and

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL Tharanidevi.B 1, Jayaprakash.R 2 Assistant Professor, Dept. of ECE, Bharathiyar Institute of Engineering for Woman, Salem, TamilNadu,

More information

An FPGA based Implementation of Floating-point Multiplier

An FPGA based Implementation of Floating-point Multiplier An FPGA based Implementation of Floating-point Multiplier L. Rajesh, Prashant.V. Joshi and Dr.S.S. Manvi Abstract In this paper we describe the parameterization, implementation and evaluation of floating-point

More information

Delay Optimised 16 Bit Twin Precision Baugh Wooley Multiplier

Delay Optimised 16 Bit Twin Precision Baugh Wooley Multiplier Delay Optimised 16 Bit Twin Precision Baugh Wooley Multiplier Vivek. V. Babu 1, S. Mary Vijaya Lense 2 1 II ME-VLSI DESIGN & The Rajaas Engineering College Vadakkangulam, Tirunelveli 2 Assistant Professor

More information

ISSN Vol.08,Issue.12, September-2016, Pages:

ISSN Vol.08,Issue.12, September-2016, Pages: ISSN 2348 2370 Vol.08,Issue.12, September-2016, Pages:2273-2277 www.ijatir.org G. DIVYA JYOTHI REDDY 1, V. ROOPA REDDY 2 1 PG Scholar, Dept of ECE, TKR Engineering College, Hyderabad, TS, India, E-mail:

More information

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm 1 A.Malashri, 2 C.Paramasivam 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,

More information

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol. 2, Issue 1, Feb 2015, 01-07 IIST HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC

More information

ISSN Vol.05,Issue.09, September-2017, Pages:

ISSN Vol.05,Issue.09, September-2017, Pages: WWW.IJITECH.ORG ISSN 2321-8665 Vol.05,Issue.09, September-2017, Pages:1693-1697 AJJAM PUSHPA 1, C. H. RAMA MOHAN 2 1 PG Scholar, Dept of ECE(DECS), Shirdi Sai Institute of Science and Technology, Anantapuramu,

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 6c High-Speed Multiplication - III Israel Koren Fall 2010 ECE666/Koren Part.6c.1 Array Multipliers

More information

An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder

An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder 1.M.Megha,M.Tech (VLSI&ES),2. Nataraj, M.Tech (VLSI&ES), Assistant Professor, 1,2. ECE Department,ST.MARY S College of Engineering

More information

IMPLEMENTATION OF TWIN PRECISION TECHNIQUE FOR MULTIPLICATION

IMPLEMENTATION OF TWIN PRECISION TECHNIQUE FOR MULTIPLICATION IMPLEMENTATION OF TWIN PRECISION TECHNIQUE FOR MULTIPLICATION SUNITH KUMAR BANDI #1, M.VINODH KUMAR *2 # ECE department, M.V.G.R College of Engineering, Vizianagaram, Andhra Pradesh, INDIA. 1 sunithjc@gmail.com

More information

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE. 16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE. AditiPandey* Electronics & Communication,University Institute of Technology,

More information

FPGA Based Acceleration of Decimal Operations

FPGA Based Acceleration of Decimal Operations FPGA Based Acceleration of Decimal Operations Alberto Nannarelli Dept. Informatics and Mathematical Modelling Technical University of Denmark Kongens Lyngby, Denmark Email: an@imm.dtu.dk Abstract Field

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016 NEW VLSI ARCHITECTURE FOR EXPLOITING CARRY- SAVE ARITHMETIC USING VERILOG HDL B.Anusha 1 Ch.Ramesh 2 shivajeehul@gmail.com 1 chintala12271@rediffmail.com 2 1 PG Scholar, Dept of ECE, Ganapathy Engineering

More information

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An Efficient Implementation of Double Precision Floating Point Multiplier Using Booth Algorithm Pallavi Ramteke 1, Dr. N. N. Mhala 2, Prof. P. R. Lakhe M.Tech [IV Sem], Dept. of Comm. Engg., S.D.C.E, [Selukate],

More information

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES Volume 120 No. 6 2018, 4453-4466 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR

More information

Floating-Point Butterfly Architecture Based on Binary Signed-Digit Representation

Floating-Point Butterfly Architecture Based on Binary Signed-Digit Representation Floating-Point Butterfly Architecture Based on Binary Signed-Digit Representation Abstract: Fast Fourier transform (FFT) coprocessor, having a significant impact on the performance of communication systems,

More information

JOURNAL OF INTERNATIONAL ACADEMIC RESEARCH FOR MULTIDISCIPLINARY Impact Factor 1.393, ISSN: , Volume 2, Issue 7, August 2014

JOURNAL OF INTERNATIONAL ACADEMIC RESEARCH FOR MULTIDISCIPLINARY Impact Factor 1.393, ISSN: , Volume 2, Issue 7, August 2014 DESIGN OF HIGH SPEED BOOTH ENCODED MULTIPLIER PRAVEENA KAKARLA* *Assistant Professor, Dept. of ECONE, Sree Vidyanikethan Engineering College, A.P., India ABSTRACT This paper presents the design and implementation

More information

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase Abhay Sharma M.Tech Student Department of ECE MNNIT Allahabad, India ABSTRACT Tree Multipliers are frequently

More information

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B)

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B) Computer Arithmetic Data is manipulated by using the arithmetic instructions in digital computers. Data is manipulated to produce results necessary to give solution for the computation problems. The Addition,

More information

A Novel Floating Point Comparator Using Parallel Tree Structure

A Novel Floating Point Comparator Using Parallel Tree Structure A Novel Floating Point Comparator Using Parallel Tree Structure Anuja T Samuel PG student ECE Dept,College of Engineeering, Guindy, Anna University, Chennai, Tamil Nadu, India anujatsamuel@gmail.com Jawahar

More information

At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state

At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state Chapter 4 xi yi Carry in ci Sum s i Carry out c i+ At the ith stage: Input: ci is the carry-in Output: si is the sum ci+ carry-out to (i+)st state si = xi yi ci + xi yi ci + xi yi ci + xi yi ci = x i yi

More information

Binary Adders. Ripple-Carry Adder

Binary Adders. Ripple-Carry Adder Ripple-Carry Adder Binary Adders x n y n x y x y c n FA c n - c 2 FA c FA c s n MSB position Longest delay (Critical-path delay): d c(n) = n d carry = 2n gate delays d s(n-) = (n-) d carry +d sum = 2n

More information

Parallelized Radix-4 Scalable Montgomery Multipliers

Parallelized Radix-4 Scalable Montgomery Multipliers Parallelized Radix-4 Scalable Montgomery Multipliers Nathaniel Pinckney and David Money Harris 1 1 Harvey Mudd College, 301 Platt. Blvd., Claremont, CA, USA e-mail: npinckney@hmc.edu ABSTRACT This paper

More information

Efficient Design of Radix Booth Multiplier

Efficient Design of Radix Booth Multiplier Efficient Design of Radix Booth Multiplier 1Head and Associate professor E&TC Department, Pravara Rural Engineering College Loni 2ME E&TC Engg, Pravara Rural Engineering College Loni --------------------------------------------------------------------------***----------------------------------------------------------------------------

More information

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering A Review: Design of 16 bit Arithmetic and Logical unit using Vivado 14.7 and Implementation on Basys 3 FPGA Board Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor,

More information

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier U.V.N.S.Suhitha Student Department of ECE, BVC College of Engineering, AP, India. Abstract: The ever growing need for improved

More information

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation. ISSN 2319-8885 Vol.03,Issue.32 October-2014, Pages:6436-6440 www.ijsetr.com Design and Modeling of Arithmetic and Logical Unit with the Platform of VLSI N. AMRUTHA BINDU 1, M. SAILAJA 2 1 Dept of ECE,

More information

Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point

Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point International Journal of Electrical and Computer Engineering (IJECE) Vol. 3, No. 6, December 2013, pp. 805~814 ISSN: 2088-8708 805 Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating

More information

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Design and Implementation of CVNS Based Low Power 64-Bit Adder Design and Implementation of CVNS Based Low Power 64-Bit Adder Ch.Vijay Kumar Department of ECE Embedded Systems & VLSI Design Vishakhapatnam, India Sri.Sagara Pandu Department of ECE Embedded Systems

More information

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. Binary Arithmetic Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. MIT 6.004 Fall 2018 Reminder: Encoding Positive Integers Bit i in a binary representation (in right-to-left order)

More information

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 6c High-Speed Multiplication - III Spring 2017 Koren Part.6c.1 Array Multipliers The two basic operations - generation

More information

A Review on Optimizing Efficiency of Fixed Point Multiplication using Modified Booth s Algorithm

A Review on Optimizing Efficiency of Fixed Point Multiplication using Modified Booth s Algorithm A Review on Optimizing Efficiency of Fixed Point Multiplication using Modified Booth s Algorithm Mahendra R. Bhongade, Manas M. Ramteke, Vijay G. Roy Author Details Mahendra R. Bhongade, Department of

More information

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL Abstract: Lingappagari Raju M.Tech, VLSI & Embedded Systems, SR International Institute of Technology. Carry Select Adder (CSLA) is

More information

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,

More information

A New Architecture for 2 s Complement Gray Encoded Array Multiplier

A New Architecture for 2 s Complement Gray Encoded Array Multiplier A New Architecture for s Complement Gray Encoded Array Multiplier Eduardo Costa Sergio Bampi José Monteiro UCPel, Pelotas, Brazil UFRGS, P. Alegre, Brazil IST/INESC, Lisboa, Portugal ecosta@atlas.ucpel.tche.br

More information

DESIGN OF DECIMAL / BINARY MULTI-OPERAND ADDER USING A FAST BINARY TO DECIMAL CONVERTER

DESIGN OF DECIMAL / BINARY MULTI-OPERAND ADDER USING A FAST BINARY TO DECIMAL CONVERTER DESIGN OF DECIMAL / BINARY MULTI-OPERAND ADDER USING A FAST BINARY TO DECIMAL CONVERTER Sk.Howldar 1, M. Vamsi Krishna Allu 2 1 M.Tech VLSI Design student, 2 Assistant Professor 1,2 E.C.E Department, Sir

More information

ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI

ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI 2 BALA KRISHNA.KONDA M.Tech, Assistant Professor 1,2 Eluru College Of Engineering And

More information

Combinational Circuits

Combinational Circuits Combinational Circuits Combinational circuit consists of an interconnection of logic gates They react to their inputs and produce their outputs by transforming binary information n input binary variables

More information

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient ISSN (Online) : 2278-1021 Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient PUSHPALATHA CHOPPA 1, B.N. SRINIVASA RAO 2 PG Scholar (VLSI Design), Department of ECE, Avanthi

More information

ASIC IMPLEMENTATION OF 16 BIT CARRY SELECT ADDER

ASIC IMPLEMENTATION OF 16 BIT CARRY SELECT ADDER ASIC IMPLEMENTATION OF 16 BIT CARRY SELECT ADDER Nomula Poojaramani 1, A.Vikram 2 1 Student, Sree Chaitanya Institute Of Tech. Sciences, Karimnagar, Telangana, INDIA 2 Assistant Professor, Sree Chaitanya

More information

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Khumanthem Devjit Singh, K. Jyothi MTech student (VLSI & ES), GIET, Rajahmundry, AP, India Associate Professor, Dept. of ECE, GIET, Rajahmundry,

More information

Get Free notes at Module-I One s Complement: Complement all the bits.i.e. makes all 1s as 0s and all 0s as 1s Two s Complement: One s complement+1 SIGNED BINARY NUMBERS Positive integers (including zero)

More information

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A.S. Sneka Priyaa PG Scholar Government College of Technology Coimbatore ABSTRACT The Least Mean Square Adaptive Filter is frequently

More information

Fast Multi Operand Decimal Adders using Digit Compressors with Decimal Carry Generation

Fast Multi Operand Decimal Adders using Digit Compressors with Decimal Carry Generation Downloaded from orbit.dtu.dk on: Dec, Fast Multi Operand Decimal Adders using Digit Compressors with Decimal Carry Generation Dadda, Luigi; Nannarelli, Alberto Publication date: Document Version Publisher's

More information