Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques

Similar documents
Design and Verification of Area Efficient High-Speed Carry Select Adder

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

A High Speed Design of 32 Bit Multiplier Using Modified CSLA

Low-Power And Area-Efficient Of 128-Bit Carry Select Adder

Design and Characterization of High Speed Carry Select Adder

Hard ware implementation of area and power efficient Carry Select Adder using reconfigurable adder structures

Implementation of CMOS Adder for Area & Energy Efficient Arithmetic Applications

Implementation of CMOS Adder for Area & Energy Efficient Arithmetic Applications

ASIC IMPLEMENTATION OF 16 BIT CARRY SELECT ADDER

Area Delay Power Efficient Carry-Select Adder

Area Delay Power Efficient Carry Select Adder

Anisha Rani et al., International Journal of Computer Engineering In Research Trends Volume 2, Issue 11, November-2015, pp.

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL

Implementation of Regular Linear Carry Select Adder with Binary to Excess-1 Converter

Area Delay Power Efficient Carry-Select Adder

Carry Select Adder with High Speed and Power Efficiency

Design of Delay Efficient Carry Save Adder

Area-Delay-Power Efficient Carry-Select Adder

Implementation of 64-Bit Kogge Stone Carry Select Adder with ZFC for Efficient Area

International Journal of Innovative Research in Computer Science & Technology (IJIRCST) ISSN: , Volume-3, Issue-5, September-2015

A New Architecture Designed for Implementing Area Efficient Carry-Select Adder

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA

DESIGN AND IMPLEMENTATION OF APPLICATION SPECIFIC 32-BITALU USING XILINX FPGA

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

AREA-DELAY-POWER EFFICIENT CARRY SELECT ADDER

VLSI Arithmetic Lecture 6

Implimentation of A 16-bit RISC Processor for Convolution Application

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Design and Implementation of CVNS Based Low Power 64-Bit Adder

High Speed Han Carlson Adder Using Modified SQRT CSLA

A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography

An Efficient Flexible Architecture for Error Tolerant Applications

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator


Implementation of Floating Point Multiplier Using Dadda Algorithm

Performance of Constant Addition Using Enhanced Flagged Binary Adder

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

Parallel-Prefix Adders Implementation Using Reverse Converter Design. Department of ECE

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

VLSI Implementation of Adders for High Speed ALU

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER

II. MOTIVATION AND IMPLEMENTATION

ARITHMETIC operations based on residue number systems

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

16 Bit Low Power High Speed RCA Using Various Adder Configurations

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier

Arithmetic Circuits. Nurul Hazlina Adder 2. Multiplier 3. Arithmetic Logic Unit (ALU) 4. HDL for Arithmetic Circuit

DESIGN AND IMPLEMENTATION OF ADDER ARCHITECTURES AND ANALYSIS OF PERFORMANCE METRICS

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Design of 2-Bit ALU using CMOS & GDI Logic Architectures.

Performance Analysis of 64-Bit Carry Look Ahead Adder

AN EFFICIENT REVERSE CONVERTER DESIGN VIA PARALLEL PREFIX ADDER

International Journal of Research in Computer and Communication Technology, Vol 4, Issue 11, November- 2015

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

Efficient Radix-10 Multiplication Using BCD Codes

DESIGN AND IMPLEMENTATION 0F 64-BIT PARALLEL PREFIX BRENTKUNG ADDER

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

the main limitations of the work is that wiring increases with 1. INTRODUCTION

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

A High Speed Binary Floating Point Multiplier Using Dadda Algorithm

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

High Speed Multiplication Using BCD Codes For DSP Applications

An Efficient Hybrid Parallel Prefix Adders for Reverse Converters using QCA Technology

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

Study, Implementation and Survey of Different VLSI Architectures for Multipliers

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Combinational Logic II

Evolution of CAD Tools & Verilog HDL Definition

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Date Performed: Marks Obtained: /10. Group Members (ID):. Experiment # 09 MULTIPLEXERS

OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

Srinivasasamanoj.R et al., International Journal of Wireless Communications and Network Technologies, 1(1), August-September 2012, 4-9

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

High-Performance Carry Chains for FPGA s

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

Case Study on DiaHDL: A Web-based Electronic Design Automation Tool for Education Purpose

Optimized Design and Implementation of a 16-bit Iterative Logarithmic Multiplier

Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT

VARUN AGGARWAL

R. Solomon Roach 1, N. Nirmal Singh 2.

Implementation of Discrete Wavelet Transform for Image Compression Using Enhanced Half Ripple Carry Adder

ISSN Vol.05, Issue.02, February-2017, Pages:

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

Recursive Approach for Design of a Parallel Self-Timed Adder Using Verilog HDL

Verilog for High Performance

Chapter 4. Combinational Logic

IMPLEMENTATION OF HIGH SPEED AND LOW POWER RADIX-4 8*8 BOOTH MULTIPLIER IN CMOS 32nm TECHNOLOGY

High Throughput Radix-D Multiplication Using BCD

ANALYZING THE PERFORMANCE OF CARRY TREE ADDERS BASED ON FPGA S

EXPERIMENT #8: BINARY ARITHMETIC OPERATIONS

VHDL Design and Implementation of ASIC Processor Core by Using MIPS Pipelining

A Macro Generator for Arithmetic Cores

Transcription:

Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques B.Bharathi 1, C.V.Subhaskar Reddy 2 1 DEPARTMENT OF ECE, S.R.E.C, NANDYAL 2 ASSOCIATE PROFESSOR, S.R.E.C, NANDYAL. Abstract Design of area- and power-efficient highspeed data path logic systems forms the largest areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to transmit a carry through the adder. Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is span for reducing the area and power consumption in the CSLA. This work uses a simple and efficient gate-level modification to drastically reduce the area and power of the CSLA. Based on this modification 8, 16, 32, 64 and 128-bit square-root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture. The proposed design has reduced area and power as compared with the regular SQRT CSLA with only a minor increase in the delay. This work estimates the performance of the proposed designs in terms of delay, area are implemented in Xilinx ISE and Modelsim. I. Introduction Area and power reduction in data path logic systems are the main area of research in VLSI system design. High-speed addition and multiplication has always been a fundamental requirement of high-performance processors and systems. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially only after the previous bit position has been summed and a carry propagated into the next position. The major speed limitation in any adder is in the production of carries and many authors have considered the addition problem. The CSLA is used in many computational systems to moderate the problem of carry propagation delay by independently generating multiple carries and then select a carry to generate the sum. However, the CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate partial sum and carry by considering carry input and then the final sum and carry are selected by the multiplexers (mux). To overcome above problem, the basic idea of the proposed work is by using n-bit binary to excess-1 code converters (BEC) to improve the speed of addition. This logic can be implemented with any type of adder to further improve the speed. Using Binary to Excess-1 Converter (BEC) instead of RCA in the regular CSLA to achieve lower area and power consumption. The main advantage of this BEC logic comes from the lesser number of logic gates than the Full Adder (FA) structure. In electronics, a carry-select adder is a particular way to implement an adder, which is a logic element that computes the -bit sum of two -bit numbers. The carry-select adder is simple but rather fast, having a gate level depth of. The carry-select adder generally consists of two ripple carry adders and a multiplexer. Adding two n-bit numbers with a carry-select adder is done with two adders (therefore two ripple carry adders) in order to perform the calculation twice, one time with the assumption of the carry being zero and the other assuming one. After the two results are calculated, the correct sum, as well as the correct carry, is then selected with the multiplexer once the correct carry is known. The number of bits in each carry select block can be uniform, or variable. In the uniform case, the optimal delay occurs for a block size of. When variable, the block size should have a delay, from addition inputs A and B to the carry out, equal to that of the multiplexer chain leading into it, so that the carry out is calculated just in time. The delay is derived from uniform sizing, where the ideal number of full-adder elements per block is equal to the square root of the number of bits being added, since that will yield an equal number of MUX delays. Basic building block Figure 1: Basic carry Select adder. 437 P a g e

Above is the basic building block of a carry-select adder, where the block size is 4. Two 4- bit ripple carry adders are multiplexed together, where the resulting carry and sum bits are selected by the carry-in. Since one ripple carry adder assumes a carry-in of 0, and the other assumes a carry-in of 1, selecting which adder had the correct assumption via the actual carry-in yields the desired result. Uniform-sized adder A 16-bit carry-select adder with a uniform block size of 4 can be created with three of these blocks and a 4-bit ripple carry adder. Since carry-in is known at the beginning of computation, a carry select block is not needed for the first four bits. The delay of this adder will be four full adder delays, plus three MUX delays. Figure 2: Regular Fixed Size CSLA Variable-sized adder A 16-bit carry-select adder with variable size can be similarly created. Here we show an adder with block sizes of 2-2-3-4-5. This break-up is ideal when the full-adder delay is equal to the MUX delay, which is unlikely. The total delay is two full adder delays, and four mux delays. This brief is structured as follows. Section II deals with the delay and area evaluation methodology of the basic adder blocks. Section III presents the detailed structure and the function of the BEC logic. The SQRT CSLA has been chosen for comparison with the proposed design as it has a more balanced delay, and requires lower power and area [5], [6]. The delay and area evaluation methodology of the regular and modified SQRT CSLA are presented in Sections IV and V, respectively. The ASIC implementation details and results are analyzed in Section VI. Finally, the work is concluded in Section VII. II. Basic Function And Structure Of BEC Logic The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter, each having delay equal to 1 unit and area equal to 1 unit. We then add up the number of gates in the longest path of a logic block that contributes to the maximum delay. The area evaluation is done by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are evaluated and listed in Table I. The basic work is to use Binary to Excess-1 Converter (BEC) instead of RCA with Cin=1 in the regular CSLA to achieve lower area and power consumption. The main advantage of this BEC logic comes from the lesser number of logic gates than the n-bit Full Adder (FA) structure As stated above the main idea of this work is to use BEC instead of the RCA with Cin=1 in order to reduce the area and power consumption of the regular CSLA. To replace the n-bit RCA, an n+1-bit BEC is required. A structure and the function table of a 4-bit BEC are shown in Figure.2 and Table.2, respectively. Figure 3: Variable Sized CSLA. The basic idea of this work is to use Binary to Excess-1 Converter (BEC) instead of RCA with Cin=1 in the regular CSLA to achieve lower area and power consumption [2] [4]. The main advantage of this BEC logic comes from the lesser number of logic gates than the n-bit Full Adder (FA) structure. The details of the BEC logic are discussed in Section III. Figure 1: 4 Bit BEC. The Boolean expressions of the 4-bit BEC is listed as (note the functional symbols ~ NOT, & AND, ^XOR) X0 = ~B0 X1 = B0^B1 X2 = B2^ (B0 & B1) X3 = B3^ (B0 & B1 & B2) 438 P a g e

Figure 4: 4-Bit BEC with 8:4 mux. III. Basic Structure of Regular 16-Bit CSLA A 16-bit carry select has two types of block size namely uniform block size and variable block size. A 16-bit carry select adder with a uniform block size has the delay of four full adder delays and three MUX delays. While a 16-bit carry select adder with variable block size has the delay of two full adder delays, and four mux delays. Therefore we use 16-bit carry select adder with variable block size. Ripple-carry adders are the simplest and most compact full adders, but their performance is limited by a carry that must ripple from the least-significant to the most-significant bit. A carry-select adder achieves speeds 40% to 90% faster by performing additions in parallel and reducing the maximum carry path. A carry-select adder is divided into sectors, each of which, except for the least significant performs two additions in parallel, one assuming a carry-in of zero, the other a carry-in of one within the sector, there are two 4-bit ripple- carry adders receiving the same data inputs but different Cin. The upper adder has a carry-in of zero, the lower adder a carry-in of one. The actual Cin from the preceding sector selects one of the two adders. If the carry-in is zero, the sum and carry-out of the upper adder are selected. If the carry-in is one, the sum and carry-out of the lower adder are selected. Logically, the result is not different if a single ripple-carry adder were used. First the coding for full adder and different multiplexers of 6:3, 8:4, 10:5, and 12:6 was done. Then 2, 3, 4, 5-bit ripple carry adder was done by calling the full adder. The regular 64-bit CSLA was created by calling the ripple carry adders and all multiplexers based on circuit. Finally, regular 128- bit was implemented in the FIR filter design (section5). Figure 5: Regular 16-bit SQRT CSLA IV. Basic Structure of Modified 16-Bit CSLA It is similar to regular 16-bit SQRT CSLA. Only change is that in basic blocks having two ripple-carry adders, one ripple carry adder fed with a constant 1 carry-in is replaced by BEC. The area estimation of each group is calculated. Based on the consideration of delay values, the arrival time of selection input C1 [time (T) =7] of 6:3 mux is earlier than the s3 [t =9] and c3 [t =7] and later than the s2 [t =4]. Thus, the sum3 and final c3 (output from mux) are depending on s3 and mux and partial c3 (input to mux) and mux, respectively. The sum2 depends on c1 and mux. For the remaining parts the arrival time of mux selection input is always greater than the arrival time of data inputs from the BEC s. Thus, the delay of the remaining MUX depends on the arrival time of mux selection input and the mux delay. First the coding for full adder and multiplexers of 6:3, 8:4, 10:5, and 12:6 was done. The BEC program was design by using NOT, XOR and AND gates. Then 2, 3, 4, 5-bit ripple carry adder was done by calling the full adder. The modified 16- bit CSLA was created by calling the ripple carry adders, BEC and all multiplexers based upon the circuit. Finally, modified 64-bit was implemented in the FIR filter design (section 5). The arrival time of selection input of 6:3 mux is earlier. Thus, the sum3 and final c3 (output from mux) depends on s3 and mux and partial c3 (input to mux) and mux, respectively. The sum2 depends on c1and mux. For the remaining group s the arrival time of mux selection input is always greater than the arrival time of data inputs from the BEC s. Thus, the delay of the remaining groups depends on the arrival time of mux selection input and the mux delay. 439 P a g e

Figure. 6. Modified 16-bit SQRT CSLA. V. IMPLEMENTATION RESULTS The design proposed in this paper has been developed using Verilog-HDL and synthesized in Xilinx ISE 9.1i. For each word size of the adder, the same value changed dump (VCD) file is generated for all possible input conditions and imported the same to Xilinx ISE 9.1i Power Analysis to perform the power simulations. The similar design flow is followed for both the regular and modified SQRT CSLA. Table 5 exhibits the simulation results of both the CSLA fir filter structures in terms of delay, area and power. The power delay product of the proposed 8-bit is higher than that of the regular SQRT CSLA by 5.2% and the area-delay product is lower by 2.9%. However, the power-delay product of the proposed 16-bit SQRT CSLA reduces by 1.76% and for the 32-bit, 64-bit and by as much as 8.18%, and 12.28% respectively. input (a sequence of numbers, resulting from sampling and quantizing an analog signal) and produces a digital output. The simulation Results of regular and Modified Carry Select adder shown figure 7 and 8.comapre to regular CSLA modified CSLA architecture is take less amount of delay it is approximately reduced by 30% of regular method design Figure. 7. Regular 16-bit CSLA Figure. 8. Modified 16-bit CSLA VI. CONCLUSION A simple approach is proposed in this paper to reduce the area and power of SQRT CSLA architecture. The reduced number of gates of this work offers the great advantage in the reduction of area and also the total power. The compared results show that the modified SQRT CSLA has a slightly larger delay (only3.76%), but the area and power of the 128-bit modified SQRT CSLA are significantly reduced by 17.4% and 15.4% respectively. The power-delay product and also the area-delay product of the proposed design show a decrease for 16, 32,64 and 128-bit sizes which indicates the success of the method and not a mere tradeoff of delay for power and area. The modified CSLA architecture is therefore, low area, low power, simple and efficient for VLSI hardware implementation. As the functional verification decides the quality of the silicon, we spend 60% of the design cycle time only for the verification/simulation. In order to avoid the delay and meet the TTM, we use the latest verification methodologies and technologies and accelerate the verification process. This project helps one to understand the complete functional verification process of complex ASICs an SoC s and it gives opportunity to try the latest verification methodologies, programming concepts like Object Oriented Programming of Hardware Verification Languages and sophisticated EDA tools, for the high quality verification. Future Work: This project used System Verilog i.e., the technology used is direct test cases, randomized test cases,ovm for verification even though the coverage is 100% there may be some errors which cannot be shown so in Oder to overcome this the new technology of System 440 P a g e

Verilog i.e., OVM and UVM. In the coming future the Router can be done by using OVM and UVM. REFERENCES [1] Bedrij, O. J.,(1962), Carry-select adder, IRE Trans. Electron. Comput., pp.340 344. [2] Ceiang,T. Y. and Hsiao,M. J.,(Oct1998 ), Carry-select adder using single ripple carry adder, Electron. Lett., vol. 34, no. 22, pp. 2101 2103 [3] He, Y., Chang,C. H. and Gu, J., (2005), An A rea efficient 64-bit square root carryselect adder for low power application, in Proc. IEEE Int. Symp.Circuits Syst. vol. 4, pp. 4082 4085. [4] Ramkumar,B., Kittur, H.M. and Kannan,P. M.,(2010 ), ASIC implementation of modified faster carry save adder, Eur. J. Sci. Res., vol. 42, no. 1,pp.53 58. [5] Kim,Y. and Kim,L.-S.,(May2001), 64-bit carry-select adder with reduced area, Electron Lett., vol. 37, no. 10, pp. 614 615. [6] E. Abu-Shama and M. Bayoumi, A New cell for low power adders, in Proc.Int. Midwest Symp. Circuits and Systems, 1995, pp. 1014 1017 [7] www.xilinx.com. [8] Verilog HDL- Digital Design and Synthesis, by Samir Palnitkar 441 P a g e