Design and Verification of Area Efficient High-Speed Carry Select Adder

Similar documents
Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

A High Speed Design of 32 Bit Multiplier Using Modified CSLA

Low-Power And Area-Efficient Of 128-Bit Carry Select Adder

Design and Characterization of High Speed Carry Select Adder

Hard ware implementation of area and power efficient Carry Select Adder using reconfigurable adder structures

Implementation of CMOS Adder for Area & Energy Efficient Arithmetic Applications

ASIC IMPLEMENTATION OF 16 BIT CARRY SELECT ADDER

Implementation of CMOS Adder for Area & Energy Efficient Arithmetic Applications

Area Delay Power Efficient Carry Select Adder

Implementation of Regular Linear Carry Select Adder with Binary to Excess-1 Converter

Area Delay Power Efficient Carry-Select Adder

Anisha Rani et al., International Journal of Computer Engineering In Research Trends Volume 2, Issue 11, November-2015, pp.

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL

Area Delay Power Efficient Carry-Select Adder

Design of Delay Efficient Carry Save Adder

Carry Select Adder with High Speed and Power Efficiency

Implementation of 64-Bit Kogge Stone Carry Select Adder with ZFC for Efficient Area

Area-Delay-Power Efficient Carry-Select Adder

International Journal of Innovative Research in Computer Science & Technology (IJIRCST) ISSN: , Volume-3, Issue-5, September-2015

A New Architecture Designed for Implementing Area Efficient Carry-Select Adder

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

DESIGN AND IMPLEMENTATION OF APPLICATION SPECIFIC 32-BITALU USING XILINX FPGA

Implimentation of A 16-bit RISC Processor for Convolution Application

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

AREA-DELAY-POWER EFFICIENT CARRY SELECT ADDER

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography

A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA

Design and Implementation of CVNS Based Low Power 64-Bit Adder

High Speed Han Carlson Adder Using Modified SQRT CSLA

VLSI Arithmetic Lecture 6

A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA

DESIGN AND IMPLEMENTATION OF ADDER ARCHITECTURES AND ANALYSIS OF PERFORMANCE METRICS

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

Performance of Constant Addition Using Enhanced Flagged Binary Adder

DESIGN AND IMPLEMENTATION 0F 64-BIT PARALLEL PREFIX BRENTKUNG ADDER

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

Srinivasasamanoj.R et al., International Journal of Wireless Communications and Network Technologies, 1(1), August-September 2012, 4-9

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

Study, Implementation and Survey of Different VLSI Architectures for Multipliers

International Journal of Research in Computer and Communication Technology, Vol 4, Issue 11, November- 2015

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Implementation of Floating Point Multiplier Using Dadda Algorithm

DEVELOPMENT AND VERIFICATION OF AHB2APB BRIDGE PROTOCOL USING UVM TECHNIQUE

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

An Efficient Flexible Architecture for Error Tolerant Applications

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER


Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT

16 Bit Low Power High Speed RCA Using Various Adder Configurations

Arithmetic Circuits. Nurul Hazlina Adder 2. Multiplier 3. Arithmetic Logic Unit (ALU) 4. HDL for Arithmetic Circuit

An Efficient Hybrid Parallel Prefix Adders for Reverse Converters using QCA Technology

Design of 2-Bit ALU using CMOS & GDI Logic Architectures.

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

VHDL Design and Implementation of ASIC Processor Core by Using MIPS Pipelining

University, Patiala, Punjab, India 1 2

AN EFFICIENT REVERSE CONVERTER DESIGN VIA PARALLEL PREFIX ADDER

Date Performed: Marks Obtained: /10. Group Members (ID):. Experiment # 09 MULTIPLEXERS

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

VLSI Implementation of Adders for High Speed ALU

Parallel-Prefix Adders Implementation Using Reverse Converter Design. Department of ECE

High Speed Multiplication Using BCD Codes For DSP Applications

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

High Performance and Area Efficient DSP Architecture using Dadda Multiplier

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

Implementation of Discrete Wavelet Transform for Image Compression Using Enhanced Half Ripple Carry Adder

Combinational Logic II

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

ARITHMETIC operations based on residue number systems

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

Design and Implementation of High Performance Parallel Prefix Adders

Low Power Circuits using Modified Gate Diffusion Input (GDI)

Performance Analysis of 64-Bit Carry Look Ahead Adder

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier

II. MOTIVATION AND IMPLEMENTATION

An RNS Based Montgomery Modular Multiplication Algorithm For Cryptography

ISSN Vol.05, Issue.02, February-2017, Pages:

PINE TRAINING ACADEMY

VHDL IMPLEMENTATION OF FLOATING POINT MULTIPLIER USING VEDIC MATHEMATICS

High Throughput Radix-D Multiplication Using BCD

structure syntax different levels of abstraction

Here is a list of lecture objectives. They are provided for you to reflect on what you are supposed to learn, rather than an introduction to this

A High Performance Reconfigurable Data Path Architecture For Flexible Accelerator

Digital Circuit Design and Language. Datapath Design. Chang, Ik Joon Kyunghee University

Efficient Radix-10 Multiplication Using BCD Codes

Arithmetic Circuits. Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak.

Vertical-Horizontal Binary Common Sub- Expression Elimination for Reconfigurable Transposed Form FIR Filter

FPGA Based Low Area Motion Estimation with BISCD Architecture

R. Solomon Roach 1, N. Nirmal Singh 2.

Design of Framework for Logic Synthesis Engine

Assistant Professor, PICT, Pune, Maharashtra, India

Henry Lin, Department of Electrical and Computer Engineering, California State University, Bakersfield Lecture 7 (Digital Logic) July 24 th, 2012

Transcription:

Design and Verification of Area Efficient High-Speed Carry Select Adder T. RatnaMala # 1, R. Vinay Kumar* 2, T. Chandra Kala #3 #1 PG Student, Kakinada Institute of Engineering and Technology,Korangi, Andhra Pradesh, India. *2 Assistant Professor, Kakinada Institute of Engineering and Technology, Korangi, Andhra Pradesh, India. #3 Assistant Professor, Sri Sai Aditya Institute Of Science & Technology, East Godavari, Andhra Pradesh, India. Abstract- Design of area efficient and high-speed data path logic systems forms the largest areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to transmit a carry through the adder. Carry Select Adder (CSLA) is one of the fastest adders used in many dataprocessing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is span for reducing the area in the CSLA. This work uses a simple and efficient gate-level modification to drastically reduce the area and power of the CSLA. Based on this modification 8, 16, 32, 64 and 128-bit square-root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture. The proposed design has reduced area as compared with the regular SQRT CSLA. This work estimates the performance of the proposed designs in terms of delay, area are implemented in Xilinx ISE. Keywords- Carry Select Adder, Area, SQRT. 1. INTRODUCTION Area and power reduction in data path logic systems are the main area of research in VLSI system design. High-speed addition and multiplication has always been a fundamental requirement of highperformance processors and systems. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially only after the previous bit position has been summed and a carry propagated into the next position. The major speed limitation in any adder is in the production of carries and many authors have considered the addition problem. The CSLA is used in many computational systems to moderate the problem of carry propagation delay by independently generating multiple carries and then select a carry to generate the sum. However, the CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate partial sum and carry by considering carry input and then the final sum and carry are selected by the multiplexers (mux). To overcome above problem, the basic idea of the proposed work is by using n-bit binary to excess-1 code converters (BEC) to improve the speed of addition. This logic can be implemented with any type of adder to further improve the speed. Using Binary to Excess-1 Converter (BEC) instead of RCA in the regular CSLA to achieve lower area and power consumption. The main advantage of this BEC logic comes from the lesser number of logic gates than the Full Adder (FA) structure. In electronics, a carry-select adder is a particular way to implement an adder, which is a logic element that computes the -bit sum of two - bit numbers. The carry-select adder is simple but rather fast, having a gate level depth of. The carry-select adder generally consists of two ripple carry adders and a multiplexer. Adding two n-bit www.ijrcct.org chiefeditor@ijrcct.org Page 345

numbers with a carry-select adder is done with two adders (therefore two ripple carry adders) in order to perform the calculation twice, one time with the assumption of the carry being zero and the other assuming one. After the two results are calculated, the correct sum, as well as the correct carry, is then selected with the multiplexer once the correct carry is known. The number of bits in each carry select block can be uniform, or variable. In the uniform case, the optimal delay occurs for a block size of B. Uniform-sized adder A 16-bit carry-select adder with a uniform block size of 4 can be created with three of these blocks and a 4- bit ripple carry adder. Since carry-in is known at the beginning of computation, a carry select block is not needed for the first four bits. The delay of this adder will be four full adder delays, plus three MUX delays.. When variable, the block size should have a delay, from addition inputs A and B to the carry out, equal to that of the multiplexer chain leading into it, so that the carry out is calculated just in time. The delay is derived from uniform sizing, where the ideal number of full-adder elements per block is equal to the square root of the number of bits being added, since that will yield an equal number of MUX delays. A. Basic building block: Figure 2: Regular Fixed Size CSLA C. Variable-sized adder A 16-bit carry-select adder with variable size can be similarly created. Here we show an adder with block sizes of 2-2-3-4-5. This break-up is ideal when the full-adder delay is equal to the MUX delay, which is unlikely. The total delay is two full adder delays, and four mux delays. Figure 1: Basic carry Select adder. Above is the basic building block of a carryselect adder, where the block size is 4. Two 4-bit ripple carry adders are multiplexed together, where the resulting carry and sum bits are selected by the carry-in. Since one ripple carry adder assumes a carry-in of 0, and the other assumes a carry-in of 1, selecting which adder had the correct assumption via the actual carry-in yields the desired result. Figure 3: Variable Sized CSLA. The basic idea of this work is to use Binary to Excess-1 Converter (BEC) instead of RCA with Cin=1 in the regular CSLA to achieve lower area and power www.ijrcct.org chiefeditor@ijrcct.org Page 346

consumption [2] [4]. The main advantage of this BEC logic comes from the lesser number of logic gates than the n-bit Full Adder (FA) structure. The details of the BEC logic are discussed in Section III. This brief is structured as follows. Section II deals with the delay and area evaluation methodology of the basic adder blocks. Section III presents the detailed structure and the function of the BEC logic. The SQRT CSLA has been chosen for comparison with the proposed design as it has a more balanced delay, and requires lower power and area [5], [6]. The delay and area evaluation methodology of the regular and modified SQRT CSLA are presented in Sections IV and V, respectively. The ASIC implementation details and results are analyzed in Section VI. Finally, the work is concluded in Section VII. II. BASIC FUNCTION AND STRUCTURE OF BEC LOGIC The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter, each having delay equal to 1 unit and area equal to 1 unit. We then add up the number of gates in the longest path of a logic block that contributes to the maximum delay. The area evaluation is done by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are evaluated and listed in Table I. The basic work is to use Binary to Excess-1 Converter (BEC) instead of RCA with Cin=1 in the regular CSLA to achieve lower area and power consumption. The main advantage of this BEC logic comes from the lesser number of logic gates than the n-bit Full Adder (FA) structure As stated above the main idea of this work is to use BEC instead of the RCA with Cin=1 in order to reduce the area and power consumption of the regular CSLA. To replace the n-bit RCA, an n+1-bit BEC is required. A structure and the function table of a 4-bit BEC are shown in Figure.2 and Table.2, respectively. Figure 1: 4 Bit BEC. The Boolean expressions of the 4-bit BEC is listed as (note the funct ional symbols ~ NOT, & AND, ^XOR) X0 = ~B0 X1 = B0^B1 X2 = B2^ (B0 & B1) X3 = B3^ (B0 & B1 & B2) Figure 4: 4-Bit BEC with 8:4 mux. III. BASIC STRUCTURE OF REGULAR 128-BIT CSLA A 128-bit carry select has two types of block size namely uniform block size and variable block size. A 128 bit carry select adder with a uniform block size has the delay of four full adder delays and three MUX delays. While a 128-bit carry select adder with variable block size has the delay of two full adder delays, and four mux delays. Therefore we use 128-bit carry select adder with variable block size. Ripple-carry adders are the simplest and most compact full adders, but their performance is limited by a carry that must ripple from the leastsignificant to the most-significant bit. A carry-select adder achieves speeds 40% to 90% faster by performing additions in parallel and reducing the maximum carry path. www.ijrcct.org chiefeditor@ijrcct.org Page 347

A carry-select adder is divided into sectors, each of which, except for the least significant performs two additions in parallel, one assuming a carry-in of zero, the other a carry-in of one within the sector, there are two 2- bit ripple- carry adders receiving the same data inputs but different Cin. The upper adder has a carry-in of zero, the lower adder a carry-in of one. The actual Cin from the preceding sector selects one of the two adders. If the carry-in is zero, the sum and carry-out of the upper adder are selected. If the carry-in is one the sum and carry-out of the lower adder are selected. Logically, the result is not different if a single ripple-carry adder were used. First the coding for full adder and different multiplexers of 8:4, 10:5, 12:6, 14:7, 18:9, 20:10, 22:11, 24:12, 26:13, 28:14, 30:15, 32:16 and 34:17 was done. Then 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16-bit ripple carry adder was done by calling the full adder. The regular 128-bit CSLA was created by calling the ripple carry adders and all multiplexers based on circuit. Finally, regular 128-bit was implemented. mux selection input is always greater than the arrival time of data inputs from the BEC s. Thus, the delay of the remaining MUX depends on the arrival time of mux selection input and the mux delay. First the coding for full adder and multiplexers of 8:4, 10:5, 12:6, 14:7, 18:9, 20:10, 22:11, 24:12, 26:13, 28:14, 30:15, 32:16 and 34:17 was done. The BEC program was design by using NOT, XOR and AND gates. Then 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16 -bit ripple carry adder was done by calling the full adder. The modified 128-bit CSLA was created by calling the ripple carry adders, BEC and all multiplexers based upon the circuit. Finally, modified 128-bit was implemented. The arrival time of selection input of 8:4 mux is earlier. Thus, the sum3 and final c3 (output from mux) depends on s3 and mux and partial c3 (input to mux) and mux, respectively. The sum2 depends on c1and mux. For the remaining group s the arrival time of mux selection input is always greater than the arrival time of data inputs from the BEC s. Thus, the delay of the remaining groups depends on the arrival time of mux selection input and the mux delay. Figure 5: Regular 128-bit SQRT CSLA IV BASIC STRUCTURE OF MODIFIED 128-BIT CSLA It is similar to regular 128-bit SQRT CSLA. Only change is that in basic blocks having two ripplecarry adders, one ripple carry adder fed with a constant 1 carry-in is replaced by BEC. The area estimation of each group is calculated. Based on the consideration of delay values, the arrival time of selection input C1 [time (T) =7] of 6:3 mux is earlier than the s3 [t =9] and c3 [t =7] and later than the s2 [t =4]. Thus, the sum3 and final c3 (output from mux) are depending on s3 and mux and partial c3 (input to mux) and mux, respectively. The sum2 depends on c1 and mux. For the remaining parts the arrival time of Figure. 6. Modified 128-bit SQRT CSLA. VI. IMPLEMENTATION RESULTS The design proposed in this paper has been developed using Verilog-HDL and synthesized in Xilinx ISE 9.2i. For each word size of the adder, the same value changed dump (VCD) file is generated for all possible input conditions and imported the same to Xilinx ISE 9.2i Power Analysis to perform the power simulations. The similar design flow is followed for both the regular and modified SQRT CSLA. Table 5 exhibits the simulation results of both the CSLA structures in terms of delay and area. The simulation Results of regular and Modified Carry Select adder shown figure 7 and 8.comapre to www.ijrcct.org chiefeditor@ijrcct.org Page 348

regular CSLA modified CSLA architecture is take less amount of delay it is approximately reduced by 30% of regular method design Figure. 7. Regular 128-bit CSLA tradeoff of delay for power and area. The modified CSLA architecture is therefore, low area, simple and efficient for VLSI hardware implementation. As the functional verification decides the quality of the silicon, we spend 60% of the design cycle time only for the verification/simulation. This project helps one to understand the complete functional verification process of complex ASICs an SoC s and it gives opportunity to try the latest verification methodologies, programming concepts like Object Oriented Programming of Hardware Verification Languages and sophisticated EDA tools, for the high quality verification. Future Work: This project used System Verilog i.e., the technology used is direct test cases, randomized test cases, OVM for verification even though the coverage is 100% there may be some errors which cannot be shown so in Oder to overcome this the new technology of System Verilog i.e., OVM and UVM. In the coming future the Router can be done by using OVM and UVM. REFERENCES [1] B. Ramkumar and Harish M Kittur Low Power and Area-Efficient Carry Select Adder IEEE Transaction 1063-8210 2011 IEEE [2] Bedrij, O. J., (1962), Carry -select adder, IRE Trans. Electron. Comput., pp.340 344. [3] Ceiang,T. Y. and Hsiao,M. J.,(Oct1998 ), Carry-select adder using single ripple carry adder, Electron. Lett., vol. 34, no. 22, pp. 2101 2103 [4] He, Y., Chang,C. H. and Gu, J., (2005), An A rea efficient 64-bit square root carry-select adder for low power application, in Proc. IEEE Int. Symp.Circuits Syst. vol. 4, pp. 4082 4085. [5] Ramkumar,B., Kittur, H.M. and Kannan,P. M.,(2010 ), ASIC implementation of modified faster carry save adder, Eur. J. Sci. Res., vol. 42, no. 1,pp.53 58. [6] Kim,Y. and Kim,L.-S.,(May2001), 64-bit carry-select adder with reduced area, Electron Lett., vol. 37, no. 10, pp. 614 615. Figure. 8. Modified 128-bit CSLA VII. CONCLUSION [7] E. Abu-Shama and M. Bayoumi, A New cell for low power adders, in Proc.Int. Midwest Symp. Circuits and Systems, 1995, pp. 1014 1017 A simple approach is proposed in this paper to reduce the area of SQRT CSLA architecture. The reduced number of gates of this work offers the great advantage in the reduction of area. The compared results show that the modified SQRT CSLA has a delay (only3.76%), but the area of the 128-bit modified SQRT CSLA are significantly. The area-delay product of the proposed design show a decrease for 16 and 128-bit sizes which indicates the success of the method and not a mere [8] www.xilinx.com. [9] Verilog HDL- Digital Design and Synthesis, by Samir Palnitkar www.ijrcct.org chiefeditor@ijrcct.org Page 349