PARALLEL PREFIX TREE 32-BIT COMPARATOR AND ADDER BY USING SCALABLE DIGITAL CMOS

Similar documents
IMPLEMENTATION OF DIGITAL CMOS COMPARATOR USING PARALLEL PREFIX TREE

Performance Evaluation of Digital CMOS Comparator Using Parallel Prefix Tree.

Design of Low Power Digital CMOS Comparator

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

Srinivasasamanoj.R et al., International Journal of Wireless Communications and Network Technologies, 1(1), August-September 2012, 4-9

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

A 65nm Parallel Prefix High Speed Tree based 64-Bit Binary Comparator

ANALYZING THE PERFORMANCE OF CARRY TREE ADDERS BASED ON FPGA S

Design of Parallel Prefix Adders using FPGAs

An Efficient Hybrid Parallel Prefix Adders for Reverse Converters using QCA Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

the main limitations of the work is that wiring increases with 1. INTRODUCTION

High Speed Han Carlson Adder Using Modified SQRT CSLA

High Performance and Area Efficient DSP Architecture using Dadda Multiplier

Design and Implementation of High Performance Parallel Prefix Adders

Design and Characterization of High Speed Carry Select Adder

VHDL for Synthesis. Course Description. Course Duration. Goals

An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder

Verilog for High Performance

AN IMPROVED FUSED FLOATING-POINT THREE-TERM ADDER. Mohyiddin K, Nithin Jose, Mitha Raj, Muhamed Jasim TK, Bijith PS, Mohamed Waseem P

Improved Fault Tolerant Sparse KOGGE Stone ADDER

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Parallel-Prefix Adders Implementation Using Reverse Converter Design. Department of ECE

Design of High Speed Modulo 2 n +1 Adder

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Area Delay Power Efficient Carry-Select Adder

A New Architecture Designed for Implementing Area Efficient Carry-Select Adder

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

DESIGN AND IMPLEMENTATION OF APPLICATION SPECIFIC 32-BITALU USING XILINX FPGA

A Unified Addition Structure for Moduli Set {2 n -1, 2 n,2 n +1} Based on a Novel RNS Representation

DESIGN OF RADIX-8 BOOTH MULTIPLIER USING KOGGESTONE ADDER FOR HIGH SPEED ARITHMETIC APPLICATIONS

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

Performance of Constant Addition Using Enhanced Flagged Binary Adder

A Low-Power Carry Skip Adder with Fast Saturation

Carry Select Adder with High Speed and Power Efficiency

R07. Code No: V0423. II B. Tech II Semester, Supplementary Examinations, April

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING QUESTION BANK NAME OF THE SUBJECT: EE 2255 DIGITAL LOGIC CIRCUITS

VLSI Implementation of Adders for High Speed ALU

(ii) Simplify and implement the following SOP function using NOR gates:

Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT

Parallel, Single-Rail Self-Timed Adder. Formulation for Performing Multi Bit Binary Addition. Without Any Carry Chain Propagation

Area-Delay-Power Efficient Carry-Select Adder

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE

1. Introduction. Raj Kishore Kumar 1, Vikram Kumar 2

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL

AN EFFICIENT REVERSE CONVERTER DESIGN VIA PARALLEL PREFIX ADDER

Area Delay Power Efficient Carry Select Adder

Lecture 5. Other Adder Issues

ASIC IMPLEMENTATION OF 16 BIT CARRY SELECT ADDER

Partitioned Branch Condition Resolution Logic

A Simple Method to Improve the throughput of A Multiplier

Reduced Delay BCD Adder

Formulation for Performing Multi Bit Binary Addition using Parallel, Single-Rail Self-Timed Adder without Any Carry Chain Propagation

Chapter 4. Combinational Logic

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices

A Novel Floating Point Comparator Using Parallel Tree Structure

MLR Institute of Technology

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Area Delay Power Efficient Carry-Select Adder

Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2

Paper ID # IC In the last decade many research have been carried

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

International Journal of Innovative Research in Computer Science & Technology (IJIRCST) ISSN: , Volume-3, Issue-5, September-2015

Implementation of 32-Bit Wave Pipelining Sparse Tree Adders

DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY

INSTITUTE OF AERONAUTICAL ENGINEERING Dundigal, Hyderabad ELECTRONICS AND COMMUNICATIONS ENGINEERING

Combinational Logic II

Delay Optimised 16 Bit Twin Precision Baugh Wooley Multiplier

Low-Area Low-Power Parallel Prefix Adder Based on Modified Ling Equations

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

VARUN AGGARWAL

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Design of Arithmetic circuits

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

CAD4 The ALU Fall 2009 Assignment. Description

NADAR SARASWATHI COLLEGE OF ENGINEERING AND TECHNOLOGY Vadapudupatti, Theni

IMPLEMENTATION OF TWIN PRECISION TECHNIQUE FOR MULTIPLICATION

Hours / 100 Marks Seat No.

Low-Power FIR Digital Filters Using Residue Arithmetic

Design of Delay Efficient Carry Save Adder

Synthesis at different abstraction levels

ISSN (Online)

An FPGA based Implementation of Floating-point Multiplier

Field Programmable Gate Array (FPGA)

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

קורס VHDL for High Performance. VHDL

International Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018

Analysis of Different Multiplication Algorithms & FPGA Implementation

International Journal Of Global Innovations -Vol.6, Issue.I Paper Id: SP-V6-I1-P14 ISSN Online:

Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

FPGA IMPLEMENTATION OF EFFCIENT MODIFIED BOOTH ENCODER MULTIPLIER FOR SIGNED AND UNSIGNED NUMBERS

ECE 341 Midterm Exam

Transcription:

PARALLEL PREFIX TREE 32-BIT COMPARATOR AND ADDER BY USING SCALABLE DIGITAL CMOS Tulluri.Jyotsna, Mrs.J.Mrudula ABSTRACT Comparators and adders are key design elements for a wide range of applications scientific computation, test circuit applications and optimized equality-only comparators for general-purpose processor components (associative memories, loadstore queue buffers, translation look-aside buffers, branch target buffers), and many other CPU argument comparison blocks. Even though comparator logic design is straight forward, the extensive use of comparator designs use dynamic gate logic circuit structures to enhance performance, while others leverage specialized arithmetic units for wide comparisons, along with custom logic circuits. These structures require log2 N comparison levels, with each level consisting of several cascaded logic gate but with prefix tree structures area and power consumption can be improved by leveraging two-input multiplexers at each level and generate propagate logic cells on the first level which takes advantage of one s complement addition. This design uses all N Transistor (ANT) circuits to compensate for high fan- in with high pipeline throughput Key words- Binary to excess -1 converter (BEC), Variable stage Carry Select, Adder, Clock Select Adder with Sharing (CSAS), Logic design procedure. I.INTRODUCTION Comparators design improve scalability and reduce comparison delays using a hierarchical prefix tree structure composed of 2-b comparators. These structures require log2 N comparison levels, with each level consisting of several cascaded logic gates. However, the delay and area of these designs may be prohibitive for comparing wide operands. The prefix tree structure s area and power consumption can be improved by leveraging two input multiplexers (instead of 2-b comparator cells) at each level and generate-propagate logic cells on the first level (instead of 2-b adder cells),which takes advantage of one s compliment addition. Using this logic composition, a prefix tree requires six levels for the most common comparison bit width of 64 bits, but suffers from high power consumption due to every cell in the structure being active, regardless of the input operands values.furthermore,the structure can perform only greater-than or less than comparisons and not equality. To improve the speed and reduce power consumption, several designs rely on pipelining and power-down mechanisms to reduce switching activity with respect to the actual input operands bit values. One design uses all N transistor (ANT) circuits to compensate for high fan-in with high pipeline throughput. A multiphase clocking scheme may be unsuitable for high-speed single cycle processors because of several heavily loaded global clock signals that have high power transition activity.additionally,race conditions and a heavily constrained clock jitter margin may make this design unsuitable for wide range comparators. An alternative architecture leverages priority-encoder magnitude decision logic with two pipelined operations that are triggered at both the falling and rising clock edges to improve operating speed and eliminate long dynamic logic chains. However, 64-b and wider comparators require an multilevel cascade structure, with each logic level consisting of seven nmos transistors connected in ISSN: 2278 909X All Rights Reserved 2014 IJARECE 1294

series that behave in saturating mode during operation. This structure leads to a large overall conductive resistance, with heavily loaded parasitic components on the clock signal, which severely limits the clock speed and jitter margin. Other architectures use a multiplexer-based structure to spilt a 64-b comparator into two comparator stages: the first stage consists of eight modules performing 8-b comparisons and the modules outputs are input into a priority encoder and the second stage uses an 8-to -1 multiplexer to select the appropriate result from the eight modules in the first stage. To reduce the long delays suffered by bitwise ripple designs, an enhanced architecture incorporates and algorithm that uses no arithmetic operations. This scheme detects the larger operand by determining which operand possesses the leftmost 1 bit after preencoding, before supplying the operands to bitwise competition logic (BCL) Structure. The BCL structure partitions the operands into 8-b blocks and the result for each block is input into a multiplexer to determine the final comparison decision. Due to this BCL-based design s low transistor count, this design has the potential for low power consumption, but the pre-encoder logic modules preceding the BCL modules limit the maximum achievable operating frequency. In addition, special control logic is needed to enable the BCL units to switch dynamically in a synchronized fashion, thus to increasing the power consumption and reducing the operation frequency. To alleviate some of the drawbacks of previous designs (such as high power consumption,multi cycle computation, custom structures unsuitable for continued technology scaling, long time to market due to irregular VLSI structures, and irregular transistor geometry sizes), in this paper we leverage standard CMOS-cells to architect fast,scalable,wide range,and power-efficient algorithmic comparators with the following key features 1.Reconfigurable arithmetic algorithms with total (input-to-output) hardware realization for both fully custom and standard-cell approaches, improves the longevity of our design and makes our design ideal for technology scaling and short time to market. 2. A novel MSB-to-LSB parallel prefix tree structure, based on a reduce switching paradigm and using parallelism at each level (as opposed to a sequential approach),contributes to the speed and energy efficiency of our design. 3. Use of components built form simple single gatelevel logic,with maximum fan in and fan-out of five and four,respectively,regardless of the comparator bit-width, makes it easy to characterize and accurately model our comparator for arbitrary bitwidths. 4. Use of combinatorial logic, with neither clock gating nor latency delay, enables global partitioning into two main pipelined stages or locally into several pipelined stages based on the number of levels. This flexibility provides area versus performance tradeoffs. Some of the applications are USB Transmitter has been developed into a common code (Generalized USB Transmitter ) which can be reused for developing the complete USB device stack. Some of the Low speed and High-speed USB devices, which are presently in the market are (Optical Mouse, Keyboard, Printer, Scanner, Joy Stick, Memory stick, FlashMemor, Mobiles, Video Cameras). II. PARALLEL PREFIX TREE COMPARATOR The comparison resolution module performs the bitwise comparison asynchronously from left to right, such that the comparison logic s computation is triggered only if all bits of greater significance are equal. The parallel structure encodes the bitwise comparison results into two N-Bit buses, the left bus and the right bus, each of which store the partial comparison result as each of which store the partial comparison result as each bit position is evaluated. The comparison resolution module in Fig (which depicts the high level architecture of our proposed design )is a novel MSB -to- LSB parallel prefix tree structure that performs bitwise comparison of two N- bit operands A and B,denoted as AN-1,AN- ISSN: 2278 909X All Rights Reserved 2014 IJARECE 1295

2,,,,,,,A0 and BN-1,BN-2,,,,,,,,B0,where the subscripts range from N-1 for the MSB to 0 for the LSB. In addition, to reduce switching activities, as soon as a bitwise comparison is not equal, the bitwise comparison of every bit of lower significance is terminated and all such positions are set to zero on both buses,thus, there is never more than one high bit on either bus. The decision module uses two ORnetworks to output the final comparison decision based on separate OR-scans of all of the bits on the left bus (producing the R bit). If LR=00, then A=b, if LR=10 then A>b, if LR=01 then A<B, and LR = 11 is not possible. An 8-b comparison of input operands A=01011101 and B=01101001 is illustrated in fig.in the first step, a parallel prefix tree structure generates the encoded data on the left bus and right bus for each pair of corresponding bits from A and B. decision can be made based on the first three bits evaluated. The parallel prefix structure forces all bits of lesser significance on each bus to 0, regardless of the remaining bit values in the operands. In the second step, the OR-networks perform the bus ORscans, resulting in 0 and 1, respectively, and the final comparison decision is A>B. We partition the structure into five hierarchical prefixing sets, as depicted. Fig 2. Algorithm for 8-bit comparison III.PREFIX TREE ADDERS Parallel-prefix adders, also known as carry-tree adders, pre-compute the propagate and generate signals. These signals are variously combined using the fundamental carry operator (fco). (gl, pl) ο (gr, pr) = (gl + pl gr, pl pr) (1) Fig 1.Block diagram of parallel prefix tree comparator, In this example,a7=0 and B7=o encodes as left7=right7=0,a6=1,and B6=1 encodes as left 6=right 6=0,and A5=0 and B5=1 encodes left5=0 and right5=1. At this point, since the bits are unequal, the comparison terminates and a final comparison Due to associative property of the fco, this operator can be combined in different ways to form various adder structures. For, example the four bit carrylook ahead generator is given by: c4 = (g4, p4) ο [ (g3, p3) ο [(g2, p2) ο (g1, p1)] ] (2) ISSN: 2278 909X All Rights Reserved 2014 IJARECE 1296

A simple rearrangement of the order of operations allows parallel operation, resulting in a more efficient tree structure for this four bit example: c4 = [(g4, p4) ο (g3, p3)] ο [(g2, p2 ) ο (g1, p1)] (3) It is readily apparent that a key advantage of the tree structured adder is that the critical path due to the carry delay is on the order of log2n for an N-bit wide adder.the arrangement of the prefix network give s rise to various families of adders. For a discussion of the various carry tree structures. For this study, the focus is on the Kogge-stone adder,known for having minimal logic depth and fan out (see fig ) Here we designate BC as the black cell which generates the ordered pair in equation (1) ;the grey cell (GC)generates the left signal only, following. The interconnect area is known to be high, but for an FPGA with large routing overhead to begin with, this is not as important as in a VLSI implementation. The regularity of the Kogge-stone prefix network has built in redundancy which has implications for faulttolerant designs. The sparse Kogge-stone adder, shown in Fig 1(b), is also studied. This hybrid design completes the summation process with a 4bit RCA allowing the carry prefix network to be simplified. Fig 4. Spanning tree carry-lookahead Adder IV.SIMULATION AND SYNTHESIS REPORT FOR COMPARATOR Simulation Results: Fig 3. Parallel prefix Carry tree adder design Another carry-tree adder known as the spanning tree carry-look ahead (CLA) adder is also examined. Like the sparse Kogge-stone adder, this design terminates with a 4-bit RCA.As the FPGA uses a fast carrychain for the RCA, it is interesting to compare the performance of this adder with the sparse Kogge- Stone and regular Kogge-Stone adders. Also of interest for the spanning tree CLA is its testability features. ISSN: 2278 909X All Rights Reserved 2014 IJARECE 1297

Synthesis Report: Selected Devices : 3s1600efg320-5 Number of Slices :113 out of 14752 Number of Slice FFs :97 out of 29504 Number of 4 input LUTs :208 out of 29504 Number of Ios : 69 Number of bonded IOBs : 69 out of 250 Number of GCLKs : 1 out of 24 Max combinational path delay : No path delay VI.CONCLUSION A 32-bit parallel prefix tree comparator and adder have been designed, verified functionally using VHDL-simulator, synthesized by the synthesis tool, and a final net list has been created. This design is capable of comparing all the equality, greater and lesser comparisons. Because of the structure of the configurable logic and routing resources in FPGAs, parallel-prefix adders will have a different performance than VLSI implementations. The Functional-simulation has been successfully carried out with the results matching with the expected ones. Total memory usage : 217004kilobyte VII.FUTURESCOPE V.SIMULATION AND SYNTHESIS REPORT FOR ADDER Simulation Results: Future work will include additional circuit optimizations to further reduce the power dissipation by adapting dynamic and analog implementations for the comparator resolution module and high-speed zero-detector circuit for the decision module. Given that our comparator is composed of two balance timing modules, the structure can be divided into two or more pipeline stages with balanced delays, based on a set structure, to effectively increase the comparison throughput at the expense of increased power and latency. VIII.REFERENCES Synthesis Report: Selected Devices : 3s1600efg320-5 Number of Slices : 37 out of 14752 Number of 4 input LUTs : 64 out of 29504 Number of Ios : 98 Number of bonded IOBs : 98 out of 250 Max combinational path delay : 38.041ns Total memory usage : 210860kilobyte 1. F.Frustaci, S.Perri, M.Lanuzza and P.Carsonello, Energy-efficient single clock cycle binary comparator,int.j.circuit Theory Appl.,vol 40,no.3,pp.237-246, March.2012. 2. B. Parhami, Efficient hamming weight comparators for binary vectors based on accumulative and up/down parallel counters, IEEE Trans. Circuits Syst., vol. 56, no. 2, pp. 167 171, Feb. 2009. 3. M. S. Schmookler and K. J. Nowka, Leading zero anticipation and detection a comparison of methods, in Proc. 15th IEEE Symp. Compute. Arithmetic, Sep. 2001, pp. 7 12. 4. Basic VLSI design, 3 rd Edition by Douglas A. Pucknell, and Kamran Eshraghian. ISSN: 2278 909X All Rights Reserved 2014 IJARECE 1298

IX ABOUT AUTHOR (S) Tulluri Jyostna was born in Nandigama, AndhraPradesh. She had done her B. Tech from MIC College of Technology located near Vijayawada in Electronics Communications Engineering. After completion of B. Tech., she has undergone a training PG diploma in industrial automation in Hyderabad. She presently pursuing M.Tech (VLSI System Design) in Geethanjali college of engineering and Technology,Cheeryal village,ghatkesar (mandal) Telangana. 2. Mrs.J.Mrudula, She is presently pursuing her PhD and she working as a Associate professor in Geethanjali college of Engineering and Technology, Cheeryal village, Ghatkesar (mandal),telangana ISSN: 2278 909X All Rights Reserved 2014 IJARECE 1299