Binary Adder Architectures for Cell-Based VLSI and their Synthesis


Diss. ETH No.

Binary Adder Architectures for Cell-Based VLSI and their Synthesis

A dissertation submitted to the SWISS FEDERAL INSTITUTE OF TECHNOLOGY ZURICH for the degree of Doctor of technical sciences

presented by RETO ZIMMERMANN, Dipl. Informatik-Ing. ETH, citizen of Vechigen BE

accepted on the recommendation of Prof. Dr. W. Fichtner, examiner, and Prof. Dr. L. Thiele, co-examiner

1997

Acknowledgments

I would like to thank my advisor, Prof. Wolfgang Fichtner, for his overall support and for his confidence in me and my work. I would also like to thank Prof. Lothar Thiele for reading and co-examining the thesis.

I am greatly indebted to Hubert Kaeslin and Norbert Felber for their encouragement and support during the work as well as for proofreading and commenting on my thesis. I also want to express my gratitude to all colleagues at the Integrated Systems Laboratory who contributed to the perfect working environment. In particular, I want to thank the secretaries for keeping the administration, Hanspeter Mathys and Hansjörg Gisler the installations, Christoph Wicki and Adam Feigin the computers, and Andreas Wieland the VLSI design tools running.

I want to thank Hanspeter Kunz and Patrick Müller for the valuable contributions during their student projects. Also, I am grateful to Rajiv Gupta, Duncan Fisher, and all other people who supported me during my internship at Rockwell Semiconductor Systems in Newport Beach, CA.

I acknowledge the financial support of MicroSwiss, a Microelectronics Program of the Swiss Government.

Finally, my special thanks go to my parents for their support during my education and for their understanding and tolerance during the last couple of years.

Contents

Acknowledgments
Abstract
Zusammenfassung

1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Goals of this Work
  1.4 Structure of the Thesis

2 Basic Conditions and Implications
  2.1 Arithmetic Operations and Units
    2.1.1 Applications
    2.1.2 Basic arithmetic operations
    2.1.3 Number representation schemes
    2.1.4 Sequential and combinational circuits
    2.1.5 Synchronous and self-timed circuits
    2.1.6 Carry-propagate and carry-save adders
    2.1.7 Implications
  2.2 Circuit and Layout Design Techniques
    2.2.1 Layout-based design techniques
    2.2.2 Cell-based design techniques
    2.2.3 Implications
  2.3 Submicron VLSI Design
    2.3.1 Multilevel metal routing
    2.3.2 Interconnect delay
    2.3.3 Implications
  2.4 Automated Circuit Synthesis and Optimization
    2.4.1 High-level synthesis
    2.4.2 Low-level synthesis
    2.4.3 Data-path synthesis
    2.4.4 Optimization of combinational circuits
    2.4.5 Hardware description languages
    2.4.6 Implications
  2.5 Circuit Complexity and Performance Modeling
    2.5.1 Area modeling
    2.5.2 Delay modeling
    2.5.3 Power measures and modeling
    2.5.4 Combined circuit performance measures
    2.5.5 Implications
  2.6 Summary

3 Basic Addition Principles and Structures
  3.1 1-Bit Adders, (m,k)-Counters
    3.1.1 Half-Adder, (2,2)-Counter
    3.1.2 Full-Adder, (3,2)-Counter
    3.1.3 (m,k)-Counters
  3.2 Carry-Propagate Adders (CPA)
  3.3 Carry-Save Adders (CSA)
  3.4 Multi-Operand Adders
    3.4.1 Array Adders
    3.4.2 (m,2)-Compressors
    3.4.3 Tree Adders
    3.4.4 Remarks
  3.5 Prefix Algorithms
    3.5.1 Prefix problems
    3.5.2 Serial-prefix algorithm
    3.5.3 Tree-prefix algorithms
    3.5.4 Group-prefix algorithms
    3.5.5 Binary addition as a prefix problem
  3.6 Basic Addition Speed-Up Techniques
    3.6.1 Bit-Level or Direct CPA Schemes
    3.6.2 Block-Level or Compound CPA Schemes
    3.6.3 Composition of Schemes

4 Adder Architectures
  4.1 Anthology of Adder Architectures
    4.1.1 Ripple-Carry Adder (RCA)
    4.1.2 Carry-Skip Adder (CSKA)
    4.1.3 Carry-Select Adder (CSLA)
    4.1.4 Conditional-Sum Adder (COSA)
    4.1.5 Carry-Increment Adder (CIA)
    4.1.6 Parallel-Prefix / Carry-Lookahead Adders (PPA / CLA)
    4.1.7 Hybrid Adder Architectures
  4.2 Complexity and Performance Comparisons
    4.2.1 Adder Architectures Compared
    4.2.2 Comparisons Based on Unit-Gate Area and Delay Models
    4.2.3 Comparison Based on Standard-Cell Implementations
    4.2.4 Results and Discussion
    4.2.5 More General Observations
    4.2.6 Comparison Diagrams
  4.3 Summary: Optimal Adder Architectures

5 Special Adders
  5.1 Adders with Flag Generation
  5.2 Adders for Late Input Carry
  5.3 Adders with Relaxed Timing Constraints
  5.4 Adders with Non-Equal Bit Arrival Times
  5.5 Modulo Adders
    5.5.1 Addition Modulo (2^n - 1)
    5.5.2 Addition Modulo (2^n + 1)
  5.6 Dual-Size Adders
  5.7 Related Arithmetic Operations
    5.7.1 2's Complement Subtractors
    5.7.2 Incrementers / Decrementers
    5.7.3 Comparators

6 Adder Synthesis
  6.1 Introduction
  6.2 Prefix Graphs and Adder Synthesis
  6.3 Synthesis of Fixed Parallel-Prefix Structures
    6.3.1 General Synthesis Algorithm
    6.3.2 Serial-Prefix Graph
    6.3.3 Sklansky Parallel-Prefix Graph
    6.3.4 Brent-Kung Parallel-Prefix Graph
    6.3.5 1-Level Carry-Increment Parallel-Prefix Graph
    6.3.6 2-Level Carry-Increment Parallel-Prefix Graph
  6.4 Synthesis of Flexible Parallel-Prefix Structures
    6.4.1 Introduction
    6.4.2 Parallel-Prefix Adders Revisited
    6.4.3 Optimization and Synthesis of Prefix Structures
    6.4.4 Experimental Results and Discussion
    6.4.5 Parallel-Prefix Schedules with Resource Constraints
  6.5 Validity and Verification of Prefix Graphs
    6.5.1 Properties of the Prefix Operator
    6.5.2 Generalized Prefix Problem
    6.5.3 Transformations of Prefix Graphs
    6.5.4 Validity of Prefix Graphs
    6.5.5 Irredundancy of Prefix Graphs
    6.5.6 Verification of Prefix Graphs
  6.6 Summary

7 VLSI Aspects of Adders
  7.1 Verification of Parallel-Prefix Adders
    7.1.1 Verification Goals
    7.1.2 Verification Test Bench
  7.2 Transistor-Level Design of Adders
    7.2.1 Differences between Gate- and Transistor-Level Design
    7.2.2 Logic Styles
    7.2.3 Transistor-Level Arithmetic Circuits
    7.2.4 Existing Custom Adder Circuits
    7.2.5 Proposed Custom Adder Circuit
  7.3 Layout of Custom Adders
  7.4 Library Cells for Cell-Based Adders
    7.4.1 Simple Cells
    7.4.2 Complex Cells
  7.5 Pipelining of Adders
  7.6 Adders on FPGAs
    7.6.1 Coarse-Grained FPGAs
    7.6.2 Fine-Grained FPGAs

8 Conclusions

Bibliography

Curriculum Vitae

Abstract

The addition of two binary numbers is the fundamental and most often used arithmetic operation on microprocessors, digital signal processors (DSP), and data-processing application-specific integrated circuits (ASIC). Therefore, binary adders are crucial building blocks in very large-scale integrated (VLSI) circuits. Their efficient implementation is not trivial because a costly carry-propagation operation involving all operand bits has to be performed.

Many different circuit architectures for binary addition have been proposed over the last decades, covering a wide range of performance characteristics. Also, their realization at the transistor level for full-custom circuit implementations has been addressed intensively. However, the suitability of adder architectures for cell-based design and hardware synthesis, both prerequisites for the ever increasing productivity in ASIC design, has hardly been investigated.

Based on the various speed-up schemes for binary addition, a comprehensive overview and a qualitative evaluation of the different existing adder architectures are given in this thesis. In addition, a new multilevel carry-increment adder architecture is proposed. It is found that the ripple-carry, the carry-lookahead, and the proposed carry-increment adders show the best overall performance characteristics for cell-based design.

These three adder architectures, which together cover the entire range of possible area vs. delay trade-offs, are contained in the more general prefix adder architecture reported in the literature. It is shown that this universal and flexible prefix adder structure also allows the realization of various customized adders and of adders fulfilling arbitrary timing and area constraints.

A non-heuristic algorithm for the synthesis and optimization of prefix adders is proposed. It allows the runtime-efficient generation of area-optimal adders for given timing constraints.

Zusammenfassung (Summary)

The addition of two binary numbers is the fundamental and most frequently used arithmetic operation in microprocessors, digital signal processors (DSP), and data-processing application-specific integrated circuits (ASIC). Binary adders are therefore critical components in very large-scale integrated (VLSI) circuits. Their efficient realization is not trivial, since an expensive carry-propagation operation has to be carried out.

A multitude of different circuit architectures for binary addition have been proposed over the last decades, exhibiting very different characteristics. In addition, their circuit realization at the transistor level has already been treated in depth. On the other hand, the suitability of adder architectures for cell-based design techniques and for automatic circuit synthesis, both basic prerequisites for the high productivity gains in ASIC design, has hardly been investigated so far.

Based on the manifold speed-up techniques for binary addition, this work gives a comprehensive overview and a qualitative comparison of the various existing adder architectures. In addition, a new multilevel carry-increment adder architecture is proposed. It is shown that the ripple-carry, the carry-lookahead, and the proposed carry-increment adders exhibit the best characteristics for cell-based circuit design.

These three adder architectures, which together cover the entire range of possible trade-offs between circuit area and delay, are contained in the more general prefix adder architecture described in the literature. It is shown that this universal and flexible prefix adder structure enables the realization of a wide variety of specialized adders with arbitrary timing and area requirements.

A non-heuristic algorithm for the synthesis and timing optimization of prefix adders is proposed. It allows the computationally efficient generation of area-optimal adders under given delay requirements.

1 Introduction

1.1 Motivation

The core of every microprocessor, digital signal processor (DSP), and data-processing application-specific integrated circuit (ASIC) is its data path. It is often the crucial circuit component if die area, power dissipation, and especially operation speed are of concern. At the heart of data-path and addressing units in turn are arithmetic units, such as comparators, adders, and multipliers. Finally, the basic operation found in most arithmetic components is the binary addition. Besides the simple addition of two numbers, adders are also used in more complex operations like multiplication and division. Simpler operations like incrementation and magnitude comparison are also based on binary addition.

Therefore, binary addition is the most important arithmetic operation. It is also a very critical one if implemented in hardware because it involves an expensive carry-propagation step, the evaluation time of which depends on the operand word length. The efficient implementation of the addition operation in an integrated circuit is a key problem in VLSI design.

Productivity in ASIC design is constantly improved by the use of cell-based design techniques, such as standard cells, gate arrays, and field-programmable gate arrays (FPGA), and by low- and high-level hardware synthesis. This calls for adder architectures which result in efficient cell-based

circuit realizations which can easily be synthesized. Furthermore, they should provide enough flexibility in order to accommodate custom timing and area constraints as well as to allow the implementation of customized adders.

1.2 Related Work

Much work has been done and many publications have been written on circuit architectures for binary addition. Different well-known adder architectures are widely used and can be found in any book on computer arithmetic [Kor9, Cav, Spa, Hwa79, Zm97]. Many adder circuit implementations at the transistor level are reported in the literature which use a variety of different adder architectures and combinations thereof [D+9, G+9, M+9, OV9, O+9, M+9].

On the other hand, a systematic overview of the basic addition speed-up techniques with their underlying concepts and relationships can hardly be found. This, however, is a prerequisite for optimal adder implementations and versatile synthesis algorithms. Furthermore, optimality of adder architectures for cell-based designs has not been investigated intensively, and comprehensive performance comparisons have been carried out only marginally [Tya9].

Most work so far has focused on the standard two-operand addition. The efficient realization of customized adders, such as adders with flag generation, non-uniform signal arrival times [Okl9], fast carry-in processing, modulo [ENK9] and dual-size adders, has not been considered widely.

Finally, the synthesis of adder circuits has been addressed only marginally up to now. This is because the generation of fixed adder architectures is rather straightforward and because no efficient synthesis algorithms for flexible adder architectures were known. Exceptions are some publications on the computation of optimal block sizes, e.g. for carry-skip adders [Tur9], and on heuristic algorithms for the optimization of parallel-prefix adders [Fs9, GBB9].

1.3 Goals of this Work

As a consequence, the following goals have been formulated for this work:

- Establish an overview of the basic addition speed-up schemes, their characteristics, and their relationships.
- Derive all possible adder architectures from the above speed-up schemes and compare them qualitatively and quantitatively with focus on cell-based circuit implementation, suitability for synthesis, and realization of customized adders.
- Try to unify the different adder architectures as much as possible in order to come up with more generic adder structures. The ideal solution would be a flexible adder architecture covering the entire range of possible area-delay trade-offs with minor structural changes.
- Elaborate efficient and versatile synthesis algorithms for the best-performing adder architectures found in the above comparisons. The ideal solution would consist of one universal algorithm for a generic adder architecture, which automatically takes into account arbitrary timing and area constraints.
- Incorporate the realization and generation of customized adders into the above adder architectures and synthesis algorithms.
- Address other important VLSI aspects, such as circuit verification, layout topologies, and pipelining, for the chosen adder architectures.

1.4 Structure of the Thesis

As a starting point, the basic conditions and their implications are summarized in Chapter 2. It is substantiated why cell-based combinational carry-propagate adders and their synthesis are important in VLSI design and thus worthwhile to be covered by this thesis.

Chapter 3 introduces the basic addition principles and structures. This includes 1-bit and multi-operand adders as well as the formulation of carry propagation as a prefix problem and its basic speed-up principles.

The different existing adder architectures are described in Chapter 4. In addition, a new carry-increment adder architecture is introduced. Qualitative and quantitative comparisons are carried out and documented on the basis of a unit-gate model and of standard-cell implementations. It is shown that the best-performing adders are all prefix adders.

The implementation of special adders using the prefix adder architecture is treated in Chapter 5.

In Chapter 6, synthesis algorithms are given for the best-performing adder architectures. Also, an efficient non-heuristic algorithm is presented for the synthesis and optimization of arbitrary prefix graphs used in parallel-prefix adders. An algorithm for the verification of prefix graphs is also elaborated.

Various important VLSI aspects relating to the design of adders are summarized in Chapter 7. These include verification, transistor-level design, and layout of adder circuits, library aspects for cell-based adders, pipelining of adders, and the realization of adder circuits on FPGAs.

Finally, the main results of the thesis are summarized and conclusions are drawn in Chapter 8.

2 Basic Conditions and Implications

This chapter formulates the motivation and goals as well as the basic conditions for the work presented in this thesis by answering the following questions:

- Why is the efficient implementation of combinational carry-propagate adders important?
- What will be the key layout design technologies in the future, and why do cell-based design techniques such as standard cells gain more and more importance?
- How does submicron VLSI challenge the design of efficient combinational cell-based circuits?
- What is the current status of high- and low-level hardware synthesis with respect to arithmetic operations and adders in particular?
- Why is hardware synthesis, including the synthesis of efficient arithmetic units, becoming a key issue in VLSI design?
- How can area, delay, and power measures of combinational circuits be estimated early in the design cycle?
- How can the performance and complexity of adder circuits be modeled by taking into account architectural, circuit, layout, and technology aspects?

Although some of the following aspects can be stated for VLSI design in general, the emphasis will be on the design of arithmetic circuits.

2.1 Arithmetic Operations and Units

The tasks of a VLSI chip, whether as application-specific integrated circuit (ASIC) or as general-purpose microprocessor, are the processing of data and

the control of internal or external system components. This is typically done by algorithms which are based on logic and arithmetic operations on data items.

2.1.1 Applications

Applications of arithmetic operations in integrated circuits are manifold. Microprocessors and digital signal processors (DSPs) typically contain adders and multipliers in their data path, forming dedicated integer and/or floating-point units and multiply-accumulate (MAC) structures. Special circuit units for fast division and square-root operations are sometimes included as well. Adders, incrementers/decrementers, and comparators are arithmetic units often used for address calculation and flag generation purposes in controllers.

Application-specific ICs use arithmetic units for the same purposes. Depending on their application, they may even require dedicated circuit components for special arithmetic operators, such as for finite field arithmetic used in cryptography, error correction coding, and signal processing.

2.1.2 Basic arithmetic operations

The arithmetic operations that can be computed in electronic equipment are (ordered by increasing complexity, see Fig. 2.1) [Zm97]:

- shift / extension operations
- equality and magnitude comparison
- incrementation / decrementation
- complementation (negation)
- addition / subtraction
- multiplication
- division
- square root
- exponentiation
- logarithmic functions
- trigonometric and inverse trigonometric functions
- hyperbolic functions

[Figure 2.1: Dependencies of arithmetic operations. The diagram orders the fixed-point operations by complexity, from shifts (<<, >>) and comparisons (=, <) through addition to sqrt(x), exp(x), log(x), trig(x), and hyp(x), and marks which operations are based on or related to which others; the same structure applies to floating-point numbers.]

For trigonometric and logarithmic functions as well as exponentiation, various iterative algorithms exist which make use of simpler arithmetic operations. Multiplication, division, and square-root extraction can be performed using serial or parallel methods. In both methods, the computation is reduced to a sequence of conditional additions/subtractions and shift operations.
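The reduction of multiplication to conditional additions and shifts can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `shift_add_multiply` and its parameters are assumptions chosen for illustration, and it models the serial method for unsigned operands only.

```python
def shift_add_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit integers with conditional adds and shifts.

    Hypothetical helper for illustration; models the serial method only.
    """
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit integers")
    product = 0
    for i in range(n):
        # Bit i of the multiplier decides whether an addition takes place.
        if (b >> i) & 1:
            # The multiplicand is shifted to the weight of bit i, then added.
            product += a << i
    return product
```

Each loop iteration performs at most one addition, so an n-bit multiplication costs up to n additions. This is why the speed of the underlying adder matters for the more complex operations as well.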
Existing speed-up techniques try to reduce the number of required addition/subtraction operations and to improve their speed. Subtraction corresponds to the addition of a negated operand.

The addition of two n-bit numbers itself can be regarded as an elementary operation. In fact, decomposition into a series of increments and shifts is possible but of no relevance. The algorithm for complementation (negation)

of a number depends on the chosen number representation, but is usually accomplished by bit inversion and incrementation. Incrementation and decrementation are simplified additions with one input operand being constantly 1 or -1. Equality and magnitude comparison operations can also be regarded as simplified additions, where only some of the respective addition flags, but no sum bits, are used as outputs. Finally, shifts by a constant number of bits and extension operations, as used in some of the above more complex arithmetic functions, can be accomplished by appropriate wiring and thus require no additional hardware.

This short overview shows that addition is the key arithmetic operation, on which most other operations are based. Its implementation in hardware is therefore crucial for the efficient realization of almost every arithmetic unit in VLSI, in terms of circuit size, computation delay, and power consumption.

2.1.3 Number representation schemes

The representation of numbers and the hardware implementation of arithmetic units are strongly dependent on each other. On one hand, each number representation requires dedicated computation algorithms. On the other hand, efficient circuit realizations may ask for adequate number representations.

Only fixed-point number representations are considered in this thesis. This is justified since arithmetic operations on floating-point numbers are accomplished by applying various fixed-point operations on mantissa and exponent. Moreover, fixed-point numbers are reduced to integers herein, since every integer can be considered as a fraction multiplied by a constant factor.

Binary number systems

The radix-2 or binary number system is the most widely used number representation, which is due to its implementation efficiency and simplicity in digital circuit design. An n-bit number is represented as $A = (a_{n-1}, a_{n-2}, \ldots, a_1, a_0)$, where $a_i \in \{0, 1\}$. The following representations for unsigned and signed fixed-point numbers are used:

Unsigned numbers are used for the representation of positive integers (i.e., natural numbers).
  Value: $A = \sum_{i=0}^{n-1} a_i 2^i$
  Range: $[0, 2^n - 1]$

Two's complement is the standard representation of signed numbers.
  Value: $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$
  Range: $[-2^{n-1}, 2^{n-1} - 1]$
  Complement: $-A = 2^n - A = \bar{A} + 1$, where $\bar{A} = (\bar{a}_{n-1}, \bar{a}_{n-2}, \ldots, \bar{a}_1, \bar{a}_0)$
  Sign: $a_{n-1}$
  Properties: asymmetric range (i.e., $2^{n-1}$ negative numbers, $2^{n-1} - 1$ positive numbers), compatible with unsigned numbers in most arithmetic operations.

One's complement is a representation similar to two's complement.
  Value: $A = -a_{n-1} (2^{n-1} - 1) + \sum_{i=0}^{n-2} a_i 2^i$
  Range: $[-(2^{n-1} - 1), 2^{n-1} - 1]$
  Complement: $-A = 2^n - A - 1 = \bar{A}$
  Sign: $a_{n-1}$
  Properties: double representation of zero, symmetric range, modulo $(2^n - 1)$ number system.

Sign magnitude is an alternative representation of signed numbers. Here, the bits $a_{n-2}, a_{n-3}, \ldots, a_0$ are the true magnitude.
  Value: $A = (-1)^{a_{n-1}} \sum_{i=0}^{n-2} a_i 2^i$
  Range: $[-(2^{n-1} - 1), 2^{n-1} - 1]$
  Complement: $-A = (\bar{a}_{n-1}, a_{n-2}, \ldots, a_1, a_0)$
  Sign: $a_{n-1}$
  Properties: double representation of zero, symmetric range.

Due to their advantages and widespread use, the unsigned and two's complement signed number representations will be considered throughout the thesis.
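The two's complement value and complement rules can be checked with a small sketch. The helper names `to_twos_complement`, `from_twos_complement`, and `negate` are hypothetical and not from the thesis; bit vectors are modeled as non-negative Python integers of width n.

```python
def to_twos_complement(value: int, n: int) -> int:
    """Encode a signed integer as an n-bit two's complement bit pattern."""
    if not -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1:
        raise ValueError("value outside the n-bit two's complement range")
    return value & ((1 << n) - 1)


def from_twos_complement(bits: int, n: int) -> int:
    """Decode: the sign bit has weight -2^(n-1), the other bits are positive."""
    sign = (bits >> (n - 1)) & 1
    lower = bits & ((1 << (n - 1)) - 1)
    return -sign * (1 << (n - 1)) + lower


def negate(bits: int, n: int) -> int:
    """Complement by bit inversion followed by incrementation, modulo 2^n."""
    mask = (1 << n) - 1
    return ((bits ^ mask) + 1) & mask
```

For n = 4, `negate(0b0011, 4)` gives `0b1101`, which decodes to -3. Negating `0b1000` (the value -8) returns `0b1000` again, a direct consequence of the asymmetric range noted in the list above.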

Redundant number systems

Some redundant number systems exist, which e.g. allow for speeding up arithmetic operations [Kor9]. In redundant number systems, the number of representable digits is larger than the radix, thus allowing for multiple representations of the same number.

Carry-save is the redundant representation of the result when adding up three numbers without carry propagation (i.e., the individual carry bits are saved for later carry propagation). A carry-save number consists of two numbers, one containing all carry bits and the other all sum bits.

Delayed-carry or half-adder form [LJ96] is the corresponding representation when adding up only two numbers.

Signed-digit is a redundant number system which makes use of the digit set $\{-1, 0, 1\}$.

The carry-save number representation plays an important role in multi-operand adders (see Sec. 3.4). Otherwise, redundant number systems are of no concern in carry-propagate adders, since they are used precisely to avoid carry propagation.

Residue number systems

Residue number systems (RNS) do not use a fixed radix for all digits, but are constructed from a set of different residues, so that each digit has a different radix [Kor9]. Arithmetic operations in RNS can be computed on each digit independently and in parallel. The resulting speed-up is considerable, but conversion from and to conventional number systems is very expensive. The individual operations performed on each single digit are done using normal or modular integer arithmetic, and again mainly additions. The investigations on efficient integer addition in this thesis thus also become important for RNS.

2.1.4 Sequential and combinational circuits

Many arithmetic operations can be realized as combinational or sequential circuits. Bit-serial or pipelined adders are examples of sequential adder circuits. However, since adder architectures deal with speeding up carry-propagation logic, only combinational adder implementations are covered in this thesis.

2.1.5 Synchronous and self-timed circuits

The realization of a sequential circuit can be done in a synchronous or a self-timed asynchronous fashion, which also influences the implementation of the combinational circuits. In particular, self-timed combinational circuits have to provide completion signals, which are not trivial to generate. As a matter of fact, synchronous circuit techniques are standard in the VLSI design community.

However, adders are very appealing for self-timed realization since they have a short average carry-propagation length (i.e., $O(\log n)$) [GO96]. Because the simplest adder architecture, namely the ripple-carry adder, takes most advantage of self-timed implementation, a further study of adder architectures for self-timed circuit realization makes little sense.

2.1.6 Carry-propagate and carry-save adders

Addition is a prefix problem (see Sec. 3.5), which means that each result bit depends on all input bits of equal or lower magnitude. Propagation of a carry signal from each bit position to all higher bit positions is necessary. Carry-propagate adders perform this operation immediately. The required carry propagation from the least to the most significant bit results in a considerable circuit delay, which is a function of the word length of the input operands.

The most efficient way to speed up addition is to avoid carry propagation, thus saving the carries for later processing. This allows the addition of two or more numbers in a very short time, but yields results in a redundant (carry-save) number representation.

Carry-save adders as the most commonly used redundant arithmetic

adders play an important role in the efficient implementation of multi-operand addition circuits. They are very fast due to the absence of any carry-propagation paths, and their structure is very simple, but the potential for further optimization is minimal. The same holds for signed-digit adders, which use a slightly different redundant number representation. The addition results, however, usually have to be converted into an irredundant integer representation in order to be processed further. This operation is done using a carry-propagate adder.

2.1.7 Implications

As we have seen so far, the combinational, binary carry-propagate adder is one of the most often used and most crucial building blocks in digital VLSI design. Various well-known methods exist for speeding up carry propagation in adders, offering very different performance characteristics, advantages, and disadvantages. A lack of understanding of the basic concepts and relationships often leads to suboptimal adder implementations. One goal of this thesis is the systematic investigation and performance comparison of all existing adder architectures as well as their optimization with respect to cell-based design technologies.

2.2 Circuit and Layout Design Techniques

IC fabrication technologies can be classified into full-custom, semi-custom, and programmable ICs, as summarized in Table 2.1 (taken from [Kae97]). Further distinctions are made with respect to circuit design techniques and layout design techniques, which are strongly related.

Table 2.1: IC classification scheme based on fabrication depth and design level.

  Type of integrated circuit                  Fabrication depth          Design level
  Programmable IC (PLD, FPGA, CPLD, etc.)     Programming only           Cell-based
  Gate-array or sea-of-gates IC               Semi-custom fabrication    Cell-based
  Standard-cell IC (possibly also with        Full-custom fabrication    Cell-based
    macrocells and megacells)
  Full-custom IC                              Full-custom fabrication    Hand layout

  Cell-based designs are obtained from schematic entry and/or synthesis.

2.2.1 Layout-based design techniques

In layout-based design techniques, dedicated full-custom layout is drawn manually for circuits designed at the transistor level. The initial design effort is very high, but maximum circuit performance and layout efficiency are achieved. Full-custom cells are entirely designed by hand for dedicated high-performance units, e.g., arithmetic units. The tiled-layout technique can be used to simplify, automate, and parameterize the layout task. For reuse purposes, the circuits and layouts are often collected in libraries together with automatic generators.

Mega-cells are full-custom cells for universal functions which need no parameterization, e.g., microprocessor cores and peripherals. Macro-cells are used for large circuit components with regular structure and need for word-length parameterization, e.g., multipliers, ROMs, and RAMs. Data paths are usually realized in a bit-sliced layout style, which allows parameterization of word length (first dimension) and concatenation of arbitrary data-path elements (second dimension) for logic, arithmetic, and storage functions. Since adders are too small to be implemented as macro-cells, they are usually realized as data-path elements.

2.2.2 Cell-based design techniques

At a higher level of abstraction, arbitrary circuits can be composed from elementary logic gates and storage elements contained in a library of pre-designed cells. The layout is automatically composed from corresponding layout cells using dedicated layout strategies, depending on the IC technology used. Cell-based design techniques are used in standard-cell, gate-array, sea-of-gates, and field-programmable gate-array (FPGA) technologies. The design of logic circuits does not differ considerably among the different cell-based IC technologies. Circuits are obtained from either schematic entry, behavioral synthesis, or circuit generators (i.e., structural synthesis). Due to the required generic properties of the cells, more conventional logic styles have to be used for their circuit implementation.

The advantages of cell-based design techniques lie in their universal usage, automated synthesis and layout generation for arbitrary circuits, portability between tools and libraries, high design productivity, high reliability, and high flexibility in floorplanning. This comes at the price of lower circuit performance with respect to speed and area. Cell-based design techniques are mainly used for the implementation of random logic (e.g., controllers) and custom circuits for which no appropriate library components are available and custom implementation would be too costly. Cell-based design techniques are widely used in the ASIC design community.

Standard cells

Standard cells represent the highest-performance cell-based technology. The layout of the cells is full-custom, which mandates full-custom fabrication of the wafers. This in turn enables the combination of standard cells with custom-layout components on the same die. For layout generation, the standard cells are placed in rows and connected through intermediate routing channels. With the increasing number of routing layers and over-the-cell routing capabilities in modern process technologies, the layout density of standard cells gets close to the density obtained from full-custom layout. The remaining drawback is the restricted use of high-performance (transistor-level) circuit techniques.

Gate-arrays and sea-of-gates

On gate-arrays and sea-of-gates, preprocessed wafers with unconnected circuit elements are used. Thus, only the metalization used for the interconnect is customized, resulting in lower production costs and faster turnaround times. Circuit performance and layout flexibility are lower than for standard cells, which in particular decreases the implementation efficiency of regular structures such as macro-cells.

FPGAs

Field-programmable gate-arrays (FPGA) are electrically programmable generic ICs. They are organized as an array of logic blocks and routing channels, and the configuration is stored in a static memory or programmed, e.g., using antifuses. Again, a library of logic cells and macros allows flexible and efficient design of arbitrary circuits. Turnaround times are very fast, making FPGAs the ideal solution for rapid prototyping. On the other hand, low circuit performance, limited circuit complexity, and high die costs severely limit their area of application.

Implications

In the field of high-performance IC design, where layout-based and transistor-level design techniques are applied, much research effort has been invested in the realization of efficient adder circuits, and many different implementations have been proposed. Efficient adder implementations for cell-based design, however, have hardly been addressed so far. Here, the issues to be investigated are technology mapping, cell library properties, routing, synthesis, and portability aspects. The widespread use of cell-based design techniques justifies a closer inspection of the efficient circuit implementation of addition and related arithmetic operations.

Submicron VLSI Design

With evolving process technologies, feature sizes well below one micrometer become standard. These submicron technologies offer smaller and faster circuit structures at lower supply voltages, resulting in considerably faster and more complex ICs with a lower power dissipation per gate. Changing physical characteristics, however, strongly influence circuit design. Increasing gate densities and clocking frequencies lead to higher power densities, making low power an important issue in order to be able to dissipate the high energy of large chips.

Multilevel metal routing

As processes with three and more metalization levels become available, routing densities increase massively. Over-the-cell routing eliminates the drawback of area-consuming routing channels in cell-based technologies, yielding layout densities comparable to custom layout. This also results in a larger amount

of local interconnects (circuit locality), higher layout flexibility, and more efficient automated routers. Standard-cell technologies in particular benefit from these advantages, providing both high design productivity and good circuit and layout performance.

Interconnect delay

The delay of interconnections becomes dominant over switching delays in submicron VLSI. This is because RC delays increase (higher wire resistances at roughly constant capacitances) and wire lengths typically scale with chip size but not with feature size. Therefore, circuit connectivity, locality, and fan-out are becoming important performance optimization criteria.

Implications

Cell-based design techniques take advantage of emerging submicron VLSI technologies, partly approaching the densities and performance of full-custom techniques. Interconnect aspects have to be accounted for, also with respect to the optimality of circuit architectures.

Automated Circuit Synthesis and Optimization

Circuit synthesis denotes the automated generation of logic networks from behavioral descriptions at an arbitrary level. Synthesis is becoming a key issue in VLSI design for many reasons. Increasing circuit complexities, shorter development times, as well as efficient and flexible usage of cell and component libraries can only be handled with the aid of powerful design automation tools. Arithmetic synthesis addresses the efficient mapping of arithmetic functions onto existing arithmetic components and logic gates.

High-level synthesis

High-level synthesis, or behavioral/architectural synthesis, allows the translation of algorithmic or behavioral descriptions of high abstraction level (e.g., by way of data dependency graphs) down to an RTL (register-transfer level) representation, which can be processed further by low-level synthesis tools. The architectural synthesis involved, including resource allocation, resource binding, and scheduling tasks, is far from trivial and is currently researched intensively.
High-level arithmetic synthesis makes use of arithmetic transformations in order to optimize hardware usage under given performance criteria. Thereby, arithmetic library components are regarded as the resources for implementing the basic arithmetic operations.

Low-level synthesis

Low-level synthesis, or logic synthesis, translates an RTL specification into a generic logic network. For random logic, synthesis is achieved by establishing the logic equations for all outputs and implementing them in a logic network.

Data-path synthesis

Efficient arithmetic circuits contain very specific structures of large logic depth and high factorization degree. Their direct synthesis from logic equations is not feasible. Therefore, parameterized netlist generators using dedicated algorithms are used instead. Most synthesis tools include generators for the basic arithmetic functions, such as comparators, incrementers, adders, and multipliers. For other important operations (e.g., squaring, division) and specialized functions (e.g., addition with flag generation, multiplication without final addition), usually no generators are provided and thus synthesis of efficient circuitry is not available. Also, the performance of the commonly used circuit architectures varies considerably, which often leads to suboptimal cell-based circuit implementations.

Optimization of combinational circuits

The optimization of combinational circuits denotes the automated minimization of a logic netlist with respect to area, delay, and power dissipation measures of the resulting circuit, and the technology mapping (i.e., mapping of the logic network onto the set of logic cells provided by the used technology/library). The applied algorithms are very powerful for optimization of random logic by performing steps like flattening, logic minimization, timing-driven factorization, and technology mapping. However, the potential for optimization

is rather limited for networks with large logic depth and high factorization degree, especially arithmetic circuits. There, only local logic minimization is possible, leaving the global circuit architecture basically unchanged. Thus, the realization of well-performing arithmetic circuits relies more on efficient data-path synthesis than on simple logic optimization.

Hardware description languages

Hardware description languages allow the specification of hardware at different levels of abstraction, serving as entry points to hardware synthesis. VHDL, as one of the most widely used and most powerful languages, enables the description of circuits at the behavioral and structural level. In particular, parameterized netlist generators can be written in structural VHDL. Synthesis of arithmetic units is initiated by using the standard arithmetic operator symbols in the VHDL code, for which the corresponding built-in netlist generators are called by the synthesis tool. Basically, the advantages of VHDL over schematic entry lie in the possibility of behavioral hardware description, the parameterizability of circuits, and the portability of code thanks to language standardization.

Implications

Due to their manifold occurrences and flexible usage, arithmetic units form an integral part of automated hardware synthesis for high-productivity VLSI design. The circuit architectures used must be highly flexible and easily parameterizable and must result in simple netlist generators and efficient circuit implementations. Thus, this thesis also focuses on algorithms for the synthesis of adder circuits and investigates the suitability of various adder architectures with respect to netlist synthesis and optimization.

Circuit Complexity and Performance Modeling

One important aspect in design automation is the complexity and performance estimation of a circuit early in the design cycle, i.e., prior to the time-consuming logic synthesis and physical layout phases.
At a higher design level, this is achieved by using characterization information of the high-level components to be used and by complexity estimation of the interconnect. At gate level, however, estimation is more difficult and less accurate because circuit size and performance strongly depend on the gate-level synthesis results and on the physical cell arrangement and routing. For a rough preliminary characterization of adder architectures, we are interested in simple complexity and performance models for gate-level circuits. Given a circuit specified by logic formulae or a generic netlist (i.e., a netlist built from basic logic gates), we need estimations of the expected area, speed, and power dissipation for a compiled cell-based circuit as a function of the operand word length.

Area modeling

Silicon area on a VLSI chip is taken up by the active circuit elements and their interconnections. In cell-based design techniques, the following criteria for area modeling can be formulated:

- Total circuit complexity ($GE_{\mathrm{total}}$) can be measured by the number of gate equivalents (one GE corresponds to one 2-input NAND gate, i.e., four MOSFETs).

- Circuit area ($A_{\mathrm{circuit}}$) is occupied by logic cells and inter-cell wiring. In technologies with three and more metal layers, over-the-cell routing capabilities allow the overlap of cell and wiring areas, as opposed to two-metal technologies. This means that most of the cell area can also be used for wiring, resulting in very low routing area factors. ($A_{\mathrm{circuit}} = A_{\mathrm{cells}} + A_{\mathrm{wiring}}$)

- Total cell area ($A_{\mathrm{cells}}$) is roughly proportional to the number of transistors or gate equivalents ($GE_{\mathrm{total}}$) contained in a circuit. This number is influenced by technology mapping, but not by physical layout. Thus, cell area can be roughly estimated from a generic circuit description (e.g., logic equations or a netlist with simple gates) and can be precisely determined from a synthesized netlist. ($A_{\mathrm{cells}} \propto GE_{\mathrm{total}}$)

- Wiring area ($A_{\mathrm{wiring}}$) is proportional to the total wire length. The exact wire lengths, however, are not known prior to physical layout. ($A_{\mathrm{wiring}} \propto L_{\mathrm{total}}$)

- Total wire length ($L_{\mathrm{total}}$) can be estimated from the number of nodes and the average wire length of a node [Feu, KP9] or, more accurately, from the sum of cell fan-out and the average wire length of cell-to-cell connections (i.e., this accounts for the longer wire length of nodes with higher fan-out). The wire lengths also depend on circuit size, circuit connectivity (i.e., locality of connections), and layout topology, which are not known prior to circuit partitioning and physical layout [RK9]. ($L_{\mathrm{total}} \propto FO_{\mathrm{total}}$)

- Cell fan-out ($FO$) is the number of cell inputs a cell output is driving. Fan-in is the number of inputs to a cell [WE9], which for many combinational gates is proportional to the size of the cell. Since the sum of cell fan-out ($FO_{\mathrm{total}}$) of a circuit is equivalent to the sum of cell fan-in, it is also proportional to circuit size. ($FO_{\mathrm{total}} \propto GE_{\mathrm{total}}$)

Therefore, in a first approximation, cell area as well as wiring area are proportional to the number of gate equivalents. More accurate area estimations before performing actual technology mapping and circuit partitioning are hardly possible. For circuit comparison purposes, the proportionality factor is of no concern. ($A_{\mathrm{circuit}} \propto GE_{\mathrm{total}} \propto FO_{\mathrm{total}}$)

The area estimation model we are interested in must be simple to compute while being as accurate as possible, and it should work from logic equations or generic netlists (i.e., netlists composed of simple logic gates) alone. Considering the above observations, possible candidates are:

Unit-gate area model: This is the simplest and most abstract circuit area model, which is often used in the literature [Tya9]. A unit gate is a basic, monotonic 2-input gate (or logic operation, if logic equations are concerned), such as AND, OR, NAND, and NOR. Basic, non-monotonic 2-input gates like XOR and XNOR are counted as two unit gates, reflecting their higher circuit complexities. Complex gates as well as multi-input basic gates are built from 2-input basic gates, and their gate count equals the sum of the gate counts of the composing cells.

Fan-in area model: In the fan-in model, the size of 2-input and multi-input basic cells is measured by counting the number of inputs (i.e., fan-in). Complex cells are again composed of basic cells with their fan-in numbers summed up, while the XOR/XNOR gates are treated individually. The obtained numbers basically differ from the unit-gate numbers only by an offset of 1 (e.g., the AND gate counts as one unit gate but has a fan-in of two).

Other area models: The two previous models do not account for transistor-level optimization possibilities in complex gates, e.g., in multiplexers and full-adders. More accurate area models need individual gate count numbers for such complex gates. However, some degree of abstraction is sacrificed and application to arbitrary logic equations is not possible anymore. The same holds true for models which take wiring aspects into consideration. One example of a more accurate area model is the gate-equivalents model (GE) mentioned above, which is based on gate transistor counts and therefore is only applicable after synthesis and technology mapping.

Inverters and buffers are not accounted for in the above area models, which makes sense for pre-synthesis circuit descriptions. Note that the biggest differences in buffering costs are found between low fan-out and high fan-out circuits. With respect to area occupation, however, these effects are partly compensated because high fan-out circuits need additional buffering while low fan-out circuits usually have more wiring.

Investigations showed that the unit-gate model approach for the area estimation of complex gates, such as multiplexers and full-adders, does not introduce more inaccuracies than, e.g., the neglect of circuit connectivity for wiring area estimation. With the XOR/XNOR being treated separately, the unit-gate model yields acceptable accuracy at the given abstraction level.
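As an illustration of the unit-gate area model described above, the following minimal Python sketch counts unit gates for a generic netlist. The netlist format (tuples of gate type, input nets, output net), the function name `unit_gate_area`, and the full-adder example are assumptions made for this sketch and are not taken from the thesis. Counting an n-input basic gate as n - 1 two-input gates is one possible reading of the composition rule stated above.

```python
# Unit-gate area model, sketched for a generic netlist.
# Assumed (hypothetical) netlist format: (gate_type, input_nets, output_net).

# Monotonic 2-input gates count as one unit gate, XOR/XNOR as two.
UNIT_GATE_AREA = {"AND": 1, "OR": 1, "NAND": 1, "NOR": 1, "XOR": 2, "XNOR": 2}


def unit_gate_area(netlist):
    """Return the unit-gate count of a netlist built from basic gates.

    Assumption: an n-input basic gate is composed of n - 1 two-input gates
    of the same weight, so its count is (n - 1) times the 2-input weight.
    """
    total = 0
    for gate_type, inputs, _output in netlist:
        if len(inputs) < 2:
            # The area models in the text ignore inverters and buffers.
            raise ValueError("inverters and buffers are not modeled")
        total += UNIT_GATE_AREA[gate_type] * (len(inputs) - 1)
    return total


# Illustrative full adder described with 2-input gates only.
FULL_ADDER = [
    ("XOR", ("a", "b"), "p"),      # propagate signal
    ("XOR", ("p", "cin"), "sum"),  # sum output
    ("AND", ("a", "b"), "g"),      # generate signal
    ("AND", ("p", "cin"), "t"),    # propagated carry
    ("OR", ("g", "t"), "cout"),    # carry output
]

if __name__ == "__main__":
    print(unit_gate_area(FULL_ADDER))  # 2 + 2 + 1 + 1 + 1 = 7
```

Because the count depends only on gate types and input counts, it can be evaluated on logic equations or generic netlists before any technology mapping, which is the property the text asks of the area model.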
Also, the unit-gate model perfectly reflects the structure of logic equations by modeling the basic logic operators individually and by regarding complex logic functions as composed from basic ones. Investigations showed comparable performance for the fan-in and the unit-gate models due to their similarity. After all, the unit-gate model is very commonly used in the literature. Therefore, it is used in this work for area estimations and comparisons from logic circuit specifications. Comparison results of placed and routed standard-cell solutions will follow in a later section.

Delay modeling

Propagation delay in a circuit is determined by the cell and interconnection delays on the critical path (i.e., the longest signal propagation path in a combinational circuit). As opposed to area estimation, it is not average and total numbers that are of interest; individual cell and node values are relevant for path delays. Critical path evaluation is done by static timing analysis, which involves graph-based search algorithms. Of course, timings are also dependent on temperature, voltage, and process parameters which, however, are not of concern for our comparison purposes.

- Maximum delay ($t_{\mathrm{crit\,path}}$) of a circuit is equal to the sum of cell inertial delays, cell output ramp delays, and wire delays on the critical path. ($t_{\mathrm{crit\,path}} = \sum_{\mathrm{crit\,path}} (t_{\mathrm{cell}} + t_{\mathrm{ramp}}) + \sum_{\mathrm{crit\,path}} t_{\mathrm{wire}}$)

- Cell delay ($t_{\mathrm{cell}}$) depends on the transistor-level circuit implementation and the complexity of a cell. All simple gates have comparable delays. Complex gates usually contain tree-like circuit and transistor arrangements, resulting in logarithmic delay-to-area dependencies. ($t_{\mathrm{cell}} \propto \log(A_{\mathrm{cell}})$)

- Ramp delay ($t_{\mathrm{ramp}}$) is the time it takes for a cell output to drive the attached capacitive load, which is made up of interconnect and cell input loads. The ramp delay depends linearly on the capacitive load attached, which in turn depends linearly on the fan-out of the cell. ($t_{\mathrm{ramp}} \propto FO_{\mathrm{cell}}$)

- Wire delay or interconnection delay ($t_{\mathrm{wire}}$) is the RC delay of a wire, which depends on the wire length. RC delays, however, are negligible compared to cell and ramp delays for small circuits such as the adders investigated in this work. ($t_{\mathrm{wire}} = 0$)

Thus, a rough delay estimation is possible by considering sizes and, with a smaller weighting factor, fan-out of the cells on the critical path. ($t_{\mathrm{crit\,path}} \propto \sum_{\mathrm{crit\,path}} (\log(A_{\mathrm{cell}}) + k \, FO_{\mathrm{cell}})$)

Possible delay estimation models are:

Unit-gate delay model: The unit-gate delay model is similar to the unit-gate area model. Again, the basic 2-input gates (AND, OR, NAND, NOR) count as one gate delay, with the exception of the XOR/XNOR gates, which count as two gate delays [Tya9]. Complex cells are composed of basic cells using the fastest possible arrangement (i.e., tree structures wherever possible), with the total gate delay determined accordingly.

Fan-in delay model: As for area modeling, fan-in numbers can be taken instead of unit-gate numbers. Again, no advantages over the unit-gate model are observed.

Fan-out delay model: The fan-out delay model is based on the unit-gate model but incorporates fan-out numbers, thus accounting for gate fan-out numbers and interconnection delays [WT9]. Individual fan-out numbers can be obtained from a generic circuit description. A proportionality factor has to be determined for appropriate weighting of fan-out with respect to unit-gate delay numbers.

Other delay models: Various delay models exist at other abstraction levels. At the transistor level, transistors can be modeled to contribute one unit delay each (see the model in [CSTO9]). At a higher level, complex gates like full-adders and multiplexers can again be modeled separately for higher accuracy [Kan9, CSTO9].

The impact of large fan-out on circuit delay is higher than on area requirements. This is because high fan-out nodes lead to long wires and high capacitive loads and require additional buffering, resulting in larger delays. Therefore, the fan-out delay model is more accurate than the unit-gate model. However, due to the much simpler calculation of the unit-gate delay model and its widespread use, as well as for compatibility with the chosen unit-gate area model, the unit-gate delay model will be used for the circuit comparisons in this work.

As already mentioned, delay calculation for a circuit requires static timing analysis, which corresponds to the search for the longest path in a weighted directed acyclic graph. In our case, false path detection [MB9] is not of importance since false paths do not occur in adder circuits, with one exception, which will be discussed later.

Power measures and modeling

An increasingly important performance parameter for VLSI circuits is power dissipation. Peak power is a problem with respect to circuit reliability (e.g., voltage drop on power buses, ground bounce) which, however, can be dealt with by careful design. On the other hand, average power dissipation is

Footnote: A false path is a signal path in a combinational circuit which cannot be sensitized.
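Returning to the delay models discussed above, the longest-path search that static timing analysis performs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the thesis: the netlist format, the function name `critical_path_delay`, and the full-adder example are hypothetical, and the parameter `k` stands for the fan-out weighting factor whose value the text leaves open. With `k = 0` the sketch reduces to the plain unit-gate delay model.

```python
# Longest-path delay estimate for a combinational netlist (illustrative sketch).
# Assumed (hypothetical) netlist format: (gate_type, input_nets, output_net),
# restricted to 2-input basic gates.

# Unit-gate delays: basic gates count one gate delay, XOR/XNOR two.
UNIT_GATE_DELAY = {"AND": 1, "OR": 1, "NAND": 1, "NOR": 1, "XOR": 2, "XNOR": 2}


def critical_path_delay(netlist, k=0.0):
    """Return the largest arrival time over all gate outputs.

    k is a hypothetical weighting factor applied to the fan-out of each
    gate output; k = 0 gives the unit-gate delay model.
    """
    driver = {out: (gate_type, ins) for gate_type, ins, out in netlist}

    # Fan-out of a net = number of cell inputs it drives.
    fan_out = {}
    for _gate_type, ins, _out in netlist:
        for net in ins:
            fan_out[net] = fan_out.get(net, 0) + 1

    arrival = {}  # memoized arrival times, one entry per gate output

    def arrival_time(net):
        if net not in driver:  # primary inputs arrive at time 0
            return 0.0
        if net not in arrival:
            gate_type, ins = driver[net]
            gate_delay = UNIT_GATE_DELAY[gate_type] + k * fan_out.get(net, 0)
            arrival[net] = gate_delay + max(arrival_time(n) for n in ins)
        return arrival[net]

    return max(arrival_time(out) for out in driver)


# Illustrative full adder described with 2-input gates only.
FULL_ADDER = [
    ("XOR", ("a", "b"), "p"),
    ("XOR", ("p", "cin"), "sum"),
    ("AND", ("a", "b"), "g"),
    ("AND", ("p", "cin"), "t"),
    ("OR", ("g", "t"), "cout"),
]

if __name__ == "__main__":
    print(critical_path_delay(FULL_ADDER))         # unit-gate delay model
    print(critical_path_delay(FULL_ADDER, k=0.5))  # fan-out weighted variant
```

The recursion with memoization visits every net once, which is adequate for small netlists such as adders. The sketch does not attempt false path detection, in line with the remark in the text that false paths are of no importance here.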
