Conditional Speculative Decimal Addition*


Conditional Speculative Decimal Addition
Alvaro Vazquez and Elisardo Antelo
Dep. of Electronic and Computer Engineering, Univ. of Santiago de Compostela, Spain
This work was supported in part by Xunta de Galicia under grant PGIDT03TIC10502PR.
RNC7, 11th July, 2006

Contents
- Introduction
  - Demand of High-Performance Decimal Arithmetic.
  - Revision of the IEEE-754 Standard for Floating Point.
- Previous Work on Integer Decimal Addition.
  - Basic Decimal Addition.
  - Direct Decimal Addition.
  - Speculative Decimal Addition.
- Proposed Method: Conditional Speculative Decimal Addition.
  - Algorithm.
  - Implementations: Parallel Prefix Adders.
    - Binary Carry Tree: Kogge-Stone and Ladner-Fischer.
    - Quaternary Carry Tree.
- Delay-Area Estimations and Comparison.
  - Delay-area model for Static CMOS gates based on Logical Effort.
- Conclusions.


Introduction: Demand of High-Performance Decimal Arithmetic
- Need for hardware support: financial and e-commerce applications.
- Binary floating-point units introduce inaccurate results.
- Decimal software implementations do not satisfy performance demands.
- Reduced hardware support (IBM S/390 series: G4, G5, G6, z900, z990).
- Only the usual decimal integer operations are improved in hardware.
- There are no hardware implementations of decimal floating-point.
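The inexactness of binary floating point for decimal fractions can be seen in a few lines. This is only an illustrative sketch using Python's standard `decimal` module as the software decimal implementation; the amounts are made up for the example.

```python
from decimal import Decimal

# 0.10 and 0.20 have no exact binary floating-point representation,
# so their binary sum is not exactly equal to 0.30.
binary_sum = 0.10 + 0.20
binary_exact = (binary_sum == 0.30)

# The same amounts held as decimal values add exactly.
decimal_sum = Decimal("0.10") + Decimal("0.20")
decimal_exact = (decimal_sum == Decimal("0.30"))

print(binary_exact)   # False
print(decimal_exact)  # True
```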

Introduction: Revision of the IEEE-754 Standard for Floating Point
- The current draft revision of IEEE-754 incorporates specifications for decimal arithmetic.
- Decimal formats: 32-bit, 64-bit and 128-bit.
- Sign, exponent and significand (DPD encoding of BCD).
- Rounding modes and exception handling for binary and decimal.
- Conversions between integer and floating-point formats.
- Operations defined: add, subtract, multiply, fused multiply-add, divide, square root.
- Software and/or hardware implementations.


Previous Work on Integer Decimal Addition: Basic Integer Decimal Addition (High-level)
- Inputs: $A = \sum_{i=0}^{d-1} A_i\,10^i$, $B = \sum_{i=0}^{d-1} B_i\,10^i$, $C_0 = C_{in}$.
- Output: $S = \sum_{i=0}^{d-1} S_i\,10^i$.
- Basic decimal carry propagate recurrence:
  $C_{i+1} = \lfloor (A_i \pm B_i + C_i)/10 \rfloor$
  $S_i = (A_i \pm B_i + C_i) \bmod 10$
- Subtraction: 10's complement of the subtrahend:
  $-B = -10^d + \sum_{i=0}^{d-1} (9 - B_i)\,10^i + 1$
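The carry propagate recurrence and the 10's complement subtraction can be sketched as a digit-serial loop. This is a minimal software model, not a hardware description; the helper names `to_digits`, `from_digits` and `decimal_add` are hypothetical, and digits are stored least significant first.

```python
def to_digits(n, d):
    """Return the d decimal digits of n, least significant digit first."""
    return [(n // 10**i) % 10 for i in range(d)]

def from_digits(digits):
    """Inverse of to_digits."""
    return sum(x * 10**i for i, x in enumerate(digits))

def decimal_add(a_digits, b_digits, subtract=False):
    """Ripple the decimal carry recurrence; returns (sum_digits, carry_out)."""
    carry = 1 if subtract else 0        # the "+1" of the 10's complement
    out = []
    for a, b in zip(a_digits, b_digits):
        if subtract:
            b = 9 - b                   # 9's complement of the subtrahend digit
        total = a + b + carry
        out.append(total % 10)          # S_i
        carry = total // 10             # C_{i+1}
    return out, carry

add_digits, add_carry = decimal_add(to_digits(4711, 4), to_digits(5389, 4))
sub_digits, sub_carry = decimal_add(to_digits(5000, 4), to_digits(1234, 4), subtract=True)
print(from_digits(add_digits), add_carry)  # 100 1   (4711 + 5389 = 10100)
print(from_digits(sub_digits), sub_carry)  # 3766 1  (5000 - 1234 = 3766)
```

In the subtraction case a carry-out of 1 indicates a non-negative result, because the $-10^d$ term of the complement cancels it.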

Previous Work on Integer Decimal Addition: Basic Integer Decimal Addition (BCD)
- Digits represented in BCD-8421 (4 bits/digit).
- Subtraction: $(9 - B_i) = 15 - (B_i + 6) = \overline{B_i + 6}$
- Basic addition/subtraction in BCD:
  $B_i^* = \overline{B_i + 6}$ if (op == sub), $B_i$ else
  $S_i^* = A_i + B_i^* + C_i$
  $C_{i+1} = \lfloor (S_i^* + 6)/16 \rfloor$
  $S_i = (S_i^* + 6) \bmod 16$ if ($C_{i+1} == 1$), $S_i^*$ else
- Problem: carry chain. Improve the delay of the carry propagate recurrence.
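One BCD digit position of this scheme can be modelled with integer operations on 4-bit values. The sketch below is an illustration under the assumption that the digit sum is formed in binary first and corrected by +6 only when a decimal carry is produced; `bcd_digit_add` is a hypothetical name.

```python
def bcd_digit_add(a, b, carry_in, subtract=False):
    """One BCD-8421 digit position. Returns (sum_digit, carry_out)."""
    if subtract:
        b = (~(b + 6)) & 0xF        # bitwise inverse of B+6 equals 9 - B
    s_star = a + b + carry_in       # plain binary sum, range 0..19
    carry_out = (s_star + 6) >> 4   # floor((S* + 6) / 16)
    if carry_out:
        return (s_star + 6) & 0xF, 1   # (S* + 6) mod 16
    return s_star, 0

print(bcd_digit_add(7, 5, 0))                 # (2, 1)  since 7 + 5 = 12
print(bcd_digit_add(4, 3, 1, subtract=True))  # (1, 1)  since 4 + (9 - 3) + 1 = 11
```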

Previous Work on Integer Decimal Addition: Direct Decimal Addition
- Uses the following decimal carry recurrence: $C_{i+1} = G_i + \overline{K_i}\,C_i$
  - $G_i$: decimal carry generate, true when $A_i + B_i \geq 10$.
  - $K_i$: decimal carry kill, true when $A_i + B_i < 9$.
- Can be evaluated using conventional parallel carry evaluation techniques:
  - Carry lookahead.
  - Parallel prefix.
  - Quaternary carry tree: 1 decimal carry per 4 bits.

Previous Work on Integer Decimal Addition: Direct Decimal Addition
- $G_i$ and $K_i$ can be expressed in terms of the binary $g_i[j]$ and $k_i[j]$ ($i = 0, \ldots, d-1$; $j = 0, 1, 2, 3$):
  $G_i = G_i^* + \overline{K_i^*}\,g_i[0]$
  $K_i = K_i^* + \overline{G_i^*}\,k_i[0]$
  $G_i^* = g_i[3] + g_i[2]\,\overline{k_i[1]} + \overline{k_i[3]}\,(\overline{k_i[2]\,k_i[1]})$
  $\overline{K_i^*} = \overline{k_i[3]} + g_i[2] + \overline{k_i[2]}\,g_i[1]$
- $g_i[j]$ and $k_i[j]$ are the inputs of the quaternary carry tree.
- $G_i$ and $K_i$ are evaluated using specific logic and a logic level of the quaternary carry tree.
- Implemented in the FXU of the G4, G5 and G6 IBM S/390 series.
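The decimal generate and kill conditions (generate when the digit sum is at least 10, kill when it is below 9) can be checked exhaustively against a bit-level formulation. In the sketch below the expressions for the upper three bits are derived from those two conditions for valid BCD digits; they are an assumption of this example and not necessarily the exact equations of the IBM circuits. The names `decimal_gk` and `carry_out` are hypothetical.

```python
def bit(x, j):
    return (x >> j) & 1

def decimal_gk(a, b):
    """Decimal generate G and kill K of one BCD digit pair (valid digits 0..9)."""
    g = [bit(a, j) & bit(b, j) for j in range(4)]    # binary generate g[j]
    nk = [bit(a, j) | bit(b, j) for j in range(4)]   # complement of binary kill k[j]
    # Upper three bits only: generate when A^U + B^U >= 10 ...
    g_up = g[3] | (g[2] & nk[1]) | (nk[3] & (nk[2] | nk[1]))
    # ... and not-kill when A^U + B^U >= 8.
    nk_up = nk[3] | g[2] | (nk[2] & g[1])
    # One prefix level merges the upper part with bit 0.
    G = g_up | (nk_up & g[0])
    K = (1 - nk_up) | ((1 - g_up) & (1 - nk[0]))
    return G, K

def carry_out(a, b, c):
    """Decimal carry recurrence C_{i+1} = G + (not K) C."""
    G, K = decimal_gk(a, b)
    return G | ((1 - K) & c)

# Exhaustive check over all valid BCD digit pairs.
all_ok = all(
    decimal_gk(a, b) == (int(a + b >= 10), int(a + b < 9))
    and all(carry_out(a, b, c) == (a + b + c) // 10 for c in (0, 1))
    for a in range(10) for b in range(10)
)
print(all_ok)  # True
```

A digit pair with $A_i + B_i = 9$ is neither generate nor kill: it propagates the incoming carry.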

Previous Work on Integer Decimal Addition: Implementation of Direct Decimal Addition
- Performs binary and direct decimal additions/subtractions.
- Stages: operand setup, pre-sum, carry evaluation (pre-carry and carry tree), sum.
- Quaternary carry tree (sparse tree, 1-in-4 carries).
- Evaluation of decimal carry-generate and carry-kill signals.
- Sum performed using 4-bit carry-select adders plus a digit addition of 6.
[Figure: block diagram of the direct decimal adder. Operands A and B; operand setup (B+6 multiplexer controlled by sub); generate & kill signals; decimal G & K signals and first-level carry network; parallel prefix carry network (quaternary tree) producing 1-in-4 carry signals; presum with carry-select adder and +6; sum (Mux2 level). The critical path is marked.]

Previous Work on Integer Decimal Addition: Speculative Decimal Addition
Speculative addition/subtraction characteristics:
- Initial (unconditional) sum of input digit + 6 (without carry propagation):
  $S_i^* = (A_i + B_i + 6 + C_i) \bmod 16$
  $C_{i+1} = \lfloor (A_i + B_i + 6 + C_i)/16 \rfloor$
- Final correction of $S_i^* - 6$ (without carry propagation):
  $S_i = (S_i^* - 6) \bmod 16$ if ($C_{i+1} == 0$), $S_i^*$ else
- Binary carries of $S^*$ at decimal positions = decimal carries, which allows binary parallel carry evaluation techniques.
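The unconditional speculation can be modelled per digit as follows: 6 is always added, the 4-bit binary carry-out is taken as the decimal carry, and 6 is subtracted again wherever no carry was produced. This is a minimal software sketch with hypothetical function names, shown for addition only.

```python
def speculative_digit_add(a, b, carry_in):
    """One digit of speculative decimal addition. Returns (sum_digit, carry_out)."""
    total = a + b + 6 + carry_in      # unconditional +6
    s_star = total & 0xF              # 4-bit binary sum S*
    carry_out = total >> 4            # binary carry-out equals the decimal carry
    if carry_out == 0:
        s_star = (s_star - 6) & 0xF   # speculation failed: undo the +6
    return s_star, carry_out

def speculative_add(a_digits, b_digits, carry_in=0):
    """Multi-digit addition, least significant digit first."""
    carry, out = carry_in, []
    for a, b in zip(a_digits, b_digits):
        s, carry = speculative_digit_add(a, b, carry)
        out.append(s)
    return out, carry

# 4711 + 5389 = 10100, digits given least significant first
print(speculative_add([1, 1, 7, 4], [9, 8, 3, 5]))  # ([0, 0, 1, 0], 1)
```

The correction step depends on the carry-out of the same digit, which is why a post-correction after carry evaluation sits on the critical path in the binary-tree implementation.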

Previous Work on Integer Decimal Addition: Speculative Decimal Addition
Two possibilities for the evaluation of $S_i$:
1. Using a parallel prefix carry tree: XOR operation + post-correction (after carry evaluation).
2. Using a quaternary carry tree (sparse): 4-bit carry-select adders + correction (in parallel with carry evaluation).
Several choices for the initial sum +6 lead to similar implementations:
  $(A_i + B_i + 6) = (A_i + 6) + B_i = A_i + (B_i + 6) = (A_i + 3) + (B_i + 3)$
Implemented in the FXU of the IBM z900 and z990.

Previous Work on Integer Decimal Addition: Implementations of Speculative Decimal Addition (binary carry tree)
- Performs binary and speculative decimal additions/subtractions.
- Stages: operand setup, pre-sum, carry evaluation (pre-carry and carry tree), sum, post-correction.
- Binary carry tree (Kogge-Stone, Ladner-Fischer, etc.).
- Needs post-correction in the critical path.
[Figure: block diagram of the speculative decimal adder with a binary carry tree. Operands A and B; operand setup (B+6 multiplexer controlled by sub); presum (XOR level); generate & kill signals; parallel prefix carry network (binary tree) producing 1-in-1 carry signals; sum (XOR level); decimal correction ($S_i^* - 6$ selected through a Mux2). The critical path is marked.]

Previous Work on Integer Decimal Addition: Implementations of Speculative Decimal Addition (quaternary carry tree)
- Performs binary and speculative decimal additions/subtractions.
- Stages: operand setup, pre-sum, carry evaluation (pre-carry and carry tree), sum.
- Quaternary carry tree (sparse tree, 1-in-4 carries).
- 4-bit carry-select adders.
- Sum correction performed along with carry evaluation.
[Figure: block diagram of the speculative decimal adder with a quaternary carry tree. Operands A and B; operand setup (B+6 multiplexer controlled by sub); generate & kill signals; parallel prefix carry network (quaternary tree) producing 1-in-4 carry signals; presum with carry-select adder computing $S_i^*$ and $S_i^* - 6$ for $C_i = 0$ and $C_i = 1$; sum (Mux2 level). The critical path is marked.]


Proposed Method: Conditional Speculative Decimal Addition. Algorithm
- Motivation: improve unconditional speculation.
  - Reducing the complexity of the sum digit correction.
  - Removing post-correction from the critical path delay.
- Solution: find a simple condition that reduces the values for which the speculation fails, giving a simple scheme for sum digit correction.

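Proposed Method: Conditional Speculative Decimal Addition. Algorithm
- Division of the input digits into upper (3 left bits) and lower (right bit) parts: $A_i = A_i^U + a_i[0]$, $B_i^* = (B_i^*)^U + b_i^*[0]$, where $B_i^*$ is $B_i$ for addition and its 9's complement for subtraction.
- Condition for speculation: $A_i^U + (B_i^*)^U \geq 8$. We add +6 in this case.
- Wrong speculation: $(A_i^U + (B_i^*)^U) + 6 = 14$ and $c_i[1] = 0$, that is, $(S_i^*)^U == 14$. The supposition was $C_{i+1} = 1$, but the real carry is $C_{i+1} = 0$.
- Correction: $(S_i^*)^U$ is changed from 14 to 8.
[Figure: bit-level layout of $A_i$, $B_i^*$ and $S_i^*$ showing the upper parts, the lower bits $a_i[0]$, $b_i[0]$, $s_i[0]$, the carry $c_i[1]$ into the upper part, and the carry-out $C_{i+1}$.]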

Proposed Method: Conditional Speculative Decimal Addition. Algorithm
- Signals for conditional speculation (detection of $A_i^U + (B_i^*)^U \geq 8$):
  $r_i = \overline{k_i[3]} + g_i[2] + \overline{k_i[2]}\,g_i[1]$, for addition ($d_a == 1$)
  $t_i = a_i[3] + \overline{k_i[3]}\,(g_i[2] + \overline{k_i[2]}\,\overline{k_i[1]})$, for subtraction ($d_s == 0$)
- Conditional speculation:
  $(S_i^*)^U$: add 6 if ($r_i == 1$), do not add 6 if ($r_i == 0$), for addition ($d_a == 1$)
  $(S_i^*)^U$: add 6 if ($t_i == 1$), do not add 6 if ($t_i == 0$), for subtraction ($d_s == 0$)
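The conditional speculation can be modelled per digit for the addition case: 6 is added only when the upper parts of the two digits sum to at least 8, and the only failing case (upper sum with +6 equal to 14 and no carry from bit 0) is repaired by rewriting the upper bits 111 as 100. The sketch below is a software illustration, not the gate-level design. The bit-level form of the detection signal in `r_signal` is derived here from the condition itself and is an assumption of this example, as are the function names.

```python
def upper(x):
    """Upper part of a BCD digit: the three most significant bits, weights kept."""
    return x & 0b1110

def r_signal(a, b):
    """Bit-level detection of upper(a) + upper(b) >= 8 for valid BCD digits."""
    g = [((a >> j) & 1) & ((b >> j) & 1) for j in range(4)]    # binary generate
    nk = [((a >> j) & 1) | ((b >> j) & 1) for j in range(4)]   # complement of kill
    return nk[3] | g[2] | (nk[2] & g[1])

def cond_spec_digit_add(a, b, carry_in):
    """One digit of conditional speculative addition. Returns (sum_digit, carry_out)."""
    r = r_signal(a, b)                              # condition for speculation
    total = a + b + carry_in + (6 if r else 0)      # +6 only when r == 1
    s_star = total & 0xF
    carry_out = total >> 4                          # already the correct decimal carry
    c1 = ((a & 1) + (b & 1) + carry_in) >> 1        # carry from bit 0 into the upper part
    if r and upper(a) + upper(b) + 6 == 14 and c1 == 0:
        s_star = (s_star & 0b0001) | 0b1000         # wrong speculation: 111- becomes 100-
    return s_star, carry_out

print(cond_spec_digit_add(4, 4, 0))  # (8, 0)  upper sum 8, speculation wrong, corrected
print(cond_spec_digit_add(5, 5, 0))  # (0, 1)  upper sum 8, carry from bit 0, speculation right
print(cond_spec_digit_add(3, 2, 1))  # (6, 0)  upper sum 4, no speculation
```

In this model the carry-out never needs correcting, so only the sum digit logic is affected by a wrong speculation.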

Proposed Method: Conditional Speculative Decimal Addition. Implementations
- Goal: simplify the sum correction of the speculative methods.
  1. Full binary parallel prefix carry tree configurations: improve delay by eliminating post-correction from the critical path.
  2. Quaternary carry tree configurations: improve area by simplifying the correction.
- Lower dependency on the carry tree topology: more flexibility to choose the adder architecture and area/latency trade-offs.
- Combined binary/decimal implementations: efficient implementation using any existing binary parallel prefix adder.

Proposed Method: Conditional Speculative Decimal Addition. Implementations (binary carry tree)
- Performs binary and conditional speculative decimal additions/subtractions.
- Stages: operand setup, pre-sum, carry evaluation (pre-carry and carry tree), sum.
- Binary carry tree (Kogge-Stone, Ladner-Fischer, etc.).
- Avoids post-correction in the critical path.
[Figure: block diagram of the proposed adder with a binary carry tree. Operands A and B; operand setup (A+6 and B+6 multiplexers driven by the conditional speculation control signals $r$, $t$, $d_a$, $d_s$ and by sub); presum (XOR level) with presum correction; generate & kill signals; parallel prefix carry network (binary tree) producing 1-in-1 carry signals; sum. The critical path is marked.]

Proposed Method: Conditional Speculative Decimal Addition. Implementations (binary carry tree, digit detail)
- Simple correction: black gates replace 111- by 100- when $(S_i^*)^U == 14$ (d == 1).
- That is, when $(A_i^U + (B_i^*)^U) + 6 = 14$ and $c_i[1] = 0$.
- Additional gates (black) are not in the critical path (grey).
[Figure: gate-level detail of one digit. Presum bits $p_i[3..0]$, carries $c_i[3..0]$ and sum bits $s_i[3..0]$, with the presum correction gates, shown next to the block diagram of the binary-tree adder (operand setup with A+6 and B+6, control signals $r$, $t$, $d_a$, $d_s$, XOR level, presum correction, parallel prefix carry network with 1-in-1 carry signals, sum). The critical path is marked.]

Proposed Method: Conditional Speculative Decimal Addition. Implementations (quaternary carry tree)
- Performs binary and conditional speculative decimal additions/subtractions.
- Stages: operand setup, pre-sum, carry evaluation (pre-carry and carry tree), sum.
- Quaternary carry tree (sparse tree, 1-in-4 carries).
- 4-bit carry-select adders.
- Sum correction performed in the carry-select adders.
- Simplified selection function (only for sum digit correction): decimal carries do not depend on the condition for speculation.
[Figure: block diagram of the proposed adder with a quaternary carry tree. Operands A and B; operand setup (A+6 and B+6 multiplexers driven by the conditional speculation control signals $r$, $t$, $d_a$, $d_s$ and by sub); generate & kill signals; parallel prefix carry network (quaternary tree) producing 1-in-4 carry signals; presum with carry-select adder; sum. The critical path is marked.]

Proposed Method: Conditional Speculative Decimal Addition. Implementations (quaternary carry tree, digit detail)
- Modified 4-bit carry-select adder.
- Simple correction: replace 111- by 100- when $(S_i^*)^U == 14$ (d == 1).
- Performed along with carry computation.
- Additional gates (black) are not in the critical path (grey).
[Figure: gate-level detail of the modified carry-select digit. Inputs $p_i[3..0]$, $g_i[2..0]$, $k_i[2..0]$ and d feed OAI gates; multiplexers controlled by $C_i$ select the sum bits $s_i[3..0]$. Shown next to the block diagram of the quaternary-tree adder (operand setup with A+6 and B+6, control signals $r$, $t$, $d_a$, $d_s$, generate & kill signals, parallel prefix carry network with 1-in-4 carry signals, carry-select adder, sum). The critical path is marked.]


Delay-Area Estimations and Comparison: Delay-Area of Static CMOS Gates
- Delay model for static CMOS gates based on Logical Effort.
- Delay values are given in FO4 units (delay of a 1x inverter with a fanout of four 1x inverters).
- Area values are given in 1x NAND2 gate units.
- Rough model, valid for comparison among architectures but not for obtaining precise absolute evaluation results.
- We take into account loads, but neither interconnections nor gate sizing optimizations (we assume gates with the drive strength of a minimum-sized inverter and introduce buffers when necessary).
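A Logical Effort estimate in FO4 units can be sketched as below. The slides do not list the gate parameters they used, so the logical effort and parasitic delay values here are the usual textbook figures for static CMOS and are assumptions of this example, as are the names `GATES`, `stage_delay_fo4` and `path_delay_fo4`. Each stage delay is g*h + p in units of tau, and one FO4 is taken as 5 tau (inverter with g = 1, p = 1, h = 4).

```python
TAU_PER_FO4 = 5.0   # inverter driving four inverters: 1 * 4 + 1 = 5 tau

# (logical effort g, parasitic delay p); assumed textbook values
GATES = {
    "inv":   (1.0,     1.0),
    "nand2": (4.0 / 3, 2.0),
    "nor2":  (5.0 / 3, 2.0),
}

def stage_delay_fo4(gate, fanout):
    """Delay of one gate in FO4 units; fanout is the electrical effort h = Cout / Cin."""
    g, p = GATES[gate]
    return (g * fanout + p) / TAU_PER_FO4

def path_delay_fo4(stages):
    """Sum of stage delays along a path given as (gate, fanout) pairs."""
    return sum(stage_delay_fo4(gate, h) for gate, h in stages)

print(stage_delay_fo4("inv", 4))                      # 1.0
print(path_delay_fo4([("nand2", 3), ("inv", 4)]))
```

Because gates are not resized, a heavily loaded node contributes a large g*h term, which is why a model of this kind inserts buffers on high-fanout nets.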

Delay-Area Estimations and Comparison: Area-Delay Evaluation Results
64-bit combined binary/decimal adders. Delay in t_FO4, area in NAND2 units.

| Prefix tree | Direct Decimal delay | Direct Decimal area | Speculative delay | Speculative area | Proposed delay | Proposed area |
| K-S | ---- | ---- | 19.25 (1.14x) | 2360 (1x) | 16.85 (1x) | 2660 (1.13x) |
| L-F | ---- | ---- | 20.65 (1.13x) | 1985 (1x) | 18.25 (1x) | 2290 (1.15x) |
| Q-T | 16.85 (1.08x) | 3251 (1.22x) | 15.55 (1x) | 2825 (1.06x) | 15.55 (1x) | 2655 (1x) |

In brackets: the relative ratios for each parallel prefix configuration.
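The bracketed ratios can be recomputed from the absolute figures by normalising each row to its smallest delay and smallest area. The sketch below only re-derives numbers already present in the results; the dictionary layout and the name `relative_ratios` are hypothetical.

```python
# (delay in FO4, area in NAND2 units) per prefix tree and method
results = {
    "K-S": {"speculative": (19.25, 2360), "proposed": (16.85, 2660)},
    "L-F": {"speculative": (20.65, 1985), "proposed": (18.25, 2290)},
    "Q-T": {"direct": (16.85, 3251), "speculative": (15.55, 2825),
            "proposed": (15.55, 2655)},
}

def relative_ratios(row):
    """Normalise delays and areas of one row to the best value in that row."""
    best_delay = min(delay for delay, _ in row.values())
    best_area = min(area for _, area in row.values())
    return {name: (round(delay / best_delay, 2), round(area / best_area, 2))
            for name, (delay, area) in row.items()}

for tree, row in results.items():
    print(tree, relative_ratios(row))
```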

Area-Delay Space of Analyzed Adders
[Figure: scatter plot of hardware complexity (# NAND2 gates, 0 to 3500) against delay (# FO4, 0 to 25) for the K-S, L-F and Q-T configurations of the binary adders and of the direct decimal, speculative decimal and proposed combined adders.]
Binary/decimal combined adders:
- Direct decimal: no apparent advantage with respect to the speculative methods.
- For low latency, Q-T is the best choice (our proposal requires less hardware).
- For low hardware cost and area-latency trade-off, the L-F schemes are the best alternatives.
- The proposed combined Q-T adder is only 1.10x slower than the binary Q-T adder, although 1.65x more complex.


Conclusions
- New high-performance algorithm for decimal integer addition/subtraction.
- Avoids the delay penalty of post-correction schemes.
- Efficient implementation using parallel prefix adders: both binary and quaternary carry tree configurations.
- Evaluation results show very competitive area-delay figures with respect to commercial and patented implementations.