Conditional Speculative Decimal Addition
Alvaro Vazquez and Elisardo Antelo
Dept. of Electronic and Computer Engineering, Univ. of Santiago de Compostela, Spain
This work was supported in part by Xunta de Galicia under grant PGIDT03TIC10502PR.
RNC7-11th July, 2006

Contents
- Introduction
  - Demand for High-Performance Decimal Arithmetic.
  - Revision of the IEEE-754 Standard for Floating Point.
- Previous Work on Integer Decimal Addition
  - Basic Decimal Addition.
  - Direct Decimal Addition.
  - Speculative Decimal Addition.
- Proposed Method: Conditional Speculative Decimal Addition
  - Algorithm.
  - Implementations: Parallel Prefix Adders.
    - Binary Carry Tree: Kogge-Stone and Ladner-Fischer.
    - Quaternary Carry Tree.
- Delay-Area Estimations and Comparison
  - Delay-area model for Static CMOS gates based on Logical Effort.
- Conclusions

Introduction
Demand for High-Performance Decimal Arithmetic
- Need for hardware support: financial and e-commerce applications.
  - Binary floating-point units introduce inaccurate results.
  - Decimal software implementations do not satisfy performance demands.
- Reduced hardware support (IBM S/390 series: G4, G5, G6, z900, z990).
  - Only the usual decimal integer operations are improved in hardware.
  - No hardware implementations of decimal floating-point exist.

Introduction
Revision of the IEEE-754 Standard for Floating Point
- The current draft revision of IEEE-754 incorporates specifications for decimal arithmetic.
  - Decimal formats: 32-bit, 64-bit and 128-bit.
  - Sign, exponent and significand (DPD encoding of BCD).
  - Rounding modes and exception handling for binary and decimal.
  - Conversions between integer and floating-point formats.
  - Operations defined: add, subtract, multiply, fused multiply-add, divide, square root.
- Software and/or hardware implementations.

Previous Work on Integer Decimal Addition
Basic Integer Decimal Addition (High-level)
- Inputs:
    $A = \sum_{i=0}^{d-1} A_i\, 10^i$,  $B = \sum_{i=0}^{d-1} B_i\, 10^i$,  $C_0 = C_{in}$
- Output:
    $S = \sum_{i=0}^{d-1} S_i\, 10^i$
- Basic decimal carry propagate recurrence:
    $C_{i+1} = \lfloor (A_i \pm B_i + C_i)/10 \rfloor$
    $S_i = (A_i \pm B_i + C_i) \bmod 10$
- Subtraction by the 10's complement of the subtrahend:
    $-B = -10^d + \sum_{i=0}^{d-1} (9 - B_i)\, 10^i + 1$
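
As a rough illustration of the digit recurrence and the 10's complement subtraction on this slide, the following Python sketch operates on lists of decimal digits stored least-significant digit first. The function names `add_digits` and `sub_digits` and the digit-list representation are assumptions made for this example only, not part of the presentation.

```python
def add_digits(a_digits, b_digits, c_in=0):
    """Digit-serial decimal addition (least-significant digit first).

    Implements C[i+1] = floor((A[i] + B[i] + C[i]) / 10) and
    S[i] = (A[i] + B[i] + C[i]) mod 10.
    """
    carry = c_in
    s_digits = []
    for a, b in zip(a_digits, b_digits):
        total = a + b + carry
        carry = total // 10
        s_digits.append(total % 10)
    return s_digits, carry


def sub_digits(a_digits, b_digits):
    """A - B through the 10's complement: add the 9's complement of B plus 1.

    The carry out of the top digit stands for the 10**d term; it is 1
    exactly when A >= B.
    """
    nines = [9 - b for b in b_digits]
    return add_digits(a_digits, nines, c_in=1)


# 275 + 968 with d = 3 digits; 500 - 123
sum_digits, sum_carry = add_digits([5, 7, 2], [8, 6, 9])
diff_digits, diff_carry = sub_digits([0, 0, 5], [3, 2, 1])
```

The carry loop is the serial dependency that the adders discussed in the rest of the talk try to shorten.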

Previous Work on Integer Decimal Addition
Basic Integer Decimal Addition (BCD)
- Digits represented in BCD-8421 (4 bits/digit).
- Subtraction:
    $(9 - B_i) = 15 - (B_i + 6) = \overline{B_i + 6}$
- Basic addition/subtraction in BCD:
    $B_i^* = \overline{B_i + 6}$ if (op == sub), $B_i$ else
    $C_{i+1} = \lfloor (A_i + B_i^* + C_i + 6)/16 \rfloor$
    $S_i^* = (A_i + B_i^* + C_i) \bmod 16$
    $S_i = (S_i^* + 6) \bmod 16$ if ($C_{i+1} == 1$), $S_i^*$ else
- Problem: the carry chain. The delay of the carry propagate recurrence must be improved.
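
The BCD scheme on this slide can be sketched in Python with 4-bit masks standing in for the hardware digit width. This is a minimal sketch under assumptions: the name `bcd_add`, the `subtract` flag and the least-significant-first digit lists are illustrative choices, and the decimal carry is computed from the unreduced digit sum plus 6.

```python
def bcd_add(a_digits, b_digits, subtract=False):
    """Digit-serial BCD addition/subtraction (least-significant digit first).

    Each digit is a 4-bit value in 0..9. For subtraction the subtrahend
    digit is replaced by NOT(B + 6) = 9 - B and the carry-in is 1.
    """
    carry = 1 if subtract else 0
    s_digits = []
    for a, b in zip(a_digits, b_digits):
        b_star = (~(b + 6)) & 0xF if subtract else b
        total = a + b_star + carry          # plain binary digit sum
        s_star = total & 0xF                # 4-bit binary sum digit
        carry = (total + 6) >> 4            # decimal carry: total >= 10
        # The binary sum needs the +6 correction only when a carry is produced.
        s_digits.append((s_star + 6) & 0xF if carry else s_star)
    return s_digits, carry


bcd_sum = bcd_add([5, 7, 2], [8, 6, 9])                  # 275 + 968
bcd_diff = bcd_add([0, 0, 5], [3, 2, 1], subtract=True)  # 500 - 123
```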

Previous Work on Integer Decimal Addition
Direct Decimal Addition
- Uses the following decimal carry recurrence:
    $C_{i+1} = G_i + \overline{K_i}\, C_i$
  - $G_i$: decimal carry generate, true when $A_i + B_i \ge 10$.
  - $K_i$: decimal carry kill, true when $A_i + B_i < 9$.
- Can be evaluated using conventional parallel carry evaluation techniques:
  - Carry lookahead.
  - Parallel prefix.
- Quaternary carry tree: 1 decimal carry per 4 bits.
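
Previous Work on Integer Decimal Addition
Direct Decimal Addition
- $G_i$ and $K_i$ can be expressed in terms of binary $g_i[j]$ and $k_i[j]$ ($i = 0, \ldots, d-1$; $j = 0, 1, 2, 3$):
    $G_i = G_i^* + \overline{K_i^*}\, g_i[0]$
    $\overline{K_i} = \overline{K_i^*}\; \overline{k_i[0]}$   (the value of $K_i$ is irrelevant when $G_i = 1$)
    $G_i^* = g_i[3] + g_i[2]\, \overline{k_i[1]} + \overline{k_i[3]}\; \overline{(k_i[2]\, k_i[1])}$
    $\overline{K_i^*} = \overline{k_i[3]} + g_i[2] + \overline{k_i[2]}\, g_i[1]$
- $g_i[j]$ and $k_i[j]$ are the inputs of the quaternary carry tree.
- $G_i$ and $K_i$ are evaluated using specific logic and a logic level of the quaternary carry tree.
- Implemented in the FXU of the G4, G5 and G6 IBM S/390 series.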
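
One way to gain confidence in a set of decimal generate/kill equations is to check them exhaustively against the arithmetic definition of the decimal carry. The sketch below does this for one BCD digit position, with binary generate $g[j] = a[j]\,b[j]$ and kill $k[j] = \overline{a[j] + b[j]}$. The particular Boolean expressions in the code are one formulation that satisfies the recurrence $C_{i+1} = G_i + \overline{K_i}\,C_i$ for valid BCD digits; they are an assumption of this example and may differ in form from the logic used in the IBM designs.

```python
def bits(x):
    """Four bits of a BCD digit, index 0 = least significant."""
    return [(x >> j) & 1 for j in range(4)]


def direct_decimal_carry(a, b, c_in):
    """Decimal carry out of one digit position from binary g[j] and k[j]."""
    a_bits, b_bits = bits(a), bits(b)
    g = [a_bits[j] & b_bits[j] for j in range(4)]        # binary generate
    k = [1 - (a_bits[j] | b_bits[j]) for j in range(4)]  # binary kill
    nk = [1 - kj for kj in k]                            # complemented kill

    # Upper three bits only: G* is true when their sum is >= 10,
    # not_K* is true when their sum is >= 8.
    g_star = g[3] | (g[2] & nk[1]) | (nk[3] & (1 - (k[2] & k[1])))
    not_k_star = nk[3] | g[2] | (nk[2] & g[1])

    big_g = g_star | (not_k_star & g[0])   # A + B >= 10
    not_big_k = not_k_star & nk[0]         # A + B >= 9 whenever big_g is 0
    return big_g | (not_big_k & c_in)


def check_all_digits():
    """Compare against floor((A + B + C) / 10) for every valid BCD input."""
    return all(
        direct_decimal_carry(a, b, c) == (a + b + c) // 10
        for a in range(10) for b in range(10) for c in (0, 1)
    )
```

The kill signal computed here is only meaningful when the generate signal is 0, which is all the recurrence needs.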

Previous Work on Integer Decimal Addition
Implementation of Direct Decimal Addition
- Performs binary and direct decimal additions/subtractions.
- Stages:
  - Operand setup.
  - Pre-sum.
  - Carry evaluation (pre-carry and carry tree).
  - Sum.
- Quaternary carry tree (sparse tree, 1-in-4 carries).
- Evaluation of decimal carry-generate and carry-kill signals.
- Sum performed using 4-bit carry-select adders plus a digit addition of 6.
[Figure: block diagram of the direct decimal adder. Operand B passes through a B+6 multiplexer controlled by sub; binary generate and kill signals and the decimal G and K signals feed the parallel prefix carry network (quaternary tree), which produces the 1-in-4 carry signals; the pre-sum carry-select adders compute the digit sums for $C_i = 0$ and $C_i = 1$, with and without +6, and Mux2 levels select the final sum. The critical path is marked.]

Previous Work on Integer Decimal Addition
Speculative Decimal Addition
- Speculative addition/subtraction characteristics:
  - Initial (unconditional) sum of input digit + 6 (without carry propagation):
      $S_i^* = (A_i + B_i + C_i + 6) \bmod 16$
      $C_{i+1} = \lfloor (A_i + B_i + C_i + 6)/16 \rfloor$
  - Final correction of $S_i^* - 6$ (without carry propagation):
      $S_i = (S_i^* - 6) \bmod 16$ if ($C_{i+1} == 0$), $S_i^*$ else
- The binary carries of $S^*$ at decimal positions equal the decimal carries, which allows binary parallel carry evaluation techniques.
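
The key property of the speculative scheme, that after adding 6 to every digit the binary carries at digit boundaries are the decimal carries, can be illustrated with a single wide binary addition in Python. This is a behavioural sketch only: the name `speculative_add`, the packing of digits into one integer and the addition-only scope are assumptions of the example.

```python
def speculative_add(a_digits, b_digits, c_in=0):
    """Speculative decimal addition (least-significant digit first).

    Every digit of A is biased by +6, one binary addition is performed,
    and digits that did not produce a carry are corrected by -6.
    """
    d = len(a_digits)
    x = sum((a + 6) << (4 * i) for i, a in enumerate(a_digits))  # A + 6 per digit
    y = sum(b << (4 * i) for i, b in enumerate(b_digits))
    total = x + y + c_in                     # one binary carry-propagate add
    carries_in = x ^ y ^ total               # bit n = binary carry into bit n

    digits = []
    for i in range(d):
        s_star = (total >> (4 * i)) & 0xF
        c_out = (carries_in >> (4 * (i + 1))) & 1   # carry out of digit i
        digits.append(s_star if c_out else (s_star - 6) & 0xF)
    return digits, (total >> (4 * d)) & 1


spec_sum = speculative_add([5, 7, 2], [8, 6, 9])   # 275 + 968
```

In hardware the wide addition would be carried out by a parallel prefix tree; the integer addition here only models its result.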

Previous Work on Integer Decimal Addition
Speculative Decimal Addition
- Two possibilities for the evaluation of $S_i$:
  1. Using a parallel prefix carry tree: XOR operation + post-correction (after carry evaluation).
  2. Using a quaternary carry tree (sparse): 4-bit carry-select adders + correction (in parallel with carry evaluation).
- Several choices for the initial sum +6 lead to similar implementations:
    $(A_i + B_i + 6) = (A_i + 6) + B_i = A_i + (B_i + 6) = (A_i + 3) + (B_i + 3)$
- Implemented in the FXU of the IBM z900 and z990.

Previous Work on Integer Decimal Addition
Implementations of Speculative Decimal Addition
- Performs binary and speculative decimal additions/subtractions.
- Stages:
  - Operand setup.
  - Pre-sum.
  - Carry evaluation (pre-carry and carry tree).
  - Sum.
  - Post-correction.
- Binary carry tree (Kogge-Stone, Ladner-Fischer, etc.).
- Needs post-correction in the critical path.
[Figure: block diagram of the speculative adder with a binary carry tree. Operand setup with a B+6 multiplexer controlled by sub; pre-sum (XOR level) and generate and kill signals; parallel prefix carry network (binary tree) producing 1-in-1 carry signals; sum (XOR level); decimal correction stage in which a Mux2 controlled by the 1-in-4 carry signals $C_{i+1}$ selects between $S_i^*$ and $S_i^* - 6$. The critical path is marked.]

Previous Work on Integer Decimal Addition
Implementations of Speculative Decimal Addition
- Performs binary and speculative decimal additions/subtractions.
- Stages:
  - Operand setup.
  - Pre-sum.
  - Carry evaluation (pre-carry and carry tree).
  - Sum.
- Quaternary carry tree (sparse tree, 1-in-4 carries).
- 4-bit carry-select adders.
- Sum correction performed along with carry evaluation.
[Figure: block diagram of the speculative adder with a quaternary carry tree. Operand setup with a B+6 multiplexer controlled by sub; generate and kill signals; parallel prefix carry network (quaternary tree) producing 1-in-4 carry signals; pre-sum carry-select adders compute $S_i^*$ and $S_i^* - 6$ for $C_i = 0$ and $C_i = 1$, and Mux2 levels select the sum digit bits $s_i[3..0]$. The critical path is marked.]

Proposed Method: Conditional Speculative Decimal Addition
Algorithm
- Motivation: improve unconditional speculation.
  - Reduce the complexity of the sum digit correction.
  - Remove the post-correction from the critical path delay.
- Solution:
  - Find a simple condition that reduces the set of values for which the speculation fails, which leads to a simple scheme for the sum digit correction.
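
Proposed Method: Conditional Speculative Decimal Addition
Algorithm
- Division of the input digits into upper (3 leftmost bits) and lower (rightmost bit) parts: $A_i^U$, $B_i^U$ and $a_i[0]$, $b_i[0]$.
- Condition for speculation: $A_i^U + B_i^U \ge 8$. We add +6 (0110) in this case.
- Wrong speculation: $(A_i^U + B_i^U) + 6 = 14$ and $c_i[1] = 0$, so that $(S_i)^U == 14$. The supposition was $C_{i+1} = 1$, but the real carry is $C_{i+1} = 0$.
- Correction: $(S_i)^U$ is changed from 14 to 8 ($14 \to 8$).
[Figure: bit-level view of the digit addition, showing the upper parts $A_i^U$, $B_i^U$, the speculative +6 (0110), the carry $c_i[1]$ from the lower bit, the resulting $(S_i)^U$ and the carry out $C_{i+1}$.]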

Proposed Method: Conditional Speculative Decimal Addition
Algorithm
- Signals for conditional speculation (detection of $A_i^U + B_i^U \ge 8$):
    $r_i = \overline{k_i[3]} + g_i[2] + \overline{k_i[2]}\, g_i[1]$   for addition ($d_a == 1$)
    $t_i = a_i[3] + \overline{k_i[3]}\, (g_i[2] + \overline{k_i[2]}\; \overline{k_i[1]})$   for subtraction ($d_s == 0$)
- Conditional speculation on $(S_i)^U$:
  - For addition ($d_a == 1$): add 6 if ($r_i == 1$), add 0 if ($r_i == 0$).
  - For subtraction ($d_s == 0$): add 6 if ($t_i == 1$), add 0 if ($t_i == 0$).
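
The conditional speculation can be modelled behaviourally for the addition case. In the sketch below the signal r is computed from binary generate and kill bits so that it is 1 exactly when the upper parts satisfy $A_i^U + B_i^U \ge 8$, 6 is added only to those digits, and the only failing pattern (a sum digit of the form 111-) is rewritten as 100-. The function names, the digit packing and the restriction to addition are assumptions of this example; the subtraction signal t is not modelled.

```python
def r_signal(a, b):
    """1 when the upper three bits of BCD digits a and b sum to 8 or more."""
    g = [(a >> j) & (b >> j) & 1 for j in range(4)]        # binary generate
    k = [1 - (((a >> j) | (b >> j)) & 1) for j in range(4)]  # binary kill
    return (1 - k[3]) | g[2] | ((1 - k[2]) & g[1])


def cond_spec_add(a_digits, b_digits, c_in=0):
    """Conditional speculative decimal addition (least-significant digit first)."""
    d = len(a_digits)
    x = y = 0
    for i, (a, b) in enumerate(zip(a_digits, b_digits)):
        x |= (a + 6 * r_signal(a, b)) << (4 * i)   # +6 only where r == 1
        y |= b << (4 * i)
    total = x + y + c_in                           # one binary addition

    digits = []
    for i in range(d):
        s = (total >> (4 * i)) & 0xF
        if s >> 1 == 0b111:                        # wrong speculation: 111-
            s = 0b1000 | (s & 1)                   # replace by 100-
        digits.append(s)
    return digits, total >> (4 * d)


cond_sum = cond_spec_add([3, 2, 1], [6, 5, 4])     # 123 + 456
```

In this model the correction depends only on the sum digit itself and not on the carry out of the digit, which is consistent with the goal of keeping the correction off the carry path.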

Proposed Method: Conditional Speculative Decimal Addition
Implementations
- Goal: simplify the sum correction of the speculative methods.
  1. Full binary parallel prefix carry tree configurations: improve delay by eliminating the post-correction from the critical path.
  2. Quaternary carry tree configurations: improve area by simplifying the correction.
- Lower dependency on the carry tree topology.
  - More flexibility to choose the adder architecture and the area/latency trade-offs.
- Combined binary/decimal implementations.
  - Efficient implementation using any existing binary parallel prefix adder.

Proposed Method: Conditional Speculative Decimal Addition
Implementations
- Performs binary and conditional speculative decimal additions/subtractions.
- Binary carry tree (Kogge-Stone, Ladner-Fischer, etc.).
- Avoids post-correction in the critical path.
- Stages:
  - Operand setup.
  - Pre-sum.
  - Carry evaluation (pre-carry and carry tree).
  - Sum.
[Figure: block diagram of the proposed adder with a binary carry tree. Operand setup with A+6 and B+6 multiplexers driven by the conditional speculation control signals ($r$, $t$, $d_a$, $d_s$, sub); pre-sum (XOR level) and generate and kill signals; pre-sum correction; parallel prefix carry network (binary tree) producing 1-in-1 carry signals; sum. The critical path is marked.]

Proposed Method: Conditional Speculative Decimal Addition
Implementations
- Simple correction: black gates replace 111- by 100- when $(S_i)^U == 14$ ($d == 1$).
  - That is, $(A_i^U + B_i^U) + 6 = 14$ and $c_i[1] = 0$.
- Additional gates (black) are not in the critical path (grey).
[Figure: digit slice of the proposed adder with a binary carry tree. The pre-sum bits $p_i[3..0]$ pass through the pre-sum correction gates and are combined with the carries $c_i[3..0]$ from the parallel prefix carry network (binary tree, 1-in-1 carry signals) to produce the sum bits $s_i[3..0]$. Operand setup uses A+6 and B+6 multiplexers driven by the conditional speculation control signals ($r$, $t$, $d_a$, $d_s$, sub). The critical path is marked.]

Proposed Method: Conditional Speculative Decimal Addition
Implementations
- Performs binary and conditional speculative decimal additions/subtractions.
- Quaternary carry tree (sparse tree, 1-in-4 carries).
- 4-bit carry-select adders.
- Sum correction performed in the carry-select adders.
- Simplified selection function (only for the sum digit correction): decimal carries do not depend on the condition for speculation.
- Stages:
  - Operand setup.
  - Pre-sum.
  - Carry evaluation (pre-carry and carry tree).
  - Sum.
[Figure: block diagram of the proposed adder with a quaternary carry tree. Operand setup with A+6 and B+6 multiplexers driven by the conditional speculation control signals ($r$, $t$, $d_a$, $d_s$, sub); generate and kill signals; parallel prefix carry network (quaternary tree) producing 1-in-4 carry signals; pre-sum carry-select adders; sum. The critical path is marked.]

Proposed Method: Conditional Speculative Decimal Addition
Implementations
- Modified 4-bit carry-select adder.
- Simple correction: replace 111- by 100- when $(S_i)^U == 14$ ($d == 1$).
  - Performed along with carry computation.
- Additional gates (black) are not in the critical path (grey).
[Figure: digit slice of the proposed adder with a quaternary carry tree. The pre-sum carry-select logic is built from OAI gates on the propagate, generate and kill bits $p_i[j]$, $g_i[j]$, $k_i[j]$; Mux2 gates controlled by the 1-in-4 carry $C_i$ from the parallel prefix carry network (quaternary tree) select the sum bits $s_i[3..0]$. Operand setup uses A+6 and B+6 multiplexers driven by the conditional speculation control signals ($r$, $t$, $d_a$, $d_s$, sub). The critical path is marked.]

Delay-Area Estimations and Comparison
Delay-Area of Static CMOS Gates
- Delay model for static CMOS gates based on Logical Effort.
- Delay values given in FO4 units (the delay of a 1x inverter driving a fanout of four 1x inverters).
- Area values given in 1x NAND2 gate units.
- Rough model, valid for comparison among architectures but not for obtaining precise absolute evaluation results.
- We take loads into account, but neither interconnections nor gate sizing optimizations (we assume gates with the drive strength of a minimum-sized inverter and introduce buffers when necessary).
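
To give a feel for how a Logical Effort estimate in FO4 units is obtained, here is a minimal sketch of the standard stage delay formula d = g·h + p (in units of the technology constant tau), converted to FO4. The logical effort and parasitic values are the usual textbook figures for static CMOS gates and are assumptions of this example; the exact parameters of the model used in this work are not given on the slide.

```python
# (logical effort g, parasitic delay p) in the usual textbook normalisation,
# where an inverter has g = 1 and p = 1. These values are assumptions.
GATES = {
    "inv": (1.0, 1.0),
    "nand2": (4.0 / 3.0, 2.0),
    "nor2": (5.0 / 3.0, 2.0),
}

# An inverter driving four identical inverters: 1 * 4 + 1 = 5 tau.
TAU_PER_FO4 = 5.0


def stage_delay_fo4(gate, fanout):
    """Delay of one gate in FO4 units; fanout is the ratio C_out / C_in."""
    g, p = GATES[gate]
    return (g * fanout + p) / TAU_PER_FO4


def path_delay_fo4(stages):
    """Sum of stage delays along a path given as (gate, fanout) pairs."""
    return sum(stage_delay_fo4(gate, fanout) for gate, fanout in stages)


# Hypothetical two-stage path: a NAND2 with fanout 2 followed by an FO4 inverter.
example_delay = path_delay_fo4([("nand2", 2), ("inv", 4)])
```

Like the model on the slide, this accounts for load (through the fanout) but ignores wires and transistor sizing.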

Delay-Area Estimations and Comparison
Area-Delay Evaluation Results
64-bit combined binary/decimal adders

Prefix   Direct Decimal                  Speculative                     Proposed
tree     Delay (t_FO4)   Area (NAND2)    Delay (t_FO4)   Area (NAND2)    Delay (t_FO4)   Area (NAND2)
K-S      ----            ----            19.25 (1.14x)   2360 (1x)       16.85 (1x)      2660 (1.13x)
L-F      ----            ----            20.65 (1.13x)   1985 (1x)       18.25 (1x)      2290 (1.15x)
Q-T      16.85 (1.08x)   3251 (1.22x)    15.55 (1x)      2825 (1.06x)    15.55 (1x)      2655 (1x)

In brackets: the ratios relative to the best value for each parallel prefix configuration (K-S: Kogge-Stone, L-F: Ladner-Fischer, Q-T: quaternary tree).
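
The bracketed ratios can be recomputed from the absolute figures: each value is divided by the smallest delay (or area) in the same row. The short sketch below does this; the dictionary layout and the function name `relative_ratios` are choices made for the example, while the numbers are those of the table.

```python
# (delay in FO4, area in NAND2 units) per method, taken from the table.
RESULTS = {
    "K-S": {"speculative": (19.25, 2360), "proposed": (16.85, 2660)},
    "L-F": {"speculative": (20.65, 1985), "proposed": (18.25, 2290)},
    "Q-T": {"direct": (16.85, 3251), "speculative": (15.55, 2825),
            "proposed": (15.55, 2655)},
}


def relative_ratios(row):
    """Delay and area of each method relative to the best value in the row."""
    best_delay = min(delay for delay, _ in row.values())
    best_area = min(area for _, area in row.values())
    return {
        method: (round(delay / best_delay, 2), round(area / best_area, 2))
        for method, (delay, area) in row.items()
    }


ratios = {tree: relative_ratios(row) for tree, row in RESULTS.items()}
```

For the binary trees the proposed adder trades roughly 13 to 15 percent more area for a similar reduction in delay, while for the quaternary tree it matches the speculative delay with less area.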

Area-Delay Space of Analyzed Adders
[Figure: scatter plot of hardware complexity (# NAND2 gates, 0 to 3500) versus delay (# FO4, 0 to 25) for the K-S, L-F and Q-T configurations of four adder families: binary adders, direct decimal, speculative decimal and proposed.]
- Binary/decimal combined adders:
  - Direct decimal: no apparent advantage with respect to the speculative methods.
  - For low latency, Q-T is the best choice (our proposal requires less hardware).
  - For low hardware cost and area-latency trade-off, the L-F schemes are the best alternatives.
- The proposed combined Q-T adder is only 1.10x slower than the binary Q-T adder, although it is 1.65x more complex.

Conclusions
- New high-performance algorithm for decimal integer addition/subtraction.
- Avoids the delay penalty of post-correction schemes.
- Efficient implementation using parallel prefix adders, in both binary and quaternary carry tree configurations.
- Evaluation results show very competitive area-delay figures with respect to commercial and patented implementations.