RADIX-10 PARALLEL DECIMAL MULTIPLIER


MRUNALINI E. INGLE (1) & TEJASWINI PANSE (2)
(1, 2) Electronics Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India
E-mail: mrunalingle@gmail.com, tejaswini.deshmukh@gmail.com

Abstract - This paper introduces a novel architecture for a radix-10 decimal multiplier. The new generation of high-performance decimal floating-point units (DFUs) demands efficient implementations of parallel decimal multipliers. The parallel generation of partial products is performed using a signed-digit radix-10 recoding of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a new algorithm for decimal multioperand carry-save addition that uses unconventional decimal-coded number systems. We further detail these techniques, which significantly improve the area and latency of the previous design and include optimized digit recoders, decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters.

Keywords - Decimal computer arithmetic, parallel decimal multiplication, partial product generation and reduction, decimal carry-save addition.

I. INTRODUCTION

Hardware implementations of decimal arithmetic units have recently gained importance because they provide higher accuracy in financial, commercial, scientific, and internet-based applications. One reason is the need for a precise floating-point representation of many decimal values (e.g., 0.1) that do not have an exact binary representation [1]. The revision of the IEEE 754 Standard for Floating-Point Arithmetic incorporates specifications for decimal floating-point (DFP) arithmetic that can be implemented in software, in hardware, or in a combination of both. The computer arithmetic literature includes articles on decimal arithmetic hardware such as BCD adders and multioperand BCD adders, and sequential BCD multipliers and dividers.
However, some single arithmetic operations (e.g., division or square root), and also function evaluation circuits (e.g., radix-10 exponentiation or logarithm), are often implemented by a sequence of simpler operations that includes several multiplications. Therefore, hardware implementation of such operations and functions calls for high-speed parallel decimal multiplication. Such multiplication schemes are generally implemented as a sequence of three steps: partial product generation (PPG), partial product reduction (PPR), and the final carry-propagating addition.

Parallel binary multipliers are used extensively in most binary floating-point units for high performance. However, decimal multiplication is more difficult to implement due to the complexity of generating the multiplicand multiples and the inefficiency of representing decimal values in systems based on binary signals. These issues complicate the generation and reduction of partial products.

The first implementation of a parallel decimal multiplier is described in [5]. Several different parallel decimal multiplier architectures are proposed in [3], which use new techniques for partial product generation and reduction. Furthermore, some of these architectures were extended to support binary multiplication. Some concepts of [3] were applied in [6] to design decimal 4:2 compressor trees. All of the previous designs are combinational fixed-point architectures. A pipelined IEEE-compliant DFP multiplier based on an architecture from [3] was presented in [7].

This work is a major extension of the previous paper [3], which presented a new family of high-performance parallel decimal multipliers. In this paper, we deal with a fully combinational decimal fixed-point architecture. We describe in some detail the methods for partial product generation and reduction proposed in [3] and introduce new techniques to reduce the latency and the hardware complexity of the previous designs.

The paper is organized as follows: Section 2 outlines some previous representative work on decimal multiplication.
In Section 3, we present the proposed multiplier architecture, signed-digit (SD) radix-10. The parallel generation of decimal partial products is detailed in Section 4. In Section 5, we describe the method for fast multioperand decimal carry-save addition and propose several tree architectures for an efficient reduction of partial products. Design decisions are supported by the area-delay model for static CMOS gates described in Section 6. In addition, we have synthesized SD radix-10 multiplier designs for 64-bit (16-digit) operands, using a 90 nm CMOS standard cell library. The expected area-delay figures are shown in Section 6. Finally, we summarize the main conclusions in Section 7.

International Journal of Electronics Signals and Systems (IJESS) ISSN: , Vol-1 Iss-3,

TABLE 1. Decimal codings

II. FIXED-POINT DECIMAL MULTIPLICATION

A digit $Z_i$ of a decimal integer operand $Z = \sum_{i=0}^{d-1} Z_i 10^i$ is coded as a positive weighted 4-bit vector as

$Z_i = \sum_{j=0}^{3} z_{i,j}\, r_j$,   (1)

where $Z_i \in [0, 9]$ is the ith decimal digit, $z_{i,j}$ is the jth bit of the ith digit, and $r_j \ge 1$ is the weight of the jth bit. The previous expression represents a set of coded decimal number systems that includes BCD (with $r_j = 2^j$), shown in Table 1. The other decimal codes shown in Table 1 are used for representing different decimal operands, as required by the methods presented in this paper, and are referenced later. We refer to these codes by their weight bits as (r3r2r1r0); an operand Z coded in (r3r2r1r0) is denoted by Z (r3r2r1r0).

The multiplicand $X = \sum_{i=0}^{d-1} X_i 10^i$ and the multiplier $Y = \sum_{i=0}^{d-1} Y_i 10^i$ are unsigned decimal integer d-digit BCD words. Fixed-point multiplication (both binary and decimal) consists of three stages: generation of partial products, reduction (addition) of partial products to two operands, and a final conversion (usually a carry-propagate addition) to a nonredundant 2d-digit BCD representation $P = \sum_{i=0}^{2d-1} P_i 10^i$. Extension to decimal floating-point multiplication involves exponent addition, rounding of $P = X \times Y$ to fit the required precision, sign calculations, and exception detection and handling.

Decimal fixed-point multiplication is more complex than binary multiplication mainly for two reasons: the larger range of decimal digits ([0, 9]), which increases the number of multiplicand multiples, and the inefficiency of directly representing decimal values in systems based on binary logic using BCD (since only 10 out of the 16 possible 4-bit combinations represent a valid decimal digit). These issues complicate the generation and reduction of partial products.

Decimal multiplication has been improved by performing the reduction of decimal partial products with some scheme for decimal carry-propagate addition, such as direct decimal addition. To reduce the contribution of the decimal corrections to the critical path, three different techniques for multioperand decimal carry-save addition were proposed in [4].
Two of them perform BCD corrections (+6 digit additions) using combinational logic and an array of binary carry-save adders (speculative adders), although a final correction is also required. A sequential decimal multiplier based on these techniques is presented in [8]. It uses BCD invalid combinations (overloaded BCD representation) to simplify the sum digit logic. The other approach (non-speculative adder [4]) uses a binary CSA tree followed by a single decimal correction. A recent proposal uses a binary carry-free tree adder and a subsequent binary-to-BCD conversion to add up to N d-digit BCD operands; this architecture has been implemented in a decimal parallel multiplier.

Another group of methods [1], [5] uses different topologies of 4-bit radix-10 carry-propagate adders to implement decimal carry-save addition. In [1], a serial multiplier is implemented using an array of radix-10 carry look-ahead adders (CLAs). A CSA tree using these radix-10 CLAs is implemented in the combinational decimal parallel multiplier proposed in [5]. To optimize the partial product reduction, they also use an array of decimal digit counters.

The reduction of all decimal partial products in parallel requires the use of efficient multioperand decimal tree adders. We also implement multioperand decimal tree adders using a binary CSA tree, but with operands coded in decimal codings that are more efficient than BCD, namely (4221) or (5211). These multioperand decimal CSA trees are detailed in Section 5.

III. DECIMAL PARALLEL MULTIPLIER

In this section, we present a general overview of the radix-10 architecture for d-digit (4d-bit) BCD decimal fixed-point parallel multiplication. This design is based on the techniques for partial product generation and reduction detailed in Sections 4 and 5, respectively. The main feature of this architecture is that the codes (4221) and (5211) are used instead of BCD to represent the partial products. This is expected to improve the reduction of decimal partial products with respect to other proposals in terms of latency and area.
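As a small illustration of why (4221) and (5211) are called more efficient than BCD, the sketch below evaluates the weighted coding of Expression (1) for every 4-bit vector and counts how many vectors denote a valid decimal digit. The function names are hypothetical and chosen only for this example; the weight tuples are the ones named in the text.

```python
from itertools import product

def decode(bits, weights):
    """Value of a 4-bit vector (z3, z2, z1, z0) under Expression (1)."""
    return sum(b * w for b, w in zip(bits, weights))

def representations(weights):
    """Map each decimal digit 0..9 to the 4-bit vectors that encode it."""
    table = {digit: [] for digit in range(10)}
    for bits in product((0, 1), repeat=4):
        value = decode(bits, weights)
        if value <= 9:  # values above 9 are not decimal digits
            table[value].append(bits)
    return table

def valid_vector_count(weights):
    """Number of 4-bit vectors that represent some decimal digit."""
    return sum(len(vectors) for vectors in representations(weights).values())

if __name__ == "__main__":
    for name, weights in [("BCD", (8, 4, 2, 1)),
                          ("(4221)", (4, 2, 2, 1)),
                          ("(5211)", (5, 2, 1, 1))]:
        print(name, valid_vector_count(weights), "valid vectors out of 16")
```

Under these weights every 4-bit vector of (4221) or (5211) is a valid digit, so a binary adder operating on such digits cannot produce an invalid combination, whereas six BCD vectors are invalid.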
3.1 SD Radix-10 Architecture

The architecture of the d-digit SD radix-10 multiplier is shown in Fig. 1. The multiplier consists of the following stages: generation of decimal partial products coded in (4221) (generation of multiplicand multiples and SD radix-10 encoding of the multiplier), reduction of partial products, and a final BCD carry-propagate addition.

The generation of the d+1 partial products is performed by an encoding of the multiplier into d SD radix-10 digits and an additional leading bit, as described in Section 4.1. Each SD radix-10 digit controls a level of 5:1 muxes, which selects a positive multiplicand multiple (0, X, 2X, 3X, 4X, 5X) coded in (4221). The generation of these multiples is detailed in Section 4.2. To obtain each partial product, a level of XOR gates inverts the output bits of the 5:1 muxes when the sign of the corresponding SD radix-10 digit is negative.

Before being reduced, the d+1 partial products, coded in (4221), are aligned according to their decimal weights. Each p-digit column of the partial product array is reduced to two (4221) decimal digits using one of the decimal digit p:2 CSA trees described in Section 5.3. The number of digits to be reduced for each column varies from p = d+1 to p = 2. Thus, the d+1 partial products are reduced to two 2d-digit operands S and H coded in (4221).

This scheme offers a good trade-off between fast generation of partial products and the number of partial products generated. A minimally redundant SD radix-10 recoding of the multiplier (with digits in {-5, ..., 0, ..., 5}) produces only d+1 partial products but requires a carry-propagate addition to generate the complex multiples 3X and -3X. Furthermore, the (4221) and (5211) codes are self-complementing (see Section 5.2). Thus, an advantage over schemes that use BCD multiples is that the 9's complement of each digit can be obtained by inverting its bits. This simplifies the generation of the negative multiplicand multiples from the positive ones.

The final product is a 2d-digit BCD word given by P = 2H + S. Before being added, S and H need to be processed. S is recoded from (4221) to BCD excess-6 (BCD value plus 6, which requires practically the same logical complexity as a recoding to BCD). The H × 2 multiplication is performed in parallel with the recoding of S. This ×2 block uses a (4221) to (5421) digit recoder (see Section 4.3) and a 1-bit wired left shift to obtain the operand 2H coded in BCD.
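The self-complementing property mentioned above can be checked exhaustively. The sketch below is only an illustration: the helper names are hypothetical, and the encoder picks one of the possible (4221) representations of each digit. It confirms that bit inversion yields the 9's complement for (4221) and (5211) but not for BCD, and that a negative multiple follows from inversion plus one unit in the last place.

```python
from itertools import product

W4221 = (4, 2, 2, 1)

def decode(bits, weights):
    """Decimal value of a 4-bit vector under the given weights."""
    return sum(b * w for b, w in zip(bits, weights))

def invert(bits):
    """Bitwise complement of a 4-bit vector."""
    return tuple(1 - b for b in bits)

def is_self_complementing(weights):
    """True if inverting any 4-bit vector gives 9 minus its value."""
    return all(decode(invert(bits), weights) == 9 - decode(bits, weights)
               for bits in product((0, 1), repeat=4))

def to_4221(n, num_digits):
    """Encode n as (4221) digits, least significant first (one valid choice)."""
    digits = []
    for _ in range(num_digits):
        n, d = divmod(n, 10)
        x3, x2, x1, x0 = (d >> 3) & 1, (d >> 2) & 1, (d >> 1) & 1, d & 1
        digits.append((x3 | x2, x3, x3 | x1, x0))
    return digits

def from_4221(digits):
    """Decimal value of a list of (4221) digits, least significant first."""
    return sum(decode(bits, W4221) * 10 ** i for i, bits in enumerate(digits))

def negate_4221(digits):
    """10's complement: invert every bit (9's complement), then add one ulp."""
    nines_complement = from_4221([invert(bits) for bits in digits])
    return (nines_complement + 1) % 10 ** len(digits)

if __name__ == "__main__":
    print(is_self_complementing(W4221), is_self_complementing((8, 4, 2, 1)))
    print(negate_4221(to_4221(1234, 4)))
```

In the multiplier itself the added ulp is not computed by an adder; as Section 4.1 explains, it is carried as a hot-one bit attached to the next partial product.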
For the final BCD carry-propagate addition, we use a quaternary tree (Q-T) adder based on conditional speculative decimal addition. It has low latency and requires less hardware than other alternatives.

Fig. 1. Combinational SD radix-10 architecture.

IV. DECIMAL PARTIAL PRODUCT GENERATION

We aim for a parallel generation of a reduced number of partial products coded in (4221) or (5211). This is achieved with the recoding of the d-digit BCD multiplier and the generation of a reduced and simple set of multiplicand multiples.

Fig. 2. Partial product generation for SD radix-10.

4.1 SD Radix-10 Recoding

Fig. 2 shows the block diagram of the generation of one partial product using the SD radix-10 recoding. This recoding transforms a BCD digit $Y_i \in \{0, \ldots, 9\}$ into an SD radix-10 digit $Yb_i \in \{-5, \ldots, 5\}$. The value of the recoded digit $Yb_i$ depends on the decimal value of $Y_i$ and on a signal $ys_{i-1}$ (sign signal) that indicates if $Y_{i-1}$ is greater than or equal to 5; equivalently, $Yb_i = Y_i + ys_{i-1} - 10\, ys_i$. Thus, the d-digit BCD multiplier Y is recoded into the (d+1)-digit SD radix-10 multiplier $Yb = \sum_{i=0}^{d} Yb_i 10^i$ with $Yb_d = ys_{d-1} \in \{0, 1\}$.

Each digit $Yb_i$ generates a partial product PP[i] by selecting the proper multiplicand multiple coded in (4221). This is performed in a similar way to a modified Booth recoding: $Yb_i$ is represented as five hot-one code signals $\{y1_i, y2_i, y3_i, y4_i, y5_i\}$ and a sign bit $ys_i$. These signals are obtained directly from the BCD multiplier digits $Y_i$ using the following logical expressions, where an overline denotes Boolean negation:

$ys_i = y_{i,3} \vee y_{i,2} \wedge (y_{i,1} \vee y_{i,0})$,
$y5_i = y_{i,2} \wedge \overline{y_{i,1}} \wedge (y_{i,0} \oplus ys_{i-1})$,
$y4_i = ys_{i-1} \wedge y_{i,0} \wedge (y_{i,2} \oplus y_{i,1}) \vee \overline{ys_{i-1}} \wedge y_{i,2} \wedge \overline{y_{i,0}}$,
$y3_i = y_{i,1} \wedge (y_{i,0} \oplus ys_{i-1})$,
$y2_i = \overline{ys_{i-1}} \wedge \overline{y_{i,0}} \wedge (y_{i,3} \vee \overline{y_{i,2}} \wedge y_{i,1}) \vee ys_{i-1} \wedge \overline{y_{i,3}} \wedge y_{i,0} \wedge \overline{y_{i,2} \oplus y_{i,1}}$,
$y1_i = \overline{y_{i,2} \vee y_{i,1}} \wedge (y_{i,0} \oplus ys_{i-1})$.
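The recoding of Section 4.1 can be cross-checked in software. The sketch below evaluates the sign and hot-one select signals from the BCD bits, with the negations placed so that each signal agrees with the arithmetic definition $Yb_i = Y_i + ys_{i-1} - 10\,ys_i$. This placement is an assumption wherever the printed expressions are ambiguous, and the function names are hypothetical.

```python
def inv(bit):
    """Boolean negation of a 0/1 value."""
    return 1 - bit

def sd_recode_digit(y, ys_prev):
    """Return (ys, (y1, y2, y3, y4, y5)) for BCD digit y and sign-in ys_prev."""
    b3, b2, b1, b0 = (y >> 3) & 1, (y >> 2) & 1, (y >> 1) & 1, y & 1
    c = ys_prev
    ys = b3 | (b2 & (b1 | b0))                      # 1 when y >= 5
    s5 = b2 & inv(b1) & (b0 ^ c)
    s4 = (c & b0 & (b2 ^ b1)) | (inv(c) & b2 & inv(b0))
    s3 = b1 & (b0 ^ c)
    s2 = ((inv(c) & inv(b0) & (b3 | (inv(b2) & b1)))
          | (c & inv(b3) & b0 & inv(b2 ^ b1)))
    s1 = inv(b2 | b1) & (b0 ^ c)
    return ys, (s1, s2, s3, s4, s5)

def sd_value(ys, selects):
    """Signed digit in {-5..5} encoded by the sign bit and select signals."""
    magnitude = sum(k * s for k, s in enumerate(selects, start=1))
    return -magnitude if ys else magnitude

def sd_recode(y_digits):
    """Recode BCD digits (least significant first) into d+1 SD digits."""
    out, ys_prev = [], 0
    for y in y_digits:
        ys, selects = sd_recode_digit(y, ys_prev)
        out.append(sd_value(ys, selects))
        ys_prev = ys
    out.append(ys_prev)  # leading digit Yb_d = ys_{d-1}
    return out

if __name__ == "__main__":
    digits = sd_recode([9, 4, 7, 2])  # multiplier 2749, least significant first
    print(digits, sum(v * 10 ** i for i, v in enumerate(digits)))
```

For the multiplier 2749 the recoded digits are -1, 5, -3, 3 and a leading 0, and their weighted sum reproduces 2749.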

In the expressions for $ys_i$ and $y1_i, \ldots, y5_i$, the symbols $\vee$, $\wedge$, and $\oplus$ indicate the Boolean operators OR, AND, and XOR, respectively. The five hot-one code signals are used as selection control signals for the 5:1 muxes to select the positive (d+1)-digit multiples {0, X, 2X, 3X, 4X, 5X}. The generation of the positive multiples (X, 2X, 3X, 4X, 5X) coded in (4221) from the BCD multiplicand is detailed in Section 4.2.

To obtain the correct partial product, the selected positive multiple is 10's complemented if $ys_i$ is one. This is performed simply by a bit inversion of the positive (4221) decimal-coded multiple using a row of XOR gates controlled by $ys_i$. The addition of one ulp (unit in the last place) is performed by attaching a tail-encoded bit $ys_i$ (hot one) to the next significant partial product PP[i+1], since it is shifted one decimal position to the left from PP[i].

To avoid a sign extension, and thus to reduce the complexity of the partial product reduction tree, the partial product sign bits $ys_i$ are encoded at each leading position into two digits as

$(PP[i]_{d+2}, PP[i]_{d+1}) = (\overline{ys_0},\ ys_0\, ys_0\, ys_0\, ys_0)$ for $i = 0$,
$(PP[i]_{d+2}, PP[i]_{d+1}) = (0,\ 1\,1\,1\,\overline{ys_i})$ for $0 < i < d-1$,
$(PP[i]_{d+2}, PP[i]_{d+1}) = (0,\ 0000)$ for $i = d-1$.

Therefore, each partial product PP[i] has a length of at most d+3 digits.

4.2 Generation of Multiplicand Multiples

All the required decimal multiplicand multiples, except the 3X multiple, are obtained in a few levels of combinational logic using different digit recoders and performing different fixed m-bit left shifts (Lmshift) in the bit-vector representation of operands. The structure of these digit recoders is discussed in Section 4.3. Fig. 3 shows the block diagram for the generation of the positive multiplicand multiples {X, 2X, 3X, 4X, 5X} for the SD radix-10 recoding.

Fig. 3(b). Waveforms of multiplicand multiples.

All these multiples are coded in (4221). The X BCD multiplicand is easily recoded to (4221) using the logical expressions

$(w_{i,3}, w_{i,2}, w_{i,1}, w_{i,0}) = (x_{i,3} \vee x_{i,2},\ x_{i,3},\ x_{i,3} \vee x_{i,1},\ x_{i,0})$,   (2)

where $x_{i,j}$ and $w_{i,j}$ are, respectively, the input bits of the BCD representation and the output bits of the (4221) representation of X. The generation of multiples is as follows:

Multiple 2X.
Each BCD digit is first recoded to the (5421) decimal coding shown in Table 1 (the mapping is unique). An L1shift is performed on the recoded multiplicand, obtaining the 2X multiple in BCD. Then, the 2X BCD multiple is recoded to (4221) using Expression (2).

Multiple 4X. It is obtained as 2X × 2, where the 2X multiple is coded in (4221). The second ×2 operation is implemented as a digit recoding from (4221) to (5211), followed by an L1shift. The design of the (4221) to (5211) digit recoders is described in Section 4.3. The ×2 operation, with input operands coded in (4221) or (5211), is also implemented in the decimal CSA trees used for partial product reduction, and therefore it is described in more detail in Section 5.2.

Multiple 5X. It is obtained by a simple L3shift of the (4221) recoded multiplicand, with resultant digits coded in (5211). Then, a digit recoding from (5211) to (4221) is performed (see Section 4.3). Fig. 4 shows an example of this operation.

Multiple 3X. It is evaluated by a carry-propagate addition of the BCD multiples X and 2X in a d-digit BCD adder. The BCD sum digits are recoded to (4221) as indicated by Expression (2). The latency of the partial product generation for the SD radix-10 scheme is constrained by the generation of 3X.

Fig. 3(a). Generation of multiplicand multiples for SD radix-10.

Fig. 4. Calculation of ×5 for decimal operands coded in (4221).

Table 2. Selected decimal codes for the recoded digits.

4.3 Implementation of Digit Recoders

Digit recoders are used to compute the decimal multiplicand multiples (Section 4.2) and, in the reduction of partial products (Section 5), to compute ×2^n (n > 0) operations. The logical implementation of digit recoders for the BCD, BCD excess-6, and (5421) decimal codes is straightforward, since there is only one mapping of decimal digits to these codes (each decimal digit has a single 4-bit representation). However, due to the redundancy of the (4221) and (5211) decimal codes, there are several choices for the digit recoding to (4221) or (5211). The sixteen 4-bit vectors of a coding can be mapped (recoded) into different subsets of 4-bit vectors of the other decimal coding representing the same decimal digit. These subsets of the (4221) and (5211) codes are also decimal codings. Among all the subsets analyzed, the nonredundant decimal codes (4221s) and (5211s) (subsets of ten 4-bit vectors), shown in Table 2, present interesting properties. In particular, these codes verify

$2Z\,(4221s) = \mathrm{L1shift}[\,Z\,(5211s)\,]$,   (3)

that is, after shifting an operand Z represented in (5211s) 1 bit to the left, the resultant bit vector represents the decimal value of 2Z coded in (4221s). This fact simplifies the implementation of ×2^n operations for n > 1. Specifically, for a decimal operand Z (4221), Z × 2^n is implemented by a first level of Z (4221) to Z (5211s) digit recoders followed by n-1 levels of Z (4221s) to Z (5211s) digit recoders. The output of each level of digit recoders is shifted 1 bit to the left such that the most significant bit of each (5211s) digit (weight 5) is shifted out to the next decimal position (weight 10).

Moreover, in some cases, the ×2 operation may be simplified. In particular, the recoding given by Expression (2) maps the BCD representation into the subset (4221s). Therefore, the subsequent ×2 operations in Fig. 3 are implemented using a level of simpler (4221s) to (5211s) digit recoders. A (4221) to (5211s) digit recoder has a hardware complexity of about 27 NAND2 gates, and its critical path has (roughly) the delay of a full adder. The (4221s) to (5211s) digit recoder has a lower hardware complexity (about 19 NAND2 gates) with 25 percent less latency.

Additionally, the inverse digit recoding (from (5211) to (4221)) is easily implemented using a single full adder, since

$Z_i\,(5211) = z_{i,3}\,4 + z_{i,2}\,2 + z^*_{i,1}\,2 + z^*_{i,0}$,   (4)

with $z^*_{i,1}\,2 + z^*_{i,0} = z_{i,3} + z_{i,1} + z_{i,0}$. This recoder is used to generate the ×5 multiple for the (4221) coding and in mixed (4221/5211) multioperand CSAs (Section 5.5) to convert a (5211) decimal-coded operand into the equivalent (4221) coded one.

V. PARTIAL PRODUCT REDUCTION

Section 5.1 describes the partial product arrays generated by the SD radix-10 encoding. Each column of p digits is reduced to two digits by means of a decimal digit p:2 CSA tree. Also, decimal carries are passed between adjacent digit columns. In Section 5.2, we present the set of preferred decimal codings and the method for decimal carry-save addition. We propose the use of the (4221) and (5211) decimal codings instead of BCD for an efficient implementation of decimal carry-save addition with binary CSAs or full adders. The use of these codes avoids the need for decimal corrections, so we only need to focus on the ×2 decimal multiplications. The implementation of decimal 3:2 CSAs for the proposed codings is also described in Section 5.2. We present the decimal p:2 CSA trees for digits coded in (4221) in Section 5.3.

5.1 Partial Product Arrays

As we detailed in Section 4.1, the SD radix-10 architecture produces d+1 partial products coded in (4221) of (d+3)-digit length. Before being reduced, the d+1 partial products PP[i] are aligned according to their decimal weights by 4i-bit wired left shifts (PP[i] × 10^i).
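The alignment just described can be checked at the integer level, ignoring the (4221) bit coding and the sign-digit encoding. The sketch below is an arithmetic model rather than the hardware datapath, with hypothetical function names: it recodes the multiplier into d+1 signed digits, forms one partial product per digit, and sums them with weights 10^i.

```python
def sd_digits(y, d):
    """SD radix-10 digits of y (least significant first, d+1 entries)."""
    digits, sign_in = [], 0
    for _ in range(d):
        y, digit = divmod(y, 10)
        sign_out = 1 if digit >= 5 else 0      # ys_i
        digits.append(digit + sign_in - 10 * sign_out)
        sign_in = sign_out
    digits.append(sign_in)                     # leading digit Yb_d
    return digits

def partial_products(x, y, d):
    """PP[i] = Yb_i * X, one per SD digit, before alignment."""
    return [yb * x for yb in sd_digits(y, d)]

def multiply(x, y, d):
    """Sum the partial products aligned by their decimal weights 10^i."""
    return sum(pp * 10 ** i for i, pp in enumerate(partial_products(x, y, d)))

if __name__ == "__main__":
    x = y = 10 ** 16 - 1                       # largest 16-digit operands
    print(len(partial_products(x, y, 16)), multiply(x, y, 16) == x * y)
```

For d = 16 this yields 17 partial products, each a multiple of X between -5X and 5X, which is consistent with the tallest column of the array containing p = 17 digits.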
The resultant partial product array for 16-digit input operands is shown in Fig. 5. In this case, the number of digits to be reduced varies from p = 17 to p = 2. In particular, the highest columns can be reduced with the area-optimized or delay-optimized decimal 17:2 CSA trees presented in Section 5.3.

Fig. 5. Partial product arrays generated for 16-digit operands. SD radix-10 architecture.

5.2 Method for Decimal Carry-Save Addition

Among all the possible decimal codes defined by Expression (1) in Section 2, there is a family of codes suitable for simple decimal carry-save addition. This family of decimal codings verifies that the sum of their weight bits is 9, that is,

$\sum_{j=0}^{3} r_j = 9$,   (5)

which includes the (4221), (5211), (4311), and (3321) codes, shown in Table 1. Some of these decimal codings are already known, but we use them in a different context, to design components for decimal carry-save arithmetic. Moreover, they are redundant codes, since two or more different 4-bit vectors may represent the same decimal digit.

Fig. 6a shows the implementation of a decimal 3:2 CSA for digits coded in (4221) using a 4-bit binary 3:2 CSA. The weight bits in Fig. 6a are placed in brackets above each bit column. The 4-bit binary 3:2 CSA adds three decimal digits ($A_i$, $B_i$, $C_i$), coded in (4221), and produces a decimal sum digit ($S_i$) and a carry digit ($H_i$) coded in (4221), such that $A_i + B_i + C_i = S_i + 2 H_i$. In order to obtain $2 H_i$, $H_i$ is first recoded to $W_i$ using the (4221) to (5211s) digit recoder introduced in Section 4.3. The output of the digit recoder ($W_i$) is then left shifted by 1 bit position (L1shift[$W_i$]). A decimal carry output $w_{i,3}$ is passed to the next significant digit position, while a decimal carry input $w_{i-1,3}$ comes from the previous one. Since the recoder is placed in the carry path, a full adder implementation with a fast carry output, such as the one shown in Fig. 6b, reduces the total critical path delay.

Fig. 7. Implementation of the ×2 block.

5.3 Decimal p:2 CSA Trees for Digits Coded in (4221)

A decimal digit p:2 CSA tree reduces p (p >= 3) input digits $Z_i[l]$ (with weight $10^i$) coded in (4221) into two decimal digits $H_i$ and $S_i$. In addition, several decimal carry outputs are generated to the next significant decimal position ($10^{i+1}$), and a certain number of decimal carry inputs come from the previous position ($10^{i-1}$).
These decimal p:2 CSA trees are designed as follows.

Fig. 6. Proposed decimal 3:2 CSA for operands coded in (4221). (a) Decimal digit (4-bit) 3:2 CSA. (b) Full adder.

For p < 7, the input digits $Z_i[l]$ are reduced in a first level of binary 3:2 CSAs. Each carry output digit is multiplied by 2 before being reduced in the next level of the binary 3:2 CSA tree. Each ×2 operation produces a decimal carry output to the next significant digit column of the partial product array. The slowest outputs are connected to the fast inputs of the next binary 3:2 CSA level to balance the total delay of the different paths (an F indicates the fast input). We use the full adder configuration of Fig. 6b to minimize the critical path delay of the CSA tree. The digit blocks labelled ×2 consist of a (4221) to (5211s) digit recoder with the outputs (for (4221) coded operands) 1-bit left shifted, as shown in Fig. 7.
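One level of this scheme, a 4-bit binary 3:2 CSA followed by a ×2 block on the carry digits, can be modelled as follows. This is a behavioural sketch, not the gate-level design: the (4221) to (5211) recoder is a lookup table, and because Table 2 is not reproduced in the text, the particular (5211) vectors chosen here are an assumption. All names are hypothetical.

```python
W4221 = (4, 2, 2, 1)

# One (5211) vector per digit (assumed choice; weights are 5, 2, 1, 1).
TO_5211 = {0: (0, 0, 0, 0), 1: (0, 0, 0, 1), 2: (0, 1, 0, 0), 3: (0, 1, 0, 1),
           4: (0, 1, 1, 1), 5: (1, 0, 0, 0), 6: (1, 0, 0, 1), 7: (1, 1, 0, 0),
           8: (1, 1, 0, 1), 9: (1, 1, 1, 1)}

def decode(bits, weights):
    return sum(b * w for b, w in zip(bits, weights))

def to_4221(n, num_digits):
    """Encode n as (4221) digits, least significant first (one valid choice)."""
    digits = []
    for _ in range(num_digits):
        n, d = divmod(n, 10)
        x3, x2, x1, x0 = (d >> 3) & 1, (d >> 2) & 1, (d >> 1) & 1, d & 1
        digits.append((x3 | x2, x3, x3 | x1, x0))
    return digits

def from_4221(digits):
    return sum(decode(bits, W4221) * 10 ** i for i, bits in enumerate(digits))

def csa_3to2(a, b, c):
    """Four full adders in parallel: returns (sum bits, carry bits)."""
    s = tuple(x ^ y ^ z for x, y, z in zip(a, b, c))
    h = tuple((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
    return s, h

def times2(digits):
    """x2 block: recode each digit to (5211), then shift left by one bit."""
    out, carry_in = [], 0
    for bits in digits:
        w3, w2, w1, w0 = TO_5211[decode(bits, W4221)]
        out.append((w2, w1, w0, carry_in))  # now read with weights (4, 2, 2, 1)
        carry_in = w3                       # weight 5 doubles to 10: next digit
    return out, carry_in

def decimal_csa(a, b, c):
    """Reduce three (4221) operands to S and 2H, both coded in (4221)."""
    pairs = [csa_3to2(x, y, z) for x, y, z in zip(a, b, c)]
    s_digits = [s for s, _ in pairs]
    doubled, carry_out = times2([h for _, h in pairs])
    return s_digits, doubled + [(0, 0, 0, carry_out)]

if __name__ == "__main__":
    s, h2 = decimal_csa(to_4221(987, 3), to_4221(654, 3), to_4221(999, 3))
    print(from_4221(s) + from_4221(h2))
```

No decimal correction step appears anywhere in the model: each bit column satisfies a + b + c = s + 2h, and because the four weights scale every column consistently, the identity carries over to the decimal digits.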

The most significant output bit ($w_{i,3}$) represents a decimal carry to the next digit column. To simplify the diagrams of the different decimal p:2 CSA trees, the carries passed between adjacent digit columns ($w_{i,3}$, $w_{i-1,3}$) are not represented. The carry output H must be multiplied by 2 before being assimilated with the sum output S.

For p >= 7, we follow different strategies to obtain area-optimized or delay-optimized implementations. For area-optimized implementations, the input digits $Z_i[l]$ are reduced in a first level of binary 3:2 CSAs. Each intermediate operand is associated with a multiplicative factor that is a power of 2. Operands with the same factor are reduced in a binary 3:2 CSA before being multiplied by this factor, that is,

$2^n A + 2^n B + 2^n C = 2^n (A + B + C) = 2^n S + 2^{n+1} H$.   (6)

This reduces the hardware complexity since the overall number of ×2 operations is reduced. An area-optimized decimal 17:2 CSA tree for operands coded in (4221) is shown in Fig. 8a. The ×4 and ×8 digit blocks produce two and three decimal carry outputs, respectively, to the next significant digit column of the partial product array. They are implemented using a cascade configuration of ×2 blocks, as shown in Fig. 9.

The critical path delay is reduced by balancing the delay of the different paths. For this purpose, the intermediate operands with higher multiplicative factors are multiplied in parallel with the reduction of the other intermediate operands using binary 3:2 CSAs. The delay-optimized 17:2 CSA tree in Fig. 8b has more hardware complexity (equivalent to two more ×2 blocks), but the critical path is slightly faster (by about 1 XOR delay). Its delay is about six levels of binary 3:2 CSAs and three levels of digit recoders. The blocks labelled 9:4 and 8:4 represent the decimal digit adders.

Fig. 8. Proposed decimal 17:2 CSAs. (a) Area-optimized tree. (b) Delay-optimized tree.

Fig. 9. Implementation of ×8 multiplication for two adjacent columns.

VI. SYNTHESIS RESULT

The 16-digit SD radix-10 combinational multiplier (Fig. 1) has been synthesized using ModelSim SE 6.5c. For partial product reduction, the SD radix-10 multiplier implements area-optimized decimal p:2 CSA trees, similar to the decimal 17:2 CSA tree of Fig. 8a.

VII. CONCLUSION

In this paper, we have presented one technique to implement decimal parallel multiplication in hardware. We propose the SD encoding for the multiplier, which leads to a fast, parallel, and simple generation of partial products. We have developed a decimal carry-save algorithm based on the (4221) and (5211) decimal encodings for partial product reduction. We have proposed an architecture for decimal SD radix-10 multiplication.

REFERENCES

[1] M. A. Erle and M. J. Schulte, "Decimal multiplication via carry-save addition," Proc. IEEE Int'l Conference on Application-Specific Systems, Architectures, and Processors, June.
[2] M. A. Erle, E. M. Schwarz, and M. J. Schulte, "Decimal multiplication with efficient partial product generation," Proc. IEEE 17th Symposium on Computer Arithmetic, pp. 21-28, June.
[3] A. Vázquez, E. Antelo, and P. Montuschi, "A New Family of High Performance Parallel Decimal Multipliers," Proc. 18th IEEE Symp. Computer Arithmetic, June.
[4] R. D. Kenney and M. J. Schulte, "High-speed multioperand decimal adders," IEEE Trans. on Computers, 54(8), Aug.
[5] T. Lang and A. Nannarelli, "A radix-10 combinational multiplier," Proc. 40th Asilomar Conference on Signals, Systems, and Computers, Oct.
[6] I. D. Castellanos and J. E. Stine, "Compressor Trees for Decimal Partial Product Reduction," Proc. 18th ACM Great Lakes Symp. VLSI, Mar.
[7] B. J. Hickmann, A. Krioukov, M. A. Erle, and M. J. Schulte, "A Parallel IEEE P754 Decimal Floating-Point Multiplier," Proc. 25th IEEE Conf. Computer Design, Oct.
[8] R. D. Kenney, M. J. Schulte, and M. A. Erle, "High-Frequency Decimal Multiplier," Proc. IEEE Int'l Conf. Computer Design: VLSI in Computers and Processors, Oct.
