About this Topic. Topic 4. Arithmetic Circuits. Different adder architectures. Basic Ripple Carry Adder

Topic 4: Arithmetic Circuits
Peter Cheung
Department of Electrical & Electronic Engineering, Imperial College London
URL: www.ee.imperial.ac.uk/pcheung/
E-mail: p.cheung@imperial.ac.uk

About this Topic
- Comparison of adder architectures on FPGAs
- Multiple operand addition
- Basic multipliers
- Booth recoding multipliers
- Fixed point vs Floating Point
- Floating point Unit architecture
- Example: FIR and IIR filter implementation
References:
- Computer Arithmetic, B. Parhami, OUP
- Computer Arithmetic Algorithms, I. Koren, AK Peters

Different adder architectures
- Revision of last year's Digital Electronics II course (http://www.ee.ic.ac.uk/hp/staff/dmb/courses/dig2/5_adder.pdf)
- Common adder architectures are:
  - Ripple carry adder
  - Carry lookahead adder
  - Carry skip (or carry select) adder
  - Carry save adder
  - Parallel prefix adder (Brent & Kung)

Basic Ripple Carry Adder
[Figure: Using full-adders in building bit-serial and ripple-carry adders. (a) Bit-serial adder: one full adder (FA) fed from shift registers, with a carry flip-flop and clock. (b) Ripple-carry adder: a chain of full adders from c_in to c_out.]
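The ripple-carry structure above can be modelled in a few lines. This is a minimal behavioural sketch in Python, not a hardware description; the function names `full_adder` and `ripple_carry_add` are illustrative and do not come from the slides.

```python
def full_adder(x, y, c_in):
    """One-bit full adder: returns (sum bit, carry-out bit)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (c_in & (x ^ y))
    return s, c_out


def ripple_carry_add(x, y, k, c_in=0):
    """Add two k-bit unsigned integers by chaining k full adders.

    The carry out of position i becomes the carry in of position i + 1,
    which is why the worst-case delay grows linearly with k.
    """
    carry = c_in
    total = 0
    for i in range(k):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


# 11 + 6 = 17, i.e. a 4-bit sum of 0001 with carry-out 1.
print(ripple_carry_add(0b1011, 0b0110, 4))  # (1, 1)
```

The loop processes one bit position per iteration, mirroring the bit-serial adder in time and the ripple-carry adder in space.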

Critical Path Through a Ripple-Carry Adder
$T_{\text{ripple-add}} = T_{\text{FA}}(x, y \to c_{\text{out}}) + (k - 2) \times T_{\text{FA}}(c_{\text{in}} \to c_{\text{out}}) + T_{\text{FA}}(c_{\text{in}} \to s)$
[Figure: Critical path in a k-bit ripple-carry adder.]

Adder Conditions and Exceptions
[Figure: Two's-complement adder with provisions for detecting conditions and exceptions (outputs: Overflow, Negative, Zero, c_out).]
$\text{overflow}_{\text{2's-compl}} = x_{k-1}\, y_{k-1}\, \bar{s}_{k-1} + \bar{x}_{k-1}\, \bar{y}_{k-1}\, s_{k-1}$
$\text{overflow}_{\text{2's-compl}} = c_k \oplus c_{k-1} = c_k\, \bar{c}_{k-1} + \bar{c}_k\, c_{k-1}$

Saturating Adders
- Saturating (saturation) arithmetic: when a result's magnitude is too large, do not wrap around; rather, provide the most positive or the most negative value that is representable in the number format.
- Example: in 8-bit 2's-complement format, we have $112 + 126 \to -18$ (wraparound); $112 +_{\text{sat}} 126 \to 127$ (saturating).
- Saturating arithmetic is desirable in many DSP applications.
- Designing saturating adders:
  - Unsigned (quite easy)
  - Signed (only slightly harder)
[Figure: Saturating adder built from an adder and a saturation value.]

Full Carry Lookahead
[Figure: Sum bits s_3, s_2, ... each derived directly from the inputs x_3, y_3, x_2, y_2, ..., c_in.]
- Theoretically, it is possible to derive each sum digit directly from the inputs that affect it.
- Carry-lookahead adder design is simply a way of reducing the complexity of this ideal, but impractical, arrangement by hardware sharing among the various lookahead circuits.
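The overflow condition and the saturating behaviour described above can be sketched together. This is a behavioural model under stated assumptions: the helper names are hypothetical, operands are Python integers already within the k-bit two's-complement range, and the operand values 112 and 126 are chosen only for illustration.

```python
def add_with_overflow(x, y, k=8):
    """Wraparound two's-complement addition of two k-bit values.

    Returns (wrapped result, overflow flag). Overflow is detected from
    the sign bits: both operands share a sign that the sum does not.
    """
    mask = (1 << k) - 1
    raw = (x + y) & mask
    sign_x = (x >> (k - 1)) & 1
    sign_y = (y >> (k - 1)) & 1
    sign_s = (raw >> (k - 1)) & 1
    overflow = sign_x == sign_y and sign_s != sign_x
    wrapped = raw - (1 << k) if sign_s else raw
    return wrapped, overflow


def saturating_add(x, y, k=8):
    """Clamp to the most positive or most negative value on overflow."""
    wrapped, overflow = add_with_overflow(x, y, k)
    if not overflow:
        return wrapped
    most_positive = (1 << (k - 1)) - 1
    most_negative = -(1 << (k - 1))
    return most_positive if x >= 0 else most_negative


print(add_with_overflow(112, 126))   # (-18, True)
print(saturating_add(112, 126))      # 127
print(saturating_add(-100, -50))     # -128
```

In hardware the clamp would be a multiplexer selecting between the adder output and the saturation value, controlled by the overflow signal.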

Unrolling the Carry Recurrence
Recall the generate (g) and propagate (p) signals:

| Signal | Radix r | Binary |
|---|---|---|
| $g_i$ | is 1 iff $x_i + y_i \ge r$ | $x_i\, y_i$ |
| $p_i$ | is 1 iff $x_i + y_i = r - 1$ | $x_i \oplus y_i$ |
| $s_i$ | $(x_i + y_i + c_i) \bmod r$ | $x_i \oplus y_i \oplus c_i$ |

The carry recurrence can be unrolled to obtain each carry signal directly from the inputs, rather than through propagation:
$c_i = g_{i-1} + c_{i-1}\, p_{i-1}$
$\;\;= g_{i-1} + (g_{i-2} + c_{i-2}\, p_{i-2})\, p_{i-1}$
$\;\;= g_{i-1} + g_{i-2}\, p_{i-1} + c_{i-2}\, p_{i-2}\, p_{i-1}$
$\;\;= g_{i-1} + g_{i-2}\, p_{i-1} + g_{i-3}\, p_{i-2}\, p_{i-1} + c_{i-3}\, p_{i-3}\, p_{i-2}\, p_{i-1}$
$\;\;= g_{i-1} + g_{i-2}\, p_{i-1} + g_{i-3}\, p_{i-2}\, p_{i-1} + g_{i-4}\, p_{i-3}\, p_{i-2}\, p_{i-1} + c_{i-4}\, p_{i-4}\, p_{i-3}\, p_{i-2}\, p_{i-1}$
$\;\;= \ldots$

Carry-Lookahead Adder Design
Block generate and propagate signals:
$g_{[i,i+3]} = g_{i+3} + g_{i+2}\, p_{i+3} + g_{i+1}\, p_{i+2}\, p_{i+3} + g_i\, p_{i+1}\, p_{i+2}\, p_{i+3}$
$p_{[i,i+3]} = p_i\, p_{i+1}\, p_{i+2}\, p_{i+3}$
[Figure: Schematic diagram of a 4-bit lookahead carry generator.]

A Building Block for Carry-Lookahead Addition
[Figure: Four-bit adder. Four-bit lookahead carry generator, with block signal generation ($g_{[i,i+3]}$, $p_{[i,i+3]}$) and intermediate carries ($c_{i+1}$, $c_{i+2}$, $c_{i+3}$).]

Combining Block g and p Signals
Block generate and propagate signals can be combined in the same way as bit g and p signals to form g and p signals for wider blocks.
[Figure: Combining of g and p signals of four (contiguous or overlapping) blocks of arbitrary widths into the g and p signals for the overall block $[i_0, j_3]$.]
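The unrolled recurrence and the block-combining rule above can be checked numerically. The following is a minimal Python sketch for the binary case; all function names are illustrative assumptions, and the loop in `unrolled_carry` evaluates the sum-of-products sequentially, whereas hardware would evaluate it as two-level logic.

```python
def bit_signals(x, y, k):
    """Per-bit generate (x_i AND y_i) and propagate (x_i XOR y_i), LSB first."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(k)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(k)]
    return g, p


def unrolled_carry(g, p, c0, i):
    """c_i = g_{i-1} + g_{i-2} p_{i-1} + ... + c_0 p_0 p_1 ... p_{i-1}."""
    carry = 0
    chain = 1  # product of the propagate signals above position j
    for j in range(i - 1, -1, -1):
        carry |= g[j] & chain
        chain &= p[j]
    return carry | (c0 & chain)


def combine(upper, lower):
    """Merge the (g, p) pair of an upper block with the adjacent lower block."""
    g_up, p_up = upper
    g_lo, p_lo = lower
    return g_up | (g_lo & p_up), p_up & p_lo


def block_signals(g, p, i, width=4):
    """(g, p) for the block [i, i + width - 1], built by repeated combining."""
    acc = (g[i], p[i])
    for j in range(i + 1, i + width):
        acc = combine((g[j], p[j]), acc)
    return acc


def lookahead_add(x, y, k, c0=0):
    """k-bit addition in which every carry comes straight from g, p and c0."""
    g, p = bit_signals(x, y, k)
    carries = [unrolled_carry(g, p, c0, i) for i in range(k + 1)]
    total = sum((p[i] ^ carries[i]) << i for i in range(k))
    return total, carries[k]


print(lookahead_add(0b1011, 0b0110, 4))  # (1, 1)
```

Because `combine` is the same operator at every level, the same function serves for bits, 4-bit blocks and wider blocks.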

Carry-Select Adders
[Figure: Carry-select adder for k-bit numbers built from three k/2-bit adders. The low k/2 bits are added once with c_in; the high k/2 bits are added twice (carry-in 0 and carry-in 1) and a multiplexer picks the correct version, along with c_out.]
$C_{\text{select-add}}(k) = 3\, C_{\text{add}}(k/2) + k/2 + 1$
$T_{\text{select-add}}(k) = T_{\text{add}}(k/2) + 1$

Multilevel Carry-Select Adders
[Figure: Two-level carry-select adder built of k/4-bit adders, producing the high k/2 bits, middle k/4 bits and low k/4 bits.]

Comparison between adders on modern FPGAs
- Sacristan, Rodella & Diaz, "Comparison of addition structures synthesis over commercial FPGAs", International Conf. on Design & Test, 2006, pp. 43-47.
- Compares ripple carry adder (RCA), carry lookahead adder (CLA), carry select adder (CSLA), Brent & Kung parallel prefix adder (PA-BK), and finally not specifying any structure and letting the synthesis tool decide.
- Uses Altera Stratix II and Xilinx Virtex-4 (not the latest, but pretty recent).
- Results summary:
  - Mostly as expected: faster means larger.
  - Surprisingly, the synthesis tool does the best: both fast and small.
  - Moral: at a low level, it is difficult to beat modern synthesis tools.
- Results are shown in the next four slides.

Results for Stratix II: Area
[Figure. Source: Sacristan]
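The single-level carry-select scheme described earlier in this section can be illustrated as follows. This is a behavioural Python sketch with a hypothetical function name; it assumes k is even and both operands are unsigned and below 2^k, and it uses Python's `+` to stand in for each k/2-bit adder.

```python
def carry_select_add(x, y, k, c_in=0):
    """k-bit addition from three k/2-bit additions plus a selection step.

    Assumes k is even and 0 <= x, y < 2**k.
    """
    half = k // 2
    mask = (1 << half) - 1

    # Low half: one k/2-bit adder using the real carry-in.
    low = (x & mask) + (y & mask) + c_in
    low_sum, low_carry = low & mask, low >> half

    # High half: two k/2-bit adders working in parallel with the low
    # half, one assuming carry-in 0 and the other assuming carry-in 1.
    high_x, high_y = x >> half, y >> half
    high_if_0 = high_x + high_y
    high_if_1 = high_x + high_y + 1

    # Multiplexer: the low half's carry-out selects the valid result.
    high = high_if_1 if low_carry else high_if_0
    return ((high & mask) << half) | low_sum, high >> half


print(carry_select_add(200, 100, 8))  # (44, 1), since 300 = 256 + 44
```

The delay is one half-width addition plus the multiplexer, at the price of a third half-width adder, which is the trade-off the cost and delay formulas express.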

Results for Stratix II: Delay
[Figure. Source: Sacristan]

Results for Virtex-4: Area
[Figure. Source: Sacristan]

Results for Virtex-4: Delay
[Figure. Source: Sacristan]

Multipliers and DSP Blocks
- Remember that both Altera and Xilinx FPGAs have embedded multipliers with accumulators etc.
- This part of the lecture will look at some of the common multiplier hardware (i.e. what such embedded multiplier circuits might look like).
- We will also consider the application of FPGA embedded multipliers to FIR filter implementation.
- Topics to cover are:
  - Basic multipliers
  - Booth recoded multipliers
  - Array multipliers
  - FIR Filter Compiler

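Multiplication of two 4-bit unsigned numbers: An example
Notation:
- a: Multiplicand, $a_{k-1} a_{k-2} \ldots a_1 a_0$
- x: Multiplier, $x_{k-1} x_{k-2} \ldots x_1 x_0$
- p: Product $(a \times x)$, $p_{2k-1} p_{2k-2} \ldots p_3 p_2 p_1 p_0$
Initially, we assume unsigned operands.
[Figure: Partial products bit-matrix. Rows: multiplicand a, multiplier x, then $x_0\, a\, 2^0$, $x_1\, a\, 2^1$, $x_2\, a\, 2^2$, $x_3\, a\, 2^3$, and the product p.]

Basic Sequential Multiplier
[Figure: Shift register holding the multiplier x and the double-width partial product $p^{(j)}$; a k-bit adder with carry-out and sum outputs; a multiplexer selecting the multiplicand a or 0 under control of $x_j$.]

Performing Add and Shift in One Clock Cycle
[Figure: Combining the loading and shifting of the double-width register holding the partial product and the partially used multiplier. The upper half holds the partial product $p^{(j)}$, the lower half the unused part of the multiplier x; the adder carry-out and sum are loaded shifted by one position, and the lowest multiplier bit drives the mux control.]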
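The add-and-shift scheme with a shared double-width register can be modelled as below. This is a minimal sketch for unsigned operands; the name `sequential_multiply` is illustrative, and a single Python integer stands in for the double-width register (plus the adder's carry-out bit).

```python
def sequential_multiply(a, x, k):
    """Right-shift add-and-shift multiplication of two k-bit unsigned values.

    The register initially holds the multiplier in its lower half. Each
    cycle inspects the lowest bit, conditionally adds the multiplicand to
    the upper half, then shifts the whole register right by one place.
    """
    reg = x
    for _ in range(k):
        if reg & 1:           # current multiplier bit selects a or 0
            reg += a << k     # add multiplicand into the upper half
        reg >>= 1             # shift partial product and multiplier together
    return reg


print(sequential_multiply(13, 11, 4))  # 143
```

After k cycles every multiplier bit has been shifted out and the register holds the full 2k-bit product.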

Example of a detailed 4×4 unsigned sequential multiplier
2's complement signed multiplication
4×4 sequential signed multiplier circuit
Recoded Multiplier: Booth Algorithm (1)

Recoded Multiplier: Booth Algorithm (2)
Proof of Booth Algorithm
[Figure annotations: "Booth Algorithm does this"; "2's complement representation".]
Sequential Booth Multiplier
Multi-bit sequential multiplier
[Figure label: +/-, B ± A.]
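The slides above present Booth's algorithm through figures only. As a hedged illustration, the sketch below uses the standard textbook form of radix-2 Booth recoding, in which digit i is x_{i-1} - x_i with x_{-1} = 0; this formulation and the function names are assumptions, not taken from the slides.

```python
def booth_recode(pattern, k):
    """Recode a k-bit two's-complement bit pattern into digits in {-1, 0, 1}.

    Digit i is x_{i-1} - x_i with x_{-1} = 0, least significant digit first.
    """
    digits = []
    previous = 0
    for i in range(k):
        bit = (pattern >> i) & 1
        digits.append(previous - bit)
        previous = bit
    return digits


def booth_multiply(a, x, k):
    """Multiply a by a signed multiplier x that fits in k-bit two's complement."""
    pattern = x & ((1 << k) - 1)
    # Each digit adds, subtracts or skips the shifted multiplicand.
    return sum(d * (a << i) for i, d in enumerate(booth_recode(pattern, k)))


print(booth_recode(0b0110, 4))   # [0, -1, 0, 1]
print(booth_multiply(7, -3, 4))  # -21
```

The recoded digits sum to the two's-complement value of the multiplier, so signed operands need no separate correction step.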

Modified Booth Algorithm (2 bits at a time)
Modified Booth Recoding (2 bits at a time)
Modified Booth Multiplier Circuit
Modified Booth Multiplier Circuit (continued)
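Recoding two bits at a time halves the number of partial products. The sketch below uses the standard radix-4 (modified Booth) rule, digit j = x_{2j-1} + x_{2j} - 2·x_{2j+1} with x_{-1} = 0; the rule as written here and the function names are assumptions for illustration, since the slides show it only as figures.

```python
def modified_booth_recode(pattern, k):
    """Recode a k-bit two's-complement pattern (k even) into radix-4 digits.

    Each digit lies in {-2, -1, 0, 1, 2}; least significant digit first.
    """
    if k % 2:
        raise ValueError("k must be even")
    digits = []
    previous = 0  # x_{2j-1}, the top bit of the previous pair
    for j in range(0, k, 2):
        low = (pattern >> j) & 1
        high = (pattern >> (j + 1)) & 1
        digits.append(previous + low - 2 * high)
        previous = high
    return digits


def modified_booth_multiply(a, x, k):
    """Multiply a by a signed k-bit multiplier x using k/2 partial products."""
    pattern = x & ((1 << k) - 1)
    digits = modified_booth_recode(pattern, k)
    return sum(d * a * 4 ** j for j, d in enumerate(digits))


print(modified_booth_recode(0b0110, 4))   # [-2, 2]
print(modified_booth_multiply(7, -3, 4))  # -21
```

In hardware the multiples ±a and ±2a need only a shift and an optional negation, so each digit still costs a single add/subtract step.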

Array Multiplier
Array Multiplier: obvious, but slow version
Array Multiplier using carry-save adder
Embedded Multipliers in Altera Cyclone II (1)

Embedded Multipliers in Altera Cyclone II (2)
Embedded Multipliers in Altera Cyclone II (3)

Application of Multipliers: Typical DSP System
- Altera and Xilinx provide FIR filter compiler support.
- These examples are taken from the Altera FIR Compiler User Guide.
- MegaCore functions: pre-designed cores (large modules).
- LPM functions are parameterised building blocks (e.g. adder, multiplier).

Basic FIR Filter
[Figure. Examples taken from the Altera FIR Compiler User Guide.]

Exploiting Symmetric Coefficients (7-tap)
Parallel Implementation of FIR Filter
Serial Implementation of FIR Filter
Multibit Serial Implementation of FIR Filter
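The saving from symmetric coefficients can be made concrete. The sketch below is a software model, not the Altera FIR Compiler's implementation; the function names and the integer coefficient values are illustrative assumptions. For a 7-tap filter with h[i] = h[6 - i], samples that share a coefficient are added first, so only 4 multiplications are needed per output instead of 7.

```python
def fir_direct(coeffs, samples):
    """Reference FIR: y[n] = sum_i h[i] * x[n - i], with x[n] = 0 for n < 0."""
    return [
        sum(h * samples[n - i] for i, h in enumerate(coeffs) if n - i >= 0)
        for n in range(len(samples))
    ]


def fir_symmetric(half_coeffs, samples):
    """Odd-length symmetric FIR using one multiplication per distinct coefficient.

    half_coeffs holds h[0] .. h[centre]; the remaining taps mirror them.
    """
    centre = len(half_coeffs) - 1
    last = 2 * centre  # index of the final tap

    def x(idx):
        return samples[idx] if idx >= 0 else 0

    out = []
    for n in range(len(samples)):
        acc = half_coeffs[centre] * x(n - centre)
        for i in range(centre):
            # Pre-add the two samples that share coefficient h[i].
            acc += half_coeffs[i] * (x(n - i) + x(n - last + i))
        out.append(acc)
    return out


impulse = [1, 0, 0, 0, 0, 0, 0, 0]
print(fir_symmetric([1, 2, 3, 4], impulse))  # [1, 2, 3, 4, 3, 2, 1, 0]
```

On an FPGA the pre-adders are cheap compared with multipliers, which is why this form roughly halves the embedded multiplier count.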

FIR Filter Compiler Design Space

Floating-Point Numbers
- No finite number system can represent all real numbers.
- Various systems can be used for a subset of real numbers:
  - Fixed-point, $\pm w.f$: low precision and/or range
  - Rational, $\pm p/q$: difficult arithmetic
  - Floating-point, $\pm s \times b^e$: most common scheme
  - Logarithmic, $\pm \log_b x$: limiting case of floating-point
- Fixed-point numbers:
  - $x = (0000\,0000\,.\,0000\,1001)_{\text{two}}$: small number
  - $y = (1001\,0000\,.\,0000\,0000)_{\text{two}}$: large number
- Floating-point numbers: $x = \pm s \times b^e$, or $\pm\,\text{significand} \times \text{base}^{\text{exponent}}$
- Note that a floating-point number comes with two signs:
  - Number sign, usually represented by a separate bit
  - Exponent sign, usually embedded in the biased exponent

Floating-Point Number Format and Distribution
[Figure: Typical floating-point number format, with fields sign (±), exponent (e) and significand (s).]
- Sign: 0 for +, 1 for −
- Exponent: signed integer, often represented as an unsigned value by adding a bias. Range with h bits: $[-\text{bias},\ 2^h - 1 - \text{bias}]$
- Significand: represented as a fixed-point number. Usually normalized by shifting, so that the MSB becomes nonzero. In radix 2, the fixed leading 1 can be removed to save one bit; this bit is known as the "hidden 1".
[Figure: Subranges and special values in floating-point number representations. From left to right: $-\infty$, overflow region, max−, negative numbers (FLP−, sparser towards max−, denser towards min−), min−, underflow regions around 0, min+, positive numbers (FLP+, denser towards min+, sparser towards max+), max+, overflow region, $+\infty$. Marked points: midway example, underflow example, typical example, overflow example.]

The ANSI/IEEE Floating-Point Representation
- Short (32-bit) format: sign; 8-bit exponent, bias = 127, −126 to 127; 23 bits for fractional part (plus hidden 1 in integer part)
- Long (64-bit) format: sign; 11-bit exponent, bias = 1023, −1022 to 1023; 52 bits for fractional part (plus hidden 1 in integer part)
- IEEE 754 Standard (now being revised to yield IEEE 754R)
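The short (32-bit) format's fields can be inspected directly. The sketch below uses Python's standard `struct` module to obtain the bit pattern; the helper names are illustrative, and `reconstruct` deliberately handles ordinary (normalized) numbers only.

```python
import struct


def decode_single(value):
    """Split a number, stored as IEEE 754 single, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction


def reconstruct(sign, biased_exponent, fraction):
    """Rebuild an ordinary number: (-1)^sign * 1.f * 2^(biased exponent - 127)."""
    if not 1 <= biased_exponent <= 254:
        raise ValueError("zero, denormal, infinity or NaN: not handled here")
    significand = 1 + fraction / 2 ** 23  # reinstate the hidden 1
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - 127)


# -6.5 = -1.625 * 2^2, so the biased exponent is 2 + 127 = 129.
print(decode_single(-6.5))                # (1, 129, 5242880)
print(reconstruct(*decode_single(-6.5)))  # -6.5
```

The fraction 5242880 is 0.625 × 2^23; the leading 1 of 1.625 is not stored.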

Overview of IEEE 754 Standard Formats
Some features of the ANSI/IEEE standard floating-point number representation formats.

| Feature | Single/Short | Double/Long |
|---|---|---|
| Word width (bits) | 32 | 64 |
| Significand bits | 23 + 1 hidden | 52 + 1 hidden |
| Significand range | $[1,\ 2 - 2^{-23}]$ | $[1,\ 2 - 2^{-52}]$ |
| Exponent bits | 8 | 11 |
| Exponent bias | 127 | 1023 |
| Zero (±0) | $e + \text{bias} = 0$, $f = 0$ | $e + \text{bias} = 0$, $f = 0$ |
| Denormal | $e + \text{bias} = 0$, $f \ne 0$; represents $\pm 0.f \times 2^{-126}$ | $e + \text{bias} = 0$, $f \ne 0$; represents $\pm 0.f \times 2^{-1022}$ |
| Infinity (±∞) | $e + \text{bias} = 255$, $f = 0$ | $e + \text{bias} = 2047$, $f = 0$ |
| Not-a-number (NaN) | $e + \text{bias} = 255$, $f \ne 0$ | $e + \text{bias} = 2047$, $f \ne 0$ |
| Ordinary number | $e + \text{bias} \in [1, 254]$, $e \in [-126, 127]$; represents $1.f \times 2^e$ | $e + \text{bias} \in [1, 2046]$, $e \in [-1022, 1023]$; represents $1.f \times 2^e$ |
| min | $2^{-126} \approx 1.2 \times 10^{-38}$ | $2^{-1022} \approx 2.2 \times 10^{-308}$ |
| max | $\approx 2^{128} \approx 3.4 \times 10^{38}$ | $\approx 2^{1024} \approx 1.8 \times 10^{308}$ |

Exponent Encoding
Exponent encoding in 8 bits for the single/short (32-bit) ANSI/IEEE format:

| Decimal code | 0 | 1 | ... | 126 | 127 | 128 | ... | 254 | 255 |
|---|---|---|---|---|---|---|---|---|---|
| Hex code | 00 | 01 | ... | 7E | 7F | 80 | ... | FE | FF |
| Exponent value | special | −126 | ... | −1 | 0 | +1 | ... | +127 | special |

- Code 0: $f = 0$ represents ±0; $f \ne 0$ represents denormals, $0.f \times 2^{-126}$
- Code 255: $f = 0$ represents ±∞; $f \ne 0$ represents NaNs
- Exponent encoding in 11 bits for the double/long (64-bit) format is similar.

Floating-Point Adder/Subtractor
Assume $e_1 \ge e_2$; an alignment shift (preshift) is needed if $e_1 > e_2$:
$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = (\pm s_1 \times b^{e_1}) + (\pm s_2 / b^{e_1 - e_2}) \times b^{e_1}$
$\;\;= (\pm s_1 \pm s_2 / b^{e_1 - e_2}) \times b^{e_1} = \pm s \times b^e$
Example:
- Numbers to be added: $x = 2^5 \times 1.00101101$, $y = 2^1 \times 1.11101101$ (the operand with the smaller exponent is to be preshifted)
- Operands after alignment shift: $x = 2^5 \times 1.00101101$, $y = 2^5 \times 0.000111101101$
- Result of addition: $s = 2^5 \times 1.010010111101$ (the extra bits are to be rounded off), giving the rounded sum $s = 2^5 \times 1.01001100$
- Like signs: possible 1-position normalizing right shift
- Different signs: possible left shift by many positions
- Overflow/underflow can occur during addition or normalization

FP Adder/Sub
[Figure: Block diagram of a floating-point adder/subtractor. Operands x and y enter an Unpack stage; signs and exponents feed the control and sign logic and an exponent subtractor; significands pass through selective complement and possible swap, align significands, add, normalize, round and selective complement, normalize; exponents are adjusted; a Pack stage produces the sum/difference s.]
- Unpack:
  - Isolate the sign, exponent, significand
  - Reinstate the hidden 1
  - Convert operands to internal format
  - Identify special operands, exceptions
- Other key parts of the adder:
  - Significand aligner (preshifter)
  - Result normalizer (postshifter), including leading 0s detector/predictor
  - Rounding unit
  - Sign logic
- Pack:
  - Converting internal to external representation, if required, must be done at the rounding stage
  - Combine sign, exponent, significand
  - Hide (remove) the leading 1
  - Identify special outcomes, exceptions
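The align, add, normalize and round steps of the adder can be traced with integer arithmetic. The sketch below is a simplified model under stated assumptions: both operands are positive and normalized, significands are integers holding 1.f with `frac_bits` fractional bits, rounding is round-half-up (IEEE 754's default is round-to-nearest-even), and the function name and the operand bit patterns are illustrative.

```python
def fp_add_like_signs(e1, s1, e2, s2, frac_bits):
    """Add two positive values s * 2^e; returns a normalized, rounded (e, s)."""
    # Swap so that operand 1 has the larger exponent.
    if e1 < e2:
        e1, s1, e2, s2 = e2, s2, e1, s1
    shift = e1 - e2

    # Alignment: scaling operand 1 up is equivalent to preshifting
    # operand 2 right while keeping all of its shifted-out bits.
    total = (s1 << shift) + s2
    extra = shift          # number of extra low-order bits to round off
    exponent = e1

    # Like signs: at most a 1-position normalizing right shift.
    if total >> (frac_bits + extra + 1):
        extra += 1
        exponent += 1

    # Round half up on the extra bits (assumption, see lead-in).
    if extra:
        total = (total + (1 << (extra - 1))) >> extra

    # Rounding can carry out to 10.000...; renormalize if so.
    if total >> (frac_bits + 1):
        total >>= 1
        exponent += 1
    return exponent, total


# 2^5 * 1.00101101 + 2^1 * 1.11101101 with 8 fractional bits.
e, s = fp_add_like_signs(5, 0b100101101, 1, 0b111101101, 8)
print(e, bin(s))  # 5 0b101001100
```

The result corresponds to 2^5 × 1.01001100. Handling different signs would add a selective complement and a multi-position left shift, which this sketch omits.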

Pre- and Postshifting
[Figure: One bit-slice of a single-stage pre-shifter: a 32-to-1 multiplexer with enable, selecting among inputs $x_{i+31}, \ldots, x_{i+1}, x_i$ under a 5-bit shift amount to produce $y_i$.]
[Figure: Four-stage combinational shifter for preshifting an operand by 0 to 15 bits, controlled by a 4-bit shift amount (LSB to MSB).]

Leading Zeros / Ones Detection or Prediction
- Leading zeros prediction, with adder inputs $(x_0.x_{-1}x_{-2}\ldots)_{\text{2's-compl}}$ and $(y_0.y_{-1}y_{-2}\ldots)_{\text{2's-compl}}$
- Ways in which leading 0s/1s are generated:
  - p p ... p p g a a ... a a g ...
  - p p ... p p g a a ... a a p ...
  - p p ... p p a g g ... g g a ...
  - p p ... p p a g g ... g g p ...
- Prediction might be done in two stages:
  - Coarse estimate, used for coarse shift
  - Fine tuning of estimate, used for fine shift
- In this way, prediction can be partially overlapped with shifting.
[Figure: Leading zeros/ones counting: Significand Adder, then Count Leading 0s/1s, giving the shift amount to the Post-Shifter and to Adjust Exponent.]
[Figure: Leading zeros/ones prediction: Predict Leading 0s/1s operates in parallel with the Significand Adder, giving the shift amount to the Post-Shifter and to Adjust Exponent.]

Floating-Point Multipliers
$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = (\pm s_1 \times s_2) \times b^{e_1 + e_2}$
- $s_1 \times s_2 \in [1, 4)$: may need postshifting
- Overflow or underflow can occur during multiplication or normalization
- Speed considerations:
  - Many multipliers produce the lower half of the product (rounding info) early
  - The need for a normalizing right-shift is known at or near the end
  - Hence, rounding can be integrated in the generation of the upper half, by producing two versions of these bits
[Figure: Block diagram of a floating-point multiplier. Floating-point operands are unpacked; signs are XORed; exponents are added; significands are multiplied, then normalized, rounded and normalized again, with the exponent adjusted at each normalization; a Pack stage produces the product.]

Further references for Floating Point on FPGA
- "An analysis of the double-precision floating-point FFT on FPGAs", Hemmert, K.S.; Underwood, K.D.; 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 18-20 April 2005, pp. 171-180
- "Architectural Modifications to Improve Floating-Point Unit Efficiency in FPGAs", Beauchamp, M.J.; Hauck, S.; Underwood, K.D.; Hemmert, K.S.; International Conference on Field Programmable Logic and Applications, 28-30 Aug. 2006, pp. 1-6
- "Double precision floating-point arithmetic on FPGAs", Paschalakis, S.; Lee, P.; IEEE International Conference on Field-Programmable Technology (FPT), 15-17 Dec. 2003, pp. 352-358