Combined Radix-10 and Radix-16 Division Unit

Tomás Lang and Alberto Nannarelli

Dept. of Electrical Engineering and Computer Science, University of California, Irvine, USA
Dept. of Informatics & Math. Modelling, Technical University of Denmark, Kongens Lyngby, Denmark

978-1-4244-2110-7/08/$25.00 ©2007 IEEE

Abstract

In this work we extend a previously proposed digit-recurrence radix-10 division unit so that it can also perform radix-16 division. The extension is simplified by the fact that in the radix-10 implementation the quotient digit is decomposed into two parts and that this decomposition is also appropriate for the radix-16 case. Moreover, to reduce the latency in the radix-10 case, the most-significant portion of the datapath, including the selection function, has been implemented in radix-2, so that the modification of that part to include radix-16 consists mainly in combining the two modules to obtain the selection constants. The rest of the modifications relate to the generation of multiples, to the carry-save adder, to the carry-propagate adder, and to the on-the-fly conversion and rounding. The implementation results show that the delay of an iteration is similar to that of the radix-10 case and that the area is about thirty percent larger.

I. INTRODUCTION

Hardware implementations of decimal arithmetic units have recently gained importance because they provide higher accuracy in financial applications [1]. Moreover, to reduce the required area, it is convenient to perform in the same unit the operations for both decimal and binary representations. Combined units of this type have been proposed for addition [2] and multiplication [3]. In this work we propose a combined unit for the division operation.

Previously we described a radix-10 division unit using the digit-recurrence approach [4]. Moreover, this approach has been used extensively for radix-2 representations [5]. Specifically, since the radix-10 unit produces one digit of the quotient per iteration and the radix-10 digit is represented in BCD by four bits, it seems appropriate to combine it with a radix-16 unit. This combination is simplified by the fact that in the radix-10 case we have decomposed the quotient digit into two parts, which is also the preferred method for implementing radix-16 division [5].

II. DIVISION ALGORITHM

The expressions for the digit-recurrence iteration for the radix-10 case are [4]

$$v[j] = 10\,w[j-1] - q_{Hj}\,(5d)$$
$$w[j] = v[j] - q_{Lj}\,d$$

with $q_{Hj} \in \{-1, 0, 1\}$ and $q_{Lj} \in \{-2, -1, 0, 1, 2\}$ for a redundancy factor $\rho_{10} = 7/9$. Similarly, for the radix-16 case

$$v[j] = 16\,w[j-1] - q_{Hj}\,(4d)$$
$$w[j] = v[j] - q_{Lj}\,d$$

with $q_{Hj} \in \{-2, -1, 0, 1, 2\}$ and $q_{Lj} \in \{-2, -1, 0, 1, 2\}$ for a redundancy factor $\rho_{16} = 4/6$. Therefore, the two recurrences can be combined into

$$v[j] = r\,w[j-1] - q_{Hj}\,(kd) \tag{1}$$
$$w[j] = v[j] - q_{Lj}\,d \tag{2}$$

with quotient digit selection functions

$$q_{Hj} = \mathrm{SEL}_H(\widehat{r\,w}[j-1], \hat{d}) \tag{3}$$
$$q_{Lj} = \mathrm{SEL}_L(\hat{v}[j], \hat{d}) \tag{4}$$

and quotient digit $q_j = k\,q_{Hj} + q_{Lj}$. The switch between the two radices is performed by setting a radix-select bit so that $r = 16$ and $k = 4$ for radix-16, and $r = 10$ and $k = 5$ for radix-10. To ensure convergence, the recurrence is initialized as $w[0] = x/r^2$.

III. DIVIDER ARCHITECTURE

The scheme implementing the division recurrence of (1) and (2) is shown in Fig. 1. The divider is completed by a unit that converts the quotient digit $q_j$ from the signed-digit representation to the BCD, or to the binary unsigned, representation and performs the rounding.

As mentioned above, the radix selection is done with a single signal that distinguishes radix-16 from radix-10. Therefore, in the recurrence, we have to process data both in BCD (for radix-10) and in binary (for radix-16). This requires some modifications in the carry-save adder (CSA), as explained in Section III-D.

In [4], to speed up the radix-10 division, we implement the most-significant slice (MS-slice) of the recurrence in radix-2 (two's complement). The conversion, one digit per iteration, from a BCD digit to a 4-bit binary digit is straightforward. When combining with radix-16, we need to apply only minor modifications in the MS-slice, as explained in Section III-C. One radix-16 digit is simply transferred from the dual-radix recurrence part every iteration.

In the following, we indicate with lower case letters (e.g. $d$) digit-vectors in the dual-radix part of the recurrence and with upper case letters (e.g. $D$) bit-vectors in the MS-slice. When necessary to specify the radix, radix-10 digit-vectors are indicated with the subscript 10 (e.g. $d_{10}$). We now discuss the implementation of the relevant blocks in Fig. 1 and the convert-and-round unit.
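The combined recurrence of (1) and (2) can be illustrated with a short Python sketch. It is only a behavioral model under stated assumptions: the residual is kept as an exact rational instead of a carry-save vector, and the quotient digit is chosen by idealized rounding of $r\,w[j-1]/d$ to the nearest integer rather than by the table-based selection functions (3) and (4). The function name `dual_radix_divide` and its arguments are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from math import floor


def dual_radix_divide(x, d, radix, iterations):
    """Behavioral model of recurrence (1)-(2) with exact rationals."""
    if radix == 10:
        r, k, h_max = 10, 5, 1   # q_H in {-1, 0, 1}
    elif radix == 16:
        r, k, h_max = 16, 4, 2   # q_H in {-2, ..., 2}
    else:
        raise ValueError("radix must be 10 or 16")

    x, d = Fraction(x), Fraction(d)
    w = x / r**2                 # initialization w[0] = x / r^2
    quotient = Fraction(0)
    digits = []
    for j in range(1, iterations + 1):
        rw = r * w
        # Assumption: idealized selection, nearest integer to rw / d.
        q = floor(rw / d + Fraction(1, 2))
        # Decompose q = k*q_H + q_L with q_L in {-2, ..., 2}.
        q_h = max(-h_max, min(h_max, (q + 2) // k))
        q_l = q - k * q_h
        if not -2 <= q_l <= 2:
            raise ArithmeticError("digit out of range; check operand ranges")
        v = rw - q_h * (k * d)   # equation (1)
        w = v - q_l * d          # equation (2)
        digits.append((q_h, q_l))
        quotient += Fraction(q, r**j)
    return digits, quotient, w
```

Because of the initialization $w[0] = x/r^2$, the accumulated digits $Q = \sum_j q_j r^{-j}$ approximate $x/(r^2 d)$, so after $n$ iterations $x/d = r^2\,(Q + w[n]\,r^{-n}/d)$ holds exactly in this model. With nearest-integer selection the residual stays within $d/2$, which is inside the bound $\rho d$ for both radices.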

Fig. 1. Basic implementation of radix-10/radix-16 recurrence.

TABLE I
OPERATION OF MULT/MUX.

  q_H   -q_H(kd), r-16   -q_H(kd), r-10   |   q_L   -q_L d, r-16   -q_L d, r-10
  -2    8d               N/A              |   -2    2d             2d
  -1    4d               5d               |   -1    d              d
   0    0                0                |    0    0              0
   1    -4d              -5d              |    1    -d             -d
   2    -8d              N/A              |    2    -2d            -2d

A. Precomputation of the multiples

This block computes the multiples of $d$ necessary for both the recurrence and the selection function. In the radix-10 case, five times the divisor ($5d_{10}$) and two times the divisor ($2d_{10}$), and their negatives, are precomputed. For radix-16 the multiples required are eight, four, and two times the divisor ($8d$, $4d$, and $2d$); these are straightforward to compute (by shifting), and a selector is used to select among the multiples depending on the radix. The detail of the multiple selection for the dual-radix recurrence is shown in Table I.

In the radix-2 MS-slice for the radix-16 division, we simply use a truncated representation of $8d$, $4d$, and $2d$. For radix-10 division, the multiples $5d_{10}$, $2d_{10}$ and their negatives are converted into two's complement.

B. Quotient-digit selection

In the quotient-digit selection functions described by (3) and (4), the estimates of $r\,w[j-1]$ and $v[j]$ are obtained by using a limited number of digits of the carry-save representation. Although in principle this number could be different for the two estimates, we use the same number to simplify the scheme.

The selection of the quotient digit is done by preloading selection constants and by comparisons [6]. With respect to the radix-10 implementation of [4], in this dual-radix divider we need to combine the radix-10 selection function with the radix-16 one. We explored two alternatives:
1) separate modules to generate the constants for each radix;
2) a combined module for both.

The combination of the radices can be done if the constants $m_k$ satisfy the conditions on the bounds of the selection intervals (see [5] and [4] for the detailed derivation). These bounds are shown in Fig. 2, for the positive quadrant. The dotted lines in the figure represent the bounds for radix-16, and the solid lines the bounds for radix-10.

Fig. 2. P-D plot for $q_H$ (top) and $q_L$ (bottom) (positive quadrant).

Each selection constant $m_k$, $k \in \{-1, 0, 1, 2\}$, is then chosen at or above the lower bound obtained for radix-10 (solid lines in Fig. 2) and below the upper bound obtained for radix-16 (dotted lines in Fig. 2). Moreover, we choose constants which are symmetric with respect to the sign: $m_2 = -m_{-1}$ and $m_1 = -m_0$.

For radix-16 (whose selection function is decomposed into two radix-4 selection functions), 3 bits of the divisor $d$ are sufficient to select the constants $m_k$:

$$d = 0.1\,b_2 b_3 b_4 \ldots$$
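Selection by comparison against preloaded constants, as described above, can be sketched in a few lines of Python. The constants used here are hypothetical midpoints $m_k = (k - 1/2)\,d$, picked only because they lie inside the selection intervals for a five-valued digit with $\rho = 2/3$ and obey the symmetry $m_2 = -m_{-1}$, $m_1 = -m_0$; they are not the values of Table II. The estimate is also assumed to be exact rather than a truncated carry-save value, and the names `midpoint_constants` and `select_digit` are illustrative.

```python
from fractions import Fraction


def midpoint_constants(d):
    """Hypothetical constants m_k = (k - 1/2) * d for k in {-1, 0, 1, 2}."""
    m1 = d / 2
    m2 = 3 * d / 2
    # Symmetric with respect to the sign: m_2 = -m_{-1}, m_1 = -m_0.
    return {2: m2, 1: m1, 0: -m1, -1: -m2}


def select_digit(estimate, constants):
    """Return the largest k with estimate >= m_k, or -2 if none applies."""
    for k in (2, 1, 0, -1):
        if estimate >= constants[k]:
            return k
    return -2
```

In hardware the four comparisons would be evaluated in parallel by sign detectors; the sequential loop here only models the resulting digit. For any estimate $v$ with $|v| \le (2 + \rho)\,d$, the chosen digit $q$ leaves a residual $|v - q\,d| \le \rho\,d$ under these assumptions.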

Since 3 bits of the divisor suffice for radix-16, to unify the intervals on $d$ for both radices we map the 8 configurations $0.1\,b_2 b_3 b_4$ into the closest 3-fractional-digit BCD representation of $d$. The resulting intervals on $d$ and the constants for both $q_H$ and $q_L$ are reported in Table II. Their bounds are plotted in Fig. 2.

The values in Table II represent fractional numbers that have to be implemented as integers in the radix-2 selection function. This fraction-to-integer conversion is done by

$$M_k = m_k\,r^2$$

and therefore we get two different encodings for radix-10 and radix-16.

Moreover, with respect to the radix-10-only implementation, the digit set of $q_H$ is extended from three to five values for radix-16. For this reason, the $q_L$ selection by comparison is modified by computing speculatively the five possible outcomes of $\hat{v}[j]$.

C. Radix-2 most-significant slice

With respect to the implementation of [4], the radix-2 MS-slice is modified as follows. A picture of the implementation of the recurrence is shown in Fig. 3.
1) To produce a 5-value quotient digit, the selection function for $q_H$ is composed of four sign detectors, and the encoder of the quotient digit is slightly changed.
2) As a consequence of 1), the multiplexer producing $-q_H(kD)$ is changed into a 5:1 mux, and two extra flip-flops are required to store $q_H$.
3) The speculative selection function for $q_L$ is, in this dual-radix unit, composed of five blocks computing speculatively
$$q_L = \mathrm{SEL}_L(\widehat{r\,w - q_H(kd)}, \hat{d}), \quad q_H \in \{-2, -1, 0, 1, 2\}.$$
Consequently, a mux 5:1, controlled by $q_H$, must be used to select among the possible values of $q_L$.
4) The multiplication $r\,w[j]$ is performed by a CSA 4:2 and a multiplexer, where $W_s$ and $W_c$ denote the sum and carry words of the carry-save residual:

  radix   input to CSA 4:2
  16      8W_s + 8W_c + 8W_s + 8W_c
  10      8W_s + 8W_c + 2W_s + 2W_c

Fig. 3. Implementation of the dual-radix recurrence.

D. Dual-radix carry-save adder

The carry-save adder (CSA) in the recurrence can be operated in either radix by means of the radix-select signal. A scheme of the dual-radix CSA is shown in Fig. 4 for one digit; we indicate with $x^{(i)}$ the digit of weight $r^i$.

E. Conversion and Rounding

The on-the-fly conversion and rounding implemented in [4] can easily be adapted to the radix-16 case, with the exception of the normalization, which in the binary case requires a shift of one bit, while in radix-10 the shift is one digit (4 bits). Moreover, the adder necessary to compute the sign of the final remainder, and to determine if it is zero, is implemented in dual radix.

IV. IMPLEMENTATION AND COMPARISONS

In this section we present the results of the evaluation of the dual-radix division unit and a comparison with the decimal divider of [4] and a double-precision radix-16 digit-recurrence division unit.

TABLE II
CONSTANTS m_k FOR BOTH RADIX-10 AND RADIX-16 SELECTION.
Columns: $[d_i, d_{i+1})$, $m_{H2}$, $m_{H1}$, $m_{H0}$, $m_{H-1}$, $m_{L2}$, $m_{L1}$, $m_{L0}$, $m_{L-1}$. (The numeric entries are not legible in this copy.)

TABLE III
SUMMARY OF RESULTS FOR THE SYNTHESIZED UNITS.
Columns: unit, cycle time [ns], n. cycles, latency [ns], speed-up, area [μm²], ratio. Rows: Radix-16 (standard), Radix-10 [4], Dual-radix (this work). (The numeric entries are not legible in this copy.)

Fig. 4. Scheme of radix-10/radix-16 CSA (one digit).

We performed a synthesis of the unit of Fig. 3 (plus the convert-and-round unit) using the STM 90 nm CMOS standard cell library [7] and Synopsys Design Compiler. From the synthesis we estimated the critical path (including estimation at netlist level of wire load) and the area. The critical path is highlighted in Fig. 3 (dotted line).

The results are compared with those of [4] for the radix-10 division and with those of [8] for radix-16. The data in Table III show that the delay of the critical path for the dual-radix unit is practically the same, since the difference is about one FO4 inverter delay.

The additional area with respect to the implementation of [4] corresponds mainly to the following modules:
- multiplexers to select the multiples of the divisor in the precomputation block;
- modules to compute the selection constants;
- the extra modules in the $q_L$ selection function;
- the multiplexers for the CSA in the dual-radix recurrence (Fig. 4).

However, by comparing the area of the combined divider with separate units for each radix, we have about 2% less area.

V. CONCLUSIONS

We conclude that the combination of both radices in a single unit is feasible. The cycle time is similar to that of the radix-10 (and radix-16) implementation, and the additional area can be justified by considering that the unit can perform both radix-10 and radix-16 division. The selection function might be simplified somewhat by modifying the implementation for radix-10 of [4] to use the set $q_H \in \{-2, -1, 0, 1, 2\}$, since that is also required for radix-16.

REFERENCES

[1] M. F. Cowlishaw, "Decimal floating-point: algorism for computers," in Proc. of 16th Symposium on Computer Arithmetic, June 2003, pp. 104-111.
[2] A. Vazquez and E. Antelo, "Conditional speculative decimal addition," Proc. 7th Conference on Real Numbers and Computers (RNC 7), pp. 47-57, June 2006.
[3] A. Vazquez, E. Antelo, and P. Montuschi, "A new family of high-performance parallel decimal multipliers," to appear in Proc. of 18th Symposium on Computer Arithmetic, June 2007.
[4] T. Lang and A. Nannarelli, "A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture," IEEE Transactions on Computers, vol. 56, no. 6, pp. 727-739, June 2007.
[5] M. Ercegovac and T. Lang, Division and Square Root: Digit-Recurrence Algorithms and Implementations. Kluwer Academic Publishers, 1994.
[6] N. Burgess and C. Hinds, "Design Issues in Radix-4 SRT Square Root and Divide Unit," Proc. 35th Asilomar Conference on Signals, Systems and Computers, pp. 1646-1650, 2001.
[7] STMicroelectronics. 90nm CMOS090 Design Platform. [Online]. Available: http://www.st.com/stonline/prodpres/dedicate/soc/asic/90plat.htm
[8] E. Antelo, T. Lang, P. Montuschi, and A. Nannarelli, "Digit-recurrence dividers with reduced logical depth," IEEE Transactions on Computers, vol. 54, pp. 837-851, July 2005.