IMPLEMENTATION OF HIGH PERFORMANCE BINARY SQUARER

PRADEEP M C, RAMESH S
Department of Electronics and Communication Engineering, Dr. Ambedkar Institute of Technology, Mallathahally, Bangalore, India
pradeepmc@gmail.com, rameshs.hullepura@gmail.com

ABSTRACT
In a mobile-driven market, high-speed applications require faster squaring architectures to keep pace with present technology. In this work a high performance binary number squarer using the concept of Vedic mathematics sutras is presented. Vedic multiplier and binary adder circuits are used for the design of the squaring circuit architecture. Different optimizations are presented to design a squarer circuit architecture with low power and high speed. The optimizations are carried out by the Partial Product (PP) folding method and rearrangement of the PPs. The proposed squarer circuit is synthesized using the Xilinx tool for the Field Programmable Gate Array (FPGA) flow and the Cadence tool for the Application Specific Integrated Circuit (ASIC) flow for the analysis of dynamic power consumption and propagation delay, and the design is simulated using the Modelsim 6.5 version tool for functional verification.

Keywords: ASIC flow, Binary Adder, FPGA flow, Vedic mathematics.

1. INTRODUCTION
Squaring is an arithmetic operation used in some special processors such as Digital Signal Processing (DSP) processors [1][2]. Specialized squaring circuits have been proposed for DSP applications such as image compression, pattern recognition etc. [3][4]. The squaring operation is a special multiplication operation which has two equal operands, so a multiplier with two equal inputs can be used as a squaring circuit. But in some arithmetic processors, because of the increased delay and power caused by using the multiplier circuit as a squarer, a special squarer circuit can be designed for the squaring operation. In this paper, an architecture for squaring a binary number is explored to create a circuit using Vedic sutras. By using Vedic sutras the overall processor performance can be improved for many applications.
Therefore the goal is to create a squaring architecture that is comparable to or better than a design using a standard multiplier in speed and power.

This paper is organized as follows. In section 2, the related work is briefly reviewed. In section 3, the proposed squarer architecture is discussed. The performance of the proposed squarer architecture is compared with the existing squarer architecture, with results and discussion, in section 4. Finally, a brief conclusion is given in section 5.

2. RELATED WORK
Vedic mathematics was rediscovered from the ancient Indian scriptures between 1911 and 1918 by Sri Bharati Krishna Tirtha (1884-1960), a scholar of Sanskrit, mathematics, history and philosophy [5]. He studied these ancient texts for years and, after careful investigation, was able to reconstruct a series of mathematical formulae called sutras. Vedic mathematics is the name given to the ancient system of mathematics or, to be precise, a unique technique of calculation based on simple rules and principles with which any mathematical problem can be solved, be it arithmetic, algebra, geometry or trigonometry [6]. The system is based on 16 sutras or aphorisms, which are word formulae describing natural ways of solving a whole range of mathematical problems.

One of the sutras of Vedic mathematics used for multiplication is Urdhava Tiryakbhyam (vertical and crosswise) [7], which is also the foundation of the proposed design. It is based on a concept through which all Partial Products (PP) can be generated together with the concurrent addition of these PPs. The parallelism in the generation of PPs and their summation is obtained by vertical and crosswise multiplication and addition. Various examples and implementations of the Urdhava Tiryakbhyam sutra are discussed in [7]. In the Vedic multiplier the Partial Product Generation (PPG) and the additions are done concurrently. This feature makes it attractive for binary multiplication. In most computations the multiplier unit is used to compute the square of an operand. Since squaring is a special case of multiplication, dedicated squaring hardware will significantly improve the computation time.
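The vertical and crosswise scheme described above can be illustrated with a small software model. The following is a minimal sketch, not the authors' hardware: the function names `to_bits`, `from_bits` and `urdhava_multiply` are hypothetical, and the column sums are evaluated sequentially in software, whereas the hardware generates all partial products in parallel.

```python
def to_bits(value, n):
    """Return the n low-order bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Rebuild an integer from a list of bits, least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))


def urdhava_multiply(a_bits, b_bits):
    """Multiply two equal-length bit lists column by column.

    Column k collects every cross product a[i] & b[j] with i + j == k
    (the "vertical and crosswise" pattern); the column sum yields one
    result bit, and the rest is carried into the next column.
    """
    n = len(a_bits)
    assert len(b_bits) == n, "operands must have the same width"
    result = []
    carry = 0
    for k in range(2 * n - 1):
        column = carry
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            column += a_bits[i] & b_bits[k - i]
        result.append(column & 1)  # bit of weight 2**k
        carry = column >> 1        # passed on to column k + 1
    result.append(carry)           # final carry is the top bit (weight 2**(2n-1))
    return result


if __name__ == "__main__":
    product = urdhava_multiply(to_bits(13, 4), to_bits(11, 4))
    print(from_bits(product))
```

Calling the function with both operands equal gives the square, which is the special case the rest of the paper optimizes.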
A comparison between Vedic and conventional multipliers is discussed in [8]. The Urdhava Tiryakbhyam sutra of Vedic mathematics is used for the multiplier design. The referred paper concludes that the conventional and Vedic methods are computationally the same and that the difference between the two lies in the implementation strategy, because of which the Vedic multiplier has improved efficiency. Similar work is described in [9]. A multiplier and a squarer were designed using the Urdhava Tiryakbhyam sutra and the duplex property. This design shows that Vedic mathematical methods are computationally faster and
easier to perform than the conventional method. A binary number squarer is described in [10]. There, one multiplier and two squaring units are implemented using the Urdhava Tiryakbhyam sutra to design a squarer circuit with reduced delay. That squarer is shown to have improved efficiency in terms of speed. This work attempts to formulate an interactive general strategy for the design and implementation of a squarer based on the principles of Vedic mathematics.

3. PROPOSED SQUARER ARCHITECTURE

3.1. Architecture
The proposed squarer uses a Vedic multiplier module for its computation. The proposed multiplier is designed using the Urdhava Tiryakbhyam sutra. The Partial Products (PP) of a 4×4 Vedic multiplier using the Urdhava Tiryakbhyam sutra are shown in Fig. 1. As shown in Fig. 1, the PPs are grouped into four (n/2)-bit multiplier modules, and their results are added using a Carry Save Adder (CSA) to produce the final product. The block diagram of the Urdhava multiplier is shown in Fig. 2. A three-input CSA is used in the architecture. The first [(n-((n/2)+1)) to 0] bits of the resultant product are obtained by taking the [n-((n/2)+1) to 0]-bit result of the first multiplier module directly, while the remaining resultant bits [(2n-1) to (n-(n/2))] are obtained from the sum produced by the CSA. Since only a CSA is used in the architecture, there is a considerable reduction in power consumption and overall propagation delay compared with the work proposed in [10].

Fig. 1. Partial products of 4×4 Vedic multiplier using Urdhava Tiryakbhyam sutra.

Fig. 2. Block diagram of Urdhava multiplier (n = number of bits). Four (n/2)-bit multiplier modules take the operand pairs a[(n-1):n/2] and b[(n-1):n/2], a[(n-1):n/2] and b[(n/2-1):0], a[(n/2-1):0] and b[(n-1):n/2], and a[(n/2-1):0] and b[(n/2-1):0]; an [n+(n/2)]-bit Carry Save Adder produces p[(2n-1) to (n-(n/2))], and p[n-((n/2)+1) to 0] is taken directly from the first module.

3.2. Squarer Architecture
The squarer architecture presented here consists of three different optimizations. The three optimizations are based on the Partial Product (PP) folding technique and the PP re-arrangement technique.
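The multiplier organization of Section 3.1 (four half-width modules whose results are merged by a three-input CSA) can be sketched in software as follows. This is an illustrative model under stated assumptions: the names `carry_save_add` and `multiply_split` are hypothetical, Python's built-in `*` stands in for each (n/2)-bit multiplier module, and n is assumed to be even.

```python
def carry_save_add(x, y, z):
    """3:2 compression: return (sum_word, carry_word) with x + y + z == sum_word + carry_word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (y & z) | (x & z)) << 1
    return sum_word, carry_word


def multiply_split(a, b, n):
    """Multiply two n-bit unsigned values from four (n/2)-bit products."""
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h

    p0 = a_lo * b_lo  # stand-in for the first (n/2)-bit multiplier module
    p1 = a_hi * b_lo
    p2 = a_lo * b_hi
    p3 = a_hi * b_hi

    low = p0 & mask                # low n/2 result bits come straight from p0
    op_a = (p3 << h) | (p0 >> h)   # {p3, upper half of p0}; the two fields do not overlap
    sum_word, carry_word = carry_save_add(op_a, p1, p2)
    high = sum_word + carry_word   # final carry-propagate step after the CSA
    return (high << h) | low


if __name__ == "__main__":
    print(multiply_split(0xB7, 0x5C, 8) == 0xB7 * 0x5C)
```

The model follows from writing a = a_hi·2^(n/2) + a_lo and b likewise: the two cross products and the concatenation of p3 with the upper half of p0 are the three CSA operands, all aligned at bit n/2.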
In each of the optimizations the architecture is composed of two (n/2)-bit square modules and one (n/2)-bit multiplier module, and the results of these three modules are added using a Carry Save Adder (CSA). In the first optimization, using the Urdhava Tiryakbhyam sutra, the PPs of the squarer are written as shown in Fig. 3. The PPs are grouped into two (n/2)-bit square modules and one (n/2)-bit multiplier module. In the multiplier module we observe that the PPs appear twice. Instead of using two multiplier modules, only one multiplier module is used, and a zero is appended at the Least Significant Bit (LSB) side of its result, which is equivalent to adding the results of two multiplier modules having the same PPs.
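The first optimization can be modelled in a few lines. The sketch below is an assumption-laden illustration rather than the authors' circuit: `square_opt1` is a hypothetical name, built-in multiplication stands in for the (n/2)-bit square and multiplier modules, and a plain integer addition stands in for the adder. It shows how appending one zero at the LSB side of the single multiplier result replaces the second, identical multiplier module.

```python
def square_opt1(x, n):
    """Square an n-bit unsigned value from two half-width squares and one cross product."""
    h = n // 2
    mask = (1 << h) - 1
    x_lo, x_hi = x & mask, x >> h

    sq_lo = x_lo * x_lo   # stand-in for the LSB-bits square module
    sq_hi = x_hi * x_hi   # stand-in for the MSB-bits square module
    cross = x_hi * x_lo   # stand-in for the single multiplier module

    low = sq_lo & mask                   # low n/2 product bits taken directly
    op_a = (sq_hi << h) | (sq_lo >> h)   # {MSB square, remaining bits of LSB square}
    op_b = cross << 1                    # one zero appended at the LSB side == 2 * cross
    high = op_a + op_b                   # adder output: product bits [2n-1 : n/2]
    return (high << h) | low


if __name__ == "__main__":
    print(all(square_opt1(x, 8) == x * x for x in range(256)))
```

The decomposition follows from X = X_hi·2^(n/2) + X_lo, so that X² = X_hi²·2^n + 2·X_hi·X_lo·2^(n/2) + X_lo².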
Fig. 3. Partial products of 4×4 Vedic squarer using Urdhava Tiryakbhyam sutra.

The results of the multiplier and squarer modules are added using a Carry-Look-Ahead (CLA) adder. The block diagram of optimization one is shown in Fig. 4. As shown in the block diagram, the first [((n/2)-1) to 0] bits of the final product are obtained by directly taking the [((n/2)-1) to 0]-bit result of the first squarer module (the Least Significant Bit (LSB)-bits squarer). The result of the second squarer (the Most Significant Bit (MSB)-bits squarer) is concatenated with the remaining bits of the first squarer, and it is added to the multiplier module result, which is extended with ((n/2)-1) zeros at the MSB side and one zero at the LSB side. The sum produced by the CLA adder gives the remaining [(2n-1) to (n/2)]-bit product.

Fig. 4. Block diagram of n-bit Vedic squarer for optimization one (n = number of bits). X[(n-1) to (n/2)] (MSB bits) and X[(n/2)-1 to 0] (LSB bits) feed two square modules and one [(n/2)×(n/2)]-bit multiplier module; an [n+(n/2)]-bit CLA adder produces P[(2n-1) to (n/2)], and P[((n/2)-1) to 0] is taken directly from the LSB-bits square module.

In the second optimization the PPs shown in Fig. 3 are reduced using $X_iX_j = X_jX_i$. The PPs having similar identities can be combined as

$X_iX_j + X_jX_i = 2X_iX_j$ (1)

The reduced PPs using Equation (1) are shown in Fig. 5. As in optimization one, the reduced PPs are grouped into two (n/2)-bit square modules and one (n/2)-bit multiplier module. Since the PPs are reduced, only one multiplier module is needed, and the zero appended at the LSB side in optimization one, which stood in for the second multiplier module, is eliminated.

Fig. 5. Reduced 4×4 Vedic squarer partial products using folding technique: (a) before re-arrangement of partial products (b) after re-arrangement of partial products.
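The effect of folding on the number of partial products can be checked with a short model. This is a sketch of the standard folding identity, not necessarily the exact arrangement of Fig. 5: the function names are hypothetical, and the diagonal simplification X_i·X_i = X_i (a bit ANDed with itself) is a general Boolean fact assumed here in addition to Equation (1).

```python
def unfolded_partial_products(x, n):
    """All n*n partial products of x*x as (bit, weight_exponent) pairs."""
    bits = [(x >> i) & 1 for i in range(n)]
    return [(bits[i] & bits[j], i + j) for i in range(n) for j in range(n)]


def folded_partial_products(x, n):
    """Partial products after folding: one term per unordered pair (i, j)."""
    bits = [(x >> i) & 1 for i in range(n)]
    terms = []
    for i in range(n):
        terms.append((bits[i], 2 * i))  # diagonal term: X_i * X_i == X_i
        for j in range(i + 1, n):
            # X_i*X_j + X_j*X_i == 2*X_i*X_j, i.e. one term shifted left by one bit
            terms.append((bits[i] & bits[j], i + j + 1))
    return terms


def sum_terms(terms):
    """Add up weighted partial-product bits."""
    return sum(bit << weight for bit, weight in terms)


if __name__ == "__main__":
    x, n = 0b1011, 4
    print(len(unfolded_partial_products(x, n)), len(folded_partial_products(x, n)))
    print(sum_terms(folded_partial_products(x, n)) == x * x)
```

For an n-bit operand the folded form has n(n+1)/2 terms instead of n², which is the source of the smaller adder in optimization two.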
The block diagram of optimization two is shown in Fig. 6. In optimization two the [(n/2) to 0] bits of the final product are taken directly from the first squarer module. As in optimization one, the result of the second square module is concatenated with the remaining bits of the first squarer module, and it is added to the multiplier module result, which is extended with zeros at the MSB side. In optimization two, because of the reduced PPs and the use of an (n+(n/2)-1)-bit adder, there is a considerable reduction in power consumption and propagation delay compared with optimization one, where an (n+(n/2))-bit CLA adder is used.

Fig. 6. Block diagram of n-bit Vedic squarer for optimization two and three (n = number of bits). Two square modules and one [(n/2)×(n/2)]-bit multiplier module feed an [n+(n/2)-1]-bit CLA adder, which produces P[(2n-1) to ((n/2)+1)]; P[(n/2) to 0] is taken directly from the first square module.

In optimization three the PPs are further reduced using Equation (2), Equation (3) and Equation (4), + = = = + - + ( - ) + As in the first two optimizations, the PPs are grouped into two (n/2)-bit square modules and one (n/2)-bit multiplier module. As in optimization two, the results of the squarer and multiplier modules are added using an (n+((n/2)-1))-bit CLA adder. Because of the further reduction in the depth of the PPs, there is a significant reduction in power consumption as well as propagation delay compared with the first two optimizations. The reduced partial products for the 4×4 squarer using optimization three are shown in Fig. 7 and its block diagram is shown in Fig. 6.

Fig. 7. Reduced 4×4 Vedic squarer partial products using Equation (4).

4. RESULTS AND DISCUSSION
Squarers for 4-bit, 8-bit and 16-bit operands were designed for both the existing [10] and the optimized methods. Three optimizations (optimization 1, optimization 2 and optimization 3) were performed in the optimized method. The designed squarers were simulated using the Modelsim tool of version 6.5 for functional verification, and synthesized using the Cadence RTL Compiler tool with a standard cell technology library and the Xilinx tool (Virtex 7 family) for dynamic power and propagation delay analysis.
The simulation results for the proposed 4-bit, 8-bit and 16-bit squarers are shown in Fig. 8 to Fig. 10.
The simulation results in Fig. 8 to Fig. 10 are shown for various possible input combinations. In Fig. 8, x is a 4-bit input and p is the output (the square of input x), which is an 8-bit binary number. Similarly, in Fig. 9, x is an 8-bit input and p is the output, which is a 16-bit binary number, and in Fig. 10, x is a 16-bit input and p is the output, which is a 32-bit binary number. Block diagrams of the 4-bit, 8-bit and 16-bit squarers for optimization three are shown in Fig. 11 to Fig. 13. In these block diagrams x is the input given to the squarer module and p is the output of the squarer module; the q modules are squarer modules, m is the multiplier module and the l modules are adder modules.

The performance of the proposed squarer design for 4-bit, 8-bit and 16-bit operands is shown in Tables 1 to 6. The percentage improvement in Tables 1 to 6 is calculated for the optimization three squarer with respect to the existing squarer [10]. The comparison results in Tables 1 to 6 show that the proposed squaring architecture not only consumes less power but also operates at higher speed than the squarer design in [10].

Table 1. Synthesis Result of 4-bit Squarer in ASIC flow
Parameters        Propagation delay / Dynamic power
Existing [10]     .77.
Optimization-1    ..
Optimization-2    .77.
Optimization-3    .44.8
% Improvement     47.94 45.45

Table 2. Synthesis Result of 8-bit Squarer in ASIC flow
Parameters        Propagation delay / Dynamic power
Existing [10]     7..
Optimization-1    4.8.76
Optimization-2    4.64.7
Optimization-3    .78.65
% Improvement     48.44 8.6

Table 3. Synthesis Result of 16-bit Squarer in ASIC flow
Parameters        Propagation delay / Dynamic power
Existing [10]     5.464.87
Optimization-1    8.96.
Optimization-2    8.946.
Optimization-3    8.66.994
% Improvement     4.99.76

Table 4. Synthesis Result of 4-bit Squarer in FPGA flow
Parameters        Propagation delay / Dynamic power
Existing [10]     6.59 4.7
Optimization-1    4.6 4.
Optimization-2    4. 5.
Optimization-3    .959 4.6
% Improvement     54.67 7.

Table 5. Synthesis Result of 8-bit Squarer in FPGA flow
Parameters        Propagation delay / Dynamic power
Existing [10]     .8.56
Optimization-1    7.664.4
Optimization-2    7.5.5
Optimization-3    7.5.
% Improvement     4..9

Table 6. Synthesis Result of 16-bit Squarer in FPGA flow
Parameters        Propagation delay / Dynamic power
Existing [10]     .689.8
Optimization-1    .7 8.68
Optimization-2    .55.6
Optimization-3    .55 8.9
% Improvement     7.7 9.8

Fig. 8. Simulation results of 4-bit squarer
Fig. 9. Simulation results of 8-bit squarer
Fig. 10. Simulation results of 16-bit squarer
Fig. 11. Block diagram of optimization three 4-bit squarer
Fig. 12. Block diagram of optimization three 8-bit squarer
Fig. 13. Block diagram of optimization three 16-bit squarer

5. CONCLUSION
The focus of this work is to achieve an optimized and realistic squarer architecture. The proposed squarer is shown to exhibit improved efficiency in terms of propagation delay and dynamic power reduction. Because of its low power and high speed, the proposed squarer can be used in DSP and cryptography applications, which involve time-consuming operations such as squaring. The proposed squarer design is simulated and synthesized for 4-bit, 8-bit and 16-bit operands, and it can be extended to unsigned numbers with a higher number of bits. The tabulated results show that for the optimized 16-bit squarer the overall propagation delay is reduced by 4.99% and dynamic power by .76% for the ASIC flow, and similarly by 7.7% and 9.8% for the FPGA flow, when compared with the existing squarer design [10].

REFERENCES
[1] Johnny Pihl and Einar J (1996), A multiplier and squarer generator for high performance DSP applications, IEEE 39th Midwest Symposium on Circuits and Systems, Ames, pp. 9-.
[2] Akhalesh K, Itawadiya, Rajesh Mahle, Vivek Patel and Dadan Kumar (), Design a DSP Operations using Vedic mathematics, IEEE International Conference on Communications and Signal Processing (ICCSP), Melmaruvathur, pp. 897-9.
[3] Himanshu Thapliyal and M.B Srinivas (2005), An efficient method of elliptic curve encryption using ancient Indian Vedic mathematics, IEEE 48th Midwest Symposium on Circuits and Systems, Covington, pp. 86-88.
[4] S.Kumaravel and Ramalatha Marimuthu (2007), VLSI implementation of high performance RSA algorithm using Vedic mathematics, IEEE Conference on Computational Intelligence and Multimedia Applications, Sivakasi, pp. 6-8.
[5] www.vedicmaths.com
[6] A.P Nicholas, J Pickles and K Williams (98), Introductory Lectures on Vedic Mathematics, Polytechnic of North London.
[7] A.P Nicholas, K.R Williams and J Pickles (), Applications of the Vedic mathematics Sutra: Vertically and Crosswise, Inspiration books, Third revised edition, The Vedic mathematics research group.
[8] Parth Mehta and Dhanashri Gawali (2009), Conventional versus Vedic mathematical method for hardware implementation of a multiplier, IEEE International Conference on Advances in Computing, Control, Telecommunication Technologies, Trivandrum, pp. 64-64.
[9] Abhijeet Kumar, Dilip Kumar and Siddhi (), Hardware implementation of 16*16 bit multiplier and square using Vedic mathematics, International Conference on Signal, Image and Video Processing (ICSIVP), pp. 9-4.
[10] Kabiraj Sethi and Rutuparna Panda (), An improved squaring circuit for binary numbers, International Journal of Advanced Computer Science and Applications, vol., No., pp. -5.