IMPLEMENTATION OF HIGH PERFORMANCE BINARY SQUARER

PRADEEP M C, RAMESH S
Department of Electronics and Communication Engineering, Dr. Ambedkar Institute of Technology, Mallathahally, Bangalore, India
pradeepmc@gmail.com, rameshs.hullepura@gmail.com

ABSTRACT

In the mobile-driven market, high-speed applications need faster squaring architectures to keep pace with present technology. In this work, a high-performance binary number squarer using the concept of Vedic mathematics sutras is presented. Vedic multiplier and binary adder circuits are used to design the squaring circuit architecture. Different optimizations are presented to obtain a squarer circuit architecture with low power and high speed. The optimizations are carried out by the Partial Product (PP) folding method and by rearrangement of the PPs. The proposed squarer circuit is synthesized using the Xilinx tool for the Field Programmable Gate Array (FPGA) flow and the Cadence tool for the Application Specific Integrated Circuit (ASIC) flow for the analysis of dynamic power consumption and propagation delay, and the design is simulated using the ModelSim 6.5 tool for functional verification.

Keywords: ASIC flow, Binary Adder, FPGA flow, Vedic mathematics.

1. INTRODUCTION

A squarer is an arithmetic circuit used in some special processors such as Digital Signal Processing (DSP) processors [1][2]. Specialized squaring circuits have been proposed for DSP applications such as image compression and pattern recognition [3][4]. Squaring is a special multiplication operation in which the two operands are equal, so a multiplier with two equal inputs can be used as a squaring circuit. In some arithmetic processors, however, using a multiplier circuit as a squarer increases delay and power, and a dedicated squarer circuit can be designed for the squaring operation instead. In this paper, an architecture for squaring a binary number is explored to create a circuit using Vedic sutras. By using Vedic sutras the overall processor performance can be improved for many applications.
Therefore the goal is to create a squaring architecture that compares favorably in speed and power with a design using a standard multiplier. This paper is organized as follows. In Section 2, related work is briefly reviewed. In Section 3, the proposed squarer architecture is discussed. The performance of the proposed squarer architecture is compared with the existing squarer architecture, with results and discussion, in Section 4. Finally, a brief conclusion is given in Section 5.

2. RELATED WORK

Vedic mathematics was rediscovered from the ancient Indian scriptures between 1911 and 1918 by Sri Bharati Krishna Tirtha (1884-1960), a scholar of Sanskrit, mathematics, history and philosophy [5]. He studied these ancient texts for years and, after careful investigation, was able to reconstruct a series of mathematical formulae called sutras. Vedic mathematics is the name given to the ancient system of mathematics or, to be precise, a unique technique of calculation based on simple rules and principles with which any mathematical problem can be solved, be it arithmetic, algebra, geometry or trigonometry [6]. The system is based on 16 sutras or aphorisms, which are word formulae describing natural ways of solving a whole range of mathematical problems.

One of the sutras of Vedic mathematics applied to multiplication is Urdhava Tiryakbhyam (vertically and crosswise) [7], which is also the foundation of the proposed design. It is based on a concept through which all Partial Products (PP) can be generated concurrently with the addition of these PPs. The parallelism in the generation of PPs and their summation is obtained by vertical and crosswise multiplication and addition. Various examples and implementations of the Urdhava Tiryakbhyam sutra are discussed in [7]. In a Vedic multiplier, Partial Product Generation (PPG) and the additions are done concurrently. This feature makes it more attractive for binary multiplication. In most computations the multiplier unit is used to compute the square of an operand. Since squaring is a special case of multiplication, dedicated squaring hardware will significantly improve the computation time.
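The vertical-and-crosswise idea can be made concrete with a small software model. The sketch below is only an illustration under assumptions, not the authors' circuit: the function name `urdhava_tiryakbhyam` and the bit-list representation are hypothetical, and the loop visits the columns one after another, whereas hardware would form all column sums in parallel.

```python
def urdhava_tiryakbhyam(a, b, n):
    """Multiply two n-bit unsigned integers column by column.

    Column k collects every cross product a_i * b_j with i + j == k
    (the "vertical and crosswise" terms) plus the carry from column k - 1.
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result = 0
    carry = 0
    for k in range(2 * n - 1):
        column = carry
        # i runs over all bit positions of a that pair with a valid bit of b.
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            column += a_bits[i] & b_bits[k - i]
        result |= (column & 1) << k   # low bit of the column sum is product bit k
        carry = column >> 1           # the rest moves to the next column
    result |= carry << (2 * n - 1)    # final carry is the most significant bit
    return result
```

For 4-bit operands the model produces the same 8-bit product as ordinary multiplication, which is the point made in [8]: the two methods compute the same result and differ only in how the partial products are organized.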
A comparison between Vedic and conventional multipliers is discussed in [8]. The Urdhava Tiryakbhyam sutra of Vedic mathematics is used for the Vedic multiplier design. The paper concludes that the conventional and Vedic methods are computationally the same and that the difference between the two lies in the implementation strategy, because of which the Vedic multiplier has improved efficiency. Similar work is described in [9], where a Vedic multiplier and squarer were designed using the Urdhava Tiryakbhyam sutra and the duplex property. That design shows that Vedic mathematical methods are computationally faster and easier to perform than the conventional method. A binary number squarer is described in [10]. There, one Vedic multiplier and two squaring units are implemented using the Urdhava Tiryakbhyam sutra to design a squarer circuit with reduced delay. That squarer is shown to have improved efficiency in terms of speed. The present work attempts to formulate an iterative general strategy for the design and implementation of a squarer based on the principles of Vedic mathematics.

3. PROPOSED SQUARER ARCHITECTURE

3.1. Architecture

The proposed squarer uses a Vedic multiplier module for its computation. The proposed multiplier is designed using the Urdhava Tiryakbhyam sutra. The Partial Products (PP) of a 4×4 Vedic multiplier using the Urdhava Tiryakbhyam sutra are shown in Fig. 1. As shown in Fig. 1, the PPs are grouped into four (n/2) multiplier modules, and their results are added using a Carry Save Adder (CSA) to produce the final product. The block diagram of the Urdhava multiplier is shown in Fig. 2. A three-input CSA is used in the architecture. The first [(n-((n/2)+1)) to 0] bits of the resultant product are obtained by taking bits [(n-((n/2)+1)) to 0] of the first multiplier module result directly, while the remaining resultant bits [(2n-1) to (n-(n/2))] are obtained from the sum produced by the CSA. Since only a CSA is used in the architecture, there is a considerable reduction in power consumption and overall propagation delay compared with the work proposed in [10].

Fig. 1. Partial products of 4×4 Vedic multiplier using Urdhava Tiryakbhyam sutra.

Fig. 2. Block diagram of Urdhava multiplier.

3.2. Squarer Architecture

The squarer architecture presented here consists of three different optimizations. The three optimizations are based on the Partial Product (PP) folding technique and the PP re-arrangement technique.
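The four-module multiplier structure described above for the Urdhava multiplier can be modeled in a few lines. This is a sketch under assumptions rather than the authors' RTL: the names `carry_save_add` and `urdhava_multiply` are hypothetical, n is assumed to be a power of two, the (n/2) modules are built recursively down to a 1-bit AND, and the final carry-propagate addition of the CSA's sum and carry vectors is done with Python's `+`.

```python
def carry_save_add(x, y, z):
    """Three-input carry-save adder: returns (sum vector, carry vector)."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (y & z) | (x & z)) << 1
    return sum_vec, carry_vec


def urdhava_multiply(a, b, n):
    """Multiply two n-bit unsigned integers using four (n/2)-bit modules."""
    if n == 1:
        return a & b
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h
    p0 = urdhava_multiply(a_lo, b_lo, h)   # first module: low x low
    p1 = urdhava_multiply(a_hi, b_lo, h)   # cross term
    p2 = urdhava_multiply(a_lo, b_hi, h)   # cross term
    p3 = urdhava_multiply(a_hi, b_hi, h)   # high x high
    # Third CSA operand is the concatenation {p3, upper half of p0}.
    third = (p3 << h) | (p0 >> h)
    sum_vec, carry_vec = carry_save_add(p1, p2, third)
    upper = sum_vec + carry_vec            # final carry-propagate addition
    # The low h bits come straight from the first module.
    return (upper << h) | (p0 & mask)
```

The low n/2 product bits bypass the adder entirely, which mirrors the bit slicing described for Fig. 2.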
In each of the optimizations the architecture is composed of two (n/2)-bit square modules and one (n/2)-bit multiplier module, and the results of these three modules are added using a Carry Save Adder (CSA). In the first optimization, using the Urdhava Tiryakbhyam sutra, the PPs of the squarer are written as shown in Fig. 3. The PPs are grouped into two (n/2) square modules and one (n/2) multiplier module. In the multiplier module the PPs appear twice. Instead of using two multiplier modules, only one multiplier module is used, and a zero is appended at the Least Significant Bit (LSB) side of its result, which is equivalent to adding the results of two multiplier modules with the same PPs.
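The decomposition behind the first optimization can be checked with a short model. The sketch below is an assumption-laden illustration, not the synthesized design: `square_opt1` is a hypothetical name, n is assumed to be a power of two, the (n/2)-bit multiplier module is stood in for by Python's `*`, and the adder is an ordinary integer addition.

```python
def square_opt1(x, n):
    """Square an n-bit unsigned integer from two half-width squarers
    and one half-width multiplier."""
    if n == 1:
        return x & 1                      # a single bit squared is itself
    h = n // 2
    mask = (1 << h) - 1
    lo, hi = x & mask, x >> h
    sq_lo = square_opt1(lo, h)            # LSB-bits squarer
    sq_hi = square_opt1(hi, h)            # MSB-bits squarer
    cross = hi * lo                       # stand-in for the (n/2)-bit multiplier module
    # {sq_hi, upper half of sq_lo} is the first adder operand.
    operand_a = (sq_hi << h) | (sq_lo >> h)
    # Appending one zero at the LSB side doubles the cross product.
    operand_b = cross << 1
    upper = operand_a + operand_b
    # The low h bits of the result come directly from the LSB-bits squarer.
    return (upper << h) | (sq_lo & mask)
```

Writing x as hi·2^(n/2) + lo gives x² = hi²·2^n + 2·hi·lo·2^(n/2) + lo², so the single left shift of `cross` replaces the second multiplier module.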

Fig. 3. Partial products of 4×4 Vedic squarer using Urdhava Tiryakbhyam sutra.

The results of the multiplier and squarer modules are added using a Carry-Look-Ahead (CLA) adder. The block diagram of optimization one is shown in Fig. 4. As shown in the block diagram, bits [((n/2)-1) to 0] of the final product are obtained by directly taking bits [((n/2)-1) to 0] of the result of the first squarer module (the Least Significant Bit (LSB)-bits squarer). The result of the second squarer (the Most Significant Bit (MSB)-bits squarer) is concatenated with the remaining bits of the first squarer, and this value is added to the multiplier module result, which is extended by concatenating ((n/2)-1) zeros at the MSB side and one zero at the LSB side. The sum produced by the CLA adder gives the remaining bits [(2n-1) to (n/2)] of the product.

Fig. 4. Block diagram of n-bit Vedic squarer for optimization one.

In the second optimization the PPs shown in Fig. 3 are reduced by noting that $X_i X_j = X_j X_i$. PPs with identical terms can therefore be combined as

$$X_i X_j + X_j X_i = 2 X_i X_j \qquad (1)$$

The reduced PPs using Equation (1) are shown in Fig. 5. As in optimization one, the reduced PPs are grouped into two (n/2) square modules and one (n/2) multiplier module. Since the PPs are reduced, only one multiplier module is needed, and the zero that optimization one appended at the LSB side to stand in for the second multiplier module is eliminated.

Fig. 5. Reduced 4×4 Vedic squarer partial products using folding technique: (a) before re-arrangement of partial products, (b) after re-arrangement of partial products.
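The folding step of the second optimization can be illustrated by listing the reduced partial products explicitly. The sketch below is a software model under assumptions: the function names are hypothetical, and the diagonal simplification (a bit ANDed with itself equals the bit) is a standard Boolean identity included here for completeness, not something the text states for this optimization.

```python
def folded_partial_products(x, n):
    """Return the folded partial products of x squared as (bit value, weight) pairs.

    Each pair of mirrored cross terms x_i*x_j and x_j*x_i (i < j) is merged
    into one term of doubled weight, i.e. shifted one column to the left.
    """
    bits = [(x >> i) & 1 for i in range(n)]
    terms = []
    for i in range(n):
        # Diagonal term: x_i AND x_i is just x_i, at weight 2^(2i).
        terms.append((bits[i], 2 * i))
        for j in range(i + 1, n):
            # Folded cross term: 2 * x_i * x_j, at weight 2^(i + j + 1).
            terms.append((bits[i] & bits[j], i + j + 1))
    return terms


def square_folded(x, n):
    """Sum the folded partial products to obtain x squared."""
    return sum(value << weight for value, weight in folded_partial_products(x, n))
```

For n = 4 the folded form has n(n+1)/2 = 10 partial products instead of the 16 of a plain 4×4 multiplier array, which is where the adder width and depth savings come from.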

The block diagram of optimization two is shown in Fig. 6. In optimization two, bits [(n/2) to 0] of the final product are taken directly from the first squarer module. As in optimization one, the result of the second square module is concatenated with the remaining bits of the first squarer module, and this value is added to the multiplier module result, which is extended by appending a zero at the MSB side. Because of the reduced PPs and the use of an (n+(n/2)-1)-bit adder, optimization two shows a considerable reduction in power consumption and propagation delay compared with optimization one, which uses an (n+(n/2))-bit CLA adder.

Fig. 6. Block diagram of n-bit Vedic squarer for optimizations two and three.

In optimization three the PPs are further reduced using Equations (2), (3) and (4). As in the first two optimizations, the PPs are grouped into two (n/2) square modules and one (n/2) multiplier module. As in optimization two, the results of the squarer and multiplier modules are added using an (n+(n/2)-1)-bit CLA adder. Because the depth of the PPs is reduced further, there is a significant reduction in power consumption as well as propagation delay compared with the first two optimizations. The reduced partial products for a 4×4 squarer using optimization three are shown in Fig. 7, and its block diagram is shown in Fig. 6.

Fig. 7. Reduced 4×4 Vedic squarer partial products using Equation (4).

4. RESULTS AND DISCUSSION

Squarers for 4-bit, 8-bit and 16-bit operands were designed for both the existing method [10] and the optimized methods. Three optimizations (optimization 1, optimization 2 and optimization 3) were performed in the optimized method. The designed squarers were simulated using the ModelSim tool, version 6.5, for functional verification and synthesized using the Cadence RTL Compiler tool with an 8nm standard cell technology library and the Xilinx tool (Virtex 7 family) for dynamic power and propagation delay analysis.
The simulation results for the proposed 4-bit, 8-bit and 16-bit squarers are shown in Fig. 8 to 10.

The simulation results in Fig. 8 to 10 are shown for various possible input combinations. In Fig. 8, x is a 4-bit input and p is the output (the square of input x), which is an 8-bit binary number. Similarly, in Fig. 9, x is an 8-bit input and p is the 16-bit output, and in Fig. 10, x is a 16-bit input and p is the 32-bit output. Block diagrams of the 4-bit, 8-bit and 16-bit squarers for optimization three are shown in Fig. 11 to 13. In these block diagrams x is the input given to the squarer module and p is its output; the q modules are squarer modules, m is the multiplier module and the l modules are adder modules.

The performance of the proposed squarer design for 4-bit, 8-bit and 16-bit operands is shown in Tables 1 to 6. The percentage improvement in Tables 1 to 6 is calculated for the optimization three squarer with respect to the existing squarer [10]. The comparison shows that the proposed squaring architecture not only consumes less power but also operates at higher speed than the squarer design in [10].

Table 1. Synthesis result of 4-bit squarer in ASIC flow
Parameters (propagation delay, dynamic power)
Existing [10]: .77.
Optimization-1: ..
Optimization-2: .77.
Optimization-3: .44.8
% Improvement: 47.94 45.45

Table 2. Synthesis result of 8-bit squarer in ASIC flow
Parameters (propagation delay, dynamic power)
Existing [10]: 7..
Optimization-1: 4.8.76
Optimization-2: 4.64.7
Optimization-3: .78.65
% Improvement: 48.44 8.6

Table 3. Synthesis result of 16-bit squarer in ASIC flow
Parameters (propagation delay, dynamic power)
Existing [10]: 5.464.87
Optimization-1: 8.96.
Optimization-2: 8.946.
Optimization-3: 8.66.994
% Improvement: 4.99.76

Table 4. Synthesis result of 4-bit squarer in FPGA flow
Parameters (propagation delay, dynamic power)
Existing [10]: 6.59 4.7
Optimization-1: 4.6 4.
Optimization-2: 4. 5.
Optimization-3: .959 4.6
% Improvement: 54.67 7.

Table 5. Synthesis result of 8-bit squarer in FPGA flow
Parameters (propagation delay, dynamic power)
Existing [10]: .8.56
Optimization-1: 7.664.4
Optimization-2: 7.5.5
Optimization-3: 7.5.
% Improvement: 4..9

Table 6. Synthesis result of 16-bit squarer in FPGA flow
Parameters (propagation delay, dynamic power)
Existing [10]: .689.8
Optimization-1: .7 8.68
Optimization-2: .55.6
Optimization-3: .55 8.9
% Improvement: 7.7 9.8

Fig. 8. Simulation results of 4-bit squarer

Fig. 9. Simulation results of 8-bit squarer

Fig. 10. Simulation results of 16-bit squarer

Fig. 11. Block diagram of optimization three 4-bit squarer

Fig. 12. Block diagram of optimization three 8-bit squarer

Fig. 13. Block diagram of optimization three 16-bit squarer

5. CONCLUSION

The focus of this work is to achieve an optimized and realistic squarer architecture. The proposed squarer is shown to exhibit improved efficiency in terms of propagation delay and dynamic power. Because of its low power and high speed, the proposed squarer can be used for DSP and cryptography applications, which involve time-consuming operations such as squaring. The proposed squarer design is simulated and synthesized for 4-bit, 8-bit and 16-bit operands, and it can be extended to unsigned numbers with a higher number of bits. The tabulated results show that for the optimized 16-bit squarer the overall propagation delay is reduced by 4.99% and the dynamic power by .76% for the ASIC flow, and similarly by 7.7% and 9.8% for the FPGA flow, when compared with the existing squarer design [10].

REFERENCES

[1] Johnny Pihl and Einar J (1996), A multiplier and squarer generator for high performance DSP applications, IEEE 39th Midwest Symposium on Circuits and Systems, Ames, pp. 9-.
[2] Akhalesh K. Itawadiya, Rajesh Mahle, Vivek Patel and Dadan Kumar, Design a DSP operations using Vedic mathematics, IEEE International Conference on Communications and Signal Processing (ICCSP), Melmaruvathur, pp. 897-9.
[3] Himanshu Thapliyal and M.B. Srinivas (2005), An efficient method of elliptic curve encryption using ancient Indian Vedic mathematics, IEEE 48th Midwest Symposium on Circuits and Systems, Covington, pp. 86-88.
[4] S. Kumaravel and Ramalatha Marimuthu (2007), VLSI implementation of high performance RSA algorithm using Vedic mathematics, IEEE Conference on Computational Intelligence and Multimedia Applications, Sivakasi, pp. 6-8.
[5] www.vedicmaths.com
[6] A.P. Nicholas, J. Pickles and K. Williams (1982), Introductory Lectures on Vedic Mathematics, Polytechnic of North London.
[7] A.P. Nicholas, K.R. Williams and J. Pickles, Applications of the Vedic Mathematics Sutra: Vertically and Crosswise, Inspiration Books, third revised edition, The Vedic Mathematics Research Group.
[8] Parth Mehta and Dhanashri Gawali (2009), Conventional versus Vedic mathematical method for hardware implementation of a multiplier, IEEE International Conference on Advances in Computing, Control, and Telecommunication Technologies, Trivandrum, pp. 64-64.
[9] Abhijeet Kumar, Dilip Kumar and Siddhi, Hardware implementation of 16*16 bit multiplier and square using Vedic mathematics, International Conference on Signal, Image and Video Processing (ICSIVP), pp. 9-4.
[10] Kabiraj Sethi and Rutuparna Panda, An improved squaring circuit for binary numbers, International Journal of Advanced Computer Science and Applications, vol., No., pp. -5.