VHDL Implementation and Performance Analysis of two Division Algorithms. Salman Khan B.S., Sir Syed University of Engineering and Technology, 2010


VHDL Implementation and Performance Analysis of two Division Algorithms

by

Salman Khan
B.S., Sir Syed University of Engineering and Technology, 2010

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Applied Science in the Department of Electrical and Computer Engineering

© Salman Khan, 2015
University of Victoria

All rights reserved. This thesis may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.

Supervisory Committee

Dr. Fayez Gebali, Supervisor (Department of Electrical and Computer Engineering)
Dr. Atef Ibrahim, Member (Department of Electrical and Computer Engineering)

ABSTRACT

Division is one of the most fundamental arithmetic operations and is used extensively in engineering, scientific, mathematical and cryptographic applications. Implementing an arithmetic operation such as division is complex and expensive in hardware. Unlike addition and subtraction, division requires several iterative computational steps on the given operands to produce the result. In the past, division has often been perceived as an infrequently used operation and has received less attention, yet it is one of the most difficult operations in computer arithmetic. The technique used to implement such an iterative computation in hardware affects the speed, area and power of the digital circuit. For this reason, we consider two division algorithms that differ in their shift step size. Algorithm 1 operates with a fixed shift step size and a fixed number of iterations, while Algorithm 2 operates with a variable shift step size and requires considerably fewer iterations. This thesis provides a technique to save power and speed up the overall computation. It also looks at different design goal strategies and presents a comparative study to assess how each of the two designs performs in terms of area, delay and power consumption.

Contents

Supervisory Committee
Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
    Overview
    Motivation for this work
    Contributions
    Thesis Organization

2 Division Background
    Division Fundamentals
    Division Algorithms Classes
        Digit Recurrence Algorithms
        Functional Iteration Algorithms
        Very High Radix Algorithms
        Look-up Tables
        Variable Latency Algorithms
    Related work in the area
    Chapter Summary

3 Considered Division Algorithms
    Division Approach
        Reasons For Considerations
        Overview of Operation
    Division Algorithm 1 : Fixed Shift Algorithm
        Mode 1 : Range reduction of Y
        Mode 2 : Post processing of Y and Z
    Division Algorithm 2 : Adaptive Shift Algorithm
        Mode 1 : Range reduction of Y
        Mode 2 : Post processing of Y and Z
    Chapter Summary

4 Design and Implementation
    Hardware entities for Algorithm 1
        X, Y and Z Registers
        Data Multiplexer
        Comparator for Y
        The Look-up table
        The ALU unit
        Counter
        Finite State Machine
        FSM : State transition diagram
    Hardware entities for Algorithm 2
        Delta Address Generator
        DAG Implementation
        Finite State Machine
        FSM : State transition diagram
    Circuit Implementations
        Algorithm 1 : Fixed Shift division algorithm
        DAG overall layout
        Algorithm 2 : Adaptive Shift division algorithm
    Chapter summary

5 Results and Evaluation
    Numerical Simulation using MATLAB
        Numerical Simulation of Algorithm 1
        Numerical Simulation of Algorithm 2
    Hardware Simulation
        VHDL Simulation of Algorithm 1
        VHDL Simulation of Algorithm 2
    Performance Evaluation
        Device Utilization
        Timing Analysis
        Power Consumption
        Power-Delay Product
        Area-Delay Product
    Comparison of Work in Related Area
    Chapter Summary

6 Conclusion, Contributions and Future Work
    Conclusion
    Contributions
    Future Work

Bibliography

7 Additional Information
    Interpretation of signals
    Used Terms and Acronyms

List of Tables

Table 4.1 Truth Table when Y is positive
Table 4.2 Truth Table when Y is negative
Table 5.1 Iterations for Algorithm 1
Table 5.2 Iterations for Algorithm 2
Table 5.3 On-chip device utilization of Algorithm 1
Table 5.4 On-chip device utilization of Algorithm 2
Table 5.5 Timing Summary of Algorithm 1
Table 5.6 Timing Summary of Algorithm 2
Table 5.7 On-chip power consumptions
Table 5.8 Power-delay product for Algorithm 1 and 2
Table 5.9 Area-delay product for Algorithm 1 and 2
Table 5.10 Summary of related work in the area

List of Figures

Figure 2.1 Nonzero bits of X and Y at the start of division
Figure 2.2 Nonzero bits of X and Y at the end of division
Figure 4.1 Algorithm 1 system level
Figure 4.2 Registers X, Y and Z in the bank
Figure 4.3 Data multiplexer for register bank
Figure 4.4 Comparator for Y
Figure 4.5 LUT block
Figure 4.6 ALU block
Figure 4.7 Logical operation of ALU during i-th iteration
Figure 4.8 Counter block diagram
Figure 4.9 Finite State Machine block
Figure 4.10 State transition diagram for Algorithm 1
Figure 4.11 Algorithm 2 system level
Figure 4.12 Delta (δ) Address Generator
Figure 4.13 DAG system level
Figure 4.14 Position finder unit block
Figure 4.15 Multiplexer for flag input
Figure 4.16 Multiplexer for data input
Figure 4.17 The P_x Register
Figure 4.18 The number subtractor block in DAG
Figure 4.19 bits scan unit 0 in level
Figure 4.20 Hierarchical approach between level 1 and
Figure 4.21 Hierarchical arrangement of position finder unit
Figure 4.22 Finite State Machine block
Figure 4.23 State transition diagram for Algorithm 2
Figure 4.24 Top level block of fixed shift division algorithm
Figure 4.25 Fixed shift division algorithm RTL schematic
Figure 4.26 Delta address generator RTL schematic
Figure 4.27 Top level block of adaptive shift division algorithm
Figure 4.28 Adaptive shift division algorithm RTL schematic
Figure 5.1 All iterations for Algorithm
Figure 5.2 Iterations 0 to 2 for Algorithm
Figure 5.3 Iterations 3 to 7 for Algorithm
Figure 5.4 Iterations 8 to 11 for Algorithm
Figure 5.5 Iterations 12 to 14 for Algorithm
Figure 5.6 All iterations for Algorithm

ACKNOWLEDGMENTS

In the name of Allah, the Most Gracious and the Most Merciful. All praises belong to Allah the merciful for his guidance and blessings, which enabled me to complete this thesis.

I would like to thank:

My parents, for their prayers, love, patience, emotional support and assurance in difficult and frustrating moments, and for their constant motivation. Despite financial constraints, they were always ready to support me financially.

My Supervisor, Dr. Fayez Gebali, for all the mentoring and support which enabled me to achieve my academic and research objectives, for helping me cope with off-school problems and settle in as an international student, and for sharing his ideas, concepts and experiences. It would not have been possible to complete my research without his invaluable guidance.

My Committee, Dr. Atef Ibrahim, for devoting precious time and providing valuable suggestions to improve the quality of the thesis.

My Manager at BC Hydro, Djordje Atanackovic, for his encouragement and support in helping me focus on completing my thesis.

UVIC ECE Dept admin and lab staff, Kevin Jones, Janice Closson, Paul Fedrigo and Brent Sirna, for assisting me during the course of my degree.

DEDICATION

To my father, Muhammad Khalid Zahid, and my mother, Imtiaz Khalid, for having a lifelong dream to see me achieve my graduate qualification at a world class foreign institution. In difficult times, it proved to be a key motivating factor and enabled me to maintain focus.

To my grandmother, Rasool Fatima, for her countless prayers and for believing in me.

To my Supervisor, Dr. Fayez Gebali, who is one of the most knowledgeable, kind and helpful people I have known. I wish him the best of health.

Chapter 1 Introduction

1.1 Overview

The implementation of mathematical algorithms, such as those required by a random number generator (RNG), requires complex and expensive arithmetic operations like division and multiplication, as well as iterative computations on the given inputs to obtain the required output. The techniques used to implement these operations and iterations in hardware significantly affect the speed, area and power of the hardware.

The division of two integers, the dividend and the divisor, results in an integer quotient and an integer remainder. Integer division is one of the most fundamental arithmetic operations and is heavily used in engineering, scientific, mathematical and statistical computations. Implementing and performing division in hardware is complex and expensive, and it consumes more power than addition and subtraction. According to [1], division is the most difficult operation in computer arithmetic, and it is commonly perceived as an infrequently used operation whose implementation does not receive much attention. Division in modern microprocessors takes many clock cycles; furthermore, the number of clock cycles required for integer division depends on the operands' values [2], so larger integer operands require more clock cycles. The more clock cycles or iterations the divider needs, the more power it consumes and the slower the operation becomes. Ignoring the implementation of division has been shown to result in significant system performance degradation [3]. In applications that employ division, an efficient hardware divider can significantly improve the overall performance of the system, so it is imperative to find the best method of implementing the division algorithm in hardware. A divider with lower heat dissipation is also desirable in terms of performance and security.

1.2 Motivation for this work

This work is part of ongoing research on the design, development and implementation of a low power Pseudo Random Number Generator (PRNG). It focuses on the implementation and performance analysis of division algorithms to be incorporated in the PRNG. These division algorithms are implemented as co-processor designs that the PRNG will later use to implement the mathematical algorithm that generates the random numbers. Although the implementation of the overall PRNG exceeds the scope of this work, the targeted PRNG is based on the Park-Miller algorithm, a fairly popular choice for random number generation. The algorithm requires an initial seed value, a special prime number, a quotient and a remainder to generate a random number [4]. Two hardware divider designs are considered and implemented to produce the quotient and remainder that the Park-Miller algorithm needs. The hardware for 32 bit integer division is based on the digit-recurrence, non-restoring division algorithm. The divider designs are later analyzed for their performance and their impact on the parameters that matter for the chosen application. There has been quite a bit of work on hardware dividers, particularly dealing with higher radix and floating point implementations.
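To make the role of the quotient and remainder in the Park-Miller generator concrete, the following is a minimal Python sketch of one generator step using Schrage's decomposition. The constants (modulus 2^31 - 1, multiplier 16807) are the commonly published "minimal standard" values and are an assumption here; the thesis text does not list them, and the function name is hypothetical.

```python
# Park-Miller "minimal standard" step via Schrage's decomposition.
# Assumption: the constants below are the commonly published ones,
# not values quoted from the thesis.
MODULUS = 2**31 - 1          # prime modulus
MULTIPLIER = 16807           # 7**5
Q = MODULUS // MULTIPLIER    # 127773, which needs 17 bits
R = MODULUS % MULTIPLIER     # 2836


def park_miller_next(seed: int) -> int:
    """Return the next state for a seed in 1 .. MODULUS - 1."""
    # The only division required is seed / Q: quotient hi, remainder lo.
    hi, lo = divmod(seed, Q)
    # Schrage's identity keeps every intermediate within 32-bit signed range.
    nxt = MULTIPLIER * lo - R * hi
    return nxt if nxt > 0 else nxt + MODULUS
```

In this formulation the divider is asked for the quotient and remainder of a 32 bit value divided by a 17 bit constant, which is the operand shape considered in the rest of this work.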
Most researchers compare the performance of the overall divider in terms of speed and area, while the implementation methodology, and how changes in implementation affect performance (especially power consumption) in fixed point integer division, has not been explained clearly. This motivated us to determine the best implementation of a hardware divider in terms of performance parameters, and to study the two dividers to see which one is best suited to low power, high speed or low cost implementation. Another motivation was to come up with a simplified design approach that would allow new designers and researchers to understand and re-implement integer division in hardware. From an academic and learning point of view, this work enabled an understanding of iterative algorithms, their design and implementation, and state machine synchronization, which are useful skills for anyone learning practical hardware design.

1.3 Contributions

Two division algorithms based on digit recurrence, non-restoring division are considered and implemented. The first is called the fixed shift division algorithm and the second is the adaptive shift algorithm. The second algorithm improves on the first in terms of performance. Our work makes the following contributions:

1. Designed and implemented two signed integer division algorithms for performing the division operation in hardware.
2. Verified the hardware design by developing Matlab code to confirm the correctness and accuracy of the hardware implemented in VHDL.
3. Compared the performance of the two division algorithms from the viewpoint of device utilization (area), power consumption and timing analysis (delay).
4. Adapted the high-radix technique proposed in [5] for floating point arithmetic to integer arithmetic.

Our work will help the designer choose a division implementation for an application specific purpose. If the application demands high speed or low power computation, as in RNGs and cryptographic and encryption processors, the adaptive shift algorithm is the preferred choice, whereas in applications with area and cost constraints, such as smart cards, the fixed shift algorithm is better suited.

1.4 Thesis Organization

This section outlines the organization of the thesis and gives the reader a brief summary of the main focus of each chapter.

Chapter 1 introduces the reader to the subject and the scope of the research. The motivation for the research and its contributions, which were the fundamental objectives of the thesis, are discussed.

Chapter 2 describes the background and fundamentals of division in hardware. A brief classification of division algorithms is provided to help the reader understand the related previous work in the area.

Chapter 3 describes our approach to the division operation. The two considered division algorithms, known as the fixed shift division algorithm and the adaptive shift division algorithm, are presented, and the methodology used to achieve the correct result of the division operation is explained.

Chapter 4 describes the hardware design and implementation. The hardware entities that are common to both algorithms, and those that are specific to each of them, are explained. The circuit implementations of both algorithms are presented.

Chapter 5 contains the results and evaluation of the two algorithms. Numerical simulation results are obtained to verify that the algorithms work, and then the results of hardware simulations (in VHDL) are presented to confirm that the two algorithms have been implemented correctly. Performance analysis of the two algorithms is also conducted in this chapter.

Chapter 6 contains the concluding statements and a short description of the work and what was achieved through it.

Chapter 2 Division Background

2.1 Division Fundamentals

Several authors have worked on number division; see, for example, [6][7][8]. The fundamental principle is that the division of a dividend by a divisor can be realized in cycles of shifting and adding (in practice, subtracting), with hardware or software control of the loop, so that the hardware divider converges iteratively on the correct result. In this thesis, we refer to Y as the dividend and X as the divisor. We wish to divide the integer Y by a positive integer X. The result of this division should be two integers, the quotient and the remainder, denoted by q and r respectively, so that the following equation is satisfied:

$$Y = qX + r \tag{2.1}$$

q and r can be expressed as:

$$q = \left\lfloor \frac{Y}{X} \right\rfloor \tag{2.2}$$

$$0 \le r < X \tag{2.3}$$

The floor in (2.2) rounds Y/X down to the nearest integer, which is the quotient. The fractional part that is discarded, multiplied by X, gives the remainder. Using this concept we can rewrite (2.1) as:

$$r = Y - qX \tag{2.4}$$

The above equation states that the remainder r can be obtained by subtracting X from Y a total of q times, until the condition in (2.3) is satisfied; at this point the value of Y is the desired remainder r. Most hardware dividers operate in the same manner, which is very similar to long division by hand. The hardware divider updates the value of Y according to:

$$Y \leftarrow Y - \delta X \tag{2.5}$$

Here δ is the partial quotient and the updated value of Y is the partial remainder. As in long division by hand, the hardware divider keeps track of the quotient by accumulating the partial quotients in a register Z:

$$Z \leftarrow Z + \delta \tag{2.6}$$

From (2.5) and (2.6) we see that δX is subtracted from Y while δ is added to Z. The choice of δ can be arbitrary and still lead to the correct result, provided that the following two conditions are met:

1. The updated value of Y in (2.5) should converge to the range 0 ≤ Y < X, so that it produces the desired remainder. If Y is positive, the factor δX is subtracted from Y; if Y is negative, the factor δX is added to Y.
2. The updated value of Z in (2.6) should add or subtract to produce the desired quotient.

We represent the n-bit dividend Y in 2's complement, so that the range of Y is:

$$-2^{n-1} \le Y < 2^{n-1} \tag{2.7}$$

Our divisor X is assumed to require only m bits for its representation, such that m ≤ n. Figure 2.1 shows the nonzero bits in Y and X at the start of the division operation. Our goal is to iteratively reduce the nonzero bits of Y to m bits so that Y falls in the range:

$$0 \le Y < X \tag{2.8}$$

Figure 2.2 shows the nonzero bits of Y at the end of the division operation, where Y stores the value of the remainder, which falls in the range 0 ≤ r < X. The choice of δ at each iteration to implement (2.5) and (2.6) differentiates the division algorithms implemented in this work, as demonstrated in the chapters that follow.

Figure 2.1: Nonzero bits of X and Y at the start of division

Figure 2.2: Nonzero bits of X and Y at the end of division
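The update rules (2.5) and (2.6) can be illustrated with a minimal Python sketch. It assumes the simplest admissible step, δ = ±1, which is chosen here only for illustration and is not one of the thesis algorithms; the function name is hypothetical.

```python
def divide_by_unit_steps(y: int, x: int):
    """Return (quotient, remainder) with 0 <= remainder < x, using delta = +/-1."""
    if x <= 0:
        raise ValueError("divisor x is assumed positive, as in the text")
    z = 0                              # register Z accumulates the quotient
    while not (0 <= y < x):            # stop once the target range (2.8) is reached
        delta = 1 if y >= 0 else -1    # subtract when Y is positive, add when negative
        y -= delta * x                 # Y <- Y - delta * X   (2.5)
        z += delta                     # Z <- Z + delta       (2.6)
    return z, y
```

With unit steps the loop runs about |q| times, which is why practical dividers choose larger values of δ; the quantity Y + Z·X stays equal to the original dividend throughout.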

2.2 Division Algorithms Classes

Oberman and Flynn presented a taxonomy of division algorithms in [3], classifying them by hardware implementation into five classes: digit recurrence, functional iteration, very high radix, table look-up and variable latency. Many practical division algorithms are hybrids that combine several of these classes in the overall algorithm.

2.2.1 Digit Recurrence Algorithms

Digit recurrence is the simplest and most widely implemented class of division algorithms. It uses subtractive methods to retire a fixed number of quotient bits in every iteration, so the step size in bits is the same for each iteration. Digit recurrence algorithms require less complexity and area to implement.

2.2.2 Functional Iteration Algorithms

Functional iteration uses multiplication as the basis of the division operation. It takes advantage of a high speed multiplier to converge on the result quadratically, unlike subtractive division, which converges linearly; this reduces the latency and the number of iteration cycles. Therefore, instead of retiring a fixed number of bits per iteration, this class of algorithms retires an increasing number of bits at each iteration.

2.2.3 Very High Radix Algorithms

Digit recurrence algorithms are suited to low radix division. As the radix increases, the hardware and the process of forming divisor multiples become more complicated and consume more area and computation time. A variant is the very high radix algorithm, which avoids the constraints posed by the higher radix; the term very high radix applies to dividers that retire more than 10 bits in each iteration.
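As an illustration of the quadratic convergence attributed to functional iteration above, the following Python sketch uses the Newton-Raphson reciprocal iteration. Newton-Raphson is one well-known member of this class; the text does not say which method it has in mind, so the choice, the initial guess and the function name are assumptions made for illustration.

```python
def reciprocal_newton(d: float, iterations: int = 4) -> float:
    """Approximate 1/d for 0.5 <= d < 1 using multiplications only."""
    if not 0.5 <= d < 1.0:
        raise ValueError("d is assumed pre-scaled into [0.5, 1)")
    x = 48.0 / 17.0 - (32.0 / 17.0) * d    # common linear first guess
    for _ in range(iterations):
        x = x * (2.0 - d * x)              # relative error is roughly squared
    return x


def divide_via_reciprocal(n: float, d: float) -> float:
    """Quotient n/d obtained as n times the approximate reciprocal of d."""
    return n * reciprocal_newton(d)
```

Because the relative error is roughly squared on every pass, the number of correct bits approximately doubles per iteration, in contrast to the fixed number of bits retired per step by digit recurrence.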

2.2.4 Look-up Tables

When a low-precision quotient is required, it may be feasible to perform division using a look-up table without the use of an algorithm. This implementation uses direct and linear approximation methods to compute the quotient bits. The table can be implemented as a ROM. The advantage is fast processing, since no arithmetic calculation is needed; the drawback is that the size of the look-up table grows exponentially with each added bit of accuracy.

2.2.5 Variable Latency Algorithms

The digit recurrence and very high radix algorithms retire a fixed number of bits in every iteration, while functional iteration algorithms retire an increasing number of bits in every iteration, but all three complete the operation in a fixed number of cycles. Dividers based on variable latency algorithms perform division in a variable amount of time.

2.3 Related work in the area

The main algorithms for division in hardware were highlighted in the previous section, and each methodology has its own applications and benefits. However, digit recurrence is the most commonly used approach for hardware division, with procedures such as restoring, non-restoring, SRT division (Sweeney, Robertson and Tocher), approximation algorithms, the CORDIC algorithm, the multiplicative algorithm and the continued product algorithm [9]. According to Sutter and Deschamps [10], binary non-restoring digit recurrence algorithms are the preferred procedure for FPGA based dividers. The authors of [9] implemented high speed non-restoring division using a high speed adder/subtractor to speed up the division operation.

Sutter and Deschamps implemented high speed fixed point dividers in [10] by exploiting FPGA characteristics such as: adder/subtractors or conditional adders having the same delay as simple adders; the existence of dedicated, fast carry generation and propagation logic; and multiplexers in addition to the general purpose LUTs, in sequential, combinational and pipelined circuits. Achieving higher speed is desirable in hardware implementation, but some applications may also require power efficiency. Nannarelli and Lang proposed a low power divider [11] and discussed power saving techniques such as: re-timing the recurrence, changing redundant representations to reduce the number of flip flops, using gates with lower drive capability, equalizing the paths of the input signals of the blocks to reduce glitches, and switching off inactive blocks.

We focused our implementation of division algorithms on the non-restoring division methodology. We designed a fixed iteration division algorithm and then utilized Dr. Gebali's HCORDIC technique [5], an adaptive algorithm methodology based on hierarchical design, to reduce the number of iterations for the adaptive shift algorithm. Dr. Gebali implemented this technique for floating point arithmetic, and we adapted it to make it applicable to integer arithmetic.

2.4 Chapter Summary

This chapter highlighted the basics of division in hardware, which will enable the reader to understand the algorithms presented in Chapter 3. An overview of some of the known classes of division algorithms was presented so that the reader can understand the high level differences between implementations. Related work in the area of division was also discussed to give the reader additional context for the intended work.

Chapter 3 Considered Division Algorithms

The non-restoring division algorithm is based on retiring a fixed number of quotient bits in each iteration. Our algorithms are built on the shift, or δ, introduced in the previous chapter. The way the size of δ is chosen defines our two algorithms: with a fixed δ we obtain the fixed shift algorithm, and with an adaptive δ we obtain the adaptive shift algorithm.

3.1 Division Approach

3.1.1 Reasons For Considerations

We chose these two division algorithms for the following reasons:

1. They are popular for the implementation of division in integer arithmetic.
2. No multiplier is needed (reduced power and area).
3. No adder, no multiplier; a look-up table is utilized, so the algorithms can be implemented in non-Xilinx programmable logic devices and are therefore not device specific.
4. Simplicity of the algorithms.

3.1.2 Overview of Operation

The two algorithms essentially operate in two modes:

1. Range reduction mode of Y: in this mode, the algorithm takes multiple steps (iterations) to reduce the dividend and converge on the result.
2. Post processing mode of Y and Z: this is a single step that adjusts the remainder and quotient when the result of mode 1 does not fall in the desired range.

To begin the operation in mode 1, the sign of the current value of the dividend Y is checked. If the value is negative, the product of δ and the divisor X is added to Y to obtain the next value of Y. If the value of Y is positive, the product δX is subtracted from the current value of Y to obtain the next value of Y. These steps yield the value of the remainder. The quotient is produced in the same steps: δ is added to or subtracted from the current value of the quotient Z, depending on the operation performed on Y, since the two always have opposite operations performed on them. The range of Y is also checked at each of these steps. If, at the end of the iterations, the value of Y is in the desired range, that value of Y is the remainder and the corresponding value of Z is the quotient. If the value is not in range at the end of the range reduction mode, the algorithm jumps to mode 2, a single step that adjusts the range so that the correct quotient and remainder are available at the next step. This methodology is explained mathematically in the next section.

3.2 Division Algorithm 1 : Fixed Shift Algorithm

This algorithm performs a fixed, minimal number of iterative steps to produce the quotient and the remainder when Y is divided by X. In our work, Y is a 32 bit signed integer, so the number of bits in the dividend is n = 32. X is m bits long with m = 17, since this is the minimum width needed by Dr. Gebali for the initial quotient used to implement the random number generator. The sign of X is arbitrary, so X is assumed to be positive.
The fixed shift division algorithm has the following properties:

1. The required number of iterations is equal to n − m + 1.
2. The sign of the current value of Y determines whether the operation needed in the next iteration is an addition or a subtraction.
3. The value of Z converges on the quotient using the opposite operation to the one applied to Y in property number 2.
4. The δ at every iteration is determined by equation (3.6) below.

3.2.1 Mode 1 : Range reduction of Y

The step size δ is given by the iteration index and not by the intermediate values of Y. Property number 1 is applied to Y and Z according to the following equations:

$$Y^{(i+1)} = Y^{(i)} - \mu_i \delta_i X, \quad 0 \le i \le n - m \tag{3.1}$$

$$Z^{(i+1)} = Z^{(i)} + \mu_i \delta_i \tag{3.2}$$

where the initial values of Y and Z are:

$$Y^{(0)} = Y \tag{3.3}$$

$$Z^{(0)} = 0 \tag{3.4}$$

The $\mu_i$ in equations (3.1) and (3.2) selects addition or subtraction at iteration index i, and $\delta_i$ is the step size. They are given by the following equations:

$$\mu_i = \begin{cases} +1 & \text{when } Y^{(i)} \ge 0 \\ -1 & \text{when } Y^{(i)} < 0 \end{cases} \tag{3.5}$$

$$\delta_i = 2^{(n - m - i)}, \quad 0 \le i \le n - m \tag{3.6}$$

Once again, it is important to remember that the iteration step size depends on δ and not on the intermediate values of the partial quotient and remainder. This step size is realized as a binary shift and is used by the ALU of the divider to compute the result.
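As a small worked illustration of (3.1) to (3.6), assume a toy configuration that is not taken from the thesis: n = 5, m = 3, X = 5 and Y = 13. The step sizes are then 4, 2 and 1, and the n − m + 1 = 3 iterations of Mode 1 proceed as follows:

```latex
\begin{align*}
i = 0:&\quad \mu_0 = +1,\ \delta_0 = 2^{2} = 4, & Y^{(1)} &= 13 - 4 \cdot 5 = -7, & Z^{(1)} &= 0 + 4 = 4 \\
i = 1:&\quad \mu_1 = -1,\ \delta_1 = 2^{1} = 2, & Y^{(2)} &= -7 + 2 \cdot 5 = 3,  & Z^{(2)} &= 4 - 2 = 2 \\
i = 2:&\quad \mu_2 = +1,\ \delta_2 = 2^{0} = 1, & Y^{(3)} &= 3 - 1 \cdot 5 = -2,  & Z^{(3)} &= 2 + 1 = 3
\end{align*}
```

At every step the quantity $Y^{(i)} + Z^{(i)} X = 13$ is preserved. In this example Mode 1 ends with $Y^{(3)} = -2$, which lies outside $0 \le Y < X$, so one further correction (adding X once, giving remainder 3 and quotient 2) is still required.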

3.2.2 Mode 2 : Post processing of Y and Z

On completion of Mode 1, the value of the remainder, $Y^{(n-m+1)}$, needs to fall in the range:

$$-2^{m-1} \le Y^{(n-m+1)} \le 2^{m-1} - 1 \tag{3.7}$$

This range may not be satisfied for one of the following reasons:

1. The value of $Y^{(n-m+1)}$ is negative.
2. The value of $Y^{(n-m+1)}$ is positive but greater than X.

In either case, the post processing mode is applied so that the inequality below is satisfied and the correct remainder is obtained:

$$0 \le Y^{(n-m+1)} < X \tag{3.8}$$

The value of the quotient, $Z^{(n-m+1)}$, also needs to be updated whenever Y is changed. To bring $Y^{(n-m+1)}$ into the desired range, the following process is applied:

$$Y^{(n-m+1)} = Y^{(n-m+1)} - \mu X \tag{3.9}$$

$$Z^{(n-m+1)} = Z^{(n-m+1)} + \mu \tag{3.10}$$

where µ works in the same way as in the range reduction mode, selecting addition or subtraction in equations (3.9) and (3.10) based on the following condition:

$$\mu = \begin{cases} +1 & \text{when } Y^{(n-m+1)} \ge X \\ -1 & \text{when } Y^{(n-m+1)} < 0 \end{cases} \tag{3.11}$$

To satisfy (3.8), this process is needed only once.

The total number of iterations needed in Algorithm 1 is n − m + 1 if the result of the division is achieved in mode 1. If the result is not achieved in mode 1, a total of n − m + 2 iterations is required.
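The two modes of the fixed shift algorithm can be sketched as a short Python reference model, in the spirit of the numerical simulation mentioned in the contributions. This is a minimal sketch and not the author's Matlab or VHDL code; the function name is hypothetical, and it assumes that X needs exactly m bits (its most significant set bit is bit m − 1), which is what guarantees that n − m + 1 iterations are enough.

```python
def fixed_shift_divide(y: int, x: int, n: int = 32, m: int = 17):
    """Return (quotient, remainder) of y / x with 0 <= remainder < x."""
    if x <= 0 or x.bit_length() != m:
        raise ValueError("assumption: x is positive and needs exactly m bits")
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y is assumed to fit in n-bit two's complement")
    z = 0
    # Mode 1: n - m + 1 iterations; the step size depends only on the index i.
    for i in range(n - m + 1):
        mu = 1 if y >= 0 else -1      # sign selection, as in (3.5)
        delta = 1 << (n - m - i)      # step size, as in (3.6)
        y -= mu * delta * x           # update of Y, as in (3.1)
        z += mu * delta               # update of Z, as in (3.2)
    # Mode 2: a single correction if Y is outside [0, x).
    if y < 0:
        y, z = y + x, z - 1           # mu = -1 in (3.9) and (3.10)
    elif y >= x:
        y, z = y - x, z + 1           # mu = +1 in (3.9) and (3.10)
    return z, y
```

For example, `fixed_shift_divide(13, 5, n=5, m=3)` follows a 5 bit toy configuration, while the default arguments correspond to the 32 bit dividend and 17 bit divisor used in this work.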

26 Division Algorithm 2 : Adaptive Shift Algorithm This algorithm does not perform a fixed number of iterative steps to compute the quotient and the remainder but instead it functions by determining at each iteration, the step size δ from the magnitude of the input data. Since the step size of the shift is not fixed, we call this as adaptive shift. This algorithm requires lesser iterations in comparison to the fixed shift algorithm. Similar to our assumptions for fixed shift algorithm, we consider the divisor X to have m bits and the dividend Y to have n bits, inclusive of sign bit. The adaptive shift division algorithm has the following properties: 1. The required number of iteration is determined by the input data. 2. The sign of the current value of Y determines if the operation needed on the next iteration is addition or subtraction. 3. The value of Z will converge on to the quotient with the opposite operation to the operation of Y in property number The location of the most significant bit value of Y and X determines the value of δ at every iteration by the equation (3.17) below Mode 1 : Range reduction of Y The step size of δ in the adaptive shift algorithm is obtained by the magnitude of the input data and not by the iteration index, as it was obtained in the fixed shift algorithm. The iterations on Y and Z occur as per the following equations: Y (i+1) = Y (i) µ i δ i X, 0 i n m (3.12) Z (i+1) = Z (i) + µ i δ i (3.13) where the initial value of Y and Z are: Y (0) = Y (3.14) Z (0) = 0 (3.15)

27 16 the µ i in equations (3.12) and (3.13) denotes the addition or subtraction operation in a given iterative index value i, the δ i is the step size given respectively by the following equations : µ i = 1 when Y (i) 0 1 when Y (i) < 0 (3.16) δ i = 2 (P y P x), y x (3.17) where P x is the position of the most significant set bit of X, since X is arbitrary and our notation assumes it as a positive value. while P y is defined as: position of most significant 1 when Y > 0 P y = 0 when Y = 0 (3.18) position of most significant 0 when Y < 0 when P y P x, the iterations for the range reduction mode are stopped Mode 2 : Post processing of Y and Z On the completion of Mode 1, the range of Y n m+1 needs to fall in the range: 2 m 1 Y n m+1 2 m 1 1 (3.19) Just like in fixed shift algorithm post processing; the range may not be satisfied because either the value of Y n m+1 is negative or positive but greater than X and thus, this value needs to be processed so that it satisfies the range: 0 Y n m+1 < X (3.20) the value of quotient, Z n m+1, also needs to be updated whenever Y is changed. In order to bring the result Y n m+1 in the desired range, the following process needs

to be applied:

\[ Y^{(n-m+1)} = Y^{(n-m+1)} - \mu X \tag{3.21} \]
\[ Z^{(n-m+1)} = Z^{(n-m+1)} + \mu \tag{3.22} \]

where µ works in the same way as it does for equations (3.12) and (3.13) in the range reduction mode, selecting addition or subtraction based on the following condition:

\[ \mu = \begin{cases} 1 & \text{when } Y^{(n-m+1)} \ge X \\ -1 & \text{when } Y^{(n-m+1)} < 0 \end{cases} \tag{3.23} \]

This process is needed so that the range of equation (3.20) is satisfied. The total number of iterations needed in algorithm 2 is n − m if the result of the division is achieved in mode 1. If the result is not achieved in mode 1, one more iteration is needed in mode 2.

Chapter Summary

In this chapter, we considered two division algorithms: the fixed shift algorithm and the adaptive shift algorithm. The equations and conditions required by the algorithms were explained and represented mathematically. The difference between the two algorithms lies primarily in the step size δ. In the fixed shift algorithm, δ is determined by the iteration index, while in the adaptive shift algorithm δ is governed by the input data, that is, by the difference between the position of the most significant 1 or 0 of Y (depending on the sign of Y) and the position of the most significant 1 of X, since X is assumed to be positive. In both algorithms, the idea is to reduce Y in steps determined by δ until it is positive and less than X in magnitude. When Y fails to fall in the correct range, a post processing step is required to obtain the correct values of Y and Z.
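The two modes of the adaptive shift algorithm can be sketched in software as follows. This is a minimal Python sketch of equations (3.12) to (3.23), not the thesis implementation. The function names `msb_position` and `adaptive_shift_divide` are hypothetical, Python integers stand in for the n-bit registers, the value −1 (which has no 0 bit) is mapped to position 0 by assumption, and the post processing is written as a loop so that the sketch stays correct even when a single correction is not enough.

```python
def msb_position(value: int) -> int:
    """P of equation (3.18): most significant 1 for value > 0,
    most significant 0 for value < 0, and 0 for value == 0."""
    if value > 0:
        return value.bit_length() - 1
    if value < 0:
        # The most significant 0 of a two's complement number is the
        # most significant 1 of its bitwise complement.
        # Assumption: -1 (no 0 bit at all) is mapped to position 0.
        return max((~value).bit_length() - 1, 0)
    return 0


def adaptive_shift_divide(y: int, x: int) -> tuple[int, int]:
    """Return (quotient, remainder) of y / x for a positive divisor x."""
    if x <= 0:
        raise ValueError("the divisor X is assumed to be positive")
    z = 0
    p_x = msb_position(x)
    # Mode 1: range reduction of Y, equations (3.12) and (3.13).
    while (p_y := msb_position(y)) > p_x:
        delta = 1 << (p_y - p_x)   # equation (3.17)
        mu = 1 if y >= 0 else -1   # equation (3.16)
        y -= mu * delta * x
        z += mu * delta
    # Mode 2: post processing, equations (3.21) to (3.23).
    # Assumption: repeated until the range of equation (3.20) holds.
    while y < 0 or y >= x:
        mu = 1 if y >= x else -1
        y -= mu * x
        z += mu
    return z, y
```

For example, `adaptive_shift_divide(1176349, 127773)` returns `(9, 26392)`, and 9 × 127,773 + 26,392 = 1,176,349.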

Chapter 4

Design and Implementation

The hardware realization of the division algorithms requires identifying and designing the individual system blocks of the divider designs and their interconnections. This chapter describes the design methodology.

4.1 Hardware entities for Algorithm 1

The division methodology, equations, conditions and operations explained in chapter 3 are used to determine the hardware entities required for each of the division algorithms. In this section we look at the hardware entities required to implement Algorithm 1. In every iteration the hardware needs to implement:

- One shift.
- One addition and one subtraction (two operations performed by the ALU).

To implement this, Algorithm 1 needs the following entities:

- X, Y and Z registers
- Data multiplexer
- Comparator for Y
- Look-up table
- ALU
- Counter

- Finite state machine

The system block-level diagram of Algorithm 1 is shown in fig. 4.1.

Figure 4.1: Algorithm 1 system level

X, Y and Z Registers

Division involves four operands in total: the divisor, the dividend, the quotient and the remainder. In our implementation only three operands are needed since we

reduce the dividend such that it yields the quotient. Therefore we need to store only three values in the registers: the remainder Y, the quotient Z and the divisor X. The word width of Y is 32 bits, therefore we set the registers of X and Z to a 32-bit word width too. A uniform word width across the three registers simplifies applying arithmetic operations to these operands. Moreover, the registers are required to hold the following values:

- The initial values from the external data lines.
- The intermediate values of Y and Z fed back from the ALU during each iteration.
- The final values of Y and Z once the iterations are complete and the division result is obtained.

To meet these requirements, we need control signals for the register bank to enable reading and writing of the register contents, and we also need the ability to select between the external data and the internal feedback data. The block level view of our register bank is shown in fig. 4.2 below.

Figure 4.2: Registers X, Y and Z in the bank

Data Multiplexer

The multiplexer takes a control signal from the controller to select either the external data lines or the feedback data lines from the ALU. The output data lines of the multiplexer feed the data into the registers. The block level view of the multiplexer is shown in fig. 4.3.

Figure 4.3: Data multiplexer for register bank

Comparator for Y

The comparator that scans Y is an important part of the hardware since it determines whether addition or subtraction is needed on the next values of Y and Z. The operands X and Y are fed into the comparator, which raises a flag when either of the following conditions occurs:

- The value of Y goes negative (f_ypos = 0).
- The value of Y is positive but greater than or equal to X (f_ygtex = 1).

The block level view of the comparator is shown in fig. 4.4 below.

Figure 4.4: Comparator for Y

The Look-up table

The look-up table (LUT) is implemented as a ROM whose contents are the weights of the binary shifts. The value of δ calculated in the two algorithms corresponds to an address in the LUT, whose contents are picked up by the ALU during the computation of the iteration. The LUT block is shown in fig. 4.5.
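The behaviour of the comparator and of the LUT can be illustrated with a short Python sketch. It is only a behavioural model under stated assumptions: the function name `comparator_flags` and the list `LUT` are hypothetical, the flag polarities follow the description of f_ypos and f_ygtex in this chapter, and a depth of 32 entries is assumed from the 32-bit word width.

```python
def comparator_flags(y: int, x: int) -> tuple[int, int]:
    """Return (f_ypos, f_ygtex) for the current Y and the divisor X."""
    f_ypos = 0 if y < 0 else 1                  # 0 signals a negative Y
    f_ygtex = 1 if (y >= 0 and y >= x) else 0   # 1 signals Y >= X
    return f_ypos, f_ygtex


# ROM contents: the address is the shift amount and the stored word is
# the corresponding binary weight 2**address (assumed depth: 32 entries).
LUT = [1 << address for address in range(32)]
```

With these definitions, Y is already in the range 0 ≤ Y < X exactly when the flags are `(1, 0)`, and `LUT[4]` holds the weight 16.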

Figure 4.5: LUT block

The ALU unit

The ALU unit computes equations (3.1), (3.2), (3.9), (3.10), (3.12), (3.13), (3.21) and (3.22) and comprises three ALUs that perform the following:

- Multiplication of δ_i and X.
- Addition or subtraction (based on the sign bit of the current Y) of the product δ_i X and Y_i to obtain Y_{i+1}.
- Addition or subtraction (based on the sign bit of the current Y) of δ_i and Z_i to obtain Z_{i+1}.

The ALU requires a control signal based on the status of the comparator flags to perform the addition or subtraction. The ALU block is shown in fig. 4.6 and the logical operation during an iteration is shown in fig. 4.7.

Figure 4.6: ALU block
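The three operations of the ALU during one iteration can be sketched as a single Python function. The name `alu_step` and the boolean `subtract` (standing in for the add/subtract control signal, whose encoding is not given here) are assumptions made for illustration.

```python
def alu_step(y: int, z: int, x: int, delta: int, subtract: bool) -> tuple[int, int]:
    """One iteration of the data path: returns (Y_next, Z_next)."""
    product = delta * x        # delta is a power of two, so this is a shift of X
    if subtract:               # current Y is non-negative
        return y - product, z + delta
    return y + product, z - delta   # current Y is negative
```

For the operands used later in chapter 5, `alu_step(1176349, 0, 127773, 16, True)` gives `(-868019, 16)`, and `alu_step(-868019, 16, 127773, 8, False)` gives `(154165, 8)`.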

Figure 4.7: Logical operation of ALU during the i-th iteration

Counter

To perform the shift we need a counter. Recall from chapter 3 that in the fixed shift algorithm the step size δ is given by the iteration index and not by the intermediate values of Y. The counter is employed in algorithm 1 to produce the iteration index at each iteration, which selects the corresponding value from the LUT for the ALU. When the iterations are complete, a flag is raised and its status is provided to the controlling unit. The counter block is shown below in fig. 4.8.

Figure 4.8: Counter block diagram

Finite State Machine

The finite state machine (FSM) is the controlling unit of the system; it sends and receives the control signals to and from the other hardware entities in the system. The FSM block is shown in fig. 4.9. The FSM of algorithm 1 is fairly simple and has only four states: initial, iterate, adjust and final.

Figure 4.9: Finite State Machine block

FSM : State transition diagram

In the initial state the FSM is idle and scans for an external start control signal. The initial state serves as the system initialization mode, which is entered upon reset: the counter is cleared and sel (select) is set high so that the external data inputs are selected and ready to be loaded into the registers X, Y and Z. The enable_x and enable_yz signals are set high, which enables writing to the registers, while the done signal is set to zero and add_sub_y is a don't care.

Once start is received, the FSM goes into the iterate state, which implements the range reduction of Y. For this the counter is enabled and sel is set to 0 so that the internal feedback data lines from the ALU are selected for the next iteration. The flag values f_ypos = 0 and f_ygtex = 1 mean that Y is negative or that Y is positive but greater than or equal to X, respectively, and add_sub_y is controlled accordingly. If the value of Y is negative, addition is performed; if it is positive and greater than or equal to X, subtraction is performed. When the counter has reached the pre-determined count, the f_i flag is raised to 1, which signals to the FSM that the iterations of mode 1 are complete.

The FSM then checks the status of the flags. If f_ypos = 0 or f_ygtex = 1, the FSM goes into the adjust state to post process Y and Z. Otherwise (f_ypos = 1 and f_ygtex = 0), Y is in the correct range and the FSM goes directly into the final state. In the final state, writing to registers Y and Z is disabled through enable_yz and the done signal is set to 1, which indicates that the division operation is complete. The state

transition diagram is shown in fig. 4.10.

Figure 4.10: State transition diagram for Algorithm 1

4.2 Hardware entities for Algorithm 2

We know from chapter 3 that the step size δ in the adaptive shift algorithm is obtained from the magnitude of the input data and not from the iteration index, therefore we do not use the counter in the implementation of this algorithm. Instead we need a special hardware unit that finds the most significant 1 or 0 in the operand, depending on whether the number is positive or negative respectively. In our design, we call this unit the delta address generator (DAG). In every iteration the hardware needs to implement the following operations:

- Determine the location of the most significant 1 or 0 of Y_i.
- One shift.

- One addition and one subtraction (two operations performed by the ALU).

To implement this, Algorithm 2 needs the following entities:

- X, Y and Z registers
- Data multiplexer
- Comparator for Y
- Look-up table
- ALU
- Delta (δ) address generator
- Finite state machine

The system block-level diagram of Algorithm 2 is shown in fig. 4.11. We only discuss the DAG and the finite state machine for Algorithm 2 because they are specific to the adaptive shift algorithm, while the rest of the entities are implemented in exactly the same way as in Algorithm 1. One key difference between the two designs is that the counter used in Algorithm 1 is not used in Algorithm 2; instead, the DAG generates the shifts in δ.

Figure 4.11: Algorithm 2 system level

Delta Address Generator

This unit determines the location of the most significant 1 or 0 by finding the positions P_y and P_x, and generates an address from the difference of the two positions. The address selects the corresponding shift value δ from the LUT ROM, which is used by the ALU for the computation in the iteration step. The DAG block level diagram is shown in fig. 4.12 and the overall system block-level diagram is given in fig. 4.13.

Figure 4.12: Delta (δ) Address Generator

Figure 4.13: DAG system level

The DAG is composed of several hardware entities:

- position finder unit.
- multiplexer for flag.
- multiplexer for data lines.
- P_x register.
- number subtractor.

Position finder unit

The purpose of this unit is to find P_y and P_x from Y and X respectively, based on the input flag f_ypos. If f_ypos = 1, the position finder unit detects the most significant 1 bit in Y, and if f_ypos = 0 the unit detects the most significant 0 bit. Since X is assumed to be positive, the unit always looks for the most significant 1 in X. In fig. 4.14 below, the number at the input is either X or Y depending on the data multiplexer input. Similarly, f_mux, the flag forwarded by the flag multiplexer, indicates the sign of the operand at the input of the position finder unit. For Y, f_mux takes its value from the f_ypos output of the comparator; for X, f_mux sends a 1 to the position finder unit, which tells the unit to look for the most significant 1 in X. The position output carries the value of P_y or P_x for Y or X respectively. The flag_out signal is a by-product of the hierarchical implementation of the position finder and is not used in the computation of P_y and P_x or in the division operation.

Figure 4.14: Position finder unit block

Multiplexer for flag

This is a simple multiplexer that enables re-using the same position finder unit for P_x and P_y. It reads the status of the flag f_ypos to decide whether the unit needs to look for 1s or 0s in Y. For P_x, we feed a 1 into the multiplexer input so that the unit always looks for the most significant 1 in X, since X is always positive. Figure 4.15 below highlights this: the sel_x input comes from the FSM, and when it is high the multiplexer sends 1 at the output; otherwise, when it is low (0), it sends f_ypos at the output as f_mux.

Figure 4.15: Multiplexer for flag input

Multiplexer for data lines

This works in exactly the same way as the multiplexer for flag and shares the same control input sel_x. Since we re-use the position finder unit for both P_x and P_y, this multiplexer controls which data lines are selected as the input of the position finder unit, as shown in fig. 4.16 below.

Figure 4.16: Multiplexer for data input

P_x register

To re-use the position finder unit, we need a register that stores P_x for the subtractor. Since this register is only used for P_x, it functions only when sel_x = 1, and it is therefore controlled by the signal enable_reg_Px. Figure 4.17 illustrates this block.

Figure 4.17: The P_x Register

Number subtractor

This hardware entity performs the subtraction P_y − P_x, which is used as the address for the LUT. It also raises the flag f_i when the result of the subtraction is less than or equal to 0, which indicates to the FSM that the range reduction of Y is complete. The delta_address output carries the result of P_y − P_x, while the position_x and position signals represent P_x and P_y respectively. Figure 4.18 illustrates this block.

Figure 4.18: The number subtractor block in DAG
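A behavioural Python sketch of the number subtractor is given below. The function name `number_subtractor` is hypothetical, and clamping the address to 0 when the difference is not positive is an assumption, since the value of delta_address in that case is not specified above.

```python
def number_subtractor(position: int, position_x: int) -> tuple[int, int]:
    """Return (delta_address, f_i) from P_y (position) and P_x (position_x)."""
    difference = position - position_x
    f_i = 1 if difference <= 0 else 0   # range reduction of Y is complete
    delta_address = max(difference, 0)  # assumption: clamp when f_i is raised
    return delta_address, f_i
```

With P_y = 20 and P_x = 16 the sketch returns `(4, 0)`, so the LUT is addressed at 4 and the iterations continue; with P_y = 16 and P_x = 16 it returns `(0, 1)`.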

DAG Implementation

The DAG is the most important hardware unit of Algorithm 2 since it computes the adaptive shift δ for this algorithm. Recall that in algorithm 1 we employed the counter to compute the fixed shifts, which were based on the iteration index i. In the adaptive shift division technique, we instead scan the words Y and X for the bit position of the most significant 1 or 0 and then use the difference between the bit positions to obtain the value of δ. The DAG is implemented in a hierarchical arrangement of five levels, which follows from the 32-bit operand width:

\[ 2^x = 32 \tag{4.1} \]

therefore, x = 5.

Level 1 comprises 16 2-bit scan units, each of which scans two bits at a time, covering the entire word width of Y from bits Y_0 Y_1 up to Y_30 Y_31. Each unit checks the MSB of its pair for a 1 or 0, depending on the sign of Y, otherwise it checks the LSB, and sends the flag and position to the next hierarchical level. Each unit also accepts a starting base value n, from which the position value passed on to the next level is obtained. Figure 4.19 below shows two of these units to illustrate the concept. Tables 4.1 and 4.2 show how the 2-bit scan unit works when Y is positive or negative.

Figure 4.19: 2-bits scan unit 0 in level 1

Level 2 comprises 8 scan blocks, each of which effectively scans 4 bits by taking the two numbers and the two flags from a pair of 2-bit scan units in level 1, from scan block 0 up to scan block 7. If the flag f1 of the 2-bit scan unit 1 is a 1, then the number at the output of scan block 0 is n1, and if the flag f0 is a 1 and f1

is a 0 then the number at the output of scan block 0 is n0.

Table 4.1: Truth table when Y is positive (columns: Y_1, Y_0, n_0, f)

Table 4.2: Truth table when Y is negative (columns: Y_1, Y_0, n_0, f)

We demonstrate this relation between scan block 0 in level 2 and the 2-bit scan units 0 and 1 from level 1 in fig. 4.20. The approach of level 2 carries through in the same manner to levels 3, 4 and 5. The scan block shown in fig. 4.20 is exactly the same for the remaining levels and works on the same principle, accepting two numbers and two flags from the previous level and updating the number output depending on the status of the flags. With each additional level, the number of scan blocks needed is halved; hence we have four scan blocks in level 3, two in level 4 and one in level 5.

Figure 4.20: Hierarchical approach between level 1 and 2

The hierarchical arrangement of all 5 levels is shown in fig. 4.21. The number

n0_L5 obtained at the output of level 5 is the position of the most significant 1 or 0, depending on the sign of the operand. The n at the top of each 2-bit scan unit, referred to in the figure as u, is the base value for each unit. Notice that the whole word width of 32 bits is covered with 16 2-bit scan units, each of which forwards its bit position output (n0...n15) and flag output (f0...f15) to the scan blocks in level 2. Although the scan blocks operate in the same way as the 2-bit scan units, a different notation is used for their number and flag outputs to highlight the difference. The numbers are the base value plus the position of the most significant 1 or 0 in that unit, and the flags determine which scan block holds the most significant 1 or 0. In other words, if the flag from a higher-order scan block is high, the number output of that scan block is sent to the output.
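The hierarchical search described above can be modelled in a few lines of Python. This is a behavioural sketch only, with hypothetical names (`scan_unit`, `scan_block`, `position_finder`). It assumes that a unit which finds no matching bit forwards its base value with a flag of 0, and that a scan block with no flag set forwards the number of its low-order input.

```python
def scan_unit(lsb: int, msb: int, base: int, target: int) -> tuple[int, int]:
    """2-bit scan unit of level 1: returns (number, flag)."""
    if msb == target:
        return base + 1, 1
    if lsb == target:
        return base, 1
    return base, 0               # assumption: no match forwards the base value


def scan_block(low: tuple[int, int], high: tuple[int, int]) -> tuple[int, int]:
    """Scan block of levels 2 to 5: the higher-order input wins if its flag is set."""
    if high[1]:
        return high[0], 1
    if low[1]:
        return low[0], 1
    return low[0], 0             # assumption: no flag forwards the low-order number


def position_finder(word: int, f_mux: int, width: int = 32) -> tuple[int, int]:
    """Return (position, flag_out); f_mux = 1 searches for a 1, f_mux = 0 for a 0."""
    target = 1 if f_mux else 0
    bits = [(word >> k) & 1 for k in range(width)]   # two's complement bits
    level = [scan_unit(bits[k], bits[k + 1], k, target)
             for k in range(0, width, 2)]            # level 1: 16 units
    while len(level) > 1:                            # levels 2 to 5: 8, 4, 2, 1 blocks
        level = [scan_block(level[k], level[k + 1])
                 for k in range(0, len(level), 2)]
    return level[0]
```

For the operands used in chapter 5, `position_finder(1176349, 1)` returns `(20, 1)` and `position_finder(127773, 1)` returns `(16, 1)`.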

Figure 4.21: Hierarchical arrangement of position finder unit

Finite State Machine

The finite state machine (FSM) is the controlling unit of the system; it sends and receives the control signals to and from the other hardware entities in the system. The FSM block is shown below in fig. 4.22. The FSM of algorithm 2 has one more state than that of algorithm 1, for a total of five states: initial, load X (initialize X), iterate, adjust and final.

Figure 4.22: Finite State Machine block

FSM : State transition diagram

In the initial state the FSM is idle and scans for an external start control signal. Once start is received, the FSM goes into the load X state. The load X state is an additional initialization state, along with the initial state, that loads the value of P_x into the P_x register so that the iterations are synchronized with P_y when the iterate state is reached. The initial state serves as the system initialization mode, which is entered upon reset: sel (select) is set high so that the external data inputs are selected and ready to be loaded into the registers X, Y and Z. The enable_x and enable_yz signals are set high, which enables writing to the registers, while the done signal is set to zero and add_sub_y is a don't care. We have two additional control signals, sel_x (select x) and enable_reg_x (enable register x), which are associated with obtaining the value of P_x, the position of X. In the load X state, sel_x and enable_reg_x are disabled so that the DAG fetches the value of Y in order to obtain P_y. The value of δ can be obtained once the DAG performs the operation P_y − P_x,

on the next clock cycle the FSM goes into the iterate state, which implements the range reduction of Y. The flag values f_ypos = 0 and f_ygtex = 1 mean that Y is negative or that Y is positive but greater than or equal to X, respectively, and add_sub_y is controlled accordingly. If the value of Y is negative, addition is performed; if it is positive and greater than or equal to X, subtraction is performed. When P_y − P_x ≤ 0, the f_i flag is raised to 1, which signals to the FSM that the iterations of mode 1 are complete. The state transition diagram is shown in fig. 4.23.

Figure 4.23: State transition diagram for Algorithm 2
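The state transitions of this FSM can be summarised by a small next-state function. The Python sketch below is an illustration under assumptions, not the VHDL controller: the names `State` and `next_state` are hypothetical, Y is treated as out of range when it is negative or not less than X, the adjust state is assumed to move to the final state after one cycle, and the final state is assumed to hold until reset.

```python
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()
    LOAD_X = auto()
    ITERATE = auto()
    ADJUST = auto()
    FINAL = auto()


def next_state(state: State, start: int, f_i: int, f_ypos: int, f_ygtex: int) -> State:
    """Next state of the Algorithm 2 controller for the given inputs."""
    if state is State.INITIAL:
        return State.LOAD_X if start else State.INITIAL
    if state is State.LOAD_X:
        return State.ITERATE              # one cycle to capture P_x
    if state is State.ITERATE:
        if not f_i:
            return State.ITERATE          # range reduction still running
        out_of_range = f_ypos == 0 or f_ygtex == 1
        return State.ADJUST if out_of_range else State.FINAL
    if state is State.ADJUST:
        return State.FINAL                # assumption: single adjust cycle
    return State.FINAL                    # assumption: held until reset
```

Dropping the `LOAD_X` state and raising f_i from the counter instead of the DAG gives the corresponding four-state controller of Algorithm 1.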

The FSM checks the status of the flags. If f_ypos = 0 or f_ygtex = 1, the FSM goes into the adjust state to post process Y and Z. Otherwise (f_ypos = 1 and f_ygtex = 0), Y is in the correct range and the FSM goes directly into the final state. In the final state, writing to registers Y and Z is disabled through enable_yz and the done signal is set to 1, which indicates that the division operation is complete.

4.3 Circuit Implementations

The general description of the system and its blocks has been covered in the previous sections. In this section we look at the Register Transfer Level (RTL) view of the top level block and the overall RTL schematic of the division and allied hardware implementation. In the schematics, the signal paths are shown in red and the data paths in black.

Algorithm 1 : Fixed Shift division algorithm

The top level block and the RTL schematic for the fixed shift division algorithm are shown in fig. 4.24 and fig. 4.25.

Figure 4.24: Top level block of fixed shift division algorithm

Figure 4.25: Fixed shift division algorithm RTL schematic

DAG overall layout

The overall RTL schematic of the DAG used in algorithm 2 is shown in fig. 4.26.

Figure 4.26: Delta address generator RTL schematic

Algorithm 2: Adaptive Shift division algorithm

The schematics for the adaptive shift division algorithm are shown in fig. 4.27 and fig. 4.28.

Figure 4.27: Top level block of adaptive shift division algorithm

Figure 4.28: Adaptive shift division algorithm RTL schematic

Chapter summary

In this chapter, the design overview and methodology were explained for each of the two division algorithms: algorithm 1, the fixed shift division algorithm, and algorithm 2, the adaptive shift division algorithm. The difference in operation and implementation between the two algorithms was explained with reference to the step size δ. In algorithm 1, the iterations are pre-determined and this is achieved through a counter, while in algorithm 2 the shifts in δ are produced by a special hardware unit called the DAG. The DAG is a hierarchical implementation of scan units and scan blocks whose purpose is to calculate the difference P_y − P_x. This difference corresponds to an address in the LUT that holds the shifted binary value of δ.

Chapter 5

Results and Evaluation

The aim of this chapter is to demonstrate that the two division algorithm designs of the previous chapter work according to the algorithms discussed in chapter 3. The implementation phase proved to be very challenging and required a considerable amount of testing, debugging and design revision to ensure proper functionality of the intended hardware. This chapter documents the tests and simulation results used to analyze the functionality and the performance of the two algorithms. Initially division algorithm 1, based on fixed shift, was constructed to achieve a working division algorithm, and then algorithm 2, based on the adaptive shift technique, was constructed to produce the same division result. A comparative analysis of the two algorithms was conducted for their power consumption, device utilization, timing, area-delay product and power-delay product, based on design goals for balanced, timing performance and power optimization. Some related work in the area is also compared in this chapter.

5.1 Numerical Simulation using MATLAB

The two algorithms were first implemented in software using MATLAB in order to verify that they yield the correct quotient and remainder when the dividend is divided by the divisor. The purpose of this numerical simulation was also to obtain a reference benchmark of the numerical values in each iteration, so that comparisons could be drawn during the hardware implementation phase. These simulation numbers were important not only for verification but were also very beneficial during debugging of the hardware description.

Numerical Simulation of Algorithm 1

Table 5.1 shows the numerical simulation in each iteration when Y = 1,176,349 is divided by X = 127,773.

Table 5.1: Iterations for Algorithm 1 (Range Reduction Mode of Y, with columns i, Y_{i+1}, Z_{i+1}, δ_i = 2^{n−m−i} and µ_i δ_i, initialized with Y = 1,176,349; Post Processing Mode of Y and Z: not required, results are obtained)

In the above table, we have the values of Y_{i+1} and Z_{i+1} for each iteration i. Notice that the shifts in δ_i decrease by 1 bit at each iteration; for this reason we call this algorithm the fixed shift division algorithm. In chapter 3, we discussed that in our work Y has 32 bits and X needs to have at least 17 bits, denoted by n and m respectively. The difference n − m = 15 bits gives the number of iterations required for the division operation, therefore we perform a total of 15 iterations. Since the result of the division for the chosen operands Y and X satisfies equation (3.8), the post processing mode of Y and Z is not needed in the fixed shift division algorithm.
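The iterations of this example can be reproduced with a short Python sketch of the fixed shift scheme. It rests on several assumptions that should be checked against equations (3.1) to (3.10): the step size is taken as δ_i = 2^{n−m−i} with the index i running from 1 to n − m (so that 15 iterations are performed and the last step has δ = 1), the sign rule for µ_i is taken to be the same as in equation (3.16), and the function name `fixed_shift_divide` is hypothetical.

```python
def fixed_shift_divide(y: int, x: int, n: int = 32, m: int = 17):
    """Return (quotient, remainder, corrections, trace) for a positive divisor x."""
    z = 0
    trace = []
    # Range reduction of Y with a step size fixed by the iteration index.
    for i in range(1, n - m + 1):          # assumption: i = 1 .. n-m
        delta = 1 << (n - m - i)
        mu = 1 if y >= 0 else -1
        y -= mu * delta * x
        z += mu * delta
        trace.append((i, y, z, delta, mu * delta))
    # Post processing of Y and Z, applied only while Y is out of range.
    corrections = 0
    while y < 0 or y >= x:
        mu = 1 if y >= x else -1
        y -= mu * x
        z += mu
        corrections += 1
    return z, y, corrections, trace
```

For Y = 1,176,349 and X = 127,773 the sketch performs 15 iterations and returns quotient 9 and remainder 26,392 with no post processing step.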

Numerical Simulation of Algorithm 2

Table 5.2 shows the numerical simulation in each iteration when Y = 1,176,349 is divided by X = 127,773.

Table 5.2: Iterations for Algorithm 2 (Range Reduction Mode of Y, with columns i, Y_{i+1}, Z_{i+1}, δ_i = 2^{P_y − P_x} and µ_i δ_i, initialized with Y = 1,176,349, followed by the Post Processing Mode of Y and Z)

The table lists the values of Y_{i+1} and Z_{i+1} for each iteration i. Notice that the shifts in δ_i are not decremental as in algorithm 1; for this reason we call this algorithm the adaptive shift division algorithm. As discussed in chapter 3, the iterations of the adaptive shift division algorithm are governed by P_y − P_x. The most significant 1 in Y is at bit position 20, counting from bit position 0 at the least significant bit of Y, while the most significant 1 in X is at bit position 16, counting likewise from the least significant bit of X. The difference between the two bit positions is 20 − 16 = 4, therefore the algorithm takes a total of 4 iterations to produce the result. The iterations in the range reduction mode of Y end when P_y ≤ P_x; at this point the value of Y did not satisfy equation (3.20), therefore the algorithm goes into the post processing mode of Y and Z to obtain the correct result. Table 5.2 shows that far fewer iterations are required to obtain the division result than in Table 5.1.

Hardware Simulation

The two algorithms were designed, synthesized and implemented in VHDL using Xilinx ISE Project Navigator. The implemented top level and overall RTL schematics of both division algorithms and the allied hardware modules were presented in chapter 4. VHDL test benches were created and simulated to verify that the hardware performs division correctly.

VHDL Simulation of Algorithm 1

Figure 5.1 shows the screen shots of the test bench output.

Figure 5.1: All iterations for Algorithm 1

Observing Y and Z in fig. 5.1, once start becomes 1 the iterations proceed on every rising edge until the quotient and remainder of the division are obtained. For clarity we break down the figure and examine zoomed views of the iterations in fig. 5.2 to fig. 5.5, in order to verify the iteration data against the numerical simulation data.

Figure 5.2: Iterations 0 to 2 for Algorithm 1

Figure 5.3: Iterations 3 to 7 for Algorithm 1

Figure 5.4: Iterations 8 to 11 for Algorithm 1

Figure 5.5: Iterations 12 to 14 for Algorithm 1

From the test bench screen shots above, it can be seen that the iteration data from the VHDL simulation is consistent with the data obtained in the numerical simulation of Algorithm 1.

VHDL Simulation of Algorithm 2

We now assess the functionality of division algorithm 2, the adaptive shift division algorithm. As in section 5.1, it was observed that the adaptive shift technique considerably reduces the number of iterations compared to the fixed shift technique; this is verified by observing Y and Z in fig. 5.7.

Figure 5.6: All iterations for Algorithm 2

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers Implementation

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 3 Arithmetic for Computers Implementation Today Review representations (252/352 recap) Floating point Addition: Ripple

More information

ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI

ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI 2 BALA KRISHNA.KONDA M.Tech, Assistant Professor 1,2 Eluru College Of Engineering And

More information

CS6303 COMPUTER ARCHITECTURE LESSION NOTES UNIT II ARITHMETIC OPERATIONS ALU In computing an arithmetic logic unit (ALU) is a digital circuit that performs arithmetic and logical operations. The ALU is

More information

HIGH SPEED SINGLE PRECISION FLOATING POINT UNIT IMPLEMENTATION USING VERILOG

HIGH SPEED SINGLE PRECISION FLOATING POINT UNIT IMPLEMENTATION USING VERILOG HIGH SPEED SINGLE PRECISION FLOATING POINT UNIT IMPLEMENTATION USING VERILOG 1 C.RAMI REDDY, 2 O.HOMA KESAV, 3 A.MAHESWARA REDDY 1 PG Scholar, Dept of ECE, AITS, Kadapa, AP-INDIA. 2 Asst Prof, Dept of

More information

COMPUTER ARITHMETIC (Part 1)

COMPUTER ARITHMETIC (Part 1) Eastern Mediterranean University School of Computing and Technology ITEC255 Computer Organization & Architecture COMPUTER ARITHMETIC (Part 1) Introduction The two principal concerns for computer arithmetic

More information

Chapter 10 - Computer Arithmetic

Chapter 10 - Computer Arithmetic Chapter 10 - Computer Arithmetic Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 10 - Computer Arithmetic 1 / 126 1 Motivation 2 Arithmetic and Logic Unit 3 Integer representation

More information

Binary Adders. Ripple-Carry Adder

Binary Adders. Ripple-Carry Adder Ripple-Carry Adder Binary Adders x n y n x y x y c n FA c n - c 2 FA c FA c s n MSB position Longest delay (Critical-path delay): d c(n) = n d carry = 2n gate delays d s(n-) = (n-) d carry +d sum = 2n

More information

Divide: Paper & Pencil

Divide: Paper & Pencil Divide: Paper & Pencil 1001 Quotient Divisor 1000 1001010 Dividend -1000 10 101 1010 1000 10 Remainder See how big a number can be subtracted, creating quotient bit on each step Binary => 1 * divisor or

More information

Module 2: Computer Arithmetic

Module 2: Computer Arithmetic Module 2: Computer Arithmetic 1 B O O K : C O M P U T E R O R G A N I Z A T I O N A N D D E S I G N, 3 E D, D A V I D L. P A T T E R S O N A N D J O H N L. H A N N E S S Y, M O R G A N K A U F M A N N

More information

DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS

DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS C H A P T E R 6 DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS OUTLINE 6- Binary Addition 6-2 Representing Signed Numbers 6-3 Addition in the 2 s- Complement System 6-4 Subtraction in the 2 s- Complement

More information

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B)

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B) Computer Arithmetic Data is manipulated by using the arithmetic instructions in digital computers. Data is manipulated to produce results necessary to give solution for the computation problems. The Addition,

More information

CS 5803 Introduction to High Performance Computer Architecture: Arithmetic Logic Unit. A.R. Hurson 323 CS Building, Missouri S&T

CS 5803 Introduction to High Performance Computer Architecture: Arithmetic Logic Unit. A.R. Hurson 323 CS Building, Missouri S&T CS 5803 Introduction to High Performance Computer Architecture: Arithmetic Logic Unit A.R. Hurson 323 CS Building, Missouri S&T hurson@mst.edu 1 Outline Motivation Design of a simple ALU How to design

More information

FPGA Implementation of the Complex Division in Digital Predistortion Linearizer

FPGA Implementation of the Complex Division in Digital Predistortion Linearizer Australian Journal of Basic and Applied Sciences, 4(10): 5028-5037, 2010 ISSN 1991-8178 FPGA Implementation of the Complex Division in Digital Predistortion Linearizer Somayeh Mohammady, Pooria Varahram,


Number Systems and Computer Arithmetic

Number Systems and Computer Arithmetic Number Systems and Computer Arithmetic Counting to four billion two fingers at a time What do all those bits mean now? bits (011011011100010...01) instruction R-format I-format... integer data number text


PESIT Bangalore South Campus

PESIT Bangalore South Campus INTERNAL ASSESSMENT TEST III Date : 21/11/2017 Max Marks : 40 Subject & Code : Computer Organization (15CS34) Semester : III (A & B) Name of the faculty: Mrs. Sharmila Banu Time : 11.30 am 1.00 pm Answer


Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation. ISSN 2319-8885 Vol.03,Issue.32 October-2014, Pages:6436-6440 www.ijsetr.com Design and Modeling of Arithmetic and Logical Unit with the Platform of VLSI N. AMRUTHA BINDU 1, M. SAILAJA 2 1 Dept of ECE,


(+A) + ( B) + (A B) (B A) + (A B) ( A) + (+ B) (A B) + (B A) + (A B) (+ A) (+ B) + (A - B) (B A) + (A B) ( A) ( B) (A B) + (B A) + (A B)

COMPUTER ARITHMETIC 1. Addition and Subtraction of Unsigned Numbers: The direct method of subtraction taught in elementary schools uses the borrow concept. In this method we borrow a 1 from a higher significant


An instruction set processor consist of two important units: Data Processing Unit (DataPath) Program Control Unit

DataPath Design: An instruction set processor consists of two important units: Data Processing Unit (DataPath) and Program Control Unit. Add & subtract instructions for fixed binary numbers are found in the


Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications , Vol 7(4S), 34 39, April 204 ISSN (Print): 0974-6846 ISSN (Online) : 0974-5645 Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications B. Vignesh *, K. P. Sridhar


COMPUTER ORGANIZATION AND ARCHITECTURE

COMPUTER ORGANIZATION AND ARCHITECTURE for COMPUTER SCIENCE. SYLLABUS: Machine instructions and addressing modes, ALU and data-path, CPU control design, Memory interface,


CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Arithmetic Unit 10122011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Fixed Point Arithmetic Addition/Subtraction


Vendor Agnostic, High Performance, Double Precision Floating Point Division for FPGAs

Vendor Agnostic, High Performance, Double Precision Floating Point Division for FPGAs Vendor Agnostic, High Performance, Double Precision Floating Point Division for FPGAs Xin Fang and Miriam Leeser Dept of Electrical and Computer Eng Northeastern University Boston, Massachusetts 02115


IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers International Journal of Research in Computer Science ISSN 2249-8257 Volume 1 Issue 1 (2011) pp. 1-7 White Globe Publications www.ijorcs.org IEEE-754 compliant Algorithms for Fast Multiplication of Double


DIGITAL CIRCUITS EE

Digital Circuits: Basic Scope and Introduction. This book covers theory, solved examples and previous-year GATE questions for the following topics: Number system, Boolean algebra,


Chapter 5 : Computer Arithmetic

Chapter 5: Computer Arithmetic. Integer Representation (fixed-point representation): An eight-bit word can represent the numbers from zero to 255, up to 11111111 = 255. In general if an n-bit


Integer Multiplication. Back to Arithmetic. Integer Multiplication. Example (Fig 4.25)

Integer Multiplication. Back to Arithmetic. Integer Multiplication. Example (Fig 4.25) Back to Arithmetic Before, we did Representation of integers Addition/Subtraction Logical ops Forecast Integer Multiplication Integer Division Floating-point Numbers Floating-point Addition/Multiplication


Elec 326: Digital Logic Design

Elec 326: Digital Logic Design Elec 326: Digital Logic Design Project Requirements Fall 2005 For this project you will design and test a three-digit binary-coded-decimal (BCD) adder capable of adding positive and negative BCD numbers.


CAD4 The ALU Fall 2009 Assignment. Description

CAD4 The ALU Fall 2009 Assignment: To design a 16-bit ALU which will be used in the datapath of the microprocessor. This ALU must support two's complement arithmetic and the instructions in the baseline


By, Ajinkya Karande Adarsh Yoga

By Ajinkya Karande, Adarsh Yoga. Introduction: Early computer designers believed saving computer time and memory were more important than programmer time. Bug in the divide algorithm used in Intel chips.


DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR UNIT I Digital Systems: Binary Numbers, Octal, Hexa Decimal and other base numbers, Number base conversions, complements, signed binary numbers, Floating point number representation, binary codes, error


UNIT - I: COMPUTER ARITHMETIC, REGISTER TRANSFER LANGUAGE & MICROOPERATIONS

UNIT - I: COMPUTER ARITHMETIC, REGISTER TRANSFER LANGUAGE & MICROOPERATIONS UNIT - I: COMPUTER ARITHMETIC, REGISTER TRANSFER LANGUAGE & MICROOPERATIONS (09 periods) Computer Arithmetic: Data Representation, Fixed Point Representation, Floating Point Representation, Addition and


IJRASET 2015: All Rights are Reserved

IJRASET 2015: All Rights are Reserved Design High Speed Doubles Precision Floating Point Unit Using Verilog V.Venkaiah 1, K.Subramanyam 2, M.Tech Department of Electronics and communication Engineering Audisankara College of Engineering &


Computer Architecture and Organization

Computer Architecture and Organization, Miles Murdocca and Vincent Heuring. Chapter 3: Arithmetic. Chapter Contents: 3.1 Fixed Point Addition and Subtraction


Contents. Chapter 9 Datapaths Page 1 of 28

Chapter 9 Datapaths. Contents: 9 Datapaths; 9.1 General Datapath; 9.2 Using a General Datapath; 9.3 Timing Issues; 9.4 A More Complex General Datapath; 9.5 VHDL for


Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis


Computer Architecture

Computer Architecture Computer Architecture Lecture 1: Digital logic circuits The digital computer is a digital system that performs various computational tasks. Digital computers use the binary number system, which has two


COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital

COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital hardware modules that accomplish a specific information-processing task. Digital systems vary in


HANSABA COLLEGE OF ENGINEERING & TECHNOLOGY (098) SUBJECT: DIGITAL ELECTRONICS ( ) Assignment

HANSABA COLLEGE OF ENGINEERING & TECHNOLOGY (098) SUBJECT: DIGITAL ELECTRONICS ( ) Assignment Assignment 1. What is multiplexer? With logic circuit and function table explain the working of 4 to 1 line multiplexer. 2. Implement following Boolean function using 8: 1 multiplexer. F(A,B,C,D) = (2,3,5,7,8,9,12,13,14,15)


Chapter 2: Number Systems

Chapter 2: Number Systems Chapter 2: Number Systems Logic circuits are used to generate and transmit 1s and 0s to compute and convey information. This two-valued number system is called binary. As presented earlier, there are many


Available online at ScienceDirect. Procedia Technology 24 (2016 )

Available online at www.sciencedirect.com. ScienceDirect, Procedia Technology 24 (2016) 1120-1126. International Conference on Emerging Trends in Engineering, Science and Technology (ICETEST - 2015) FPGA


FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE 754-2008 Standard M. Shyamsi, M. I. Ibrahimy, S. M. A. Motakabber and M. R. Ahsan Dept. of Electrical and Computer Engineering


Arithmetic Processing

Arithmetic Processing CS/EE 5830/6830 VLSI ARCHITECTURE Chapter 1 Basic Number Representations and Arithmetic Algorithms Arithmetic Processing AP = (operands, operation, results, conditions, singularities) Operands are: Set


FPGA Matrix Multiplier

FPGA Matrix Multiplier FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri


A Single/Double Precision Floating-Point Reciprocal Unit Design for Multimedia Applications

A Single/Double Precision Floating-Point Reciprocal Unit Design for Multimedia Applications A Single/Double Precision Floating-Point Reciprocal Unit Design for Multimedia Applications Metin Mete Özbilen 1 and Mustafa Gök 2 1 Mersin University, Engineering Faculty, Department of Computer Science,


Least Common Multiple (LCM)

Least Common Multiple (LCM) Least Common Multiple (LCM) Task: Implement an LCM algorithm that is able to handle any combination of 8-bit (sign bit included) numbers. Use two's complement format to represent negative values. Provide


At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state

Chapter 4. Signals at the $i$th stage: inputs $x_i$, $y_i$ and carry-in $c_i$; outputs sum $s_i$ and carry-out $c_{i+1}$ to the $(i+1)$st stage. $s_i = \bar{x}_i \bar{y}_i c_i + \bar{x}_i y_i \bar{c}_i + x_i \bar{y}_i \bar{c}_i + x_i y_i c_i = x_i \oplus y_i \oplus c_i$


High Speed Special Function Unit for Graphics Processing Unit

High Speed Special Function Unit for Graphics Processing Unit High Speed Special Function Unit for Graphics Processing Unit Abd-Elrahman G. Qoutb 1, Abdullah M. El-Gunidy 1, Mohammed F. Tolba 1, and Magdy A. El-Moursy 2 1 Electrical Engineering Department, Fayoum


CO212 Lecture 10: Arithmetic & Logical Unit

CO212 Lecture 10: Arithmetic & Logical Unit CO212 Lecture 10: Arithmetic & Logical Unit Shobhanjana Kalita, Dept. of CSE, Tezpur University Slides courtesy: Computer Architecture and Organization, 9 th Ed, W. Stallings Integer Representation For


a, b sum module add32 sum vector bus sum[31:0] sum[0] sum[31]. sum[7:0] sum sum overflow module add32_carry assign

a, b sum module add32 sum vector bus sum[31:0] sum[0] sum[31]. sum[7:0] sum sum overflow module add32_carry assign I hope you have completed Part 1 of the Experiment. This lecture leads you to Part 2 of the experiment and hopefully helps you with your progress to Part 2. It covers a number of topics: 1. How do we specify


Performance Analysis of CORDIC Architectures Targeted by FPGA Devices

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Performance Analysis of CORDIC Architectures Targeted by FPGA Devices Guddeti Nagarjuna Reddy 1, R.Jayalakshmi 2, Dr.K.Umapathy


Integer Multiplication and Division

Integer Multiplication and Division Integer Multiplication and Division for ENCM 369: Computer Organization Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 208 Integer


Finite State Machine with Datapath

Finite State Machine with Datapath Finite State Machine with Datapath Task: Implement a GCD algorithm that is able to handle any combination of -bit (sign bit included) numbers. Use two's complement format to represent negative values.


Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm 1 A.Malashri, 2 C.Paramasivam 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,


Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering A Review: Design of 16 bit Arithmetic and Logical unit using Vivado 14.7 and Implementation on Basys 3 FPGA Board Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor,


An FPGA based Implementation of Floating-point Multiplier

An FPGA based Implementation of Floating-point Multiplier An FPGA based Implementation of Floating-point Multiplier L. Rajesh, Prashant.V. Joshi and Dr.S.S. Manvi Abstract In this paper we describe the parameterization, implementation and evaluation of floating-point


EECS150 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3

EECS150 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3 EECS15 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3 October 8, 22 John Wawrzynek Fall 22 EECS15 - Lec13-cla3 Page 1 Multiplication a 3 a 2 a 1 a Multiplicand b 3 b 2 b


Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider

Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider Tamkang Journal of Science and Engineering, Vol. 3, No., pp. 29-255 (2000) 29 Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider Jen-Shiun Chiang, Hung-Da Chung and Min-Show


VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y

Arithmetic: A basic operation in all digital computers is the addition and subtraction of two numbers. They are implemented, along with the basic logic functions such as AND, OR, NOT and EX-OR, in the ALU subsystem


TSEA44 - Design for FPGAs

TSEA44 - Design for FPGAs 2015-11-24 Now for something else... Adapting designs to FPGAs Why? Clock frequency Area Power Target FPGA architecture: Xilinx FPGAs with 4 input LUTs (such as Virtex-II) Determining the maximum frequency


Midterm Project Design of 4 Bit ALU Fall 2001

Midterm Project: Design of 4 Bit ALU, Fall 2001. By K. Narayanan, George Washington University, E.C.E Department. Contents: Midterm Project; Design of 4 Bit ALU; Abstract; 1.2 Specification:...


COMP 303 Computer Architecture Lecture 6

COMP 303 Computer Architecture Lecture 6 COMP 303 Computer Architecture Lecture 6 MULTIPLY (unsigned) Paper and pencil example (unsigned): Multiplicand 1000 = 8 Multiplier x 1001 = 9 1000 0000 0000 1000 Product 01001000 = 72 n bits x n bits =


Chapter 4 Arithmetic Functions

Chapter 4 Arithmetic Functions Logic and Computer Design Fundamentals Chapter 4 Arithmetic Functions Charles Kime & Thomas Kaminski 2008 Pearson Education, Inc. (Hyperlinks are active in View Show mode) Overview Iterative combinational


McGill University Faculty of Engineering FINAL EXAMINATION Fall 2007 (DEC 2007)

McGill University Faculty of Engineering FINAL EXAMINATION Fall 2007 (DEC 2007) VERSION 1 Examiner: Professor T.Arbel Signature: INTRODUCTION TO COMPUTER ENGINEERING ECSE-221A 6 December 2007, 1400-1700


Embedded Systems Design Prof. Anupam Basu Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Embedded Systems Design Prof. Anupam Basu Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Embedded Systems Design Prof. Anupam Basu Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 05 Optimization Issues Now I see, that is not been seen there;


Timing for Ripple Carry Adder

Timing for Ripple Carry Adder. Look Ahead Method. Look-Ahead, bits wide. Multiplication: Simple Gradeschool Algorithm for 32 Bits (6 Bit Result): Multiplier, Multiplicand, AND gates 32


Analysis of Different Multiplication Algorithms & FPGA Implementation

IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 4, Issue 2, Ver. I (Mar-Apr. 2014), PP 29-35, e-ISSN: 2319-4200, p-ISSN: 2319-4197. Analysis of Different Multiplication Algorithms & FPGA


Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers

Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers Subash Chandar G (g-chandar1@ti.com), Vaideeswaran S (vaidee@ti.com) DSP Design, Texas Instruments India


FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is


XPLANATION: FPGA 101. The Basics of. by Adam Taylor Principal Engineer EADS Astrium FPGA Mathematics

The Basics of FPGA Mathematics, by Adam Taylor, Principal Engineer, EADS Astrium, aptaylor@theiet.org. Xcell Journal, Third Quarter 2012. One of the main advantages of the FPGA is its ability to perform mathematical


CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION. 1.1 Advanced Encryption Standard (AES): The Rijndael algorithm is a symmetric block cipher that can process data blocks of 128 bits, using cipher keys with lengths of 128, 192, and 256


1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1.1 Introduction Given that digital logic and memory devices are based on two electrical states (on and off), it is natural to use a number


CHAPTER 1 Numerical Representation

CHAPTER 1 Numerical Representation CHAPTER 1 Numerical Representation To process a signal digitally, it must be represented in a digital format. This point may seem obvious, but it turns out that there are a number of different ways to


A High-Speed FPGA Implementation of an RSD- Based ECC Processor

A High-Speed FPGA Implementation of an RSD- Based ECC Processor A High-Speed FPGA Implementation of an RSD- Based ECC Processor Abstract: In this paper, an exportable application-specific instruction-set elliptic curve cryptography processor based on redundant signed


ECE 465, Spring 2010, Instructor: Prof. Shantanu Dutt. Project 2 : Due Fri, April 23, midnight.

ECE 465, Spring 2010, Instructor: Prof. Shantanu Dutt. Project 2 : Due Fri, April 23, midnight. ECE 465, Spring 2010, Instructor: Prof Shantanu Dutt Project 2 : Due Fri, April 23, midnight 1 Goal The goals of this project are: 1 To design a sequential circuit that meets a given speed requirement


Chapter 1 Review of Number Systems

Chapter 1 Review of Number Systems. 1.1 Introduction: Before the inception of digital computers, the only number system in common use was the decimal number system, which has a total of 10 digits


UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 4-A Floating-Point Arithmetic Israel Koren ECE666/Koren Part.4a.1 Preliminaries - Representation


Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control,

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control, UNIT - 7 Basic Processing Unit: Some Fundamental Concepts, Execution of a Complete Instruction, Multiple Bus Organization, Hard-wired Control, Microprogrammed Control Page 178 UNIT - 7 BASIC PROCESSING


Divisibility Rules and Their Explanations

Divisibility Rules and Their Explanations Divisibility Rules and Their Explanations Increase Your Number Sense These divisibility rules apply to determining the divisibility of a positive integer (1, 2, 3, ) by another positive integer or 0 (although


Module 5 - CPU Design

Module 5 - CPU Design, Lecture 1 - Introduction to CPU. The operations that the CPU must perform are: Fetch Instruction: The CPU reads an instruction from memory. Interpret Instruction: The instruction


EE260: Logic Design, Spring n Integer multiplication. n Booth s algorithm. n Integer division. n Restoring, non-restoring

EE 260: Introduction to Digital Design, Arithmetic II. Yao Zheng, Department of Electrical Engineering, University of Hawaiʻi at Mānoa. Overview: Integer multiplication; Booth's algorithm; Integer division


Implementation of CORDIC Algorithms in FPGA

Implementation of CORDIC Algorithms in FPGA Summer Project Report Implementation of CORDIC Algorithms in FPGA Sidharth Thomas Suyash Mahar under the guidance of Dr. Bishnu Prasad Das May 2017 Department of Electronics and Communication Engineering


EC2303-COMPUTER ARCHITECTURE AND ORGANIZATION

EC2303-COMPUTER ARCHITECTURE AND ORGANIZATION QUESTION BANK UNIT-II 1. What are the disadvantages in using a ripple carry adder? (NOV/DEC 2006) The main disadvantage of using a ripple carry adder is time delay.


Math 340 Fall 2014, Victor Matveev. Binary system, round-off errors, loss of significance, and double precision accuracy.

Math 340 Fall 2014, Victor Matveev. Binary system, round-off errors, loss of significance, and double precision accuracy. Math 340 Fall 2014, Victor Matveev Binary system, round-off errors, loss of significance, and double precision accuracy. 1. Bits and the binary number system A bit is one digit in a binary representation


structure syntax different levels of abstraction

structure syntax different levels of abstraction This and the next lectures are about Verilog HDL, which, together with another language VHDL, are the most popular hardware languages used in industry. Verilog is only a tool; this course is about digital


Here is a list of lecture objectives. They are provided for you to reflect on what you are supposed to learn, rather than an introduction to this

Here is a list of lecture objectives. They are provided for you to reflect on what you are supposed to learn, rather than an introduction to this This and the next lectures are about Verilog HDL, which, together with another language VHDL, are the most popular hardware languages used in industry. Verilog is only a tool; this course is about digital


Number Systems CHAPTER Positional Number Systems

CHAPTER 2 Number Systems. Inside computers, information is encoded as patterns of bits because it is easy to construct electronic circuits that exhibit the two alternative states, 0 and 1. The meaning of


*Instruction Matters: Purdue Academic Course Transformation. Introduction to Digital System Design. Module 4 Arithmetic and Computer Logic Circuits

*Instruction Matters: Purdue Academic Course Transformation. Introduction to Digital System Design. Module 4 Arithmetic and Computer Logic Circuits Purdue IM:PACT* Fall 2018 Edition *Instruction Matters: Purdue Academic Course Transformation Introduction to Digital System Design Module 4 Arithmetic and Computer Logic Circuits Glossary of Common Terms


Chapter 3: part 3 Binary Subtraction

Chapter 3: part 3 Binary Subtraction Chapter 3: part 3 Binary Subtraction Iterative combinational circuits Binary adders Half and full adders Ripple carry and carry lookahead adders Binary subtraction Binary adder-subtractors Signed binary


ECE260: Fundamentals of Computer Engineering

ECE260: Fundamentals of Computer Engineering Arithmetic for Computers James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania Based on Computer Organization and Design, 5th Edition by Patterson & Hennessy Arithmetic for


Chapter 4 Section 2 Operations on Decimals

Chapter 4 Section 2 Operations on Decimals Chapter 4 Section 2 Operations on Decimals Addition and subtraction of decimals To add decimals, write the numbers so that the decimal points are on a vertical line. Add as you would with whole numbers.


DESIGN PROJECT TOY RPN CALCULATOR

DESIGN PROJECT TOY RPN CALCULATOR April 8, 1998 DESIGN PROJECT TOY RPN CALCULATOR ECE/Comp Sci 352 Digital System Fundamentals Semester II 1997-98 Due Tuesday, April 28, 1998; 10% of course grade. This project is to be submitted and will


HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol. 2, Issue 1, Feb 2015, 01-07 IIST HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC


Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. Binary Arithmetic Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. MIT 6.004 Fall 2018 Reminder: Encoding Positive Integers Bit i in a binary representation (in right-to-left order)


Iterative Division Techniques COMPUTER ARITHMETIC: Lecture Notes # 6. University of Illinois at Chicago

Iterative Division Techniques COMPUTER ARITHMETIC: Lecture Notes # 6. University of Illinois at Chicago 1 ECE 368 CAD Based Logic Design Instructor: Shantanu Dutt Department of Electrical and Computer Engineering University of Illinois at Chicago Lecture Notes # 6 COMPUTER ARITHMETIC: Iterative Division


International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering An Efficient Implementation of Double Precision Floating Point Multiplier Using Booth Algorithm Pallavi Ramteke 1, Dr. N. N. Mhala 2, Prof. P. R. Lakhe M.Tech [IV Sem], Dept. of Comm. Engg., S.D.C.E, [Selukate],


EE 486 Winter The role of arithmetic. EE 486 : lecture 1, the integers. SIA Roadmap - 2. SIA Roadmap - 1

EE 486 Winter The role of arithmetic. EE 486 : lecture 1, the integers. SIA Roadmap - 2. SIA Roadmap - 1 EE 486 Winter 2-3 The role of arithmetic EE 486 : lecture, the integers M. J. Flynn With increasing circuit density available with sub micron feature sizes, there s a corresponding broader spectrum of


Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010 Digital Logic & Computer Design CS 434 Professor Dan Moldovan Spring 2 Copyright 27 Elsevier 5- Chapter 5 :: Digital Building Blocks Digital Design and Computer Architecture David Money Harris and Sarah


CHW 261: Logic Design

CHW 261: Logic Design CHW 261: Logic Design Instructors: Prof. Hala Zayed Dr. Ahmed Shalaby http://www.bu.edu.eg/staff/halazayed14 http://bu.edu.eg/staff/ahmedshalaby14# Slide 1 Slide 2 Slide 3 Digital Fundamentals CHAPTER


MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

MASSACHUSETTS INSTITUTE OF TECHNOLOGY, Department of Electrical Engineering and Computer Sciences. Introductory Digital Systems Lab (6.111), Quiz - Spring 2004. Prof. Anantha Chandrakasan. Student Name: Problem


361 div.1. Computer Architecture EECS 361 Lecture 7: ALU Design : Division

361 div.1 Computer Architecture EECS 361 Lecture 7: ALU Design: Division. Outline of Today's Lecture: Introduction to Today's Lecture; Divide; Questions and Administrative Matters; Introduction to Single cycle
