Design of Two Different 128-bit Adders. Project Report


Design of Two Different 128-bit Adders
Project Report
By Vladislav Muravin
Concordia ID:
COEN6501: Digital Design & Synthesis
Offered by Professor Asim Al-Khalili
Concordia University
December 2004

Table of Contents

1 Introduction
1.1 Report Organization
1.2 Common Adder Structures
1.2.1 1-bit Full Adder
1.2.2 N-bit Ripple Carry Adder
1.2.3 Carry Skip Adder
1.2.4 Carry Select Adder
1.2.5 Carry Look Ahead Adder
1.2.6 Prefix Adders
1.2.7 Sklansky Prefix Adder
1.2.8 Kogge-Stone Prefix Adder
2 Design Flow & Implementation
2.1 Micro Architecture
2.1.1 Top Entity
2.1.2 Sub-Block Partitioning
2.1.3 "Carry Propagate" and "Carry Generate" Block (pg_gen)
2.1.4 Carry Generation Block
2.1.4.1 Carry Generation Block Sklansky Prefix Adder (cg_gen_sklansky)
2.1.4.2 Carry Generation Block Kogge-Stone Prefix Adder (cg_gen_kogge_stone)
2.1.5 Sum Bits Generation Block (sb_gen)
2.2 RTL Coding
2.3 Verification Plan
2.4 Synthesis, Place and Route
3 Results
3.1 Simulation Results
3.1.1 Initial Test Cases
3.1.2 General Test Case
3.2 Synthesis Results
3.2.1 Multiplexing I/O
3.2.1.1 Multiplexed Inputs
3.2.1.2 Multiplexed Outputs
3.2.1.3 Multiplexed Inputs and Outputs
3.2.2 Changing Target Device
4 Design Enhancement: Pipelining
5 Summary and Conclusions
6 References

Table of Figures

Figure 1: 1-bit Full Adder
Figure 2: N-bit Carry Propagate Adder
Figure 3: Carry Skip Concept
Figure 4: Carry Select Concept
Figure 5: Sklansky Prefix Tree
Figure 6: Kogge-Stone Prefix Tree
Figure 7: Design Flow
Figure 8: Top Level View
Figure 9: full_adder Sub-Block Partitioning
Figure 10: "Carry Generate" and "Carry Propagate" Block Implementation
Figure 11: Sum Bits Generation Block Implementation
Figure 12: Test Bench & Verification Plan
Figure 13: Initial Test Case Simulation Results
Figure 14: General Test Case - Full Zoom
Figure 15: General Test Case - Example 1
Figure 16: General Test Case - Example 2
Figure 17: Forward Registers Balancing (Pipelining)
Figure 18: Backward Registers Balancing (Pipelining)

Table 1: Signal Description
Table 2: Synthesis Results (No Placement and Routing): XC2V500 -FG456-4 Device
Table 3: Synthesis Results: XC2V1000 FF896-4 Device
Table 4: Placement and Routing Results FF896-4 Device
Table 5: Placement and Routing Results of Pipelined Sklansky Adder
Table 6: Placement and Routing Results of Pipelined Kogge-Stone Adder

1 Introduction

The objective of this project is to design two different 128-bit adders by going through the full design cycle, from initial concept to structural RTL coding, simulation and synthesis, for the Xilinx Virtex-2 FPGA family, device XC2V500.

1.1 Report Organization

The report is organized into several sections. Section 1 introduces common principles of adder design, briefly describing the Carry Select, Carry Skip and Carry Look-Ahead principles and then elaborating on parallel-prefix adders, two of which, the Sklansky prefix adder and the Kogge-Stone prefix adder, are implemented in this project. Section 2 describes the design flow, the micro architecture and the verification plan of the designs. Section 3 presents the results, and section 4 describes a design enhancement by pipelining. Finally, sections 5 and 6 close the report with the conclusions and references, respectively.

1.2 Common Adder Structures

1.2.1 1-bit Full Adder

A 1-bit full adder is shown in Figure 1. The equations describing its outputs are:

$$S = A \oplus B \oplus C_{in}$$

$$C_{out} = A \cdot B + (A \oplus B) \cdot C_{in}$$

Figure 1: 1-bit Full Adder

1.2.2 N-bit Ripple Carry Adder

An iterative approach to building an N-bit adder leads to cascading 1-bit full adders. This concept is illustrated in Figure 2. As N increases, the critical path, which is the carry path to $C_{out}$, grows linearly with N.
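The full adder equations and the ripple carry cascade can be modelled in a few lines of Python. This is only a behavioural sketch for checking the equations, not the structural VHDL used in the project; the function names `full_adder` and `ripple_carry_add` are chosen here for illustration.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """1-bit full adder: S = A xor B xor Cin, Cout = A.B + (A xor B).Cin."""
    p = a ^ b
    s = p ^ c_in
    c_out = (a & b) | (p & c_in)
    return s, c_out


def ripple_carry_add(x: int, y: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """Add two n-bit integers by cascading n full adders, LSB first.

    Returns (n-bit sum, carry out). The carry visits every stage in turn,
    which is why the critical path grows linearly with n.
    """
    result = 0
    carry = c_in
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For example, `ripple_carry_add(15, 1, 4)` wraps around to a zero sum with the carry out set.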

Figure 2: N-bit Carry Propagate Adder

1.2.3 Carry Skip Adder

Let $p_i = a_i \oplus b_i$ and $g_i = a_i \cdot b_i$, where p denotes "propagate" and g denotes "generate". The basic carry-skip (or carry-bypass) adder divides an N-bit adder into N/M blocks, where each block contains M bits. This is shown in Figure 3. Within each block, a simple M-bit adder built from full adders is realized (linear time Carry Skip Adder), where the "propagate" and "generate" signals of the respective input bits are used to form the output sum bits and the output carries. The multiplexer at the end of a block allows the input carry to bypass the block when all of the "propagate" signals in that block are asserted. After the carry generate delay of the first block, the bypassing of carries in subsequent blocks determines the carry-propagate delay. If any of the "propagate" signals in a block is deasserted, the carry out of that block does not depend on the input carry from the previous blocks, and the multiplexer selects the carry produced inside the block. The critical path delay is

$$t_{PD} = t_{setup} + M\,t_{FA} + \left(\frac{N}{M} - 1\right) t_{MUX} + (K - 1)\,t_{FA} + t_{sum}$$

The subsequent section explains how better performance can be achieved by modifying the block size.

Figure 3: Carry Skip Concept
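The bypass behaviour can be illustrated with a small behavioural model. This is a sketch under assumptions: the name `carry_skip_add` and the extra return value counting the bypassed blocks are introduced here for illustration and do not come from the report. In a functional model the bypassed carry always equals the rippled one, so only the timing differs in hardware.

```python
def carry_skip_add(x, y, n, m, c_in=0):
    """n-bit addition split into m-bit blocks with a carry bypass.

    Returns (sum, carry_out, skipped_blocks), where skipped_blocks counts
    the blocks whose propagate signals were all asserted.
    """
    result, carry, skipped = 0, c_in, 0
    for base in range(0, n, m):
        block_in = carry
        all_propagate = True
        for i in range(base, min(base + m, n)):
            a, b = (x >> i) & 1, (y >> i) & 1
            p, g = a ^ b, a & b
            result |= (p ^ carry) << i      # sum bit of this stage
            carry = g | (p & carry)         # ripple inside the block
            all_propagate = all_propagate and p == 1
        if all_propagate:
            carry = block_in                # multiplexer selects the block carry-in
            skipped += 1
    return result, carry, skipped
```

With 8-bit operands 0x5F and 0xA1 and 4-bit blocks, the lower block generates a carry and the upper block, whose propagate signals are all 1, bypasses it to the carry out.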

1.2.4 Carry Select Adder

Although this type of adder needs more hardware, its design concept is an interesting one. The linear Carry Select Adder is divided into N/M blocks, where each block contains M bits, just as in the Carry Skip Adder. In each block, the hardware is replicated in order to calculate the sum and carry-out bits for both possible carry-ins. Figure 4 illustrates this concept. The multiplexer at the end chooses between the carry-outs based on the carry-in from the previous stage. In this implementation, the critical path delay comprises the carry-generate delay of the first block, followed by the multiplexer delays of the successive blocks. This results in a linear time Carry Select Adder (CSA).

Variable-sized blocks can yield higher performance [5]. For a carry-select adder, the block sizes can increase so that all the inputs arrive at each multiplexer at the same time, which minimizes the delay. For example, if the multiplexer delay is similar to the delay of a full adder, then the minimal carry delay can be achieved by adding 1 bit in the first block, 2 in the second, and so on. Linearly increasing block sizes result in a square-root number of block stages for the carry propagate delay, and hence a square-root time CSA. A similar approach can yield a square-root time Carry Skip Adder (CSkA).

Figure 4: Carry Select Concept

1.2.5 Carry Look Ahead Adder

The Ripple Carry Adder implementation imposes sequential generation of the carries, making the output carry of each stage dependent on the input carry to that stage. In the Carry Look Ahead implementation, the carry-out does not depend on the previous carries. Let $p_i = a_i \oplus b_i$ and $g_i = a_i \cdot b_i$, where p denotes "propagate" and g denotes "generate". Then

$$s_i = p_i \oplus c_i \quad \text{and} \quad c_{i+1} = g_i + p_i c_i$$

Expanding these equations for an N-bit adder gives:

$$c_1 = g_0 + p_0 c_0$$

$$c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$$

$$c_n = g_{n-1} + p_{n-1} g_{n-2} + \dots + p_{n-1} p_{n-2} \cdots p_1 g_0 + p_{n-1} p_{n-2} \cdots p_1 p_0 c_0$$

Since each carry depends only on the "propagate" and "generate" signals and on $c_0$, not on the previous carries, the carry logic can be implemented as a sum of products. This results in less delay and, consequently, an increase in speed. Unfortunately, because CMOS delay increases non-linearly as the fan-in grows, the Carry Look Ahead implementation is used in a modular way, cascading several 4-bit CLAs.

1.2.6 Prefix Adders

In very simple words, a parallel prefix algorithm takes n inputs $x_{n-1}, x_{n-2}, \dots, x_0$ and produces in parallel n outputs $x_{n-1} \circ x_{n-2} \circ \dots \circ x_0,\; x_{n-2} \circ \dots \circ x_0,\; \dots,\; x_0$. The analogy between carry computation and the prefix algorithm is that the carry computation at a certain stage i depends on all inputs of the stages i-1 down to 0.

Let $a_{n-1}, a_{n-2}, \dots, a_0$ and $b_{n-1}, b_{n-2}, \dots, b_0$ be the n-bit binary numbers to be added. Let $c_0$ designate the input carry and $c_n$ designate the output carry. For each bit, the "propagate" ($p_i$) and "generate" ($g_i$) signals are defined as described in the previous section. Furthermore, to parallelize the computation of a carry, two additional terms are defined: Group Carry Generate ($G_{i:j}$) and Group Carry Propagate ($P_{i:j}$).

For each group of bits, the Group Carry Generate signal $G_{i:j}$ means that a carry is generated somewhere between stages j and i and is propagated from that location to stage i. This implies $c_{i+1} = 1$ and, in particular, if j = 0, then $G_{i:0} = c_{i+1}$. For each group of bits, the Group Carry Propagate signal $P_{i:j}$ means that the carry is propagated from stage j to stage i, i.e. $c_{i+1} = c_j$. The formal definition of $G_{i:j}$ and $P_{i:j}$ is expressed by the following relationship:

$$[G_{i:j}, P_{i:j}] = [g_i, p_i] \quad \text{if } i = j$$

$$[G_{i:j}, P_{i:j}] = [G_{i:k}, P_{i:k}] \circ [G_{k:j}, P_{k:j}] \quad \text{if } i > j$$

where $i \ge k \ge j$ and the "$\circ$" operator is the one introduced by Brent and Kung [1]:

$$[G, P] \circ [G', P'] = [G + P \cdot G',\; P \cdot P']$$

Finally, once the final carries $G_{i:0}$ for all i < n have been computed, the sum bits are calculated as:

$$s_i = \begin{cases} p_i \oplus G_{i-1:0}, & n > i \ge 1 \\ p_i, & i = 0 \end{cases}$$

The traditional carry-ripple adder (CRA) can be regarded as a serial prefix adder using the above definitions.
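The definitions above can be checked with a short behavioural model. The sketch below applies the Brent-Kung operator serially, which corresponds to the serial prefix (carry-ripple) view mentioned in the text. It assumes a carry-in of 0 and combines adjacent, non-overlapping groups; the names `prefix_op` and `serial_prefix_add` are illustrative only.

```python
def prefix_op(upper, lower):
    """Brent-Kung operator: (G, P) o (G', P') = (G + P.G', P.P').

    `upper` is the more significant group, `lower` the less significant one.
    """
    g_hi, p_hi = upper
    g_lo, p_lo = lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def serial_prefix_add(x, y, n):
    """n-bit addition with carry-in 0, computing every G_{i:0} in series."""
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]
    group = [(g[0], p[0])]                  # (G_{0:0}, P_{0:0})
    for i in range(1, n):
        group.append(prefix_op((g[i], p[i]), group[i - 1]))
    result = p[0]                           # s_0 = p_0
    for i in range(1, n):
        result |= (p[i] ^ group[i - 1][0]) << i   # s_i = p_i xor G_{i-1:0}
    return result, group[n - 1][0]          # carry out is G_{n-1:0}
```

The parallel prefix adders discussed next compute the same group signals, but arrange the operator applications as a tree instead of a chain.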

1.2.7 Sklansky Prefix Adder

The Sklansky prefix tree is shown in Figure 5 for a 16-bit adder. Its structure is the simplest among the prefix adders. It is used for conditional-sum addition [2]. The fan-out of such an adder grows exponentially from input to output along the critical path, reaching $n/2$. This leads to a large delay as the adder operand width increases. A full adder can be constructed by recursive division of the blocks, using such a tree for the implementation. The number of "o" cells required for the implementation is $\frac{n}{2}\log_2 n$ and the delay is $\log_2 n$ levels, where n is the adder's width. The detailed implementation of the "o" cell is described in the Carry Generation Block section (cg_gen_sklansky).

Figure 5: Sklansky Prefix Tree
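The structure of the tree can be enumerated programmatically. The sketch below lists every "o" cell as a (level, column, source column) triple, assuming n is a power of two and levels are numbered from 1; the function names are illustrative, and the fan-out counted here is the number of cells driven by one node.

```python
from collections import Counter


def sklansky_cells(n):
    """List (level, column, source_column) for each cell of an n-bit Sklansky tree."""
    levels = n.bit_length() - 1             # log2(n) for a power-of-two n
    cells = []
    for j in range(1, levels + 1):
        half = 1 << (j - 1)
        for i in range(n):
            if i & half:                    # upper half of each 2^j-wide block
                cells.append((j, i, i - (i % half) - 1))
    return cells


def max_cell_fanout(cells):
    """Largest number of cells driven by a single node of the previous level."""
    driven = Counter((j, src) for j, _, src in cells)
    return max(driven.values())
```

For n = 16 this gives 32 cells, and for n = 128 it gives 448 cells, matching $\frac{n}{2}\log_2 n$.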

1.2.8 Kogge-Stone Prefix Adder

The Kogge-Stone structure has a more optimal implementation than the Sklansky structure, as its fan-out is reduced to 2 at the expense of a larger number of "o" (circle) cells. It is obtained by copying the tree of the most significant bit position [3]. Figure 6 shows this prefix tree for 16-bit operands. Just as in 1.2.7, a full adder can be constructed by recursive division of the blocks, using such a tree for the implementation. The number of "o" cells required for the implementation is $n \log_2 n - n + 1$ and the delay is $\log_2 n$ levels, where n is the adder's width. The Kogge-Stone adder is therefore expected to consume more resources than the Sklansky adder. For n = 128, the delay is 7 levels.

Figure 6: Kogge-Stone Prefix Tree
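The cell count and the fan-out of 2 can be confirmed by enumerating the tree. This is a sketch assuming n is a power of two and levels numbered from 1; `kogge_stone_cells` and `max_fanout` are names chosen here. Fan-out is counted as the number of cell inputs fed by one node, including the cell in the node's own column.

```python
from collections import Counter


def kogge_stone_cells(n):
    """List (level, column, source_column) for each cell of an n-bit Kogge-Stone tree."""
    levels = n.bit_length() - 1             # log2(n) for a power-of-two n
    return [(j, i, i - (1 << (j - 1)))
            for j in range(1, levels + 1)
            for i in range(1 << (j - 1), n)]


def max_fanout(cells):
    """Largest number of cell inputs driven by one node of the previous level."""
    uses = Counter()
    for j, i, src in cells:
        uses[(j - 1, i)] += 1               # input from the same column
        uses[(j - 1, src)] += 1             # input from the lower column
    return max(uses.values())
```

For n = 128 the enumeration yields 769 cells, in agreement with $n \log_2 n - n + 1$.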

2 Design Flow & Implementation

Figure 7 illustrates the design flow for the implementation of the prefix adders.

Figure 7: Design Flow (blocks: Design Specification; Macro Architecture; VHDL RTL Coding, Structural Level, in Emacs VHDL mode; Test Bench with PRBS Generator; Verification Plan and Test Case Specification; Simulation in Modelsim 6.0 SE; Compare Results; Synthesis, Place and Route in Xilinx ISE 6.3 SP3; Reports; Analyze Results)

2.1 Micro Architecture

2.1.1 Top Entity

Figure 8 illustrates the top-level view. The top entities are named full_adder_sklansky and full_adder_kogge_stone, respectively, and have the ports listed in Table 1.

Figure 8: Top Level View

Signal Name | Width [bits] | Direction | Comments
----------- | ------------ | --------- | --------
operand1 | 128 | input | Number #1 to be added
operand2 | 128 | input | Number #2 to be added
sys_clk | 1 | input | System clock
reset_n | 1 | input | System reset (active low)
result | 128 | output | Result of an addition
carry_out | 1 | output | Output carry resulting from an addition

Table 1: Signal Description

2.1.2 Sub-Block Partitioning

The top-level block is further partitioned into three sub-blocks, as shown in Figure 9. The possible choices of block partitioning are numerous. Three sub-blocks are chosen because, with this partitioning, the two adder designs differ by only one sub-block, the Carry Generation Block (cg_gen). Consequently, two different sub-blocks are designed: cg_gen_sklansky and cg_gen_kogge_stone.

Figure 9: full_adder sub-block partitioning (pg_gen, the "Carry Propagate" & "Carry Generate" Block, takes operand1[127:0] and operand2[127:0]; cg_gen_sklansky or cg_gen_kogge_stone, the 2-D Carry Generation Block, follows; sb_gen, the Sum Bits Generation Block, produces s[127:0] and carry_out)

The subsequent sections elaborate on each of the sub-blocks.

2.1.3 "Carry Propagate" and "Carry Generate" Block (pg_gen)

This sub-block calculates the "carry propagate" signals p[i](0) and the "carry generate" signals g[i](0) bitwise from operand1 and operand2, as defined in 1.2.5, namely:

$$p[i](0) = \mathrm{operand1}[i] \oplus \mathrm{operand2}[i]$$

$$g[i](0) = \mathrm{operand1}[i] \cdot \mathrm{operand2}[i]$$

The implementation is shown in Figure 10. This block consumes one 2-input AND gate and one 2-input XOR gate per bit.

Figure 10: "Carry Generate" and "Carry Propagate" Block Implementation

2.1.4 Carry Generation Block

The signals p[i](0) and g[i](0) generated in the Precondition Block (pg_gen) are used within the Carry Generation Block to calculate the final carry generate signals g[i](j) of the last row, which can be represented as a two-dimensional carry generate structure. The following subsections describe the implementation of the Carry Generation Block for each of the chosen designs.

2.1.4.1 Carry Generation Block Sklansky Prefix Adder (cg_gen_sklansky)

Following the Sklansky prefix tree (presented in 1.2.7) and assuming a two-dimensional structure of j rows by i columns, the following observations are made:

- In column i, cells occupy the nodes whose row coordinates j correspond to a "1" in the binary representation of i (bit j-1 of i for row j), i.e. they follow directly from the binary encoding of the index i.
- A coordinate corresponding to a "0" in the binary representation of i simply propagates p[i](j) and g[i](j).
- All "o" (circle) cells are of GP type, except for those situated on the bottom border, where $\log_2 i < j$, which are of G type.

The outputs of a GP cell are defined as follows:

$$g[i](j) = g[i](j-1) + p[i](j-1) \cdot g[i - (i \bmod 2^{j-1}) - 1](j-1)$$

$$p[i](j) = p[i](j-1) \cdot p[i - (i \bmod 2^{j-1}) - 1](j-1)$$

The output of a G cell is defined as follows:

$$g[i](j) = g[i](j-1) + p[i](j-1) \cdot g[i - (i \bmod 2^{j-1}) - 1](j-1)$$

Following the prefix algorithm description, with n = 128 the implementation consumes 448 "o" cells, built from 2-input OR gates and 2-input AND gates. The delay is 7 levels and the fan-out is …

2.1.4.2 Carry Generation Block Kogge-Stone Prefix Adder (cg_gen_kogge_stone)

Following the Kogge-Stone prefix tree (presented in 1.2.8) and assuming a two-dimensional structure of j rows by i columns, the nodes in the upper-left part are populated with "o" (circle) cells, while the rest of the two-dimensional array is empty, i.e. the "o" (circle) cells are placed in the nodes whose coordinates satisfy the following relationship:

$$j \ge 1 \quad \text{and} \quad 2^{j-1} \le i \le N - 1$$

The outputs of the placed cells are:

$$p[i](j) = p[i](j-1) \cdot p[i - 2^{j-1}](j-1)$$

$$g[i](j) = g[i](j-1) + p[i](j-1) \cdot g[i - 2^{j-1}](j-1)$$

Following the prefix algorithm description, with n = 128 the implementation consumes 769 "o" cells, built from 2-input OR gates and 2-input AND gates.
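The two placements can be exercised in one behavioural model of the pg_gen, cg_gen and sb_gen chain. This is a sketch, not the structural VHDL of the project: it assumes a carry-in of 0 and a power-of-two width, stores the arrays as `g[j][i]` (row first) rather than the g[i](j) notation of the text, and uses function names invented for illustration.

```python
def prefix_add(x, y, n, source):
    """Model pg_gen -> cg_gen -> sb_gen for n-bit operands, carry-in 0.

    source(j, i) returns the column combined with column i in row j,
    or None when the node only passes p and g through.
    Returns (sum, carry_out, number_of_cells).
    """
    rows = n.bit_length() - 1               # log2(n) for a power-of-two n
    # pg_gen: row 0 of the two-dimensional structure
    p = [[((x >> i) ^ (y >> i)) & 1 for i in range(n)]]
    g = [[((x >> i) & (y >> i)) & 1 for i in range(n)]]
    cells = 0
    # cg_gen: each row is computed from the previous one
    for j in range(1, rows + 1):
        p.append(list(p[j - 1]))
        g.append(list(g[j - 1]))
        for i in range(n):
            k = source(j, i)
            if k is not None:
                g[j][i] = g[j - 1][i] | (p[j - 1][i] & g[j - 1][k])
                p[j][i] = p[j - 1][i] & p[j - 1][k]
                cells += 1
    # sb_gen: s[0] = p[0], s[i] = p[i] xor last-row g[i-1]
    result = p[0][0]
    for i in range(1, n):
        result |= (p[0][i] ^ g[rows][i - 1]) << i
    return result, g[rows][n - 1], cells


def sklansky_source(j, i):
    half = 1 << (j - 1)
    return i - (i % half) - 1 if i & half else None


def kogge_stone_source(j, i):
    half = 1 << (j - 1)
    return i - half if i >= half else None
```

Called with n = 128, both variants return the same sum and carry as ordinary integer addition, while the cell counts come out as 448 and 769 respectively. The model treats every cell as a GP cell; a G cell simply omits the p output.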

2.1.5 Sum Bits Generation Block (sb_gen)

The sum bits are produced in the Sum Bits Generation Block by XORing the "carry propagate" signals p[i](0), generated in the Precondition Block, with the "carry generate" bits of the last row of the Carry Generation Block. As Figure 11 shows, s[i] is formed from p[i] and the last-row g[i-1], and s[0] is formed from p[0] and carry_in. The implementation consumes one 2-input XOR gate per sum bit.

Figure 11: Sum Bits Generation Block Implementation

2.2 RTL Coding

RTL coding is done in VHDL at the structural level. The basic cells are a 2-input AND gate, a 2-input OR gate, a 2-input XOR gate and a D-type positive-edge-triggered flip-flop. The text editor used is emacs version 20.7 with vhdl mode, since it has many templates for aligning VHDL code so that it is easy to read. Each file has a header at the top explaining the entity name and its logical function.

2.3 Verification Plan

In general, describing the same design functionality (especially of a large and complex design) in a high-level language, such as C/C++, or with verification tools, such as Verisity Specman, is the way to verify a design in many scenarios with many possible input combinations. For the verification of the two full adders, the following is proposed (Figure 12). A test bench, written in behavioral Verilog, instantiates both designs. Two 128-bit numbers are generated using a dedicated LFSR (Linear Feedback Shift Register) [4], which generates a pseudo-random bit stream. Each clock cycle, the values of the two 128-bit numbers change in a pseudo-random way. These values are summed using a '+' operation within the test bench and are also applied as inputs to both adders. The resulting output sum and carry of each adder are compared with the result generated by the '+' addition within the test bench.
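The verification flow can be sketched in Python as follows. This is an illustration only, not the Verilog test bench of the project: the LFSR tap positions and seeds are assumptions (the report does not state its polynomial), and `ripple_model` is a stand-in for a design under test.

```python
MASK128 = (1 << 128) - 1
TAPS = (128, 126, 101, 99)      # assumed tap positions, not taken from the report


def lfsr_step(state):
    """Advance a 128-bit Fibonacci LFSR by one bit."""
    bit = 0
    for t in TAPS:
        bit ^= (state >> (t - 1)) & 1
    return ((state << 1) | bit) & MASK128


def ripple_model(a, b):
    """Stand-in for a design under test: bit-serial 128-bit addition."""
    result, carry = 0, 0
    for i in range(128):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | ((x ^ y) & carry)
    return result, carry


def run_test(dut, cycles, seed1=1, seed2=0xACE1):
    """Apply pseudo-random operands each cycle and count mismatches against '+'."""
    s1, s2 = seed1, seed2
    mismatches = 0
    for _ in range(cycles):
        s1, s2 = lfsr_step(s1), lfsr_step(s2)
        total = s1 + s2
        expected = (total & MASK128, total >> 128)
        if dut(s1, s2) != expected:
            mismatches += 1
    return mismatches
```

A correct adder model yields zero mismatches, whereas a deliberately broken one, such as `lambda a, b: (a ^ b, 0)`, is flagged.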

A successful test case (test passed) is defined as a match between the result of the test bench and the result of each adder.

Figure 12: Test Bench & Verification Plan

2.4 Synthesis, Place and Route

Synthesis, placement and routing of the design are done using Xilinx ISE 6.3i software with the latest service pack, SP3. The constraints are set for the best timing by selecting the optimization criterion "speed" with maximum effort. More details on the results, as well as the problems faced, are given in section 3.

3 Results

3.1 Simulation Results

3.1.1 Initial Test Cases

The initial test cases are defined as the sums of the following 128-bit numbers. The very first case verifies the sum of:

- 64 zeros followed by 64 ones.
- 64 ones followed by 64 zeros.

The next case is:

- 32 repetitions of 0xA.
- 32 repetitions of 0x5.

In this fashion, possible bit swapping or incorrect index generation is tested. Figure 13 illustrates the simulation results for the initial test case. operand1 and operand2 are the two 128-bit numbers to be added. result and carry_out are the outputs of each adder, marked by the appropriate divider (Sklansky Adder and Kogge-Stone Adder, respectively).

Figure 13: Initial Test Case Simulation Results

3.1.2 General Test Case

In the general test case, the data is generated in a pseudo-random way, as described in section 2.3. Three snapshots of the simulation results are given in the following figures. Figure 14 illustrates the entire simulation. The lowest divider separates the test bench signals. operand1_prbs and operand2_prbs are the 128-bit PRBS data applied to the adders. operand1 and operand2 are the input numbers; result and carry_out are the outputs of the adder circuits, marked by the corresponding divider (Sklansky Adder and Kogge-Stone Adder, respectively). Two more very important test bench signals are result_match_sklansky and result_match_kogge_stone, which are updated each clock cycle depending on whether the test bench result matches the respective result of the Sklansky adder and the Kogge-Stone adder. Figure 15 and Figure 16 give two "zoom-in" examples of the same simulation.
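The expected results of the two initial test cases of 3.1.1 can be worked out directly. In both cases the operands have complementary bit patterns, so the sum is all ones with no carry out. The short check below uses Python integers as the reference; the variable names are illustrative.

```python
MASK128 = (1 << 128) - 1


def reference_add(a, b):
    """128-bit reference addition: returns (result, carry_out)."""
    total = a + b
    return total & MASK128, total >> 128


low_ones = (1 << 64) - 1            # 64 zeros followed by 64 ones
high_ones = low_ones << 64          # 64 ones followed by 64 zeros
all_a = int("A" * 32, 16)           # 32 repetitions of 0xA
all_5 = int("5" * 32, 16)           # 32 repetitions of 0x5

case1 = reference_add(low_ones, high_ones)   # (all ones, carry 0)
case2 = reference_add(all_a, all_5)          # (all ones, carry 0)
```

Any swapped or misindexed bit in an adder would disturb the all-ones pattern, which is what makes these operands useful as a first check.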

Figure 14: General Test Case - Full Zoom

Figure 15: General Test Case - Example 1

Figure 16: General Test Case - Example 2

3.2 Synthesis Results

Both designs were successfully synthesized for the Virtex-2 device XC2V500. The synthesis results are summarized in Table 2. The Kogge-Stone adder consumes more resources than the Sklansky adder, as expected.

Results explanation (Table 2): The inputs and outputs of the design were registered in order to obtain a more realistic delay estimate, assuming that the inputs and outputs of the design are registered in the target system. Furthermore, in the placement and routing stage, an option that forces the flip-flops to be packed within the I/O buffers is selected, so that the logic delay is a true estimate of each adder's processing delay in this FPGA implementation. However, because the maximum number of available user I/O pins for this device is 264 (package FG456), placement and routing of the design, and hence a true estimate of its logic delay, is not possible. Consequently, there are two alternatives. One is to multiplex the I/Os in order to fit the design into the XC2V500 device. The other is to select a larger device, the XC2V1000. Both alternatives are described in the following subsections.

Table 2: Synthesis Results (No placement and routing): XC2V500 -FG456-4 device

Design | LUTs Usage | 1-bit Registers Usage | Total Slices Usage | Maximum Frequency [MHz]
------ | ---------- | --------------------- | ------------------ | -----------------------
Sklansky Adder | 829 (13%) | 385 (6%) | 453 (14%) | 85.6
Kogge-Stone Adder | 1449 (23%) | 385 (6%) | 751 (24%) |

3.2.1 Multiplexing I/O

This alternative requires a complete redesign of the interface and a change of the overall architecture of the design. Loading the numbers or outputting the result in a multiplexed way each has advantages and disadvantages, which are summarized below. In addition, handshaking signals, which designate the start of loading and the completion of the addition, are required.

3.2.1.1 Multiplexed Inputs

In this case, the design latency (overall processing time) increases, since the whole input numbers cannot be acquired at once. However, two major advantages can be achieved. First, the logic required for the addition can be reduced, since the logic performing the addition cannot process more bits than are present on the interface in the same cycle. Consequently, the addition can be performed in a multiplexed fashion, especially if the least significant part of the input numbers is loaded first. Second, the overall speed of the design increases, as the complexity and the number of combinational logic levels decrease.

3.2.1.2 Multiplexed Outputs

In this case, the design latency (overall processing time) also increases, since the output is not generated at once. However, two major advantages can be achieved here as well. First, the logic required for the addition can be reduced, since the logic performing the addition cannot generate more bits than the output (result) width. Consequently, the addition can be performed in a multiplexed fashion, processing the least significant part of the input numbers first, i.e. the least significant part of the output is generated earlier than the most significant one. Second, the overall speed of the design increases, as the complexity and the number of combinational logic levels decrease.

3.2.1.3 Multiplexed Inputs and Outputs

In general, this case combines the alternatives discussed in 3.2.1.1 and 3.2.1.2. The design latency (overall processing time) increases. Assuming that the inputs are loaded with the least significant part first, the least significant part of the output can be generated at once. The same two major advantages can be achieved in this case as well. First, the logic required for the addition can be reduced. Second, the overall speed of the design increases, as the complexity and the number of combinational logic levels decrease. The optimal case is when the input and output widths are the same. Different input and output widths add another level of complexity to the design, which is left outside the scope of this project.

3.2.2 Changing Target Device

This alternative is the quickest solution because it requires no modifications to the RTL design. The new target device is the XC2V1000 with package FF896, allowing up to 432 user I/O pins. The main disadvantage of this alternative is that the larger device is a more costly solution. Table 3 and Table 4 present the synthesis results and the placement and routing results, respectively, with maximum effort on timing. The results differ because the synthesis tool estimates the delays without knowing the true placement and routing.

Table 3: Synthesis Results: XC2V1000 FF896-4 device

Design | LUTs Usage | 1-bit Registers Usage | Total Slices Usage | Maximum Frequency [MHz]
------ | ---------- | --------------------- | ------------------ | -----------------------
Sklansky Adder | 829 (8%) | 385 (3%) | 453 (6%) | 85.6
Kogge-Stone Adder | 1449 (14%) | 385 (3%) | 751 (14%) |

Table 4: Placement and Routing Results FF896-4 device

Design | Total Slices Usage | Maximum Delay [ns] | Maximum Frequency [MHz]
------ | ------------------ | ------------------ | -----------------------
Sklansky Adder | 585 (11%) | | 64.8
Kogge-Stone Adder | 1042 (20%) | | 70.1

4 Design Enhancement: Pipelining

Pipelining is introduced in order to improve the design speed. There are two ways of applying it. The first, manual, is to locate the point on the critical path whose arrival time is exactly half the total delay of the critical path (or one third, if two pipeline stages are inferred, and so on) and to insert a pipeline register there. The second, automatic pipelining, is described below.

With automatic pipelining, the location of the pipeline registers is chosen by the Xilinx synthesis tool. N pipeline stages are added to the inputs, the outputs, or both the inputs and outputs of the design, and the software optimizes the location of the pipeline registers according to the specified timing requirements and synthesis effort by moving them forward and backward. This is referred to as "forward/backward register balancing" in Xilinx ISE [6] and as "retiming" in Synplicity Synplify Pro 7.xx [7], and it is illustrated in Figure 17 and Figure 18. The software automatically determines Td1 and Td2 corresponding to the given timing constraints and synthesis effort. In both figures, a combinational delay Td between pipeline stages is split into Td1 and Td2, with Td = Td1 + Td2.

Figure 17: Forward Registers Balancing (Pipelining)

Figure 18: Backward Registers Balancing (Pipelining)

Table 5 gives the results of automatic pipelining of the Sklansky adder. Table 6 gives the results of automatic pipelining of the Kogge-Stone adder. From the results, it is observed that:

- Adding one output pipeline stage improves the timing, while adding two pipeline stages does not. The main reason is that the delay consists of approximately 25%-30% logic delay and approximately 70% routing delay. Although adding 2 pipeline stages improves the flip-flop to flip-flop delay, the routing delay makes the total delay worse than with only 1 pipeline stage.
- Another important factor that might prevent good performance is the high usage of I/O pins, which imposes another level of complexity on the place and route tool.
- The faster a certain path is, the larger the share of its delay contributed by the actual logic delay.
- Multiple iterations of synthesis, place and route produce slightly different results.

Number of Pipeline Stages | Total Slices Usage | Maximum Delay [ns] | Maximum Frequency [MHz] | Delay Distribution, Logic % / Routing %
------------------------- | ------------------ | ------------------ | ----------------------- | ---------------------------------------
1 input stage | 551 (11%) | | 91.7 | 33 / 67
2 input stages | 746 (14%) | 9.9 | 101 | 36 / 64
1 output stage | 603 (11%) | | 82.1 | 32 / 68
2 output stages | 630 (12%) | | 79.1 | 27 / 73
1 stage at input and output | 571 (11%) | | | 43 / 57
2 stages at input and output | 777 (15%) | | | 45 / 55

Table 5: Placement and Routing Results of Pipelined Sklansky Adder

Number of Pipeline Stages | Total Slices Usage | Maximum Delay [ns] | Maximum Frequency [MHz] | Delay Distribution, Logic % / Routing %
------------------------- | ------------------ | ------------------ | ----------------------- | ---------------------------------------
1 input stage | 838 (16%) | | 89.9 | 32 / 68
2 input stages | 948 (18%) | | | 28 / 72
1 output stage | 852 (16%) | | | 30 / 70
2 output stages | 933 (18%) | | | 41 / 69
1 stage at input and output | 888 (17%) | | | 43 / 57
2 stages at input and output | 1075 (%) | | | 47 / 53

Table 6: Placement and Routing Results of Pipelined Kogge-Stone Adder
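The relation between the delay and frequency columns, and the meaning of the delay distribution, can be illustrated with the one row of Table 5 whose delay survives in this copy (2 input stages: 9.9 ns, 36% logic). The helper names below are hypothetical and the calculation is a plain unit conversion, not tool output.

```python
def max_frequency_mhz(delay_ns):
    """Maximum clock frequency in MHz for a critical path delay in ns."""
    return 1000.0 / delay_ns


def split_delay(delay_ns, logic_percent):
    """Split a path delay into (logic, routing) parts, in ns."""
    logic = delay_ns * logic_percent / 100.0
    return logic, delay_ns - logic


freq = max_frequency_mhz(9.9)           # about 101 MHz, as listed in Table 5
logic_ns, routing_ns = split_delay(9.9, 36)
```

With these figures roughly 3.6 ns of the path is logic and 6.3 ns is routing, which is why further register insertion, which only shortens the logic part, gives limited gains.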

5 Summary and Conclusions

Two different parallel prefix 128-bit adders were designed, analyzed and tested. At the beginning of the design process, it was noted that the required device (XC2V500) could not accommodate the design because of the limited number of available user I/O pins. Two alternatives were considered for the next step of the design: using multiplexed I/O, and hence reducing the overall number of used I/Os, or changing the target device to the XC2V1000. The second alternative was chosen because it required no redesign and involved no additional levels of complexity.

Due to the nature of the Kogge-Stone prefix tree, the resource usage of the Kogge-Stone adder was expected to be greater than that of the Sklansky adder, and this was confirmed by the results.

It was also observed that multiple iterations of synthesis of the same design sometimes produce slightly different placement results in terms of logic resource usage and timing. The reason is that the placement and routing algorithm used by the Xilinx tools is based on randomized initial settings [6], [8], unlike Altera [7].

The designs were enhanced by inserting a number of pipeline stages, and the results were analyzed. It turns out that pipelining does not necessarily improve the design speed. The main reason is that the delay in most cases consists of approximately 20% to 40% actual logic delay, with the rest, 80% down to 60% respectively, being routing delay. It is therefore concluded that adding more pipeline stages does not necessarily improve the total delay.

6 References

[1] R. P. Brent and H. T. Kung, "A regular layout for parallel adders", IEEE Trans. Comput., Vol. C-31, No. 3, March 1982
[2] J. Sklansky, "Conditional-sum Addition Logic", IRE Transactions on Electronic Computers, Vol. EC-9, No. 2, June 1960
[3] P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, C-22(8), Aug. 1973
[4] Paul H. Bardell, William H. McAnney, and Jacob Savir, "Built-In Test for VLSI: Pseudorandom Techniques", John Wiley & Sons, New York, 1987
[5] V. G. Oklobdzija, E. R. Barnes, "Some Optimal Schemes for ALU Implementation in VLSI Technology", Proceedings of the 7th Symposium on Computer Arithmetic ARITH-7. Reprinted in Computer Arithmetic, E. E. Swartzlander (editor), Vol. II
[6] Xilinx Programmable Logic Devices PLD & FPGA
[7] Synplicity Synplify Pro 7.02 user's guide
[8] Xilinx ISE 6.2 / 6.3 user's manual


More information

High Speed Han Carlson Adder Using Modified SQRT CSLA

High Speed Han Carlson Adder Using Modified SQRT CSLA I J C T A, 9(16), 2016, pp. 7843-7849 International Science Press High Speed Han Carlson Adder Using Modified SQRT CSLA D. Vamshi Krishna*, P. Radhika** and T. Vigneswaran*** ABSTRACT Binary addition is

More information

Parallel-Prefix Adders Implementation Using Reverse Converter Design. Department of ECE

Parallel-Prefix Adders Implementation Using Reverse Converter Design. Department of ECE Parallel-Prefix Adders Implementation Using Reverse Converter Design Submitted by: M.SHASHIDHAR Guide name: J.PUSHPARANI, M.TECH Department of ECE ABSTRACT: The binary adder is the critical element in

More information

Arithmetic Circuits. Nurul Hazlina Adder 2. Multiplier 3. Arithmetic Logic Unit (ALU) 4. HDL for Arithmetic Circuit

Arithmetic Circuits. Nurul Hazlina Adder 2. Multiplier 3. Arithmetic Logic Unit (ALU) 4. HDL for Arithmetic Circuit Nurul Hazlina 1 1. Adder 2. Multiplier 3. Arithmetic Logic Unit (ALU) 4. HDL for Arithmetic Circuit Nurul Hazlina 2 Introduction 1. Digital circuits are frequently used for arithmetic operations 2. Fundamental

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October-2013 1502 Design and Characterization of Koggestone, Sparse Koggestone, Spanning tree and Brentkung Adders V. Krishna

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3 Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number Chapter 3 Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number Chapter 3 3.1 Introduction The various sections

More information

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,

More information

Srinivasasamanoj.R et al., International Journal of Wireless Communications and Network Technologies, 1(1), August-September 2012, 4-9

Srinivasasamanoj.R et al., International Journal of Wireless Communications and Network Technologies, 1(1), August-September 2012, 4-9 ISSN 2319-6629 Volume 1, No.1, August- September 2012 International Journal of Wireless Communications and Networking Technologies Available Online at http://warse.org/pdfs/ijwcnt02112012.pdf High speed

More information

Xilinx ASMBL Architecture

Xilinx ASMBL Architecture FPGA Structure Xilinx ASMBL Architecture Design Flow Synthesis: HDL to FPGA primitives Translate: FPGA Primitives to FPGA Slice components Map: Packing of Slice components into Slices, placement of Slices

More information

CAD4 The ALU Fall 2009 Assignment. Description

CAD4 The ALU Fall 2009 Assignment. Description CAD4 The ALU Fall 2009 Assignment To design a 16-bit ALU which will be used in the datapath of the microprocessor. This ALU must support two s complement arithmetic and the instructions in the baseline

More information

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1

FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 FPGA Design Challenge :Techkriti 14 Digital Design using Verilog Part 1 Anurag Dwivedi Digital Design : Bottom Up Approach Basic Block - Gates Digital Design : Bottom Up Approach Gates -> Flip Flops Digital

More information

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Fall 2002 EECS150 - Lec06-FPGA Page 1 Outline What are FPGAs? Why use FPGAs (a short history

More information

ECE468 Computer Organization & Architecture. The Design Process & ALU Design

ECE468 Computer Organization & Architecture. The Design Process & ALU Design ECE6 Computer Organization & Architecture The Design Process & Design The Design Process "To Design Is To Represent" Design activity yields description/representation of an object -- Traditional craftsman

More information

Parallelized Radix-4 Scalable Montgomery Multipliers

Parallelized Radix-4 Scalable Montgomery Multipliers Parallelized Radix-4 Scalable Montgomery Multipliers Nathaniel Pinckney and David Money Harris 1 1 Harvey Mudd College, 301 Platt. Blvd., Claremont, CA, USA e-mail: npinckney@hmc.edu ABSTRACT This paper

More information

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs? EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Outline What are FPGAs? Why use FPGAs (a short history lesson). FPGA variations Internal logic

More information

DESIGN AND IMPLEMENTATION 0F 64-BIT PARALLEL PREFIX BRENTKUNG ADDER

DESIGN AND IMPLEMENTATION 0F 64-BIT PARALLEL PREFIX BRENTKUNG ADDER DESIGN AND IMPLEMENTATION 0F 64-BIT PARALLEL PREFIX BRENTKUNG ADDER V. Jeevan Kumar 1, N.Manasadevi 2, A.Hemalatha 3, M.Sai Kiran 4, P.Jhansi Rani 5 1 Asst. Professor, 2,3,4,5 Student, Dept of ECE, Sri

More information

Low-Area Low-Power Parallel Prefix Adder Based on Modified Ling Equations

Low-Area Low-Power Parallel Prefix Adder Based on Modified Ling Equations I J C T A, 9(18) 2016, pp. 8935-8943 International Science Press Low-Area Low-Power Parallel Prefix Adder Based on Modified Ling Equations Rohan Pinto * and Kumara Shama * ABSTRACT For the design and implementation

More information

Parallel FIR Filters. Chapter 5

Parallel FIR Filters. Chapter 5 Chapter 5 Parallel FIR Filters This chapter describes the implementation of high-performance, parallel, full-precision FIR filters using the DSP48 slice in a Virtex-4 device. ecause the Virtex-4 architecture

More information

Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT

Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT Design of Delay Efficient Arithmetic Based Split Radix FFT Nisha Laguri #1, K. Anusudha *2 #1 M.Tech Student, Electronics, Department of Electronics Engineering, Pondicherry University, Puducherry, India

More information

CHAPTER 9 MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES

CHAPTER 9 MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES CHAPTER 9 MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES This chapter in the book includes: Objectives Study Guide 9.1 Introduction 9.2 Multiplexers 9.3 Three-State Buffers 9.4 Decoders and Encoders

More information

Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques

Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques B.Bharathi 1, C.V.Subhaskar Reddy 2 1 DEPARTMENT OF ECE, S.R.E.C, NANDYAL 2 ASSOCIATE PROFESSOR, S.R.E.C, NANDYAL.

More information

A High Speed Design of 32 Bit Multiplier Using Modified CSLA

A High Speed Design of 32 Bit Multiplier Using Modified CSLA Journal From the SelectedWorks of Journal October, 2014 A High Speed Design of 32 Bit Multiplier Using Modified CSLA Vijaya kumar vadladi David Solomon Raju. Y This work is licensed under a Creative Commons

More information

ANALYZING THE PERFORMANCE OF CARRY TREE ADDERS BASED ON FPGA S

ANALYZING THE PERFORMANCE OF CARRY TREE ADDERS BASED ON FPGA S ANALYZING THE PERFORMANCE OF CARRY TREE ADDERS BASED ON FPGA S RENUKUNTLA KIRAN 1 & SUNITHA NAMPALLY 2 1,2 Ganapathy Engineering College E-mail: kiran00447@gmail.com, nsunitha566@gmail.com Abstract- In

More information

Design and Implementation of High Performance Parallel Prefix Adders

Design and Implementation of High Performance Parallel Prefix Adders Design and Implementation of High Performance Parallel Prefix Adders CH.Sudha Rani, CH.Ramesh Student, Department of ECE, Ganapathy Engineering College, Warangal, India. Associate Professor, Department

More information

VARUN AGGARWAL

VARUN AGGARWAL ECE 645 PROJECT SPECIFICATION -------------- Design A Microprocessor Functional Unit Able To Perform Multiplication & Division Professor: Students: KRIS GAJ LUU PHAM VARUN AGGARWAL GMU Mar. 2002 CONTENTS

More information

A Novel Carry-look ahead approach to an Unified BCD and Binary Adder/Subtractor

A Novel Carry-look ahead approach to an Unified BCD and Binary Adder/Subtractor A Novel Carry-look ahead approach to an Unified BCD and Binary Adder/Subtractor Abstract Increasing prominence of commercial, financial and internet-based applications, which process decimal data, there

More information

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE. 16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE. AditiPandey* Electronics & Communication,University Institute of Technology,

More information

Topics. Midterm Finish Chapter 7

Topics. Midterm Finish Chapter 7 Lecture 9 Topics Midterm Finish Chapter 7 Xilinx FPGAs Chapter 7 Spartan 3E Architecture Source: Spartan-3E FPGA Family Datasheet CLB Configurable Logic Blocks Each CLB contains four slices Each slice

More information

Area Delay Power Efficient Carry-Select Adder

Area Delay Power Efficient Carry-Select Adder Area Delay Power Efficient Carry-Select Adder Pooja Vasant Tayade Electronics and Telecommunication, S.N.D COE and Research Centre, Maharashtra, India ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Reduced Delay BCD Adder

Reduced Delay BCD Adder Reduced Delay BCD Adder Alp Arslan Bayrakçi and Ahmet Akkaş Computer Engineering Department Koç University 350 Sarıyer, İstanbul, Turkey abayrakci@ku.edu.tr ahakkas@ku.edu.tr Abstract Financial and commercial

More information

M.J. Flynn 1. Lecture 6 EE 486. Bit logic. Ripple adders. Add algorithms. Addition. EE 486 lecture 6: Integer Addition

M.J. Flynn 1. Lecture 6 EE 486. Bit logic. Ripple adders. Add algorithms. Addition. EE 486 lecture 6: Integer Addition EE 486 lecture 6: Integer Addition M. J. Flynn Computer Architecture & Arithmetic Group 1 Stanford University Computer Architecture & Arithmetic Group 2 Stanford University Addition The add function is

More information

University, Patiala, Punjab, India 1 2

University, Patiala, Punjab, India 1 2 1102 Design and Implementation of Efficient Adder based Floating Point Multiplier LOKESH BHARDWAJ 1, SAKSHI BAJAJ 2 1 Student, M.tech, VLSI, 2 Assistant Professor,Electronics and Communication Engineering

More information

Don t expect to be able to write and debug your code during the lab session.

Don t expect to be able to write and debug your code during the lab session. EECS150 Spring 2002 Lab 4 Verilog Simulation Mapping UNIVERSITY OF CALIFORNIA AT BERKELEY COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE Lab 4 Verilog Simulation Mapping

More information

DESIGN AND IMPLEMENTATION OF ADDER ARCHITECTURES AND ANALYSIS OF PERFORMANCE METRICS

DESIGN AND IMPLEMENTATION OF ADDER ARCHITECTURES AND ANALYSIS OF PERFORMANCE METRICS International Journal of Electronics and Communication Engineering and Technology (IJECET) Volume 8, Issue 5, September-October 2017, pp. 1 6, Article ID: IJECET_08_05_001 Available online at http://www.iaeme.com/ijecet/issues.asp?jtype=ijecet&vtype=8&itype=5

More information

FPGA for Software Engineers

FPGA for Software Engineers FPGA for Software Engineers Course Description This course closes the gap between hardware and software engineers by providing the software engineer all the necessary FPGA concepts and terms. The course

More information

ECE 645: Lecture 1. Basic Adders and Counters. Implementation of Adders in FPGAs

ECE 645: Lecture 1. Basic Adders and Counters. Implementation of Adders in FPGAs ECE 645: Lecture Basic Adders and Counters Implementation of Adders in FPGAs Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 5, Basic Addition and Counting,

More information

PINE TRAINING ACADEMY

PINE TRAINING ACADEMY PINE TRAINING ACADEMY Course Module A d d r e s s D - 5 5 7, G o v i n d p u r a m, G h a z i a b a d, U. P., 2 0 1 0 1 3, I n d i a Digital Logic System Design using Gates/Verilog or VHDL and Implementation

More information

A Unified Addition Structure for Moduli Set {2 n -1, 2 n,2 n +1} Based on a Novel RNS Representation

A Unified Addition Structure for Moduli Set {2 n -1, 2 n,2 n +1} Based on a Novel RNS Representation A Unified Addition Structure for Moduli Set { n -, n, n +} Based on a Novel RNS Representation Somayeh Timarchi,, Mahmood Fazlali,, and Sorin D.Cotofana Department of Electrical and Computer Engineering,

More information

IE1204 Digital Design L7: Combinational circuits, Introduction to VHDL

IE1204 Digital Design L7: Combinational circuits, Introduction to VHDL IE24 Digital Design L7: Combinational circuits, Introduction to VHDL Elena Dubrova KTH / ICT / ES dubrova@kth.se This lecture BV 38-339, 6-65, 28-29,34-365 IE24 Digital Design, HT 24 2 The multiplexer

More information

Tutorial 3. Appendix D. D.1 Design Using Verilog Code. The Ripple-Carry Adder Code. Functional Simulation

Tutorial 3. Appendix D. D.1 Design Using Verilog Code. The Ripple-Carry Adder Code. Functional Simulation Appendix D Tutorial 3 This tutorial introduces more advanced capabilities of the Quartus II system. We show how Verilog code is organized and compiled and illustrate how multibit signals are represented

More information

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase Abhay Sharma M.Tech Student Department of ECE MNNIT Allahabad, India ABSTRACT Tree Multipliers are frequently

More information

Circuit Design and Simulation with VHDL 2nd edition Volnei A. Pedroni MIT Press, 2010 Book web:

Circuit Design and Simulation with VHDL 2nd edition Volnei A. Pedroni MIT Press, 2010 Book web: Circuit Design and Simulation with VHDL 2nd edition Volnei A. Pedroni MIT Press, 2010 Book web: www.vhdl.us Appendix C Xilinx ISE Tutorial (ISE 11.1) This tutorial is based on ISE 11.1 WebPack (free at

More information

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Performance Analysis of CORDIC Architectures Targeted by FPGA Devices Guddeti Nagarjuna Reddy 1, R.Jayalakshmi 2, Dr.K.Umapathy

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

Adders, Subtracters and Accumulators in XC3000

Adders, Subtracters and Accumulators in XC3000 s, ubtracters and Accumulators in XC3000 XAPP 022.000 Application Note By PETER ALFKE and BERNIE NEW ummary This Application Note surveys the different adder techniques that are available for XC3000 designs.

More information

Lecture 3: Modeling in VHDL. EE 3610 Digital Systems

Lecture 3: Modeling in VHDL. EE 3610 Digital Systems EE 3610: Digital Systems 1 Lecture 3: Modeling in VHDL VHDL: Overview 2 VHDL VHSIC Hardware Description Language VHSIC=Very High Speed Integrated Circuit Programming language for modelling of hardware

More information

AN EFFICIENT REVERSE CONVERTER DESIGN VIA PARALLEL PREFIX ADDER

AN EFFICIENT REVERSE CONVERTER DESIGN VIA PARALLEL PREFIX ADDER AN EFFICIENT REVERSE CONVERTER DESIGN VIA PARALLEL PREFIX ADDER #1 BEERAM SANDHYARANI, M.Tech Student, #2 R.NARAIAH, Associate Professor, Department Of ECE VAAGESHWARI COLLEGE OF ENGINEERING, KARIMNAGAR,

More information

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 6c High-Speed Multiplication - III Spring 2017 Koren Part.6c.1 Array Multipliers The two basic operations - generation

More information

FPGA Matrix Multiplier

FPGA Matrix Multiplier FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri

More information

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic ECE 42/52 Rapid Prototyping with FPGAs Dr. Charlie Wang Department of Electrical and Computer Engineering University of Colorado at Colorado Springs Evolution of Implementation Technologies Discrete devices:

More information

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y Arithmetic A basic operation in all digital computers is the addition and subtraction of two numbers They are implemented, along with the basic logic functions such as AND,OR, NOT,EX- OR in the ALU subsystem

More information

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function. FPGA Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different

More information

Topics. Midterm Finish Chapter 7

Topics. Midterm Finish Chapter 7 Lecture 9 Topics Midterm Finish Chapter 7 ROM (review) Memory device in which permanent binary information is stored. Example: 32 x 8 ROM Five input lines (2 5 = 32) 32 outputs, each representing a memory

More information

Lecture 7. Standard ICs FPGA (Field Programmable Gate Array) VHDL (Very-high-speed integrated circuits. Hardware Description Language)

Lecture 7. Standard ICs FPGA (Field Programmable Gate Array) VHDL (Very-high-speed integrated circuits. Hardware Description Language) Standard ICs FPGA (Field Programmable Gate Array) VHDL (Very-high-speed integrated circuits Hardware Description Language) 1 Standard ICs PLD: Programmable Logic Device CPLD: Complex PLD FPGA: Field Programmable

More information

Performance of Constant Addition Using Enhanced Flagged Binary Adder

Performance of Constant Addition Using Enhanced Flagged Binary Adder Performance of Constant Addition Using Enhanced Flagged Binary Adder Sangeetha A UG Student, Department of Electronics and Communication Engineering Bannari Amman Institute of Technology, Sathyamangalam,

More information

Tutorial 2 Implementing Circuits in Altera Devices

Tutorial 2 Implementing Circuits in Altera Devices Appendix C Tutorial 2 Implementing Circuits in Altera Devices In this tutorial we describe how to use the physical design tools in Quartus II. In addition to the modules used in Tutorial 1, the following

More information

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011 FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level

More information

*Instruction Matters: Purdue Academic Course Transformation. Introduction to Digital System Design. Module 4 Arithmetic and Computer Logic Circuits

*Instruction Matters: Purdue Academic Course Transformation. Introduction to Digital System Design. Module 4 Arithmetic and Computer Logic Circuits Purdue IM:PACT* Fall 2018 Edition *Instruction Matters: Purdue Academic Course Transformation Introduction to Digital System Design Module 4 Arithmetic and Computer Logic Circuits Glossary of Common Terms

More information

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. Binary Arithmetic Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. MIT 6.004 Fall 2018 Reminder: Encoding Positive Integers Bit i in a binary representation (in right-to-left order)

More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

Verilog Module 1 Introduction and Combinational Logic

Verilog Module 1 Introduction and Combinational Logic Verilog Module 1 Introduction and Combinational Logic Jim Duckworth ECE Department, WPI 1 Module 1 Verilog background 1983: Gateway Design Automation released Verilog HDL Verilog and simulator 1985: Verilog

More information

ISSN Vol.02, Issue.11, December-2014, Pages:

ISSN Vol.02, Issue.11, December-2014, Pages: ISSN 2322-0929 Vol.02, Issue.11, December-2014, Pages:1208-1212 www.ijvdcs.org Implementation of Area Optimized Floating Point Unit using Verilog G.RAJA SEKHAR 1, M.SRIHARI 2 1 PG Scholar, Dept of ECE,

More information

High Speed Systolic Montgomery Modular Multipliers for RSA Cryptosystems

High Speed Systolic Montgomery Modular Multipliers for RSA Cryptosystems High Speed Systolic Montgomery Modular Multipliers for RSA Cryptosystems RAVI KUMAR SATZODA, CHIP-HONG CHANG and CHING-CHUEN JONG Centre for High Performance Embedded Systems Nanyang Technological University

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 6c High-Speed Multiplication - III Israel Koren Fall 2010 ECE666/Koren Part.6c.1 Array Multipliers

More information

C-Based Hardware Design

C-Based Hardware Design LECTURE 6 In this lecture we will introduce: The VHDL Language and its benefits. The VHDL entity Concurrent and Sequential constructs Structural design. Hierarchy Packages Various architectures Examples

More information

FPGA architecture and design technology

FPGA architecture and design technology CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA

More information

EXPERIMENT NUMBER 11 REGISTERED ALU DESIGN

EXPERIMENT NUMBER 11 REGISTERED ALU DESIGN 11-1 EXPERIMENT NUMBER 11 REGISTERED ALU DESIGN Purpose Extend the design of the basic four bit adder to include other arithmetic and logic functions. References Wakerly: Section 5.1 Materials Required

More information

Area-Delay-Power Efficient Carry-Select Adder

Area-Delay-Power Efficient Carry-Select Adder Area-Delay-Power Efficient Carry-Select Adder Shruthi Nataraj 1, Karthik.L 2 1 M-Tech Student, Karavali Institute of Technology, Neermarga, Mangalore, Karnataka 2 Assistant professor, Karavali Institute

More information

Implementation of 64-Bit Kogge Stone Carry Select Adder with ZFC for Efficient Area

Implementation of 64-Bit Kogge Stone Carry Select Adder with ZFC for Efficient Area Implementation of 64-Bit Kogge Stone Carry Select Adder with ZFC for Efficient Area B.Tapasvi J, tapasvio 7@gmail.com B. G.S.S.B.Lakshmi J, gssblbolisetty@gmail.com K.Bala Sinduri 2, k.b.sindhuri@gmail.com

More information

MODULO 2 n + 1 MAC UNIT

MODULO 2 n + 1 MAC UNIT Int. J. Elec&Electr.Eng&Telecoms. 2013 Sithara Sha and Shajimon K John, 2013 Research Paper MODULO 2 n + 1 MAC UNIT ISSN 2319 2518 www.ijeetc.com Vol. 2, No. 4, October 2013 2013 IJEETC. All Rights Reserved

More information

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL Abstract: Lingappagari Raju M.Tech, VLSI & Embedded Systems, SR International Institute of Technology. Carry Select Adder (CSLA) is

More information

Digital Circuit Design and Language. Datapath Design. Chang, Ik Joon Kyunghee University

Digital Circuit Design and Language. Datapath Design. Chang, Ik Joon Kyunghee University Digital Circuit Design and Language Datapath Design Chang, Ik Joon Kyunghee University Typical Synchronous Design + Control Section : Finite State Machine + Data Section: Adder, Multiplier, Shift Register

More information

Binary Adders. Ripple-Carry Adder

Binary Adders. Ripple-Carry Adder Ripple-Carry Adder Binary Adders x n y n x y x y c n FA c n - c 2 FA c FA c s n MSB position Longest delay (Critical-path delay): d c(n) = n d carry = 2n gate delays d s(n-) = (n-) d carry +d sum = 2n

More information

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier CHAPTER 3 METHODOLOGY 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier The design analysis starts with the analysis of the elementary algorithm for multiplication by

More information

Area Delay Power Efficient Carry Select Adder

Area Delay Power Efficient Carry Select Adder Area Delay Power Efficient Carry Select Adder Deeti Samitha M.Tech Student, Jawaharlal Nehru Institute of Engineering & Technology, IbrahimPatnam, Hyderabad. Abstract: Carry Select Adder (CSLA) is one

More information

Chapter 2 Basic Logic Circuits and VHDL Description

Chapter 2 Basic Logic Circuits and VHDL Description Chapter 2 Basic Logic Circuits and VHDL Description We cannot solve our problems with the same thinking we used when we created them. ----- Albert Einstein Like a C or C++ programmer don t apply the logic.

More information

An FPGA based Implementation of Floating-point Multiplier

An FPGA based Implementation of Floating-point Multiplier An FPGA based Implementation of Floating-point Multiplier L. Rajesh, Prashant.V. Joshi and Dr.S.S. Manvi Abstract In this paper we describe the parameterization, implementation and evaluation of floating-point

More information

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses Today Comments about assignment 3-43 Comments about assignment 3 ASICs and Programmable logic Others courses octor Per should show up in the end of the lecture Mealy machines can not be coded in a single

More information

Digital Design with FPGAs. By Neeraj Kulkarni

Digital Design with FPGAs. By Neeraj Kulkarni Digital Design with FPGAs By Neeraj Kulkarni Some Basic Electronics Basic Elements: Gates: And, Or, Nor, Nand, Xor.. Memory elements: Flip Flops, Registers.. Techniques to design a circuit using basic

More information

VLSI Implementation of Parallel CRC Using Pipelining, Unfolding and Retiming

VLSI Implementation of Parallel CRC Using Pipelining, Unfolding and Retiming IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 2, Issue 5 (May. Jun. 203), PP 66-72 e-issn: 239 4200, p-issn No. : 239 497 VLSI Implementation of Parallel CRC Using Pipelining, Unfolding

More information
