Design of Arithmetic circuits

Maximillian Williamson
Basic principle of pipelining

Traditional approach
Input data -> Process (< 100 ns), clocked by clk. Throughput: 10 MHz.

Pipelining approach
Throughput increases considerably. Chip area also increases. Latency comes into effect.
Input data -> Proc. 1 (< 10 ns) -> Reg. 1 -> ... -> Proc. 10 (< 10 ns) -> Reg. 10, each register clocked by clk. Throughput: 100 MHz.

Processing order

Time (ns)   Input    Reg. 1      Reg. 2     ...   Reg. 10
0           Data1
10          Data2    Proc.1_1
20          Data3    Proc.1_2    Proc.2_1
...
100         Data11   Proc.1_10   Proc.2_9   ...   Proc.10_1

Latency: 100 ns.

Partitioning of a design
Partition of data width
Partition of functionality

Partition of data width

Consider the example of a signed adder: eight signed input numbers, each 12 bits wide, whose sum is required. The conventional approach to addition/subtraction operates on all 12 bits together. Since full adders are used for the implementation, the result is delayed by the carry rippling through all 12 bits.

Even a carry look-ahead circuit does not help in speeding up the computation, since a large number of gates and gate inputs are required in this case. The answer to this problem is to divide the data width into smaller chunks and introduce pipelining. In the data width partitioning approach, all sub-blocks perform the same function.

Partition of functionality

In this method, the functional block is divided into smaller sub-blocks. In this type of partitioning, each sub-block performs, in general, a different function.

In the signed adder example to be presented, the LSBs (7 bits) of the eight numbers are added concurrently, followed by the addition of the MSBs (5 bits, along with the carry from the LSB addition) in subsequent pipeline stages.
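The 7-bit/5-bit split described above can be sketched as a small software model. This is a minimal illustration in Python rather than Verilog, and the helper names to_signed and add12_partitioned are hypothetical; it only shows that adding the LSBs first and the sign-extended MSBs plus carry afterwards gives the same result as a full-width addition.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add12_partitioned(a, b):
    """Add two 12-bit 2's complement words in two steps.

    Step 1 adds the 7 LSBs; step 2 adds the 5 MSBs (sign extended to
    6 bits) together with the carry from step 1.
    Returns the 13-bit 2's complement sum as a bit pattern.
    """
    a &= 0xFFF
    b &= 0xFFF
    lsb = (a & 0x7F) + (b & 0x7F)        # 8-bit result, bit 7 is the carry
    carry = lsb >> 7
    a_msb = to_signed(a >> 7, 5) & 0x3F  # sign extend 5 -> 6 bits
    b_msb = to_signed(b >> 7, 5) & 0x3F
    msb = (a_msb + b_msb + carry) & 0x3F # carry out of bit 5 is ignored
    return (msb << 7) | (lsb & 0x7F)


if __name__ == "__main__":
    for a, b in [(0x7FF, 0x7FF), (0x800, 0x800), (0xFFF, 0x001)]:
        total = to_signed(add12_partitioned(a, b), 13)
        print(hex(a), "+", hex(b), "=", total)
```

In hardware the two steps sit in consecutive pipeline stages, so each stage only has to propagate a carry through 7 or 6 bits instead of 12.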

ADDER CAN BE REALIZED IN TWO DIFFERENT WAYS:
Feeding inputs serially
Feeding inputs concurrently

SERIAL SIGNED ADDER DESIGN

[Block diagram: an adder accumulates the serial input n [11:0] (n0 to n7) into sum [14:0]; control inputs are enable and clk.]

// Pipelined Serial Signed Adder Design - Verilog Code
// Adds eight 12-bit, 2's complement numbers. Feed inputs serially at 'n'.
// Eight pipeline stages @ posedge of clk.
// Result, sum, is 15 bits wide, in 2's complement (registered output).

module serial_adder12s ( clk, enable, n, sum, sum_valid, result ) ;

input         clk ;
input         enable ;
input  [11:0] n ;
output [14:0] sum ;
output        sum_valid ;
output [14:0] result ;     // Holds the result till it is overwritten by the new result.

wire [14:0] sum_next ;
wire [2:0]  cnt_next ;
wire        sum_val ;
reg  [14:0] sum ;
reg  [2:0]  cnt ;
reg         sum_valid ;
reg  [14:0] result ;

assign sum_next[14:0] = enable ? ({{3{n[11]}}, n[11:0]} + sum[14:0]) : 0 ;
                                     // Sign extend & accumulate.
assign cnt_next[2:0]  = enable ? (cnt + 1) : 0 ;
                                     // Pre-advance the counter.
assign sum_val        = (cnt == 7) ? 1 : 0 ;
                                     // Pre-determine the validity of the sum.

always @ (posedge clk)               // Pipeline - register the sum.
begin
  sum[14:0] <= sum_next[14:0] ;      // Register the sum.
  cnt[2:0]  <= cnt_next[2:0] ;       // Advance the count.
  sum_valid <= sum_val ;             // Register the signal.
end

always @ (posedge clk)               // Hold the result till it is overwritten by the new result.
begin
  result[14:0] <= sum_valid ? sum[14:0] : result[14:0] ;
                                     // Register the result (non-blocking, since this is a clocked block).
end

endmodule

// Test Bench for Serial Adder Design

`define clkperiodby2 10
`include "serial_adder12s.v"

module serial_adder12s_test ( sum, sum_valid, result ) ;

output [14:0] sum ;
output        sum_valid ;
output [14:0] result ;

reg        clk ;
reg        enable ;
reg [11:0] n ;

serial_adder12s u1( .clk(clk), .enable(enable), .n(n),
                    .sum(sum), .sum_valid(sum_valid), .result(result) ) ;

initial
begin
  clk = 1'b0 ;
  // Apply first set of inputs sequentially every 20 ns.
  n = 12'h0 ;                    // 0 ns.
  enable = 0 ;
  #20 enable = 1 ;
  #17 n = 12'hfff ;              // 37 ns.
  #20 n = 12'h7ff ;              // 57 ns, etc.
  #20 n = 12'h800 ;
  #20 n = 12'h001 ;
  #20 n = 12'h001 ;
  #20 n = 12'h7ff ;
  #20 n = 12'haaa ;              // 157 ns.
  #20 n = 12'h0 ; enable = 0 ;   // Disable before applying the next set of
                                 // inputs so that the accumulated sum is cleared.
  #20 enable = 1 ;               // Apply the next set of inputs.
      n = 100 ;                  // n0
  #20 n = 200 ;
  #20 n = 300 ;
  #20 n = 400 ;
  #20 n = 500 ;
  #20 n = 100 ;
  #20 n = 200 ;
  #20 n = 247 ;                  // n7
  #20 enable = 0 ;
  #100 $stop ;
end

always
  #`clkperiodby2 clk <= ~clk ;   // Run the clock at 50 MHz.

endmodule

Simulation results of serial signed adder
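As a cross-check for the waveforms, the accumulation performed by the serial adder can be modelled in a few lines of Python. This is only a behavioural sketch: the function names are hypothetical, clocking and the sum_valid handshake are not modelled, and the first list assumes that the initial n = 0 is the first of the eight words sampled while enable is high.

```python
def serial_add12s(samples):
    """Accumulate 12-bit 2's complement words into a 15-bit register."""
    total = 0
    for word in samples:
        word &= 0xFFF
        # Sign extend: copy bit 11 into bits 14..12.
        extended = word | (0x7000 if word & 0x800 else 0)
        total = (total + extended) & 0x7FFF   # 15-bit wrap-around
    return total


def as_signed15(pattern):
    """Read a 15-bit pattern as a 2's complement number."""
    return pattern - 0x8000 if pattern & 0x4000 else pattern


if __name__ == "__main__":
    first_set = [0x000, 0xFFF, 0x7FF, 0x800, 0x001, 0x001, 0x7FF, 0xAAA]
    second_set = [100, 200, 300, 400, 500, 100, 200, 247]
    print(as_signed15(serial_add12s(first_set)))
    print(as_signed15(serial_add12s(second_set)))
```

Three extra bits are enough for eight inputs because the sum of eight 12-bit numbers always lies between -16384 and 16376.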

Synplify results

Max. frequency of operation: 138 MHz.
Mapping to part: xcv600ehq240-8

Cell usage:
MUXCY_L   14 uses
XORCY     14 uses
FDR       19 uses
FDE       15 uses
GND       1 use

I/O primitives:
IBUF      13 uses
OBUF      31 uses
BUFGP     1 use

I/O Register bits: 15
Register bits not including I/Os: 19 (0%)
Global Clock Buffers: 1 of 4 (25%)

Mapping Summary:
Total LUTs: 18 (0%)
Mapper successful!

Xilinx P&R Results

Design Summary:
Number of errors: 0
Number of warnings: 0
Number of Slices: 11 out of 6,912 1%
Number of Slices containing unrelated logic: 0 out of 11 0%
Number of Slice Flip Flops: 19 out of 13,824 1%
Number of 4 input LUTs: 18 out of 13,824 1%
Number of bonded IOBs: 44 out of  %
IOB Flip Flops: 15
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 464
Additional JTAG gate count for IOBs: 2,160
Mapping completed.
Maximum frequency: MHz

PARALLEL SIGNED ADDER DESIGN

[Block diagram: module adder12s with one 12-bit input per number, n0 [11:0] to n7 [11:0].]

2's complement evaluation (shortcut)

For data bits [8]...[0]: scanning from the LSB, retain the first 1 together with the 0s to its right, and invert all the other bits.

The sign can be extended by any number of bits without affecting the actual value. Sign extension means duplicating the MSB ([8] = [7]). When sign-extended numbers are added, the carry out of the extended bit is ignored.
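Both rules can be tried out on bit strings. The following Python sketch is illustrative only; the function names and the example values (6, -6 and 100) are assumptions chosen for the demonstration, not values taken from the slides.

```python
def twos_complement_shortcut(bits):
    """Negate a 2's complement bit string (MSB first).

    Keep the rightmost 1 and the 0s after it; invert every bit to its left.
    """
    last_one = bits.rfind("1")
    if last_one == -1:
        return bits                      # zero is its own complement
    head = "".join("1" if b == "0" else "0" for b in bits[:last_one])
    return head + bits[last_one:]


def sign_extend(bits, extra=1):
    """Duplicate the MSB `extra` times; the value is unchanged."""
    return bits[0] * extra + bits


def add_with_sign_extension(a, b):
    """Add two equal-width numbers after extending each by one bit."""
    width = len(a) + 1
    total = int(sign_extend(a), 2) + int(sign_extend(b), 2)
    # Keep `width` bits only, i.e. ignore the carry out of the extended bit.
    return format(total & ((1 << width) - 1), "0{}b".format(width))


if __name__ == "__main__":
    print(twos_complement_shortcut("00000110"))              # 6 -> -6
    print(sign_extend("11111010"))                           # -6 in 9 bits
    print(add_with_sign_extension("01100100", "01100100"))   # 100 + 100
```

In the last call an 8-bit result would have bit [7] set and read as negative, whereas the 9-bit result keeps a 0 in the sign position.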

Without the sign extension, the MSB [7] of a high positive value would be mistaken for the sign bit of a negative number.

Pipelined design partition

[Figure: pipelined design partition. Inputs n0 [11:0] to n7 [11:0]; a first stage and a second stage, each with clocked LSB and MSB registers holding the result.]

Verilog code for signed adder

// Adds eight 12-bit, 2's complement numbers, n0 to n7.
// Five pipeline stages @ posedge clk.
// Result, sum, is in 15-bit, 2's complement (not registered).

module adder12s ( clk, n0, n1, n2, n3, n4, n5, n6, n7, sum ) ;

input         clk ;
input  [11:0] n0, n1, n2, n3, n4, n5, n6, n7 ;
output [14:0] sum ;

wire [7:0] s00_lsb, s01_lsb, s02_lsb, s03_lsb ;
wire [5:0] s00_msb, s01_msb, s02_msb, s03_msb ;
wire [7:0] s10_lsb, s11_lsb ;
wire [6:0] s10_msb, s11_msb ;
wire [7:0] s20_lsb ;

reg [11:7] n0_reg1, n1_reg1, n2_reg1, n3_reg1 ;
reg [11:7] n4_reg1, n5_reg1, n6_reg1, n7_reg1 ;
reg [7:0]  s00_lsbreg1, s01_lsbreg1, s02_lsbreg1, s03_lsbreg1 ;
reg [5:0]  s00_msbreg2, s01_msbreg2, s02_msbreg2, s03_msbreg2 ;
reg [6:0]  s00_lsbreg2, s01_lsbreg2, s02_lsbreg2, s03_lsbreg2 ;
reg [7:0]  s10_lsbreg3, s11_lsbreg3 ;
reg [5:0]  s00_msbreg3, s01_msbreg3, s02_msbreg3, s03_msbreg3 ;
reg [6:0]  s10_lsbreg4, s11_lsbreg4 ;
reg [6:0]  s10_msbreg4, s11_msbreg4 ;
reg [6:0]  s10_msbreg5, s11_msbreg5 ;
reg        s20_lsbreg5cy ;
reg [6:0]  s20_lsbreg5 ;

// First stage addition.
// n0-n7 lsb need not be registered since addition is already carried out here.
assign s00_lsb[7:0] = n0[6:0] + n1[6:0] ;   // Add lsb first - s00_lsb[7] is the carry.
assign s01_lsb[7:0] = n2[6:0] + n3[6:0] ;
assign s02_lsb[7:0] = n4[6:0] + n5[6:0] ;
assign s03_lsb[7:0] = n6[6:0] + n7[6:0] ;

always @ (posedge clk)   // Pipeline 1: clk (1). Register msb to continue addition of msb.
begin
  n0_reg1[11:7] <= n0[11:7] ;   // Preserve all inputs for msb addition during the clk (2).
  n1_reg1[11:7] <= n1[11:7] ;
  n2_reg1[11:7] <= n2[11:7] ;
  n3_reg1[11:7] <= n3[11:7] ;
  n4_reg1[11:7] <= n4[11:7] ;
  n5_reg1[11:7] <= n5[11:7] ;
  n6_reg1[11:7] <= n6[11:7] ;
  n7_reg1[11:7] <= n7[11:7] ;
  s00_lsbreg1[7:0] <= s00_lsb[7:0] ;   // Preserve all lsb sum. s00_lsbreg1[7] is the
                                       // registered carry from lsb addition.
  s01_lsbreg1[7:0] <= s01_lsb[7:0] ;
  s02_lsbreg1[7:0] <= s02_lsb[7:0] ;
  s03_lsbreg1[7:0] <= s03_lsb[7:0] ;
end

// Sign extended & msb added with carry. s00_msb[6] is ignored.
assign s00_msb[5:0] = {n0_reg1[11], n0_reg1[11:7]} + {n1_reg1[11], n1_reg1[11:7]} + s00_lsbreg1[7] ;
assign s01_msb[5:0] = {n2_reg1[11], n2_reg1[11:7]} + {n3_reg1[11], n3_reg1[11:7]} + s01_lsbreg1[7] ;
assign s02_msb[5:0] = {n4_reg1[11], n4_reg1[11:7]} + {n5_reg1[11], n5_reg1[11:7]} + s02_lsbreg1[7] ;
assign s03_msb[5:0] = {n6_reg1[11], n6_reg1[11:7]} + {n7_reg1[11], n7_reg1[11:7]} + s03_lsbreg1[7] ;

always @ (posedge clk)   // Pipeline 2: clk (2). Register msb to continue addition of msb.
begin
  s00_msbreg2[5:0] <= s00_msb[5:0] ;       // Preserve all msb sum.
  s01_msbreg2[5:0] <= s01_msb[5:0] ;
  s02_msbreg2[5:0] <= s02_msb[5:0] ;
  s03_msbreg2[5:0] <= s03_msb[5:0] ;
  s00_lsbreg2[6:0] <= s00_lsbreg1[6:0] ;   // Preserve all lsb sum.
  s01_lsbreg2[6:0] <= s01_lsbreg1[6:0] ;
  s02_lsbreg2[6:0] <= s02_lsbreg1[6:0] ;
  s03_lsbreg2[6:0] <= s03_lsbreg1[6:0] ;
end

// Second stage addition.
// s00, s01 lsbs need not be registered since addition is already carried out here.
assign s10_lsb[7:0] = s00_lsbreg2[6:0] + s01_lsbreg2[6:0] ;   // Add lsb first - s10_lsb[7] is the carry.
assign s11_lsb[7:0] = s02_lsbreg2[6:0] + s03_lsbreg2[6:0] ;

always @ (posedge clk)   // Pipeline 3: clk (3). Register msb to continue addition of msb.
begin
  s10_lsbreg3[7:0] <= s10_lsb[7:0] ;       // Preserve all lsb sum.
  s11_lsbreg3[7:0] <= s11_lsb[7:0] ;
  s00_msbreg3[5:0] <= s00_msbreg2[5:0] ;   // Preserve all msb sum.
  s01_msbreg3[5:0] <= s01_msbreg2[5:0] ;
  s02_msbreg3[5:0] <= s02_msbreg2[5:0] ;
  s03_msbreg3[5:0] <= s03_msbreg2[5:0] ;
end

// Add MSB of 2nd stage with sign extension and carry in from LSB. s10_msb[7] is ignored.
assign s10_msb[6:0] = {s00_msbreg3[5], s00_msbreg3[5:0]} + {s01_msbreg3[5], s01_msbreg3[5:0]} + s10_lsbreg3[7] ;
assign s11_msb[6:0] = {s02_msbreg3[5], s02_msbreg3[5:0]} + {s03_msbreg3[5], s03_msbreg3[5:0]} + s11_lsbreg3[7] ;

always @ (posedge clk)   // Pipeline 4: clk (4). Register msb to continue addition of msb.
begin
  s10_lsbreg4[6:0] <= s10_lsbreg3[6:0] ;   // Preserve all lsb sum.
  s11_lsbreg4[6:0] <= s11_lsbreg3[6:0] ;
  s10_msbreg4[6:0] <= s10_msb[6:0] ;       // Preserve all msb sum.
  s11_msbreg4[6:0] <= s11_msb[6:0] ;
end

// Third stage addition.
assign s20_lsb[7:0] = s10_lsbreg4[6:0] + s11_lsbreg4[6:0] ;   // Add lsb first - s20_lsb[7] is the carry.

always @ (posedge clk)   // Pipeline 5: clk (5). Register msb to continue addition of msb.
begin
  s10_msbreg5[6:0] <= s10_msbreg4[6:0] ;   // Preserve all msb sum.
  s11_msbreg5[6:0] <= s11_msbreg4[6:0] ;
  s20_lsbreg5cy    <= s20_lsb[7] ;         // Preserve all lsb sum.
  s20_lsbreg5[6:0] <= s20_lsb[6:0] ;
end

// Add third stage MSB result and concatenate with LSB result to get the final result.
assign sum[14:0] = { ({s10_msbreg5[6], s10_msbreg5[6:0]} +
                      {s11_msbreg5[6], s11_msbreg5[6:0]} +
                      s20_lsbreg5cy),
                     s20_lsbreg5[6:0] } ;

endmodule
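The data flow of the listing above (three adder levels, each split into a 7-bit LSB addition followed by a sign-extended MSB addition with carry) can be mirrored in a short Python reference model. It is a sketch for checking expected sums only: the names add_level and adder12s_model are hypothetical, and the pipeline registers and the five-clock latency are not modelled.

```python
def to_signed(pattern, bits):
    """Read the low `bits` bits of pattern as a 2's complement number."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def add_level(a, b, width):
    """Add two `width`-bit patterns; return a (width + 1)-bit pattern.

    The 7 LSBs are added first; the MSBs are sign extended by one bit
    and added together with the carry from the LSB addition.
    """
    msb_bits = width - 7
    lsb = (a & 0x7F) + (b & 0x7F)
    a_msb = to_signed(a >> 7, msb_bits)
    b_msb = to_signed(b >> 7, msb_bits)
    msb = (a_msb + b_msb + (lsb >> 7)) & ((1 << (msb_bits + 1)) - 1)
    return (msb << 7) | (lsb & 0x7F)


def adder12s_model(numbers):
    """Sum eight 12-bit words pairwise: 12 -> 13 -> 14 -> 15 bits."""
    level = [x & 0xFFF for x in numbers]
    width = 12
    while len(level) > 1:
        level = [add_level(level[i], level[i + 1], width)
                 for i in range(0, len(level), 2)]
        width += 1
    return level[0]


if __name__ == "__main__":
    for vec in ([0x7FF] * 8, [0x800] * 8, [0xAAA, 0x555] * 4):
        print(to_signed(adder12s_model(vec), 15))
```

Each level grows the word by one bit, which is why the MSB fields in the Verilog widen from 6 to 7 to 8 bits while the LSB field stays at 7 bits.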

TEST BENCH FOR PARALLEL SIGNED ADDER DESIGN

`define clkperiodby2 10
`include "adder12s_banno.v"   // Use back annotated file.

module adder12s_test ( sum ) ;

output [14:0] sum ;

reg        clk ;
reg [11:0] n0 ;
reg [11:0] n1 ;
reg [11:0] n2 ;
reg [11:0] n3 ;
reg [11:0] n4 ;
reg [11:0] n5 ;
reg [11:0] n6 ;
reg [11:0] n7 ;

adder12s u1( .clk(clk), .n0(n0), .n1(n1), .n2(n2), .n3(n3),
             .n4(n4), .n5(n5), .n6(n6), .n7(n7), .sum(sum) ) ;

initial
begin
  clk = 1'b0 ;
      n0 = 12'h0 ;   n1 = 12'h0 ;   n2 = 12'h0 ;   n3 = 12'h0 ;
      n4 = 12'h0 ;   n5 = 12'h0 ;   n6 = 12'h0 ;   n7 = 12'h0 ;
  #17 n0 = 12'hfff ; n1 = 12'hfff ; n2 = 12'hfff ; n3 = 12'hfff ;
      n4 = 12'hfff ; n5 = 12'hfff ; n6 = 12'hfff ; n7 = 12'hfff ;
  #20 n0 = 12'h7ff ; n1 = 12'h7ff ; n2 = 12'h7ff ; n3 = 12'h7ff ;
      n4 = 12'h7ff ; n5 = 12'h7ff ; n6 = 12'h7ff ; n7 = 12'h7ff ;
  #20 n0 = 12'h800 ; n1 = 12'h800 ; n2 = 12'h800 ; n3 = 12'h800 ;
      n4 = 12'h800 ; n5 = 12'h800 ; n6 = 12'h800 ; n7 = 12'h800 ;
  #20 n0 = 12'h001 ; n1 = 12'h001 ; n2 = 12'h001 ; n3 = 12'h001 ;
      n4 = 12'h001 ; n5 = 12'h001 ; n6 = 12'h001 ; n7 = 12'h001 ;
  #20 n0 = 12'h001 ; n1 = 12'hfff ; n2 = 12'h001 ; n3 = 12'hfff ;
      n4 = 12'h001 ; n5 = 12'hfff ; n6 = 12'h001 ; n7 = 12'hfff ;
  #20 n0 = 12'h7ff ; n1 = 12'h7ff ; n2 = 12'h7ff ; n3 = 12'h7ff ;
      n4 = 12'h801 ; n5 = 12'h801 ; n6 = 12'h801 ; n7 = 12'h801 ;
  #20 n0 = 12'haaa ; n1 = 12'h555 ; n2 = 12'haaa ; n3 = 12'h555 ;
      n4 = 12'haaa ; n5 = 12'h555 ; n6 = 12'haaa ; n7 = 12'h555 ;
  #20 n0 = 12'h0 ;   n1 = 12'h0 ;   n2 = 12'h0 ;   n3 = 12'h0 ;
      n4 = 12'h0 ;   n5 = 12'h0 ;   n6 = 12'h0 ;   n7 = 12'h0 ;
  #400 $stop ;
end

always
  #`clkperiodby2 clk <= ~clk ;

endmodule

Simulation results of eight input parallel signed adder

Synplify synthesis results

dvlsi_des_verilog\adder12s.v"
Verilog syntax check successful!
Selecting top level module adder12s
Synthesizing module adder12s

Performance Summary
*******************
Worst slack in design:

Starting Clock   Requested Frequency   Estimated Frequency   Requested Period   Estimated Period   Slack   Clock Type
clk              MHz                   MHz                                                                 inferred

Resource Usage Report for adder12s
Mapping to part: xcv600ehq240-8

Cell usage:
MUXCY_L   81 uses
XORCY     88 uses
MUXCY     7 uses
FD        214 uses
GND       1 use

I/O primitives:
IBUF      96 uses
OBUF      15 uses
BUFGP     1 use

I/O Register bits: 47
Register bits not including I/Os: 167 (1%)
Global Clock Buffers: 1 of 4 (25%)

Mapping Summary:
Total LUTs: 95 (0%)
Mapper successful!

Xilinx P&R Results

Design Summary:
Number of errors: 0
Number of warnings: 0
Number of Slices: 97 out of 6,912 1%
Number of Slices containing unrelated logic: 0 out of 97 0%
Number of Slice Flip Flops: 167 out of 13,824 1%
Number of 4 input LUTs: 95 out of 13,824 1%
Number of bonded IOBs: 111 out of  %
IOB Flip Flops: 47
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 2,810
Additional JTAG gate count for IOBs: 5,376
Mapping completed.

Timing summary:
Design statistics:
Minimum period: 6.563ns (Maximum frequency: MHz)
Minimum input arrival time before clock: 4.259ns
Minimum output required time after clock: ns

Running DRC.
DRC detected 0 errors and 0 warnings.
Creating bit map...
Saving bit stream in "adder12s.bit".
Creating bit mask...
Saving mask bit stream in "adder12s.msk".

Bitstream generation is complete.

COMPARISON OF SERIAL ADDER AND PARALLEL ADDER WITH EIGHT INPUTS

Type of adder                      Serial    Parallel
No. of i/p clk cycles              8         1
No. of o/p clk cycles              9         1
Gate count                         464       2,810
JTAG gate count                    2,160     5,376
Max. freq. of operation in MHz

MULTIPLIER DESIGN - A NEW ALGORITHM

[Block diagram: module mult11sx8s with inputs n1 [10:0], n2 [7:0] and clk; 8 pipeline stages.]

Example: Consider the evaluation of the product of two signed numbers: 1023 x -128.

[Worked example: the binary, signed representations of the two numbers are converted to n1 (magnitude) and n2 (magnitude); multiplying the magnitudes bit by bit gives the partial products P1 to P8, which are added to obtain the product (magnitude).]
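The sign-magnitude scheme of the example can be sketched in Python as follows. This is an illustration under assumptions: the function name partial_products is hypothetical, and it works on ordinary integers rather than on the 11-bit and 8-bit patterns used by the hardware.

```python
def partial_products(n1, n2):
    """Multiply two signed integers via magnitudes and partial products.

    Returns (list of P1..P8, signed product). P(i+1) is n1's magnitude
    if bit i of n2's magnitude is 1, otherwise 0; it carries weight 2**i.
    """
    negative = (n1 < 0) != (n2 < 0)      # sign of the result
    n1_mag, n2_mag = abs(n1), abs(n2)
    pps = [n1_mag if (n2_mag >> i) & 1 else 0 for i in range(8)]
    magnitude = sum(pp << i for i, pp in enumerate(pps))
    return pps, (-magnitude if negative else magnitude)


if __name__ == "__main__":
    pps, product = partial_products(1023, -128)
    print(pps)        # only P8 is non-zero, since 128 = 2**7
    print(product)    # -130944
```

Working on magnitudes keeps every partial product non-negative, so the adder tree needs no sign extension; the sign is applied once at the end.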

Pipelined design partition

[Figure: the partial products P1 to P8 are added in pairs through successive adder stages; within each stage the LSBs are added first and the MSBs are added afterwards.]

Verilog code for multiplier

// Signed multiplication of two numbers, n1 (11-bit) & n2 (8-bit).
// n1 (partial product, CX for example) is the multiplicand, and is signed.
// n2 (cos term, CT for example) is the signed multiplier.
// Result (CX)CT is in twos complement.
// CX, CT are used in DCTQ Processor.
// This module has eight pipeline stages to increase the speed - input is not registered.

module mult11sx8s ( clk, n1, n2, result ) ;

input         clk ;
input  [10:0] n1 ;
input  [7:0]  n2 ;
output [18:0] result ;

wire        n1orn2z ;
wire [10:0] p1, p2, p3, p4, p5, p6, p7, p8 ;
wire [6:0]  s11a, s12a, s13a, s14a ;
wire [5:0]  s11b, s12b, s13b, s14b ;
wire [12:0] s11, s12, s13, s14 ;
wire [7:0]  s21a, s22a ;
wire [6:0]  s21b, s22b ;
wire [14:0] s21, s22 ;
wire [8:0]  s31a ;
wire [7:0]  s31b ;
wire [17:0] s31 ;
wire        res_sign ;
wire [18:0] res ;

reg [10:0] n1_mag ;
reg [7:0]  n2_mag ;
reg [10:0] p1_reg1, p2_reg1, p3_reg1, p4_reg1, p5_reg1, p6_reg1, p7_reg1, p8_reg1 ;
reg [6:0]  s11a_reg2, s12a_reg2, s13a_reg2, s14a_reg2 ;
reg        n1_reg1, n1_reg2, n1_reg3, n1_reg4, n1_reg5, n1_reg6, n1_reg7 ;
reg        n2_reg1, n2_reg2, n2_reg3, n2_reg4, n2_reg5, n2_reg6, n2_reg7 ;
reg        n1orn2z_reg1, n1orn2z_reg2, n1orn2z_reg3, n1orn2z_reg4 ;
reg        n1orn2z_reg5, n1orn2z_reg6, n1orn2z_reg7 ;
reg [10:0] p1_reg2, p2_reg2, p3_reg2, p4_reg2, p5_reg2, p6_reg2, p7_reg2, p8_reg2 ;
reg [12:0] s11_reg3, s12_reg3, s13_reg3, s14_reg3 ;
reg [12:0] s11_reg4, s12_reg4, s13_reg4, s14_reg4 ;
reg [7:0]  s21a_reg4, s22a_reg4 ;
reg [14:0] s21_reg5, s22_reg5 ;
reg [14:0] s21_reg6, s22_reg6 ;
reg [8:0]  s31a_reg6 ;
reg [17:0] s31_reg7 ;
reg [18:0] result ;

always @ (n1)
begin
  if (n1[10] == 1'b0)
    n1_mag = n1[10:0] ;
  else
    n1_mag = ~n1[10:0] + 1 ;   // Evaluate twos complement.
end

always @ (n2)
begin
  if (n2[7] == 1'b0)
    n2_mag = n2[7:0] ;
  else
    n2_mag = ~n2[7:0] + 1 ;    // Evaluate twos complement.
end

assign n1orn2z = ((n1 == 11'b0) || (n2 == 8'b0)) ? 1'b1 : 1'b0 ;
                               // If n1 or n2 is zero, make final result +0.

// Compute the partial products: n1 multiplied by n2 bit '0', etc.
assign p1 = n1_mag[10:0] & {11{n2_mag[0]}} ;
assign p2 = n1_mag[10:0] & {11{n2_mag[1]}} ;
assign p3 = n1_mag[10:0] & {11{n2_mag[2]}} ;
assign p4 = n1_mag[10:0] & {11{n2_mag[3]}} ;
assign p5 = n1_mag[10:0] & {11{n2_mag[4]}} ;
assign p6 = n1_mag[10:0] & {11{n2_mag[5]}} ;
assign p7 = n1_mag[10:0] & {11{n2_mag[6]}} ;
assign p8 = n1_mag[10:0] & {11{n2_mag[7]}} ;

always @ (posedge clk)   // This is the first pipeline register, clk (1).
begin
  p1_reg1 <= p1 ;
  p2_reg1 <= p2 ;
  p3_reg1 <= p3 ;
  p4_reg1 <= p4 ;
  p5_reg1 <= p5 ;
  p6_reg1 <= p6 ;
  p7_reg1 <= p7 ;
  p8_reg1 <= p8 ;
  n1_reg1 <= n1[10] ;
  n2_reg1 <= n2[7] ;
  n1orn2z_reg1 <= n1orn2z ;
end

// p1_reg1, etc. means p1, etc. are registered after positive edge of clk (1), clk (2), etc.
// LSB is added here. Note the left shifts are taken care of for p1, p3, p5 and p7.
// p1_reg1[0], etc. will be processed at the clk (2). s11a[6], etc. are the carry bits.
assign s11a[6:0] = p1_reg1[6:1] + p2_reg1[5:0] ;
assign s12a[6:0] = p3_reg1[6:1] + p4_reg1[5:0] ;
assign s13a[6:0] = p5_reg1[6:1] + p6_reg1[5:0] ;
assign s14a[6:0] = p7_reg1[6:1] + p8_reg1[5:0] ;

always @ (posedge clk)   // This is the second pipeline register, clk (2).
begin
  s11a_reg2 <= s11a ;                   // Store LSB partial sums.
  s12a_reg2 <= s12a ;
  s13a_reg2 <= s13a ;
  s14a_reg2 <= s14a ;
  p1_reg2[10:7] <= p1_reg1[10:7] ;      // Store MSB of partial products.
  p2_reg2[10:6] <= p2_reg1[10:6] ;
  p3_reg2[10:7] <= p3_reg1[10:7] ;
  p4_reg2[10:6] <= p4_reg1[10:6] ;
  p5_reg2[10:7] <= p5_reg1[10:7] ;
  p6_reg2[10:6] <= p6_reg1[10:6] ;
  p7_reg2[10:7] <= p7_reg1[10:7] ;
  p8_reg2[10:6] <= p8_reg1[10:6] ;
  p1_reg2[0] <= p1_reg1[0] ;            // Store '0' th bit since it is not yet processed.
  p3_reg2[0] <= p3_reg1[0] ;
  p5_reg2[0] <= p5_reg1[0] ;
  p7_reg2[0] <= p7_reg1[0] ;
  n1_reg2 <= n1_reg1 ;                  // Also store sign bits and zero status.
  n2_reg2 <= n2_reg1 ;
  n1orn2z_reg2 <= n1orn2z_reg1 ;
end

// MSB is added here along with carry.
assign s11b[5:0] = {1'b0, p1_reg2[10:7]} + p2_reg2[10:6] + s11a_reg2[6] ;
assign s12b[5:0] = {1'b0, p3_reg2[10:7]} + p4_reg2[10:6] + s12a_reg2[6] ;
assign s13b[5:0] = {1'b0, p5_reg2[10:7]} + p6_reg2[10:6] + s13a_reg2[6] ;
assign s14b[5:0] = {1'b0, p7_reg2[10:7]} + p8_reg2[10:6] + s14a_reg2[6] ;

// MSBs & LSBs are concatenated here: MSB, LSB, '0' th bit respectively.
assign s11[12:0] = {s11b, s11a_reg2[5:0], p1_reg2[0]} ;
assign s12[12:0] = {s12b, s12a_reg2[5:0], p3_reg2[0]} ;
assign s13[12:0] = {s13b, s13a_reg2[5:0], p5_reg2[0]} ;
assign s14[12:0] = {s14b, s14a_reg2[5:0], p7_reg2[0]} ;

always @ (posedge clk)   // This is the third pipeline register, clk (3).
begin
  s11_reg3 <= s11 ;                     // Store first stage results for further processing.
  s12_reg3 <= s12 ;
  s13_reg3 <= s13 ;
  s14_reg3 <= s14 ;
  n1_reg3 <= n1_reg2 ;
  n2_reg3 <= n2_reg2 ;
  n1orn2z_reg3 <= n1orn2z_reg2 ;
end

// LSB sum, 2nd stage. s21a[7] is the carry.
assign s21a[7:0] = s11_reg3[8:2] + s12_reg3[6:0] ;
assign s22a[7:0] = s13_reg3[8:2] + s14_reg3[6:0] ;

always @ (posedge clk)   // This is the fourth pipeline register, clk (4).
begin
  s11_reg4[12:9] <= s11_reg3[12:9] ;    // Store bits not yet processed.
  s11_reg4[1:0]  <= s11_reg3[1:0] ;
  s12_reg4[12:7] <= s12_reg3[12:7] ;
  s13_reg4[12:9] <= s13_reg3[12:9] ;
  s13_reg4[1:0]  <= s13_reg3[1:0] ;
  s14_reg4[12:7] <= s14_reg3[12:7] ;
  s21a_reg4 <= s21a ;                   // Store LSB, second stage partial sums.
  s22a_reg4 <= s22a ;
  n1_reg4 <= n1_reg3 ;
  n2_reg4 <= n2_reg3 ;
  n1orn2z_reg4 <= n1orn2z_reg3 ;
end

// Add second stage MSBs with carry.
assign s21b[6:0] = {2'b0, s11_reg4[12:9]} + s12_reg4[12:7] + s21a_reg4[7] ;
assign s22b[6:0] = {2'b0, s13_reg4[12:9]} + s14_reg4[12:7] + s22a_reg4[7] ;

// {MSB, LSB, [1:0]}. The result will never affect s21b[6], which is always 0.
assign s21[14:0] = {s21b[5:0], s21a_reg4[6:0], s11_reg4[1:0]} ;
assign s22[14:0] = {s22b[5:0], s22a_reg4[6:0], s13_reg4[1:0]} ;

always @ (posedge clk)   // This is the fifth pipeline register, clk (5).
begin
  s21_reg5 <= s21 ;                     // Store for further processing.
  s22_reg5 <= s22 ;
  n1_reg5 <= n1_reg4 ;
  n2_reg5 <= n2_reg4 ;
  n1orn2z_reg5 <= n1orn2z_reg4 ;
end

assign s31a[8:0] = s21_reg5[11:4] + s22_reg5[7:0] ;   // 3rd stage LSB computed here.

always @ (posedge clk)   // This is the sixth pipeline register, clk (6).
begin
  s21_reg6[14:12] <= s21_reg5[14:12] ;  // Preserve MSB.
  s22_reg6[14:8]  <= s22_reg5[14:8] ;
  s21_reg6[3:0]   <= s21_reg5[3:0] ;
  s31a_reg6 <= s31a ;                   // 3rd stage LSB registered here.
  n1_reg6 <= n1_reg5 ;
  n2_reg6 <= n2_reg5 ;
  n1orn2z_reg6 <= n1orn2z_reg5 ;
end

assign s31b[7:0] = {4'b0, s21_reg6[14:12]} + s22_reg6[14:8] + s31a_reg6[8] ;
                                        // 3rd stage MSB computed here.

// Put MSB, LSB and [3:0] bits together.
// Note that the 3rd stage result will never affect s31b[7:6], which is always 0.
assign s31[17:0] = {s31b[5:0], s31a_reg6[7:0], s21_reg6[3:0]} ;

always @ (posedge clk)   // This is the seventh pipeline register, clk (7).
begin
  n1_reg7 <= n1_reg6 ;                  // Store intermediate results.
  n2_reg7 <= n2_reg6 ;
  s31_reg7 <= s31 ;
  n1orn2z_reg7 <= n1orn2z_reg6 ;
end

assign res_sign = n1_reg7 ^ n2_reg7 ;   // '1' means a -ve no.
assign res[18:0] = (res_sign) ? {1'b1, (~s31_reg7 + 1'b1)} : {1'b0, s31_reg7} ;

always @ (posedge clk)   // This is the eighth pipeline register, clk (8).
begin
  if (n1orn2z_reg7 == 1'b1)
    result[18:0] <= 19'b0 ;
  else
    result[18:0] <= res ;               // This is the final result (product of two
                                        // numbers) in twos complement.
end

endmodule

TEST BENCH FOR MULTIPLIER

`define clkperiodby2 10
`include "mult11sx8s_banno.v"

module mult11sx8s_test ( result ) ;

output [18:0] result ;

reg        clk ;
reg [10:0] n1 ;
reg [7:0]  n2 ;

mult11sx8s u1( .clk(clk), .n1(n1), .n2(n2), .result(result) ) ;

initial
begin
  clk = 1'b0 ;
      n1 = 11'h0 ;   n2 = 8'h0 ;
  #17 n1 = 11'h555 ; n2 = 8'h55 ;
  #20 n1 = 11'h2aa ; n2 = 8'haa ;
  #20 n1 = 11'h7ff ; n2 = 8'h80 ;
  #20 n1 = 11'h555 ; n2 = 8'hff ;
  #20 n1 = 11'h7ff ; n2 = 8'h81 ;
  #20 n1 = 11'h555 ; n2 = 8'h81 ;
  #20 n1 = 11'h2aa ; n2 = 8'h81 ;
  #20 n1 = 11'h7ff ; n2 = 8'h00 ;
  #20 n1 = 11'h7ff ; n2 = 8'h7f ;
  #20 n1 = 11'h000 ; n2 = 8'hff ;
  #20 n1 = 11'h000 ; n2 = 8'h7f ;
  #400 $stop ;
end

always
  #`clkperiodby2 clk <= ~clk ;

endmodule

Simulation results of multiplier
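To predict the values that should appear in the multiplier simulation, the three adder stages of mult11sx8s can be mirrored slice by slice in Python. This is a behavioural sketch under assumptions: the helper names field, merge and mult11sx8s_model are hypothetical, Python's unbounded integers stand in for the fixed-width wires, and the eight-clock latency is not modelled.

```python
def field(x, hi, lo):
    """Verilog-style part select x[hi:lo] as an unsigned integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def merge(low, high, shift, k):
    """Compute low + (high << shift) in two steps, as the pipeline does.

    First a k-bit LSB slice is added, then the remaining MSBs plus the
    carry; the lowest `shift` bits of `low` pass through unchanged.
    """
    lsb = field(low, shift + k - 1, shift) + field(high, k - 1, 0)
    carry = lsb >> k
    msb = (low >> (shift + k)) + (high >> k) + carry
    kept = low & ((1 << shift) - 1)
    return (msb << (shift + k)) | ((lsb & ((1 << k) - 1)) << shift) | kept


def to_signed(pattern, bits):
    """Read the low `bits` bits of pattern as a 2's complement number."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def mult11sx8s_model(n1, n2):
    """Return the 19-bit 2's complement product of an 11-bit and an 8-bit word."""
    n1 &= 0x7FF
    n2 &= 0xFF
    if n1 == 0 or n2 == 0:
        return 0                                   # forced to +0
    sign = (n1 >> 10) ^ (n2 >> 7)
    n1_mag = (-n1) & 0x7FF if n1 >> 10 else n1
    n2_mag = (-n2) & 0xFF if n2 >> 7 else n2
    p = [n1_mag if (n2_mag >> i) & 1 else 0 for i in range(8)]
    s1 = [merge(p[i], p[i + 1], 1, 6) for i in (0, 2, 4, 6)]        # s11..s14
    s2 = [merge(s1[0], s1[1], 2, 7), merge(s1[2], s1[3], 2, 7)]     # s21, s22
    s3 = merge(s2[0], s2[1], 4, 8)                                  # s31
    return ((1 << 18) | ((-s3) & 0x3FFFF)) if sign else s3


if __name__ == "__main__":
    for n1, n2 in [(0x555, 0x55), (0x7FF, 0x80), (0x3FF, 0x80)]:
        print(hex(n1), hex(n2), to_signed(mult11sx8s_model(n1, n2), 19))
```

The slice boundaries passed to merge (shift 1 with a 6-bit slice, shift 2 with 7 bits, shift 4 with 8 bits) correspond to the part selects used for s11a, s21a and s31a in the Verilog.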

Synplify results

dvlsi_des_verilog\mult11sx8s.v"
Verilog syntax check successful!
Selecting top level module mult11sx8s
Synthesizing module
vlsi_des_verilog\mult11sx8s.v":346:0:346:5 Found seqshift n1orn2z, depth=7,
vlsi_des_verilog\mult11sx8s.v":346:0:346:5 Found seqshift n1, depth=6,
vlsi_des_verilog\mult11sx8s.v":346:0:346:5 Found seqshift n2, depth=6,
vlsi_des_verilog\mult11sx8s.v":202:0:202:5 Register bit s14a_reg2[6] is always 0,

Performance Summary
*******************
Worst slack in design:

Starting Clock   Requested Frequency   Estimated Frequency   Requested Period   Estimated Period   Slack   Clock Type
clk              50.0 MHz              MHz                                                                 inferred

Resource Usage Report for mult11sx8s
Mapping to part: xcv600ehq240-8

Cell usage:
MUXCY_L   100 uses
XORCY     109 uses
MUXCY     9 uses
FDR       105 uses
FD        209 uses
GND       1 use
VCC       1 use

I/O primitives:
IBUF      19 uses
OBUF      19 uses
BUFGP     1 use

SRL primitives:
SRL16     9 uses

I/O Register bits: 22
Register bits not including I/Os: 292 (2%)
Global Clock Buffers: 1 of 4 (25%)

Mapping Summary:
Total LUTs: 181 (1%)
Mapper successful!

Xilinx P&R Results

Design Summary:
Number of errors: 0
Number of warnings: 0
Number of Slices: 201 out of 6,912 2%
Number of Slices containing unrelated logic: 0 out of 201 0%
Number of Slice Flip Flops: 292 out of 13,824 2%
Total Number 4 input LUTs: 178 out of 13,824 1%
  Number used as LUTs: 161
  Number used as a route-thru: 8
  Number used as Shift registers: 9
Number of bonded IOBs: 38 out of  %
IOB Flip Flops: 22
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 5,284
Additional JTAG gate count for IOBs: 1,872
Mapping completed.

Timing summary:
Timing errors: 0 Score: 0
Constraints cover 2328 paths, 0 nets, and 896 connections (100.0% coverage)

Design statistics:
Minimum period: ns (Maximum frequency: 82.427MHz)
Minimum input arrival time before clock: ns
Minimum output required time after clock: 5.617ns

Running DRC.
DRC detected 0 errors and 0 warnings.
Creating bit map...
Saving bit stream in "mult11sx8s.bit".
Creating bit mask...
Saving mask bit stream in "mult11sx8s.msk".
Bitstream generation is complete.
