Computer Arithmetic Homework Solutions. 1 An adder for graphics. 2 Partitioned adder. 3 HDL implementation of a partitioned adder

Computer Arithmetic Homework 3, 2016-2017: Solution

1 An adder for graphics

In a normal ripple carry addition of two positive numbers, the carry out is the signal for a result exceeding the maximum. We use this signal to saturate the output at the all ones representation. On the other hand, in the case of a subtraction of a positive number from another, the absence of a carry out indicates a negative result. (See problem 1.6 in the book.) We use those two facts in Fig. 1 to design the saturating 8 bit adder/subtractor where we set the saturation signal to $sat = c_8 \oplus sub$ and we connect $c_0$ to $sub$ to correctly fulfill the subtraction.

In a single full adder $c_{out} = ab + bc_{in} + ac_{in}$ and the time is estimated to be two gate delays, one to produce the ANDs and the other for the OR. The worst case delay is when a carry ripples through the eight full adders and then goes through the final XOR gate to saturate the result. If we assume that the driving capability of the XOR gate is enough for the row of multiplexers then the total delay is:

$$\tau = 8 \times 2 + 1\ \text{(XOR)} + 2\ \text{(mux)} = 19\ \text{gate delays.}$$

A more accurate analysis may consider that $b_0$ passes through an XOR gate and hence the calculation of $c_1$ takes longer than two gate delays. Each of the other full adders takes only two gate delays since their corresponding $b$ signals would have already passed through the XOR gates before the arrival of the carry. Furthermore, the assumption that the saturation signal is strong enough to drive eight multiplexers may not be true. That signal should be buffered. Hence the gate delays calculated above are a lower estimate.

2 Partitioned adder

The main idea here is to enable or disable the propagation of the carry signal between successive blocks depending on the width of the adder needed. Fig. 2 shows a possible solution. I intentionally do not want to present the other possibilities and potential optimizations in this problem. You should think about them. Note that if the adder is partitioned then each sub-part is not getting the carry out from the next lower part but it may have its own input carry.
It is important to choose the correct saturation signal for each adder as well. Try to follow the diagram of the design and understand its function.

3 HDL implementation of a partitioned adder

This homework attempts to help you start using Verilog to describe and test arithmetic circuits. It also provides you with a few sources of information that might be useful in your future research work. First, some useful links (you can click on them directly if your pdf reader is set correctly).

For Verilog:
1. A very good quick reference guide to Verilog: http://www.sutherland-hdl.com/pdf/verilog_2001_ref_guide.pdf
2. Resource pages: http://www.asic-world.com/verilog/index.html and http://verilog-hdl.winite.com/
3. Verilator is a very fast and reliable simulator: http://www.veripool.org/wiki/verilator
4. Cver (actually its free version gplcver) is another good simulator: http://sourceforge.net/projects/gplcver/

The proceedings of all the previous IEEE Symposia on Computer Arithmetic (Arith): http://www.acsel-lab.com/arithmetic/

[Figure 1: An 8-bit ripple carry saturating adder/subtractor. Eight full adders (FA) in a ripple chain take a[0..7] and b[0..7], with sub driving c0; c8 and sub produce sat, which controls a row of eight output multiplexers delivering s[0..7].]
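The behaviour of the circuit in Fig. 1 can be sketched as a small bit-level reference model. The sketch is in Python rather than Verilog so that it can double as a generator of expected results; the function names are ours, and we assume that a negative difference saturates to all zeros (the text above only states the all ones case for addition).

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (cout, s)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return cout, s


def sat_add_sub(a, b, sub, width=8):
    """Bit-level model of the saturating ripple adder/subtractor.

    a, b : unsigned operands in the range 0 .. 2**width - 1
    sub  : 0 computes a + b, 1 computes a - b
    Returns (result, sat).
    """
    carry = sub                          # c0 is tied to sub
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ sub        # XOR gate in front of every b input
        carry, s = full_adder(ai, bi, carry)
        total |= s << i
    sat = carry ^ sub                    # sat = c8 XOR sub
    if sat:
        # Assumption: all ones on overflow, all zeros on a negative difference.
        total = 0 if sub else (1 << width) - 1
    return total, sat


def worst_case_gate_delays(width=8):
    """Two gate delays per full adder, one for the XOR, two for the mux."""
    return 2 * width + 1 + 2
```

For example, `sat_add_sub(200, 100, 0)` overflows and returns the all ones value with `sat` set, while `sat_add_sub(10, 5, 1)` returns the plain difference.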

[Figure 2: A 64-bit partitioned adder/subtractor. Eight sat_add8 blocks cover a[7:0]/b[7:0] up to a[63:56]/b[63:56]; 4x1 and 2x1 multiplexers select the carry in (ci8 ... ci56) and the saturation signal of each block from c8 ... c56 according to the partition width.]
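The carry gating idea behind Fig. 2 can be illustrated with a behavioural sketch. It is not a transcription of the figure: the function name, the `part_width` argument (8, 16, 32 or 64) and the choice of saturating a negative difference to zero are our assumptions. The model only shows that breaking the carry chain at partition boundaries, and taking the saturation decision from the top block of each partition, gives independent saturating lanes.

```python
def partitioned_sat_add_sub(a, b, sub, part_width, total_width=64, block=8):
    """Behavioural model of a partitioned saturating adder/subtractor.

    The datapath is built from `block`-bit adders. At every partition
    boundary the incoming carry is replaced by the partition's own
    carry in (sub); elsewhere the carry of the lower block ripples on.
    """
    block_mask = (1 << block) - 1
    result = 0
    carry = sub
    part_start = 0
    for lo in range(0, total_width, block):
        if lo % part_width == 0:
            carry = sub                  # carry chain is cut here
            part_start = lo
        a_blk = (a >> lo) & block_mask
        b_blk = ((b >> lo) & block_mask) ^ (block_mask if sub else 0)
        total = a_blk + b_blk + carry
        carry = total >> block
        result |= (total & block_mask) << lo
        if (lo + block) % part_width == 0:
            # Top block of a partition: its carry out decides saturation.
            if carry ^ sub:
                part_mask = ((1 << part_width) - 1) << part_start
                result &= ~part_mask     # saturate to zero ...
                if not sub:
                    result |= part_mask  # ... or to all ones for addition
    return result
```

With `part_width=8` the operands 0x00FF and 0x0001 add to 0x00FF (the low lane saturates), whereas with `part_width=16` the same operands give 0x0100.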

The US patent and trademark office: http://www.uspto.gov/
Full papers of all the major conferences on design automation: http://www.sigda.org/publication
VHDL Library of Arithmetic Units: http://www.iis.ee.ethz.ch/~zimmi/arith_lib.html
Fixed point arithmetic in VHDL: http://www.doulos.com/knowhow/vhdl_model/fp_arith/
A dedicated language for describing computer arithmetic algorithms: http://www.aoki.ecei.tohoku.ac.jp/arith/
For those interested in cryptography and error correcting codes, here is a page describing a Galois field arithmetic library: http://www.partow.net/projects/galois/

Next, we look at the issue of exhaustive testing. The number of test vectors needed for one of the modes (say the 1 × 64 mode) in the 64 bit partitioned adder is derived by considering the possible states of the two operands, each 64 bits, and the subtraction signal. Hence, the total number of input bits is 64 + 64 + 1 = 129 and the number of test vectors for an exhaustive test is $2^{129}$. Given that we have four modes of operation and we assume that only one of them is active at any time, the total number of test cases is thus $4 \times 2^{129} = 2^{131}$, which is larger than $10^{39}$ test vectors. If each vector takes $10^{-9}$ seconds then we need about $10^{30}$ seconds, which is more than $10^{25}$ days, i.e. it is practically impossible to test such a design exhaustively. The whole purpose of this calculation is to show that large designs cannot be tested exhaustively with reasonable resources.

Finally, we start looking at the code. A full adder is a basic element in the design of most arithmetic circuits. Here is a sample code for this basic element.

⟨full adder 4⟩≡
    /* A simple full adder circuit */
    module fa (cout, s, in0, in1, in2);
    input in0, in1, in2;
    output cout, s;
    assign cout = (in1 & in2) | (in1 & in0) | (in2 & in0);
    assign s = (in1 ^ in2 ^ in0);
    endmodule //fa
This code is used in chunks 13b and 14c.
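Returning to the exhaustive-testing estimate of this section, the arithmetic is easy to reproduce with exact integers. The helper names below are ours; the rate of one vector per nanosecond is the one assumed in the text.

```python
def exhaustive_vectors(input_bits, modes=1):
    """Number of test vectors for an exhaustive test of every mode."""
    return modes * 2 ** input_bits


def exhaustive_days(vectors, vectors_per_second=10**9):
    """Whole days needed to apply all vectors at the given rate."""
    seconds = vectors // vectors_per_second
    return seconds // (24 * 60 * 60)


if __name__ == "__main__":
    # 64-bit partitioned adder: 64 + 64 + 1 input bits, four modes.
    big = exhaustive_vectors(64 + 64 + 1, modes=4)
    print(big == 2 ** 131, big > 10 ** 39, exhaustive_days(big) > 10 ** 25)
    # 8-bit adder: 8 + 8 + 1 input bits, a single mode.
    print(exhaustive_vectors(8 + 8 + 1))
```

By contrast with the 64 bit case, the 8 bit adder with a carry in has only 17 input bits, which is why an exhaustive test of it remains practical.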

In general, it is better to follow a specific style while writing code in order to minimize mistakes. In the full adder module above, the list of arguments in the line giving the name of the module starts with the outputs followed by the inputs. If this same order is followed in all the code, the reader can easily follow the relation between all the modules when one module instantiates another.

To form an eight bit adder we concatenate eight full adders together. One way to do this is by instantiating the full adder on eight consecutive lines such as:

    fa add0(cin1, s[0], in0[0], in1[0], cin0);
    fa add1(cin2, s[1], in0[1], in1[1], cin1);
    fa add2(cin3, s[2], in0[2], in1[2], cin2);
    fa add3(cin4, s[3], in0[3], in1[3], cin3);
    fa add4(cin5, s[4], in0[4], in1[4], cin4);
    fa add5(cin6, s[5], in0[5], in1[5], cin5);
    fa add6(cin7, s[6], in0[6], in1[6], cin6);
    fa add7(co, s[7], in0[7], in1[7], cin7);

The full adder in the least significant position receives the initial carry in. For the other full adders, the carry out of the preceding full adder is used as a carry in. The full adder at the most significant bit position produces the final carry out.

A much faster (less typing) way is to use the capabilities of Verilog and use an array of modules in a single line as in

    fa adder[7:0] (.cout(cout),.s(s),.in0(in0),.in1(in1),.in2(cin));

which calls eight full adders, passes the inputs, and receives the corresponding outputs assuming that the inputs and outputs are defined as arrays. The inputs and outputs are passed with an explicit use of their names in order to avoid any confusion but this is not necessary. For this line to really imitate the eight lines above we must make the connection between the carries out of the modules and the carries into the following modules:

    assign cin = {cout[6:0],ci};

A further improvement would be to define a parameter

⟨width parameter 5⟩≡
    `define width 8
This code is used in chunks 11 and 15.

and to use it later in a line such as

    fa adder[`width-1:0] (.cout(cout),.s(s),.in0(in0),.in1(in1),.in2(cin));

which helps us to increase the width of the adder just by changing the value of the parameter. Obviously the inputs and outputs of such a variable width adder should also be defined using the same width parameter. Hence the code of the adder becomes

⟨variable width adder 6⟩≡
    /* Variable width adder */
    module adder (co, s, in0, in1, ci);
    input [`width-1:0] in0, in1;
    input ci;
    output co;
    output [`width-1:0] s;
    /* verilator lint_off UNOPTFLAT */
    wire [`width-1:0] cin;
    /* verilator lint_on UNOPTFLAT */
    wire [`width-1:0] cout;
    assign cin = {cout[`width-2:0],ci};
    fa adder[`width-1:0] (.cout(cout),.s(s),.in0(in0),.in1(in1),.in2(cin));
    assign co = cout[`width-1];
    endmodule //adder
This code is used in chunks 13b and 14c.
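As a cross-check of the carry chaining used in the variable width adder, here is a minimal software model, written in Python so that it can later serve as a reference for test vectors. It mirrors the Verilog port order (outputs first), but the function names and the `width` argument are ours, not part of the Verilog source.

```python
def full_adder(in0, in1, in2):
    """Same equations as the fa module: returns (cout, s)."""
    cout = (in1 & in2) | (in1 & in0) | (in2 & in0)
    s = in1 ^ in2 ^ in0
    return cout, s


def ripple_adder(in0, in1, ci, width=8):
    """Chain `width` full adders; returns (co, s).

    The loop variable `carry` plays the role of the cin vector:
    bit 0 receives ci and bit i+1 receives cout[i], which is what
    assign cin = {cout[width-2:0], ci} expresses in one line.
    """
    carry = ci
    s = 0
    for i in range(width):
        carry, s_bit = full_adder((in0 >> i) & 1, (in1 >> i) & 1, carry)
        s |= s_bit << i
    return carry, s      # co = cout[width-1]
```

Changing `width` here corresponds to changing the `width` definition in the Verilog code.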

The comments before and after the line wire [`width-1:0] cin; in the variable width adder are specific to the verilator simulator used in this example. They are not needed in other simulators such as cver (more about that later).

The adder is now ready and we should build the test bench for it. The test bench
1. applies the inputs to the design under test,
2. reads the outputs when they are ready,
3. compares those outputs to the correct outputs,
4. flags an error if the outputs do not match,
5. repeats the above test cycle for all the inputs provided, and
6. produces a final report about failures if any.

If the inputs provided are all the possible combinations of inputs for the design under test then the above testing procedure is an exhaustive test of the design. Obviously, exhaustive tests are only feasible when the number of combinations is limited enough for the test to finish in a reasonable time.

Digital designers use the term test vector to designate the inputs for a design and the corresponding correct outputs. A large number of such test vectors is needed to produce a reasonable test for most designs. We may assume for now that another program exists to generate the required test vectors and save the inputs into a file named input.txt while the corresponding correct outputs are saved in a file named correct_output.txt. The inputs of each test vector are written in a pre-agreed order consecutively on one line of input.txt while the corresponding correct output is written on the corresponding line in correct_output.txt. For example, in the case of an 8 bit binary adder the following line

    00000001 00000010 0

in the input.txt file and the corresponding line in the correct_output.txt file

    0 00000011

could mean that we are adding the 8 bit value of 1 to the value of 2 with a carry in of 0 to produce a carry out of 0 and a sum of 3. For designs where the width of the operands is large, it might be easier to read the lines if they are written in hexadecimal notation instead of binary:

    01 02 0

and

    0 03

If a large number of test vectors exists, the use of hexadecimal notation reduces the size of the files considerably.
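The "other program" that produces the two files could look like the following sketch. Everything in it is an assumption made for illustration: the function names, the use of Python, and in particular the `_` separator. Verilog's `$readmemh` treats white space as the boundary between memory words but allows underscores inside a number, so writing `01_02_0` keeps the three fields of one vector in a single word.

```python
import itertools


def format_vector(a, b, cin, width=8, sep="_"):
    """Return (stimulus_line, expected_line) in hexadecimal."""
    digits = (width + 3) // 4                    # hex digits per operand
    cout, s = divmod(a + b + cin, 1 << width)    # reference result
    stimulus = f"{a:0{digits}x}{sep}{b:0{digits}x}{sep}{cin:x}"
    expected = f"{cout:x}{sep}{s:0{digits}x}"
    return stimulus, expected


def exhaustive_inputs(width=8):
    """Yield every (a, b, cin) combination for the given width."""
    values = range(1 << width)
    return itertools.product(values, values, (0, 1))


def write_vectors(vectors, stimulus_path="input.txt",
                  expected_path="correct_output.txt", width=8):
    """Write one test vector per line to the two files."""
    count = 0
    with open(stimulus_path, "w") as fin, open(expected_path, "w") as fout:
        for a, b, cin in vectors:
            stimulus, expected = format_vector(a, b, cin, width)
            fin.write(stimulus + "\n")
            fout.write(expected + "\n")
            count += 1
    return count


if __name__ == "__main__":
    print(write_vectors(exhaustive_inputs(8)), "vectors written")
```

The expected values come from ordinary integer addition here, so this generator is only as trustworthy as that one line.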
The correctness of the test vector generation process is a big topic by itself and we will assume that it has been performed without fault. The testbench can either read the correct outputs to compare them to the outputs of the design under test or it can save the results of the design in a file output.txt to be compared later with the correct outputs. For our first test bench, we can thus define two parameters:

⟨Input/Output file names 7a⟩≡
    `define design_stimulus "input.txt"
    `define design_results "output.txt"
This code is used in chunk 11.

We should also define the number of test vectors that we will use.

⟨number of test vectors 7b⟩≡
    `define num_tests 131071 // number of test vectors
This code is used in chunk 11.

The value 131071 is not cast in stone; it is just the number of cases in one input file used to test a design!

We can think of the above testing cycles as running on their own clock cycle which we may call testclock. In a combinational design such as an eight bit ripple adder there are no internal clocks so the testclock is the only clock in the system. In a sequential design the testclock period will be a multiple of the period of the internal clock in order for the design to produce its output signals before checking them.

As long as the testclock period is long enough compared to the delays within the design under test, we may apply the inputs on one edge of the testclock signal (for example the positive edge) and check the outputs on the other edge. If in an adder design the two operands and carry in are called operanda, operandb, and Cin we can use the following piece of code to read the input values:

⟨assign the inputs to one test vector 8a⟩≡
    /* Apply the inputs on the positive edge */
    always @(posedge testclock)
    begin
      {operanda,operandb,Cin}=testvector[numvector];
    end
This code is used in chunk 11.

Here we assume that the test vectors exist in an array named testvector and that we are currently applying vector number numvector. This array of test vectors may be read at the start of the simulation from the input file. At this initialization stage we should start our counter numvector at zero and we may also initialize the output file to make it ready.

⟨initialize files 8b⟩≡
    /* Initialization:
     * Load test vectors,
     * open the output file,
     * zero the counter, */
    initial
    begin
      // $readmemb(`design_stimulus,testvector);
      $readmemh(`design_stimulus,testvector);
      fd= $fopen(`design_results, "w");
      numvector =0;
    end
This code is used in chunk 11.

The initialization code here assumes that the input vectors are in hexadecimal format. If they were in binary format then $readmemb should be used. It is important to note that the use of hexadecimal notation can cause some minor strange effects. The command $readmemh produces the four bit value 0001 when it reads a digit equal to 1 assigned to the carry in. In order to prevent warnings or errors from the simulator, the variable receiving this value should be 4 bits wide; later only the least significant bit is used while the 3 other bits are dropped. Hence, we define the design inputs and outputs as

⟨design signals 9a⟩≡
    /* Signals needed for the design */
    reg [`width-1:0] operanda;
    reg [`width-1:0] operandb;
    reg [3:0] Cin;
    wire [`width-1:0] sum;
    wire carry;
This code is used in chunk 11.

The design under test itself is called using only the least significant bit of Cin:

⟨design under test 9b⟩≡
    /* Design under test takes the inputs read from the input file
     * and produces the outputs that will be written to the output
     * file. */
    adder DUT(carry, sum, operanda, operandb, Cin[0]);
This code is used in chunk 11.

At each negative edge of the testclock signal, the output of the design is sampled and written to the output file so that it can be checked against the correct output. If our counter numvector reaches the final number of test vectors required, the simulation ends.

⟨check outputs and write to file until finished 10⟩≡
    /* Get the outputs and write them to a file on the negative edge */
    always @(negedge testclock)
    begin
      // $fdisplay(fd, "%b_%b",carry,sum);
      $fdisplay(fd, "%x_%x",carry,sum);
      numvector =numvector+1;
      // should end after the number of test vectors within the file
      if(numvector==`num_tests)
      begin
        $fclose(fd);
        $finish;
      end
    end
This code is used in chunk 11.

Again, the above code assumes that the output file is in hexadecimal. For binary we should use %b instead of %x within the $fdisplay command.

The testing procedure is just a simple concatenation of the previous parts together with some definitions of variables.

⟨test bench 11⟩≡
    ⟨width parameter 5⟩
    ⟨number of test vectors 7b⟩
    ⟨Input/Output file names 7a⟩
    /* Test bench procedure using input and output files */
    module testbench(testclock);
    input testclock;
    /* Variables to handle the output file
     * and count the number of test vectors. */
    reg [31:0] fd;
    integer numvector;
    /* The definition of test clock as a wire
     * and the definition of the testvector array. */
    wire testclock;
    reg [2*`width+3:0] testvector [`num_tests-1:0];
    ⟨design signals 9a⟩
    ⟨design under test 9b⟩
    ⟨initialize files 8b⟩
    ⟨assign the inputs to one test vector 8a⟩
    ⟨check outputs and write to file until finished 10⟩
    endmodule // testbench
This code is used in chunks 13b and 14b.
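Since this test bench only records the results, the comparison with the correct outputs happens after the simulation. A minimal sketch of that last step, assuming both files hold one vector per line in the same format (the function name and the tuple layout of the report are ours):

```python
from itertools import zip_longest


def compare_results(actual_lines, expected_lines):
    """Return a list of (line_number, actual, expected) mismatches.

    Lines are compared after stripping white space and ignoring the
    case of hexadecimal digits. A missing line on either side is
    reported as a mismatch against the empty string.
    """
    mismatches = []
    pairs = zip_longest(actual_lines, expected_lines, fillvalue="")
    for number, (actual, expected) in enumerate(pairs, start=1):
        actual, expected = actual.strip(), expected.strip()
        if actual.lower() != expected.lower():
            mismatches.append((number, actual, expected))
    return mismatches


if __name__ == "__main__":
    with open("output.txt") as got, open("correct_output.txt") as want:
        errors = compare_results(got.readlines(), want.readlines())
    for number, actual, expected in errors:
        print(f"line {number}: got {actual}, expected {expected}")
    print("all vectors match" if not errors else f"{len(errors)} failures")
```

An empty list corresponds to the "final report" containing no failures.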

Depending on the simulator used, the variable fd that handles the output file may be defined as a reg (as in the test bench module) or as an integer. This structure of the test bench, which reads the inputs from a file and saves the outputs into a file, is general enough to handle a large variety of designs, not just simple adders.

The testing procedure needs the testclock input signal. We may simulate a clock generation by

⟨clock generation 12⟩≡
    `timescale 1ps/1ps
    `define clk_cycle 4 // Clock period
    /* Clock generation */
    module clock(testclock);
    // Interface
    output testclock;
    // Internal clk signal
    reg testclock;
    initial
      testclock=0;
    // Always executing at time 0 and NEVER stops
    // toggle the clock every half period
    always
      #(`clk_cycle/2) testclock = ~testclock;
    endmodule // clock
This code is used in chunks 13b and 14a.

Since the code of the clock generation uses explicit delays it may produce errors if used in a synthesis tool. This code is useful only for simulation, not for synthesis. This distinction between simulation code and synthesis code is always good to remember. Usually the test bench code contains some commands that cannot be synthesized such as $readmemh, $fdisplay, and the use of explicit delays. Another way to generate the clock is

    // Generate Clock with period = 66 delays
    initial
      forever
      begin
        testclock = 1; #33;
        testclock = 0; #33;
      end

It is always good practice to start any code with a preamble comment giving the date or version of the file and who the author is as well as any copyright notice.

⟨generic preamble comment 13a⟩≡
    /*
     * Written by Hossam A. H. Fahmy in 2013
     * for the students in his computer arithmetic class within
     * the Electronics and Communications Engineering Department
     * of Cairo University, Egypt.
     *
     * For any other use beyond the class, please consult with the
     * original author.
     */
This code is used in chunks 13-15.

The full test bench file may contain all the circuit modules in one file as in

⟨top module with all circuits in one file 13b⟩≡
    ⟨generic preamble comment 13a⟩
    ⟨clock generation 12⟩
    ⟨test bench 11⟩
    ⟨full adder 4⟩
    ⟨variable width adder 6⟩
    /* Top module connecting the clock and the test bench */
    module top();
    wire testclock;
    clock clock(testclock);
    testbench test(testclock);
    endmodule // top
Root chunk (not used in this document).

For large designs the inclusion of all the modules in one file is not very practical. In such cases it is better to use separate files and ask the simulator to include the different files into the simulation.

⟨top module in a separate file 14a⟩≡
    ⟨generic preamble comment 13a⟩
    ⟨clock generation 12⟩
    /* Top module connecting the clock and the test bench */
    module top();
    wire testclock;
    clock clock (testclock);
    testbench test(testclock);
    endmodule // top
Root chunk (not used in this document).

The other files in this case may include one for the test bench

⟨test bench file 14b⟩≡
    ⟨generic preamble comment 13a⟩
    ⟨test bench 11⟩
Root chunk (not used in this document).

and one or more files for the design

⟨design file or files 14c⟩≡
    ⟨generic preamble comment 13a⟩
    ⟨full adder 4⟩
    ⟨variable width adder 6⟩
Root chunk (not used in this document).

In the case of a simple operation such as addition, which can be performed directly by a Verilog operator, we may use Verilog operations instead of files containing test vectors. In this case we make a loop in Verilog to generate all the possible combinations, assign the input values, check the outputs against the results of Verilog operations (assumed to be correct), and continue to exhaust all the possible cases.

⟨test bench with loop 15⟩≡
    ⟨generic preamble comment 13a⟩
    ⟨width parameter 5⟩
    /* Test bench procedure using a loop */
    module testbench(testclock);
    input testclock;
    /* Variable x is a loop counter that starts at 0
     * the most significant bit of x gives the carry in
     * while the rest of x gives the two operands
     * the cout and s are the outputs of the design
     * the vcout and vs are the correct outputs
     * the cin signal is the carry in for the calculation
     * of the correct outputs */
    reg [`width*2:0] x;
    wire cout;
    wire [`width-1:0] s;
    reg [`width-1:0] vs;
    reg [`width-1:0] cin;
    reg vcout;
    initial
    begin
      x=0;
      cin =0;
    end
    /* At the positive edge increment x
     * and check if it flipped back to zero */
    always @(posedge testclock)
    begin
      x = x+ 1;
      if (x==0)
      begin
        $display("passes all tests");
        $finish;
      end
    end
    // Instantiate yourcell, note how the inputs are given
    adder adder(cout, s, x[`width*2-1:`width],x[`width-1:0],x[`width*2]);
    /* Check at the negative edge of the clock
     * Note that the operator used is !== not just !=
     * the != operator checks only the 0 and 1 values
     * while the !== operator checks also the Z and X values */
    always @(negedge testclock)
    begin
      cin[0] = x[`width*2];
      {vcout,vs} = {1'b0,x[`width*2-1:`width]}+{1'b0,x[`width-1:0]}+{1'b0,cin};
      // $display("%x_%x_%x",x[`width*2-1:`width],x[`width-1:0],x[`width*2]);
      // $display("%x_%x",vcout,vs);
      if ({vcout,vs} !== {cout,s})
      begin
        $display("error a=%d b=%d cin=%d",x[`width*2-1:`width],x[`width-1:0],x[`width*2]);
        $display("error vcout=%d vs=%d cout=%d s=%d",vcout,vs,cout,s);
        $finish;
      end
    end
    endmodule
Root chunk (not used in this document).
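The control flow of this loop-based test bench, a counter whose fields are sliced into the operands and which stops when it wraps around to zero, can be rehearsed in software before running a simulator. In the Python sketch below `dut` stands for any model of the adder that returns a (carry out, sum) pair; the function name and the return convention are our own choices.

```python
def run_loop_test(dut, width=8):
    """Drive `dut` with every value of a (2*width+1)-bit counter.

    Returns None when all compared vectors pass, or the first
    failing (a, b, cin) triple.
    """
    mask = (1 << width) - 1
    counter_mask = (1 << (2 * width + 1)) - 1
    x = 0
    while True:
        x = (x + 1) & counter_mask       # positive edge: increment
        if x == 0:                       # flipped back to zero: done
            return None
        a = (x >> width) & mask          # x[2*width-1:width]
        b = x & mask                     # x[width-1:0]
        cin = (x >> (2 * width)) & 1     # x[2*width]
        expected = divmod(a + b + cin, 1 << width)   # {vcout, vs}
        if dut(a, b, cin) != expected:   # negative edge: compare
            return (a, b, cin)


def good_adder(a, b, cin, width=4):
    return divmod(a + b + cin, 1 << width)


def adder_ignoring_carry_in(a, b, cin, width=4):
    return divmod(a + b, 1 << width)     # deliberately faulty model
```

Running `run_loop_test(adder_ignoring_carry_in, width=4)` stops at the first vector with the carry in set, which shows how the check catches a broken carry chain.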

For a simple design with a small number of inputs we can even do an exhaustive test manually by setting all the possible values and checking the outputs, or visually by looking at waveform timing diagrams of the design signals resulting from the simulation. An 8 bit adder, despite being a simple design, is already beyond the possibility of manual checking in reasonable time.

Another basic element that is used in the design of the saturating adder is the two to one multiplexer.

⟨mux2x1 17a⟩≡
    // The names of the second data input (i1) and of the select
    // input (s) are assumed.
    module mux2x1 (out, i0, i1, s);
    input i0;
    input i1;
    input s;
    output reg out;
    always@(*)
      out = (s)? i1 : i0;
    endmodule // mux2x1
Root chunk (not used in this document).

We can combine the full adder and the multiplexer to form the cell of the saturating adder. The rest of the code should be written here for a full solution.

⟨saturating adder cell 17b⟩≡
    module satadd ();
    endmodule // satadd
Root chunk (not used in this document).

4 New carry signals

1. When the values of $g_i$ and $p_i$ are substituted in the first equation we get

$$\begin{aligned}
c_{i+1} &= a_i b_i + (a_i \oplus b_i)\,c_i && (1)\\
        &= a_i b_i + a_i \bar{b}_i c_i + \bar{a}_i b_i c_i && (2)\\
        &= a_i b_i + a_i c_i + b_i c_i. && (3)
\end{aligned}$$

Similarly, when we use the values of $g_i$ and $p_i$ in the second equation we get

$$\begin{aligned}
c_{i+1} &= b_i\,\overline{(a_i \oplus b_i)} + (a_i \oplus b_i)\,c_i && (4)\\
        &= b_i (a_i b_i + \bar{a}_i \bar{b}_i) + a_i \bar{b}_i c_i + \bar{a}_i b_i c_i && (5)\\
        &= a_i b_i + a_i c_i + b_i c_i. && (6)
\end{aligned}$$

The two expressions are equivalent and the friend's claim is correct.

2. The same approach as in traditional carry-lookahead is followed.

$$\begin{aligned}
c_{i+1} &= g_i \bar{p}_i + p_i c_i && (7)\\
c_{i+1} &= g_i \bar{p}_i + p_i (g_{i-1} \bar{p}_{i-1} + p_{i-1} c_{i-1}) && (8)\\
c_{i+1} &= g_i \bar{p}_i + p_i g_{i-1} \bar{p}_{i-1} + p_i p_{i-1} (g_{i-2} \bar{p}_{i-2} + p_{i-2} c_{i-2}) && (9)\\
c_{i+1} &= g_i \bar{p}_i + p_i g_{i-1} \bar{p}_{i-1} + p_i p_{i-1} g_{i-2} \bar{p}_{i-2} + p_i p_{i-1} p_{i-2} (g_{i-3} \bar{p}_{i-3} + p_{i-3} c_{i-3}) && (10)\\
c_{i+1} &= g_i \bar{p}_i + p_i g_{i-1} \bar{p}_{i-1} + p_i p_{i-1} g_{i-2} \bar{p}_{i-2} + p_i p_{i-1} p_{i-2} g_{i-3} \bar{p}_{i-3} + p_i p_{i-1} p_{i-2} p_{i-3} c_{i-3}. && (11)
\end{aligned}$$

We define the group propagate signal as $P_{i:i-3} = p_i p_{i-1} p_{i-2} p_{i-3}$ and the group generate signal as

$$G_{i:i-3} = g_i \bar{p}_i + p_i g_{i-1} \bar{p}_{i-1} + p_i p_{i-1} g_{i-2} \bar{p}_{i-2} + p_i p_{i-1} p_{i-2} g_{i-3} \bar{p}_{i-3} \qquad (12)$$

then use

$$c_{i+1} = G_{i:i-3} \bar{P}_{i:i-3} + P_{i:i-3}\, c_{i-3}. \qquad (13)$$

This last equation can be used since every term of $G_{i:i-3}$ contains some $\bar{p}_j$ of the group, so the term $G_{i:i-3} \bar{P}_{i:i-3}$ reduces to $G_{i:i-3}$, which makes equation 13 equivalent to equation 11. A similar grouping is used at higher levels. The block diagram should be drawn here for a full solution.

3. The simplification of the generate signal $g$ is an advantage of the new scheme but the complication of the carry equation at the higher levels seems to outweigh that advantage. In general, the traditional scheme is better since it uses fewer logic gates, which might make it faster, smaller in area, and lower in power consumption. However, if the basic elements available for the implementation are multiplexers then the new design might be faster.
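The algebra of parts 1 and 2 can be confirmed by brute force over all input combinations. The sketch below is a Python check with function names of our own; it assumes, as reconstructed from equations 4 to 6, that the new scheme uses $g_i = b_i$ and $p_i = a_i \oplus b_i$.

```python
from itertools import product


def carry_traditional(a, b, c):
    """c_{i+1} = g + p c with g = a b and p = a xor b (equation 1)."""
    return (a & b) | ((a ^ b) & c)


def carry_new(a, b, c):
    """c_{i+1} = g not(p) + p c with g = b and p = a xor b (equation 4)."""
    p = a ^ b
    return (b & (p ^ 1)) | (p & c)


def group_carry_new(a_bits, b_bits, c_in):
    """Equation 13 for one group; bit lists run from i-3 up to i."""
    g = list(b_bits)
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    group_p = 1
    for bit in p:
        group_p &= bit                      # P = p_i p_{i-1} p_{i-2} p_{i-3}
    group_g = 0
    prefix = 1                              # product of the higher p's
    for j in reversed(range(len(p))):       # from bit i down to bit i-3
        group_g |= prefix & g[j] & (p[j] ^ 1)
        prefix &= p[j]
    return (group_g & (group_p ^ 1)) | (group_p & c_in)


def check_all():
    """True if both schemes agree with ordinary binary addition."""
    for a, b, c in product((0, 1), repeat=3):
        majority = (a & b) | (a & c) | (b & c)
        if not (carry_traditional(a, b, c) == carry_new(a, b, c) == majority):
            return False
    for a, b, c in product(range(16), range(16), (0, 1)):
        a_bits = [(a >> k) & 1 for k in range(4)]
        b_bits = [(b >> k) & 1 for k in range(4)]
        if group_carry_new(a_bits, b_bits, c) != (a + b + c) >> 4:
            return False
    return True
```

`check_all()` compares the single-bit forms against the majority function and the four bit group form against the carry out of an ordinary addition.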