Fall 2010 EE457 Instructor: Gandhi Puvvada Date: 10/1/2010, Friday in SGM123 Name:

Quiz (~10%)
Calculator and Cadence Verilog guide are allowed; closed-book, closed-notes. Time: 12:00-2:15 PM
Total points: ____   Student ID: ____   Do NOT write any student ID or SSN
Perfect score: ____ / ____

1 (__ points) __ min.
1.1 A and B are negative numbers represented in 2's complement notation. A is 16 bits in size and B is 8 bits in size. For A to be equal to B, all the 8 A bits (A[14:7]) shall be (all zeros / all ones) (and / or) the rest of the 7 A bits (A[6:0]) shall be equal to the corresponding 7 B bits (B[6:0]). For A to be less than B (as in -3 is less than -2), it is enough if any of the 8 A bits (A[14:7]) is a (zero / one). On the other hand, if all those 8 A bits (A[14:7]) are (all zeros / all ones), then, for A to be less than B, we need the 7-bit A (A[6:0]) to be (lower / higher) than the corresponding 7-bit B (B[6:0]). Here we compare the two 7-bit numbers treating them as (signed / unsigned) 7-bit numbers.

1.2 Mealy machine design: Browse through the state diagram on the next page first. Here, we perform serial inspection of bits of A (or bits of A and B) to compare them. A is a 16-bit number, but for this part of the question, B can be anywhere between an 8-bit and a 16-bit number. We are allowed to inspect, at a time (in one clock), one bit of A (A[I]) and (simultaneously, if needed) one bit of B (B[J]). I and J are indices into A and B respectively. I is initialized to 15. J is initialized to Jini (J initial), which can be anywhere from 7 through 15 (corresponding to B sizes of 8 bits to 16 bits). You will need to compare I and J to see when they are equal. Note: Your TA says that, after START is given, you should not take more than 16 clocks. After all, A is 16 bits and B is at most 16 bits. So decrement I and/or J as soon as possible!

1.2.1 Suppose B is an 8-bit number. State an example of A and B (in binary) such that the conclusion is drawn in the least number of clocks. A = ____; B = ____; How many clocks are spent in the INS_AI_BJ state for the above numbers? ____
State an example of A and B (in binary) such that the conclusion is drawn in the largest number of clocks. A = ____; B = ____; How many clocks are spent in the INS_AI_BJ state for the above numbers? ____

1.2.2 Since A and B are negative, there is no point looking at A[15] and B[Jini]. True / False

1.2.3 Since A's size is fixed at 16 bits, if A is equal to B (equal in value, but not necessarily in size), it takes the same number of clocks to compare A and B, irrespective of the size of B. True / False

October 1, 2010 4:01 pm EE457 Quiz - Fall 2010 1 / 9 © Copyright 2010 Gandhi Puvvada
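A quick cross-check of the bit-level reasoning in 1.1 is to let a simulator compare A with B after sign-extending B to 16 bits. The sketch below is a minimal illustration of that approach, not the gate-level answer the question asks for; the module `cmp16_8` and its port names are hypothetical.

```verilog
// Hypothetical sketch: compare a 16-bit 2's complement A with an
// 8-bit 2's complement B by sign-extending B to 16 bits first.
module cmp16_8 (
  input  wire [15:0] A,
  input  wire [7:0]  B,
  output wire        a_eq_b,
  output wire        a_lt_b,
  output wire        a_gt_b
);
  // Replicate B's sign bit B[7] into the upper 8 bit positions.
  wire [15:0] B_ext = {{8{B[7]}}, B};

  assign a_eq_b = (A == B_ext);
  // $signed makes the relational operators treat both operands as 2's complement.
  assign a_lt_b = ($signed(A) < $signed(B_ext));
  assign a_gt_b = ($signed(A) > $signed(B_ext));
endmodule
```

Because both operands are compared as signed values, test pairs such as -3 versus -2 can be checked against the hand-derived (zero / one) conditions.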

1.2.4 State diagram: Complete the 7 missing state transition conditions for the 7 transition arrows. Complete the RTL for the three states. In state C_I_J (compare I and J), you would like to decrement (I / J / I or J / I and J / I unconditionally and J conditionally / neither I nor J). Crucial point: You want to arrive in INS_AI_BJ with the right combination of I and J whether you arrive from C_I_J or from INS_AI. If B is an 8-bit number (B[7:0]), the first pair to be compared in INS_AI_BJ is (A[7],B[7] / A[6],B[6] / neither). And if B is a 16-bit number (B[15:0]), the first pair to be compared in INS_AI_BJ is (A[15],B[15] / A[14],B[14] / neither).

[State diagram (partial): RESET and START inputs; Initial: A <= Aini; B <= Bini; I <= 15; J <= Jini; C_I_J: Compare I, J; INS_AI: Inspect A[I]; INS_AI_BJ: Inspect A[I], B[J]; D_AGTB: Done, A greater than B; D_AEQB: Done, A equal to B; D_ALTB: Done, A less than B]
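The early-exit idea behind serial inspection can be prototyped on a simpler, hypothetical case: two equal-width 2's complement numbers inspected one bit pair per clock, MSB first. This is not the A/B machine of 1.2.4 (there is no INS_AI state and no separate I and J indices); `serial_cmp`, its states, and its ports are assumptions for illustration. The first differing pair decides: at the sign bit the number holding a 1 is the smaller one, and at any lower bit the number holding a 1 is the larger one.

```verilog
// Hypothetical sketch: MSB-first serial comparison of two N-bit
// 2's complement numbers, inspecting one bit pair per clock.
module serial_cmp #(parameter N = 8) (
  input  wire         clk,
  input  wire         reset,
  input  wire         start,
  input  wire [N-1:0] Aini,
  input  wire [N-1:0] Bini,
  output reg          done,
  output reg          a_gt_b,
  output reg          a_eq_b,
  output reg          a_lt_b
);
  localparam INI = 2'd0, CMP = 2'd1, DONE = 2'd2;

  reg [1:0]   state;
  reg [N-1:0] A, B;
  integer     I;                  // bit index, counts down from N-1

  always @(posedge clk) begin
    if (reset) begin
      state <= INI;
      done  <= 1'b0;
      {a_gt_b, a_eq_b, a_lt_b} <= 3'b000;
    end else begin
      case (state)
        INI: begin
          A <= Aini;
          B <= Bini;
          I <= N - 1;
          if (start) state <= CMP;
        end
        CMP: begin
          if (A[I] != B[I]) begin
            // First differing pair decides. At the sign bit the number
            // holding a 1 is negative (smaller); at a lower bit the number
            // holding a 1 is larger, since all higher bits matched.
            if ((I == N - 1) ? B[I] : A[I]) a_gt_b <= 1'b1;
            else                            a_lt_b <= 1'b1;
            done  <= 1'b1;
            state <= DONE;
          end else if (I == 0) begin
            a_eq_b <= 1'b1;           // all N bit pairs matched
            done   <= 1'b1;
            state  <= DONE;
          end else begin
            I <= I - 1;               // move to the next lower bit pair
          end
        end
        DONE:    state <= DONE;       // hold the result until reset
        default: state <= INI;
      endcase
    end
  end
endmodule
```

An equal pair costs all N clocks in the compare state, while a difference at the sign bit ends the comparison after a single compare clock.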

2 (__ points) __ min. State diagram coding in Verilog (you may refer to the Cadence (Esperan) Verilog guide): Consider the following partial flowchart and the corresponding partial state diagram along with the Verilog code segments written by four students.

[Figure: partial flowchart with decision boxes A?, B?, C? (T/F branches) leading to states S1, S2, S3, S0, and the corresponding partial state diagram]

Code #1:
    if (A) state <= S1;
    else if (B) state <= S2;
    else if (C) state <= S3;
    else state <= S0;

Code #2:
    if (!A && !B && !C) state <= S0;
    else if (!A && !B && C) state <= S3;
    else if (!A && B) state <= S2;
    else // there is no if A
      state <= S1;

Code #3:
    if (A) state <= S1;
    else if (!A && B) state <= S2;
    else if (!A && !B && C) state <= S3;
    else if (!A && !B && !C) state <= S0;

Code #4:
    if (A) state <= S1;
    if (!A && B) state <= S2;
    if (!A && !B && C) state <= S3;
    if (!A && !B && !C) state <= S0;

2.1 Notice that Code #3 is similar to Code #1, except that Code #3 is perhaps unnecessarily (but harmlessly) more verbose (like my wife! don't tell her). Code #4 is formed by removing the three occurrences of "else" in Code #3. Code #2 is essentially the reverse ordering of Code #3. Write "Right" or "Wrong" below for each.
Code #1 ____; Code #2 ____; Code #3 ____; Code #4 ____;

2.2 Now consider the incomplete Code #5 on the side along with the Karnaugh map representation of the desired state transitions. If all three of A, B, C are true, state gets assigned S3, gets reassigned S2, and is further reassigned S1. Since the last assignment prevails over the prior assignments, in this case state finally goes to S1. Note that there is no if clause leading back to S0. Complete the "if" conditions in Code #5. Else state the reason why it cannot be completed.

Code #5:
    if ( ____ ) // write either B or C
      state <= S3;
    if ( ____ ) // write either B or C
      state <= S2;
    if (A)
      state <= S1;

Karnaugh map of the desired next state:
            BC=00   BC=01   BC=11   BC=10
    A=0     S0      S3      S2      S2
    A=1     S1      S1      S1      S1
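The rule used in 2.2, that the last assignment executed in an always block prevails, can be seen in a small hypothetical example (the signals `p`, `q`, `y` are made up and unrelated to Code #5). The default assignment at the top is one common way to make sure every path assigns `y`.

```verilog
// Hypothetical illustration: sequential ifs without else.
// Every if whose condition is true executes; the last one executed wins.
module last_wins (
  input  wire       p,
  input  wire       q,
  output reg  [1:0] y
);
  always @(*) begin
    y = 2'd0;           // default when neither condition is true
    if (p) y = 2'd1;    // executed whenever p is true ...
    if (q) y = 2'd2;    // ... but overridden whenever q is also true
  end
endmodule
```

Swapping the two if statements would give `p` priority instead: with sequential ifs the last true condition wins, whereas in an else-if chain the first true condition wins.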

2.3 Combinational logic coding: The result is either the sum SUM of A and B (A+B) or the difference DIFF, X minus Y (X-Y), whichever is greater. Assuming that all needed declarations are already made appropriately (as reg or wire), complete the always block below.

[Figure: adder A + B (SUM), subtractor X - Y (DIFF), comparator with inputs P, Q and output Q>P driving GT_Int (the GT internal signal), and a 2-to-1 mux (I1, S, Y)]

Out of the three SUM, DIFF, ____, we need to assign ____ using the (blocking / non-blocking) assignment operator only, whereas we can assign ____ using either of the two operators.

    DIFF_GT always @( ) SUM A + B DIFF X - Y GT_Int DIFF_GT GT_Int if (GT_Int)

Sequential logic coding: The above design is modified to have registered outputs. These registers shall be updated only in the COMP state. Complete the RTL (datapath operations) in the COMP state (COMP case branch). The case statement is in an always block (always @(posedge ____)).

[Figure: the same datapath with [7:0] output registers (D, Q), each fed by a recirculating 2-to-1 mux selected by QCOMP (STATE == COMP, one-hot CU notation)]

Out of the three SUM, DIFF, ____, we need to assign ____ using the (blocking / non-blocking) assignment operator only, whereas we

    COMP: // COMP state case branch
    SUM A + B DIFF X - Y GT_Int DIFF_GT GT_Int if (GT_Int)

3 (10 points) 7 min. Reproduced below is a Fall 2008 quiz question together with its answer.
Fall 2008 Question: Number systems, adder design: You are looking for a 3-bit adder/subtractor, which can perform addition or subtraction of signed or unsigned 3-bit numbers and produce the appropriate sum/difference together with overflow information. You are given the following 4-bit adder/subtractor chip. Your lab partner connected it to A[2:0], B[2:0], and SUM[2:0] as shown below. He is not sure whether this is correct so far, and he also does not know how to proceed with X0 and Y0 (i.e. whether to connect 0,0 or 0,1, or 1,0, or 1,1).
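A behavioral model makes the carry argument in the Fall 2008 answer easy to check exhaustively. The sketch assumes the chip computes X + (Y XOR ADD/SUB) + C0 with C0 driven by ADD/SUB (the chip's internal details are not given in the question), and that A, B, and SUM occupy bit positions 3..1 with S0 unconnected; `addsub4`, `addsub3_on4`, and the X0/Y0 parameters are hypothetical names. Overflow (V) is not modeled.

```verilog
// Assumed chip behavior: Y is inverted when ADD_SUB = 1, and the
// carry-in C0 is driven by ADD_SUB (so subtraction is X + ~Y + 1).
module addsub4 (
  input  wire [3:0] X,
  input  wire [3:0] Y,
  input  wire       ADD_SUB,   // 0 = add, 1 = subtract
  output wire [3:0] S,
  output wire       Carry
);
  assign {Carry, S} = X + (Y ^ {4{ADD_SUB}}) + ADD_SUB;
endmodule

// Lab partner's wiring: A, B on bit positions 3..1; bit 0 is driven by
// the constant pair (X0, Y0); S[0] is left unconnected (N.C.).
module addsub3_on4 #(parameter [0:0] X0 = 1'b0, parameter [0:0] Y0 = 1'b0) (
  input  wire [2:0] A,
  input  wire [2:0] B,
  input  wire       ADD_SUB,
  output wire [2:0] SUM
);
  wire [3:0] S;
  wire       unused_carry;
  addsub4 chip (.X({A, X0}), .Y({B, Y0}), .ADD_SUB(ADD_SUB),
                .S(S), .Carry(unused_carry));
  assign SUM = S[3:1];
endmodule
```

Under these assumptions only the (0,0) and (1,0) instances produce A+B and A-B (modulo 8) for all 128 input combinations, in line with the stated answer.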

Fall 2008 Q & A

[Figure: 4-bit adder/subtractor chip with pins X3-X0, Y3-Y0, C0, ADD/SUB, S3-S0, V, Carry, Rawcarry, wired to A2-A0, B2-B0, and SUM2-SUM0; N.C. = no connection]

Do you agree with your partner's partial design? Agree / Disagree
If you agree, state which choices work for (X0, Y0) among {(0,0), (0,1), (1,0), (1,1)}. If you disagree, state the reason.
Answer: {(0,0), (1,0)}. Note: (0,1) does not convey the needed 1 to C1 for subtraction. (1,1) conveys an unnecessary 1 to C1 during addition.

Same question as above, but since a bit-2 pin is broken, we are forced to connect as shown below.

[Figure: the same chip with A2/B2, A1/B1, A0/B0 and SUM2-SUM0 reconnected around bit position 2; N.C. = no connection]

Basically, full-adder #2 should act like a transparent medium, conveying the incoming C2 as the outgoing C3. Select one or more choices for connecting (X2, Y2).
(i) (0, 0) (ii) (1, 0) (iii) (0, ADD/SUB) (iv) (0, inverted ADD/SUB) (v) ( ____ ) // fill-in

4 (__ points) __ min. Performance:
4.1 We know that the frequency of occurrence in the dynamic trace can easily differ from the percentage of time spent. For example, floating point instructions may occur only 10% in frequency but may take 20% of the execution time, as they are long. In the following table, there are two unknowns, F and T. Find them. Hint: Consider 100 instructions, the clocks spent on them, and the percentage of clocks spent on category C.

    Category   CPI   Frequency   Time Spent
    A          2     50%         T%
    B          5     (50-F)%     (75-T)%
    C          10    F%          25%

4.2 Without changing ISA or frequency, by improving non-floating point instructions (reducing the number of clocks taken to execute them), you will improve (circle all applicable): (A) ET (Execution Time) (B) Relative MIPS (C) Native MIPS (D) MFLOPS
Without changing ISA or frequency, by improving floating point instructions (reducing the number of clocks taken to execute them), you will improve (circle all applicable): (A) ET (Execution Time) (B) Relative MIPS (C) Native MIPS (D) MFLOPS
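The hint in 4.1 is plain arithmetic: count the clocks spent on 100 instructions and take each category's share. As a sketch it fits in a small combinational module. The CPIs default to the table's 2, 5, and 10; the module name, its ports, and the test frequencies (60/30/10 and 40/40/20) are hypothetical and are not the quiz's F.

```verilog
// Hypothetical helper: clocks spent on 100 instructions of a
// three-category mix, and the (truncated) percentage of time on C.
module mix_time #(
  parameter integer CPI_A = 2, CPI_B = 5, CPI_C = 10
) (
  input  wire [6:0]  freq_a, freq_b, freq_c,  // percentages summing to 100
  output wire [15:0] clk_a, clk_b, clk_c, clk_total,
  output wire [6:0]  pct_c                    // time share of C, in percent
);
  // With 100 instructions, a frequency of f% means f instructions.
  assign clk_a     = freq_a * CPI_A;
  assign clk_b     = freq_b * CPI_B;
  assign clk_c     = freq_c * CPI_C;
  assign clk_total = clk_a + clk_b + clk_c;
  // Integer division truncates toward zero.
  assign pct_c     = (clk_c * 100) / clk_total;
endmodule
```

With the 60/30/10 mix, for example, category C is only 10% of the instructions but takes 100 of 370 clocks (about 27% of the time).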

Without changing frequency, by splitting a non-floating point instruction, which was taking 12 clocks, into two non-floating point instructions taking 6 clocks each, you will cause
(i) ET (Execution Time) (to go up / to go down / to remain the same)
(ii) Relative MIPS (to go up / to go down / to remain the same)
(iii) Native MIPS (to go up / to go down / to remain the same)
(iv) MFLOPS (to go up / to go down / to remain the same)

Without changing frequency, by splitting a floating point instruction, which was taking 12 clocks, into two floating point instructions taking 6 clocks each, you will cause
(i) ET (Execution Time) (to go up / to go down / to remain the same)
(ii) Relative MIPS (to go up / to go down / to remain the same)
(iii) Native MIPS (to go up / to go down / to remain the same)
(iv) MFLOPS (to go up / to go down / to remain the same)

5 (__ points) __ min. Single-cycle CPU: Reproduced on the next two pages is the single-cycle CPU block diagram, unmodified on the first page for your reference and partially modified for your completion on the next page, to support this new load word instruction, lw_addu, which is defined as follows.
    regular lw:   lw rt, offset(rs)      ; (rt) <= M[offset + (rs)]
    new lw_addu:  lw_addu rt, offset(rs) ; (rt) <= (rt) + M[offset + (rs)]
After reading the contents of the memory, instead of simply depositing the read content into rt, it adds the read content to rt and deposits the sum into rt. An additional adder after the memory could be used, but Mr. Trojan told us to use the BTA (Branch Target Address) calculating adder, as it is not required to calculate a BTA when this lw_addu is executing. Use two multiplexers to deliver the memory content and the rt content to the BTA adder, and another multiplexer to deliver the BTA adder result as WD (write data) to the register file. Control all three multiplexers using a new control line called lau (short for lw_addu).

5.1 Complete the block diagram on the page after next and the control signal table below.
    Instruction    RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  lau
    R-format       1       0       0         1         0        0         0       1       0
    lw             0       1       1         1         1        0         0       0       0
    sw             X       1       X         0         0        1         0       0       0
    beq            X       0       X         0         0        0         1       0       1
    v-lw lw_addu   1       0       1         1         1        0         0       0       1

5.2 Suppose the original lw instruction needed 40 ns (10 ns to fetch the lw instruction, 5 ns to fetch the source register, 10 ns to calculate the effective address, 10 ns to read the memory, and 5 ns to write back to the register file). What is your estimate of the time needed by lw_addu?
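One possible reading of the lw_addu datapath described in Question 5 is sketched below: two lau-controlled multiplexers steer the memory read data and (rt) into the BTA adder, and a third selects the adder output as WD. Which operand goes to which adder input, and all signal names, are assumptions; the sketch covers only this slice, not the full single-cycle CPU or its control unit.

```verilog
// Hypothetical datapath slice for lw_addu: the BTA adder is reused to
// compute (rt) + M[offset + (rs)] when lau = 1.
module lw_addu_slice (
  input  wire        lau,         // new control line, 1 only for lw_addu
  input  wire        MemtoReg,
  input  wire [31:0] pc_plus4,    // normal BTA adder operand
  input  wire [31:0] offset_x4,   // sign-extended offset shifted left by 2
  input  wire [31:0] mem_rdata,   // data memory read data
  input  wire [31:0] rt_data,     // register file read data for rt
  input  wire [31:0] alu_result,
  output wire [31:0] bta_sum,     // BTA, or (rt) + memory data when lau = 1
  output wire [31:0] WD           // write data to the register file
);
  // Two input muxes steer the BTA adder's operands.
  wire [31:0] add_in0 = lau ? mem_rdata : pc_plus4;
  wire [31:0] add_in1 = lau ? rt_data   : offset_x4;
  assign bta_sum = add_in0 + add_in1;

  // Existing MemtoReg mux, followed by the new lau-controlled mux.
  wire [31:0] wd_normal = MemtoReg ? mem_rdata : alu_result;
  assign WD = lau ? bta_sum : wd_normal;
endmodule
```

When lau = 0 the slice behaves like the original datapath: the adder computes PC+4 plus the shifted offset, and WD comes from the MemtoReg mux.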

5.3 The hardware engineer implemented this new lw_addu instruction in the new version of the single-cycle CPU, but the compiler was not redesigned to utilize this new instruction. So the frequency of occurrence of this instruction in the dynamic execution trace is 0% (zero percent). The ET (execution time) of the benchmark (goes up / goes down / remains the same). If you said goes up or goes down, do you have adequate data to calculate the factor of performance change? Yes / No. If yes, the performance has changed by ____.

5.4 new lw_addu: lw_addu rt, offset(rs) ; (rt) <= (rt) + M[offset + (rs)]
How many source registers? ____ How many destination registers? ____

[Figure: Single-cycle CPU (unmodified). No need to write on this page.]

6 (__ points) __ min. Multi-cycle CPU:
6.1 While PCWrite (is needed / isn't needed) in the single-cycle CPU, it (is needed / isn't needed) in the multi-cycle CPU. Choose an appropriate PC register design from the following 3 choices. Choice #1 is the cheapest and #3 is the most expensive. You should choose a cheaper design if it satisfies your need.

[Figure: #1 PC as a level-sensitive latch (D, Q, [31:0]); #2 PC as an edge-sensitive bare register (no recirculating mux); #3 PC as an edge-sensitive register with a recirculating mux]

Your choice of PC for the single-cycle CPU: ____
Your choice of PC for the multi-cycle CPU: ____

6.2 Temporary registers are needed if information is produced in one clock and consumed in a later clock. However, we could avoid temporary registers (for example, MDR, the Memory Data Register) by ____
Temporary registers have a Temp_RegWrite control input so that the CU can tell a register when to write. The exception to this rule is ____

6.3 Computing the BTA (for an assumed beq instruction) during the decode state is (better / worse) than computing the EA (effective address) (for an assumed lw/sw instruction) because ____
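In Verilog terms, the three PC register choices in 6.1 differ roughly as sketched below. The module and port names are hypothetical, and the sketch only shows behavior; it does not say which choice fits which CPU.

```verilog
// #1: level-sensitive latch, transparent while G is high.
module pc_latch (input wire G, input wire [31:0] D, output reg [31:0] Q);
  always @(*) if (G) Q <= D;   // no else: Q holds its value (a latch)
endmodule

// #2: edge-triggered bare register, loads on every rising clock edge.
module pc_reg (input wire clk, input wire [31:0] D, output reg [31:0] Q);
  always @(posedge clk) Q <= D;
endmodule

// #3: edge-triggered register with a recirculating mux: it loads D only
// when PCWrite = 1 and otherwise feeds its own output back.
module pc_reg_en (input wire clk, input wire PCWrite,
                  input wire [31:0] D, output reg [31:0] Q);
  wire [31:0] next_q = PCWrite ? D : Q;   // the recirculating mux
  always @(posedge clk) Q <= next_q;
endmodule
```

The recirculating mux in #3 is what lets an enable signal hold the register's value across clock edges.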