ECE 411 Exam 2, November 3, 2015, 7pm-10pm


Name: ____________________    November 3, 2015, 7pm-10pm

This exam has 5 questions. Make sure you have a complete exam before you begin. Write your name on every page in case pages become separated during grading. You will have three hours to complete this exam. Write all of your answers on the exam itself. If you need more space to answer a given question, continue on the back of the page, but clearly indicate that you have done so. This exam is closed-book. You may use one sheet of notes. You may use a calculator. Do not do anything that might be perceived as cheating. The minimum penalty for cheating will be a grade of zero. Show all of your work on all problems. Correct answers that do not include work demonstrating how they were generated may not receive full credit, and answers that show no work cannot receive partial credit. The exam is meant to test your understanding. Ample time has been provided, so be patient and read the questions carefully before you answer. Good luck!

Question                       Points   Score
Machine Problem                16
Data Hazard                    18
Control Hazard                 18
IEEE Floating Point Format     21
Tomasulo                       22
Total:                         95


1. Machine Problem (16 points)

For the entire question, the cache has 8 sets, each line is 128 bits, and it is byte addressable.

(a) (4 points) The given figures show a controller and part of a datapath for an ECE 411 MP2 cache. Write SystemVerilog code to implement the module labeled gen_mem_resp. The mem_resp signal should only be high when a memory access is being performed. (Write your answer on page 4.)

[Figure: partial cache datapath with address_split, data_array, tag_array, valid_array, dirty_array, way_dirty, mem_write, gen_mem_resp, and mem_resp.]
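The cache parameters above fix how an address divides into fields: 128-bit lines are 16 bytes, so 4 offset bits, and 8 sets need 3 index bits. The sketch below shows that split in Python. The 16-bit address width is an assumption (suggested by the lc3b_types import, not stated in the question), and the name split_address is hypothetical.

```python
# Address split for a cache with 8 sets and 128-bit (16-byte) lines.
OFFSET_BITS = 4   # 16 bytes per line -> 4 offset bits
INDEX_BITS = 3    # 8 sets -> 3 index bits
ADDR_BITS = 16    # assumption: 16-bit byte addresses
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS  # 9 under that assumption


def split_address(addr):
    """Return (tag, index, offset) for a byte address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    tag, index, offset = split_address(0x1234)
    print(hex(tag), index, offset)  # 0x24 3 4
```

Under these assumptions the index is address bits [6:4] and the tag is 9 bits wide.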


import lc3b_types::*;

module gen_mem_resp
(
    // Your answer for part A starts here
);

[Handwritten answer not legible.]

endmodule : gen_mem_resp

(b) (4 points) Given the following table of component latencies, compute the latency of mem_resp for part A for a write.

Component                      Latency
AND gate                       2 ns
OR gate                        2 ns
2-to-1 mux                     4 ns
1-8 bit comparator             4 ns
9-16 bit comparator            6 ns
tag/data/valid/dirty array     20 ns

[Handwritten answer not legible.]

ECE 411 Exam 2, Page 4
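A latency question of this kind comes down to summing component delays along each signal path and taking the slowest path. The sketch below shows that calculation in Python using the latencies from the table. The two example paths are hypothetical, since the datapath figure is not reproduced here, so the result illustrates the method and is not the answer to part (b).

```python
# Component latencies in nanoseconds, taken from the table in part (b).
LATENCY_NS = {
    "and": 2,
    "or": 2,
    "mux2": 4,
    "cmp_1_8": 4,
    "cmp_9_16": 6,
    "array": 20,
}


def path_latency(path):
    """Sum the latencies of the components along one path."""
    return sum(LATENCY_NS[component] for component in path)


def critical_path_ns(paths):
    """The signal is valid only after its slowest input path settles."""
    return max(path_latency(path) for path in paths)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    example_paths = [
        ["array", "cmp_9_16", "and", "or"],  # array read, tag compare, gating
        ["array", "and"],                    # array read, single gate
    ]
    print(critical_path_ns(example_paths))  # 30
```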

[Figure: DataPath. Labeled elements: CPU-side signals mem_address, mem_read, mem_write, mem_wdata, mem_rdata, byte_enable, and mem_resp; two ways, CacheComponent0 and CacheComponent1, each with ports address, rdata, wdata, read, write, write_line, cache_line_input, eviction_data, eviction_address, hit, miss, and eviction; a 1-bit array (write, input, index, output) producing lru_out; index bits mem_address[6:4]; 2-to-1 muxes selected by lru_out; PhysicalMemory with pmem_address, pmem_read, pmem_write, pmem_wdata, pmem_rdata, and pmem_resp.]

[Figure: State Machine. States: HitIdle, WB, LdWait, Load. Input/output signals: lru_out, mem_address, cw, eviction, pw, pr, pmem_resp, miss. Transition labels shown: eviction, miss & ~eviction, ~miss, pmem_resp, ~pmem_resp, always.]

Output Values:

State     cw   pw   pr
HitIdle   0    0    0
WB        0    1    0
LdWait    0    0    1
Load      1    0    0

2. Data Hazard (18 points)

This question tests your understanding of the mechanisms for handling data hazards in pipelined processors. Assume the 5-stage pipelined processor discussed in class with a transparent register file. It has no data forwarding but stalls on data hazards. Consider the following code segment:

I1: LDR R1, R0, A
I2: LDR R2, R0, B
I3: ADD R1, R1, R2
I4: STR R1, R0, C
I5: AND R3, R1, 4
I6: ADD R3, R1, R3

(a) (5 points) Identify all the register data dependencies in the code above. For each occurrence of a dependency, give the two instructions and the register involved in the form (I1, I2, Rn), which means I2 depends on I1 through their use of Rn. Leave blank if no dependency is present for that category.

RAW Dependency: [handwritten answer not legible]
WAW Dependency: [handwritten answer not legible]
WAR Dependency:

Which of the dependences is/are data hazard(s) in this system? Why or why not?
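The dependency classification asked for in part (a) can be checked mechanically. The Python sketch below scans the six-instruction segment and reports RAW, WAW, and WAR pairs. It pairs each read with the most recent earlier writer and does not count an instruction against itself. Both conventions are assumptions, as are the function name and tuple encoding, and a grader might apply different ones.

```python
# Each entry: (label, destination register or None, source registers).
PROGRAM = [
    ("I1", "R1", ["R0"]),        # LDR R1, R0, A
    ("I2", "R2", ["R0"]),        # LDR R2, R0, B
    ("I3", "R1", ["R1", "R2"]),  # ADD R1, R1, R2
    ("I4", None, ["R1", "R0"]),  # STR R1, R0, C (reads both, writes none)
    ("I5", "R3", ["R1"]),        # AND R3, R1, 4
    ("I6", "R3", ["R1", "R3"]),  # ADD R3, R1, R3
]


def find_dependencies(program):
    """Return (raw, waw, war) lists of (earlier, later, register) tuples."""
    raw, waw, war = [], [], []
    last_writer = {}  # register -> label of most recent writer
    readers = {}      # register -> labels that read it since its last write
    for label, dst, srcs in program:
        for reg in srcs:
            if reg in last_writer:
                raw.append((last_writer[reg], label, reg))
        if dst is not None:
            if dst in last_writer:
                waw.append((last_writer[dst], label, dst))
            for reader in readers.get(dst, []):
                war.append((reader, label, dst))
        for reg in srcs:
            readers.setdefault(reg, []).append(label)
        if dst is not None:
            last_writer[dst] = label
            readers[dst] = []  # earlier reads saw the old value
    return raw, waw, war


if __name__ == "__main__":
    raw, waw, war = find_dependencies(PROGRAM)
    print("RAW:", raw)
    print("WAW:", waw)
    print("WAR:", war)
```

Under these conventions the scan finds six RAW pairs, two WAW pairs, and no WAR pairs.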


(b) (4 points) Complete the following table with all the data hazards handled with stalls but no forwarding. Fill each cell with the current pipeline stage (IF, ID, EX, MEM or WB) the instruction is in at each cycle. Leave the cell blank only when the instruction is not in the pipeline. The first two rows have been completed as an example.

[Table: one row per instruction (I1 through I6), one column per cycle. The handwritten entries are not legible.]

(c) (4 points) What is the speedup of running this code segment with full data forwarding (MEM->EX, WB->EX and WB->MEM)? Show your calculation.

(d) (5 points) Sort the following optimizations by the maximum speedup they can achieve on this code segment, assuming that we have both stalling and bypassing. Use only > and =. For example, a > b means option a has a higher speedup than option b. Put your answer in the box and explain your reasoning below the box.

a. Instruction reordering
b. Hardware register renaming
c. 2-issue superscalar
d. 3-issue superscalar
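The stall counts behind parts (b) and (c) can be sketched with a small timing model in Python. The model rests on several assumptions that the exam does not spell out: the first instruction is fetched in cycle 1, stalls are taken in ID, the transparent register file lets a reader in ID see a value written in WB during the same cycle, an ALU result can be forwarded after EX, a load result after MEM, and a store needs its data register only at MEM. The function name and tuple encoding are hypothetical.

```python
# Each entry: (opcode, destination or None, regs needed at EX, regs needed at MEM).
PROGRAM = [
    ("LDR", "R1", ["R0"], []),
    ("LDR", "R2", ["R0"], []),
    ("ADD", "R1", ["R1", "R2"], []),
    ("STR", None, ["R0"], ["R1"]),  # R1 is store data, consumed in MEM
    ("AND", "R3", ["R1"], []),
    ("ADD", "R3", ["R1", "R3"], []),
]


def total_cycles(program, forwarding):
    """Cycle in which the last instruction completes WB (first IF is cycle 1)."""
    last_writer = {}  # register -> index of the most recent producer
    id_done = []      # cycle in which each instruction leaves ID
    for idx, (op, dst, ex_srcs, mem_srcs) in enumerate(program):
        earliest = id_done[-1] + 1 if id_done else 2
        operands = [(r, False) for r in ex_srcs] + [(r, True) for r in mem_srcs]
        for reg, needed_in_mem in operands:
            producer = last_writer.get(reg)
            if producer is None:
                continue
            if not forwarding:
                # Wait in ID until the producer's WB cycle (ID + 3).
                needed = id_done[producer] + 3
            else:
                is_load = program[producer][0] == "LDR"
                # Value exists after MEM for loads, after EX otherwise.
                ready = id_done[producer] + (2 if is_load else 1)
                # The consumer's EX (ID + 1) or MEM (ID + 2) must follow it.
                needed = ready - 1 if needed_in_mem else ready
            earliest = max(earliest, needed)
        id_done.append(earliest)
        if dst is not None:
            last_writer[dst] = idx
    return id_done[-1] + 3  # ID -> EX -> MEM -> WB


if __name__ == "__main__":
    stalled = total_cycles(PROGRAM, forwarding=False)
    forwarded = total_cycles(PROGRAM, forwarding=True)
    print(stalled, forwarded, stalled / forwarded)
```

With these assumptions the model gives 16 cycles without forwarding and 11 with it, a ratio of 16/11. A different cycle-counting convention would change the totals.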


3. Control Hazard (18 points)

Assume the following conditions for this problem:
L1 cache hit rate is 100%.
BTB hit rate is 100%.
The branch decision is available at the end of the EX stage.

For every cycle, fill in the PC of the instruction in each stage. Only the instructions shown in the following table are branch instructions, and there are no other instructions that would cause a PC change (i.e., no JSR/JMP/TRAP/etc.) and no indirect instructions. Leave the cell blank if there is no valid instruction in that stage.

Branch Id   Branch Instruction PC   Target PC   Action
0           0x22                    0x38        Not Taken
1           0x24                    0x36        Taken
2           0x38                    0x50        Not Taken

(a) (6 points) There is no prediction (and no BTB).

Cycle#   IF     ID     EX     MEM    WB
0        0x22   0x20   0x1e   0x1c   0x1a
1-7      [handwritten entries not legible]

(b) (6 points) BTB is present, always predict taken.

Cycle#   IF     ID     EX     MEM    WB
0                             0x1c   0x1a
[Remaining given and handwritten entries not legible.]
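One way to reason about part (a) is to step the pipeline cycle by cycle. The Python sketch below does this under an assumed policy: with no prediction the fetch unit keeps fetching sequentially (PC + 2, matching the 2-byte spacing of the PCs shown), and when a taken branch leaves EX the two younger instructions in IF and ID are squashed and fetch is redirected to the target. That policy and the function name are assumptions. A design that stalls fetch on every branch would produce a different table.

```python
# Branch PC -> (target PC, taken?), from the table in the question.
BRANCHES = {
    0x22: (0x38, False),
    0x24: (0x36, True),
    0x38: (0x50, False),
}


def simulate_no_prediction(initial, cycles):
    """Return one (IF, ID, EX, MEM, WB) tuple per cycle; None marks a bubble."""
    stages = list(initial)
    rows = [tuple(stages)]
    for _ in range(cycles):
        if_pc, id_pc, ex_pc, mem_pc, _wb_pc = stages
        taken = ex_pc in BRANCHES and BRANCHES[ex_pc][1]
        if taken:
            # Branch resolved at the end of EX: squash IF and ID, redirect fetch.
            stages = [BRANCHES[ex_pc][0], None, None, ex_pc, mem_pc]
        else:
            stages = [if_pc + 2, if_pc, id_pc, ex_pc, mem_pc]
        rows.append(tuple(stages))
    return rows


def fmt(pc):
    return "----" if pc is None else hex(pc)


if __name__ == "__main__":
    start = (0x22, 0x20, 0x1E, 0x1C, 0x1A)  # cycle 0 as given in the exam
    for cycle, row in enumerate(simulate_no_prediction(start, 7)):
        print(cycle, " ".join(fmt(pc) for pc in row))
```

Under this policy the taken branch at 0x24 resolves at the end of cycle 3, so cycle 4 fetches 0x36 with bubbles in ID and EX.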


(c) (6 points) BRIEFLY explain why it is (or is not) challenging to resolve branches in the EX stage or earlier. BRIEFLY give a high-level idea of how it would work (compared to the baseline: resolve branches in WB). Resolve means "having the branch decision ready".

[Handwritten answer not legible.]

4. IEEE Floating Point Format (21 points)

This question tests your understanding of the IEEE 754 Floating Point Standard. Assume a hypothetical 6-bit floating-point format (1-bit S, 3-bit E, 2-bit M) that conforms to the IEEE 754 standard in answering the following questions. The format supports denormalized numbers. Show your work for full and partial credit.

(a) (3 points) What is the representation of decimal value 0.0 in this format?

Handwritten answer: S = 0, E = 0, M = 0, (0 000 00)

(b) (3 points) What is the decimal value of the largest representable number in this format?

(c) (3 points) What is the decimal value of (0 000 01)? You can express it as a power of 2.

[Part (d) and the handwritten answers here are not legible.]

(e) (3 points) What would be the correct result value of the previous [word not legible]?

(f) (3 points) What is the ULP for ( )?

(g) (3 points) What is the decimal value of ( )?
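The 6-bit format can be explored with a small decoder. The Python sketch below assumes the usual IEEE 754 conventions scaled down to this format: an exponent bias of 2^(3-1) - 1 = 3, an all-zero exponent field for denormalized numbers, and an all-one exponent field reserved for infinity and NaN. The function name decode is hypothetical.

```python
import math

EXP_BITS = 3
MAN_BITS = 2
BIAS = (1 << (EXP_BITS - 1)) - 1  # 3, following the IEEE 754 bias convention


def decode(bits):
    """Decode a 6-bit pattern written as 'S EEE MM' (spaces optional)."""
    bits = bits.replace(" ", "")
    if len(bits) != 1 + EXP_BITS + MAN_BITS:
        raise ValueError("expected 6 bits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:1 + EXP_BITS], 2)
    mantissa = int(bits[1 + EXP_BITS:], 2)
    fraction = mantissa / (1 << MAN_BITS)
    if exponent == (1 << EXP_BITS) - 1:
        # All-ones exponent: infinity or NaN.
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # Denormalized: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * fraction * 2.0 ** (1 - BIAS)
    return sign * (1.0 + fraction) * 2.0 ** (exponent - BIAS)


if __name__ == "__main__":
    print(decode("0 000 00"))  # 0.0
    print(decode("0 000 01"))  # 0.0625, i.e. 2**-4
    print(decode("0 110 11"))  # 14.0, the largest finite value
```

Under these conventions the smallest positive denormal is 2^-4 and the largest finite value is 1.75 x 2^3 = 14.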


5. Tomasulo (22 points)

Follow these guidelines:

Null (empty) values are denoted by "".
All instructions are fetched and reside in the instruction queue.
One instruction can be issued per cycle.
If all operands are available, an instruction can issue and begin execution in the same cycle.
LD/ST instructions can begin execution once the address is available.
When a reservation station obtains its last operand value, it can begin execution on the next cycle.
If an instruction finishes executing, the result is broadcast in the same cycle. The value is written to the register file in the next cycle.
If two or more instructions finish execution in the same cycle, the one issued first broadcasts first.
There are one ADDF/SUBF unit, one MULTF/DIVF unit, and one LD/ST unit.
Each pipelined execution unit has two reservation stations (or load/store buffers) and can start executing a new instruction every clock cycle.
The execution unit latencies are ADDF/SUBF = 5, MULTF = 10, DIVF = 20, LD = 10, ST = 15. These latencies do not include writing the results to reservation stations and/or registers.
Reservation stations are deallocated during the result writing cycle and can be reassigned on the following cycle. Deallocated but not reassigned reservation stations should be indicated with null fields.
Initial register contents of R1: 100.
Values in memory should be denoted by MEM(addrs).

Here is a snapshot of the system at cycle 0 (C0).

Instruction Status:

#   Instruction          Issued?   EX Complete   Results Written?
1   ADDF F1, F2, F3      Y
2   SUBF F4, F1, F5      N
3   LD F1, 0(R1)         N
4   DIVF F7, F3, F8      N
5   ADDF F4, F4, F1      N
6   MULTF F6, F3, F4     N
7   ST 0(R1), F6         N

Reservation Stations:

Name    Busy?   OP     Value1   Value2   Producer1   Producer2
Add1    Y       ADDF   2        3
Add2    N
Mult1   N
Mult2   N


Load/Store Buffers:

Name     Busy?   Address   Producer   Value
Load1    N                 N/A        N/A
Store1   N

Register File (F1-F8) Status:

Register   F1     F2   F3   F4   F5   F6   F7   F8
Producer   Add1
Value

(a) (2 points) Suppose that instruction 1 issued in cycle C0. Which instructions will issue in the next 6 cycles, and at which cycle? If an instruction is not issued, briefly explain why it cannot.

[Handwritten answer not legible.]

(b) (8 points) Show the state of the system at the end of cycle 12 (C12).

Instruction Status:

[Table: instructions 1 through 7, with columns Issued?, EX Complete, and Results Written?. The handwritten entries are not legible.]


Reservation Stations:

[Table: rows Add1, Add2, Mult1, Mult2; columns Busy?, OP, Value1, Value2, Producer1, Producer2. The handwritten entries are not legible.]

Load/Store Buffers:

[Table: rows Load1, Store1; columns Busy?, Address, Producer, Value. The handwritten entries are not legible.]

Register File (F1-F8) Status:

[Table: registers F1 through F8; rows Producer and Value. The handwritten entries are not legible.]

(c) (4 points) Suppose the following instruction occurred after the given instructions. The instruction may not run correctly on the machine described in this problem. Why would it be incorrect? What should be done to correct the problem?

8: LD F10, 0(R1)


(d) (4 points) Assume the instruction LD F1, 0(R1) causes a page fault at the last cycle of its execution. Without a reorder buffer, some of the registers may contain wrong values. Fill in the values for the register file at the cycle when the LD instruction incurs the page fault. Indicate the registers that have wrong values.

Register File (F1-F8) Status:

Register         F1   F2   F3   F4   F5   F6   F7   F8
Value
Value Correct?

[Handwritten entries not legible.]

(e) (4 points) Besides exceptions, what other kinds of situations or instructions can a reorder buffer support?
