Computer Architecture Homework Set # 3 COVER SHEET Please turn in with your own solution


CSCE 6 (Fall 07) Computer Architecture
Homework Set # 3
COVER SHEET
Please turn in with your own solution

Eun Jung Kim

Write your answers on the sheets provided. Submit with the COVER SHEET. If you need additional sheets for any of the problems, use the Additional Work Sheet page (as many copies as you require).

Name:
ID Number:

Print your name clearly. No late homework will be accepted. You are expected to write up your solutions on your own, without referring to other students' work or to solutions you may find on the web. This homework is due at the beginning of class on Thursday, October th, 07.

Dynamic Hardware Branch Prediction

Suppose the following branch instructions have been executed.

Label / Address / Branch / Taken or Not Taken (T/NT):
  b T
  b NT
  b T
  ...00 b NT
  ...00 b T

Draw a (, ) predictor and indicate the state of the buffer (with prediction entries per table) after executing the above branch instructions. Also show the prediction for each branch instruction. Assume that a predictor uses a saturating counter implemented in zsim A by default. (0)

Instruction    Prediction

Suppose we have a deeply pipelined processor, for which we implement a branch target buffer for the conditional branches only. Assume that the misprediction penalty is always cycles and the buffer miss penalty is always cycles. Assume 90% hit rate, 9% accuracy and % conditional branch frequency. How much faster is the processor with the branch target buffer versus a processor that has a fixed -cycle branch penalty? Assume a base CPI without branch stalls of. (0)
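The branch-target-buffer comparison above can be set up as a small CPI calculation. The following Python sketch is one possible model, not the official solution: it assumes that a buffer hit with a wrong prediction costs the misprediction penalty, that every buffer miss costs the miss penalty, and that all other cycles are covered by the base CPI. The function name `btb_speedup`, its parameter names, and the sample numbers are hypothetical; the sample numbers are placeholders and are not the values missing from the problem statement.

```python
def btb_speedup(base_cpi, branch_freq, hit_rate, accuracy,
                mispredict_penalty, miss_penalty, fixed_penalty):
    """Speedup of a BTB design over a fixed-penalty design (assumed model)."""
    # Stall cycles per branch with the BTB:
    #   buffer hit but wrong prediction -> misprediction penalty
    #   buffer miss                     -> buffer miss penalty
    stalls_per_branch = (hit_rate * (1.0 - accuracy) * mispredict_penalty
                         + (1.0 - hit_rate) * miss_penalty)
    cpi_btb = base_cpi + branch_freq * stalls_per_branch

    # Without the BTB, every conditional branch pays the fixed penalty.
    cpi_fixed = base_cpi + branch_freq * fixed_penalty

    return cpi_fixed / cpi_btb


# Placeholder inputs for illustration only (not taken from the problem).
example = btb_speedup(base_cpi=1.0, branch_freq=0.15, hit_rate=0.9,
                      accuracy=0.9, mispredict_penalty=4, miss_penalty=3,
                      fixed_penalty=2)
print(f"speedup = {example:.3f}")
```

Keeping the two CPI terms separate makes it easy to see which assumption changes the result if the course uses a different penalty model.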

Suppose the following branch instructions have been executed.

Label / Address / Branch / Taken or Not Taken (T/NT):
  b T
  b NT
  b T
  ...00 b NT
  ...00 b T

a) Draw a (, ) predictor and indicate the state of the buffer (with prediction entries per table) after executing the above branch instructions. Also show the prediction for each branch instruction. Assume that a predictor uses bit saturating counter implemented in zsim A by default. (0)

Instruction    Prediction

b) Show the prediction for each branch instruction using a tournament predictor with entries. Also show the final contents of Predictor buffer and Predictor buffer. Predictor and Predictor are -bit saturating counters with prediction entries. Note that Predictor is a local predictor while Predictor is global. Assume all table and buffer contents are initialized to zero. (0)

Instruction    Prediction
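As a reminder of how a single saturating counter behaves, here is a minimal Python sketch. It is deliberately simpler than the predictors asked for above: it models one counter only, with no history bits and no table indexing, and it assumes a 2-bit counter that starts at zero and predicts taken when the counter is in its upper half. The counter width is missing from the problem text, so `bits=2` is an assumption, and the name `simulate_counter` is hypothetical. The outcome sequence T, NT, T, NT, T is the one listed in the table.

```python
def simulate_counter(outcomes, bits=2, initial=0):
    """Run one saturating counter over a list of outcomes (True = taken).

    Returns (predictions, final_counter). Assumes the counter predicts
    taken when its value is in the upper half of its range.
    """
    max_value = (1 << bits) - 1      # e.g. 3 for a 2-bit counter
    threshold = 1 << (bits - 1)      # e.g. 2 for a 2-bit counter
    counter = initial
    predictions = []
    for taken in outcomes:
        # Predict first, then update with the actual outcome.
        predictions.append(counter >= threshold)
        if taken:
            counter = min(counter + 1, max_value)  # saturate at the top
        else:
            counter = max(counter - 1, 0)          # saturate at the bottom
    return predictions, counter


# Outcome sequence from the table: T, NT, T, NT, T.
outcomes = [True, False, True, False, True]
predictions, final_state = simulate_counter(outcomes)
print(predictions, final_state)
```

In the actual problem each branch address selects its own entry (and, for a correlating predictor, the global history selects among counters), so this sketch only illustrates the update rule for one entry.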

Assume there is a floating-point unit with add, multiply/divide, and load/store units, with execution latencies of clock cycles for add, for multiply, for divide, and for load/store ( for address calculation, for memory access); an integer unit for ALU operations; another unit for address calculation; and another unit for branch condition evaluation. For the code sequence below, answer the following questions. Note that the number of reservation stations is the same as the number of functional units, and the processor is single-issue.

1. LD F6, (R)
2. LD F, (R)
3. MULTD F0, F, F
4. SUBD F8, F6, F
5. DIVD F0, F0, F6
6. ADDD F6, F8, F

a) Identify all the data hazards in the above code fragment, along with the type of each hazard identified. You can mark them appropriately on the code fragment and use acronyms to specify the hazard type. (0)
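The hazard classification in part a) follows a mechanical rule, sketched below in Python. For every earlier/later instruction pair it reports RAW when the later one reads the earlier one's destination, WAW when both write the same register, and WAR when the later one overwrites a register the earlier one reads. The function name `find_hazards` and the three-instruction example are hypothetical; the register numbers in the example are invented for illustration and are not the ones missing from the code fragment above. The sketch compares all pairs and does not filter out hazards hidden by an intervening write.

```python
def find_hazards(instrs):
    """Classify data hazards between all instruction pairs.

    instrs: list of (opcode, dest_register, source_registers).
    Returns a list of (kind, earlier_index, later_index, register).
    """
    hazards = []
    for j, (_, dest_j, srcs_j) in enumerate(instrs):
        for i in range(j):
            _, dest_i, srcs_i = instrs[i]
            if dest_i in srcs_j:
                hazards.append(("RAW", i, j, dest_i))  # later reads earlier's result
            if dest_i == dest_j:
                hazards.append(("WAW", i, j, dest_i))  # both write the same register
            if dest_j in srcs_i:
                hazards.append(("WAR", i, j, dest_j))  # later overwrites earlier's source
    return hazards


# Hypothetical fragment with made-up register numbers.
program = [
    ("MULTD", "F0", ("F2", "F4")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]
for hazard in find_hazards(program):
    print(hazard)
```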

b) For the above code sequence, show the status tables when all instructions have completed with single-issue Tomasulo's algorithm. For the instruction status table, list the clock cycle when each event happens. (0)

Instruction Status

Instruction      Issue   Execute   Memory Access   Write Result
LD F6,(R)
LD F,(R)
MULTD F0,F,F
SUBD F8,F6,F
DIVD F0,F0,F6
ADDD F6,F8,F

Reservation Stations

Name      Busy   Op   Vj   Vk   Qj   Qk   A
ADD
ADD
MUL/DIV
MUL/DIV
LD/STR
LD/STR

Register Status

Field   F0   F   F   F6   F8   F0   F...   F0
Qi

6. Consider the execution of a loop on a two-issue processor. Assume there are two functional units: one for effective address calculation and integer ALU operations, and the other for branch condition evaluation. Also assume that there are CDB and that up to two instructions of any type can commit per clock cycle. Assume that branches are single-issue but that branch prediction is perfect. (Latency: integer ALU operation, load/store ( for address calculation, for memory access), FP ALU operation ). Fill out the timetable of a pipeline with dynamic scheduling. (0)

Instruction          Issue   Execute   Memory Access   Write CDB
L.D F0, 0(R)
DADDIU R, R, # 8
BNE R, R, LOOP
L.D F0, 0(R)
DADDIU R, R, # 8
BNE R, R, LOOP
L.D F0, 0(R)
DADDIU R, R, # 8
BNE R, R, LOOP

7. With a single-issue pipeline, unroll the loop a sufficient number of times to schedule it without any delays. Show the schedule after eliminating any redundant overhead instructions. (0) (Latency: integer ALU operation, load and store (memory access only), FP ALU operation ).

Loop: L.D F0, 0(R)
      DADDIU R, R, # 8
      BNE R, R, Loop

8. Show a software pipelined version of the loop in question 7. You may omit the start-up and clean-up code. (0)
