6.004 Tutorial Problems L22 Branch Prediction

Scarlett Lambert
5 years ago
Branch target buffer (BTB): A direct-mapped cache (it can also be set-associative) that stores the target addresses of jumps and taken branches. The BTB is looked up in the first stage of the pipeline using the current PC. On a hit, the stored target is used as the next PC; otherwise, PC+4 is used. The BTB is trained (updated) on a misprediction: jumps and taken branches insert a (PC, target) entry into the BTB, while not-taken branches and other instructions that were mispredicted as taken invalidate the BTB entry that caused the misprediction.

Problem 1. BTB and Loop Performance

You are comparing the steady-state performance of the following code segment on two processors.

Processor P1 is a 5-stage pipelined RISC-V processor with F, D, EX, MEM, and WB stages. It has full bypassing, but no BTB. There is a single pipeline register between each pair of adjacent stages.

Processor P2 is just like P1, except that it also has an 8-entry direct-mapped BTB for next-address prediction. The BTB is trained from the EX stage on all mispredicted next PCs. If the actual next PC is not PC+4, it is written into the corresponding BTB entry. If the actual next PC is PC+4, the incorrect entry is removed from the BTB.

    . = 0x100
    L1: add x11, x11, x12
        lw x13, 0(x11)
        beqz x13, L1
        I1
        I2
        I

Fall 2018 Worksheet - L22 Branch Prediction
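The lookup and training rules described above can be sketched as a small software model. This is only an illustration, not the course's reference implementation: the class name `DirectMappedBTB`, the index function (word address modulo the number of entries), and the use of the full PC as the tag are all assumptions that the worksheet does not spell out.

```python
class DirectMappedBTB:
    """Toy model of a direct-mapped BTB (hypothetical; see assumptions above)."""

    def __init__(self, num_entries=8):
        self.num_entries = num_entries
        # Each slot holds a (pc, target) pair, or None when invalid.
        self.entries = [None] * num_entries

    def _index(self, pc):
        # Assumption: instructions are 4 bytes, so drop the two low bits
        # and use the next low-order bits as the index.
        return (pc >> 2) % self.num_entries

    def predict(self, pc):
        """Return the predicted next PC: stored target on a hit, else pc + 4."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return pc + 4

    def train(self, pc, actual_next_pc):
        """Update the BTB only when the prediction for pc was wrong."""
        if self.predict(pc) == actual_next_pc:
            return  # correct prediction: no update
        idx = self._index(pc)
        if actual_next_pc != pc + 4:
            # Jump or taken branch: insert (pc, target).
            self.entries[idx] = (pc, actual_next_pc)
        else:
            # Mispredicted as taken: invalidate the offending entry.
            self.entries[idx] = None


btb = DirectMappedBTB(num_entries=8)
first_guess = btb.predict(0x108)   # beqz of Problem 1 sits at 0x108
btb.train(0x108, 0x100)            # branch resolved as taken to L1
second_guess = btb.predict(0x108)
```

In this model the first lookup of the `beqz` at 0x108 misses and falls back to PC+4, and every lookup after the first training call returns the loop target.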

(A) Show the pipeline diagram of the execution of loop L1 on processors P1 and P2.

P1:
    Cycle
    IF
    DEC
    EXE
    MEM
    WB

P2:
    Cycle
    IF
    DEC
    EXE
    MEM
    WB

(B) Assuming that the branch is always taken, how many cycles does it take to complete each loop iteration on each of the two processors?

P1 number of cycles to execute loop:

P2 number of cycles to execute loop:

Problem 2. BTB Sizing

Assume you have a four-stage pipelined RISC-V processor with F, D, EX, and WB stages and a BTB for next-address prediction. The BTB is trained from the EX stage on all mispredicted next PCs. If the actual next PC is not PC+4, it is written into the corresponding BTB entry. If the actual next PC is PC+4, the incorrect entry is removed from the BTB. The processor has been running the following loop for a long time:

    . = 0x1000
    loop: lw x10, 0(x11)
    B1:   beqz x10, L1
          add x12, x10, x12
          srli x12, x12, 1
    L1:   addi x11, x11, -4
    B2:   bnez x11, loop

Assume the branch at B1 is taken every other loop iteration, and the branch at B2 is always taken.

(A) Assuming a 4-entry direct-mapped BTB, what is the average prediction accuracy for branches B1 and B2 in this BTB?

B1 average prediction accuracy:

B2 average prediction accuracy:

(B) Assuming an 8-entry direct-mapped BTB, what is the average prediction accuracy for branches B1 and B2 in this BTB?

B1 average prediction accuracy:

B2 average prediction accuracy:
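When comparing the two BTB sizes, a useful first step is to check whether B1 and B2 map to the same entry. The sketch below does that under an assumption the worksheet leaves implicit: the index is taken from the low-order bits of the word address, i.e. `(pc >> 2) % num_entries`. The helper name `btb_index` is hypothetical; the addresses follow from the listing, which starts at 0x1000 with 4-byte instructions.

```python
def btb_index(pc, num_entries):
    # Assumed index function: drop the 2-bit byte offset, then take the
    # low-order bits of the word address.
    return (pc >> 2) % num_entries


B1_PC = 0x1004  # beqz x10, L1   (second instruction of the loop)
B2_PC = 0x1014  # bnez x11, loop (sixth instruction of the loop)

for size in (4, 8):
    i1, i2 = btb_index(B1_PC, size), btb_index(B2_PC, size)
    status = "same entry (conflict)" if i1 == i2 else "different entries"
    print(f"{size}-entry BTB: B1 -> {i1}, B2 -> {i2}: {status}")
```

Under this assumed index function the two branches share an entry in the 4-entry BTB and occupy separate entries in the 8-entry BTB, which is what makes the two parts of the question behave differently.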

Problem 3. Branch Prediction in a Complex Pipeline

Ben Bitdiddle has decided his high-performance RISC-V processor should have 8 pipeline stages, shown below.

    IF1    Instruction fetch, first cycle
    IF2    Instruction fetch, second cycle
    D      Instruction decode, calculate branch/jal target address
    RF     Read/bypass register operands
    ALU    Perform ALU operation on operands, resolve branches
    MEM1   LD/ST memory access, first cycle
    MEM2   LD/ST memory access, second cycle
    WB     Write result to register file at end of cycle

Unless directed otherwise, the IF1 stage speculates that the next instruction comes from PC+4. The determination that an instruction is a branch or jump is made in the D stage. The target address for jal and branch instructions is also calculated in the D stage. The target address for jalr instructions and the actual branch decision (taken/not taken) are calculated in the ALU stage.

(A) With the 8-stage pipeline, how many NOPs are introduced into the pipeline when a branch instruction changes the PC to the branch target address, i.e., when it is a taken branch? How many when the branch is not taken? The number of NOPs introduced is called the branch penalty.

Branch penalty for taken branches (# of NOPs introduced):

Branch penalty for not-taken branches (# of NOPs introduced):

To reduce the penalty for taken branches, Ben plans to use a direct-mapped Branch Target Buffer (BTB) in the IF1 stage. The BTB and pipeline now work as follows:

1. The BTB holds (entryPC, targetPC) pairs for jumps and branches predicted to be taken.
2. The BTB is accessed every cycle. If there is a match with the current PC, the PC is redirected to the targetPC predicted by the BTB (unless the PC is redirected by an older instruction); if not, it is set to PC+4.
3. In the D stage, if a jal was mispredicted, stages IF1 and IF2 are flushed and the PC is redirected to the calculated jal target address.
4. In the ALU stage, if a jalr or a branch was mispredicted, the previous stages are flushed and the PC is redirected to the calculated jalr/branch target address.
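The flush rules in items 3 and 4 can be turned into a small counting sketch. It assumes that each stage holds exactly one instruction and that a redirect issued from a given stage turns every instruction in the earlier stages into a bubble; the function name `bubbles_on_redirect` and the `STAGES` list are illustrative names, not part of the worksheet.

```python
# Stage order as listed in the problem statement.
STAGES = ["IF1", "IF2", "D", "RF", "ALU", "MEM1", "MEM2", "WB"]


def bubbles_on_redirect(redirect_stage, stages=STAGES):
    """Count the instructions flushed when `redirect_stage` redirects the PC.

    Assumption: one instruction per stage, and all stages earlier than the
    redirecting stage are flushed.
    """
    return stages.index(redirect_stage)


# A jal redirect happens in D, so IF1 and IF2 are flushed.
jal_bubbles = bubbles_on_redirect("D")

# A jalr or branch redirect happens in ALU, so IF1, IF2, D and RF are flushed.
branch_bubbles = bubbles_on_redirect("ALU")

# A correct next-PC guess causes no redirect and therefore no bubbles.
no_redirect_bubbles = 0
```

The same counting applies whether the wrong next PC came from the default PC+4 speculation or from a stale BTB entry: what matters in this model is only which stage detects the misprediction.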

(B) Fill out the following tables with the number of pipeline bubbles (inserted NOPs) for different jumps and branches.

JAL:

    BTB hit?   Correct BTB target?   Pipeline Bubbles
    Yes        Yes
    Yes        No
    No         ---

JALR:

    BTB hit?   Correct BTB target?   Pipeline Bubbles
    Yes        Yes
    Yes        No
    No         ---

Branches:

    BTB hit?   Correct BTB target?   Actually taken?   Pipeline Bubbles
    Yes        Yes                   Yes
    Yes        Yes                   No
    Yes        No                    Yes
    Yes        No                    No
    No         ---                   Yes
    No         ---                   No

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

6.004 Computation Structures
Fall 2018
Practice Quiz #3B

Name:                    Athena login:

    1 /10    2 /16    3 /18    4 /15    5 /20    6 /9    7 /12