COSC 6385 Computer Architecture Dynamic Branch Prediction


Edgar Gabriel, Spring 2018

Pipelining
- Pipelining allows the execution of instructions to overlap
- Limitations on the (pipelined) execution of instructions:
  - Data dependencies
  - Control dependencies
- The effect of these limitations can be minimized in:
  - Hardware
  - Software (compiler)

Branch prediction
- No instruction is allowed to initiate execution until all branches preceding the instruction have completed
- How to deal with control hazards:
  - Stall the pipeline until the next fetch address is known
  - Guess the next fetch address (branch prediction)
  - Employ delayed branching (branch delay slot)
  - Do something else (fine-grained multithreading)
  - Eliminate control-flow instructions (predicated execution)
  - Fetch from both possible paths (multipath execution)
Slide based on a lecture by Onur Mutlu, Carnegie Mellon University, branch-prediction-afterlecture.pdf

- Idea: predict the next fetch address (to be used in the next cycle)
- Requires three things to be predicted at the fetch stage:
  - Whether the fetched instruction is a branch
  - (Conditional) branch direction
  - Branch target address (if taken)
- In most cases, the target address of a branch instruction remains the same across multiple instances
- Branch Target Buffer: store the target address from the previous instance and access it with the Program Counter
Slide based on a lecture by Onur Mutlu, Carnegie Mellon University, branch-prediction-afterlecture.pdf
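The Branch Target Buffer idea can be sketched in a few lines of Python. This is only an illustrative model, not the lecture's design: the class name `BranchTargetBuffer`, the helper `next_fetch_address`, the 4-bit index, and the 4-byte instruction size are all assumptions made for the example. A hit in the buffer answers "is this a branch?", the caller supplies the direction guess, and the stored value answers "where does it go?".

```python
class BranchTargetBuffer:
    """Direct-mapped table: branch PC -> target seen at the previous instance."""

    def __init__(self, index_bits=4, instr_bytes=4):
        self.index_bits = index_bits          # assumption: 2**4 entries
        self.instr_bytes = instr_bytes        # assumption: fixed 4-byte instructions
        self.entries = [None] * (1 << index_bits)   # each entry: (tag, target)

    def _split(self, pc):
        word = pc // self.instr_bytes
        index = word & ((1 << self.index_bits) - 1)  # low bits select the entry
        tag = word >> self.index_bits                # remaining bits identify the branch
        return index, tag

    def lookup(self, pc):
        """Return the stored target, or None if pc is not a known branch."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def record(self, pc, target):
        """Remember the target once the branch has actually executed."""
        index, tag = self._split(pc)
        self.entries[index] = (tag, target)


def next_fetch_address(btb, pc, predict_taken, instr_bytes=4):
    """Combine the three predictions made at the fetch stage."""
    target = btb.lookup(pc)
    if target is not None and predict_taken:
        return target               # known branch, predicted taken
    return pc + instr_bytes         # not a branch, or predicted not taken


btb = BranchTargetBuffer()
cold = next_fetch_address(btb, 0x40, True)    # nothing recorded yet
btb.record(0x40, 0x20)
warm = next_fetch_address(btb, 0x40, True)    # target from the previous instance
```

The direction guess is passed in from outside on purpose, so the sketch stays independent of whichever direction predictor is used.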

An example

    for ( i=0; i < 1000; i++ ) {
        x[i] = x[i] + s;
    }

    Loop: L.D    F0, 0(R1)
          ADD.D  F4, F0, F2
          S.D    F4, 0(R1)
          ADD    R1, R1, #8
          BNE    R1, R2, Loop

Dynamic branch prediction
- Algorithms that use the previous executions of a branch to predict the outcome of its next execution
- Several techniques for dynamic branch prediction:
  - 1-bit branch prediction buffer
  - 2-bit branch prediction buffer
  - Correlating branch prediction buffer
  - Tournament predictors
  - Tagged hybrid predictors

1-bit branch prediction buffer (I)
- Branch prediction buffer: small memory area indexed by the lower portion of the address of the branch instruction
- Records whether the branch was taken the last time or not (1 bit is sufficient)
- The 1-bit BTB entry is updated after the actual outcome of the branch is known
[Figure: the address of the branch instruction is split into a tag and a k-bit BTB index; the index selects one of the 2^k entries of the Branch Target Buffer (BTB), 1 bit per entry, which gives the prediction for this branch.]
Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology

1-bit predictor using Branch Target Address
[Figure: the k-bit BTB index selects an entry holding a tag, a 1-bit prediction and the branch target address; the stored tag is compared (=?) with the tag of the branch address to decide whether the instruction was found in the BTB; the next PC is chosen between the stored target and PC+4.]
Slide based on a lecture by Onur Mutlu, Carnegie Mellon University, branch-prediction-afterlecture.pdf
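A 1-bit branch prediction buffer is small enough to model directly. The sketch below is a minimal illustration under assumed parameters (the class name `OneBitPredictor`, the table size, and the 4-byte instruction size are not from the slides). It deliberately stores no tag, so two branches whose addresses share the same low bits share one prediction bit.

```python
class OneBitPredictor:
    """Untagged table of 1-bit entries indexed by the low bits of the branch PC."""

    def __init__(self, index_bits=10, instr_bytes=4):
        self.mask = (1 << index_bits) - 1
        self.instr_bytes = instr_bytes              # assumption: 4-byte instructions
        self.table = [False] * (1 << index_bits)    # False = predict not taken

    def _index(self, pc):
        return (pc // self.instr_bytes) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        # Called once the actual outcome is known: remember only the last outcome.
        self.table[self._index(pc)] = taken


predictor = OneBitPredictor(index_bits=2)   # tiny table to make aliasing visible
first_guess = predictor.predict(0x10)
predictor.update(0x10, True)
second_guess = predictor.predict(0x10)
aliased_guess = predictor.predict(0x20)     # different branch, same table entry
```

Adding the tag comparison from the second figure would turn the aliased lookup into a miss instead of a borrowed prediction.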

1-bit branch prediction buffer: limitations
- Even for a regular loop (embedded in another, larger loop), the 1-bit branch prediction buffer mispredicts at least the first and the last iteration:
  - First iteration: the bit was set to not-taken by the last iteration of the previous execution of the same loop, but the branch will be taken
  - Last iteration: the bit says taken, but the branch won't be taken

2-bit branch prediction buffer
- A prediction must miss twice before the prediction is changed
- Can be extended to n bits
[Figure: state diagram with four states, Predict taken (11), Predict taken (10), Predict not taken (01) and Predict not taken (00), connected by Taken / Not taken transitions.]
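The difference between the 1-bit and 2-bit schemes on a nested loop can be checked with a short simulation. The sketch assumes an n-bit saturating counter, which is one common way to realize "must miss twice"; the exact transitions of the slide's state diagram may differ. The function name `mispredictions`, the initial state, and the trace (an inner loop of 10 iterations executed 100 times) are made up for the illustration.

```python
def mispredictions(outcomes, bits):
    """Count mispredictions of one n-bit saturating counter on one branch."""
    top = (1 << bits) - 1           # strongest "taken" state
    threshold = 1 << (bits - 1)     # counter >= threshold means "predict taken"
    counter = 0                     # assumption: start in the strongest "not taken" state
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= threshold
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at both ends.
        counter = min(counter + 1, top) if taken else max(counter - 1, 0)
    return misses


# Loop-closing branch of an inner loop: taken 9 times, then falls through.
inner_loop = [True] * 9 + [False]
trace = inner_loop * 100            # the outer loop runs the inner loop 100 times

one_bit_misses = mispredictions(trace, bits=1)
two_bit_misses = mispredictions(trace, bits=2)
print(one_bit_misses, two_bit_misses)   # prints "200 102"
```

The 1-bit scheme misses twice per execution of the inner loop, while the 2-bit counter misses only on the loop exit once it has warmed up.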

Correlated branches
- Partial correlations: one branch could test for cond1, and another branch could test for cond1 && cond2 (if cond1 is false, then the second branch can be predicted as false)
- Multiple correlations: one branch tests cond1, a second tests cond2, and a third tests a combination of cond1 and cond2 (which can always be predicted if the outcomes of the first two branches are known)
Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology

Correlated branches
- Local history: what is the predicted outcome of Branch A given the outcomes of previous instances of Branch A?
- Global history: what is the predicted outcome of Branch Z given the outcomes of (all / the last n) previous branches A, B, ..., X and Y executed before Branch Z?
Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology

Correlated branches
- For a (1,1) predictor, each branch has two different branch prediction buffers, written X / Y:
  - X: predictor used if the previous branch in the application has not been taken
  - Y: predictor used if the previous branch in the application has been taken
- The content of the two branch prediction buffers is determined by the branch to which they belong (local history)
- Which of the two branch prediction buffers is used depends on the outcome of the previous branch in the application (global history)

Correlated branches - example

    if ( d==0 )
        d = 1;
    if ( d==1 )

          BNEZ R1, L1        !branch b1
          MOV  R1, #1
    L1:   ADD  R3, R1, #-1
          BNEZ R3, L2        !branch b2
    L2:

    Initial value of d   d==0?   b1          Value of d before b2   d==1?   b2
    2                    No      Taken       2                      No      Taken
    0                    Yes     Not taken   1                      Yes     Not taken
    2                    No      Taken       2                      No      Taken
    0                    Yes     Not taken   1                      Yes     Not taken

Correlated branches - example

    d=?   BPB b1   b1 act.   BPB b2   b2 act.
    2     NT/NT              NT/NT

- The branch prediction buffers for the branches b1 and b2 are assumed to hold the prediction "not taken" for both options (previous branch not taken / taken)
- Assuming that the previous branch in the application has not been taken, the BPB for b1 uses its "previous branch not taken" part
- The BPB for b1 predicts that b1 will not be taken

Correlated branches - example

    d=?   BPB b1   b1 act.   BPB b2   b2 act.
    2     NT/NT    T         NT/NT
          T/NT

- The BPB for b1 predicts that b1 will not be taken
- b1 is taken (see table for d=2), so the prediction was wrong
- The "previous branch has not been taken" part of the BPB for b1 is updated to taken
- Because b1 has been taken, the "last branch has been taken" part of BPB b2 will be used
- BPB b2 predicts that b2 will not be taken

Correlated branches - example

    d=?   BPB b1   b1 act.   BPB b2   b2 act.
    2     NT/NT    T         NT/NT    T
    0     T/NT     NT        NT/T

- b2 is taken (see table for d=2), so the prediction was wrong
- The "previous branch has been taken" part of the BPB for b2 is updated to taken
- Because b2 has been taken, the "last branch has been taken" part of BPB b1 will be used
- BPB b1 predicts that b1 will not be taken
- b1 is not taken (see table for d=0): this matches the prediction
- The update of BPB b1 does not modify any entry
- Because b1 has not been taken, the "last branch has not been taken" part of BPB b2 will be used
- BPB b2 predicts that b2 will not be taken
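The worked example can be replayed in code to check each step. The sketch below models a (1,1) predictor for the two branches b1 and b2 with d alternating between 2 and 0; the function name `run_one_one` and the log format are made up for the illustration, and the initial state (NT/NT, previous branch not taken) follows the assumptions stated in the example.

```python
def run_one_one(d_values):
    """Replay branches b1 and b2 through a (1,1) predictor and log each step."""
    # bpb[name] = [prediction if previous branch was not taken,
    #              prediction if previous branch was taken]; True means "taken".
    bpb = {"b1": [False, False], "b2": [False, False]}   # NT/NT for both
    last_taken = False     # assumption: the branch before the trace was not taken
    log = []
    for d in d_values:
        b1_taken = d != 0              # BNEZ R1, L1 is taken when d != 0
        if d == 0:
            d = 1
        b2_taken = d != 1              # BNEZ R3, L2 is taken when d != 1
        for name, taken in (("b1", b1_taken), ("b2", b2_taken)):
            part = int(last_taken)     # global history selects X (0) or Y (1)
            predicted = bpb[name][part]
            log.append((name, predicted, taken))
            bpb[name][part] = taken    # 1-bit entry: remember the last outcome
            last_taken = taken
    return log


log = run_one_one([2, 0, 2, 0])
misses = [step for step in log if step[1] != step[2]]
for name, predicted, taken in log:
    print(name, "predicted", "T" if predicted else "NT", "actual", "T" if taken else "NT")
```

Only the first execution of b1 and of b2 is mispredicted; after that the previous branch's outcome selects the correct part of each buffer.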

Correlated branches
- A (2,1) correlated branch predictor:
  - Uses the outcome of the last 2 branches to choose from 2^2 different predictions
  - Uses a 1-bit predictor for each of the 4 prediction buffers
- The four buffers of an entry are written A / B / C / D:
  - A: predictor used if the previous 2 branches in the application have both not been taken (00)
  - B: predictor used if the previous branches have the history: second-last branch not taken, last branch taken (01)
  - C: predictor used if the previous branches have the history: second-last branch taken, last branch not taken (10)
  - D: predictor used if the previous 2 branches in the application have both been taken (11)

Correlated branches
- How do we know which of the four sections of our branch predictor to use?
- Need to record the behavior of all branches in the application, e.g.

    Initial value of d   d==0?   b1          Value of d before b2   d==1?   b2
    2                    No      Taken       2                      No      Taken
    0                    Yes     Not taken   1                      Yes     Not taken
    2                    No      Taken       2                      No      Taken
    0                    Yes     Not taken   1                      Yes     Not taken

Global branch history
- For a (2,n) branch predictor, the outcome of the last two branches is relevant
- The global branch history is implemented using a 2-bit shift register

Accuracy of Different Schemes
[Figure: bar chart of the frequency of mispredictions for the benchmarks nasa7, matrix300, tomcatv, doducd, spice, fpppp, gcc, espresso, eqntott and li, comparing three schemes: 4,096 entries with 2 bits per entry, unlimited entries with 2 bits per entry, and 1,024 entries (2,2).]
Slide based on a lecture by David A. Patterson, University of California, Berkeley
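The global-history shift register generalizes to any (m,n) predictor. The following sketch is an illustration under assumptions: the class name `CorrelatingPredictor`, the table size, the saturating-counter update, and the repeating taken/taken/not-taken pattern are all chosen for the example and are not taken from the slides.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history pick one of 2**m n-bit counters."""

    def __init__(self, m=2, n=2, index_bits=10, instr_bytes=4):
        self.history = 0                        # m-bit shift register, newest outcome in bit 0
        self.history_mask = (1 << m) - 1
        self.index_mask = (1 << index_bits) - 1
        self.instr_bytes = instr_bytes          # assumption: 4-byte instructions
        self.top = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _row(self, pc):
        return self.table[(pc // self.instr_bytes) & self.index_mask]

    def predict(self, pc):
        return self._row(pc)[self.history] >= self.threshold

    def update(self, pc, taken):
        row = self._row(pc)
        counter = row[self.history]
        row[self.history] = min(counter + 1, self.top) if taken else max(counter - 1, 0)
        # Shift the newest outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def count_misses(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


pattern = [True, True, False] * 20      # hypothetical repeating branch behavior
correlated = count_misses(CorrelatingPredictor(m=2, n=1), 0x40, pattern)
plain_two_bit = count_misses(CorrelatingPredictor(m=0, n=2), 0x40, pattern)
print(correlated, plain_two_bit)        # prints "3 23"
```

With m=0 the class degenerates to a plain n-bit buffer, which keeps missing once per period on this pattern, while two bits of history are enough to learn it.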

Tournament predictors
- Combine multiple prediction algorithms
- Use a predictor to predict which predictor works best
- Keep track of which predictor was the most accurate for the last execution of a branch
- Allow different prediction algorithms to be used for different types of branches
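The selection mechanism of a tournament predictor can be sketched independently of its components. In the minimal model below, the per-branch 2-bit chooser counter is an assumption (the slides only say that the predictor tracks which component was more accurate), and the two components `AlwaysTaken` and `AlwaysNotTaken` are deliberately trivial stand-ins so that the chooser's behavior is easy to follow.

```python
class AlwaysTaken:
    """Stand-in component predictor (hypothetical, for illustration only)."""
    def predict(self, pc):
        return True
    def update(self, pc, taken):
        pass


class AlwaysNotTaken:
    """Stand-in component predictor (hypothetical, for illustration only)."""
    def predict(self, pc):
        return False
    def update(self, pc, taken):
        pass


class TournamentPredictor:
    """Chooses per branch between two component predictors."""

    def __init__(self, first, second, index_bits=10, instr_bytes=4):
        self.first, self.second = first, second
        self.mask = (1 << index_bits) - 1
        self.instr_bytes = instr_bytes
        # Assumed chooser: 2-bit counter, 0..1 selects `first`, 2..3 selects `second`.
        self.chooser = [0] * (1 << index_bits)

    def _index(self, pc):
        return (pc // self.instr_bytes) & self.mask

    def predict(self, pc):
        use_second = self.chooser[self._index(pc)] >= 2
        return (self.second if use_second else self.first).predict(pc)

    def update(self, pc, taken):
        first_ok = self.first.predict(pc) == taken
        second_ok = self.second.predict(pc) == taken
        i = self._index(pc)
        # Only move the chooser when exactly one component was right.
        if second_ok and not first_ok:
            self.chooser[i] = min(self.chooser[i] + 1, 3)
        elif first_ok and not second_ok:
            self.chooser[i] = max(self.chooser[i] - 1, 0)
        self.first.update(pc, taken)
        self.second.update(pc, taken)


tournament = TournamentPredictor(AlwaysTaken(), AlwaysNotTaken())
before = tournament.predict(0x40)       # chooser starts on the first component
for _ in range(2):
    tournament.update(0x40, False)      # the branch is never taken
after = tournament.predict(0x40)        # chooser has switched components
```

Any two objects with `predict` and `update` methods can be plugged in, for example a local and a global-history predictor.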

Gshare predictor
- Another example of a correlating branch predictor
- Combines branch history and branch address to determine the index into a branch predictor table
[Figure: the branch history (e.g. 10 bits) is XORed with the branch address; the result indexes a table of predictors, which supplies the prediction.]

TAGE: Tagged hybrid predictor
- Uses the same algorithm, but for different history lengths
- Some branches are detected to require a long history, while others are predicted accurately with a short history
- The index is calculated using a hash of branch history and branch address (for various history lengths)
- Entries in the history tables are identified by a tag
  - The tag can be short (4-8 bits), since the hash already had to match
  - The tag avoids accidental matches of two different branch instructions to the same entry
- A prediction is only used if both the tag and the hash match
- The prediction for a given branch is taken from the matching predictor with the longest branch history
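The gshare index computation described above is a single XOR. The sketch below is an illustration under assumptions: the class name `GsharePredictor`, the use of 2-bit saturating counters in the table, and the alternating test pattern are chosen for the example; the history length is reduced to 4 bits so the index values are easy to follow by hand.

```python
class GsharePredictor:
    """Global history XOR branch address selects a counter in one shared table."""

    def __init__(self, history_bits=10, instr_bytes=4):
        self.mask = (1 << history_bits) - 1
        self.instr_bytes = instr_bytes              # assumption: 4-byte instructions
        self.history = 0                            # newest outcome in bit 0
        self.counters = [0] * (1 << history_bits)   # assumption: 2-bit counters

    def index(self, pc):
        return ((pc // self.instr_bytes) ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        counter = self.counters[i]
        self.counters[i] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.mask


gshare = GsharePredictor(history_bits=4)
index_before = gshare.index(0x40)       # history is still all zeros
misses = 0
for step in range(40):
    taken = step % 2 == 0               # hypothetical alternating branch: T, NT, T, NT, ...
    if gshare.predict(0x40) != taken:
        misses += 1
    gshare.update(0x40, taken)
print(misses)                           # prints "4"
```

The same branch address lands on different table entries depending on the recent history, which is what lets the alternating pattern be learned after a short warm-up.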

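TAGE: Tagged hybrid predictor
[Figure: a base predictor P(0) is indexed by pc alone. Each of the tagged predictors, starting with P(1), is indexed by a hash of pc and the branch history h(0:9), h(0:19), h(0:39) or h(0:79); each entry holds a prediction and a tag, and a second hash of pc and the history is compared (=?) with the stored tag.]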
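The lookup side of TAGE (longest matching history wins, base predictor as fallback) can be sketched as follows. This is a heavily simplified illustration, not the real TAGE update or replacement policy: the class name `TinyTage`, the helper `fold`, both hash functions, the table sizes, and the naive "allocate in the next longer table on a misprediction" rule are all assumptions made for the example.

```python
HISTORY_LENGTHS = [10, 20, 40, 80]      # h(0:9), h(0:19), h(0:39), h(0:79)
INDEX_BITS = 8                          # assumption
TAG_BITS = 8                            # assumption


def fold(value, bits):
    """Fold an arbitrarily long integer into `bits` bits by XORing chunks."""
    mask = (1 << bits) - 1
    folded = 0
    while value:
        folded ^= value & mask
        value >>= bits
    return folded


class TinyTage:
    def __init__(self):
        self.history = 0                                   # newest outcome in bit 0
        self.base = [0] * (1 << INDEX_BITS)                # 2-bit counters, indexed by pc
        self.tagged = [dict() for _ in HISTORY_LENGTHS]    # index -> [tag, counter]

    def _slot(self, pc, table):
        h = self.history & ((1 << HISTORY_LENGTHS[table]) - 1)
        index = fold((pc >> 2) ^ h, INDEX_BITS)            # hypothetical index hash
        tag = fold((pc >> 2) ^ (h * 3), TAG_BITS)          # hypothetical tag hash
        return index, tag

    def _provider(self, pc):
        """Return (table number, entry) of the longest-history table whose tag matches."""
        for table in reversed(range(len(HISTORY_LENGTHS))):
            index, tag = self._slot(pc, table)
            entry = self.tagged[table].get(index)
            if entry is not None and entry[0] == tag:
                return table, entry
        return None, None

    def predict(self, pc):
        _, entry = self._provider(pc)
        if entry is not None:
            return entry[1] >= 2
        return self.base[fold(pc >> 2, INDEX_BITS)] >= 2   # fall back to P(0)

    def update(self, pc, taken):
        predicted = self.predict(pc)
        table, entry = self._provider(pc)
        if entry is not None:
            entry[1] = min(entry[1] + 1, 3) if taken else max(entry[1] - 1, 0)
        else:
            i = fold(pc >> 2, INDEX_BITS)
            self.base[i] = min(self.base[i] + 1, 3) if taken else max(self.base[i] - 1, 0)
        if predicted != taken:
            # Simplified allocation: try the next longer history length.
            longer = 0 if table is None else table + 1
            if longer < len(HISTORY_LENGTHS):
                index, tag = self._slot(pc, longer)
                self.tagged[longer][index] = [tag, 2 if taken else 1]
        self.history = ((self.history << 1) | int(taken)) & ((1 << HISTORY_LENGTHS[-1]) - 1)


tage = TinyTage()
cold_prediction = tage.predict(0x40)    # no tagged entry yet, so P(0) answers
tage.update(0x40, True)                 # misprediction allocates an entry in the first tagged table
```

The point of the sketch is the provider search order: tables are probed from the longest history to the shortest, and a prediction counts only when the stored tag matches.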
