CS422 Computer Architecture

Size: px

Start display at page:

Download "CS422 Computer Architecture"

Tamsin Powers
5 years ago
Views:

1 CS422 Computer Architecture Spring 2004 Lecture 14, 18 Feb 2004 Bhaskaran Raman Department of CSE IIT Kanpur

2 Dealing with Control Hazards Software techniques: Branch delay slots Software branch prediction Canceling or nullifying branches Misprediction rates can be high Worse if multiple issue per cycle Hence, hardware/dynamic branch prediction

3 Branch Prediction Buffer PC --> Taken/Not-Taken (T/NT) mapping Can use just the last few bits of PC Prediction may be that of some other branch Ok since correctness is not affected Shortcoming of this prediction scheme: Branch mispredicted twice for each execution of a loop Bad if loop is small for(int i = 0; i < 10; i++) { } x[i] = x[i] + C;

4 Two-Bit Predictor Have to mispredict twice before changing prediction Built in hysteresis General case is an n-bit predictor 0 to (2^n)-1 saturating counter 0 to (2^[n-1])-1 predict as taken 2^[n-1] to (2^n)-1 predict as not-taken Experimental studies: 2-bit as good as n-bit

5 Implementing Branch Prediction Buffers Implementing branch prediction buffers Small cache accessed along with the instruction in IF Or, additional 2 bits in instruction cache Note: branch prediction buffer not useful for DLX pipeline Branch target not known earlier than branch condition

6 Prediction Performance Misprediction rate 18.00% 17.00% 16.00% 15.00% 14.00% 13.00% 12.00% 11.00% 10.00% 9.00% 8.00% 7.00% 6.00% 5.00% 4.00% 3.00% 2.00% 1.00% 0.00% Nasa7 Matrix30 0 Doduc Spice Fpppp Gcc Espre sso 4096 entries in the prediction buffer T om- Eqntott Li SPEC89, IBM Power architecture

7 Improving Branch Prediction Two ways: increase buffer size, improve accuracy Misprediction rate 18.00% 17.00% 16.00% 15.00% 14.00% 13.00% 12.00% 11.00% 10.00% 9.00% 8.00% 7.00% 6.00% 5.00% 4.00% 3.00% 2.00% 1.00% 0.00% Nasa 7 Matrix30 0 Do duc 4096 entries Inf. entries Spice Fppp p Gcc Espr esso T omcatv Eqntott Li

8 Improving Prediction Accuracy Predict branches based on outcomes of recent other branches if(aa == 2) { } aa = 0; if(bb == 2) { } bb = 0; if(aa == bb) { // Do something Correlating, } or two-level predictor

9 Two-Level Predictor There are effectively two predictors for each branch: Depending on whether previous branch is T/NT Prediction bits Prediction if last branch NT Prediction if last branch T NT/NT NT NT NT/T NT T T/NT T NT T/T T T

10 Two-Level Predictor (continued) Last predictor was a (1,1) predictor One bit each of history, and prediction General case is (m,n) predictor m bits of history, n bits of prediction How to implement? Have an m-bit shift register

11 Cost of Two-Level Predictor Number of bits required: Num. branch entries x 2^m x n How many bits in 4096 (0,2) predictor? 8K How many branch entries for an 8K (2,2) predictor? 1K

12 Performance of (2,2) Predictor 4096 entries; (0,2) Inf. entries; (0,2) 1K entries; (2,2) 18.00% 16.00% Misprediction rate 14.00% 12.00% 10.00% 8.00% 6.00% 4.00% 2.00% 0.00% Nasa7 Matrix30 0 Doduc Spice Fpppp Gcc Espre sso T om- Eqntott Li

13 Branch Target Buffer Branch prediction buffer is not useful for DLX Need to know target address by the end of IF Store branch target address also Branch target buffer, or cache Access branch target buffer in IF cycle Hit ==> predicted branch target known at the end of IF We also need to know if the branch is predicted T/NT

14 Branch Target Buffer (continued) Lookup based on PC Predicted target No entry found ==> (Target = PC+4) Exact match of PC is important Since we are predicting even before knowing that it is a branch instruction Hardware is similar to a cache Need to store predicted PC only for taken predictions

15 Steps in Using a Target Buffer Access Instn. Cache and target buffer Use predicted PC Mispredicted branch; restart fetch; delete buffer entry Entry found? Yes No A taken branch? No A taken branch? Yes Normal execution No Yes Make new target buffer entry Correct prediction, proceed IF ID EX

16 Penalties in Branch Prediction Buffer hit? Branch taken? Penalty Yes Yes 0 Yes No 2 No - 2 Given a prediction accuracy of p, a buffer hitrate of h, and a taken branch frequency of f, what is the branch penalty? h x (1-p) x 2 + (1-h) x f x 2

17 Storing Target Instructions Directly store instructions instead of target address Target buffer access is now allowed to take longer Or, branch folding can be achieved Replace fetched instruction with that found in the target buffer entry Zero cycle unconditional branch; may be conditional as well

Dynamic Hardware Prediction. Basic Branch Prediction Buffers. N-bit Branch Prediction Buffers

Dynamic Hardware Prediction. Basic Branch Prediction Buffers. N-bit Branch Prediction Buffers Dynamic Hardware Prediction Importance of control dependences Branches and jumps are frequent Limiting factor as ILP increases (Amdahl s law) Schemes to attack control dependences Static Basic (stall the