CS252 S05. Outline. Static Branch Prediction. Dynamic Branch Prediction.


Dorthy Gilmore
5 years ago

CMSC 411 Computer Systems Architecture
Lecture 9: Instruction Level Parallelism (Static & Dynamic Branch Prediction)

Outline
- ILP
- Compiler techniques to increase ILP
- Loop Unrolling
- Static Branch Prediction
- Dynamic Branch Prediction
- Overcoming Data Hazards with Dynamic Scheduling
- Tomasulo Algorithm
- Conclusion

CMSC 411 - 8 (from Patterson)

Static Branch Prediction
- Previously scheduled code around delayed branch
- To reorder code around branches, need to predict branch statically during compile
- Simplest scheme is to predict a branch as taken
  » Average misprediction = untaken branch frequency = 34% (SPEC92)
- More accurate scheme predicts branches using profile information collected from earlier runs, and modifies prediction based on last run
- [Figure: misprediction rate of profile-based prediction. Integer: compress, eqntott, espresso, gcc, li. Floating point: doduc, ear, hydro2d, mdljdp2, su2cor. H&P figure.]

Dynamic Branch Prediction
- Why does prediction work?
  » Underlying algorithm has regularities
  » Data that is being operated on has regularities
  » Instruction sequence has redundancies that are artifacts of the way that humans/compilers think about problems
- Is dynamic branch prediction better than static branch prediction?
  » Seems to be
  » There are a small number of important branches in programs that have dynamic behavior

Dynamic Branch Prediction
- Performance = ƒ(accuracy, cost of misprediction)
- Branch History Table (BHT): table of 1-bit values indexed by lower bits of PC address
  » Says whether or not branch taken last time
  » No address check (may refer to wrong branch)
- Problem: in a loop, 1-bit BHT will cause two mispredictions (avg is 9 loop iterations before exit):
  » End of loop, when it exits instead of looping as before
  » First time through loop on next time through code, when it predicts exit instead of looping

Dynamic Branch Prediction
- Solution: 2-bit prediction scheme where predictor changes prediction only if it mispredicts twice in a row
- [State diagram: T = Taken, NT = Not Taken. Red: stop, not taken. Green: go, taken. H&P figure.]
- Adds hysteresis to decision making process
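The loop problem described above can be checked with a small simulation. The following is a minimal sketch, not the slides' own code: the function names are hypothetical, the loop branch is assumed to execute 9 times per pass (taken 8 times, then not taken once), and the 2-bit scheme is modeled as a saturating counter, which is one common way to implement the "mispredict twice before changing" behavior.

```python
def loop_trace(iterations=9, runs=3):
    """Outcomes of a loop-closing branch: taken (True) on every
    iteration except the last, repeated for several passes over the loop."""
    return ([True] * (iterations - 1) + [False]) * runs


def mispredictions_always_taken(trace):
    """Static scheme: always predict taken, so every untaken outcome is a miss."""
    return sum(1 for taken in trace if not taken)


def mispredictions_one_bit(trace, state=True):
    """1-bit scheme: predict whatever the branch did last time.
    Assumption: the entry starts out as 'taken'."""
    misses = 0
    for taken in trace:
        if state != taken:
            misses += 1
        state = taken
    return misses


def mispredictions_two_bit(trace, counter=3):
    """2-bit saturating counter: values 2 and 3 predict taken, 0 and 1 not taken.
    Assumption: the counter starts at 3 (strongly taken)."""
    misses = 0
    for taken in trace:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


trace = loop_trace(iterations=9, runs=3)
print("always taken:", mispredictions_always_taken(trace))
print("1-bit:       ", mispredictions_one_bit(trace))
print("2-bit:       ", mispredictions_two_bit(trace))
```

Under these assumptions the 1-bit scheme pays two mispredictions on every pass after the first (loop exit, then loop re-entry), while the 2-bit counter pays only for the exit.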

BHT Accuracy
- Mispredict because either:
  » Wrong guess for that branch
  » Got branch history of wrong branch when indexing into the table
- 4096 entry table:
- [Figure: misprediction rate, SPEC89. Integer: eqntott, espresso, gcc, li. Floating point: spice, doduc, fpppp, matrix300, nasa7. H&P figure.]

Correlated Branch Prediction
- Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table
- In general, (m,n) predictor means record last m branches to select between 2^m history tables, each with n-bit counters
  » Thus, old 2-bit BHT is a (0,2) predictor
- Global Branch History: m-bit shift register keeping T/NT status of last m branches
- Each entry in table has 2^m n-bit predictors
- Also known as 2-level adaptive predictor

Correlating Branches
- Depends on previous branches! If both of the first two conditions hold, aa and bb are both set to 0, so the outcome of the third branch is already determined:

    if (aa == 2) aa = 0;
    if (bb == 2) bb = 0;
    if (aa != bb) {

Correlated Branch Prediction
- (2,2) predictor: behavior of recent branches selects between four predictions of next branch, updating just that prediction
- Or, 4 addr bits + 2 history bits give us 6-bit index into 2^6 = 64 predictors, each having two bits: 128 total bits
- [Diagram: branch address and 2-bit global branch history select one of the 2-bit per-branch predictors, which gives the prediction.]

Possible choices
- Local history + branch address
- Global branch history + branch address
- Global branch history only (no branch address)
  » Ignores branch instruction
- [Diagram: index into the predictor table formed from branch address, global branch history, or local branch history.]

Calculations
- 4096-entry (0,2) predictor (i.e., 2-bit BHT)
  » 4k x 2 = 8k bits
  » 4k = 2^12, i.e. 12 address bits
- How to use the same # bits w/ a (2,2) predictor?
  » 8k bits w/ 2-bit BHT means 4k 2-bit counters (BHTs)
  » The (2,2) implies an entry has four BHTs
  » 1k entries, i.e. a (2,2) predictor w/ 1024 entries

Accuracy of Different Schemes
- [Figure: frequency of mispredictions, SPEC89, for three predictors: 4,096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; 1,024 entries (2,2). Benchmarks: nasa7, matrix300, tomcatv, doduc, spice, fpppp, gcc, espresso, eqntott, li. H&P Figure 2.7.]
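The bit-budget arithmetic above follows from one formula: an (m,n) predictor stores 2^m counters of n bits in each entry. The sketch below works through the same numbers. The function names are hypothetical, and the index layout (low address bits concatenated with the history bits) is an assumption chosen for illustration, not something the slides specify.

```python
def predictor_bits(m, n, entries):
    """Total storage of an (m,n) predictor: each entry holds 2**m counters of n bits."""
    return (2 ** m) * n * entries


def entries_for_budget(total_bits, m, n):
    """Number of entries an (m,n) predictor can afford within a bit budget."""
    return total_bits // ((2 ** m) * n)


def counter_index(pc, history, addr_bits, m):
    """Form a counter index from low address bits plus m history bits.
    Assumption: address bits are the high part of the index, history the low part."""
    addr_part = pc & ((1 << addr_bits) - 1)
    hist_part = history & ((1 << m) - 1)
    return (addr_part << m) | hist_part


# 4096-entry (0,2) predictor, i.e. a plain 2-bit BHT.
budget = predictor_bits(m=0, n=2, entries=4096)
print(budget)                                  # 8192 bits = 8k bits

# Same budget spent on a (2,2) predictor.
print(entries_for_budget(budget, m=2, n=2))    # 1024 entries

# 4 address bits + 2 history bits: 16 entries x 4 counters x 2 bits.
print(predictor_bits(m=2, n=2, entries=16))    # 128 bits
print(counter_index(pc=0b1011, history=0b10, addr_bits=4, m=2))  # 46 == 0b101110
```

A real design would usually drop the instruction-alignment bits of the PC before taking the low address bits; that step is left out here to keep the arithmetic visible.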


Tournament Predictors
- Multilevel branch predictor
- Use n-bit saturating counter to choose between predictors
- Usually choice is between global and local predictors
- [State diagram: selector transitions labeled by which predictor was correct and which was incorrect.]

N-bit Saturating Counter
- Used to choose between predictors X & Y
- N-bit counter value between 0 and 2^n - 1
- Counter operations
  » Increment by 1 (up to 2^n - 1) if X is correct & Y is incorrect
  » Decrement by 1 (down to 0) if Y is correct & X is incorrect
- Choose predictor X if counter >= 2^(n-1), Y otherwise
- Can be used as predictor (X = taken, Y = not taken)
- T = taken, NT = not taken

Tournament Predictor: DEC Alpha 21264
- Tournament predictor using 4K 2-bit counters indexed by local branch address (8K bits). Chooses between:
- Global predictor (8K bits)
  » 4K entries indexed by history of last 12 branches (2^12 = 4K)
  » Each entry is a standard 2-bit predictor
- Local predictor (10K + 3K bits)
  » Local history table: 1024 10-bit entries recording last 10 branches, indexed by branch address
  » The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters
- Total size of predictor = 8K + 8K + 10K + 3K = 29K bits

(m,n) Predictor: worked example, m = 0
- B1: BNEZ // branch 1, outcomes T,NT,T,NT,T
- B2: BNEZ // branch 2, outcomes T,T,T,T,NT
- Prediction based on state of predictor
- [State diagram: T = Taken, NT = Not Taken.]
- [Table: predictor state and prediction for each execution of B1 and B2.]

(m,n) Predictor w/ Saturating Counter: worked example, m = 0
- B1: BNEZ // branch 1, outcomes T,NT,T,NT,T
- B2: BNEZ // branch 2, outcomes T,T,T,T,NT
- [State diagram: saturating counter states, T = Taken, NT = Not Taken.]
- [Table: counter state and prediction for each execution of B1 and B2.]
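The two worked examples above (a predictor with no history, with and without a saturating counter) can be replayed in code. This is a hedged sketch: the class and function names are hypothetical, the counter widths (1 bit and 2 bits) and the initial state (weakest "taken" state) are assumptions, and each branch is given its own counter so that aliasing does not enter the picture. The outcome sequences are B1 = T,NT,T,NT,T and B2 = T,T,T,T,NT.

```python
class SaturatingCounter:
    """n-bit saturating counter used directly as a taken / not-taken predictor."""

    def __init__(self, bits, value=None):
        self.max_value = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)
        # Assumption: start in the weakest "taken" state unless told otherwise.
        self.value = self.threshold if value is None else value

    def predict(self):
        """Predict taken when the counter is in its upper half."""
        return self.value >= self.threshold

    def update(self, taken):
        """Count up on taken, down on not taken, saturating at both ends."""
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


def count_mispredictions(outcomes, bits):
    """Run one branch's outcome sequence through its own counter."""
    counter = SaturatingCounter(bits)
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


T, NT = True, False
B1 = [T, NT, T, NT, T]
B2 = [T, T, T, T, NT]

for bits in (1, 2):
    print(bits, "bit:",
          "B1 misses =", count_mispredictions(B1, bits),
          "B2 misses =", count_mispredictions(B2, bits))
```

With these assumed starting states, the 1-bit counter mispredicts the alternating branch B1 four times out of five, while the 2-bit counter halves that to two; both miss only the final not-taken outcome of B2.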

(m,n) Predictor w/ Global History + Branch: worked example, m = 1
- B1: BNEZ // branch 1, outcomes T,NT,T,NT,T
- B2: BNEZ // branch 2, outcomes T,T,T,T,NT
- P0 / P1: last global branch Not taken / Taken
- Choose predictor based on last global branch action
- [State diagram: T = Taken, NT = Not Taken.]
- [Table: P0/P1 states and prediction for each execution of B1 and B2.]

(m,n) Predictor w/ Local History + Branch: worked example, m = 1
- B1: BNEZ // branch 1, outcomes T,NT,T,NT,T
- B2: BNEZ // branch 2, outcomes T,T,T,T,NT
- P0 / P1: last local branch Not taken / Taken
- Choose predictor based on last local branch action
- [State diagram: T = Taken, NT = Not Taken.]
- [Table: P0/P1 states and prediction for each execution of B1 and B2.]

(m,n) Global Predictor (no Branch Addr): worked example, m = 2
- B1: BNEZ // branch 1, outcomes T,NT,T,NT,T
- B2: BNEZ // branch 2, outcomes T,T,T,T,NT
- P0 / P1 / P2 / P3 selected by History = 00 / 01 / 10 / 11
- Branch actions stored in Global History
- Same predictors for both branches!
- History based on last 2 global branch actions; choose predictor based on history
- [State diagram: T = Taken, NT = Not Taken.]
- [Table: history, P0/P1/P2/P3 states, and prediction for each iteration of B1 and B2 through loop exit.]
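The difference between selecting a predictor by global history and by local history can be seen by replaying the two branches from the examples above. The sketch below is illustrative only: `simulate` is a hypothetical helper, each entry is assumed to be a 1-bit predictor that remembers the last outcome seen in its context, every entry is assumed to start as "taken", the history is assumed to start as "not taken", and B1 and B2 are assumed to execute alternately (B1 then B2 in each iteration).

```python
T, NT = True, False
B1 = [T, NT, T, NT, T]
B2 = [T, T, T, T, NT]


def simulate(use_global_history, initial_prediction=T, initial_history=NT):
    """Count mispredictions per branch for a one-bit-history predictor.

    table[branch][history] is a 1-bit entry: the last outcome observed for
    that branch when the selecting history bit had that value."""
    table = {name: {NT: initial_prediction, T: initial_prediction}
             for name in ("B1", "B2")}
    global_history = initial_history
    local_history = {"B1": initial_history, "B2": initial_history}
    misses = {"B1": 0, "B2": 0}

    for out1, out2 in zip(B1, B2):                 # B1 then B2 in each iteration
        for name, taken in (("B1", out1), ("B2", out2)):
            history = global_history if use_global_history else local_history[name]
            if table[name][history] != taken:
                misses[name] += 1
            table[name][history] = taken           # remember the latest outcome
            global_history = taken                 # last branch executed, any branch
            local_history[name] = taken            # last execution of this branch
    return misses


print("global history:", simulate(use_global_history=True))
print("local history: ", simulate(use_global_history=False))
```

Under these assumptions the local-history version learns B1's alternating pattern after one miss, whereas the global-history version keeps seeing "previous branch taken" (B2's outcome) before B1 and so cannot separate B1's two cases.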


(m,n) Global Predictor (no Branch Addr): worked example, m = 2 (cont.)
- B1: BNEZ // branch 1, outcomes T,NT,T,NT,T
- B2: BNEZ // branch 2, outcomes T,T,T,T,NT
- P0 / P1 / P2 / P3 selected by History = 00 / 01 / 10 / 11
- [State diagram: T = Taken, NT = Not Taken.]
- [Table: history, predictor states, and prediction for each iteration of B1 and B2 through loop exit.]

Tournament Predictor
- n-bit tournament predictor
  » Indexed by branch address
  » Chooses between two predictors
    1. Global Predictor (no branch address)
    2. Predictor w/ Local History
- [Table: for each iteration through loop exit, the selector state and the chosen predictor's prediction for B1 and B2.]

Comparing Predictors (H&P Fig. 2.8)
- Advantage of tournament predictor is ability to select the right predictor for a particular branch
  » Particularly crucial for integer benchmarks
  » A typical tournament predictor will select the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks

Pentium 4 Misprediction Rate (per 1000 instructions, not per branch)
- 6% misprediction rate per branch SPECint (19% of INT instructions are branch)
- 2% misprediction rate per branch SPECfp (5% of FP instructions are branch)
- [Figure: branch mispredictions per 1000 instructions for SPECint and SPECfp benchmarks. H&P Figure 2.28.]

Branch Target Buffers (BTB)
- Branch target calculation is costly and stalls the instruction fetch
- BTB stores PCs the same way as caches
- The PC of a branch is sent to the BTB
- When a match is found the corresponding Predicted PC is returned
- If the branch was predicted taken, instruction fetch continues at the returned predicted PC
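The BTB lookup steps listed above can be sketched as a small direct-mapped table with a tag check. Everything named here is hypothetical: the class, the 16-entry size, the 4-byte instruction width, and the policy of keeping only taken branches in the buffer are assumptions made for the sake of a short example.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: each slot holds (branch_pc, predicted_target)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries

    def _slot(self, pc):
        # Assumption: instructions are 4 bytes, so the two low PC bits are dropped.
        return (pc >> 2) % self.entries

    def lookup(self, pc):
        """Return the predicted target if the stored PC matches, else None."""
        entry = self.table[self._slot(pc)]
        if entry is not None and entry[0] == pc:   # full tag match, as in a cache
            return entry[1]
        return None

    def update(self, pc, taken, target):
        """After the branch resolves: remember taken branches, forget untaken ones."""
        slot = self._slot(pc)
        if taken:
            self.table[slot] = (pc, target)
        elif self.table[slot] is not None and self.table[slot][0] == pc:
            self.table[slot] = None


def next_fetch_pc(btb, pc):
    """Fetch from the predicted target on a BTB hit, otherwise fall through."""
    predicted = btb.lookup(pc)
    return predicted if predicted is not None else pc + 4


demo = BranchTargetBuffer(entries=16)
print(hex(next_fetch_pc(demo, 0x40)))   # miss: falls through to 0x44
demo.update(0x40, taken=True, target=0x100)
print(hex(next_fetch_pc(demo, 0x40)))   # hit: fetch continues at 0x100
print(hex(next_fetch_pc(demo, 0x80)))   # same slot, different PC: tag mismatch, 0x84
```

Unlike the BHT described earlier, the tag comparison means an entry is never used for the wrong branch; the price is storing the full branch PC alongside each target.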

Branch Target Buffers
- [Figure: branch target buffer lookup. H&P figure.]

Dynamic Branch Prediction Summary
- Prediction becoming important part of execution
- Branch History Table: 2 bits for loop accuracy
- Correlation: recently executed branches correlated with next branch
  » Either different branches (GA)
  » Or different executions of same branches (PA)
- Tournament predictors take insight to next level, by using multiple predictors
  » Usually one based on global information and one based on local information, combined with a selector
  » In 2006, tournament predictors using about 30K bits are in processors like the Power5 and Pentium 4
- Branch Target Buffer: include branch address & prediction
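The selector mentioned in the summary can be sketched as an n-bit saturating counter that moves toward whichever predictor was right when the two disagree. This is a minimal illustration with hypothetical names (`Selector`, `tournament_step`), an assumed 2-bit width, an assumed starting state that weakly prefers predictor X, and made-up prediction streams; it is not a model of any particular processor.

```python
class Selector:
    """n-bit saturating counter that picks between predictors X and Y."""

    def __init__(self, bits=2):
        self.max_value = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)
        self.value = self.threshold        # assumption: start by weakly preferring X

    def choose_x(self):
        """Use X when the counter is in its upper half, Y otherwise."""
        return self.value >= self.threshold

    def update(self, x_correct, y_correct):
        """Move toward the predictor that was right when exactly one was right."""
        if x_correct and not y_correct:
            self.value = min(self.value + 1, self.max_value)
        elif y_correct and not x_correct:
            self.value = max(self.value - 1, 0)
        # Both right or both wrong: the counter is left unchanged.


def tournament_step(selector, x_prediction, y_prediction, taken):
    """Return the tournament's prediction, then train the selector on the outcome."""
    prediction = x_prediction if selector.choose_x() else y_prediction
    selector.update(x_prediction == taken, y_prediction == taken)
    return prediction


T, NT = True, False
# Made-up (X prediction, Y prediction, actual outcome) triples for one branch.
steps = [(T, NT, NT), (T, NT, NT), (NT, NT, NT), (T, NT, T), (T, NT, T)]

selector = Selector(bits=2)
chosen = [tournament_step(selector, x, y, taken) for x, y, taken in steps]
print(chosen, selector.value)
```

In a full tournament predictor there would be one such selector per table entry, indexed by branch address, with X and Y being for example a local and a global predictor.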

