CS6290 Speculation Recovery

Size: px

Start display at page:

Download "CS6290 Speculation Recovery"

Wesley Marshall
5 years ago
Views:

1 CS6290 Speculation Recovery

2 Loose Ends Up to now: Techniques for handling register dependencies Register renaming for WR, WW Tomasulo s algorithm for scheduling RW ranch prediction for control dependencies Still need to address: Memory dependencies Mispredictions, Faults, Exceptions

3 Speculative Fetch/Non-Spec Exec Fetch instructions, even fetch past a branch, but don t exec right away fetch decode issue exec write commit 1. DIV R1 = R2 / R3 2. DD R4 = R5 + R6 3. EQZ R1, 4. SU R5 = R4 R3 5. MUL R7 = R5 * R2 6. MUL R8 = R7 * R3 I E W C

4 ranch Prediction/Speculative Execution When we hit a branch, guess if it s T or NT DD ranch Guess T T C NT Q LOD SU LOD DIV XOR DD Keep scheduling and executing instructions as we knew the outcome was T D R STORE LOD ranch DD DD SU STORE MUL Sometime later, if we messed up Just throw it all out nd fetch the correct instructions

5 gain, with Speculative Execution ssume fetched direction is correct fetch decode issue exec write commit 1. DIV R1 = R2 / R3 2. DD R4 = R5 + R6 3. EQZ R1, 4. SU R5 = R4 R3 5. MUL R7 = R5 * R2 6. MUL R8 = R7 * R3 I E W C

6 ranch Misprediction Recovery RF RF state corresponds to state prior to oldest non-committed instruction br?!? RT s instructions are processed, the RT corresponds to the register mapping after the most recently renamed instruction On a branch misprediction, wrong-path instructions are flushed from the machine The RT is left with an invalid set of mappings corresponding to the wrongpath instruction state

7 Solution 1: Stall and Drain X br RF RT llow all instructions to execute and commit; RF corresponds to last committed instruction RF now corresponds to the state right before the next instruction to be renamed ()?!? Reset RT so that all mappings Pros: Very simple refer to the RF to implement Resume renaming the new correctpath fetch; instructions from fetch Cons: Performance Correct path instructions loss from due can t to rename stalls because RT is wrong

8 Solution 2: Checkpointing br br br br RF RT t each branch, make a copy of the RT (register mapping at the time of the branch) RT RT RT RT On a misprediction: 1. flush wrong-path instructions 2. deallocate RT checkpoints 3. recover RT from checkpoint 4. resume renaming Checkpoint Free Pool

9 Speculative Execution is OK RO maintains program order RO temporarily stores results If we screw something up, only the RO knows, but no architected state is affected Register rename recovery makes sure we resume with correct register mapping If we mess up, we: 1. Can undo the effects 2. Know how to resume

10 Commit, Exceptions What happens if a speculatively executed instruction faults?, Commit ranch mispred C C W D D X E Divide by Zero! E Divide by Zero! Outside world sees:,, fault! Fault should never have happened! Should have been:,, C, D, fault!

11 Treat Fault Like a Result Regular register results written to RO until Instruction is oldest for in-order state update Instruction is known to be non-speculative Do the same with faults! t exec, make note of any faults, but don t expose to outside world If the instruction gets to commit, then expose the fault

12 Example LOD R1 = 0[R2] (commit) DD R3 = R1 + R4 LOD Resolved Miss P1 imm R2 E C F X X C SU R1 = R5 R6 DD SU P2 P3 P1 R5 R4 R6 X X X X D DIV R4 = R7 / R1 DIV LOD P4 P5 R7 imm P3 R7 X X X X X X E LOD R6 = 0[R7] Fault! (3 commits) Flush rest of RO, Start fetching Fault handler Divide by zero Fault deferred until architecturally Now raise correct fault point. Other fault never happened

13 Executing Memory Instructions Load R3 = 0[R6] dd R7 = R3 + R9 Store R4 0[R7] Sub R1 = R1 R2 Load R8 = 0[R1] Issue Issue Issue Issue Issue Miss Cache serviced Miss! Cache Hit! ut there was a later load If R1!= R7, then Load R8 gets correct value from cache If R1 == R7, then Load R8 should have gotten value from the Store, but it didn t!

14 Out-of-Order Load Execution So don t let loads execute out-oforder! ILP = 8/7 = 1.14 E H C D C D F E G F ILP = 8/3 = 2.67 G H Ok, maybe not a good idea. No support for OOO load execution can reduce your ILP

15 What Else Could Happen? bar D$ Some sort of data forwarding mechanism Value from cache is stale D$ No problem. No problem. Uh oh. C Should have used s store value bar Luckily, this usually can t even happen Uh oh. Uh oh.

16 Memory Disambiguation Problem Why can t this happen with non-memory insts? Operand specifiers in non-memory insts are absolute R1 refers to one specific location Operand specifiers in memory insts are ambiguous R1 refers to a memory location specified by the value of R1. s pointers change, so does this location.

17 Two Problems Memory disambiguation re there any earlier unexecuted stores to the same address as myself? (I m a load) inary question: answer is yes or no Store-to-load forwarding problem Which earlier store do I get my value from? (I m a load) Which later load(s) do I forward my value to? (I m a store) Non-binary question: answer is one or more instruction identifiers

18 Load Store Queue (LSQ) Oldest Youngest L/S PC Seq ddr Value L 0xF x S 0xF04C x S 0xF x L 0xF x L 0xF x L 0xF x S 0xF85C x L 0xF x L 0xF x L 0xF63C x Data Cache 0x x x x Disambiguation: loads cannot execute until all earlier store addresses computed Forwarding: broadcast/search entire LSQ for match

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Out-of-Order Memory Access Dynamic Scheduling Summary Out-of-order execution: a performance technique Feature I: Dynamic scheduling (io OoO) Performance piece: re-arrange