Unresolved data hazards
An arithmetic instruction that immediately follows a load and reads the register updated by the load:
    if (ID/EX.MemRead and
        ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
         (ID/EX.RegisterRt = IF/ID.RegisterRt)))
      stall the pipeline
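The stall condition can be written as a small predicate. This is a minimal sketch, assuming the two pipeline registers are modelled as plain records; the class and field names (`IDEX`, `IFID`, `mem_read`, and so on) are illustrative and not part of any real simulator.

```python
from dataclasses import dataclass


@dataclass
class IDEX:
    """Hypothetical model of the ID/EX pipeline register fields we need."""
    mem_read: bool      # True when the instruction in EX is a load
    register_rt: int    # destination register of the load


@dataclass
class IFID:
    """Hypothetical model of the IF/ID pipeline register fields we need."""
    register_rs: int    # first source register of the instruction in ID
    register_rt: int    # second source register of the instruction in ID


def load_use_hazard(id_ex: IDEX, if_id: IFID) -> bool:
    """Return True when the instruction in ID reads the register a load in EX writes."""
    return id_ex.mem_read and id_ex.register_rt in (if_id.register_rs,
                                                    if_id.register_rt)


# lw $2, 20($1) followed by and $4, $2, $5: the and reads $2, so we must stall.
stall = load_use_hazard(IDEX(mem_read=True, register_rt=2),
                        IFID(register_rs=2, register_rt=5))
```

A non-load instruction in EX (`mem_read=False`) never triggers the stall, because forwarding can supply its result in time.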
How do we stall a pipeline?
- A bubble is actually an instruction converted to a nop (no operation) instruction.
- The nop is implemented by zeroing out all control signals of an instruction in flight.
- The goal is to freeze the PC and prevent the instruction from having an effect.
- The pipeline will refetch the instruction in the next cycle!
How do we stall a pipeline?
- We need a hazard detection unit to nullify instructions in the event of a hazard.
- Hazards are detected between loads and dependent arithmetic instructions.
- The detection unit zeroes the EX, MEM and WB control signals, so that the instruction does not take any effect.
- The detection unit also controls the PCWrite signal to freeze the PC.
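The behaviour of the hazard detection unit can be sketched as a function from the hazard condition to its outputs. This is an illustrative model only: the control signals are passed as a dictionary, and the `if_id_write` output (holding the IF/ID register so the stalled instruction is decoded again) is an assumption borrowed from the usual textbook MIPS design, not something stated on the slide.

```python
def hazard_detection_unit(mem_read, id_ex_rt, if_id_rs, if_id_rt, control):
    """Return (pc_write, if_id_write, control_out) for one cycle.

    control is a dict of the EX, MEM and WB control signals produced by
    decoding the instruction currently in ID (names are illustrative).
    """
    stall = mem_read and id_ex_rt in (if_id_rs, if_id_rt)
    if stall:
        # Freeze the PC (and, by assumption, IF/ID) and inject a bubble:
        # every control signal is zeroed, so the instruction acts as a nop.
        return False, False, {name: 0 for name in control}
    # No hazard: the PC advances and the control signals pass through.
    return True, True, dict(control)


# Control signals of a hypothetical R-type instruction sitting in ID.
decoded = {"RegDst": 1, "ALUOp": 2, "ALUSrc": 0,
           "MemRead": 0, "MemWrite": 0, "RegWrite": 1, "MemtoReg": 0}

# A load writing $2 is in EX and the instruction in ID reads $2: stall.
pc_write, if_id_write, bubble = hazard_detection_unit(True, 2, 2, 5, decoded)
```

In the stalled cycle `pc_write` is False and every entry of `bubble` is 0, which is exactly the nop described above.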
The trouble with branches
- This is a case of control dependence.
- We don't know which instruction actually follows the branch until the MEM stage of the branch instruction.
- The pipeline, however, will already have fetched 3 instructions from the not-taken path!
Solutions for branch hazards
Assume always not taken:
- Fetch the next instruction in program order.
- When the branch is resolved:
  - If the branch is not taken, keep going; there is no penalty.
  - If the branch is taken, we need to flush the 3 instructions fetched after it.
- We use nops to discard the instructions in the IF, ID and EX stages, so that they don't have any effect.
Solutions for branch hazards
Say we move the branch logic earlier in the pipeline:
- The ID stage is the earliest time possible.
- We need circuitry to calculate the branch target address. This is easy, since we have the PC and the offset from the IF stage.
- We need circuitry to evaluate the branch condition. This is harder: the branch condition may depend on earlier instructions!
- Moving the condition check earlier introduces more data hazards between the branch and the earlier instructions on which it depends.
Solutions for branch hazards
Say we move the branch logic earlier in the pipeline:
- We need to take care of data hazards before the branch.
- Forward from the EX/MEM and MEM/WB pipeline registers if the branch depends on a prior instruction.
- A data hazard can still occur if the immediately preceding instruction produces a register that the branch uses in its comparison; the branch must then stall.
- At the decode stage we need to decide whether to bypass the ALU and use the dedicated branch condition logic.
Reducing branch delay
- Moving the branch target and condition calculation to the ID stage reduces the branch stall in the taken case to one cycle.
- This still creates a bubble in the pipeline.
- We can fill this bubble if we have a useful instruction that is independent of the branch (on neither the taken nor the not-taken path, i.e. executed unconditionally).
- This is called using a branch delay slot.
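To see what the shorter penalty buys, the effect on CPI can be estimated with a one-line formula. This is a back-of-the-envelope sketch: the branch frequency (20%) and taken rate (50%) below are made-up illustrative numbers, not measurements, and the model assumes a base CPI of 1 with no other stalls.

```python
def effective_cpi(base_cpi, branch_fraction, taken_fraction, taken_penalty):
    """CPI when every taken branch costs taken_penalty extra cycles.

    Assumes predict-not-taken, so not-taken branches cost nothing extra.
    """
    return base_cpi + branch_fraction * taken_fraction * taken_penalty


# Hypothetical workload: 20% branches, half of them taken.
cpi_resolve_in_mem = effective_cpi(1.0, 0.2, 0.5, taken_penalty=3)  # flush 3 instructions
cpi_resolve_in_id = effective_cpi(1.0, 0.2, 0.5, taken_penalty=1)   # one bubble
print(f"resolve in MEM: {cpi_resolve_in_mem:.2f}, resolve in ID: {cpi_resolve_in_id:.2f}")
```

Under these assumed numbers the CPI drops from about 1.3 to about 1.1; a filled branch delay slot would remove the remaining bubble as well.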
Solutions for branch hazards
Branch prediction using one bit:
- When we see a branch, store the PC of the branch instruction and the outcome of the branch in a table.
- When we see the branch again, search the table and predict that the next outcome of the branch will be its last outcome.
- How good is this?
Solutions for branch hazards
Branch prediction using one bit: consider a loop with 10 iterations that is executed twice in the program.
- First time through the loop, first iteration: nothing is in the branch prediction table, so the prediction is random, but we store the actual outcome (taken) in the table.
- Next eight iterations: predict taken, correctly.
- Last iteration: predict taken, but the branch is not taken.
- Prediction accuracy is 80-90%, depending on the initial random prediction.
- Second time through the loop, prediction accuracy is 80%: the first iteration is also mispredicted, since the last stored outcome was not taken!
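The loop example can be replayed with a tiny simulation. This is a sketch under simplifying assumptions: the table is a Python dictionary keyed by the branch PC, the PC value `0x40` is arbitrary, and the "random" first prediction is replaced by an explicit `default` argument so that both cases can be shown.

```python
def one_bit_predictor(outcomes, pc=0x40, table=None, default=True):
    """Predict each outcome as the last stored outcome for this branch PC.

    Returns (number of correct predictions, updated table).
    """
    table = {} if table is None else table
    correct = 0
    for taken in outcomes:
        prediction = table.get(pc, default)  # empty table: fall back to default
        correct += prediction == taken
        table[pc] = taken                    # remember the actual outcome
    return correct, table


# Loop-closing branch of a 10-iteration loop: taken 9 times, then not taken.
loop = [True] * 9 + [False]

lucky, table = one_bit_predictor(loop, default=True)      # first guess happens to be right
unlucky, _ = one_bit_predictor(loop, default=False)       # first guess happens to be wrong
second, _ = one_bit_predictor(loop, table=table)          # table now holds "not taken"
print(lucky, unlucky, second)  # 9 8 8
```

The first pass gets 9 or 8 out of 10 right depending on the initial guess, and the second pass gets 8 out of 10, matching the 80-90% and 80% figures.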
Improving branch prediction
- Introduce hysteresis, i.e. wait to see the same outcome of the branch twice before flipping the prediction.
- This resolves loops and other predictable branches, but it is still not perfect.
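One common way to realise this hysteresis is a two-bit saturating counter, sketched below. The encoding (0-1 predict not taken, 2-3 predict taken) and the starting state of 2 are assumptions chosen for illustration; the slide itself does not fix a particular scheme.

```python
def two_bit_predictor(outcomes, counter=2):
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken.

    Returns (number of correct predictions, final counter state).
    """
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2
        correct += prediction == taken
        # Move one step towards the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct, counter


# Same loop-closing branch as before: taken 9 times, then not taken.
loop = [True] * 9 + [False]

first, state = two_bit_predictor(loop)            # start weakly taken (assumed)
second, state = two_bit_predictor(loop, state)    # carry the counter into the second run
print(first, second)  # 9 9
```

A single not-taken outcome at loop exit only moves the counter from 3 to 2, so the first iteration of the second run is still predicted taken. Only the loop exit itself is mispredicted, which is the "still not perfect" part.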
Pipelined datapath with hazards
- The hazard detection unit can freeze the PC to introduce a bubble.
- Control flushes instructions in the IF stage to avoid branch hazards.
- The hazard detection unit zeroes control signals to prevent later instructions from updating state.
The trouble with exceptions
An arithmetic exception resembles a taken branch:
- Control must freeze the current instruction.
- Control needs to jump to an exception handler in the OS.
- We need to flush instructions in flight so that they do not commit state.
- We need to save the program counter (use the MIPS EPC register).
- We need to handle both recoverable and irrecoverable exceptions.
- Exceptions may occur at different stages of an instruction (e.g. illegal instruction vs. arithmetic overflow vs. bad memory address).
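The steps above can be sketched as a single state update. This is a deliberately simplified model: the pipeline is a dictionary from stage name to instruction text, the handler address `0x80000180` is an assumed constant, and EPC is assumed to receive the address of the faulting instruction (a real design may save a different value and let the handler adjust it).

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]   # youngest instruction first
NOP = "nop"
HANDLER_ADDRESS = 0x80000180               # assumed OS exception handler entry point


def take_exception(pipeline, faulting_stage, faulting_pc):
    """Flush the faulting instruction and everything younger, save EPC, redirect PC.

    Instructions older than the faulting one (in later stages) are left
    alone so that they can still commit their results.
    """
    fault_index = STAGES.index(faulting_stage)
    flushed = {
        stage: NOP if i <= fault_index else pipeline[stage]
        for i, stage in enumerate(STAGES)
    }
    return {"pipeline": flushed, "epc": faulting_pc, "pc": HANDLER_ADDRESS}


# Hypothetical pipeline contents: the add in EX overflows.
before = {"IF": "lw $6, 0($7)", "ID": "sub $4, $5, $1",
          "EX": "add $1, $2, $3", "MEM": "or $8, $9, $10", "WB": "sw $11, 4($12)"}
after = take_exception(before, faulting_stage="EX", faulting_pc=0x4C)
```

In this example the instructions in IF, ID and EX become nops, while the older `or` and `sw` in MEM and WB complete normally.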
Pipelined datapath with exceptions
- Reuse much of the branch hazard hardware to flush instructions.
- Since we have a new source of hazards, we use multiplexers to zero out control lines in the ID, EX and MEM stages.
- The ALU overflow signal needs to be fed back to the control unit.
Other considerations with exceptions
Possible sources of exceptions:
- I/O device interrupt (recoverable)
- OS system call (e.g. file I/O, recoverable)
- Bad instruction (unrecoverable)
- Hardware malfunction (unrecoverable)
We stop instructions in the middle of execution:
- The programmer needs to see each instruction as either committed or uncommitted.
- If the exception is recoverable, we must restart the interrupted instruction and let it commit its result.
Other considerations with exceptions
Recovering state:
- When jumping to an exception handler we need to save the state of the program in memory.
- The state includes the EPC and the other registers in use by the program at the time of the exception.
- The state is restored upon return from the exception (if we are to recover from it).
- The very same functionality, called a context switch, can be used to switch between processes (programs) on the processor.
- Context switching can improve processor utilization, e.g. by overlapping I/O with other computation.
A glimpse of the state of the art
Advanced pipelining techniques:
- Multiple ALUs, functional units, multi-port memories and register files.
- Processors are able to execute multiple independent instructions in the same stage of the pipeline.
- More hardware enables multiple instruction issue and CPI < 1 (IPC > 1).
- Hardware performs dynamic dependence analysis between instructions.
- Hardware uses more advanced forms of prediction, not only for branches but also for data read from memory.