ECE475/ECE4420 Computer Architecture
L4: Advanced Issues in Pipelining
Edward Suh
Computer Systems Laboratory
suh@csl.cornell.edu

Announcement
- Lab1 is released
  - Start early: we only have limited computing resources
- Reading: Appendix A.1-A.6
- BRC (Big Red Chip): contact info on blackboard
- Career Fair tomorrow

Roadmap
- Tricky issues in the 5-stage pipeline
  - Handling exceptions
- Deeper pipeline
- More complex pipeline with multi-cycle operations

Exceptions
- Exceptions: interrupt instruction execution unexpectedly
- Common exceptions:
  - I/O device interrupt
  - OS system call
  - Arithmetic overflow, FP anomaly
  - Page fault
  - Misaligned memory access
  - Memory protection violation
  - Illegal instruction
  - Power / hardware failure

A Taxonomy of Exceptions
- Synchronous vs. asynchronous
- User- vs. hardware-triggered
- Maskable vs. nonmaskable (NMI)

A Taxonomy of Exceptions
- Within vs. between instructions
- Resume vs. terminate

Restartable Exceptions
- What do we need to do in order to resume after an exception?

Precise Exception
- It must appear as if an interrupt is taken between two instructions (say I_i and I_{i+1})
  - the effect of all instructions up to and including I_i is totally complete
  - no effect of any instruction after I_i has taken place
- The interrupt handler either aborts the program or restarts it at I_{i+1}.

Exceptions in Pipeline
[Figure: five-stage pipeline datapath: PC, Inst. Mem, D (Decode), E (+), M (Data Mem), W]

Exceptions in Pipeline
  lw    IF  ID  EX  MEM WB
  add       IF  ID  EX  MEM WB
- How to handle multiple exceptions in the same cycle?

Exceptions in Pipeline
  lw    IF  ID  EX  MEM WB
  add       IF  ID  EX  MEM WB
- How to handle multiple exceptions for one instruction?

Exception Handling (In-Order Five-Stage Pipeline)
[Figure: five-stage pipeline datapath (PC, Inst. Mem, D Decode, E +, M Data Mem, W) annotated with exception sources: PC Address Exceptions, Illegal Opcode, Overflow, Data Addr Except]

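The two questions above (several exceptions in one cycle, several exceptions for one instruction) have a common textbook answer that the slides do not spell out: each instruction carries its exception flags down the pipeline, and they are only acted on at the commit point, in program order. The sketch below illustrates that idea under simplifying assumptions. The names `STAGE_ORDER`, `first_exception`, and `commit_in_order` are hypothetical, and the pipeline is reduced to a list of instructions with the exceptions each one raised.

```python
# Stages in the order an instruction passes through them. For a single
# instruction, the exception raised in the earliest stage is the one reported.
STAGE_ORDER = ["IF", "ID", "EX", "MEM"]


def first_exception(stage_exceptions):
    """Return (stage, cause) for the earliest-stage exception, or None.

    stage_exceptions maps a stage name to the cause raised in that stage.
    """
    for stage in STAGE_ORDER:
        if stage in stage_exceptions:
            return stage, stage_exceptions[stage]
    return None


def commit_in_order(program):
    """Commit instructions in program order until one carries an exception.

    program is a list of (name, stage_exceptions) in program order.
    Returns (committed_names, faulting), where faulting is
    (name, stage, cause) or None if nothing faulted.
    """
    committed = []
    for name, stage_exceptions in program:
        pending = first_exception(stage_exceptions)
        if pending is not None:
            # Younger instructions are squashed, so nothing after this commits.
            return committed, (name, *pending)
        committed.append(name)
    return committed, None


# lw faults in MEM in the same cycle that add overflows in EX.
program = [
    ("lw", {"MEM": "data address exception"}),
    ("add", {"EX": "overflow"}),
]
committed, faulting = commit_in_order(program)
print(committed, faulting)
```

Because the check happens in program order rather than in the order the faults occur in time, the older lw is reported even if a younger instruction faulted in an earlier cycle.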
When Does State Change?
- An instruction is committed when it is guaranteed to complete
- Easy to restart if state has not been changed
  - Simple for MIPS: MEM/WB
- VAX: auto-increment mode, state updated in the middle of an instruction
  - need HW support to back out (undo, roll back state changes)
- Some architectures have string copy instructions
  - updates memory, cannot undo 100%
  - general-purpose registers hold all state
  - instruction continues after exception rather than restart

Implementation Details
- What if an exception is in a branch delay slot?
- Can we restart the instruction in the delay slot?

MIPS R4000
- Eight-stage pipeline, high clock rate (superpipelined)
  IF  IS  RF  EX  DF  DS  TC  WB
  - IF: select PC, start i$ access
  - IS: complete i$ access
  - RF: decode, register access, check i$ tag
  - EX: execution (ALU)
  - DF: start d$ access
  - DS: complete d$ access
  - TC: check d$ tag
  - WB: write back result to register file
- Memory access takes three cycles (MEM = DF, DS, TC)

Deep Pipelines Pros and Cons

Limits of Pipelining
- Cannot increase pipeline depth forever
  - hit ILP limits
  - CPI eventually begins to increase due to stalls
  - clock period does not go down enough to compensate

Multicycle Operations: Why?
- Pipelining becomes complex when we want high performance in the presence of multi-cycle operations

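The trade-off on the Limits of Pipelining slide can be made concrete with a back-of-the-envelope model. This is an illustration only, not something from the lecture: it assumes a fixed amount of logic delay split evenly across the stages, a fixed latch overhead added to every stage, and a CPI that grows linearly with depth. All three parameter values are made up.

```python
def time_per_instruction(depth, logic_delay_ns=10.0, latch_overhead_ns=0.5,
                         stall_cpi_per_stage=0.1):
    """Average time per instruction (ns) for a pipeline with `depth` stages.

    Hypothetical model: cycle time is the logic delay divided evenly across
    the stages plus a fixed latch overhead, and each extra stage adds a
    fixed amount of stall CPI.
    """
    cycle_time_ns = logic_delay_ns / depth + latch_overhead_ns
    cpi = 1.0 + stall_cpi_per_stage * (depth - 1)
    return cpi * cycle_time_ns


# Sweep the depth and pick the one with the lowest time per instruction.
best_depth = min(range(1, 41), key=time_per_instruction)

for depth in (1, 5, best_depth, 40):
    print(depth, round(time_per_instruction(depth), 3))
```

With these made-up numbers the time per instruction falls quickly at first, bottoms out at a moderate depth, and then rises again: the cycle time approaches the latch overhead while the CPI keeps climbing.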
Realistic Memory System
- Latency of access to the main memory is usually much greater than one cycle and often unpredictable
  - Solving this problem is a central issue in computer architecture
- Common approaches to improving memory performance
  - separate instruction and data memory ports => no self-modifying code
  - caches => single cycle except in case of a miss => stall
  - interleaved memory => multiple memory accesses => bank conflicts
  - split-phase memory operations => out-of-order responses

Floating Point Unit
- Much more hardware than an integer unit
- Single-cycle floating point unit is a bad idea - why?

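Function Unit Characteristics
[Figure: a fully pipelined unit with three 1-cycle stages and a partially pipelined unit with two 2-cycle stages; each unit has "busy" and "accept" signals]
- Function units have internal pipeline registers

Complex Pipeline Structure
[Figure: IF and ID feed parallel function units (ALU, Mem, Fadd, Fmul, Fdiv), which feed WB; register files are GPRs and FPRs]
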
New Challenges
- Structural conflicts at the execution stage if some FPU or memory unit is not pipelined and takes more than one cycle
- Structural conflicts at the write-back stage due to variable latencies of different function units
- Out-of-order write hazards due to variable latencies of different function units (WAW hazards)
- How to handle exceptions?

Structural Hazard
- Partially pipelined functional units
- Write-port conflict
  fmult  IF  ID  X1  X2  X3  X4  X5  X6  X7  WB
  fadd       IF  ID  X1  X2  X3  WB
  ld             IF  ID  EX  MM  WB

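The write-port conflict on the Structural Hazard slide can be checked by computing the cycle in which each instruction reaches WB. The sketch below assumes one instruction is fetched per cycle with no stalls, that all results go through a single shared write port, and that instruction names are unique. The helper names `writeback_cycles` and `write_port_conflicts` are hypothetical.

```python
from collections import defaultdict


def writeback_cycles(instructions):
    """Map each instruction name to the cycle in which it reaches WB.

    instructions is a list of (name, exec_stages) in program order, where
    exec_stages counts the stages between ID and WB. Cycle 1 is the first
    instruction's IF, and one instruction is fetched per cycle.
    """
    cycles = {}
    for index, (name, exec_stages) in enumerate(instructions):
        fetch_cycle = 1 + index
        # One cycle of ID, then the execute stages, then WB.
        cycles[name] = fetch_cycle + 1 + exec_stages + 1
    return cycles


def write_port_conflicts(instructions, write_ports=1):
    """Return {cycle: [names]} for cycles with more writers than ports."""
    by_cycle = defaultdict(list)
    for name, cycle in writeback_cycles(instructions).items():
        by_cycle[cycle].append(name)
    return {cycle: names for cycle, names in by_cycle.items()
            if len(names) > write_ports}


# The three instructions from the slide: fmult has 7 execute stages,
# fadd has 3, and ld has 2 (EX and MM).
slide_example = [("fmult", 7), ("fadd", 3), ("ld", 2)]
print(writeback_cycles(slide_example))
print(write_port_conflicts(slide_example))
```

Under these assumptions fadd and ld both arrive at WB in cycle 7, while fmult does not write back until cycle 10.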
Data Hazard 1
- Read-After-Write hazard
  fmult f2,   IF  ID  X1  X2  X3  X4  X5  X6  X7  WB
  fadd, f2        IF  ID  **  **  **  **  **  **  X1  X2

Data Hazard 2
- Write-After-Write hazard
  fadd f2,   IF  ID  X1  X2  X3  X4  WB
  fld f2,        IF  ID  EX  MM  WB

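Both data hazard diagrams can be reproduced with a little cycle arithmetic. The sketch below is a simplified model rather than the lecture's method: it assumes one instruction is fetched per cycle, full bypassing from the producer's last execute stage to the consumer's first, and no stalls other than the one being computed. The function names `raw_stall_cycles` and `waw_violations` are hypothetical.

```python
def raw_stall_cycles(producer_exec_stages, distance=1):
    """Stall cycles a consumer needs behind a multi-cycle producer.

    The producer is fetched in cycle 1 and the consumer `distance` cycles
    later. With full bypassing, the consumer's first execute stage must
    come after the producer's last execute stage.
    """
    producer_last_exec = 2 + producer_exec_stages   # IF=1, ID=2, then execute
    consumer_first_exec = (1 + distance) + 2        # IF, ID, then execute
    return max(0, producer_last_exec + 1 - consumer_first_exec)


def waw_violations(instructions):
    """Find pairs where a younger write does not land after an older one.

    instructions is a list of (name, dest, exec_stages) in program order,
    fetched one per cycle with no stalls. Returns a list of
    (older_name, younger_name, dest).
    """
    wb_cycle = [(1 + i) + 1 + stages + 1
                for i, (_, _, stages) in enumerate(instructions)]
    violations = []
    for older in range(len(instructions)):
        for younger in range(older + 1, len(instructions)):
            same_dest = instructions[older][1] == instructions[younger][1]
            if same_dest and wb_cycle[younger] <= wb_cycle[older]:
                violations.append((instructions[older][0],
                                   instructions[younger][0],
                                   instructions[older][1]))
    return violations


# Data Hazard 1: fmult has 7 execute stages, fadd follows immediately.
print(raw_stall_cycles(7))

# Data Hazard 2: fadd (4 execute stages) then fld (EX, MM), both writing f2.
print(waw_violations([("fadd", "f2", 4), ("fld", "f2", 2)]))
```

In this model the consumer stalls for six cycles, matching the six `**` slots in the first diagram, and fld writes f2 one cycle before the older fadd does, so f2 ends up holding the stale fadd result.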
Maintaining Precise Exceptions
  fdiv f1, f2, f3
  fadd f2, f2, f4
- Scenario: fadd done, fdiv raises exception

Multicycle Hazards Summary
- Check for structural hazards
  - unpipelined units (divide)
  - write ports
- Check for RAW hazards
  - if producer in flight, stall (apply transitive closure here)
  - many stall cycles, even with full bypassing
- Check for WAW hazards
  - if instruction in flight with same destination, stall
- How is all this accomplished?
  - dynamic scheduling by scoreboard: stay tuned
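
The checks in the Multicycle Hazards Summary can be sketched as a single issue-stage predicate. This is a simplified illustration, not the scoreboard the lecture defers to later: it ignores write-port conflicts and timing, and simply treats every issued but unfinished instruction as "in flight". The `Instr` type and the `stall_reason` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str     # label used only for readability
    unit: str     # function unit the instruction needs
    dest: str     # destination register
    srcs: tuple   # source registers


def stall_reason(instr, in_flight, busy_units):
    """Return the first hazard that blocks issue, or None if instr may issue.

    in_flight holds instructions that have issued but not yet written back.
    busy_units is the set of unpipelined function units currently occupied.
    """
    if instr.unit in busy_units:
        return "structural"
    pending_dests = {older.dest for older in in_flight}
    if any(src in pending_dests for src in instr.srcs):
        return "RAW"
    if instr.dest in pending_dests:
        return "WAW"
    return None


# The pair from the Maintaining Precise Exceptions slide.
fdiv = Instr("fdiv", unit="fdiv", dest="f1", srcs=("f2", "f3"))
fadd = Instr("fadd", unit="fadd", dest="f2", srcs=("f2", "f4"))
print(stall_reason(fadd, in_flight=[fdiv], busy_units={"fdiv"}))
```

In this model none of the three checks stops fadd from issuing while fdiv is still in flight, so fadd can finish and overwrite f2 before fdiv raises its exception. That is the scenario on the Maintaining Precise Exceptions slide: hazard checks alone do not make exceptions precise.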