The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4th ed.; Sections 5.3 and 5.4, J. P. Hayes.
Pipeline
A Basic MIPS Implementation
Memory-reference instructions: Load Word (lw) and Store Word (sw)
ALU instructions: add, sub, AND, OR and slt
Branch on equal (beq)
Instruction Fetch Elements
Instruction Fetch
ALU Operations Elements (figure: register file with address, data and write ports). Example: ADD R1, R2, R3
Loads and Stores Elements. Example: LW R1, -8(R2)
Branches Elements. Example: BEQ R1, R2, LABEL, encoded as BEQ R1, R2, -16
Memory and R-type Instructions
Memory Instruction: Load. Example: LW R1, -8(R2)
Memory Instruction: Store. Example: SW R1, -8(R2)
R-Type Instruction: ADD. Example: ADD R1, R2, R3
The MIPS Datapath
The MIPS Datapath: BEQ. Example: BEQ R1, R2, -16
MIPS Datapath and Control Lines
Pipeline Stages
IF: Instruction fetch
ID: Instruction decode / register file read
EX: Execution / address calculation
MEM: Memory access
WB: Write back
Pipelined Datapath (figure: the datapath divided into the five stages IF: instruction fetch, ID: instruction decode / register file read, EX: execution / address calculation, MEM: memory access, WB: write back)
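The five stages named above can be laid out as the usual staggered pipeline diagram. The following is a minimal sketch, assuming one instruction enters the pipeline per clock cycle and there are no stalls; the function names `pipeline_diagram` and `stage_at` are hypothetical helpers, not part of the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in clock
    `cycle` (1-based), or None if it is not in the pipeline then.
    Assumes one instruction issues per cycle and nothing stalls."""
    s = cycle - 1 - instr_index
    return STAGES[s] if 0 <= s < len(STAGES) else None


def pipeline_diagram(names):
    """Return one text row per instruction, each shifted one cycle right."""
    total_cycles = len(STAGES) + len(names) - 1
    rows = []
    for i, name in enumerate(names):
        cells = [stage_at(i, c) or "" for c in range(1, total_cycles + 1)]
        rows.append((f"{name:<4}" + "".join(f"{c:<4}" for c in cells)).rstrip())
    return rows


if __name__ == "__main__":
    for row in pipeline_diagram(["i1", "i2", "i3"]):
        print(row)
```

Each row is the previous one shifted by one cycle, so in steady state all five stages are busy with different instructions.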
Pipelined vs. Nonpipelined Implementation
Pipelined vs. Nonpipelined Implementation
Ratio of total execution times between the two versions for 10^6 instructions?
Pipelining increases instruction throughput; it does not reduce the execution time of an individual instruction.
Stages: IF, ID, EX, MEM, WB
Speedup of the Pipeline
The speedup of a k-stage pipelined processor over an unpipelined processor:
$$S_k = \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}} = \frac{nk}{k + (n - 1)}$$
n: number of instructions in the program. k: number of pipeline stages.
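The speedup formula can be checked numerically. This is a minimal sketch, assuming every stage takes exactly one clock cycle, the unpipelined processor needs k cycles per instruction, and the pipeline never stalls; the function names are illustrative.

```python
def pipelined_cycles(n, k):
    """Cycles to run n instructions on a k-stage pipeline:
    k cycles for the first instruction, then one more per instruction."""
    return k + (n - 1)


def speedup(n, k):
    """S_k = n*k / (k + (n - 1)), under the assumptions stated above."""
    return (n * k) / pipelined_cycles(n, k)


if __name__ == "__main__":
    for n in (1, 10, 1000, 10**6):
        print(n, round(speedup(n, 5), 5))
```

For a single instruction the speedup is 1, and as n grows the value approaches k, so with five equal stages and 10^6 instructions the ratio is just under 5.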
Efficiency of the Pipeline
Percentage of stages accomplishing tasks related to the instruction in execution:
$$\eta = \frac{\text{No. of instructions}}{\text{Instruction execution time}} = \frac{n}{k + (n - 1)}$$
n: number of instructions in the program. k: number of pipeline stages.
Throughput of the Pipeline
Number of tasks completed in unit time (one second):
$$w = \eta f$$
f: frequency of operation.
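Efficiency and throughput follow directly from the two formulas above. The sketch below assumes an ideal pipeline with no stalls; the 1 GHz clock in the usage lines is a made-up value for illustration only.

```python
def efficiency(n, k):
    """eta = n / (k + (n - 1)) for n instructions on a k-stage pipeline."""
    return n / (k + (n - 1))


def throughput(n, k, f_hz):
    """w = eta * f: instructions completed per second at clock frequency f_hz."""
    return efficiency(n, k) * f_hz


if __name__ == "__main__":
    f = 1e9  # hypothetical 1 GHz clock, not a figure from the slides
    for n in (1, 4, 10**6):
        print(n, efficiency(n, 5), throughput(n, 5, f))
```

Efficiency tends to 1 for long programs, so throughput tends to one instruction per clock cycle.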
Pipeline Hazards
Hazard: n. An unavoidable danger or risk, even though often foreseeable.
Situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle.
Hazards reduce performance from the ideal speedup gained by pipelining.
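Structural Hazard
Figure: clock cycles 1 to 9; instructions i1 to i5 each pass through MEM, ID, EX, MEM, WB, each starting one cycle after the previous one. The first MEM is the instruction fetch from memory and the second is the data access.
HAZARD: lack of resources. An instruction fetch and a data access need the same memory in the same cycle.
Solution: increase resources. The MIPS pipeline uses separate data and instruction memories.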
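The memory conflict in the diagram can be found mechanically. The sketch below assumes, as a simplification, that every instruction fetches in its first cycle and accesses data memory in its fourth (as a load or store would); `memory_conflicts` and the port names are hypothetical.

```python
from collections import Counter

FETCH_OFFSET = 0  # instruction fetch happens in an instruction's first cycle
DATA_OFFSET = 3   # data access happens in its fourth cycle (the MEM stage)


def memory_conflicts(num_instructions, unified_memory=True):
    """Return the sorted clock cycles in which one memory is needed twice.
    Instruction i (0-based) is assumed to start in cycle i + 1."""
    fetch_port = "mem" if unified_memory else "imem"
    data_port = "mem" if unified_memory else "dmem"
    uses = Counter()
    for i in range(num_instructions):
        start = i + 1
        uses[(fetch_port, start + FETCH_OFFSET)] += 1
        uses[(data_port, start + DATA_OFFSET)] += 1
    return sorted(cycle for (_, cycle), count in uses.items() if count > 1)


if __name__ == "__main__":
    print(memory_conflicts(5, unified_memory=True))
    print(memory_conflicts(5, unified_memory=False))
```

With a single memory, the fetch of i4 collides with the data access of i1 in cycle 4; with separate instruction and data memories the list of conflicts is empty.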
Data Hazard
Clock cycles 1 to 9:
ADD R1, R2, R3: IF ID EX MEM WB
SUB R4, R1, R5: IF ID EX MEM WB (one cycle later) WRONG!
Data (input operands) required by the instruction are not ready or available.
Data dependence: RAW, WAR and WAW dependences.
RAW: ADD R1, R2, R3 followed by SUB R4, R1, R5
WAR: ADD R1, R2, R3 followed by SUB R2, R4, R5
WAW: ADD R1, R2, R3 followed by SUB R1, R4, R5
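The three kinds of dependence can be told apart by comparing destination and source registers. This is a minimal sketch, assuming three-operand instructions written as (destination, source 1, source 2); `classify_dependences` is an illustrative name.

```python
def classify_dependences(first, second):
    """Classify the dependences of `second` on `first`, given in program order.
    Each instruction is a (dest, src1, src2) tuple of register names."""
    dest1, *srcs1 = first
    dest2, *srcs2 = second
    found = []
    if dest1 in srcs2:   # second reads what first writes
        found.append("RAW")
    if dest2 in srcs1:   # second writes what first reads
        found.append("WAR")
    if dest1 == dest2:   # both write the same register
        found.append("WAW")
    return found


if __name__ == "__main__":
    add = ("R1", "R2", "R3")
    print(classify_dependences(add, ("R4", "R1", "R5")))
    print(classify_dependences(add, ("R2", "R4", "R5")))
    print(classify_dependences(add, ("R1", "R4", "R5")))
```

Applied to the ADD/SUB pairs from the slide, the three calls report RAW, WAR and WAW respectively.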
Data Hazard
Instruction sequence: DADD R1,R2,R3; DSUB R4,R1,R5; AND R6,R1,R7; OR R8,R1,R9; XOR R10,R1,R11
Figure: time in clock cycles; each instruction passes through IM, REG, ALU, DM, REG, starting one cycle after the previous one.
Avoiding Data Hazards: Forwarding
Instruction sequence: DADD R1,R2,R3; DSUB R4,R1,R5; AND R6,R1,R7; OR R8,R1,R9; XOR R10,R1,R11
Figure: time in clock cycles; each instruction passes through IM, REG, ALU, DM, REG, starting one cycle after the previous one.
Pipeline without Forwarding
Pipeline with Forwarding
Data Hazard: Load Instruction
Instruction sequence: LD R1,0(R2); DSUB R4,R1,R5; AND R6,R1,R7; OR R8,R1,R9
Figure: time in clock cycles; each instruction passes through IM, REG, ALU, DM, REG, starting one cycle after the previous one.
Data Hazards: Stalls
Instruction sequence: LD R1,0(R2); DSUB R4,R1,R5; AND R6,R1,R7; OR R8,R1,R9
Figure: time in clock cycles; LD passes through IM, REG, ALU, DM, REG, while DSUB and the instructions after it are delayed by a stall.
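The number of stall cycles for a RAW dependence can be estimated from the distance between producer and consumer. The sketch below assumes the classic five-stage pipeline in which the register file is written in the first half of a cycle and read in the second half; that assumption, and the name `stall_cycles`, are not taken from the slides.

```python
def stall_cycles(producer_is_load, distance, forwarding):
    """Stall cycles for a consumer issued `distance` instructions after the
    producer of one of its source registers (distance 1 = next instruction).
    Considers a single producer/consumer pair in isolation."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        # ALU results are forwarded in time for the next instruction;
        # loaded data only exists after MEM, so the very next instruction waits.
        return 1 if (producer_is_load and distance == 1) else 0
    # Without forwarding the consumer's ID stage must not come before the
    # producer's WB stage (same-cycle write-then-read is assumed to work).
    return max(0, 3 - distance)


if __name__ == "__main__":
    print(stall_cycles(producer_is_load=True, distance=1, forwarding=True))
    print(stall_cycles(producer_is_load=False, distance=1, forwarding=False))
```

Under these assumptions, LD followed immediately by DSUB still costs one stall with forwarding, which is the case the slide illustrates, while an ALU producer costs none.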
Data Hazard Solutions
Data forwarding
Instruction reordering
Control Hazard
Arises from the pipelining of branches and other instructions that change the PC.
Also called a branch hazard.
Branch Hazards
Time (clock cycles) 1 to 9
BEQ: IF ID EX MEM WB
ADD: IF ID EX MEM WB
Branch successor: IF ID EX MEM WB
Branch successor + 1: IF ID EX MEM WB
Branch successor + 2: IF ID EX MEM WB
Assumption: branch condition evaluation is completed in the ID stage.
Reducing Pipeline Branch Penalties
Freeze the pipeline
Predict taken
Predict untaken
Fill branch delay slot
Time (clock cycles) 1 to 9
i, BEQ: IF ID EX MEM WB
i-1, AND: IF ID EX MEM WB
i+16, branch successor: IF ID EX MEM WB
i+17, branch successor + 1: IF ID EX MEM WB
Dynamic Branch Prediction
Branch prediction buffers
Single-bit predictors
Change prediction with branch behaviour
Branch outcomes: T T T T N T T T T T T T T T T T T
No. of wrong predictions?
Branch prediction buffer (PC: prediction): 0x0100: 1; 0x0154: 0; 0x0210: 1; ...: 1
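The question about wrong predictions can be answered by simulating a single-bit predictor on the outcome sequence from the slide. This sketch assumes the buffer entry for the branch initially predicts taken; the slide does not say which entry applies, so the initial value is a parameter.

```python
SLIDE_SEQUENCE = "T T T T N T T T T T T T T T T T T".split()


def mispredictions_1bit(outcomes, initial_prediction="T"):
    """Count wrong predictions of a 1-bit predictor, which always
    predicts that a branch repeats its most recent outcome."""
    prediction = initial_prediction
    wrong = 0
    for actual in outcomes:
        if actual != prediction:
            wrong += 1
        prediction = actual  # the single bit stores the last outcome
    return wrong


if __name__ == "__main__":
    print(mispredictions_1bit(SLIDE_SEQUENCE, "T"))  # prints 2
    print(mispredictions_1bit(SLIDE_SEQUENCE, "N"))  # prints 3
```

The single not-taken outcome costs two mispredictions: one when it occurs and one on the following taken branch, because the bit has flipped.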
Dynamic Branch Prediction: 2-bit predictors
Figure: branch prediction buffer with entries for PCs 0x0100, 0x0154 and 0x0210, each holding a 2-bit prediction state (00, 01, 10 or 11).
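A 2-bit predictor can be sketched as a saturating counter. The state encoding here is an assumption (00 and 01 predict not taken, 10 and 11 predict taken, with one step up or down per outcome); the state diagram on the original slide may use a different transition scheme.

```python
def mispredictions_2bit(outcomes, initial_state=0b11):
    """Count wrong predictions of a 2-bit saturating-counter predictor.
    States 0b10 and 0b11 predict taken; 0b00 and 0b01 predict not taken."""
    state = initial_state
    wrong = 0
    for actual in outcomes:
        predicted_taken = state >= 0b10
        if predicted_taken != (actual == "T"):
            wrong += 1
        # Move one step toward the actual outcome, saturating at the ends.
        if actual == "T":
            state = min(state + 1, 0b11)
        else:
            state = max(state - 1, 0b00)
    return wrong


if __name__ == "__main__":
    outcomes = "T T T T N T T T T T T T T T T T T".split()
    print(mispredictions_2bit(outcomes, 0b11))  # prints 1
```

Starting from the strongly-taken state, the lone not-taken outcome causes only one misprediction, because a single surprise moves the counter to 10, which still predicts taken.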