UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Departamento de Engenharia Informática

Architectures for Embedded Computing
MEIC-A, MEIC-T, MERC
Lecture Slides, Version 3.0 - English

Lecture 05
Title: Pipelining - Basic Principles
Summary: Pipelining (analysis of the instruction execution, implementation and performance analysis); Hazards (structural, data and control).

2010/2011
Nuno.Roma@ist.utl.pt
Architectures for Embedded Computing
Pipelining: Basic Principles
Prof. Nuno Roma, ACE 2010/11 - DEI-IST

Previous Class

In the previous class...
  Code Generation:
  Types of Assembly Instructions
  Control Instructions
  Compiler's Role
  MIPS Logic Architecture
Summary

Today:
  Pipelining:
    Analysis of the instruction execution
    Implementation
    Performance analysis
  Hazards:
    Structural
    Data
    Control

Bibliography:
  Computer Architecture: a Quantitative Approach, Sections A.1 - A.3
The Laundry Analogy for Pipelining

Four loads, each one taking 4 × 30 min = 2 hours, to:
  Wash;
  Dry;
  Fold;
  Store.
Total time: 4 loads × 2 hours = 8 hours!
Pipeline approach: Total time = 3.5 hours
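The two laundry totals (8 hours sequential, 3.5 hours pipelined) can be checked with a short Python sketch. The function names are illustrative only, and the model assumes that every stage takes the same 30 minutes and that a new load enters the first stage as soon as it is free.

```python
def sequential_time(loads, stages, stage_minutes):
    """Each load finishes all of its stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_time(loads, stages, stage_minutes):
    """The first load needs `stages` steps; each further load adds one step."""
    return (stages + loads - 1) * stage_minutes


LOADS, STAGES, STAGE_MINUTES = 4, 4, 30
print(sequential_time(LOADS, STAGES, STAGE_MINUTES) / 60)  # 8.0 hours
print(pipelined_time(LOADS, STAGES, STAGE_MINUTES) / 60)   # 3.5 hours
```

The same `stages + loads - 1` count reappears later as the number of clock cycles needed to run a sequence of instructions through an ideal pipeline.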
Execution of an Assembly Instruction

Execution phases of an Assembly instruction:
  1.(F) Instruction Fetch
        Read the instruction into an internal register and increment the program counter (PC).
  2.(D) Instruction Decode
        Interpret the instruction encoding fields to determine the type of instruction;
        Copy the operands to temporary registers.
  3.(X) Execution
        Compute the instruction result.
  4.(W) Write Back
        Write the result to the destination specified by the instruction.

Execution in a CISC Processor

Structure of a CISC processor:
  [Figure: block diagram with Control Unit, PC, IR, Register Bank, Op1, Op2, ALU, Res, Address Bus and Data Bus]

Main modules:
  Control Unit;
  Processing Unit: Register Bank and Arithmetic-Logic Unit (ALU).
Execution in a CISC Processor

The Processing Unit is repeatedly used in all phases of the instruction execution process:

  Instruction 1: F D1 D2 X1 X2 X3 X4 W
  Instruction 2: F D1 D2 D3 X1 W
  Instruction 3: F D1 X1 X2 X3 W

  Instructions may be as complex as necessary;
  Difficult to parallelize the instruction execution process.

Characteristics of a RISC Processor

  All instructions take the same amount of time to execute;
  Simple instructions: only those implemented by the ALU;
  Only immediate and register addressing modes;
  Assembly instructions with rigid encoding formats.
CISC vs RISC Comparison

Phase  CISC                                                   RISC
F      IR ← M[PC], PC += instLen                              IR ← M[PC], PC++
D      Several different addressing modes                     Only by register or immediate
X      Arbitrary sequence of operations in ALU                Only one operation in ALU
W      Result is written into a register or memory position   Result is written into a register or memory position
Processing Phases of MIPS Processor

One additional phase for memory read and write, only used by load and store instructions:
  F - Instruction Fetch
  D - Instruction Decode
  X - Execution
  M - Memory Access
  W - Write-Back
Each phase takes only one clock cycle.

1. (F) Fetch
     IR ← M[PC], PC ← PC + 4
2. (D) Instruction Decode
     Decode the instruction
     Read operands from the register bank
     Sign extension of constants
3. (X) Execution
     ALU operations with 2 registers,
     ALU operations with 1 register and one constant,
     Effective address calculation.
4. (M) Memory Access
     If load: read from data memory,
     If store: write to data memory,
     Branch resolution.
5. (W) Write-Back
     Write the result in the register bank (from either an ALU operation or a load instruction).
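Two of the steps listed above, the sign extension of constants (D) and the effective address calculation (X), can be illustrated with a small Python sketch. The function names are hypothetical, and the sketch assumes a 16-bit immediate field and 64-bit addresses; the slides do not state either width.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of `value` as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset_field):
    """Base register plus sign-extended offset (64-bit wrap-around assumed)."""
    return (base + sign_extend_16(offset_field)) & 0xFFFFFFFFFFFFFFFF


print(sign_extend_16(0xFFFC))                  # -4
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```

A load or store with base address 0x1000 and offset field 0xFFFC therefore accesses address 0xFFC, four bytes below the base.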
MIPS Processor Architecture

  Each execution phase corresponds to one pipeline stage;
  Each pipeline stage has an autonomous processing capability;
  Intermediate results are stored in registers between the pipeline stages;
  Processing speed is defined by the slowest pipeline stage.

[Figure: MIPS Processor Architecture]
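The last point, that the slowest stage sets the processing speed, can be made concrete with a minimal sketch. The stage delays below are invented for illustration (the slides give no figures), and the delay added by the registers between stages is ignored.

```python
# Hypothetical stage delays in nanoseconds (not taken from the slides).
stage_delay_ns = {"F": 2.0, "D": 1.0, "X": 2.0, "M": 2.5, "W": 1.0}


def pipeline_clock_period(delays):
    """All stages advance together, so the clock must fit the slowest one."""
    return max(delays.values())


def serial_instruction_time(delays):
    """Without pipelining, one instruction passes through every stage in turn."""
    return sum(delays.values())


slowest = max(stage_delay_ns, key=stage_delay_ns.get)
print(slowest, pipeline_clock_period(stage_delay_ns))  # M 2.5
print(serial_instruction_time(stage_delay_ns))         # 8.5
```

With these assumed delays, the pipelined clock period is 2.5 ns even though most stages would finish sooner, which is why balanced stages matter.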
                  Clock Cycle
Instruction  1  2  3  4  5  6  7  8  9
i            F  D  X  M  W
i + 1           F  D  X  M  W
i + 2              F  D  X  M  W
i + 3                 F  D  X  M  W
i + 4                    F  D  X  M  W

All instructions must pass through all pipeline stages, whether they use them or not!
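The ideal schedule in the diagram above follows a simple rule: instruction i + k enters stage number s (counting from 0) in clock cycle k + s + 1. A minimal Python sketch of that rule, with illustrative names and no hazards modelled:

```python
STAGES = ["F", "D", "X", "M", "W"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage used in cycle c + 1."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for k in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(stages):
            row[k + s] = stage  # instruction k is one cycle behind instruction k - 1
        rows.append(row)
    return rows


for k, row in enumerate(pipeline_schedule(5)):
    label = "i" if k == 0 else f"i + {k}"
    print(f"{label:<8}" + "  ".join(f"{cell:1}" for cell in row))
```

Five instructions on five stages need 5 + 5 - 1 = 9 clock cycles, matching the nine columns of the diagram.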
Pipeline Speedup

\[
Speedup_{pipe}
  = \frac{\text{Average Time without Pipeline}}{\text{Average Time with Pipeline}}
  = \frac{CPI_{serial} \times T_{clk}^{serial}}{CPI_{pipe} \times T_{clk}^{pipe}}
  = \frac{CPI_{serial}}{CPI_{pipe}} \times \frac{T_{clk}^{serial}}{T_{clk}^{pipe}}
\]

Ideal case: $CPI_{pipe} = 1$ and $CPI_{serial} = \#stages$, so

\[
Speedup_{pipe} = \#stages \times \frac{T_{clk}^{serial}}{T_{clk}^{pipe}}
\]
Pipeline Throughput and Latency

Throughput
  Number of executed instructions per unit of time.
  That is the parameter we are interested in!

Latency
  Amount of time each instruction takes to execute.
  Latency increases with the introduction of the pipeline...
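The difference between the two metrics can be sketched numerically. The figures are hypothetical: a serial machine that needs 8.5 ns per instruction is compared with a 5-stage pipeline clocked at 2.5 ns, and no stalls are assumed.

```python
def instruction_latency(num_stages, t_clk):
    """Time one instruction spends in the pipeline: it visits every stage."""
    return num_stages * t_clk


def throughput(num_instructions, num_stages, t_clk):
    """Instructions completed per unit of time, including the pipeline fill time."""
    total_time = (num_stages + num_instructions - 1) * t_clk
    return num_instructions / total_time


print(instruction_latency(5, 2.5))  # 12.5 ns, longer than the serial 8.5 ns
print(throughput(1000, 5, 2.5))     # close to 1 / 2.5 = 0.4 instructions per ns
print(1 / 8.5)                      # serial throughput, about 0.118 per ns
```

Under these assumptions each instruction takes longer to complete, yet the machine finishes more than three times as many instructions per nanosecond.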
Hazards in the Pipeline

Structural
  The hardware does not support all combinations of instructions that are to be executed simultaneously.
Data
  Instructions require data that is still being processed by previous instructions in the pipeline.
Control
  Change in the sequence of instructions that is to be executed.

The occurrence of a hazard causes a stall: execution is interrupted in all pipeline stages before the one where the hazard has occurred.
Example of a Structural Hazard

                  Clock Cycle
Instruction  1  2  3  4  5  6  7  8  9  10
LD           F  D  X  M  W
i + 1           F  D  X  M  W
i + 2              F  D  X  M  W
i + 3                 S  F  D  X  M  W
i + 4                       F  D  X  M  W

(S = stall cycle)
Example of a Data Hazard

                     Clock Cycle
Instruction     1  2  3  4  5  6  7  8  9  10
DADD R1,R2,R3   F  D  X  M  W
DSUB R4,R1,R5      F  S  S  S  D  X  M  W
AND R6,R1,R7          S  S  S  F  D  X  M  W
OR R8,R1,R9                       F  D  X  M
XOR R10,R1,R11                       F  D  X

(S = stall cycle)
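The stall pattern of this example can be reproduced with a small in-order scheduling sketch. It is an illustration under stated assumptions, not the lecture's own model: five stages F D X M W, no forwarding, and a register written in W during cycle c can first be read in D during cycle c + 1. The function name and tuple layout are hypothetical.

```python
def schedule_no_forwarding(program):
    """program: list of (text, dest_reg, source_regs).

    Returns a list of (text, decode_cycle, writeback_cycle, stall_cycles).
    """
    last_write = {}   # register -> cycle in which it is written (W)
    prev_decode = 1   # so that the first instruction decodes in cycle 2
    result = []
    for text, dest, sources in program:
        earliest = prev_decode + 1  # in-order: D is free one cycle later
        ready = max((last_write.get(r, 0) + 1 for r in sources), default=0)
        decode = max(earliest, ready)
        writeback = decode + 3      # D -> X -> M -> W
        if dest is not None:
            last_write[dest] = writeback
        result.append((text, decode, writeback, decode - earliest))
        prev_decode = decode
    return result


program = [
    ("DADD R1,R2,R3", "R1", ["R2", "R3"]),
    ("DSUB R4,R1,R5", "R4", ["R1", "R5"]),
    ("AND R6,R1,R7", "R6", ["R1", "R7"]),
    ("OR R8,R1,R9", "R8", ["R1", "R9"]),
    ("XOR R10,R1,R11", "R10", ["R1", "R11"]),
]
for text, decode, writeback, stalls in schedule_no_forwarding(program):
    print(f"{text:<16} D={decode:<2} W={writeback:<2} stalls={stalls}")
```

DSUB is held for three cycles until DADD has written R1; the following instructions lose no further cycles of their own and are simply delayed behind it, as in the diagram.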
Example of a Control Hazard

                     Clock Cycle
Instruction     1  2  3  4  5  6  7  8  9  10
DADD R1,R2,R3   F  D  X  M  W
BEQZ R4,ciclo      F  D  X  M  W
AND R6,R7,R8          S  S  S  F  D  X  M  W
OR R8,R1,R9                       F  D  X  M

(S = stall cycle)
Real Speedup

But...

\[
Speedup_{pipe} = \frac{CPI_{serial}}{CPI_{pipe}} \times \frac{T_{clk}^{serial}}{T_{clk}^{pipe}}
\]

With $CPI_{serial} = \#stages$ and $CPI_{pipe} = 1 + \#Stalls$ (stall cycles per instruction):

\[
Speedup_{pipe} = \#stages \times \frac{T_{clk}^{serial}}{T_{clk}^{pipe}} \times \frac{1}{1 + \#Stalls}
\]
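The effect of stalls on the speedup can be evaluated directly from the formula above. In this sketch the parameter names are illustrative, and the stall rate of 0.25 cycles per instruction is an assumed value, not a measurement from the slides.

```python
def pipeline_speedup(num_stages, stalls_per_instruction, t_clk_serial, t_clk_pipe):
    """Speedup = (CPI_serial / CPI_pipe) * (T_clk_serial / T_clk_pipe)."""
    cpi_serial = num_stages
    cpi_pipe = 1 + stalls_per_instruction
    return (cpi_serial / cpi_pipe) * (t_clk_serial / t_clk_pipe)


# Same clock period in both machines, 5 pipeline stages.
print(pipeline_speedup(5, 0.0, 1.0, 1.0))   # 5.0 (ideal case)
print(pipeline_speedup(5, 0.25, 1.0, 1.0))  # 4.0
```

One stall every four instructions is already enough to lose a fifth of the ideal speedup, which motivates the techniques for overcoming stalls.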
of a program execution in a pipeline;
Hazards in the pipeline:
  Structural hazards;
  Data hazards:
    Types of hazards;
    Overcoming the Stalls:
      By Software;
      By writing in opposite edges of the clock cycle;
      By data forwarding.