EE2011 Computer Organization Lecture 10: Enhancing Performance with Pipelining ~ Pipelined Datapath

Wen-Yen Lin, Ph.D.
Department of Electrical Engineering, Chang Gung University
Email: wylin@mail.cgu.edu.tw
May 2018

Pipelined Datapath (Ch. 4.6)
Computer Organization, 106/2, EE/CGU, W.Y. Lin

Basic Idea for Pipelined Datapath ~ from Single-cycle Datapath (Fig. 4.33)
Let's borrow the datapath from the single-cycle design as much as possible. Why?
What do we need to add to actually split the datapath into stages?
=> Add internal registers, i.e. pipeline registers, to hold the data passed between adjacent stages of the pipeline.

Pipelined Datapath (Fig. 4.35)
Pipeline register widths: IF/ID 64 bits, ID/EX 128 bits, EX/MEM 97 bits, MEM/WB 64 bits.
Each pipeline register has to be large enough to hold all the data produced in the previous cycle and being passed to the next stage.
Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

Details of Pipeline Registers
IF/ID (64 bits)
  IF/ID.IR : register for the instruction code (32 bits)
  IF/ID.PC : register for the incremented Program Counter (32 bits)
ID/EX (128 bits)
  ID/EX.RegRS : register for the Rs value (32 bits)
  ID/EX.RegRT : register for the Rt value (32 bits)
  ID/EX.PC : register for passing on the incremented Program Counter (32 bits)
  ID/EX.S32 : register for the 32-bit sign-extended immediate (32 bits)
EX/MEM (97 bits)
  EX/MEM.BrAddr : register for the computed branch target address (32 bits)
  EX/MEM.Zero : register for the register comparison result (1 bit)
  EX/MEM.ALUResu : register for the ALU result (32 bits)
  EX/MEM.RegRT : register for passing on the Rt value (32 bits)
MEM/WB (64 bits)
  MEM/WB.DMData : register for the data read from memory (32 bits)
  MEM/WB.ALUResu : register for passing on the ALU result (32 bits)
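The register widths quoted above are simply the sums of the listed field widths. A minimal sketch that checks this arithmetic follows; the dictionary layout and the function name `register_width` are illustrative assumptions, not part of the lecture.

```python
# Field widths in bits, copied from the slide "Details of Pipeline Registers".
# The dictionary layout is an illustrative assumption.
PIPELINE_REGISTERS = {
    "IF/ID": {"IR": 32, "PC": 32},
    "ID/EX": {"RegRS": 32, "RegRT": 32, "PC": 32, "S32": 32},
    "EX/MEM": {"BrAddr": 32, "Zero": 1, "ALUResu": 32, "RegRT": 32},
    "MEM/WB": {"DMData": 32, "ALUResu": 32},
}


def register_width(name):
    """Total width of one pipeline register: the sum of its field widths."""
    return sum(PIPELINE_REGISTERS[name].values())


for name in PIPELINE_REGISTERS:
    print(name, register_width(name), "bits")
```

Note that no field in this list holds a destination register number, which is relevant to the question raised on the Fig. 4.35 slide.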

Pipeline Operation
Cycle-by-cycle flow of instructions through the pipelined datapath
  Single-clock-cycle pipeline diagram
    Shows pipeline usage in a single cycle
    Highlights the resources used
  c.f. multi-clock-cycle diagram
    Graph of operation over time
We'll look at single-clock-cycle diagrams for load & store

IF for Load, Store (Fig. 4.36)

ID for Load, Store (Fig. 4.36)

EX for Load (Fig. 4.37)

MEM for Load (Fig. 4.38)

WB for Load (Fig. 4.38)

EX for Store (Fig. 4.39)

MEM for Store (Fig. 4.40)

WB for Store (Fig. 4.40)

Example Cycle 1
Addr: Instruction
  8000: add $1, $2, $3
  8004: lw $4, 4($5)
  8008: sw $6, 8($5)
  8012: and $7, $2, $3
  8016: sub $8, $0, $9
Register values (Reg#: Value): 0: 0, 1: 0, 2: 20, 3: 50, 4: 10, 5: 600, 6: 88, 7: 1, 8: 2, 9: 3
IF: add $1, $2, $3 (PC = 8000; instruction fields 0 2 3 1 0 32)
  Actions during the cycle: I_Mem[8000]; PC + 4 (= 8004)

Example Cycle 2
(Program and register values as in Cycle 1.)
Actions at the 2nd clock tick: IF/ID.IR = 0 2 3 1 0 32; IF/ID.PC = 8004; PC = 8004
IF: lw $4, 4($5) (instruction fields 35 5 4 4)
  Actions during the 2nd cycle: I_Mem[8004]; PC + 4 (= 8008)
ID: add $1, $2, $3
  Actions during the 2nd cycle: Reg[2] & Reg[3] (= 20, 50); S16_to_S32(2080) (= 2080)
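The value 2080 passed to S16_to_S32 for the add instruction is just the low 16 bits of its instruction word (the rd, shamt and funct fields), because the ID stage sign-extends bits 15..0 of every instruction whether or not they form an immediate. A small sketch, assuming the standard MIPS R-type field layout; the helper names `encode_r_type` and `s16_to_s32` are hypothetical.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields into one 32-bit MIPS instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def s16_to_s32(value16):
    """Sign-extend a 16-bit field; the result is returned as a signed Python int."""
    value16 &= 0xFFFF
    return value16 - 0x10000 if value16 & 0x8000 else value16


add_word = encode_r_type(0, 2, 3, 1, 0, 32)  # add $1, $2, $3
low16 = add_word & 0xFFFF                    # bits 15..0 feed the sign-extend unit
print(low16, s16_to_s32(low16))              # prints "2080 2080"
```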

Example Cycle 3
(Program and register values as in Cycle 1.)
Actions at the 3rd clock tick:
  IF/ID.IR = 35 5 4 4; IF/ID.PC = 8008; PC = 8008
  ID/EX.RegRS = 20; ID/EX.RegRT = 50; ID/EX.PC = 8004; ID/EX.S32 = 2080
IF: sw $6, 8($5) (instruction fields 43 5 6 8)
  Actions during the 3rd cycle: I_Mem[8008]; PC + 4 (= 8012)
ID: lw $4, 4($5)
  Actions during the 3rd cycle: Reg[5] & Reg[4] (= 600, 10); S16_to_S32(4) (= 4)
EX: add $1, $2, $3
  Actions during the 3rd cycle: ID/EX.RegRS + ID/EX.RegRT (= 20 + 50 = 70, Zero = 0); ID/EX.PC + Shf_L_2(ID/EX.S32) (= 8004 + 8320 = 16324)
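The EX-stage arithmetic in this example can be reproduced with a short sketch. It is a simplification under stated assumptions: the ALU only adds, the flag `use_immediate` stands in for the ALUSrc multiplexer, and the function name `ex_stage` is hypothetical. The field names follow the pipeline-register slide.

```python
def ex_stage(id_ex, use_immediate):
    """Compute the EX/MEM contents from the ID/EX contents (addition only)."""
    # Second ALU operand: sign-extended immediate (lw/sw) or Rt value (add).
    operand_b = id_ex["S32"] if use_immediate else id_ex["RegRT"]
    alu_result = id_ex["RegRS"] + operand_b
    return {
        # The branch adder always runs: incremented PC + (immediate << 2).
        "BrAddr": id_ex["PC"] + (id_ex["S32"] << 2),
        "Zero": int(alu_result == 0),
        "ALUResu": alu_result,
        "RegRT": id_ex["RegRT"],
    }


# add $1, $2, $3 in EX, using the ID/EX values from the example
add_ex = ex_stage({"RegRS": 20, "RegRT": 50, "PC": 8004, "S32": 2080}, False)
print(add_ex)
```

The branch target is computed for every instruction even when it is not a branch, which is why the add produces a meaningless BrAddr of 16324.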

Example Cycle 4
(Program and register values as in Cycle 1.)
Actions at the 4th clock tick:
  IF/ID.IR = 43 5 6 8; IF/ID.PC = 8012; PC = 8012
  ID/EX.RegRS = 600; ID/EX.RegRT = 10; ID/EX.PC = 8008; ID/EX.S32 = 4
  EX/MEM.BrAddr = 16324; EX/MEM.Zero = 0; EX/MEM.ALUResu = 70; EX/MEM.RegRT = 50
IF: and $7, $2, $3 (instruction fields 0 2 3 7 0 36)
  Actions during the 4th cycle: I_Mem[8012]; PC + 4 (= 8016)
ID: sw $6, 8($5)
  Actions during the 4th cycle: Reg[5] & Reg[6] (= 600, 88); S16_to_S32(8) (= 8)
EX: lw $4, 4($5)
  Actions during the 4th cycle: ID/EX.RegRS + ID/EX.S32 (= 600 + 4 = 604, Zero = 0); ID/EX.PC + Shf_L_2(ID/EX.S32) (= 8008 + 16 = 8024)
MEM: add $1, $2, $3
  Actions during the 4th cycle: No Action

Example Cycle 5
(Program and register values as in Cycle 1, except that register 7 changes 1 -> 70 in this cycle.)
Actions at the 5th clock tick:
  IF/ID.IR = 0 2 3 7 0 36; IF/ID.PC = 8016; PC = 8016
  ID/EX.RegRS = 600; ID/EX.RegRT = 88; ID/EX.PC = 8012; ID/EX.S32 = 8
  EX/MEM.BrAddr = 8024; EX/MEM.Zero = 0; EX/MEM.ALUResu = 604; EX/MEM.RegRT = 10
  MEM/WB.ALUResu = 70; MEM/WB.DMData = xxxx
IF: sub $8, $0, $9 (instruction fields 0 0 9 8 0 34)
  Actions during the 5th cycle: I_Mem[8016]; PC + 4 (= 8020)
ID: and $7, $2, $3
  Actions during the 5th cycle: Reg[2] & Reg[3]; S16_to_S32(14372)
EX: sw $6, 8($5)
  Actions during the 5th cycle: ID/EX.RegRS + ID/EX.S32 (= 600 + 8 = 608, Zero = 0); ID/EX.PC + Shf_L_2(ID/EX.S32) (= 8012 + 32 = 8044)
MEM: lw $4, 4($5)
  Actions during the 5th cycle: D_Mem[604] (read value shown in the figure: 1234)
WB: add $1, $2, $3
  Actions during the 5th cycle: Reg[IF/ID.RegRd] = MEM/WB.ALUResu
  The write register number is taken from IF/ID, which now holds and $7, $2, $3, so the result 70 is written to register 7 instead of register 1.

WB for Load
Wrong register number

Corrected Datapath (Fig. 4.41)
All the information (including control signals) that an instruction needs in its later stages has to be carried along with it through the pipeline registers!
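The effect of carrying the write register number along with the instruction can be illustrated with a minimal sketch of the WB stage in cycle 5 of the example. The field name `Rd`, the flag `corrected` and the function `write_back` are hypothetical names chosen for illustration; only the register values come from the example.

```python
def write_back(regs, mem_wb, if_id, corrected):
    """Return a copy of the register file after the WB stage writes its result."""
    # Corrected datapath: the destination number travels with the instruction.
    # Uncorrected datapath: it is read from IF/ID, i.e. from a later instruction.
    dest = mem_wb["Rd"] if corrected else if_id["Rd"]
    updated = dict(regs)
    updated[dest] = mem_wb["ALUResu"]
    return updated


regs = {1: 0, 7: 1}
mem_wb = {"ALUResu": 70, "Rd": 1}  # add $1, $2, $3 is in WB
if_id = {"Rd": 7}                  # and $7, $2, $3 is in ID

buggy = write_back(regs, mem_wb, if_id, corrected=False)
fixed = write_back(regs, mem_wb, if_id, corrected=True)
print(buggy)  # prints "{1: 0, 7: 70}"
print(fixed)  # prints "{1: 70, 7: 1}"
```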

Corrected Datapath for Load

Graphically Representing Pipelines (p. 286, Fig. 4.43)
Multiple-clock-cycle pipelining diagram of five instructions
Form showing resource usage

Graphically Representing Pipelines (Fig. 4.44)
Traditional multiple-clock-cycle pipelining diagram of five instructions
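A traditional multiple-clock-cycle diagram can be generated mechanically, since instruction i (counting from 0) occupies stage s in cycle i + s + 1. The sketch below assumes an ideal five-stage pipeline with no stalls and uses the five-instruction program from the cycle-by-cycle example rather than the textbook's figure; `pipeline_diagram` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one row per instruction: (text, list of stage names per cycle)."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # instruction i is in stage s during cycle i + s + 1
        rows.append((instr, cells))
    return rows


program = [
    "add $1, $2, $3",
    "lw $4, 4($5)",
    "sw $6, 8($5)",
    "and $7, $2, $3",
    "sub $8, $0, $9",
]
rows = pipeline_diagram(program)
for instr, cells in rows:
    print(f"{instr:<16}" + "".join(f"{cell:<5}" for cell in cells))
```

Each printed row is shifted one cycle to the right of the row above it, and five instructions finish in nine cycles.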

Graphically Representing Pipelines (Fig. 4.45)
The single-clock-cycle diagram corresponding to clock cycle 5 of the pipeline.
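A single-clock-cycle diagram is a vertical slice of the multiple-clock-cycle diagram. A minimal sketch of that slice, again assuming an ideal five-stage pipeline without stalls and reusing the program from the cycle-by-cycle example; `stages_in_cycle` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stages_in_cycle(instructions, cycle):
    """Map each occupied stage to the instruction it holds in the given cycle."""
    occupancy = {}
    for i, instr in enumerate(instructions):
        s = cycle - 1 - i  # stage index of instruction i in this cycle
        if 0 <= s < len(STAGES):
            occupancy[STAGES[s]] = instr
    return occupancy


program = [
    "add $1, $2, $3",
    "lw $4, 4($5)",
    "sw $6, 8($5)",
    "and $7, $2, $3",
    "sub $8, $0, $9",
]
cycle5 = stages_in_cycle(program, 5)
for stage in STAGES:
    print(stage, cycle5.get(stage, "(empty)"))
```

For cycle 5 every stage is occupied, which matches the stage assignment in the Example Cycle 5 slide.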