Basic Pipelining Concepts

1 Basic Pipelining Concepts
Appendix A (recommended reading; not everything will be covered today)
- Basic pipelining
- Pipeline hazards
  - Data hazards
  - Control hazards
  - Structural hazards
- Multicycle operations

2 Instruction Execution
For each instruction:
1. Instruction fetch (IF)
2. Instruction decode, operand fetch (ID)
3. Execute computations (EX)
4. Memory access (MEM)
5. Write back results to registers (WB)
The number and types of steps can vary between different ISAs and implementations.

3 MIPS Single Cycle Implementation
[Figure: instruction fetch. The 32-bit instruction is shown in two formats: (op, rs, rt, rd, shamt, funct) and (op, rs, rt, address/immediate).]

4 MIPS Single Cycle Implementation
[Figure: adds instruction decode/register fetch. The rs, rt, and rd fields go to the register file, which outputs data_rs and data_rt; the 16-bit address/immediate is sign-extended to 32 bits.]

5 MIPS Single Cycle Implementation
[Figure: adds execute/address calculation. New components: ALU (result and status outputs) and a shift-left-2 unit.]

6 MIPS Single Cycle Implementation
[Figure: adds memory access. New component: data memory.]

7 MIPS Single Cycle Implementation
[Figure: adds write back, completing the datapath: instruction fetch, instruction decode/register fetch, execute/address calculation, memory access, write back.]

8 MIPS Single Cycle Implementation
[Figure: the complete single-cycle datapath, as on slide 7.]

9 Problems
All instructions take the time required by the longest instruction.
Alternative solutions:
1. Multicycle processors
2. Pipelining
Both solutions require the implementation to change! We will have a closer look at pipelining.

10 The Assembly Line Concept
- A pipelined processor is based on the assembly line concept
- One station for each stage in the instruction execution
- At any moment there is one instruction at each station
- One new instruction every cycle => CPI = 1
- Each instruction takes multiple cycles to complete, but the throughput is high!
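
The latency and throughput claims can be checked with a little arithmetic. The sketch below assumes an ideal pipeline with no stalls and an unpipelined baseline in which every instruction passes through all stages before the next one starts; the function names are illustrative and not from the slides.

```python
def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed to finish all instructions in an ideal pipeline (no stalls)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to drain through the pipeline;
    # every later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def cycles_per_instruction(total_cycles: int, num_instructions: int) -> float:
    """Average CPI over the whole run."""
    return total_cycles / num_instructions


# For long instruction streams the pipelined CPI approaches 1,
# even though each instruction still takes num_stages cycles.
long_run_cpi = cycles_per_instruction(pipelined_cycles(1000, 5), 1000)
```

With 4 instructions and 5 stages, `pipelined_cycles(4, 5)` gives 8 cycles, while the assumed unpipelined baseline needs 20.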

11 Pipeline for MIPS
[Figure: the single-cycle datapath divided into five stages: instruction fetch, instruction decode/register fetch, execute/address calculation, memory access, write back.]

12 Pipeline for MIPS
[Figure: the same datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the stages.]

13 Pipelining Example
Program:
  add $5, $2, $3
  lw  $4, 100($5)
  sw  $4, 400($7)
  beq $8, $9, 800

14 Pipelining Example
[Figure: the empty pipeline with registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

15 Pipelining Example
IF:  add $5, $2, $3

16 Pipelining Example
IF:  lw $4, 100($5)
ID:  add $5, $2, $3 (reads $2 and $3)

17 Pipelining Example
IF:  sw $4, 400($7)
ID:  lw $4, 100($5) (reads $5; immediate 100)
EX:  add $5, $2, $3 (ALU computes $2+$3)

18 Pipelining Example
IF:  beq $8, $9, 800 (next PC is beq+4)
ID:  sw $4, 400($7) (reads $7 and $4)
EX:  lw $4, 100($5) (ALU computes $5+100)
MEM: add $5, $2, $3 (result $2+$3)

19 Pipelining Example
IF:  (beq+4)
ID:  beq $8, $9, 800 (reads $8 and $9)
EX:  sw $4, 400($7) (ALU computes $7+400)
MEM: lw $4, 100($5) (address $5+100)
WB:  add (writes $2+$3 to $5)

20 Pipelining Example
IF:  (beq+8)
ID:  (beq+4)
EX:  beq $8, $9, 800 (ALU computes $8-$9 and sets Z; offset 800 is shifted left 2)
MEM: sw $4, 400($7) (stores $4 at address $7+400)
WB:  lw (writes M[$5+100] to $4)

21 Pipelining Example: beq controls the PC mux
IF:  (beq+12)
ID:  (beq+8)
EX:  (beq+4)
MEM: beq $8, $9, 800 (Z controls the PC mux)
WB:  sw ...

22 Pipelining Example: branch destination
IF:  (beq+4+3200), the branch destination
ID:  (beq+12)
EX:  (beq+8)
MEM: (beq+4)
WB:  beq ...
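
The cycle-by-cycle walkthrough on slides 15-22 follows one rule: the instruction fetched in cycle i is in stage s during cycle i+s. The sketch below reproduces the stage occupancy for the four example instructions under that rule. It is a simplification that ignores hazards and the branch outcome, and it reads the example program with register `$4` and offset `400`, which is an assumed reading of the slides. `stage_occupancy` is an illustrative name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_occupancy(program, num_cycles):
    """Return one dict per cycle mapping each stage to the instruction in it (or None)."""
    table = []
    for cycle in range(num_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            # The instruction at index `cycle - depth` was fetched `depth` cycles ago.
            index = cycle - depth
            row[stage] = program[index] if 0 <= index < len(program) else None
        table.append(row)
    return table


# Assumed reading of the example program from slide 13.
program = [
    "add $5, $2, $3",
    "lw $4, 100($5)",
    "sw $4, 400($7)",
    "beq $8, $9, 800",
]

table = stage_occupancy(program, num_cycles=8)
for cycle_number, row in enumerate(table, start=1):
    print(cycle_number, row)
```

The third row corresponds to slide 17: `sw` in IF, `lw` in ID, and `add` in EX.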

23 Pipeline Hazards
- Neighboring instructions are rarely independent
- In a pipeline, this can cause conflicts called hazards
- Three main types of hazards:
  - Data hazards
  - Control hazards
  - Structural hazards

24 Hazard Resolution
Pipeline hazards can be resolved in many different ways:
- Stall: stop parts of the pipeline until the conflicting instructions are sufficiently separated
- Make results available earlier
  - Move calculations to earlier pipeline stages
  - Make results available before they have been stored
  - Guess results before they have been computed(!)
- Reorder instructions

25 Data Hazards
Three types:
- Read-After-Write (RAW)
- Write-After-Read (WAR). Does not occur in simple pipelines.
- Write-After-Write (WAW). Does not occur in simple pipelines.
A RAW hazard occurs when an instruction needs the result of an earlier instruction that is stored in a register (or memory location), and the earlier instruction has not yet written the result.
Usually handled by (some combination of):
- data forwarding (bypassing)
- stalling
- instruction reordering
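
The RAW condition can be expressed as a simple check over a program: a later instruction reads a register that a nearby earlier instruction writes. The sketch below is a minimal illustration, not a model of any particular pipeline. The `Instr` type, the `raw_hazards` function, and the `window` parameter (how many preceding instructions are still in flight) are all assumptions made for this example, as is reading the example program with register `$4`.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]        # register written, or None if no register is written
    sources: Tuple[str, ...]   # registers read


def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each RAW dependence
    where the producer is at most `window` instructions ahead of the consumer."""
    hazards = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            producer = program[i]
            if producer.dest is not None and producer.dest in consumer.sources:
                hazards.append((i, j, producer.dest))
    return hazards


# Assumed reading of the example program used throughout the slides.
program = [
    Instr("add $5, $2, $3", dest="$5", sources=("$2", "$3")),
    Instr("lw $4, 100($5)", dest="$4", sources=("$5",)),
    Instr("sw $4, 400($7)", dest=None, sources=("$4", "$7")),
    Instr("beq $8, $9, 800", dest=None, sources=("$8", "$9")),
]

found = raw_hazards(program)
```

For this program the check reports two dependences: `lw` needs `$5` from `add`, and `sw` needs `$4` from `lw`.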

26 RAW Hazard Example
[Figure: sw $4, 400($7) in IF, lw $4, 100($5) in ID, add $5, $2, $3 in EX. lw reads $5 in ID before add has written it.]

27 Solution: Data forwarding
[Figure: beq $8, $9, 800 in IF, sw $4, 400($7) in ID, lw $4, 100($5) in EX, add $5, $2, $3 in MEM. The add result $2+$3 is forwarded to lw.]

28 Another RAW Hazard Example
[Figure: beq $8, $9, 800 in IF, sw $4, 400($7) in ID, lw $4, 100($5) in EX, add $5, $2, $3 in MEM. sw reads $4 in ID before lw has loaded it.]

29 Solution: Stalling
- PC and IF/ID are not updated until lw reaches WB
- Bubbles (nop = no operation) are loaded into ID/EX until lw reaches WB
[Figure: beq $8, $9, 800 in IF, sw $4, 400($7) in ID, nop in EX, nop in MEM, lw in WB.]

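30 Speedup Equation for Pipelining
\[ \mathrm{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{pipeline stall cycles per instruction} \]
\[ \text{Speedup} = \frac{\mathrm{CPI}_{\text{unpipelined}}}{\text{Ideal CPI} + \text{stall cycles per instruction}} \times \frac{T_{c,\text{unpipelined}}}{T_{c,\text{pipelined}}} \]
The ideal CPI for a pipeline is normally 1. If, in addition,
\[ \mathrm{CPI}_{\text{unpipelined}} \times \frac{T_{c,\text{unpipelined}}}{T_{c,\text{pipelined}}} = \text{number of pipeline stages}, \]
this gives
\[ \text{Speedup} = \frac{\text{number of pipeline stages}}{1 + \text{stall cycles per instruction}} \]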
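
The simplified speedup formula is easy to evaluate numerically. The sketch below assumes, as the slide does, an ideal CPI of 1 and that the unpipelined CPI times the clock-cycle ratio equals the number of stages; the stall rate of 0.25 cycles per instruction is a made-up input for illustration, not a measurement.

```python
def pipeline_speedup(num_stages: int, stall_cycles_per_instr: float) -> float:
    """Speedup over the unpipelined machine, assuming an ideal CPI of 1."""
    return num_stages / (1.0 + stall_cycles_per_instr)


# With no stalls the speedup equals the pipeline depth.
ideal = pipeline_speedup(5, 0.0)

# Hypothetical stall rate: one stall cycle every four instructions.
with_stalls = pipeline_speedup(5, 0.25)
```

Under these assumptions a five-stage pipeline drops from a speedup of 5 to a speedup of 4 when a quarter of a stall cycle is added per instruction.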

31 Control Hazards
Occur when the program counter (PC) is changed by:
- a branch or jump instruction
- an exception (interrupt, trap, etc.)
Usually handled by (some combination of):
- branch prediction
- earlier target address calculation
- stalling
- delayed branch
- instruction reordering (static or dynamic scheduling)

32 Control Hazard Example
If the branch is to be taken, (beq+4) through (beq+12) should not have been fetched.
[Figure: branch target beq+4+800 selected as the next PC; (beq+12) in IF, (beq+8) in ID, (beq+4) in EX, beq $8, $9, 800 in MEM, sw in WB.]

33 Solution: Stalling
- Stop fetching instructions after a branch instruction until the address of the next instruction has been determined
- This is very inefficient because branches tend to be very frequent

34 Solution: Earlier Branch Calculation
Both the address and the condition need to be calculated earlier (in ID).
[Figure: the pipeline with registers IF/ID, ID/EX, EX/MEM, MEM/WB.]
Risk: the clock cycle may have to be lengthened, leading to a total performance loss.

35 Solution: Branch-Delay Hiding Techniques
- Stall until the branch condition and target are known
  - Can cause significant penalties
- Predict branch not taken (a fairly rare case)
  - Execute successor instructions in sequence
  - Squash instructions in the pipeline if the branch is actually taken
  - Works well if state is updated late in the pipeline
  - 30%-38% of conditional branches are not taken on average
- Predict branch taken (a fairly common case)
  - 62%-70% of conditional branches are taken on average
  - Makes sense for more complex pipeline organizations
- Delayed branch (schedule a useful instruction in the delay slot)
  - Define the branch to take place after a following instruction

36 Static Scheduling and Delayed Branch
- Scheduling an instruction from before the branch is always safe
- Scheduling from the target or from the not-taken path is not always safe; it must be guaranteed that speculative instructions do no harm

37 Conditional Delay Slot Execution
- Cancelling or nullifying branch instruction: cancel the instruction in the delay slot if the branch does not conform with the prediction
- Measurements on SPEC:
  - 80% of the branch-delay slots can be filled with useful instructions
  - 70% will be filled at run-time; 10% of the useful instructions will be cancelled because of mispredictions

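38 Evaluating Branch Hazard Avoidance Techniques
\[ \text{Pipeline speedup} = \frac{\text{number of pipeline stages}}{1 + \text{branch frequency} \times \text{branch penalty}} \]
[Table for integer programs; numeric values not preserved. Columns: scheduling scheme; branch penalty; CPI; speedup vs. unpipelined. Rows: stall pipeline; predict taken; predict not taken; delayed branch.]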
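
To see how the branch-speedup formula separates the schemes, it helps to plug in an average penalty per branch. The sketch below is illustrative only: the branch frequency of 0.2 is a hypothetical input, the taken fraction of 0.65 is picked from inside the 62%-70% range quoted on slide 35, and the 3-cycle flush penalty corresponds to the three wrongly fetched instructions in the control hazard example. None of these reproduce the values of the original table.

```python
def branch_speedup(num_stages: int, branch_frequency: float, avg_branch_penalty: float) -> float:
    """Pipeline speedup when branches are the only source of stall cycles."""
    return num_stages / (1.0 + branch_frequency * avg_branch_penalty)


def predict_not_taken_penalty(taken_fraction: float, flush_penalty: float) -> float:
    """Average penalty per branch: only taken branches are mispredicted and flushed."""
    return taken_fraction * flush_penalty


# Hypothetical inputs (see the assumptions in the text above).
STAGES = 5
BRANCH_FREQUENCY = 0.2
FLUSH_PENALTY = 3.0
TAKEN_FRACTION = 0.65

# Always stalling pays the full penalty on every branch.
stall_speedup = branch_speedup(STAGES, BRANCH_FREQUENCY, FLUSH_PENALTY)

# Predict-not-taken pays it only on taken branches.
not_taken_speedup = branch_speedup(
    STAGES, BRANCH_FREQUENCY, predict_not_taken_penalty(TAKEN_FRACTION, FLUSH_PENALTY)
)
```

Even with most branches taken, predict-not-taken comes out ahead of always stalling in this sketch, because the not-taken branches no longer pay any penalty.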

39 Exceptions
- An exception (interrupt, trap, etc.) always causes a jump in execution
- Exceptions are extra difficult to handle as they cannot be predicted and may occur at different stages for different instructions
- Certain precise exceptions require that instructions in the beginning of the pipeline are thrown away and later restarted when the exception handler is finished

40 Respecting the Execution Order
Exceptions may be generated in another order than the instruction execution order.

Pipeline stage | Problem causing exception
IF             | Page fault on instruction fetch; misaligned access; protection violation
ID             | Undefined or illegal opcode
EX             | Arithmetic exception
MEM            | Page fault on data access; misaligned access; memory protection violation
WB             | None

Example sequence:
  lw  (e.g., page fault in MEM)
  add (e.g., page fault in IF)
The add instruction causes its fault before the load does.
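
The lw/add example can be made concrete by comparing the order in which faults are raised with the order in which they should be handled. The slides do not specify a mechanism; the sketch below only illustrates the ordering problem, assuming each fault is recorded with the program index of its instruction and the cycle in which it was raised. All names and the cycle numbers are hypothetical.

```python
def first_in_time(events):
    """The fault that is raised earliest in time (what the hardware sees first)."""
    return min(events, key=lambda event: event["cycle"]) if events else None


def first_in_program_order(events):
    """The fault belonging to the earliest instruction (what must be handled first)."""
    return min(events, key=lambda event: event["program_index"]) if events else None


# lw is fetched in cycle 1 and faults in MEM (cycle 4);
# add is fetched in cycle 2 and faults immediately in IF.
events = [
    {"instr": "lw", "program_index": 0, "stage": "MEM", "cycle": 4},
    {"instr": "add", "program_index": 1, "stage": "IF", "cycle": 2},
]

raised_first = first_in_time(events)
handled_first = first_in_program_order(events)
```

Here the add fault is raised first, but respecting the execution order means the lw fault has to be handled first.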

41 Structural Hazards
Occur when two instructions require the same resource at the same time.
Blocked resources can be:
- Pipeline stages, or functional units in pipeline stages
- Memory
- Register file
Usually handled by (a combination of):
- stalling
- instruction reordering (e.g. dynamic scheduling, which will be covered later in the course)

42 Structural Hazard Example
lw has to wait on a cache miss in MEM => stall for previous stages.
[Figure: (beq+4) in IF, beq $8, $9, 800 in ID, sw $4, 400($7) in EX, lw $4, 100($5) held in MEM, nop in WB.]

43 Multicycle Operations in the Pipeline
- Integer unit: handles integer instructions, branches, and loads/stores
- Other units: may take several cycles each. Some units are pipelined (mult, add); others are not (div)

44 Parallel Execution of Instructions
  MULTD F2,F4,F6     IF ID M1 M2 M3 M4 M5 M6 M7 MEM WB
  ADDD  F8,F10,F12   IF ID A1 A2 A3 A4 MEM WB
  SUBI  R2,R3,#8     IF ID EX MEM WB
  LD    F1,0(R2)     IF ID EX MEM WB
Structural and RAW hazards:
- Structural hazards
  - Stall in ID stage when the functional unit is occupied
  - Many instructions can reach the WB stage at the same time
- RAW hazards
  - Normal bypassing from MEM and WB stages
  - Stall in ID stage if any of the source operands is a destination operand of an instruction in any of the FP functional units

45 WAR and WAW Hazards for Multicycle Operations
- WAR hazards are a non-issue because operands are read in program order
- WAW hazards may occur
Example of a WAW hazard:
  DIVF F0,F2,F4     FP divide, 24 cycles
  SUBF F0,F8,F10    FP sub, 3 cycles
SUBF finishes before DIVF: out-of-order completion.
WAW hazards are avoided by:
- stalling the SUBF until DIVF reaches the MEM stage, or
- disabling the write to register F0 for the DIVF instruction
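
The DIVF/SUBF case can be checked mechanically: a WAW hazard exists when a later instruction writes the same register and completes no later than an earlier one. The sketch below uses a deliberately simple timing model, assumed for illustration, in which an instruction completes at its issue cycle plus its execution latency. The 24-cycle divide latency is an assumed reading of the slide, chosen because it is consistent with SUBF finishing before DIVF. `waw_violations` is a hypothetical helper.

```python
def waw_violations(instructions):
    """instructions: list of (name, dest, issue_cycle, latency) in program order.
    Returns (earlier, later, register) when the later instruction completes
    no later than the earlier one and both write the same register."""
    violations = []
    for i, (name1, dest1, issue1, latency1) in enumerate(instructions):
        for name2, dest2, issue2, latency2 in instructions[i + 1:]:
            completes_first = issue2 + latency2 <= issue1 + latency1
            if dest1 == dest2 and completes_first:
                violations.append((name1, name2, dest1))
    return violations


# Example from the slide: both instructions write F0.
sequence = [
    ("DIVF", "F0", 1, 24),  # assumed 24-cycle FP divide
    ("SUBF", "F0", 2, 3),   # 3-cycle FP subtract
]

violations = waw_violations(sequence)
```

Without one of the two fixes listed on the slide, the stale DIVF result would overwrite the newer SUBF value in F0.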

46 Summary
- Pipelining: speeds up throughput, not latency
  - Speedup ≤ #stages
- Hazards are fundamental limits
  - Structural: need more HW
  - Data (RAW, WAR, WAW): need forwarding and compiler scheduling
  - Control: delayed branch, branch prediction
- Complications
  - Precise exceptions: maintain execution order
  - ISA must be designed to match pipelining requirements
- Multi-cycle operations may result in out-of-order completion
- Out-of-order completion introduces WAW hazards and problems with precise interrupts
