Pipelining. Parts of these slides are from the support material provided by W. Stallings



Pipelining
Raul Queiroz Feitosa

Objective
  To present the pipelining concept, its limitations, and the techniques for performance optimization.

Outline
  Instruction Cycle
  Instruction Pipelining
  Pipeline Performance
  Pipeline Hazards
    Resource
    Data
    Control

Instruction Cycle State Diagram
[Figure: state diagram of the instruction cycle. Labels that survive extraction: instruction fetch, instruction operation decoding, operand address calculation, operand fetch, instruction address decoding, indirection, data operation, result address calculation, result store, interrupt check, interrupt. Transition labels: multiple operands, multiple results, no interrupt.]

Instruction Pipelining
  The instruction cycle is split into sequential steps.
  A specific hardware unit (pipeline stage) is built to perform each step.
  Pipeline stages are arranged as a chain.
[Figure: chain of pipeline stages 1, 2, ..., k]

Instruction Pipelining - Example
  FI  fetch instruction
  DI  decode instruction
  CO  calculate operand
  FO  fetch operand
  EI  execute instruction
  WO  write operand (result)

Instruction Pipeline Operation
[Figure: timing diagram of the pipeline operation, one row per instruction (instruction 1 to instruction 16), with time on the horizontal axis.]
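The timing diagram of an ideal pipeline can be sketched in a few lines of Python. This is only an illustration: it assumes no stalls, one cycle per stage, and the stage abbreviations FI and DI for the fetch and decode stages; the function names `timing_diagram` and `render` are made up for this example.

```python
# Ideal (hazard-free) six-stage pipeline: instruction i enters stage s in cycle i + s.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]  # FI and DI are assumed abbreviations


def timing_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1, or None."""
    k = len(stages)
    total_cycles = n_instructions + k - 1
    rows = []
    for i in range(n_instructions):
        row = [None] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # each instruction starts one cycle after the previous one
        rows.append(row)
    return rows


def render(rows):
    """Format the diagram as text, using '--' for cycles where the instruction is not in the pipeline."""
    return "\n".join(
        f"instruction {i + 1:2d} " + " ".join(f"{cell or '--':>2}" for cell in row)
        for i, row in enumerate(rows)
    )


if __name__ == "__main__":
    print(render(timing_diagram(4)))
```

With n instructions and k stages every row has n + k - 1 columns, which is the total number of cycles the ideal pipeline needs.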

Pipeline Performance
Assuming $k$ stages with equal delays, $\tau = \tau_1 = \tau_2 = \dots = \tau_k$ ($\tau_i$ is the time delay of the $i$-th stage), let $T_{n,k}$ be the time for a pipeline with $k$ stages to execute $n$ instructions.

  $T_{n,1} = n k \tau$ (conventional machine)
  $T_{n,k} = k\tau + (n-1)\tau = (n+k-1)\tau$ (pipeline)

The speedup:

  $S_k = \dfrac{n k \tau}{(n+k-1)\tau} = \dfrac{nk}{n+k-1}$

For large $n$:

  $\lim_{n \to \infty} S_k = \lim_{n \to \infty} \dfrac{nk}{n+k-1} = k$
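The speedup formula can be checked numerically. The sketch below assumes equal stage delays as on the slide; the function names are illustrative only.

```python
def sequential_time(n, k, tau=1.0):
    """T_{n,1} = n * k * tau: conventional machine, each instruction takes all k steps in turn."""
    return n * k * tau


def pipeline_time(n, k, tau=1.0):
    """T_{n,k} = (n + k - 1) * tau: k cycles to fill the pipeline, then one instruction per cycle."""
    return (n + k - 1) * tau


def speedup(n, k):
    """S_k = n*k / (n + k - 1); tau cancels out."""
    return sequential_time(n, k) / pipeline_time(n, k)


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, round(speedup(n, k=6), 3))
```

For a single instruction the speedup is 1 (no overlap is possible), and it approaches k as n grows.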

Pipeline Performance
[Figure: speedup versus number of instructions, with curves for k = 6, 9, and 12 stages.]
[Figure: speedup versus number of stages, with curves for n = 10, 20, and 30 instructions.]

Pipeline Performance
The optimal performance is never reached because:
  The execution time differs from stage to stage ($\tau_1, \tau_2, \dots, \tau_k$).
  There is an additional time delay $d$ to latch the output of each stage.
  Pipeline hazards occur.
The cycle time is therefore $\tau = \max_i[\tau_i] + d$.
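The effect of unequal stage delays and latch delay on the speedup can be sketched as follows. One assumption is not on the slide and is labeled in the code: the non-pipelined machine is taken to need the sum of the stage delays per instruction, with no latch overhead. Function names are illustrative.

```python
def cycle_time(stage_delays, latch_delay):
    """tau = max_i[tau_i] + d: the slowest stage plus the latch delay sets the clock."""
    return max(stage_delays) + latch_delay


def realistic_speedup(n, stage_delays, latch_delay):
    """Speedup when the stages are not balanced (hazards are still ignored)."""
    k = len(stage_delays)
    # Assumption: without pipelining an instruction costs the sum of its stage delays.
    sequential = n * sum(stage_delays)
    pipelined = (n + k - 1) * cycle_time(stage_delays, latch_delay)
    return sequential / pipelined


if __name__ == "__main__":
    balanced = [1.0] * 6
    unbalanced = [1.0, 1.0, 2.0, 1.0, 1.0, 1.0]
    print(realistic_speedup(10, balanced, 0.0))
    print(realistic_speedup(10, unbalanced, 0.0))
    print(realistic_speedup(10, unbalanced, 0.5))
```

With balanced stages and no latch delay the result matches the ideal formula; a single slow stage or a nonzero latch delay lowers it.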

Pipeline Hazards
  In some cases a portion of the pipeline must stall because of so-called hazards.
  Such a stall is also called a pipeline bubble.
  Types of hazards:
    Resource
    Data
    Control

Resource Hazards
  Also called structural hazards. They occur when multiple instructions need the same resource, e.g., a single-port memory.
[Figure: timing diagram for instructions 1 to 4 showing idle cycles caused by a resource conflict.]
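A single-port memory conflict can be sketched with a small scheduling routine. The model is deliberately simplified and rests on assumptions that are not on the slide: only the FO stage reads data memory, an operand fetch has priority over an instruction fetch, result writes to memory are ignored, and a fetched instruction advances one stage per cycle. The name `fetch_cycles` is hypothetical.

```python
FO_OFFSET = 3  # in the six-stage pipeline FO comes three cycles after FI


def fetch_cycles(has_memory_operand):
    """Return the cycle in which each instruction performs FI.

    has_memory_operand[i] is True when instruction i reads its operand from memory,
    so that its FO stage occupies the single memory port.
    """
    busy = set()      # cycles in which the memory port is used by an operand fetch
    cycles = []
    next_cycle = 1
    for uses_memory in has_memory_operand:
        cycle = next_cycle
        while cycle in busy:   # port taken: the fetch stage idles for this cycle
            cycle += 1
        cycles.append(cycle)
        if uses_memory:
            busy.add(cycle + FO_OFFSET)
        next_cycle = cycle + 1
    return cycles


if __name__ == "__main__":
    # Instruction 1 has a memory operand; its FO in cycle 4 delays the fetch of instruction 4.
    print(fetch_cycles([True, False, False, False]))
```

When no instruction has a memory operand, the fetch cycles are simply 1, 2, 3, and so on.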

Data Hazards
  Conflict in the access of an operand location:
    Two instructions are to be executed in sequence.
    Both access a particular memory or register operand.
  Example:
    ADD EAX,EBX
    SUB ECX,EAX
[Figure: timing diagram for ADD EAX,EBX; SUB ECX,EAX; instruction 3; instruction 4, showing idle cycles between CO and FO.]
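The conflict in the example (the second instruction reads EAX, which the first one writes) can be detected mechanically. The sketch below only compares adjacent instructions and only looks for a read of a location the previous instruction writes; the `Instr` record and the hand-written read/write sets are assumptions made for illustration, not a real x86 decoder.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    writes: frozenset  # locations the instruction modifies
    reads: frozenset   # locations the instruction uses as sources


def read_after_write_hazards(program):
    """List (producer, consumer, shared locations) for each adjacent dependent pair."""
    hazards = []
    for prev, curr in zip(program, program[1:]):
        shared = prev.writes & curr.reads
        if shared:
            hazards.append((prev.text, curr.text, sorted(shared)))
    return hazards


# x86 two-operand form: the destination is also a source.
program = [
    Instr("ADD EAX,EBX", frozenset({"EAX"}), frozenset({"EAX", "EBX"})),
    Instr("SUB ECX,EAX", frozenset({"ECX"}), frozenset({"ECX", "EAX"})),
]

if __name__ == "__main__":
    for producer, consumer, regs in read_after_write_hazards(program):
        print(f"{consumer} must wait for {producer}: {', '.join(regs)}")
```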

Control Hazards
  Also called branch hazards.
  From which address should the instruction following a conditional branch be fetched?
[Figure: timing diagram for ADD EAX,EBX; JNZ ADDRESS; instruction 3; instruction 4. Annotations: "known only here", idle cycles, "no memory conflict!".]

Control Hazards
Dealing with Branches
  Multiple streams
    Two pipelines: each branch is prefetched into a separate pipeline (IBM 370/168 and IBM 3033).
    One pipeline always produces no useful work.
  Prefetch branch target
    The target of the branch is prefetched in addition to the instructions following the branch; the target is kept until the branch is executed (IBM 360/91).
  Loop buffer
    Very fast memory maintained by the fetch stage of the pipeline. The buffer is checked before fetching from memory (CRAY-1).
  Branch prediction
  Delayed branching

Branch Prediction
Concept:
  Instead of delaying the fetch of the next instruction, the next instruction is predicted.
  Results are stored in temporary registers.
  If the prediction is correct, the results are made definitive.
  If the prediction is incorrect, the results are flushed and fetching restarts from the right address.


Branch Prediction
Static Methods:
  Predict never taken or always taken
  Predict by opcode
    There are two opcodes for each branch instruction; 1 bit indicates predict taken or predict not taken.
    The compiler analyses the code, guesses, and generates the appropriate branch code.
    The processor follows the compiler's suggestion.
    This implies code incompatibility with previous processors.

Branch Prediction
Dynamic Methods
  Based on recent branch history
[Figure: Branch History Table with columns: branch instruction address, target address, state.]
[Figure: Branch Prediction State Diagram, with states "predict taken" and "predict not taken" and transitions "taken" and "not taken".]
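A dynamic predictor with a branch history table can be sketched as below. The sketch uses a common two-bit saturating counter as the state, which is an assumption: it is one widely used four-state scheme and not necessarily the exact state diagram on the slide. The class and method names are hypothetical.

```python
class BranchHistoryTable:
    """Maps a branch instruction address to [target address, 2-bit counter]."""

    def __init__(self):
        self.entries = {}

    def predict(self, branch_addr):
        """Return the predicted target, or None to predict 'not taken' (fall through)."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            return None  # no history yet: assume not taken
        target, counter = entry
        return target if counter >= 2 else None  # states 2 and 3 predict taken

    def update(self, branch_addr, taken, target):
        """Record the real outcome once the branch has executed."""
        entry = self.entries.setdefault(branch_addr, [target, 1])  # start at weak 'not taken'
        entry[0] = target
        if taken:
            entry[1] = min(3, entry[1] + 1)  # saturate at strong 'taken'
        else:
            entry[1] = max(0, entry[1] - 1)  # saturate at strong 'not taken'


if __name__ == "__main__":
    bht = BranchHistoryTable()
    for outcome in (True, True, False, True):
        print(bht.predict(0x100), outcome)
        bht.update(0x100, outcome, 0x200)
```

With two bits of state, a single misprediction in a long run of taken branches (for example at a loop exit) does not flip the prediction.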

Delayed Branch
Concept
  The branch does not take effect until after execution of the following instruction, which reduces the branch penalty.

  conventional branch        delayed branch
  MOV EDX,ECX                ADD EAX,[EBX]
  ADD EAX,[EBX]              JZ LA
  JZ LA                      MOV EDX,ECX    (always executed)
  instruction                instruction
  LA: instruction...         LA: instruction...

Delayed Branch
Example: conventional branch, prediction wrong
[Figure: timing diagram for MOV EDX,ECX; ADD EAX,[EBX]; JZ LA; instructions 1 to 4, with the branch penalty marked.]
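The difference between the two orderings can be traced with a toy interpreter. It does not model timing or registers; it only lists which instructions are executed, assuming a single delay slot and a program in which labels are written as "NAME:" prefixes. The function `executed` is a made-up helper for this illustration.

```python
def executed(program, taken, delayed):
    """Return the instructions executed, in order, for one pass over the program."""
    trace = []
    pc = 0
    while pc < len(program):
        text = program[pc]
        trace.append(text)
        if text.startswith("JZ ") and taken:
            if delayed and pc + 1 < len(program):
                trace.append(program[pc + 1])  # delay slot: executed even though the branch is taken
            label = text.split()[1] + ":"
            pc = next(i for i, t in enumerate(program) if t.startswith(label))
            continue
        pc += 1
    return trace


conventional = ["MOV EDX,ECX", "ADD EAX,[EBX]", "JZ LA", "instruction", "LA: instruction"]
delayed_form = ["ADD EAX,[EBX]", "JZ LA", "MOV EDX,ECX", "instruction", "LA: instruction"]

if __name__ == "__main__":
    print(executed(conventional, taken=True, delayed=False))
    print(executed(delayed_form, taken=True, delayed=True))
```

Both versions execute the same set of instructions when the branch is taken; in the delayed form MOV EDX,ECX fills the slot after the branch instead of a wasted cycle.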

Delayed Branch
Example: delayed branch, prediction wrong
[Figure: timing diagram for ADD EAX,[EBX]; JZ LA; MOV EDX,ECX; instructions 1 to 4, with the branch penalty marked.]

Exercises
Exercise 1: Assume the six-stage pipeline shown on slide 7 (Instruction Pipelining - Example). Complete the graphs below, which represent the pipeline operation, assuming a single-port memory.
Hint: take into consideration the memory accesses for instruction fetch, operand fetch, and result write.

  PROGRAM                    PROGRAM
  ADD EAX,[EBX+ESI]          ADD [EBX+ESI],EAX
  INC EBX                    INC EBX
  DEC [ESI*2+EBP]            DEC [ESI*2+EBP]
  MOV CX,[4768]              MOV CX,[4768]

Exercises
Exercise 2: Repeat the previous exercise assuming that there are separate instruction and data caches, so that accesses to instructions and operands may occur simultaneously.

  PROGRAM                    PROGRAM
  ADD EAX,[EBX+ESI]          ADD [EBX+ESI],EAX
  INC EBX                    INC EBX
  DEC [ESI*2+EBP]            DEC [ESI*2+EBP]
  MOV CX,[4768]              MOV CX,[4768]

Text Book References
  The topics are covered in Stallings, sections 12.3 and 12.4.

END
