Pipeline Architecture RISC

Size: px

Start display at page:

Download "Pipeline Architecture RISC"

Gladys Byrd
5 years ago
Views:

1 Pipeline Architecture RISC

2 Independent tasks with independent hardware serial No repetitions during the process pipelined Pipelined vs Serial Processing

3 Instruction Machine Cycle Every instruction must got through the same stages Instruction Fetch Instruction Decode MEM Execution Memory Access Write Back Instruction may not require all the stages, but must execute the all, with empty clock cycles if needed Register-type Instructions Store Instruction MEM Load Instruction MEM Branch & Jump Instructions

4 Hardware Resource Usage MEM MEM MEM time (clock cycles) Independent resources in all stages of execution writing IR, Incrementing PC, register decoding operation with ALU MEM reading or writing the memory writing final result to register file At any moment all resources are being used At any moment several instructions are processed

5 Pipelined Processing Performance instr_1 instr_2 instr_3 MEM MEM MEM time (clock cycles) end of instr_1 end of instr_2 end of instr_3 Every clock cycle one instruction is finished (effectively) Speed-up (compared to serial processing) is proportional to the number of stages in machine cycle (e.g. 5 staged = 5 times faster)

6 Structural Hazard (Resource Conflict) Competition for hardware resources between two instructions in a pipeline (e.g. access to the same memory: and MEM) instr_1 instr_2 MEM MEM MEM instr_3 instr_4 instr_1 instr_2 instr_3 instr_4 MEM MEM MEM

7 Structural Hazard Solution Hardware duplication e.g. independent program () and data (MEM) memory block (Harvard Architecture) Pipeline stall simplest and some cases necessary, but always brings performance drop instr_1 instr_2 instr_3 MEM instr_4 MEM MEM 1 cycle stall 2 cycles end of instr_3 end of instr_4

8 Data Hazard Attempt to use register content violating the register modification order Data hazards are very common stall is wrong solution ADD D1, D2, D3 D3 SUB D4, D3, D5 MUL D4, D3, D1 czas D3

9 Data Hazard Solution Forwarding using up-to-date results directly from earlier execution stages, without waiting for final register update ADD D1, D2, D3 D3 SUB D4, D3, D5 MUL D4, D3, D1

10 Code Optimization Data hazards can be effectively eliminated by code optimization (by optimizing compilers) Architecture dependent or independent optimization e.g. swapping two element of a table: X[k] i X[k+1] Register R3 contains addres of X[k] table element Before: lw lw sw sw R1,0(R3) R2,4(R3) R2,0(R3) R1,4(R3) After: * * * * R1 = R2 = X[k] = X[k+1]= X[k] X[k+1] R2 R1 lw lw sw sw R1,0(R3) R2,4(R3) R1,4(R3) R2,0(R3) * * * * R1 = R2 = X[k+1]= X[k] = X[k] X[k+1] R1 R2

11 Control Hazard Every branch (or jump) breaks instruction sequence Conditional branch (or jump) delay the moment of next instruction execution until the condition is evaluated Pipeline processor may allow some "uncertain" instructions to be fetched and processed and invalidated if necessary jump next instr_1 instr_2 "uncertain" instructions instr_next next 0 jump next instr_1 instr_2 instr_3 instr_4 instr_5 instr_next instr_... n MEM

12 Delayed Branch Instructions between the branch and the target, that entered the pipeline (in delay slot), will be executed unconditionally Instruction in delay branch slot must be chosen carefully (useful or at least harmless - NOP) jump next branch delay slot instr_1 instr_2 instructions executed unconditionally MEM MEM MEM instr_next

13 Branch History Table BHT contains addresses of recently executed branches and their last target during stage the branch and the target can be identified just in time Conditional branch validates the entry in BHT the target instruction may be invalidated if necessary Unsuccessful predictions are limited to starts and ends of program loops BHT jump next instr_next MEM

14 Pipelined Processing Summary Idea: 1 instruction / clock cycle (CPI) Instruction machine cycle is the same (5 stages long), but effective CPI is 1 Hazards (resource, data, control) lower the pipeline performance: CPI>1 Several instructions are always processed at a time All hardware resources are used at a time (almost) Software optimization is crucial to pipelined processing

15 Starting point: Single-Cycle Arch. Data flow can be clearly identified (but asynchronous) Independent stages and independent hardware

16 Pipelined Architecture Intermediate registers to synchronize the data flow

17 Pipelined Hardware Implications Hardware complexity resource duplication (adders, memory) to avoid structural hazards Introduction of "long" intermediate registers to synchronize data flow between stages of instruction execution Instructions can use any resources only one time, more complex tasks (with reuse of resources) are impossible Instruction list and functionality is reduced (RISC) simple addressing modes - Load/Store architecture Performance increase is achieved at the cost of having simpler instructions programs (machine code) must be longer Control unit implementation is simple a consequence of reduced instruction set

18 Load Word (LW) demonstration (1)

19 LW (2)

20 LW (3)

21 LW (4)

22 LW (5)

23 Register Update Correction (LW, R-type) Instruction must "carry" the register number to be modified through the whole pipeline

24 LW Resource Utilization Example Resource utilization in all 5 stages of execution

25 SW (3)

26 SW (4)

27 SW (5) Empty cycle

28 2 instructions Demonstration

29 LW / SUB (1)

30 LW / SUB (2)

31 LW / SUB (3)

32 LW / SUB (4)

33 LW / SUB (5)

34 LW / SUB (6)

35 Control Signals in Pipelined Arch.

36 Pipelined Architecture Control Control is simple no reuse of hardware resources All control signals must be developed in decode stage Control unit implementation is combinatorial all signals can be described with a truth table (similar to Single-Cycle) Control signals must be application must be distributed over several stages of instruction execution (similar to Multi-Cycle) Instruction must "carry" the control signals through the pipeline and use them selectively

37 Control Signals Distribution

38 Minimal Pipelined Architecture

39 Instructions in a Pipeline Example Important Assumption: no hazards (structural, data, control) lw sub and or add R10,20(R1) R2,R3,R11 R4,R5,R12 R6,R7,R13 R8,R9,R14

40 Pipeline (1)

41 Pipeline (2)

42 Pipeline (3)

43 Pipeline (4)

44 Pipeline (5)

45 Pipeline (6)

46 Pipeline (7)

47 Pipeline (8)

48 Pipeline (9)

1 Hazards COMP2611 Fall 2015 Pipelined Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor 1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add