INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Departamento de Engenharia Informática
Architectures for Embedded Computing
MEIC-A, MEIC-T, MERC
Lecture Slides, Version 3.0 - English
Lecture 06
Title: Pipelining - Structural and Data Hazards
Summary: Analysis of a program execution in a pipeline; hazards in the pipeline (structural hazards and data hazards); types of hazards; overcoming the stalls by software, by writing in opposite edges of the clock cycle and by data forwarding.
2010/2011
Nuno.Roma@ist.utl.pt

Architectures for Embedded Computing
Pipelining: Structural and Data Hazards
Prof. Nuno Roma
ACE 2010/11 - DEI-IST

Previous Class
In the previous class...
- Pipeline processing:
  - Analysis of the instruction execution;
  - Implementation;
  - Performance analysis.
- Hazards: structural, data and control.

[Figure: Road Map]

Summary
Today:
- Analysis of a program execution in a pipeline;
- Hazards in the pipeline:
  - Structural hazards;
  - Data hazards:
    - Types of hazards;
    - Overcoming the stalls: by software; by writing in opposite edges of the clock cycle; by data forwarding.

Bibliography: J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Sections 2.1, A.2 and A.3.

[Figure: Architecture of the MIPS Processor]

Hazards
- Structural hazards: the hardware does not support all combinations of instructions that are to be executed simultaneously.
- Data hazards: an instruction requires data that is still being processed by previous instructions in the pipeline.
- Control hazards: the sequence of instructions to be executed changes (e.g., branches and jumps).

The occurrence of a hazard requires stopping the execution of all pipeline stages before the one where the hazard occurred: a stall.

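Example of a Structural Hazard
F = Fetch, D = Decode, X = Execute, M = Memory access, W = Write-back, S = Stall.

Instruction  1  2  3  4  5  6  7  8  9  10
LD           F  D  X  M  W
i + 1           F  D  X  M  W
i + 2              F  D  X  M  W
i + 3                 S  F  D  X  M  W
i + 4                       F  D  X  M  W

LD accesses memory (M) in cycle 4, the same cycle in which instruction i + 3 would be fetched. With a single memory shared by instructions and data, i + 3 stalls for one cycle and every following instruction is delayed by the same amount.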
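The stall in this example can be reproduced with a small Python sketch. It assumes (this is not stated on the slide) that instruction fetch (F) and the data access of loads and stores (M) share a single memory port, that M happens three cycles after F, and that there are no data or control hazards; `schedule_fetches` and `print_timeline` are hypothetical names.

```python
# Sketch: in-order 5-stage pipeline (F, D, X, M, W) whose instruction fetch
# and data accesses share ONE memory port (assumption). Only this structural
# hazard is modelled: no data or control hazards.

def schedule_fetches(program):
    """program: list of (name, accesses_data_memory) in program order.
    Returns {name: cycle in which the instruction is fetched}."""
    port_busy = set()    # cycles in which the memory port is already reserved
    fetch_cycle = {}
    cycle = 1
    for name, uses_memory in program:
        # The fetch must wait while an older instruction uses the port in M.
        while cycle in port_busy:
            cycle += 1
        fetch_cycle[name] = cycle
        port_busy.add(cycle)              # F reads the instruction from memory
        if uses_memory:
            port_busy.add(cycle + 3)      # M comes three cycles after F
        cycle += 1
    return fetch_cycle


def print_timeline(fetch_cycle):
    """One row per instruction: '.' = not yet fetched, 'S' = stalled fetch."""
    previous = 0
    for name, f in fetch_cycle.items():
        row = ["."] * previous + ["S"] * (f - previous - 1) + list("FDXMW")
        print(f"{name:<5} " + " ".join(row))
        previous = f


program = [("LD", True), ("i+1", False), ("i+2", False),
           ("i+3", False), ("i+4", False)]
print_timeline(schedule_fetches(program))
# LD    F D X M W
# i+1   . F D X M W
# i+2   . . F D X M W
# i+3   . . . S F D X M W
# i+4   . . . . . F D X M W
```

Reserving the M cycle at the moment the load is fetched is enough here because stalls only ever move later fetches, never the M stage of an instruction already in flight.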

[Figure: Example of a Structural Hazard]

Example of a Pipeline Execution

for(i = MAX-1; i > 0; i--){
    sum = sum + A[i];
    A[i] = sum;
}

cycle:  LD     R8,0(R1)      ; R8 takes A[i]
        DADD   R9,R9,R8      ; R9 accumulates the sum
        SD     R9,0(R1)      ; store in A[i]
        DADDI  R1,R1,#-8     ; decrement the pointer
        BNE    R1,R0,cycle   ; keep cycling while i <> 0

Performance (CPI)? Speedup? Memory bus occupation rate?
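The answers to these questions depend on assumptions the slide leaves open. A minimal Python sketch, assuming no forwarding, a register bank written and read in opposite halves of the clock cycle (so a consumer can be decoded in the same cycle as the producer's write-back), all stalls taken in D, and control hazards ignored, computes the steady-state issue rate of the loop. The speedup uses the usual estimate pipeline depth / CPI (ideal CPI of 1, same clock period), and the data-memory occupation assumes separate instruction and data memories. `LOOP`, `RAW_GAP` and `issue_cycles` are hypothetical names.

```python
# Sketch: D-stage (issue) cycles of an in-order 5-stage MIPS pipeline.
# Assumptions: no forwarding; register bank written in the first half and read
# in the second half of a cycle; every stall happens in D; branches never stall.

LOOP = [  # (mnemonic, destination register or None, source registers)
    ("LD",    "R8", ["R1"]),
    ("DADD",  "R9", ["R9", "R8"]),
    ("SD",    None, ["R9", "R1"]),
    ("DADDI", "R1", ["R1"]),
    ("BNE",   None, ["R1", "R0"]),
]

# A producer in D at cycle d writes back at d + 3; with split-cycle access a
# consumer may already be in D during that same cycle.
RAW_GAP = 3


def issue_cycles(body, iterations):
    ready = {}      # register -> earliest D cycle in which it can be read
    d = -1          # D cycle of the previous instruction
    cycles = []
    for _ in range(iterations):
        for _op, dest, sources in body:
            # In order: one cycle after the previous instruction, or later if
            # a source register has not been written back yet (RAW stall).
            d = max([d + 1] + [ready.get(r, 0) for r in sources])
            cycles.append(d)
            if dest is not None:
                ready[dest] = d + RAW_GAP
    return cycles


cycles = issue_cycles(LOOP, iterations=3)
per_iteration = cycles[len(LOOP)] - cycles[0]   # 2nd LD minus 1st LD
cpi = per_iteration / len(LOOP)
speedup = 5 / cpi                      # pipeline depth / CPI
data_memory_use = 2 / per_iteration    # one LD and one SD per iteration
print(per_iteration, cpi, round(speedup, 2), round(data_memory_use, 2))
# prints: 11 2.2 2.27 0.18
```

Under the same assumptions, setting `RAW_GAP = 4` (read and write in the same clock phase) raises the cost to 14 cycles per iteration, while feeding the re-ordered body (LD, DADDI, DADD, SD, BNE) to `issue_cycles` gives 8.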

Data dependencies have to be detected and resolved in order to ensure the correct execution of the program. Options:
- Dependencies resolved by hardware;
- The pipeline structure is exposed to the code generators (compilers), which foresee and resolve the dependencies. PROBLEM: the executable code becomes dependent on the pipeline structure.

Types of Dependencies
Let instruction i precede instruction j.
Instruction j has a Data Dependency on instruction i if:
- instruction i produces a result that instruction j needs; or
- instruction j depends on instruction k and instruction k depends on instruction i.
Instruction j has a Name Dependency on instruction i if:
- instruction j writes to a register or memory position that instruction i reads: anti-dependency;
- both instructions write to the same register or to the same memory position: output dependency.

Types of Data Hazards
Let instruction i precede instruction j.
- RAW (read after write): j tries to read an operand before i writes it, so j loads the previous value. Caused by a true data dependency; the most common hazard in pipelines.
- WAW (write after write): j tries to write an operand before i writes it, so the value of i is stored instead of the value of j. Caused by an output dependency; it does not happen in the basic version of the pipeline.
- WAR (write after read): j tries to write an operand before i reads it, so i loads the updated value instead of the previous one. Caused by an anti-dependency.
- RAR (read after read)?
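The three hazard types follow directly from the registers each instruction writes and reads. A minimal Python sketch, assuming register operands only (memory positions and the transitive case through an intermediate instruction k are not modelled), classifies every ordered pair of instructions in the loop body; `dependencies` and the register-set encoding are hypothetical.

```python
# Sketch: classify the register dependencies between an earlier instruction i
# and a later instruction j. Memory operands and transitive dependencies are
# not modelled. R0 is left out because it is hard-wired to zero in MIPS.

def dependencies(i, j):
    """i, j: (registers written, registers read). Returns the hazard types."""
    i_writes, i_reads = i
    j_writes, j_reads = j
    kinds = []
    if i_writes & j_reads:
        kinds.append("RAW")   # true data dependency
    if i_reads & j_writes:
        kinds.append("WAR")   # anti-dependency (name dependency)
    if i_writes & j_writes:
        kinds.append("WAW")   # output dependency (name dependency)
    # RAR is never reported: reading a register does not change it.
    return kinds


LOOP = [
    ("LD R8,0(R1)",     {"R8"}, {"R1"}),
    ("DADD R9,R9,R8",   {"R9"}, {"R9", "R8"}),
    ("SD R9,0(R1)",     set(),  {"R9", "R1"}),
    ("DADDI R1,R1,#-8", {"R1"}, {"R1"}),
    ("BNE R1,R0,cycle", set(),  {"R1"}),
]

for a in range(len(LOOP)):
    for b in range(a + 1, len(LOOP)):
        kinds = dependencies(LOOP[a][1:], LOOP[b][1:])
        if kinds:
            print(LOOP[a][0], "->", LOOP[b][0], kinds)
# LD R8,0(R1) -> DADD R9,R9,R8 ['RAW']
# LD R8,0(R1) -> DADDI R1,R1,#-8 ['WAR']
# DADD R9,R9,R8 -> SD R9,0(R1) ['RAW']
# SD R9,0(R1) -> DADDI R1,R1,#-8 ['WAR']
# DADDI R1,R1,#-8 -> BNE R1,R0,cycle ['RAW']
```

Comparing DADD with the DADD of the next iteration (`dependencies(LOOP[1][1:], LOOP[1][1:])`) reports RAW, WAR and WAW at once on R9.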

Reduction of Data Hazards

Software Optimization
Original:
cycle:  LD     R8,0(R1)      ; R8 takes A[i]
        DADD   R9,R9,R8      ; R9 accumulates the sum
        SD     R9,0(R1)      ; store in A[i]
        DADDI  R1,R1,#-8     ; decrement the pointer
        BNE    R1,R0,cycle   ; keep cycling while i <> 0

What can we do to minimize the impact of data dependencies? Hint: re-ordering of certain instructions...

Software Optimization
Optimized (DADDI moves up, so SD uses offset 8 to compensate for the pointer that has already been decremented):
cycle:  LD     R8,0(R1)      ; R8 takes A[i]
        DADDI  R1,R1,#-8     ; decrement the pointer
        DADD   R9,R9,R8      ; R9 accumulates the sum
        SD     R9,8(R1)      ; store in A[i]
        BNE    R1,R0,cycle   ; keep cycling while i <> 0
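Re-ordering is only legal if the program still computes the same values, which is why the SD offset changes from 0 to 8 once DADDI moves above it. A small Python interpreter for just these five instructions checks this, assuming 8-byte words, array A starting at address 0 (implied by the loop ending when R1 equals R0) and made-up array contents; a naive re-ordering without the offset fix is included for contrast. `run`, `ORIGINAL`, `OPTIMIZED` and `NAIVE` are hypothetical names.

```python
# Sketch: a tiny interpreter for the five MIPS64 instructions of the loop,
# used to check that re-ordering preserves the program's result.
# Assumptions: 8-byte words, array A at address 0, hypothetical contents.

def run(program, regs, mem):
    regs, mem = dict(regs), dict(mem)   # work on copies
    pc = 0
    while pc < len(program):
        op, a, b, c = program[pc]
        pc += 1
        if op == "LD":       # LD a, b(c)
            regs[a] = mem[regs[c] + b]
        elif op == "SD":     # SD a, b(c)
            mem[regs[c] + b] = regs[a]
        elif op == "DADD":   # DADD a, b, c
            regs[a] = regs[b] + regs[c]
        elif op == "DADDI":  # DADDI a, b, #c
            regs[a] = regs[b] + c
        elif op == "BNE" and regs[a] != regs[b]:   # BNE a, b, target index c
            pc = c
    return regs, mem


ORIGINAL = [("LD", "R8", 0, "R1"), ("DADD", "R9", "R9", "R8"),
            ("SD", "R9", 0, "R1"), ("DADDI", "R1", "R1", -8),
            ("BNE", "R1", "R0", 0)]
OPTIMIZED = [("LD", "R8", 0, "R1"), ("DADDI", "R1", "R1", -8),
             ("DADD", "R9", "R9", "R8"), ("SD", "R9", 8, "R1"),
             ("BNE", "R1", "R0", 0)]
NAIVE = [("LD", "R8", 0, "R1"), ("DADDI", "R1", "R1", -8),   # offset not fixed
         ("DADD", "R9", "R9", "R8"), ("SD", "R9", 0, "R1"),
         ("BNE", "R1", "R0", 0)]

A = [10, 20, 30, 40, 50]                                     # MAX = 5
mem = {8 * k: v for k, v in enumerate(A)}
regs = {"R0": 0, "R1": 8 * (len(A) - 1), "R8": 0, "R9": 0}   # sum = 0

_, m1 = run(ORIGINAL, regs, mem)
_, m2 = run(OPTIMIZED, regs, mem)
_, m3 = run(NAIVE, regs, mem)
print([m1[8 * k] for k in range(len(A))])   # [10, 140, 120, 90, 50]
print(m1 == m2, m1 == m3)                   # True False
```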

Software Optimization
Hazard reduction by software optimization essentially consists of:
- Instruction re-ordering: reduces the stalls caused by data dependencies;
- Register renaming: reduction of name dependencies.

Read-Write in Opposite Clock Edges
At the same time? Is the value read by OR R8,R1,R9 the same one that is simultaneously written by DADD R1,R2,R3? How can we safely transfer a value from i to j (i writes the register that j reads) within a single clock period?

Read-Write in Opposite Clock Edges
Operation: instruction i writes a register that instruction j reads.
- In the same phase of the clock, the result of i is written to the register bank while j reads from it, so j cannot be guaranteed to get the new value and must stall: cycle n: Write-Back (i) and STALL (j); cycle n + 1: Instruction Decode (j).
- In opposite phases of the clock, the result of i is written to the register bank in one phase and j reads it in the other: cycle n: Write-Back (i) followed by Instruction Decode (j), with no stall.
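A minimal Python model of the register bank shows why only the opposite-phase organization lets DADD R1,R2,R3 (in Write-Back) and OR R8,R1,R9 (in Instruction Decode) share a clock cycle. It assumes, as a simplification rather than the slides' circuit, that a same-phase access returns the value from before the write; `RegisterBank` and `clock_cycle` are hypothetical names.

```python
# Sketch (assumed model, not the slides' circuit): one clock cycle of a
# register bank in which the WB stage of instruction i writes a register while
# the ID stage of instruction j reads registers.

class RegisterBank:
    def __init__(self, opposite_phases):
        self.values = {f"R{k}": 0 for k in range(32)}
        # True:  write in the first half, read in the second half of a cycle.
        # False: read and write in the same phase; modelled here as the read
        #        seeing the value from before the write.
        self.opposite_phases = opposite_phases

    def _write(self, reg, value):
        if reg != "R0":                     # R0 is hard-wired to zero
            self.values[reg] = value

    def clock_cycle(self, write=None, reads=()):
        """write: optional (register, value) from WB; reads: ID registers."""
        if write and self.opposite_phases:
            self._write(*write)             # first half: WB writes
        read_values = [self.values[r] for r in reads]   # ID reads
        if write and not self.opposite_phases:
            self._write(*write)             # same phase: the read got the old value
        return read_values


for mode in (True, False):
    bank = RegisterBank(opposite_phases=mode)
    # DADD R1,R2,R3 writes 7 into R1 while OR R8,R1,R9 reads R1 and R9.
    print(mode, bank.clock_cycle(write=("R1", 7), reads=("R1", "R9")))
# True [7, 0]
# False [0, 0]   -> stale R1: OR must stall one cycle and read again
```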

Data Forwarding
Data forwarding consists of bypassing the value required by a given instruction directly from an intermediate point of the pipeline, instead of waiting for it to be written to the register bank.

[Figure: Data Forwarding]

[Figure: Implementation of Data Forwarding]

Data Forwarding
A single assembly instruction may require the forwarding of more than one operand.
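The selection logic of a forwarding unit can be sketched in Python. The sketch borrows the pipeline-register names EX/MEM and MEM/WB from Hennessy and Patterson's MIPS pipeline (an assumption, since the lecture's datapath figure is not reproduced here), assumes load-use cases have already been stalled, and decides each ALU operand separately, so one instruction can receive both operands by forwarding. `forward_source` and the instruction sequence are hypothetical.

```python
# Sketch: forwarding-unit decision for ONE source operand of the instruction
# currently in X. ex_mem / mem_wb describe the instructions in the M and W
# stages as (writes_register, destination_register).

def forward_source(src, ex_mem, mem_wb):
    # EX/MEM is checked first: if both older instructions write src, the
    # more recent value (the one in EX/MEM) must win.
    for stage, (writes, dest) in (("EX/MEM", ex_mem), ("MEM/WB", mem_wb)):
        if writes and dest != "R0" and dest == src:
            return stage
    return "register bank"


# Hypothetical sequence:
# DADD R1,R2,R3   -> now in W (MEM/WB)
# DSUB R4,R5,R6   -> now in M (EX/MEM)
# AND  R7,R1,R4   -> now in X: needs R1 and R4, both forwarded
ex_mem, mem_wb = (True, "R4"), (True, "R1")
print([forward_source(r, ex_mem, mem_wb) for r in ("R1", "R4")])
# ['MEM/WB', 'EX/MEM']
```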

Data Forwarding
Data forwarding does not always prevent the introduction of stalls: a value loaded from memory only becomes available at the end of the M stage, so an instruction that uses it right after the load must still wait one cycle.
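The stall that remains with forwarding (a load followed immediately by a consumer of the loaded register) can be detected with a one-line check. A minimal Python sketch, assuming the classic 5-stage pipeline in which operands are consumed at the start of X and a loaded value exists only at the end of M; `load_use_stall` is a hypothetical name.

```python
# Sketch: the load-use check that must still insert a stall even with
# forwarding. Assumption: operands are needed at the start of X, and a loaded
# value only exists at the end of M.

def load_use_stall(in_x, sources_in_d):
    """in_x: (is_load, destination) of the instruction in X;
    sources_in_d: registers read by the instruction in D."""
    is_load, dest = in_x
    return is_load and dest != "R0" and dest in sources_in_d


# Original loop: DADD R9,R9,R8 is in D while LD R8,0(R1) is in X.
print(load_use_stall((True, "R8"), {"R9", "R8"}))    # True: one stall
# Re-ordered loop: DADDI R1,R1,#-8 is in X when DADD reaches D.
print(load_use_stall((False, "R1"), {"R9", "R8"}))   # False: no stall
```

In the re-ordered loop the load has already reached M when DADD is decoded, so its value can be forwarded in time; this is the effect the software optimization exploits.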

Data Hazards - Stall minimization (revision): by software; read-write in opposite clock edges; data forwarding.
Control Hazards - Stall minimization: anticipation of branch resolution; delayed branches; branch prediction (static and dynamic).