INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing


UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Departamento de Engenharia Informática
Architectures for Embedded Computing
MEIC-A, MEIC-T, MERC
Lecture Slides, Version 3.0 - English

Lecture 06
Title: Pipelining - Structural and Data Hazards
Summary: Analysis of a program execution in a pipeline; hazards in the pipeline (structural hazards and data hazards); types of hazards; overcoming the stalls by software, by writing on opposite edges of the clock cycle, and by data forwarding.
2010/2011
Nuno.Roma@ist.utl.pt

Architectures for Embedded Computing
Pipelining: Structural and Data Hazards
Prof. Nuno Roma, ACE 2010/11 - DEI-IST

Previous Class
In the previous class...
- Pipeline processing: analysis of the instruction execution; implementation; performance analysis.
- Hazards: structural, data, control.

[Figure: Road map]

Summary
Today:
- Analysis of a program execution in a pipeline;
- Hazards in the pipeline:
  - Structural hazards;
  - Data hazards: types of hazards; overcoming the stalls by software, by writing and reading on opposite edges of the clock cycle, and by data forwarding.
Bibliography: Computer Architecture: A Quantitative Approach, Sections 2.1, A.2 and A.3.

[Figure: Architecture of the MIPS processor]

Hazards
- Structural: the hardware does not support all the combinations of instructions that are to be executed simultaneously.
- Data: an instruction requires data that is still being processed by previous instructions in the pipeline.
- Control: the sequence of instructions to be executed changes (e.g., because of a branch).
The occurrence of a hazard interrupts the execution of all the pipeline stages before the one where the hazard occurred: a stall.
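The cost of these stalls is usually quantified with the textbook relation Speedup = pipeline depth / (1 + stall cycles per instruction), which assumes an ideal CPI of 1 and perfectly balanced stages. A minimal C sketch of that relation; the function names are illustrative and not part of the slides:

```c
#include <assert.h>

/* Effective CPI of a pipeline whose ideal CPI is 1, given the average
 * number of stall cycles per instruction caused by hazards. */
static double pipeline_cpi(double stalls_per_instr)
{
    assert(stalls_per_instr >= 0.0);
    return 1.0 + stalls_per_instr;
}

/* Speedup over the unpipelined machine, assuming perfectly balanced stages,
 * no pipeline-register overhead, and that one unpipelined instruction takes
 * as long as `depth` pipeline cycles. */
static double pipeline_speedup(int depth, double stalls_per_instr)
{
    assert(depth > 0);
    return (double)depth / pipeline_cpi(stalls_per_instr);
}
```

For example, if hazards add 0.25 stall cycles per instruction on average, the speedup of a five-stage pipeline drops from 5 to 5 / 1.25 = 4.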

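Example of a Structural Hazard

    Clock cycle   1  2  3  4  5  6  7  8  9  10
    LD            F  D  X  M  W
    i + 1            F  D  X  M  W
    i + 2               F  D  X  M  W
    i + 3                  S  F  D  X  M  W
    i + 4                        F  D  X  M  W

(F = instruction fetch, D = decode, X = execute, M = memory access, W = write-back, S = stall.)
In cycle 4, LD accesses memory in its M stage; with a single memory port, instruction i + 3 cannot be fetched in that same cycle and stalls for one cycle.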
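The timing in the table can be reproduced with a small model. This is a sketch under our own assumptions: a single memory port shared by fetch and data accesses, no other hazards, and each instruction flagged by hand as accessing data memory or not (the function name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fetch schedule for a 5-stage pipeline (F D X M W) with one memory port
 * shared by instruction fetch and data accesses; no other hazards are
 * modelled. fetch_cycle[k] receives the cycle in which instruction k is
 * fetched (cycle 1 is the first fetch); the return value is the cycle in
 * which the last instruction completes W. */
static int single_port_schedule(const bool uses_data_mem[], size_t n,
                                int fetch_cycle[])
{
    assert(n > 0);
    for (size_t k = 0; k < n; k++) {
        int t = (k == 0) ? 1 : fetch_cycle[k - 1] + 1;
        bool conflict = true;
        while (conflict) {
            conflict = false;
            for (size_t j = 0; j < k; j++) {
                /* Instruction j occupies the memory port in M,
                 * three cycles after its fetch. */
                if (uses_data_mem[j] && fetch_cycle[j] + 3 == t) {
                    conflict = true;
                    t++;            /* stall the fetch by one cycle */
                    break;
                }
            }
        }
        fetch_cycle[k] = t;
    }
    return fetch_cycle[n - 1] + 4;  /* W is four cycles after F */
}
```

With only LD flagged as a data-memory instruction, i + 3 is fetched in cycle 5 and the sequence ends in cycle 10, as in the table.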

[Figure: Example of a structural hazard]

Example of a Pipeline Execution

    for (i = MAX-1; i > 0; i--) {
        sum = sum + A[i];
        A[i] = sum;
    }

    cycle: LD    R8,0(R1)      ; R8 takes A[i]
           DADD  R9,R9,R8      ; R9 accumulates the sum
           SD    R9,0(R1)      ; store in A[i]
           DADDI R1,R1,#-8     ; decrement the pointer
           BNE   R1,R0,cycle   ; repeat the loop while i != 0

Performance (CPI)? Speedup? Memory bus occupation rate?
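The answers depend on how the pipeline handles data hazards. The following is a minimal sketch that counts RAW stalls for the loop body, under assumptions that are ours rather than the slide's: classic 5-stage pipeline, operands read in D and results written in W, no forwarding, register bank written in the first half of the cycle and read in the second half, and no structural or branch penalties. The names `simulate` and `loop_body` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 32
#define NO_REG (-1)

/* Register usage of one instruction: destination (NO_REG if none) and up
 * to two sources (NO_REG if unused). */
struct instr {
    int dst;
    int src[2];
};

/* The loop body from the slide, encoded by hand. */
static const struct instr loop_body[] = {
    {8, {1, NO_REG}},      /* LD    R8,0(R1)    */
    {9, {9, 8}},           /* DADD  R9,R9,R8    */
    {NO_REG, {9, 1}},      /* SD    R9,0(R1)    */
    {1, {1, NO_REG}},      /* DADDI R1,R1,#-8   */
    {NO_REG, {1, 0}},      /* BNE   R1,R0,cycle */
};

/* Returns the cycle in which the last instruction completes W when `body`
 * runs `iterations` times back to back; *stalls receives the total number
 * of stall cycles. Assumed model: no forwarding, D may share a cycle with
 * the producer's W (write in first half, read in second half). */
static int simulate(const struct instr body[], size_t n, int iterations,
                    int *stalls)
{
    int ready[NUM_REGS] = {0}; /* cycle of the latest pending W per register */
    int d_prev = 1;            /* makes the first instruction decode in cycle 2 */
    int total = 0;

    assert(n > 0 && iterations > 0);
    for (int it = 0; it < iterations; it++) {
        for (size_t k = 0; k < n; k++) {
            int d = d_prev + 1;           /* decode cycle if nothing stalls */
            for (int s = 0; s < 2; s++) {
                int r = body[k].src[s];
                if (r != NO_REG && ready[r] > d)
                    d = ready[r];         /* RAW: wait for the producer's W */
            }
            total += d - (d_prev + 1);
            if (body[k].dst > 0)          /* R0 is never written */
                ready[body[k].dst] = d + 3;
            d_prev = d;
        }
    }
    if (stalls != NULL)
        *stalls = total;
    return d_prev + 3;
}
```

Under these assumptions one iteration completes in 15 cycles with 6 stall cycles, and each further iteration adds 11 cycles, i.e. a steady-state CPI of 11 / 5 = 2.2. A different organization (for instance, with forwarding) gives different numbers.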

Data Hazards
Data dependencies have to be detected and resolved to ensure the correct execution of the program. Options:
- dependencies resolved by hardware;
- the pipeline structure is exposed to the code generators (compilers), which anticipate and resolve the dependencies. PROBLEM: the executable code becomes dependent on the pipeline structure.

Types of Dependencies
Consider instruction i before instruction j.
Instruction j has a data dependency on instruction i if:
- instruction i produces a result that instruction j needs; or
- instruction j depends on instruction k, and instruction k depends on instruction i.
Instruction j has a name dependency on instruction i if:
- instruction j writes to a register or memory position that instruction i reads: anti-dependency;
- both instructions write to the same register or memory position: output dependency.
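The two rules for data dependency (direct, and transitive through some k) can be checked mechanically. A minimal sketch, tracking register operands only and ignoring memory; the `struct instr` encoding and the function name are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_N 16
#define NO_REG (-1)

struct instr {
    int dst;     /* destination register, NO_REG if none */
    int src[2];  /* source registers, NO_REG if unused  */
};

/* dep[i][j] becomes true when instruction j (j > i) is data dependent on
 * instruction i, directly or through a chain i -> k -> ... -> j.
 * Only register dependencies are tracked; memory is ignored. */
static void data_dependences(const struct instr p[], size_t n,
                             bool dep[MAX_N][MAX_N])
{
    assert(n <= MAX_N);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dep[i][j] = false;

    /* Direct: j needs the value produced by the latest earlier writer. */
    for (size_t j = 0; j < n; j++)
        for (int s = 0; s < 2; s++) {
            int r = p[j].src[s];
            if (r == NO_REG)
                continue;
            for (size_t i = j; i-- > 0;)
                if (p[i].dst == r) {
                    dep[i][j] = true;
                    break;
                }
        }

    /* Transitive closure (Warshall's algorithm). */
    for (size_t k = 0; k < n; k++)
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                if (dep[i][k] && dep[k][j])
                    dep[i][j] = true;
}
```

For the loop body of the previous example, SD depends on LD transitively through DADD, while the pairs LD/DADDI and SD/DADDI share R1 only through anti-dependencies (DADDI writes R1 after they read it), so they are not reported as data dependencies.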

Types of Data Hazards
Consider instruction i before instruction j:
- RAW (read after write): j tries to read an operand before i writes it, and therefore gets the old value. Caused by a true data dependency; the most common hazard in pipelines.
- WAW (write after write): j tries to write an operand before i writes it, so the value left is the one written by i instead of the one written by j. Caused by an output dependency; does not happen in the basic version of the pipeline.
- WAR (write after read): j tries to write an operand before i reads it, so i gets the updated value instead of the previous one. Caused by an anti-dependency.
- RAR (read after read)?
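Whether each type can actually occur depends on when instructions read and write their registers. A sketch under stated assumptions: each instruction reads its sources in a fixed stage and writes its result in a fixed stage, j enters the pipeline `distance` cycles after i, the hardware takes no corrective action, and the register bank is written in the first half of a cycle and read in the second half. All names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define NO_REG (-1)

/* Bit flags for the hazard types listed above. */
enum { HZ_NONE = 0, HZ_RAW = 1, HZ_WAR = 2, HZ_WAW = 4 };

/* An instruction as seen by the pipeline: its registers plus the stage
 * (1 = F, 2 = D, ...) in which it reads its sources and writes its result. */
struct op {
    int dst;          /* NO_REG if nothing is written */
    int src[2];       /* NO_REG if unused */
    int read_stage;
    int write_stage;
};

static bool reads(const struct op *o, int r)
{
    return r != NO_REG && (o->src[0] == r || o->src[1] == r);
}

/* Hazards between instruction i and a later instruction j that enters the
 * pipeline `distance` cycles after i. Times are cycles counted from i's F
 * (cycle 1). Same-cycle double writes are counted as WAW. */
static int classify(const struct op *i, const struct op *j, int distance)
{
    assert(distance > 0);
    int h = HZ_NONE;
    int j_read = distance + j->read_stage;
    int j_write = distance + j->write_stage;

    if (reads(j, i->dst) && j_read < i->write_stage)
        h |= HZ_RAW;                /* j reads before i has written */
    if (reads(i, j->dst) && j_write <= i->read_stage)
        h |= HZ_WAR;                /* j writes before i has read */
    if (i->dst != NO_REG && i->dst == j->dst && j_write <= i->write_stage)
        h |= HZ_WAW;                /* j's write is not the last one */
    return h;
}
```

When every instruction reads in D (stage 2) and writes in W (stage 5), as in the basic pipeline, only RAW can be reported; WAW appears only when a later instruction writes in an earlier stage, e.g. a hypothetical multi-cycle operation writing in stage 8 followed by a short one writing in stage 5.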

Reduction of Data Hazards

Software Optimization
Original:

    cycle: LD    R8,0(R1)      ; R8 takes A[i]
           DADD  R9,R9,R8      ; R9 accumulates the sum
           SD    R9,0(R1)      ; store in A[i]
           DADDI R1,R1,#-8     ; decrement the pointer
           BNE   R1,R0,cycle   ; repeat the loop while i != 0

What can be done to minimize the stalls caused by these data dependencies? Hint: reorder some of the instructions.

Optimized (reordered):

    cycle: LD    R8,0(R1)      ; R8 takes A[i]
           DADDI R1,R1,#-8     ; decrement the pointer
           DADD  R9,R9,R8      ; R9 accumulates the sum
           SD    R9,8(R1)      ; store in A[i] (R1 was already decremented)
           BNE   R1,R0,cycle   ; repeat the loop while i != 0
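Moving DADDI above DADD and SD is only legal because the SD offset changes from 0 to 8. A sketch that checks this by mirroring each instruction with one C statement and running both versions on the same data. Assumptions that are ours: A starts at address 0 (so the BNE test on R1 matches i > 0), elements are 8 bytes, the sum starts at 0, and the array has a hypothetical size of 8:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS 8   /* hypothetical array size: A[0..7] */

/* Byte-addressed memory of 64-bit words; A is assumed to start at address 0. */
static int64_t load(const int64_t mem[], int64_t addr) { return mem[addr / 8]; }
static void store(int64_t mem[], int64_t addr, int64_t v) { mem[addr / 8] = v; }

/* Original order; r9 (the sum) is assumed to start at 0. */
static int64_t run_original(int64_t mem[], int64_t r1)
{
    int64_t r8, r9 = 0;
    do {
        r8 = load(mem, r1 + 0);     /* LD    R8,0(R1)    */
        r9 = r9 + r8;               /* DADD  R9,R9,R8    */
        store(mem, r1 + 0, r9);     /* SD    R9,0(R1)    */
        r1 = r1 - 8;                /* DADDI R1,R1,#-8   */
    } while (r1 != 0);              /* BNE   R1,R0,cycle */
    return r9;
}

/* Reordered version: DADDI moves up, so SD must use offset 8. */
static int64_t run_reordered(int64_t mem[], int64_t r1)
{
    int64_t r8, r9 = 0;
    do {
        r8 = load(mem, r1 + 0);     /* LD    R8,0(R1)    */
        r1 = r1 - 8;                /* DADDI R1,R1,#-8   */
        r9 = r9 + r8;               /* DADD  R9,R9,R8    */
        store(mem, r1 + 8, r9);     /* SD    R9,8(R1)    */
    } while (r1 != 0);              /* BNE   R1,R0,cycle */
    return r9;
}

/* Nonzero if both versions leave the same sum and the same memory contents. */
static int versions_agree(const int64_t init[MEM_WORDS], int64_t r1)
{
    int64_t a[MEM_WORDS], b[MEM_WORDS];

    assert(r1 > 0 && r1 % 8 == 0 && r1 / 8 < MEM_WORDS);
    memcpy(a, init, sizeof a);
    memcpy(b, init, sizeof b);
    int64_t sum_a = run_original(a, r1);
    int64_t sum_b = run_reordered(b, r1);
    return sum_a == sum_b && memcmp(a, b, sizeof a) == 0;
}
```

Keeping `SD R9,0(R1)` after the moved DADDI would store each partial sum one element too low, which this check would detect.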

Software Optimization
Hazard reduction by software optimization essentially consists of:
- instruction re-ordering, to reduce the stalls caused by data dependencies;
- register renaming, to reduce name dependencies.

Read-Write in Opposite Clock Edges
At the same time? Is the value read by OR R8,R1,R9 the same one that is simultaneously being written by DADD R1,R2,R3? How can the transfer i → j (j reads the register written by i) be safely implemented within a single clock period?

Read-Write in Opposite Clock Edges
Operation i → j (j reads the register written by i).
In the same phase of the clock (the result of i is written to the register bank on the same edge at which the operands of j are read), j cannot obtain the new value and must stall:
    cycle t:     Write-Back (i), STALL (j)
    cycle t + 1: Instruction Decode (j)
In opposite phases of the clock (i writes to the register bank in the first half of the cycle and j reads it in the second half), both happen in the same cycle:
    cycle t:     Write-Back (i), Instruction Decode (j)
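The gain can be counted per dependent pair. A sketch for the 5-stage pipeline without forwarding, assuming the producer writes in W (stage 5), the consumer reads in D (stage 2), and the consumer enters the pipeline `distance` cycles after the producer; the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Stall cycles for instruction j that reads a register written by
 * instruction i, when j enters the pipeline `distance` cycles after i
 * (distance 1 = back to back), in the 5-stage F D X M W pipeline without
 * forwarding. Counting i's F as cycle 1, i writes in W (cycle 5) and j would
 * read in D (cycle distance + 2). */
static int raw_stalls(int distance, bool opposite_edges)
{
    assert(distance >= 1);
    /* Opposite edges: write in the first half, read in the second half, so
     * D may share cycle 5 with W. Same edge: D must wait for cycle 6. */
    int first_safe_read = opposite_edges ? 5 : 6;
    int stalls = first_safe_read - (distance + 2);
    return stalls > 0 ? stalls : 0;
}
```

With a distance of 3, the decode of j coincides with the write-back of i, which is the DADD R1,R2,R3 / OR R8,R1,R9 situation asked about above: reading and writing on opposite edges removes that stall, and back-to-back dependent instructions drop from 3 to 2 stall cycles.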

Data Forwarding
Data forwarding consists of bypassing the value required by an instruction directly from an intermediate point of the pipeline, instead of waiting for it to be written to the register bank.
[Figure: Data forwarding]

Implementation of Data Forwarding
[Figure: Implementation of data forwarding]

A single assembly instruction may require the forwarding of more than one operand:
[Figure]
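The selection made by the forwarding logic for each ALU operand can be sketched as follows. The conditions follow the usual Patterson & Hennessy textbook formulation (the younger EX/MEM result has priority over MEM/WB, and R0 is never forwarded); the type and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Fields of a pipeline register that the forwarding unit inspects. */
struct pipe_reg {
    bool reg_write;   /* the instruction in this stage writes a register */
    int rd;           /* destination register number */
};

enum fwd_src { FROM_REGISTER_BANK, FROM_EX_MEM, FROM_MEM_WB };

/* Chooses where the ALU input for source register `rs` comes from while the
 * consumer is in EX. The result in EX/MEM is younger, so it has priority
 * over MEM/WB; R0 is never forwarded because it is hardwired to zero. */
static enum fwd_src forward_select(int rs, struct pipe_reg ex_mem,
                                   struct pipe_reg mem_wb)
{
    assert(rs >= 0 && rs < 32);
    if (ex_mem.reg_write && ex_mem.rd != 0 && ex_mem.rd == rs)
        return FROM_EX_MEM;
    if (mem_wb.reg_write && mem_wb.rd != 0 && mem_wb.rd == rs)
        return FROM_MEM_WB;
    return FROM_REGISTER_BANK;
}
```

The function is evaluated once per operand. For a sequence DADD R1,... ; DADD R2,... ; DSUB R3,R1,R2, when DSUB is in EX its first operand comes from MEM/WB and its second from EX/MEM.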

Data forwarding does not always prevent the introduction of stalls:
[Figure]
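The typical case is a load followed immediately by an instruction that uses the loaded value. A sketch assuming full forwarding in the 5-stage pipeline and a consumer that needs the value in EX; the detection condition mirrors the usual textbook hazard-detection unit, and the names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Stall cycles left after full forwarding in the 5-stage pipeline, for a
 * consumer that uses the value in EX and enters the pipeline `distance`
 * cycles after its producer. An ALU result is ready at the end of EX and a
 * loaded value only at the end of M, so only a load followed immediately by
 * a use must wait one cycle. */
static int stalls_with_forwarding(bool producer_is_load, int distance)
{
    assert(distance >= 1);
    return (producer_is_load && distance == 1) ? 1 : 0;
}

/* Hazard-detection condition evaluated with the load in EX and the
 * consumer in ID: stall if the load's destination is one of the
 * consumer's source registers. */
static bool load_use_stall(bool id_ex_mem_read, int id_ex_dst,
                           int if_id_src1, int if_id_src2)
{
    return id_ex_mem_read &&
           (id_ex_dst == if_id_src1 || id_ex_dst == if_id_src2);
}
```

In the loop used earlier, LD R8,0(R1) is immediately followed by DADD R9,R9,R8, so that pair still costs one stall cycle even with forwarding.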

Data Hazards - stall minimization (revision): by software; read-write on opposite clock edges; data forwarding.
Control Hazards - stall minimization: anticipation of branch resolution; delayed branches; branch prediction (static, dynamic).