INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
- Eleanor Copeland
- 6 years ago
UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Departamento de Engenharia Informática

Architectures for Embedded Computing
MEIC-A, MEIC-T, MERC
Lecture Slides (English version)

Lecture 06
Title: Pipelining - Structural and Data Hazards
Summary: Analysis of a program execution in a pipeline; hazards in the pipeline (structural hazards and data hazards); types of hazards; overcoming the stalls by software, by writing in opposite edges of the clock cycle and by data forwarding.

2010/2011
Nuno.Roma@ist.utl.pt
Architectures for Embedded Computing
Pipelining: Structural and Data Hazards
Prof. Nuno Roma, ACE 2010/11 - DEI-IST

Previous Class
In the previous class...
- Pipeline processing:
  - Analysis of the instruction execution
  - Implementation
  - Performance analysis
- Hazards:
  - Structural
  - Data
  - Control
Road Map

Summary
Today:
- Analysis of a program execution in a pipeline;
- Hazards in the pipeline:
  - Structural hazards;
  - Data hazards:
    - Types of hazards;
    - Overcoming the stalls:
      - By software;
      - By writing in opposite edges of the clock cycle;
      - By data forwarding.
Bibliography:
- Computer Architecture: A Quantitative Approach, Sections 2.1, A.2 and A.3
Architecture of MIPS Processor
Hazards
- Structural hazards: the hardware does not support all combinations of instructions that are to be executed simultaneously.
- Data hazards: an instruction requires data that is still being processed by previous instructions in the pipeline.
- Control hazards: a change in the sequence of instructions that is to be executed.
The occurrence of a hazard interrupts the execution of all pipeline stages before the one where the hazard occurred: a stall.
Example of a Structural Hazard

Instruction   Stages in successive clock cycles (S = stall)
LD            F D X M W
i + 1         F D X M W
i + 2         F D X M W
i + 3         F S F D X M W
i + 4         F D X M W
Example of a Pipeline Execution

    for (i = MAX-1; i > 0; i--) {
        sum = sum + A[i];
        A[i] = sum;
    }

    cycle: LD    R8,0(R1)      ; R8 takes A[i]
           DADD  R9,R9,R8      ; R9 accumulates the sum
           SD    R9,0(R1)      ; store in A[i]
           DADDI R1,R1,#-8     ; decrement the pointer
           BNE   R1,R0,cycle   ; keep cycling if i<>0

Questions:
- Performance (CPI)?
- Speedup?
- Memory bus occupation rate?
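The three questions on this slide can be explored with a few lines of Python. This is only a sketch: the function names are hypothetical, the stall count is a placeholder and not a result from the lecture, and the formulas assume an ideal CPI of 1, balanced stages with no latch overhead, and a single memory bus on which each instruction fetch, load or store occupies exactly one cycle.

```python
def pipeline_cpi(instructions, stall_cycles):
    """CPI of a pipeline with an ideal CPI of 1: one cycle per instruction plus stalls."""
    return (instructions + stall_cycles) / instructions


def speedup(depth, cpi):
    """Speedup over an unpipelined machine, assuming balanced stages and no latch overhead."""
    return depth / cpi


def bus_occupation(fetches, data_accesses, cycles):
    """Fraction of cycles with a memory access, assuming one single-cycle access at a time."""
    return (fetches + data_accesses) / cycles


# Hypothetical figures for one loop iteration (NOT taken from the slides):
INSTRUCTIONS = 5      # LD, DADD, SD, DADDI, BNE
ASSUMED_STALLS = 3    # placeholder; the real value depends on the pipeline design
DATA_ACCESSES = 2     # one load and one store

cpi = pipeline_cpi(INSTRUCTIONS, ASSUMED_STALLS)
cycles = INSTRUCTIONS + ASSUMED_STALLS
print(f"CPI            = {cpi:.2f}")
print(f"Speedup        = {speedup(5, cpi):.2f}")
print(f"Bus occupation = {bus_occupation(INSTRUCTIONS, DATA_ACCESSES, cycles):.1%}")
```

Replacing `ASSUMED_STALLS` with the stall count obtained from a pipeline diagram of the loop gives the figures for a concrete design.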
Architecture of MIPS Processor

Data dependencies have to be detected and resolved to ensure the correct execution of the program.
Options:
- Dependencies are resolved by hardware;
- The pipeline structure is exposed to the code generators (compilers), which foresee and resolve the dependencies.
  PROBLEM: the executable code becomes dependent on the pipeline structure.
Types of Dependencies
With instruction i before instruction j:
- Instruction j has a data dependency on instruction i if:
  - instruction i produces a result that instruction j needs; or
  - instruction j depends on instruction k, and instruction k depends on instruction i.
- Instruction j has a name dependency on instruction i if:
  - instruction j writes to a register or memory position that instruction i reads (anti-dependency); or
  - both instructions write to the same register or the same memory position (output dependency).
Types of Hazards
With instruction i before instruction j:
- RAW (read after write): j tries to read before i writes, loading the previous value. True data dependency; the most common hazard in pipelines.
- WAW (write after write): j tries to write before i writes, leaving the value of i stored instead of that of j. Output dependency. Does not happen in the basic version of the pipeline.
- WAR (write after read): j tries to write before i reads, so i loads the updated value instead of the previous one. Result of an anti-dependency.
- RAR (read after read)?
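The classification above can be sketched as a small Python check on the register sets of two instructions. The `Instr` type and the `dependencies` function are hypothetical names introduced for illustration; the sketch looks at registers only and ignores dependencies through memory. Whether a reported dependency actually becomes a hazard depends on the pipeline, which this code does not model.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    writes: frozenset   # registers written by the instruction
    reads: frozenset    # registers read by the instruction


def dependencies(i, j):
    """Classify the register dependencies of j on an earlier instruction i."""
    found = set()
    if i.writes & j.reads:
        found.add("RAW")   # true data dependency: j reads what i writes
    if i.reads & j.writes:
        found.add("WAR")   # anti-dependency: j overwrites what i reads
    if i.writes & j.writes:
        found.add("WAW")   # output dependency: both write the same register
    return found


ld = Instr("LD R8,0(R1)", frozenset({"R8"}), frozenset({"R1"}))
dadd = Instr("DADD R9,R9,R8", frozenset({"R9"}), frozenset({"R9", "R8"}))
daddi = Instr("DADDI R1,R1,#-8", frozenset({"R1"}), frozenset({"R1"}))

print(dependencies(ld, dadd))    # {'RAW'}: DADD needs the R8 loaded by LD
print(dependencies(ld, daddi))   # {'WAR'}: DADDI overwrites the R1 read by LD
```

Two reads of the same register (RAR) never appear in the result, since reading does not change the value.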
Reduction of Hazards

Software Optimization
Original:

    cycle: LD    R8,0(R1)      ; R8 takes A[i]
           DADD  R9,R9,R8      ; R9 accumulates the sum
           SD    R9,0(R1)      ; store in A[i]
           DADDI R1,R1,#-8     ; decrement the pointer
           BNE   R1,R0,cycle   ; keep cycling if i<>0

What can we do to minimize data dependencies?
Hint: re-ordering of certain instructions...
Architecture of the MIPS Processor

Software Optimization
Original:

    cycle: LD    R8,0(R1)      ; R8 takes A[i]
           DADD  R9,R9,R8      ; R9 accumulates the sum
           SD    R9,0(R1)      ; store in A[i]
           DADDI R1,R1,#-8     ; decrement the pointer
           BNE   R1,R0,cycle   ; keep cycling if i<>0

Optimized:

    cycle: LD    R8,0(R1)      ; R8 takes A[i]
           DADDI R1,R1,#-8     ; decrement the pointer
           DADD  R9,R9,R8      ; R9 accumulates the sum
           SD    R9,8(R1)      ; store in A[i] (offset 8 compensates for the earlier decrement)
           BNE   R1,R0,cycle   ; keep cycling if i<>0
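A re-ordering is only valid if the program still computes the same result. The sketch below checks this for the two loop versions with a tiny interpreter. The `run` function, the tuple encoding of the instructions and the test data are all hypothetical; the interpreter covers only the five opcodes used here, models no pipeline timing, assumes the branch has no delay slot, and starts with `R9` (the sum) at zero.

```python
def run(program, memory, r1):
    """Interpret a tiny subset of MIPS64 (LD, SD, DADD, DADDI, BNE) on a dict-based memory."""
    regs = {"R0": 0, "R1": r1, "R8": 0, "R9": 0}
    mem = dict(memory)                      # work on a copy of the initial memory
    pc = 0
    while pc < len(program):
        op, a, b, c = program[pc]
        pc += 1
        if op == "LD":                      # a <- mem[b + c], b is the offset, c the base
            regs[a] = mem[b + regs[c]]
        elif op == "SD":                    # mem[b + c] <- a
            mem[b + regs[c]] = regs[a]
        elif op == "DADD":                  # a <- b + c
            regs[a] = regs[b] + regs[c]
        elif op == "DADDI":                 # a <- b + immediate c
            regs[a] = regs[b] + c
        elif op == "BNE":                   # jump to instruction index c if a != b
            if regs[a] != regs[b]:
                pc = c
    return mem


ORIGINAL = [
    ("LD", "R8", 0, "R1"),
    ("DADD", "R9", "R9", "R8"),
    ("SD", "R9", 0, "R1"),
    ("DADDI", "R1", "R1", -8),
    ("BNE", "R1", "R0", 0),
]
OPTIMIZED = [
    ("LD", "R8", 0, "R1"),
    ("DADDI", "R1", "R1", -8),
    ("DADD", "R9", "R9", "R8"),
    ("SD", "R9", 8, "R1"),                  # offset 8 undoes the earlier decrement
    ("BNE", "R1", "R0", 0),
]

memory = {8 * k: k + 1 for k in range(4)}   # A[0..3] = 1, 2, 3, 4 at addresses 0, 8, 16, 24
start = 24                                  # R1 points at A[MAX-1]
assert run(ORIGINAL, memory, start) == run(OPTIMIZED, memory, start)
print(run(OPTIMIZED, memory, start))        # {0: 1, 8: 9, 16: 7, 24: 4}
```

The optimized version places an independent instruction (`DADDI`) between the load and the `DADD` that consumes the loaded value, which is what gives the pipeline time to produce `R8`.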
Software Optimization
Hazard reduction by software optimization essentially consists of:
- Instruction re-ordering: reduction of data dependencies;
- Register renaming: reduction of name dependencies.

Read-Write in Opposite Clock Edges
At the same time?
- Is the value read by OR R8,R1,R9 the same one that is simultaneously being written by DADD R1,R2,R3?
- How can we safely implement the operation j ← i in a single clock period?
Read-Write in Opposite Clock Edges
Operation: j ← i
- In the same phase of the clock:
  - i is written to the register bank;
  - j is read from the register bank;
  - sequence: Write-Back (i), STALL (j), Instruction Decode (j).
- In opposite phases of the clock:
  - i is written to the register bank;
  - j is read from the register bank;
  - sequence: Write-Back (i), Instruction Decode (j), with no stall.
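A behavioural sketch of such a register bank in Python follows. The `RegisterFile` class is hypothetical, and it assumes that the write happens in the first half of the cycle and the read in the second half; the slides only state that the two use opposite phases.

```python
class RegisterFile:
    """Register bank that writes in the first half of a cycle and reads in the second."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def cycle(self, write=None, reads=()):
        """Run one clock cycle: `write` is (index, value) or None, `reads` are register indices."""
        if write is not None:                    # first half: Write-Back stage
            index, value = write
            if index != 0:                       # R0 is hard-wired to zero in MIPS
                self.regs[index] = value
        return [self.regs[r] for r in reads]     # second half: Instruction Decode stage


rf = RegisterFile()
rf.regs[2], rf.regs[3], rf.regs[9] = 5, 7, 1

# DADD R1,R2,R3 writes back in the same cycle in which OR R8,R1,R9 decodes.
r1, r9 = rf.cycle(write=(1, rf.regs[2] + rf.regs[3]), reads=(1, 9))
print(r1, r9)   # 12 1: the decode stage already sees the new R1
```

Because the read returns the value written earlier in the same cycle, the decoding instruction does not need to wait an extra cycle for the write-back to complete.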
Data Forwarding
Data forwarding consists of bypassing the value required by a given instruction directly from an intermediate point of the pipeline.
Implementation of Data Forwarding

Data Forwarding
One assembly instruction may require the forwarding of more than one operand.
Data Forwarding
Data forwarding does not always prevent the introduction of stalls.
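The load-use case is the usual example of a stall that forwarding cannot remove. The sketch below counts stalls for one pass through the loop body under a simplified timing model; `count_stalls` and the tuple encoding are hypothetical, and the numbers are properties of this model, not results from the lecture. The model assumes a five-stage F D X M W pipeline with in-order issue; without forwarding, operands are read in D from a register bank that is written in the first half of the cycle; with forwarding, every operand (including store data and branch operands) is needed at the start of X, an ALU result is available after X and a loaded value after M. Structural and control hazards are ignored.

```python
def count_stalls(program, forwarding):
    """Count data-hazard stall cycles for a list of (op, dest, sources) tuples."""
    ready = {}                       # register -> first cycle in which a consumer may use it
    need = 2 if forwarding else 1    # stage offset from fetch where operands are needed (X or D)
    issue = 0                        # cycle in which the next instruction would be fetched
    stalls = 0
    for op, dest, sources in program:
        earliest = issue
        for src in sources:
            if src in ready:
                earliest = max(earliest, ready[src] - need)
        stalls += earliest - issue   # cycles lost waiting for operands
        issue = earliest
        if dest is not None:
            if forwarding:
                # Forwarded to X: the cycle after M for a load, after X for an ALU result.
                ready[dest] = issue + (4 if op == "LD" else 3)
            else:
                # Readable in D during the producer's W cycle (split-phase register bank).
                ready[dest] = issue + 4
        issue += 1
    return stalls


ORIGINAL = [
    ("LD", "R8", ["R1"]),
    ("DADD", "R9", ["R9", "R8"]),
    ("SD", None, ["R9", "R1"]),
    ("DADDI", "R1", ["R1"]),
    ("BNE", None, ["R1", "R0"]),
]
OPTIMIZED = [
    ("LD", "R8", ["R1"]),
    ("DADDI", "R1", ["R1"]),
    ("DADD", "R9", ["R9", "R8"]),
    ("SD", None, ["R9", "R1"]),
    ("BNE", None, ["R1", "R0"]),
]

for name, body in (("original", ORIGINAL), ("optimized", OPTIMIZED)):
    print(name, count_stalls(body, forwarding=False), count_stalls(body, forwarding=True))
```

In this model the original loop still stalls once with forwarding enabled, because `DADD` immediately follows the `LD` that produces `R8`; the re-ordered loop removes that remaining stall.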
- Data hazards - stall minimization (revision):
  - By software;
  - Read-write in opposite clock edges;
  - Data forwarding.
- Control hazards - stall minimization:
  - Anticipation of branch resolution;
  - Delayed branches;
  - Branch prediction: static, dynamic.
More informationCA226 Advanced Computer Architecture
Stephen Blott Today: data hazards Table of Contents 1 2 MIPS Pipeline Recall: the MIPS pipeline implements instruction level parallelism ideally, up to five instructions are executed
More informationLecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections )
Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 4.4) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures
More informationChapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)
Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,
More informationLecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )
Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target
More informationLecture 4: Introduction to Advanced Pipelining
Lecture 4: Introduction to Advanced Pipelining Prepared by: Professor David A. Patterson Computer Science 252, Fall 1996 Edited and presented by : Prof. Kurt Keutzer Computer Science 252, Spring 2000 KK
More informationLecture: Pipeline Wrap-Up and Static ILP
Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) 1 Multicycle
More informationInstruction-Level Parallelism and Its Exploitation
Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic
More informationCMSC 611: Advanced Computer Architecture
CMSC 611: Advanced Computer Architecture Instruction Level Parallelism Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson /
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationSpring 2014 Midterm Exam Review
mr 1 When / Where Spring 2014 Midterm Exam Review mr 1 Monday, 31 March 2014, 9:30-10:40 CDT 1112 P. Taylor Hall (Here) Conditions Closed Book, Closed Notes Bring one sheet of notes (both sides), 216 mm
More informationCMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1
CMCS 611-101 Advanced Computer Architecture Lecture 9 Pipeline Implementation Challenges October 5, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Computer Architecture
More informationChapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST
Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism
More informationEITF20: Computer Architecture Part3.2.1: Pipeline - 3
EITF20: Computer Architecture Part3.2.1: Pipeline - 3 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Dynamic scheduling - Tomasulo Superscalar, VLIW Speculation ILP limitations What we have done
More informationPipelining: Issue instructions in every cycle (CPI 1) Compiler scheduling (static scheduling) reduces impact of dependences
Dynamic Scheduling Pipelining: Issue instructions in every cycle (CPI 1) Compiler scheduling (static scheduling) reduces impact of dependences Increased compiler complexity, especially when attempting
More informationInstr. execution impl. view
Pipelining Sangyeun Cho Computer Science Department Instr. execution impl. view Single (long) cycle implementation Multi-cycle implementation Pipelined implementation Processing an instruction Fetch instruction
More information