INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
|
|
- Esmond Kelley
- 5 years ago
- Views:
Transcription
1 UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version English Lecture 07 Title: Pipelining - Data and Summary: Minimization of data hazards (revision) by software, read-write in opposite clock edges, data forwarding; Minimization of control hazards (anticipation of branch resolution, delayed branches, static and dynamic branch prediction). 2010/2011 Nuno.Roma@ist.utl.pt
2 Architectures for Embedded Computing Pipelining: Data and Prof. Nuno Roma ACE 2010/11 - DEI-IST 1 / 37 Previous Class In the previous class... Analysis of a program execution in a pipeline; Hazards in the pipeline: Structural hazards; Data hazards: Types of hazards; Overcoming the Stalls: By software; By writing in opposite edges of the clock cycle; By data forwarding. Prof. Nuno Roma ACE 2010/11 - DEI-IST 2 / 37
3 Road Map Prof. Nuno Roma ACE 2010/11 - DEI-IST 3 / 37 Summary Today: - Stall minimization (revision): By software, Read-write in opposite clock edges, Data forwarding. - Stall minimization: Anticipation of branch resolution, Delayed branches, Branch prediction: Static, Dynamic. Bibliography: Computer Architecture: a Quantitative Approach, Sections A.2 to A.3; 2.1 to 2.3 Prof. Nuno Roma ACE 2010/11 - DEI-IST 4 / 37
4 Prof. Nuno Roma ACE 2010/11 - DEI-IST 5 / 37 i instruction before j: RAW (read after write) j tries to read before i writes, loading the previous value. True data dependency most common hazard in pipelines. Prof. Nuno Roma ACE 2010/11 - DEI-IST 6 / 37
5 i instruction before j: RAW (read after write) j tries to read before i writes, loading the previous value. True data dependency most common hazard in pipelines. WAW (write after write) j tries to write before i writes, storing the i value instead of j Output dependency. Does not happen in the basic version of the pipeline. Prof. Nuno Roma ACE 2010/11 - DEI-IST 6 / 37 i instruction before j: RAW (read after write) j tries to read before i writes, loading the previous value. True data dependency most common hazard in pipelines. WAW (write after write) j tries to write before i writes, storing the i value instead of j Output dependency. Does not happen in the basic version of the pipeline. WAR (write after read) j tries to write before i reads, loading the updated value instead of the previous result of an anti-dependency. Prof. Nuno Roma ACE 2010/11 - DEI-IST 6 / 37
6 i instruction before j: RAW (read after write) j tries to read before i writes, loading the previous value. True data dependency most common hazard in pipelines. WAW (write after write) j tries to write before i writes, storing the i value instead of j Output dependency. Does not happen in the basic version of the pipeline. WAR (write after read) j tries to write before i reads, loading the updated value instead of the previous result of an anti-dependency. RAR (read after read)? Prof. Nuno Roma ACE 2010/11 - DEI-IST 6 / 37 Data Forwarding Data-Forwarding: Prof. Nuno Roma ACE 2010/11 - DEI-IST 7 / 37
7 Pipeline Structure Prof. Nuno Roma ACE 2010/11 - DEI-IST 8 / 37 Implementation of Data Forwarding Prof. Nuno Roma ACE 2010/11 - DEI-IST 9 / 37
8 Data Forwarding Data forwarding does not always prevent the introduction of Stalls: Prof. Nuno Roma ACE 2010/11 - DEI-IST 10 / 37 Data Forwarding: Example Original: LD R8,0(R1) F D X M W DADD R9,R9,R8 F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F Prof. Nuno Roma ACE 2010/11 - DEI-IST 11 / 37
9 Data Forwarding: Example Original: LD R8,0(R1) F D X M W DADD R9,R9,R8 F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F With data forwarding: LD R8,0(R1) F D X M W DADD R9,R9,R8 F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F Prof. Nuno Roma ACE 2010/11 - DEI-IST 12 / 37 Prof. Nuno Roma ACE 2010/11 - DEI-IST 13 / 37
10 Happens as a consequence of an interruption in the sequentiality of the instruction execution processes, as implied by the pipeline structure LD R8,0(R1) F D X M W DADD R9,R9,R8 F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1)???? Prof. Nuno Roma ACE 2010/11 - DEI-IST 14 / 37 : pipeline structure Prof. Nuno Roma ACE 2010/11 - DEI-IST 15 / 37
11 Happens as a consequence of an interruption in the sequentiality of the instruction execution processes, as implied by the pipeline structure LD R8,0(R1) F D X M W DADD R9,R9,R8 F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1)???? Prof. Nuno Roma ACE 2010/11 - DEI-IST 16 / 37 Happens as a consequence of an interruption in the sequentiality of the instruction execution processes, as implied by the pipeline structure LD R8,0(R1) F D X M W DADD R9,R9,R8 F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F S F S F S F D... Prof. Nuno Roma ACE 2010/11 - DEI-IST 16 / 37
12 : optimizations Optimizations: Anticipation of branch resolution Delayed branches Branch prediction Static Dynamic Prof. Nuno Roma ACE 2010/11 - DEI-IST 17 / 37 : optimizations Optimizations: Anticipation of branch resolution Delayed branches Branch prediction Static Dynamic Prof. Nuno Roma ACE 2010/11 - DEI-IST 17 / 37
13 : pipeline structure Prof. Nuno Roma ACE 2010/11 - DEI-IST 18 / 37 Anticipation of Branch Resolution The evaluation of the branch condition and of the target address is anticipated to the ID stage of the pipeline. Prof. Nuno Roma ACE 2010/11 - DEI-IST 19 / 37
14 Anticipation of Branch Resolution Without anticipation of branch resolution: LD R8,0(R1) F D X M W DADD R9,R9,R8 F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F S F S F S F Prof. Nuno Roma ACE 2010/11 - DEI-IST 20 / 37 Anticipation of Branch Resolution Without anticipation of branch resolution: LD R8,0(R1) F D X M W DADD R9,R9,R8 F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F S F S F S F With anticipation of branch resolution: LD R8,0(R1) F D X M W DADD R9,R9,R8 F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F S F D X Prof. Nuno Roma ACE 2010/11 - DEI-IST 21 / 37
15 : optimizations Optimizations: Anticipation of branch resolution Delayed branches Branch prediction Static Dynamic Prof. Nuno Roma ACE 2010/11 - DEI-IST 22 / 37 Delayed Branches Use the time required to resolve the branch to execute useful instructions: These instructions execute completely, either the branch is taken or not. How? Select one instruction prior to the branch to place in the delay-slot, provided that the sequentiality of the data processing is kept. Example: Prof. Nuno Roma ACE 2010/11 - DEI-IST 23 / 37
16 Delayed Branches Without using the delay-slot: LD R8,0(R1) F D X M W DADD R9,R9,R8 F S F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F S F D X Using the delay-slot: LD R8,0(R1) F D X M W DADD R9,R9,R8 F S F D X M W DADDUI R1,R1,#-8 F D X M W SD R9,8(R1) F D X M W LD R8,0(R1) F D X M Prof. Nuno Roma ACE 2010/11 - DEI-IST 24 / 37 Delayed Branches Compiler: Reorder the instructions in order to take maximum advantage of the delay-slot, respecting all the program data dependencies. Prof. Nuno Roma ACE 2010/11 - DEI-IST 25 / 37
17 : optimizations Optimizations: Anticipation of branch resolution Delayed branches Branch prediction Static Dynamic Prof. Nuno Roma ACE 2010/11 - DEI-IST 26 / 37 Branch Prediction: Static Implemented by the compiler; Upon each branch instruction, instead of interrupting the pipeline until the new value of the PC register is updated: 1. Assume one decision beforehand: take or not take the branch; 2. Continue executing the instructions according to that premonition; 3. At the branch resolution instant, check if the assumed premonition is correct or not: Yes: continue the normal execution; No: eliminate the instructions that are being executed in the pipeline and re-start executing at the new value of the PC register. Prof. Nuno Roma ACE 2010/11 - DEI-IST 27 / 37
18 Branch Prediction: Static EXAMPLE: Predict Taken Pipeline WITHOUT anticipation of the branch resolution Prof. Nuno Roma ACE 2010/11 - DEI-IST 28 / 37 Branch Prediction: Static Predict Taken Case 1: Branch condition is TRUE Correct prediction!!! DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F D X M W DADD R9,R9,R8 F S F D X M W SD R9,0(R1) F D X M W DADDUI R1,R1,#-8 F D X M W Case 2: Branch condition is FALSE Wrong prediction!!! DADDUI R1,R1,#-8 F D X M W LD R8,0(R1) F D DADD R9,R9,R8 F... F D X M W... F D X M W Prof. Nuno Roma ACE 2010/11 - DEI-IST 29 / 37
19 Branch Prediction: Static The compiler may use information collected from the execution of previous instructions of the same program + processed data (profile based prediction); Prof. Nuno Roma ACE 2010/11 - DEI-IST 30 / 37 Branch Prediction: Static The compiler may use information collected from the execution of previous instructions of the same program + processed data (profile based prediction); In practice... Predict Not Taken approach is easier to implement; The compiler may choose the type of conditional branch in order to adjust the branch prediction to the most frequent path; Usual rule: Forward branches Predict Not Taken Backward branches Predict Taken Prof. Nuno Roma ACE 2010/11 - DEI-IST 30 / 37
20 : optimizations Optimizations: Anticipation of branch resolution Delayed branches Branch prediction Static Dynamic Prof. Nuno Roma ACE 2010/11 - DEI-IST 31 / 37 Branch Prediction: Dynamic The objective is to adapt the branch prediction to the execution pattern that was observed in that program until that instant. The most simple implementation is to use an one-bit memory, addressed with the least significant bits of the address of the branch instruction: Branch-Prediction Buffer or Branch History Table Prof. Nuno Roma ACE 2010/11 - DEI-IST 32 / 37
21 Branch Prediction: Dynamic In each position, this memory accommodates: the address of the branch instruction (current PC); the address of the target instruction, in case of a taken branch; one bit that indicates either the branch instruction was recently taken or not taken. Prof. Nuno Roma ACE 2010/11 - DEI-IST 33 / 37 Branch Prediction: Dynamic In each position, this memory accommodates: the address of the branch instruction (current PC); the address of the target instruction, in case of a taken branch; one bit that indicates either the branch instruction was recently taken or not taken. Usually, these tables may have 4096 or even more entries. Prof. Nuno Roma ACE 2010/11 - DEI-IST 33 / 37
22 Branch Prediction: Dynamic Problem: in a program loop, this predictor fails twice... Prof. Nuno Roma ACE 2010/11 - DEI-IST 34 / 37 Branch Prediction: Dynamic Optimization - usage of an n-bits counter: Increment whenever the branch is taken; Decrement whenever the branch is not taken; Predict the branch as taken whenever the counter is greater than one half of its maximum range. Prof. Nuno Roma ACE 2010/11 - DEI-IST 35 / 37
23 Branch Prediction: Dynamic Optimization - usage of an n-bits counter: Increment whenever the branch is taken; Decrement whenever the branch is not taken; Predict the branch as taken whenever the counter is greater than one half of its maximum range. In practice, n = 2 works almost as well as any other greater value of n. Prof. Nuno Roma ACE 2010/11 - DEI-IST 35 / 37 Prof. Nuno Roma ACE 2010/11 - DEI-IST 36 / 37
24 Pipelining - implementation problems: Multi-cycle instructions; Super-pipelining; Interruptions; Code optimization to a pipeline execution. Prof. Nuno Roma ACE 2010/11 - DEI-IST 37 / 37
INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 06
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 05
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 14
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 09
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 04
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 17
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 03 Title: Processor
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 16
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 12
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 22 Title: and Extended
More informationChapter 06: Instruction Pipelining and Parallel Processing
Chapter 06: Instruction Pipelining and Parallel Processing Lesson 09: Superscalar Processors and Parallel Computer Systems Objective To understand parallel pipelines and multiple execution units Instruction
More informationThe Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.
The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions
More informationEE 4683/5683: COMPUTER ARCHITECTURE
EE 4683/5683: COMPUTER ARCHITECTURE Lecture 4A: Instruction Level Parallelism - Static Scheduling Avinash Kodi, kodi@ohio.edu Agenda 2 Dependences RAW, WAR, WAW Static Scheduling Loop-carried Dependence
More informationInstruction-Level Parallelism (ILP)
Instruction Level Parallelism Instruction-Level Parallelism (ILP): overlap the execution of instructions to improve performance 2 approaches to exploit ILP: 1. Rely on hardware to help discover and exploit
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationCS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes
CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.
More informationPage 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not
More informationInstruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction
Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches
More informationHardware-Based Speculation
Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation Introduction Pipelining become universal technique in 1985 Overlaps execution of
More informationPage # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,
More informationLecture 4: Advanced Pipelines. Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10) 1 Hazards Structural hazards: different instructions in different stages (or the same stage)
More informationMinimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline
Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationPipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &
More informationEEC 581 Computer Architecture. Lec 4 Instruction Level Parallelism
EEC 581 Computer Architecture Lec 4 Instruction Level Parallelism Chansu Yu Electrical and Computer Engineering Cleveland State University Acknowledgement Part of class notes are from David Patterson Electrical
More informationILP: Instruction Level Parallelism
ILP: Instruction Level Parallelism Tassadaq Hussain Riphah International University Barcelona Supercomputing Center Universitat Politècnica de Catalunya Introduction Introduction Pipelining become universal
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationLecture 19: Instruction Level Parallelism
Lecture 19: Instruction Level Parallelism Administrative: Homework #5 due Homework #6 handed out today Last Time: DRAM organization and implementation Today Static and Dynamic ILP Instruction windows Register
More informationInstruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4
PROBLEM 1: An application running on a 1GHz pipelined processor has the following instruction mix: Instruction Frequency CPI Load-store 55% 5 Arithmetic 30% 4 Branch 15% 4 a) Determine the overall CPI
More informationELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism
ELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University,
More informationInstruction Level Parallelism
Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches
More informationHardware-based Speculation
Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions
More informationLecture: Pipeline Wrap-Up and Static ILP
Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) 1 Multicycle
More informationWhat is Pipelining? RISC remainder (our assumptions)
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 21
More informationPage 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring
More informationT T T T T T N T T T T T T T T N T T T T T T T T T N T T T T T T T T T T T N.
A1: Architecture (25 points) Consider these four possible branch predictors: (A) Static backward taken, forward not taken (B) 1-bit saturating counter (C) 2-bit saturating counter (D) Global predictor
More informationAdvanced Computer Architecture CMSC 611 Homework 3. Due in class Oct 17 th, 2012
Advanced Computer Architecture CMSC 611 Homework 3 Due in class Oct 17 th, 2012 (Show your work to receive partial credit) 1) For the following code snippet list the data dependencies and rewrite the code
More informationIn embedded systems there is a trade off between performance and power consumption. Using ILP saves power and leads to DECREASING clock frequency.
Lesson 1 Course Notes Review of Computer Architecture Embedded Systems ideal: low power, low cost, high performance Overview of VLIW and ILP What is ILP? It can be seen in: Superscalar In Order Processors
More informationECE473 Computer Architecture and Organization. Pipeline: Control Hazard
Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction
More informationECE 505 Computer Architecture
ECE 505 Computer Architecture Pipelining 2 Berk Sunar and Thomas Eisenbarth Review 5 stages of RISC IF ID EX MEM WB Ideal speedup of pipelining = Pipeline depth (N) Practically Implementation problems
More informationPipeline issues. Pipeline hazard: RaW. Pipeline hazard: RaW. Calcolatori Elettronici e Sistemi Operativi. Hazards. Data hazard.
Calcolatori Elettronici e Sistemi Operativi Pipeline issues Hazards Pipeline issues Data hazard Control hazard Structural hazard Pipeline hazard: RaW Pipeline hazard: RaW 5 6 7 8 9 5 6 7 8 9 : add R,R,R
More informationCMSC411 Fall 2013 Midterm 2 Solutions
CMSC411 Fall 2013 Midterm 2 Solutions 1. (12 pts) Memory hierarchy a. (6 pts) Suppose we have a virtual memory of size 64 GB, or 2 36 bytes, where pages are 16 KB (2 14 bytes) each, and the machine has
More informationDYNAMIC SPECULATIVE EXECUTION
DYNAMIC SPECULATIVE EXECUTION Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson, Morgan Kaufmann,
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationCS252 Prerequisite Quiz. Solutions Fall 2007
CS252 Prerequisite Quiz Krste Asanovic Solutions Fall 2007 Problem 1 (29 points) The followings are two code segments written in MIPS64 assembly language: Segment A: Loop: LD r5, 0(r1) # r5 Mem[r1+0] LD
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Basic Compiler Techniques for Exposing ILP Advanced Branch Prediction Dynamic Scheduling Hardware-Based Speculation
More informationLecture 5: Pipelining Basics
Lecture 5: Pipelining Basics Biggest contributors to performance: clock speed, parallelism Today: basic pipelining implementation (Sections A.1-A.3) 1 The Assembly Line Unpipelined Start and finish a job
More informationAdministrivia. CMSC 411 Computer Systems Architecture Lecture 14 Instruction Level Parallelism (cont.) Control Dependencies
Administrivia CMSC 411 Computer Systems Architecture Lecture 14 Instruction Level Parallelism (cont.) HW #3, on memory hierarchy, due Tuesday Continue reading Chapter 3 of H&P Alan Sussman als@cs.umd.edu
More informationWhat is Pipelining? Time per instruction on unpipelined machine Number of pipe stages
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationThe basic structure of a MIPS floating-point unit
Tomasulo s scheme The algorithm based on the idea of reservation station The reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from
More informationStatic vs. Dynamic Scheduling
Static vs. Dynamic Scheduling Dynamic Scheduling Fast Requires complex hardware More power consumption May result in a slower clock Static Scheduling Done in S/W (compiler) Maybe not as fast Simpler processor
More informationRecall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring
More informationCOSC 6385 Computer Architecture. Instruction Level Parallelism
COSC 6385 Computer Architecture Instruction Level Parallelism Spring 2013 Instruction Level Parallelism Pipelining allows for overlapping the execution of instructions Limitations on the (pipelined) execution
More informationLecture 8: Branch Prediction, Dynamic ILP. Topics: static speculation and branch prediction (Sections )
Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections 2.3-2.6) 1 Correlating Predictors Basic branch prediction: maintain a 2-bit saturating counter for each
More informationBackground: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation
Instruction Scheduling Last week Register allocation Background: Pipelining Basics Idea Begin executing an instruction before completing the previous one Today Instruction scheduling The problem: Pipelined
More informationLecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Computer Architectures 521480S Dynamic Branch Prediction Performance = ƒ(accuracy, cost of misprediction) Branch History Table (BHT) is simplest
More informationHigh Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Dynamic Instruction Scheduling with Branch Prediction
More informationReduction of Data Hazards Stalls with Dynamic Scheduling So far we have dealt with data hazards in instruction pipelines by:
Reduction of Data Hazards Stalls with Dynamic Scheduling So far we have dealt with data hazards in instruction pipelines by: Result forwarding (register bypassing) to reduce or eliminate stalls needed
More informationPage 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002
More informationPipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction
More informationECE 571 Advanced Microprocessor-Based Design Lecture 4
ECE 571 Advanced Microprocessor-Based Design Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Homework #1 was due Announcements Homework #2 will be posted
More informationCPE300: Digital System Architecture and Design
CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Pipelining 11142011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review I/O Chapter 5 Overview Pipelining Pipelining
More informationLecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.SP96 1 Review: Evaluating Branch Alternatives Two part solution: Determine
More informationLimiting The Data Hazards by Combining The Forwarding with Delay Slots Operations to Improve Dynamic Branch Prediction in Superscalar Processor
Limiting The Data Hazards by Combining The Forwarding with Delay Slots Operations to Improve Dynamic Branch Prediction in Superscalar Processor S.A.Hadoud and A.M.Mosbah Azzaituna University, Tarhuna Libya
More informationILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)
Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case
More informationLecture Topics ECE 341. Lecture # 14. Unconditional Branches. Instruction Hazards. Pipelining
ECE 341 Lecture # 14 Instructor: Zeshan Chishti zeshan@ece.pdx.edu November 17, 2014 Portland State University Lecture Topics Pipelining Instruction Hazards Branch penalty Branch delay slot optimization
More informationWilliam Stallings Computer Organization and Architecture 8 th Edition. Chapter 14 Instruction Level Parallelism and Superscalar Processors
William Stallings Computer Organization and Architecture 8 th Edition Chapter 14 Instruction Level Parallelism and Superscalar Processors What is Superscalar? Common instructions (arithmetic, load/store,
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,
More informationAs the amount of ILP to exploit grows, control dependences rapidly become the limiting factor.
Hiroaki Kobayashi // As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor. Branches will arrive up to n times faster in an n-issue processor, and providing an instruction
More informationThe Processor: Improving the performance - Control Hazards
The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary
More informationGetting CPI under 1: Outline
CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics
More informationLecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ
Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ 1 An Out-of-Order Processor Implementation Reorder Buffer (ROB)
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationChapter 3 & Appendix C Part B: ILP and Its Exploitation
CS359: Computer Architecture Chapter 3 & Appendix C Part B: ILP and Its Exploitation Yanyan Shen Department of Computer Science and Engineering Shanghai Jiao Tong University 1 Outline 3.1 Concepts and
More informationNOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline
CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationDonn Morrison Department of Computer Science. TDT4255 ILP and speculation
TDT4255 Lecture 9: ILP and speculation Donn Morrison Department of Computer Science 2 Outline Textbook: Computer Architecture: A Quantitative Approach, 4th ed Section 2.6: Speculation Section 2.7: Multiple
More informationLecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections )
Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 4.4) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy
More informationReduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction
ISA Support Needed By CPU Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction So far we have dealt with control hazards in instruction pipelines by: 1 2 3 4 Assuming that the branch
More informationLecture 2: Pipelining Basics. Today: chapter 1 wrap-up, basic pipelining implementation (Sections A.1 - A.4)
Lecture 2: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections A.1 - A.4) 1 Defining Fault, Error, and Failure A fault produces a latent error; it becomes effective when
More informationLecture: Static ILP. Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)
Lecture: Static ILP Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2) 1 Static vs Dynamic Scheduling Arguments against dynamic scheduling: requires complex structures
More informationTi Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr
Ti5317000 Parallel Computing PIPELINING Michał Roziecki, Tomáš Cipr 2005-2006 Introduction to pipelining What is this What is pipelining? Pipelining is an implementation technique in which multiple instructions
More informationCPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts
CPE 110408443 Computer Architecture Appendix A: Pipelining: Basic and Intermediate Concepts Sa ed R. Abed [Computer Engineering Department, Hashemite University] Outline Basic concept of Pipelining The
More informationCOSC4201 Pipelining. Prof. Mokhtar Aboelaze York University
COSC4201 Pipelining Prof. Mokhtar Aboelaze York University 1 Instructions: Fetch Every instruction could be executed in 5 cycles, these 5 cycles are (MIPS like machine). Instruction fetch IR Mem[PC] NPC
More informationSuperscalar Processors
Superscalar Processors Superscalar Processor Multiple Independent Instruction Pipelines; each with multiple stages Instruction-Level Parallelism determine dependencies between nearby instructions o input
More informationPIPELINING: HAZARDS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah
PIPELINING: HAZARDS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 1 submission deadline: Jan. 30 th This
More informationHiroaki Kobayashi 12/21/2004
Hiroaki Kobayashi 12/21/2004 1 Loop Unrolling Static Branch Prediction Static Multiple Issue: The VLIW Approach Software Pipelining Global Code Scheduling Trace Scheduling Superblock Scheduling Conditional
More information( ) תשס"ח סמסטר ב' May, 2008 Hugo Guterman Web site:
ארכיטקטורת יחידת עיבוד מרכזית (36113741) תשס"ח סמסטר ב' May, 2008 Hugo Guterman (hugo@ee.bgu.ac.il) Web site: http://www.ee.bgu.ac.il/~cpuarch Arch. CPU L5 Pipeline II 1 Outline More pipelining Control
More informationLecture 8: Compiling for ILP and Branch Prediction. Advanced pipelining and instruction level parallelism
Lecture 8: Compiling for ILP and Branch Prediction Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ 1 Advanced pipelining and instruction level parallelism
More informationCS433 Homework 3 (Chapter 3)
CS433 Homework 3 (Chapter 3) Assigned on 10/3/2017 Due in class on 10/17/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies
More informationPage # Let the Compiler Do it Pros and Cons Pros. Exploiting ILP through Software Approaches. Cons. Perhaps a mixture of the two?
Exploiting ILP through Software Approaches Venkatesh Akella EEC 270 Winter 2005 Based on Slides from Prof. Al. Davis @ cs.utah.edu Let the Compiler Do it Pros and Cons Pros No window size limitation, the
More information