Chapter 06: Instruction Pipelining and Parallel Processing


Chapter 06: Instruction Pipelining and Parallel Processing
Lesson 09: Superscalar Processors and Parallel Computer Systems

Objective: To understand parallel pipelines and multiple execution units, and instruction-level parallelism in superscalar processors

Multiple Execution Units in a Superscalar Processor

Superscalar Processor: A superscalar processor uses multiple execution units to execute instructions. Each execution unit reads its operands from, and writes its results to, a single centralized register file.

Multiple Execution Units in a Superscalar Processor: When an operation writes its result back to the register file, that result becomes visible to all of the execution units on the next cycle, allowing operations to execute on different units from the operations that generate their inputs.

Instruction Issue Logic and Four Execution Units in a Superscalar Processor

Instruction-Level Parallelism (ILP) in Superscalar Processors: Superscalar processors have complex bypassing hardware that forwards the result of each instruction to all of the execution units, reducing the delay between dependent instructions. The instructions that make up a program are handled by the instruction issue logic, which issues instructions to the units in parallel.

Superscalar Processor Instruction Issue Logic: Allows control-flow changes, such as branches, to occur simultaneously across all of the units, making it much easier to write and compile programs for instruction-level parallel superscalar processors.

Superscalar Processor Hardware: Extracts instruction-level parallelism from sequential programs. During each cycle, the instruction issue logic of a superscalar processor examines the instructions in the sequential program to determine which of them may be issued on that cycle.
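The per-cycle scan that the issue logic performs can be sketched in a few lines of Python. This is a simplified model under assumed parameters (in-order issue, an issue width of 2, single-cycle execution, true RAW dependencies only, and an invented register-level program representation), not a description of any real processor's issue logic:

```python
# Minimal sketch of superscalar issue logic (illustrative assumptions:
# in-order issue, issue width 2, single-cycle execution; WAW/WAR
# dependencies are ignored in this sketch).
def issue_schedule(program, width=2):
    """program: list of (dest, [sources]); returns the issue cycle of
    each instruction, in program order."""
    done_cycle = {}        # register -> cycle its value becomes available
    schedule = []
    cycle, i = 0, 0
    while i < len(program):
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            # an operand is ready if no older write to it is still pending
            if all(done_cycle.get(s, 0) <= cycle for s in srcs):
                schedule.append(cycle)
                done_cycle[dest] = cycle + 1   # result visible next cycle
                i += 1
                issued += 1
            else:
                break   # in-order issue: stall at the first not-ready instr
        cycle += 1
    return schedule

prog = [("r1", []), ("r2", ["r1"]), ("r3", []), ("r4", ["r2"])]
print(issue_schedule(prog))   # [0, 1, 1, 2]
```

The dependent pair r1 → r2 serializes across cycles, while the independent write to r3 issues alongside r2 in the same cycle.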

Instruction-Level Parallelism

Instruction-Level Parallelism: If enough parallelism exists within a program, a superscalar processor can execute one instruction per execution unit per cycle, even if the program was originally compiled for a processor that could execute only one instruction per cycle.

Instruction-Level Parallelism: Instruction-level parallel processors exploit the fact that many of the instructions in a sequential program do not depend on the instructions that immediately precede them in the program.

Example of ILP

Example: Instructions I1, I3, and I5 are dependent on each other: I1 generates a data value that is used by I3, which in turn generates a result that is used by I5.

Example: I2 and I4 neither use the results of any other instructions in the fragment nor generate any results that are used by other instructions in the fragment.

Data Dependencies: These require that I1, I3, and I5 be executed in order to generate the correct result. I2 and I4 can be executed before, after, or in parallel with any of the other instructions without changing the results of the program fragment.

Latency: The processor could execute this program fragment in three cycles if each instruction had a latency of one cycle. Because I1, I3, and I5 are dependent, the execution time of the fragment cannot be reduced any further by increasing the number of instructions that the processor can execute simultaneously.

Techniques to Overcome ILP Processor Performance Limitations from Data-Dependency Stalls, Branch Penalties, and Other Factors

Limitations: RAW (read-after-write) dependencies limit performance by requiring that instructions be executed in sequence to generate the correct results; they represent a fundamental limitation on the amount of parallelism available. The processor must possess hardware to efficiently handle collisions in the execution units as well as data and control hazards.

Limitations: Instructions with WAW (write-after-write) and WAR (write-after-read) dependencies; branch-penalty limitations.

Limitations: The processor does not know which instructions will be executed after a branch until the branch has completed, so branches limit instruction-level parallelism and create the requirement for branch-prediction hardware.
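To illustrate why such hardware helps, here is a minimal sketch of the classic 2-bit saturating-counter predictor. It models a single counter; real predictors index a table of such counters by branch address, and the slides do not name a specific scheme, so this is an illustrative model only:

```python
class TwoBitPredictor:
    """One 2-bit saturating counter: states 0,1 predict not-taken,
    states 2,3 predict taken."""
    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2          # True means "predict taken"

    def update(self, taken):
        # saturate at 0 and 3 so one atypical outcome cannot flip
        # the prediction for a strongly biased branch
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True] * 6 + [False] + [True] * 3   # a mostly-taken loop branch
correct = 0
for taken in outcomes:
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct)   # 7 of the 10 outcomes predicted correctly
```

Note that the single not-taken outcome in the middle costs one misprediction but does not flip the counter out of the "taken" half, so the following iterations are still predicted correctly.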

Hardware Characteristics of a Superscalar: Reduces the impact of data dependencies (data hazards) and handles the resulting stalls efficiently; reduces the impact of control dependencies (control hazards) and branch delays efficiently.

Hardware Characteristics: Instructions are selected and issued in program order, but their results may be evaluated later, out of order.

Hardware: Reservation stations, virtual execution units, and register renaming enable dynamic scheduling (hardware out-of-order reordering) of instructions, increasing performance by mitigating the limiting factors above.
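Register renaming can be sketched as a simple table lookup. The function below is an illustrative model, with an invented instruction representation and register names (real hardware uses a rename table over a physical register file): each write to an architectural register is given a fresh physical register, which removes WAW and WAR dependencies and leaves only the true RAW dependencies:

```python
def rename(program):
    """program: list of (dest, [sources]) over architectural registers.
    Returns the same program rewritten with physical register names."""
    table = {}            # architectural register -> current physical register
    next_phys = 0
    renamed = []
    for dest, srcs in program:
        # read each source through the current mapping
        phys_srcs = [table.get(s, s) for s in srcs]
        # allocate a fresh physical register for every destination write
        table[dest] = f"p{next_phys}"
        next_phys += 1
        renamed.append((table[dest], phys_srcs))
    return renamed

# r1 is written twice (WAW) and read between the writes (WAR):
prog = [("r1", ["r2"]), ("r3", ["r1"]), ("r1", ["r4"])]
print(rename(prog))
# [('p0', ['r2']), ('p1', ['p0']), ('p2', ['r4'])]
```

After renaming, the two writes to r1 target different physical registers (p0 and p2), so the third instruction no longer conflicts with the first two and can be scheduled out of order.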

Summary

We learnt: an example of parallel pipelines; the improvement in performance; the limitations and the methods to overcome them; and scheduling instructions dynamically and predicting branches in hardware.

End of Lesson 09 on Superscalar Processors and Parallel Computer Systems