Case Study IBM PowerPC 620

- year shipped: 1995
- allows out-of-order execution (dynamic scheduling) with in-order commit (hardware speculation)
- uses a reorder buffer to track when an instruction can commit, but uses extra rename registers (8 integer and 12 FP) to hold uncommitted results
- when an instruction issues, it is allocated a rename register to hold its result when it completes; when it finally commits, the result is copied to the permanent register
- uses separate load and store buffers to hold the effective address (EFA); store buffers hold both the EFA and the data until store instructions are ready to commit

Chapter 4 page 83 CS 5515
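The allocate-at-issue, copy-at-commit flow described above can be sketched as a toy model. This is only an illustration, not the 620's actual logic: the class `RenameModel`, its method names, and the single pool of 8 rename registers are assumptions made for the example.

```python
from collections import deque

class RenameModel:
    """Toy model: results live in rename registers until in-order commit."""

    def __init__(self, num_rename=8):
        self.arch_regs = {}                          # permanent registers
        self.free_rename = deque(range(num_rename))  # unallocated rename registers
        self.rename_value = {}                       # rename register -> completed result
        self.rob = deque()                           # reorder buffer, oldest entry first

    def issue(self, dest):
        """Allocate a rename register and a reorder-buffer entry; None means stall."""
        if not self.free_rename:
            return None
        tag = self.free_rename.popleft()
        self.rob.append((dest, tag))
        return tag

    def complete(self, tag, value):
        """Execution finished (possibly out of order): park the result."""
        self.rename_value[tag] = value

    def commit(self):
        """Retire the oldest instruction only if its result is ready."""
        if not self.rob:
            return False
        dest, tag = self.rob[0]
        if tag not in self.rename_value:
            return False                             # younger instructions must wait
        self.rob.popleft()
        self.arch_regs[dest] = self.rename_value.pop(tag)
        self.free_rename.append(tag)                 # rename register is freed
        return True

m = RenameModel()
t1 = m.issue("r1")
t2 = m.issue("r2")
m.complete(t2, 20)        # the younger instruction finishes first
blocked = m.commit()      # False: the producer of r1 has not completed
m.complete(t1, 10)
m.commit()
m.commit()
print(m.arch_regs)        # {'r1': 10, 'r2': 20}
```

The younger result sits in its rename register until the older instruction commits, which is what keeps the permanent register file precise.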

Superscalar PowerPC 620
- 64-bit bus, PowerPC architecture
- can fetch, issue, and complete 4 instructions/cycle
- has six execution units with their own reservation stations (RSs); several RSs share one execution unit
- two simple integer units, XSU0 and XSU1: integer add/subtract/logical operations take one cycle
- one complex integer unit, MCFXU, for integer multiply and divide: latency of 3 to 20 cycles; pipelined for multiply and unpipelined for divide
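Because several reservation stations share one execution unit, something has to choose among the stations that are ready in the same cycle. The sketch below shows one possible selection rule. The oldest-first policy, the function `select_for_unit`, and the dictionary keys `seq` and `ready` are all assumptions for illustration; the slides do not state how the 620 arbitrates.

```python
def select_for_unit(stations):
    """Pick at most one ready reservation station for a shared execution unit.

    Each station is a dict with hypothetical keys:
      'seq'   - issue order (lower means issued earlier)
      'ready' - True once all source operands are available
    """
    ready = [rs for rs in stations if rs["ready"]]
    if not ready:
        return None                                # the unit idles this cycle
    return min(ready, key=lambda rs: rs["seq"])    # oldest-first (assumed policy)

stations = [
    {"name": "RS0", "seq": 2, "ready": True},
    {"name": "RS1", "seq": 0, "ready": False},     # oldest, but still waiting on an operand
    {"name": "RS2", "seq": 1, "ready": True},
]
winner = select_for_unit(stations)
print(winner["name"])   # RS2
```

RS0 is ready but loses the unit to RS2 in this cycle, which is the kind of contention stall the unit sharing can cause.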

PowerPC 620 Execution Units
- one load-store unit (LSU)
  - execution latency: 1 cycle for an integer load and 2 cycles for an FP load
  - fully pipelined, with its own EFA adder
  - has load/store buffers to hold the EFA and/or data; load results are written to rename registers, and store results are held in the store buffers until the instructions commit
  - detects memory aliases to allow loads to bypass pending stores
- one floating-point unit (FPU)
  - different latencies for use of results by another FP instruction: 2 cycles for multiply (fully pipelined); 31 cycles for divide
- one branch unit (BPU)
  - completes branches and informs the fetch unit and the reorder buffer unit of mispredictions
  - can evaluate branches independently of other instructions
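The alias check that lets a load bypass pending stores can be sketched as an address comparison against the store buffer. This is a simplified illustration: the function name `load_can_bypass` is hypothetical, addresses are compared for exact equality (access sizes and partial overlap are ignored), and treating a store with an unknown address as a possible alias is an assumed conservative policy.

```python
def load_can_bypass(load_addr, pending_store_addrs):
    """Return True if the load may execute ahead of all older pending stores.

    pending_store_addrs holds the effective address of each older store,
    or None when that store's address has not been computed yet.
    """
    for store_addr in pending_store_addrs:
        if store_addr is None:         # unknown address: assume it may alias
            return False
        if store_addr == load_addr:    # alias: the load must wait for this store
            return False
    return True

stores = [0x1000, 0x2008]
print(load_can_bypass(0x3000, stores))   # True
print(load_can_bypass(0x2008, stores))   # False
```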

Inst. Execution Steps in PowerPC 620
Fetch
- the PC can be obtained by looking up a 256-entry, two-way set-associative branch target buffer (BTB)
- if the instruction is a branch and the BTB misses, a 2048-entry branch prediction buffer (BPB) is used to predict the branch outcome
- a stack is also used to predict call return addresses
Decode: decode 4 instructions per cycle
Issue: issue the instructions to the RSs and also read operands from the register files if available
Execution
- execution unit contention occurs when more than one RS wants to use the execution unit in the same cycle
- the RS is freed when the instruction uses the execution unit
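The order in which the fetch stage consults its predictors can be sketched as follows. The sizes come from the slide (256-entry two-way BTB, hence 128 sets; 2048-entry BPB), but everything else is an assumption made for illustration: the class `FetchPredictor`, the 2-bit counters in the BPB, the index functions, the eviction rule, and the simplification that the branch type and target are passed in as arguments.

```python
BTB_SETS = 128        # 256 entries / 2 ways
BPB_ENTRIES = 2048

class FetchPredictor:
    def __init__(self):
        self.btb = [dict() for _ in range(BTB_SETS)]  # per set: {pc: target}, max 2 ways
        self.bpb = [1] * BPB_ENTRIES                  # assumed 2-bit counters, weakly not taken
        self.return_stack = []                        # predicted call return addresses

    def btb_insert(self, pc, target):
        ways = self.btb[(pc >> 2) % BTB_SETS]
        if pc not in ways and len(ways) == 2:
            ways.pop(next(iter(ways)))                # evict the older way (assumed policy)
        ways[pc] = target

    def predict(self, pc, is_branch=False, is_return=False, target=None):
        """Return the predicted next fetch address."""
        if is_return and self.return_stack:
            return self.return_stack.pop()            # return-address stack
        ways = self.btb[(pc >> 2) % BTB_SETS]
        if pc in ways:
            return ways[pc]                           # BTB hit
        if is_branch and self.bpb[(pc >> 2) % BPB_ENTRIES] >= 2:
            return target                             # BTB miss, BPB predicts taken
        return pc + 4                                 # fall through

p = FetchPredictor()
p.btb_insert(0x100, 0x400)
print(hex(p.predict(0x100)))                   # 0x400
print(hex(p.predict(0x104)))                   # 0x108
p.return_stack.append(0x208)
print(hex(p.predict(0x500, is_return=True)))   # 0x208
```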

Inst. Execution Steps in PowerPC 620 (continued)
Execution
- results are written to the rename register via the CDB and also forwarded to any other RSs that need the result
- the result is tagged with the name of the rename register, not the reorder buffer number
Commit
- commit the instruction when all previous instructions have been committed, i.e., no exception and no speculation
- rename registers are written to the permanent register file and freed
- for a store instruction, the LSU is notified so it can write the stored data to memory
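The result broadcast described above, where a value is tagged with its rename register and picked up by waiting reservation stations, can be sketched like this. The class `ReservationStation`, the function `broadcast`, and the tuple encoding of operands are hypothetical names chosen for the example, not the 620's structures.

```python
class ReservationStation:
    """Holds one instruction; each operand is a value or a rename-register tag."""

    def __init__(self, op, operands):
        self.op = op
        self.operands = list(operands)   # entries: ("value", v) or ("tag", rename_reg)

    def capture(self, tag, value):
        """Snoop the CDB: replace any operand waiting on `tag` with the value."""
        self.operands = [("value", value) if o == ("tag", tag) else o
                         for o in self.operands]

    def ready(self):
        return all(kind == "value" for kind, _ in self.operands)

def broadcast(tag, value, rename_regs, stations):
    """Write the result to its rename register and forward it to waiting RSs."""
    rename_regs[tag] = value
    for rs in stations:
        rs.capture(tag, value)

rename_regs = {}
add = ReservationStation("add", [("tag", 3), ("value", 5)])
mul = ReservationStation("mul", [("tag", 3), ("tag", 4)])
broadcast(3, 7, rename_regs, [add, mul])
print(add.ready(), mul.ready())   # True False
```

Tagging by rename register means the consumer never needs to know the producer's reorder buffer slot; it only matches the register name it was given at issue.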

Limitation to PowerPC performance (ideal CPI = 1/4, but actual CPI is 1/1.3)
- shortage of replicated execution units: RSs may compete for the same execution unit
- losses in specific stages
  - fetch: misprediction, cache misses, etc.
  - issue: RS, rename registers, or reorder buffer not available
  - EX: source operands or FU not available, etc.
  - commit: lack of register/memory write ports
- limited ILP in programs and finite buffering
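The CPI figures in the slide title are easier to compare as numbers. The short calculation below uses only the two values given there (4 instructions/cycle ideal, 1.3 actual); the variable names are just for illustration.

```python
ideal_ipc = 4.0     # the 620 can complete 4 instructions per cycle
actual_ipc = 1.3    # figure quoted on the slide

ideal_cpi = 1 / ideal_ipc            # 0.25 cycles per instruction
actual_cpi = 1 / actual_ipc          # about 0.77 cycles per instruction
utilization = actual_ipc / ideal_ipc # fraction of peak throughput achieved

print(f"ideal CPI  = {ideal_cpi:.2f}")
print(f"actual CPI = {actual_cpi:.2f}")
print(f"fraction of peak = {utilization:.3f}")
```

By these figures the machine sustains roughly a third of its peak issue rate, which is the gap the listed stalls account for.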

[Figure: PowerPC 620 block diagram. Labeled blocks: fetch unit, cache, branch correction, dispatch unit with 8-entry instruction queue, dispatch buses, completion unit with reorder buffer, GP registers, FP registers, reservation stations, execution units (XSU0, XSU1, MCFXU, LSU, FPU, BPU), and data cache. Labeled buses: GP operand buses, FP operand buses, operation buses, GP result buses, FP result buses, and result status buses.]

[Figure: PowerPC pipeline stages (Fetch, Issue, Execute, Commit). Labeled blocks: memory, buffer, registers, reservation stations, reorder buffer, functional units (XSU0, XSU1, MCFXU, LSU, FPU, BPU), rename registers, and commit unit.]