Pipelined Processor Design. EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon


Concept

Identification of Pipeline Segments

Add Pipeline Registers

Pipeline Stage Control Signals
- Fetch: the control signals to read IMem and to write the PC are always asserted; nothing special is needed.
- Decode/register-read: no special control is needed.
- Execute: RegDst, ALUOp, and ALUSrc.
- Mem Access: Branch, MemRead, and MemWrite.
- Write-Back: MemtoReg and RegWrite.
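The stage-by-stage grouping above also determines what must travel down the pipeline registers: a control signal is carried until the stage that consumes it, then dropped. A small sketch of that idea (the signal names match the slides; the helper function is illustrative):

```python
# Control signals grouped by the pipeline stage that consumes them,
# following the MIPS pipeline in the slides.
STAGE_CONTROLS = {
    "EX":  ["RegDst", "ALUOp", "ALUSrc"],
    "MEM": ["Branch", "MemRead", "MemWrite"],
    "WB":  ["MemtoReg", "RegWrite"],
}

def signals_still_needed(stage):
    """Signals that must still be carried in the pipeline register
    entering `stage`; a signal is dropped once its stage has used it."""
    order = ["EX", "MEM", "WB"]
    idx = order.index(stage)
    return [s for st in order[idx:] for s in STAGE_CONTROLS[st]]

print(signals_still_needed("MEM"))  # Branch/MemRead/MemWrite plus the WB signals
```

Note that by the WB stage only MemtoReg and RegWrite remain, which is why the control portion of the pipeline registers shrinks at each stage.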

Control Signals Identified

Control for the Pipeline Stages

Control Portion of Pipeline Registers

Dependence and Data Forwarding

Adding Data Forwarding

Data Forwarding Control
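The forwarding unit's decision for one ALU input can be sketched as below, following the standard textbook MIPS conditions; the encoding (00 = register file, 10 = EX/MEM, 01 = MEM/WB) is the common convention, and register 0 ($zero) is never forwarded:

```python
def forward_a(id_ex_rs, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Forwarding-unit select for the ALU's first input.
    0b00: read the register file, 0b10: forward the EX/MEM result,
    0b01: forward the MEM/WB result. The EX/MEM check comes first so
    the most recent write wins when both pipeline registers match."""
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return 0b10
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return 0b01
    return 0b00
```

The second ALU input uses the same conditions with the rt field in place of rs.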

Data Hazards and Stall

Stall: Insertion of nop

Controls for Forwarding and Hazard Detection
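The hazard detection unit's stall condition can be stated compactly. This is the classic load-use check: forwarding cannot cover a load whose result is needed by the very next instruction, so one bubble is inserted:

```python
def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard condition from the textbook MIPS pipeline: a load
    in EX (MemRead asserted) whose destination (rt) is a source of the
    instruction now in ID forces a one-cycle stall. The stall freezes
    the PC and the IF/ID register and zeroes the control signals
    entering EX, turning that slot into a bubble (nop)."""
    return id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt)
```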

Control Hazard: IF.Flush for dealing with the delays caused by branches

Impact of Branch Instruction

Branch decision in the decode stage

Insertion of nop when branch taken

Exception in Pipeline
- The instructions that follow the offending instruction must be flushed.
- Fetch must restart from the address of the exception service routine.

Pipeline Implementation of Exception
- The flushed instruction is replaced with a nop by zeroing its register control signals.
- A new control signal, ID.Flush, is ORed with the stall signal from the hazard detection unit.
- To flush the execute stage, another new signal, EX.Flush, zeros the control lines in the pipeline register.
- The address 0x80000180, where exception processing begins, is multiplexed into the PC.
- The address of the offending instruction is saved in the EPC, and the cause of the exception in the Cause register.
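The exception actions above can be sketched as one atomic state update. This is only a model; the dictionary keys are hypothetical names for the machine state, not actual signal names from any implementation:

```python
EXC_HANDLER = 0x80000180  # exception entry point, as in the slide

def take_exception(state, offending_pc, cause):
    """Sketch of the exception sequence: flush the younger instructions,
    record where and why the exception occurred, and redirect fetch."""
    state["EPC"] = offending_pc   # save address of the offending instruction
    state["Cause"] = cause        # record the exception cause
    state["IF_ID"] = 0            # IF.Flush: squash the just-fetched instruction
    state["ID_EX_ctrl"] = 0       # ID.Flush/EX.Flush: zero control lines (nop)
    state["PC"] = EXC_HANDLER     # fetch resumes at the service routine
    return state
```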

Controls that Include Exception

Extracting More Performance from Pipelining
- Increase the depth of the pipeline
- Multiple-issue pipelining (e.g., a 2-issue pipeline)

Dynamic Pipeline Scheduling
- Each Reservation Station (RS, a buffer) holds an operation and its operands for the attached Functional Unit (FU).
- Instructions are fetched, decoded, and then distributed to the reservation stations.
- Each FU computes its result as soon as the operands are ready.
- Results are forwarded either to the reservation stations waiting for them or to the Commit Unit (CU).
- The CU (reorder buffer) holds results until it is safe to write them to the register file or to store them into memory; it can also supply operands directly to the FUs.
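A minimal sketch of one reservation-station entry as described above: each operand is either a real value or a tag naming the station whose broadcast result it is waiting for. The class and tag scheme are illustrative, not a full Tomasulo implementation:

```python
class ReservationStation:
    """One RS entry: an operation plus two operands, each either
    ("val", x) — a ready value — or ("tag", name) — waiting for the
    result of the station called `name`."""
    def __init__(self, op, src1, src2):
        self.op, self.src1, self.src2 = op, src1, src2

    def ready(self):
        # The FU may start as soon as both operands are actual values.
        return self.src1[0] == "val" and self.src2[0] == "val"

    def capture(self, tag, value):
        # Result broadcast: waiting stations grab results matching their tag.
        if self.src1 == ("tag", tag):
            self.src1 = ("val", value)
        if self.src2 == ("tag", tag):
            self.src2 = ("val", value)

    def execute(self):
        a, b = self.src1[1], self.src2[1]
        return {"add": a + b, "sub": a - b}[self.op]

# An add waiting on the result of station "rs1":
rs = ReservationStation("add", ("val", 3), ("tag", "rs1"))
rs.capture("rs1", 4)   # "rs1" finishes and broadcasts its result
result = rs.execute()  # now ready: 3 + 4
```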

Dynamic Pipeline Scheduling

Microarchitecture of the Intel Pentium-4

Pentium-4 Pipeline

Register Renaming
- To use the reorder buffer and the reservation stations effectively, registers are often renamed.
- An instruction is buffered in an RS until all of its operands and its execution unit are available.
- When the FU eventually produces the result, it is copied directly into the waiting reservation stations, bypassing the register file.
- Because the registers that fed the RS have been renamed, they become available to other instructions.

Register Renaming Illustration
Compiler output:
1. R1 <= Mem[10]
2. R1 <= R1 + 2
3. Mem[10] <= R1
4. R1 <= Mem[20]
5. R1 <= R1 + 4
6. Mem[20] <= R1
Renamed (R2 replaces the second use of R1, removing the false dependence between the two groups):
1. R1 <= Mem[10]
2. R1 <= R1 + 2
3. Mem[10] <= R1
4. R2 <= Mem[20]
5. R2 <= R2 + 4
6. Mem[20] <= R2
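The same idea, done hardware-style, gives every register write a fresh physical name so only true (read-after-write) dependences survive. A sketch, where the (op, dst, *srcs) tuple format and the P-register names are my own illustration, not from the slides:

```python
import itertools

def rename(program):
    """Renaming sketch: each register write gets a fresh physical name
    (P1, P2, ...) and each read uses the latest name for its register.
    dst=None marks an instruction with no register write (a store)."""
    fresh = (f"P{i}" for i in itertools.count(1))
    latest = {}                        # architectural -> current physical name
    out = []
    for op, dst, *srcs in program:
        srcs = [latest.get(s, s) for s in srcs]
        if dst is not None:
            latest[dst] = next(fresh)  # fresh name for every write
            dst = latest[dst]
        out.append((op, dst, *srcs))
    return out

prog = [("load",  "R1", "Mem[10]"),  # 1. R1 <= Mem[10]
        ("add2",  "R1", "R1"),       # 2. R1 <= R1 + 2
        ("store", None, "R1"),       # 3. Mem[10] <= R1
        ("load",  "R1", "Mem[20]"),  # 4. R1 <= Mem[20]
        ("add2",  "R1", "R1"),       # 5. R1 <= R1 + 4
        ("store", None, "R1")]       # 6. Mem[20] <= R1
renamed = rename(prog)
```

After renaming, instruction 4 writes a different physical register than instruction 1, so the two load/add/store groups no longer conflict over R1.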

Pentium 4 Pipeline
- IA-32 instructions are translated to micro-ops.
- Micro-ops are executed by a dynamically scheduled, speculative pipeline.
- Seven functional units.
- Deep pipeline (20 stages).
- Uses a trace cache, which holds predecoded sequences of micro-ops.
- 128 registers.
- Up to 126 micro-ops can be queued at any point in time.
- Extensive bypass network among the functional units.

IA-64 Architecture (1)
- RISC-style register-to-register instruction set.
- 128 integer and 128 floating-point registers; 8 branch registers.
- Supports register windows (similar to SPARC).
- Instructions are encoded in bundles, which are 128 bits wide.
- Each bundle consists of a 5-bit template field and three instructions (41 bits each).
- The template field specifies which of the five types of execution units each instruction in the bundle requires.
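The arithmetic works out exactly: 5 + 3 × 41 = 128 bits. A sketch of splitting a bundle into its fields, where the bit layout (template in the low-order bits, slots packed above it) is an illustrative assumption rather than the exact encoding from the Itanium manuals:

```python
SLOT_BITS = 41

def decode_bundle(bundle):
    """Split a 128-bit bundle into a 5-bit template and three 41-bit
    instruction slots. Layout assumed: template in bits 0-4, slot i in
    the next 41-bit field (illustrative, not the official encoding)."""
    assert 0 <= bundle < 1 << 128
    template = bundle & 0b11111
    slots = [(bundle >> (5 + SLOT_BITS * i)) & ((1 << SLOT_BITS) - 1)
             for i in range(3)]
    return template, slots

def encode_bundle(template, slots):
    """Inverse of decode_bundle, useful for round-trip checking."""
    bundle = template & 0b11111
    for i, s in enumerate(slots):
        bundle |= (s & ((1 << SLOT_BITS) - 1)) << (5 + SLOT_BITS * i)
    return bundle
```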

IA-64 Architecture (2)
- Five types of execution units: (1) integer ALU, (2) non-integer ALU (shifts and multimedia operations), (3) memory unit, (4) FP unit, and (5) branch unit.
- The bundle formats can specify only a subset of all possible combinations of instruction types.
- Supports predication, a technique that eliminates branches by making the execution of an instruction dependent on a predicate rather than on a branch.
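Predication can be modeled with guarded statements: both operations are issued, but each commits only when its predicate is true, so no branch is needed. A toy model (the |x| example and the predicate naming are mine, not from the slides):

```python
def abs_predicated(x):
    """Model of predicated execution: a compare sets predicate p, then
    both guarded 'instructions' issue; only the one whose predicate is
    true takes effect. This replaces an if/else branch around the
    negate with two predicated operations."""
    p = x < 0     # compare instruction sets the predicate register
    r = x
    if p:         # (p)  r = -x   ; commits only when p is true
        r = -x
    if not p:     # (!p) r = x    ; commits only when p is false
        r = x
    return r
```

The payoff is that a hard-to-predict branch disappears entirely, at the cost of issuing both operations every time.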

Reality Check
- Many dependences cannot be alleviated.
- Compilers and hardware often do not know the data dependences exactly, which leads to conservative designs; pointers, for example, are often assumed to be potential dependences.
- The ability to predict branches is limited.
- Memory systems cannot always keep the pipeline full; cache misses stall the pipeline.

Fallacies and Pitfalls
1. Pipelining is easy (true or false?)
2. Pipelining ideas can be implemented independent of technology (true or false?)
3. Failure to consider instruction set design can adversely impact pipelining (true or false?)
4. Longer pipelines, multiple instruction issue, and dynamically scheduled pipelines in the 1990s helped sustain the single-processor performance increase (true or false?)