Pipeline Review. Review

Similar documents
COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

What is Pipelining? RISC remainder (our assumptions)

Pipelining: Hazards Ver. Jan 14, 2014

Instruction Pipelining Review

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

第三章 Instruction-Level Parallelism and Its Dynamic Exploitation. 陈文智 浙江大学计算机学院 2014 年 10 月

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

ECEC 355: Pipelining

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Computer Architecture

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Overview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Modern Computer Architecture

Appendix A. Overview

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

Outline. Pipelining basics The Basic Pipeline for DLX & MIPS Pipeline hazards. Handling exceptions Multi-cycle operations

Pipelining: Basic and Intermediate Concepts

1 Hazards COMP2611 Fall 2015 Pipelined Processor

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

Pipelining. CSC Friday, November 6, 2015

Appendix C: Pipelining: Basic and Intermediate Concepts

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Advanced Computer Architecture Pipelining

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

COSC 6385 Computer Architecture - Pipelining

DLX Unpipelined Implementation

Instr. execution impl. view

Lecture 05: Pipelining: Basic/ Intermediate Concepts and Implementation

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts

MIPS An ISA for Pipelining

Pipelining. Maurizio Palesi

Lecture 5: Pipelining Basics

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Chapter 4. The Processor

Full Datapath. Chapter 4 The Processor 2

ELE 655 Microprocessor System Design

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight

What do we have so far? Multi-Cycle Datapath (Textbook Version)

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory

Lecture 2: Processor and Pipelining 1

Processor (II) - pipelining. Hwansoo Han

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Appendix C. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

ECE260: Fundamentals of Computer Engineering

COMP2611: Computer Organization. The Pipelined Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Advanced Computer Architecture

EITF20: Computer Architecture Part2.2.1: Pipeline-1

LECTURE 3: THE PROCESSOR

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Appendix C. Abdullah Muzahid CS 5513

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Pipelined Processor Design

Chapter 4. The Processor

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

CSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

14:332:331 Pipelined Datapath

Full Datapath. Chapter 4 The Processor 2

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently

CSE 533: Advanced Computer Architectures. Pipelining. Instructor: Gürhan Küçük. Yeditepe University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Advanced Computer Architecture

Computer System. Agenda

Week 11: Assignment Solutions

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

COMPUTER ORGANIZATION AND DESIGN

Pipelining. Pipeline performance

CSE 502 Graduate Computer Architecture. Lec 3-5 Performance + Instruction Pipelining Review

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

COMPUTER ORGANIZATION AND DESIGN

Processor Architecture

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

mywbut.com Pipelining

Pipelined Processor Design

CIS 662: Midterm. 16 cycles, 6 stalls

CSEE 3827: Fundamentals of Computer Systems

Computer System. Hiroaki Kobayashi 6/16/2010. Ver /16/2010 Computer Science 1

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

Transcription:

Pipeline Review Review Covered in EECS2021 (was CSE2021) Just a reminder of pipeline and hazards If you need more details, review 2021 materials 1

The basic MIPS Processor Pipeline 2

Performance of pipelining Pipelining does not reduce the time to execute the instruction (it actually increases it). It increases the throughput. We can not skip stages anymore, and the pipeline cycle = time for longest cycle longest cycle Example: a machine with 10 ns cycle, it takes 4 cycles for ALU and branches, and 5 for memory (40,20,40%), what is the effect of the pipelining Without execution time = 10*(0.6*4+0.4*5)= 44 ns With pipelining 10+1 (overhead)=11 Speedup = 4 Performance of Pipelining The length of a machine clock cycle is determined by the time required for the slowest pipe stage. An important pipeline design consideration is to balance the length of each pipeline stage. If all stages are perfectly balanced, then the time per instruction on a pipelined machine (assuming ideal conditions with no stalls): Time per instruction on unpipelined machine Number of pipe stages Under these ideal conditions: Speedup from pipelining equals the number of pipeline stages: n, One instruction is completed every cycle, CPI = 1. 3

Hazards There are situation called hazards that prevents the continuous flow of instructions in the pipe Structural hazards: resource conflicts Data hazards: instruction depends on the results from a previous instruction that is not ready yet. Control hazards: branches (we don t know the address of the next instruction) 4

Performance Hazards in pipelines may make it necessary to stall the pipeline by one or more cycles and thus degrading performance from the ideal CPI of 1. CPI pipelined = Ideal CPI + Pipeline stall clock cycles per instruction When all instructions take the same number of cycles and is equal to the number of pipeline stages then: Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction) Structural Hazards When we pipeline a machine, the overlapped instruction execution requires pipelining of functional units and duplication of resources to allow all possible combinations of instructions in the pipeline. If a resource conflict arises due to a hardware resource being required by more than one instruction in a single cycle, and one or more such instructions cannot be accommodated, then a structural hazard has occurred, for example: One example is when we have a single memory for both instructions and data. 5

Structural Hazards Data Hazards Pipelining changes the relative timing of the instructions by overlapping their execution. If the timing of read/write accesses to the operands is changed, that might result in incorrect execution Example: ADD R1, R2, R3 SUB R4, R1, R5 AND R7, R1, R6 OR R8, R1, R9 XOR R10,R1,R11 All the instructions after ADD use the result of the ADD instruction (ready in WB stage) Without proper precautions, SUB will read the old value in R1 SUB, AND, and OR instructions need to be stalled for correct execution. 6

ADD R1,R2,R3 SUB R4,R1,R5 X AND R7,R1,R6 X OR XOR R8,R1,R9 R10,R1,R3 X Forwarding It is one thing to read operand before it is written, and between requesting an operand before it is produced. Results of ADD is written in CC 5, read by SUB in CC 3 BUT, the results of ADD is produced in CC 3, requested by SUB in CC 4 We can use Forwarding 1. The ALU result from EX/MEM register is always fed back to the input of the ALU 2. If the forwarding hardware detects that if the previous ALU op writes to a register that is the source of the current ALU op, use a MUX to choose the fed back value instead of source register We need to forward results not only from previous instruction, but from an instruction that started 3 cycles earlier 7

Forwarding Forwarding Consider the following sequence ADD R1, R2, R3 R4, 0(R1) SW 12(R1), R4 To prevent stalls, we need to forward the values in R1 and R4 from the pipeline registers to the inputs of the ALU and data memory. We may require a forwarding path from any pipeline register to the input of any functional unit 8

Load Use Data Hazard Need to stall for one cycle Chapter 4 The Processor 18 9

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 I n s t r. O r d e r Load Instr 1 Instr 2 Stall Ifetch Instr 3 Ifetch ALU Ifetch DMem ALU DMem ALU DMem Bubble Bubble Bubble Bubble Bubble Ifetch ALU DMem Compiler Scheduling for Data Hazards Compiler may try to rearrange the code in order to avoid stalls. For example, avoid generating a code where there is a load followed by the immediate use of the loaded value Example, consider the code segment A=b+c D=e-f Here is two way of generating code 10

EX ADD SW SUB SW Rb,b Rc,c Ra,Rb,Rc a,ra Rf,f Re,e Rd,Re,Rf d,rd Stall Stall ADD SW SUB SW Rb,b Rc,c Rf,f Ra,Rb,Rc Re,e a,ra Rd,Re,Rf d,rd Control Hazards Branch instruction IF ID EX MEM WB Branch successor IF stall stall IF ID EX MEM WB Branch successor + 1 IF ID EX MEM WB Branch successor + 2 IF ID EX MEM Branch successor + 3 IF ID EX Branch successor + 4 IF ID Branch successor + 5 IF In order to reduce the branch penalty, we must do two things Find out if the branch is taken or not as soon as possible Compute the taken PC ASAP 11

Compile Time Solutions Compiler may decide to predicts during compilation if the branch is taken or not Easiest way is to freeze or flush the pipe (simple) Predict branch is not taken, proceed as usual but be careful either not to change the state of the machine (or back out if you do) until you know if the prediction was correct or not Predict taken Useful only if we know the target address before we know the result of the comparison (condition) Scheduling the Branch Delay Slot In this case, the job of the compiler is to make the successor instruction valid and useful (Fig. 3.28) In (a) the scheduled instruction should be done anyway, no harm at all IN (b) and (c) the use of R1 in the branch condition prevents moving the instruction to after the branch In both cases, it must be O.K. to execute the SUB instruction when the branch will go the opposite direction In (b) useful when the branch is taken with a high probability, the reverse in (c). 12

13