Instruction Pipelining

The simplest form is a 3-stage linear pipeline:
- a new instruction is fetched each clock cycle
- an instruction is finished each clock cycle

The maximal speedup of 3 is achieved if and only if:
- all pipe stages are the same length
- instructions and operands can be fetched quickly enough
- results can be stored quickly enough

Copyright 1998 Leslie S. Smith 31R6 - Computer Design Slide 46

Longer linear pipelines

Because the ideal pipeline speedup is directly proportional to pipe length, longer pipelines are attractive, e.g.
- Fetch Instruction
- Decode Instruction
- Operand Address Generate or Execute
- Operand Fetch
- Store Result

These stages are unlikely to all be the same length, so null stages are sometimes added to compensate:
- Fetch Instruction
- NULL
- Decode Instruction
- Operand Address Generate
- Operand Fetch
- NULL
- Store Result
- NULL
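The speedup argument above can be sketched numerically. This is a minimal sketch, assuming an idealised pipeline with no hazards in which n instructions take k + n - 1 cycles on k stages and the clock period is set by the slowest stage. The function names and stage times are illustrative and do not come from the slides.

```python
def pipelined_cycles(n_instructions, n_stages):
    """Cycles for an ideal pipeline: fill it once, then finish one instruction per cycle."""
    return n_stages + n_instructions - 1


def ideal_speedup(n_instructions, n_stages):
    """Speedup over passing each instruction through all stages before starting the next."""
    unpipelined = n_instructions * n_stages
    return unpipelined / pipelined_cycles(n_instructions, n_stages)


def speedup_with_stage_times(stage_times, n_instructions):
    """Speedup when stages differ in length: the clock must fit the slowest stage."""
    unpipelined = n_instructions * sum(stage_times)
    clock = max(stage_times)
    pipelined = (len(stage_times) + n_instructions - 1) * clock
    return unpipelined / pipelined


if __name__ == "__main__":
    print(ideal_speedup(1000, 3))                      # approaches 3 as n grows
    print(speedup_with_stage_times([1, 1, 1], 1000))   # equal stages: same as ideal
    print(speedup_with_stage_times([1, 2, 1], 1000))   # unequal stages: below 2
```

With equal stage times the speedup tends to the number of stages; with one stage twice as long as the others, the shorter stages sit idle for part of every cycle, which is the imbalance the null stages are meant to compensate for.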

A pipelining example: DLX
(Taken from Hennessy & Patterson)

DLX is a RISC processor with a five-stage pipeline. The stages are:

IF - Instruction fetch
- fetch the instruction from memory into IR
- increment the PC (NPC <- PC + 4)

ID - Instruction decode/register fetch
- decode the instruction and access the register file to read the register(s) into temporary registers A and B
- also sign-extend the lower 16 bits of IR (Imm <- (sign-extended) IR15-0)

EX - Execution/effective address cycle
- for a memory reference instruction: ALU.Output <- A + Imm
- for a register-register instruction: ALU.Output <- A function B
- for a branch instruction: ALU.Output <- NPC + Imm and Cond <- A op 0

DLX stages (cont)

MEM - Memory access/branch completion cycle
- for a memory reference instruction: LMD <- Mem[ALU.Output] or Mem[ALU.Output] <- B
- for a branch instruction: if (Cond) PC <- ALU.Output else PC <- NPC

WB - Write-back cycle
- for a register-register ALU instruction: Regs[IR16-20] <- ALU.Output
- for a register-immediate instruction: Regs[IR11-15] <- ALU.Output
- for a load instruction: Regs[IR11-15] <- LMD

Notes
- the memories referred to are all cache memories
- there are two caches, an instruction cache and a data cache
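The register transfers listed for the five stages can be traced in a few lines of code. This is only a sketch: it steps one instruction at a time (no overlap between instructions), represents an already-decoded instruction as a dictionary instead of extracting bit fields from IR, and supports just four opcodes. The function name `run_instruction` and the field names `op`, `rd`, `rs1`, `rs2` and `imm` are hypothetical.

```python
def run_instruction(instr, regs, mem, pc):
    """Step one decoded instruction through IF, ID, EX, MEM and WB; return the next PC."""
    # IF: the instruction is assumed to be fetched already; NPC <- PC + 4
    npc = pc + 4

    # ID: read the source registers into A and B and take the immediate
    a = regs[instr.get("rs1", 0)]
    b = regs[instr.get("rs2", 0)]
    imm = instr.get("imm", 0)
    op = instr["op"]

    # EX: effective address, ALU result, or branch target and condition
    cond = False
    if op in ("LW", "SW"):
        alu_output = a + imm
    elif op == "ADD":
        alu_output = a + b
    elif op == "BEQZ":
        alu_output = npc + imm
        cond = (a == 0)
    else:
        raise ValueError("unsupported opcode in this sketch: " + op)

    # MEM: memory access or branch completion
    lmd = None
    next_pc = npc
    if op == "LW":
        lmd = mem[alu_output]
    elif op == "SW":
        mem[alu_output] = b
    elif op == "BEQZ" and cond:
        next_pc = alu_output

    # WB: write the result back to the register file
    if op == "ADD":
        regs[instr["rd"]] = alu_output
    elif op == "LW":
        regs[instr["rd"]] = lmd
    return next_pc


if __name__ == "__main__":
    regs = [0] * 32
    regs[2] = 100
    mem = {108: 7}
    pc = run_instruction({"op": "LW", "rd": 1, "rs1": 2, "imm": 8}, regs, mem, 0)
    print(regs[1], pc)
```

Note that the load only uses LMD and the ADD never touches memory, which is why different instruction types make different use of the MEM stage.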

Speedup of DLX

DLX has a five-stage pipeline, so is the speedup 5? No:
- not all instructions use all stages: the MEM stage is not used at all by register-register ALU instructions
- the latch between stages has an overhead as well, even if it is not very large
- the actual speedup is approximately 4 times, though this depends on the relative frequency of the different instructions, known as the instruction mix

More importantly, there are reasons why one cannot expect the pipe to fill up and remain full: there are hazards.

Pipeline Hazards

There are three different types of hazard. Each prevents the next instruction in the stream from executing in its intended clock cycle, so that the pipe is not kept full at all times.

Structural Hazards
- these arise from resource conflicts
- the hardware cannot support all the possible combinations of instructions in the pipe

Data Hazards
- these arise when an instruction depends on the result of a previous instruction
- this result may not yet have been stored, or even computed

Control Hazards
- these arise from attempting to pipeline instructions that themselves affect the flow of control
- that is, they affect the program counter (e.g. jumps, branches, function calls, etc.)
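The "approximately 4 times" figure can be reproduced with a short calculation. The sketch below assumes illustrative numbers that are not given in the slides: a 10 ns unpipelined clock, 4 cycles for ALU operations and branches, 5 cycles for memory operations, a 40/20/40 instruction mix, and 1 ns of latch overhead added to the pipelined clock. It also ignores hazards.

```python
def average_unpipelined_time(clock_ns, mix):
    """Average time per instruction; mix is a list of (percent of instructions, cycles)."""
    return clock_ns * sum(percent * cycles for percent, cycles in mix) / 100


def pipeline_speedup(clock_ns, mix, latch_overhead_ns):
    """Speedup when the pipeline completes one instruction per (slightly longer) clock."""
    unpipelined = average_unpipelined_time(clock_ns, mix)
    pipelined = clock_ns + latch_overhead_ns
    return unpipelined / pipelined


if __name__ == "__main__":
    # (percent, cycles): ALU operations, branches, memory operations (assumed mix)
    mix = [(40, 4), (20, 4), (40, 5)]
    print(average_unpipelined_time(10, mix))   # 44.0 ns per instruction
    print(pipeline_speedup(10, mix, 1))        # 4.0
```

Changing the mix changes the result: a program with more memory operations has a higher unpipelined average and therefore a larger speedup under these assumptions.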

Structural Hazards

These can occur (e.g.) if some function requires a unit which is not pipelined:
- e.g. floating-point units are sometimes not pipelined, and the performance of the pipe decreases severely when there are many FP instructions
- but generally, these are not very common

If the DLX did not have separate instruction and data caches, but had only a single port on to memory, the IF and MEM stages could conflict and so give rise to a structural hazard.

Avoiding structural hazards

RISC instructions are much more predictable than CISC ones, and this makes structural hazards easier to avoid.

These hazards can be avoided by adding more hardware:
- though this is not an option on an existing processor, it becomes easier on new versions of a processor because of the improvements in manufacturing technology

Note that dynamic examination of code can show up the likelihood of particular structural hazards, and suggest whether additional hardware will give a reasonable improvement or not.
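The single-memory-port case can be checked mechanically. The sketch below assumes an otherwise ideal five-stage pipeline in which instruction i is in IF while instruction i - 3 is in MEM, and treats only loads and stores as using memory in MEM. The function name and the opcode set are illustrative.

```python
MEMORY_OPS = {"LW", "SW"}


def memory_port_conflicts(opcodes):
    """Return (mem_index, fetch_index) pairs that need the single memory port in the same cycle."""
    conflicts = []
    for i, op in enumerate(opcodes):
        fetch_index = i + 3  # instruction i is in MEM while instruction i + 3 is in IF
        if op in MEMORY_OPS and fetch_index < len(opcodes):
            conflicts.append((i, fetch_index))
    return conflicts


if __name__ == "__main__":
    print(memory_port_conflicts(["LW", "ADD", "SUB", "AND", "OR"]))
```

Each reported pair would cost a stall on a machine with one memory port. With separate instruction and data caches the list is irrelevant, because the two accesses use different ports.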

Data Hazards

These can occur because pipelining overlaps instructions, so operands may be read and written in a different order from the intuitive (sequential) one. Consider

    ADD R1, R2, R3    // R1 <- R2 + R3
    SUB R4, R1, R5    // R4 <- R1 - R5
    AND R6, R1, R7    // R6 <- R1 AND R7
    OR  R8, R1, R9    // R8 <- R1 OR R9

- R1 is computed in the first instruction and written during the last (WB) pipe stage
- R1 is used in the 3 following instructions and accessed during their ID stage
- but the WB stage of the first instruction will not have run yet
- so (if nothing was done) the wrong value for R1 would be used

Solving data hazards I

One can stall the pipeline until the result has been calculated and stored.

(amended picture not reproduced)
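The cost of the stalling approach can be worked out for the ADD/SUB/AND/OR sequence above. This is a sketch under stated assumptions: there is no forwarding, an instruction waits in ID until its operands are available, and a register written in WB can first be read in ID in the following cycle. The tuple layout `(op, dest, src1, src2)` and the function name are hypothetical.

```python
def id_cycles_with_stalls(program):
    """Return the cycle in which each instruction completes ID, stalling on RAW hazards."""
    ready = {}      # register -> first cycle in which its new value can be read in ID
    id_cycles = []
    prev_id = 1     # the first instruction is in IF in cycle 1, so its ID is cycle 2
    for op, dest, src1, src2 in program:
        earliest = prev_id + 1                 # in-order: one cycle behind the previous ID
        for src in (src1, src2):
            earliest = max(earliest, ready.get(src, 0))
        id_cycles.append(earliest)
        ready[dest] = earliest + 4             # EX, MEM, WB follow ID; readable the cycle after WB
        prev_id = earliest
    return id_cycles


if __name__ == "__main__":
    program = [
        ("ADD", "R1", "R2", "R3"),
        ("SUB", "R4", "R1", "R5"),
        ("AND", "R6", "R1", "R7"),
        ("OR", "R8", "R1", "R9"),
    ]
    print(id_cycles_with_stalls(program))
```

Without the hazard the four instructions would complete ID in cycles 2, 3, 4 and 5. Under these assumptions SUB has to wait until cycle 6, so the sequence loses three cycles, and AND and OR simply queue behind it.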

Solving data hazards II

One can use additional hardware to alleviate the problem:
- add registers to make ALU outputs immediately available
- don't wait until they have been written back to registers (or memory)

This is called forwarding (or bypassing, or short-circuiting):
- the ALU.Output value from the EX/MEM latch is made available at the ALU input registers
- it is used instead of the register input if the CPU detects that an earlier instruction still in the pipe is updating that register

This can completely remove the data hazard described above.

Data Hazard Classification

Data hazards are of one of three different types (for instructions i and j, where i comes before j):

RAW (read after write)
- j tries to read a source operand before i writes to it
- this is the commonest form of data hazard

WAW (write after write)
- j tries to write a result before it is written by i, so i's write happens after j's
- thus the wrong result (i's rather than j's) gets left in the register or memory location

WAR (write after read)
- j tries to write a value before it is read by i
- this doesn't happen in the pipeline described here, but could occur if results were written early in the pipe, as might occur with autoincrement addressing modes

One cannot in general use forwarding to solve all of these problems, so some data hazards do require pipeline stalls.
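The three hazard types above depend only on which registers each instruction reads and writes, so a pair of instructions can be classified mechanically. The sketch below reports which types are possible between an earlier instruction i and a later instruction j; whether they actually cause a problem depends on the pipeline, as noted above for WAR. The function name and the `(dest, sources)` representation are hypothetical.

```python
def classify_hazards(i, j):
    """Classify possible hazards between i and a later j; each is (dest, sources)."""
    i_dest, i_sources = i
    j_dest, j_sources = j
    hazards = set()
    if i_dest is not None and i_dest in j_sources:
        hazards.add("RAW")   # j reads what i writes
    if i_dest is not None and i_dest == j_dest:
        hazards.add("WAW")   # both write the same destination
    if j_dest is not None and j_dest in i_sources:
        hazards.add("WAR")   # j writes what i reads
    return hazards


if __name__ == "__main__":
    add = ("R1", ("R2", "R3"))   # ADD R1, R2, R3
    sub = ("R4", ("R1", "R5"))   # SUB R4, R1, R5
    print(classify_hazards(add, sub))
```

For the ADD/SUB pair only RAW is reported, which is the case that forwarding from the EX/MEM latch addresses.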

Compiler scheduling for data hazards

Typical straightforward generated code for simple statements like A = B+C causes stalls.

    LW  R1, B         IF ID EX MEM WB
    LW  R2, C         IF ID EX MEM WB
    ADD R3, R2, R1    IF ID stall EX MEM WB
    SW  A, R3         IF ID stall EX

(Each instruction starts one clock cycle after the one before it.)

But compiler scheduling (code rearranging) can help.

Rearranging code

E.g.

    a = b + c ;
    d = e + f ;

can be rewritten as

    LW  R1, B
    LW  R2, C
    LW  R3, E
    ADD R4, R1, R2
    LW  R5, F
    SW  A, R4
    ADD R6, R3, R5
    SW  D, R6

The ADDs and SWs have been rearranged so as to avoid pipeline stalls.
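The benefit of the rearrangement can be checked by counting stalls before and after. The sketch below assumes a pipeline with forwarding, in which the only remaining stall is one cycle when an instruction uses the result of the load immediately before it. The unscheduled ordering shown is an assumed straightforward translation of the two statements, reusing the register names of the rearranged code. The `(op, dest, sources)` layout and the function name are hypothetical.

```python
def count_load_use_stalls(program):
    """One stall for each instruction that uses the result of the load just before it."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        _, _, cur_sources = current
        if prev_op == "LW" and prev_dest in cur_sources:
            stalls += 1
    return stalls


# Assumed straightforward code for a = b + c; d = e + f
UNSCHEDULED = [
    ("LW", "R1", ()), ("LW", "R2", ()), ("ADD", "R4", ("R1", "R2")), ("SW", None, ("R4",)),
    ("LW", "R3", ()), ("LW", "R5", ()), ("ADD", "R6", ("R3", "R5")), ("SW", None, ("R6",)),
]

# The rearranged code from the slide
SCHEDULED = [
    ("LW", "R1", ()), ("LW", "R2", ()), ("LW", "R3", ()), ("ADD", "R4", ("R1", "R2")),
    ("LW", "R5", ()), ("SW", None, ("R4",)), ("ADD", "R6", ("R3", "R5")), ("SW", None, ("R6",)),
]

if __name__ == "__main__":
    print(count_load_use_stalls(UNSCHEDULED), count_load_use_stalls(SCHEDULED))
```

In the rearranged order every ADD is separated from the loads that feed it by at least one other instruction, so under this model it incurs no load-use stalls, while the straightforward order incurs one per statement.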