The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.


Pipeline

A Basic MIPS Implementation. Memory-reference instructions: load word (lw) and store word (sw). ALU instructions: add, sub, AND, OR, and slt. Branch instruction: branch on equal (beq).

Instruction Fetch Elements

Instruction Fetch

ALU Operations Elements. Example: ADD R1, R2, R3. [Figure: register file with address, data, and write ports.]

Loads and Stores Elements. Example: LW R1, -8(R2)

Branches Elements. Examples: BEQ R1, R2, LABEL and BEQ R1, R2, -16

Memory and R-type Instructions

Memory Instruction: Load. Example: LW R1, -8(R2)

Memory Instruction: Store. Example: SW R1, -8(R2)

R-Type Instruction: ADD. Example: ADD R1, R2, R3

The MIPS Datapath

The MIPS Datapath: BEQ. Example: BEQ R1, R2, -16

MIPS Datapath and Control Lines

Pipeline Stages. IF: instruction fetch. ID: instruction decode / register file read. EX: execution / address calculation. MEM: memory access. WB: write back.

Pipelined Datapath. [Figure: datapath divided into the IF, ID, EX, MEM, and WB stages.]

Pipelined vs. Nonpipelined Implementation

Pipelined vs. Nonpipelined Implementation. What is the ratio of total execution times between the two versions for 10^6 instructions? Pipelining increases instruction throughput; it does not shorten the execution time of an individual instruction. Stages: IF, ID, EX, MEM, WB.

Speedup of the Pipeline. The speedup of a k-stage pipelined processor over an unpipelined processor:

$$S_k = \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}} = \frac{n\,k}{k + (n - 1)}$$

n: number of instructions in the program. k: number of pipeline stages.
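The speedup formula can be checked numerically. The sketch below is only an illustration and assumes the idealized model behind the formula: every stage takes one clock cycle, the unpipelined machine needs k cycles per instruction, and there are no hazards. The function name `pipeline_speedup` is chosen here and is not from the slides.

```python
def pipeline_speedup(n: int, k: int) -> float:
    """Ideal speedup S_k = n*k / (k + (n - 1)) of a k-stage pipeline.

    Assumes one cycle per stage and no stalls (idealized model).
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    unpipelined_cycles = n * k          # each instruction takes k cycles
    pipelined_cycles = k + (n - 1)      # fill the pipe, then one per cycle
    return unpipelined_cycles / pipelined_cycles


if __name__ == "__main__":
    # The speedup approaches k as the program gets longer.
    for n in (1, 4, 100, 10**6):
        print(n, pipeline_speedup(n, k=5))
```

For a single instruction the speedup is 1, and for very long programs it approaches k, which is why the five-stage pipeline is often described as "up to five times faster" under ideal conditions.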

Efficiency of the Pipeline. Percentage of stages accomplishing tasks related to the instruction in execution:

$$\eta = \frac{\text{No. of instructions}}{\text{Instruction execution time}} = \frac{n}{k + (n - 1)}$$

n: number of instructions in the program. k: number of pipeline stages.

Throughput of the Pipeline. Number of tasks completed in unit time (one second):

$$w = \eta f$$

f: frequency of operation.
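As a small worked example of the efficiency and throughput definitions, the following sketch evaluates both for a given program length. It assumes the same idealized model (one cycle per stage, no stalls); the function names and the 1 GHz clock are illustrative choices, not values from the slides.

```python
def pipeline_efficiency(n: int, k: int) -> float:
    """Efficiency eta = n / (k + (n - 1)) for n instructions, k stages."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    return n / (k + (n - 1))


def pipeline_throughput(n: int, k: int, f_hz: float) -> float:
    """Throughput w = eta * f, in instructions per second."""
    return pipeline_efficiency(n, k) * f_hz


if __name__ == "__main__":
    # Hypothetical clock frequency of 1 GHz, chosen only for illustration.
    print(pipeline_efficiency(4, 5))        # 4 instructions, 5 stages
    print(pipeline_throughput(4, 5, 1e9))
```

Under this model the speedup equals k times the efficiency, so an efficiency close to 1 corresponds to a speedup close to the number of stages.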

Pipeline Hazards. Hazard (n.): an unavoidable danger or risk, even though often foreseeable. Hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. They reduce performance from the ideal speedup gained by pipelining.

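Structural Hazard. Cause: lack of resources. In the diagram the fetch stage is labelled MEM because it also accesses memory.

Cycle   1    2    3    4    5    6    7    8    9
i1      MEM  ID   EX   MEM  WB
i2           MEM  ID   EX   MEM  WB
i3                MEM  ID   EX   MEM  WB
i4                     MEM  ID   EX   MEM  WB
i5                          MEM  ID   EX   MEM  WB

HAZARD: from cycle 4 onward, the fetch of one instruction and the data access of an earlier instruction need the memory in the same cycle.
Solution: increase resources. The MIPS pipeline uses separate data and instruction memories.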
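The memory conflict can be found mechanically. The sketch below assumes a single shared memory used by both the IF and MEM stages, one instruction issued per cycle, and (as a simplification made here) that every instruction accesses memory in its MEM stage. The function name `memory_conflicts` is hypothetical.

```python
from collections import defaultdict

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
MEMORY_STAGES = {"IF", "MEM"}  # stages that would share one memory


def memory_conflicts(num_instructions: int) -> dict:
    """Return {cycle: [(instruction, stage), ...]} for cycles in which
    more than one instruction needs the single shared memory."""
    users = defaultdict(list)
    for i in range(num_instructions):
        for offset, stage in enumerate(STAGES):
            if stage in MEMORY_STAGES:
                cycle = i + offset + 1  # instruction i+1 is fetched in cycle i+1
                users[cycle].append((f"i{i + 1}", stage))
    return {c: u for c, u in sorted(users.items()) if len(u) > 1}


if __name__ == "__main__":
    print(memory_conflicts(4))
```

With four instructions the only conflict is in cycle 4, where i1 is in MEM while i4 is being fetched. Giving IF and MEM separate memories removes the conflict in this model.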

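Data Hazard. Data (input operands) required by the instruction are not ready/available.

Cycle             1    2    3    4    5    6    7    8    9
ADD R1, R2, R3    IF   ID   EX   MEM  WB
SUB R4, R1, R5         IF   ID   EX   MEM  WB    WRONG!

Data dependences: RAW, WAR, and WAW.
RAW (read after write): ADD R1, R2, R3 followed by SUB R4, R1, R5
WAR (write after read): ADD R1, R2, R3 followed by SUB R2, R4, R5
WAW (write after write): ADD R1, R2, R3 followed by SUB R1, R4, R5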
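The three dependence types can be told apart by comparing the destination and source registers of two instructions. The following is a minimal sketch; the `Instr` tuple and the `dependences` function are illustrative names introduced here, and instructions are assumed to have one destination register.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def dependences(first: Instr, second: Instr) -> List[str]:
    """Classify the dependences of `second` on an earlier `first`."""
    found = []
    if first.dest in second.srcs:
        found.append("RAW")  # second reads what first writes
    if second.dest in first.srcs:
        found.append("WAR")  # second overwrites what first reads
    if second.dest == first.dest:
        found.append("WAW")  # both write the same register
    return found


if __name__ == "__main__":
    add = Instr("ADD", "R1", ("R2", "R3"))
    print(dependences(add, Instr("SUB", "R4", ("R1", "R5"))))
    print(dependences(add, Instr("SUB", "R2", ("R4", "R5"))))
    print(dependences(add, Instr("SUB", "R1", ("R4", "R5"))))
```

The three calls use the instruction pairs from the slide and report RAW, WAR, and WAW respectively.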

Data Hazard. Time (clock cycles) runs left to right.

DADD R1,R2,R3      IM   REG  ALU  DM   REG
DSUB R4,R1,R5           IM   REG  ALU  DM   REG
AND  R6,R1,R7                IM   REG  ALU  DM   REG
OR   R8,R1,R9                     IM   REG  ALU  DM
XOR  R10,R1,R11                        IM   REG  ALU

Avoiding Data Hazards: Forwarding. Time (clock cycles) runs left to right.

DADD R1,R2,R3      IM   REG  ALU  DM   REG
DSUB R4,R1,R5           IM   REG  ALU  DM   REG
AND  R6,R1,R7                IM   REG  ALU  DM   REG
OR   R8,R1,R9                     IM   REG  ALU  DM
XOR  R10,R1,R11                        IM   REG  ALU

Pipeline without Forwarding

Pipeline with Forwarding

Data Hazard: Load Instruction. Time (clock cycles) runs left to right.

LD   R1,0(R2)      IM   REG  ALU  DM   REG
DSUB R4,R1,R5           IM   REG  ALU  DM   REG
AND  R6,R1,R7                IM   REG  ALU  DM   REG
OR   R8,R1,R9                     IM   REG  ALU  DM

Data Hazards: Stalls. Instruction sequence: LD R1,0(R2); DSUB R4,R1,R5; AND R6,R1,R7; OR R8,R1,R9. [Figure: pipeline timing diagram (IM, REG, ALU, DM, REG) in which the instructions after LD are delayed by a stall.]
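A load-use stall can be detected by checking whether an instruction reads the register written by the load immediately before it. The sketch below assumes the classic five-stage pipeline with forwarding, where such a pair costs one stall cycle and all other dependences are covered by forwarding. The tuple layout and the function name `count_load_use_stalls` are assumptions made for this example.

```python
LOAD_OPS = {"LD", "LW"}  # opcodes treated as loads in this sketch


def count_load_use_stalls(program) -> int:
    """Count stall cycles, assuming one stall whenever an instruction
    uses the result of the load directly in front of it.

    `program` is a list of (op, dest, srcs) tuples.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op in LOAD_OPS and prev_dest in cur_srcs:
            stalls += 1
    return stalls


if __name__ == "__main__":
    load_sequence = [
        ("LD", "R1", ("R2",)),
        ("DSUB", "R4", ("R1", "R5")),
        ("AND", "R6", ("R1", "R7")),
        ("OR", "R8", ("R1", "R9")),
    ]
    print(count_load_use_stalls(load_sequence))
```

For the LD sequence on the slide this reports one stall (between LD and DSUB). Placing an independent instruction between the load and its first use, which is the idea behind instruction reordering, would bring the count to zero in this model.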

Data Hazard Solutions: data forwarding; instruction reordering.

Control Hazards. Arise from the pipelining of branches and other instructions that change the PC. Also called branch hazards.

Branch Hazards. Time in clock cycles.

Cycle                   1    2    3    4    5    6    7    8    9
BEQ                     IF   ID   EX   MEM  WB
ADD                          IF   ID   EX   MEM  WB
Branch Successor                  IF   ID   EX   MEM  WB
Branch Successor + 1                   IF   ID   EX   MEM  WB
Branch Successor + 2                        IF   ID   EX   MEM  WB

Assumption: branch condition evaluation is completed in the ID stage.

Reducing Pipeline Branch Penalties. Options: freeze the pipeline; predict taken; predict untaken; fill the branch delay slot.

Cycle                         1    2    3    4    5    6    7    8    9
i     BEQ                     IF   ID   EX   MEM  WB
i-1   AND                          IF   ID   EX   MEM  WB
i+16  Branch Successor                  IF   ID   EX   MEM  WB
i+17  Branch Successor + 1                   IF   ID   EX   MEM  WB

Dynamic Branch Prediction. Branch prediction buffers. Single-bit predictors: change the prediction with branch behaviour.

Branch outcomes: T T T T N T T T T T T T T T T T T
No. of wrong predictions?

Branch prediction buffer:
PC        Prediction
0x0100    1
0x0154    0
0x0210    1
...       1
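The question about wrong predictions can be explored with a short simulation of a single-bit predictor, which always predicts that a branch will behave as it did last time. The slide does not state the initial prediction, so the starting value here is an assumption and is exposed as a parameter; the function name is also chosen for this sketch.

```python
def one_bit_mispredictions(outcomes: str, initial: str = "T") -> int:
    """Count wrong predictions of a 1-bit predictor.

    `outcomes` is a string of 'T' (taken) and 'N' (not taken).
    `initial` is the assumed starting prediction (not given in the source).
    """
    prediction = initial
    wrong = 0
    for actual in outcomes:
        if actual != prediction:
            wrong += 1
        prediction = actual  # remember only the most recent outcome
    return wrong


if __name__ == "__main__":
    history = "TTTTNTTTTTTTTTTTT"  # outcome sequence from the slide
    print(one_bit_mispredictions(history, initial="T"))
    print(one_bit_mispredictions(history, initial="N"))
```

Starting from a "taken" prediction, the single not-taken outcome costs two mispredictions: one on the N itself and one on the T that follows it.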

Dynamic Branch Prediction: 2-bit predictors. Branch prediction buffer with entries for PCs 0x0100, 0x0154, and 0x0210. [Figure: 2-bit prediction values as transcribed: 00; 11 10 11 11 11 10 00 01.]
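One common way to realize a 2-bit predictor is a saturating counter: states 00 and 01 predict not taken, states 10 and 11 predict taken, and the counter moves one step per outcome. The slides do not say which 2-bit scheme or initial state is intended, so both are assumptions in this sketch, as is the function name.

```python
def two_bit_mispredictions(outcomes: str, state: int = 3) -> int:
    """Count wrong predictions of a 2-bit saturating-counter predictor.

    `state` is the assumed initial counter value, 0 (binary 00) to 3 (binary 11).
    """
    if not 0 <= state <= 3:
        raise ValueError("state must be between 0 and 3")
    wrong = 0
    for actual in outcomes:
        taken = actual == "T"
        predicted_taken = state >= 2      # 10 and 11 predict taken
        if predicted_taken != taken:
            wrong += 1
        # Move toward 11 on taken, toward 00 on not taken, saturating.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return wrong


if __name__ == "__main__":
    history = "TTTTNTTTTTTTTTTTT"
    print(two_bit_mispredictions(history, state=3))
    print(two_bit_mispredictions(history, state=0))
```

Starting in state 11, the isolated not-taken outcome causes only one misprediction, because a single surprise is not enough to flip the prediction.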