Forwarding. March 31, Howard Huang 1

Similar documents
Processor Design Pipelined Processor (II) Hung-Wei Tseng

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

LECTURE 5. Single-Cycle Datapath and Control

ECE232: Hardware Organization and Design

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Processor (I) - datapath & control. Hwansoo Han

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Chapter 4. The Processor

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Chapter 4 The Processor 1. Chapter 4A. The Processor

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Chapter 4. The Processor

Pipelined datapath Staging data. CS2504, Spring'2007 Dimitris Nikolopoulos

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Chapter 4. The Processor

Systems Architecture

CPE 335. Basic MIPS Architecture Part II

CSEN 601: Computer System Architecture Summer 2014

Chapter 4 The Processor 1. Chapter 4B. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

ECS 154B Computer Architecture II Spring 2009

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

University of Jordan Computer Engineering Department CPE439: Computer Design Lab

CSE 2021 COMPUTER ORGANIZATION

ECE232: Hardware Organization and Design

Inf2C - Computer Systems Lecture Processor Design Single Cycle

COMPUTER ORGANIZATION AND DESIGN

COMP2611: Computer Organization. The Pipelined Processor

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Processor: Multi- Cycle Datapath & Control

CS 2506 Computer Organization II Test 1. Do not start the test until instructed to do so! printed

Design of the MIPS Processor

Chapter 5: The Processor: Datapath and Control

Chapter 4. The Processor Designing the datapath

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

LECTURE 9. Pipeline Hazards

Chapter 5 Solutions: For More Practice

Processor (II) - pipelining. Hwansoo Han

LECTURE 3: THE PROCESSOR

ECE154A Introduction to Computer Architecture. Homework 4 solution

Pipelining. CSC Friday, November 6, 2015

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

CS232 Final Exam May 5, 2001

CC 311- Computer Architecture. The Processor - Control

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

Lecture 10: Simple Data Path

CSE 141 Computer Architecture Spring Lectures 11 Exceptions and Introduction to Pipelining. Announcements

Multicycle conclusion

Computer Organization and Structure

CSE 378 Midterm 2/12/10 Sample Solution

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)

Topic #6. Processor Design

ECE 313 Computer Organization FINAL EXAM December 13, 2000

ECE 313 Computer Organization FINAL EXAM December 11, Multicycle Processor Design 30 Points

RISC Processor Design

Single Cycle Data Path

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

The MIPS Processor Datapath

Full Datapath. Chapter 4 The Processor 2

ECE260: Fundamentals of Computer Engineering

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

COSC 6385 Computer Architecture - Pipelining

Introduction. Datapath Basics

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

CPE 335 Computer Organization. Basic MIPS Pipelining Part I

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Improving Performance: Pipelining

EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution

ECE260: Fundamentals of Computer Engineering

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

CSE 533: Advanced Computer Architectures. Pipelining. Instructor: Gürhan Küçük. Yeditepe University

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Data Paths and Microprogramming

CS232 Final Exam May 5, 2001

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Math 230 Assembly Programming (AKA Computer Organization) Spring MIPS Intro

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

CPE 335 Computer Organization. Basic MIPS Architecture Part I

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #3 Due Date: At the beginning of lecture, October 20 th, 2010

Lecture 7 Pipelining. Peng Liu.

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Transcription:

Forwarding Just before Spring Break, we introduced a pipelined MIPS processor which executes several instructions simultaneously. Each instruction requires five stages, and five cycles, to complete. Each stage uses different functional units of the datapath. So we can execute up to five instructions in any clock cycle, with each instruction in a different stage and using different hardware. Today we ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding. March 3, 3-3 Howard Huang

The pipelined datapath / PCSrc Control M / / 4 / M Add P C Write Shift left Add Read address Instruction [3-] Read register Read register Read data Read data ALU Zero Result MemWrite Address Instruction Write register Write data isters ALUSrc ALUOp Write data Data Read data MemTo Instr [5 - ] Instr [ - 6] Instr [5 - ] Sign extend Dst MemRead March 3, 3 Forwarding

Pipeline diagram review 3 Clock cycle 4 5 6 7 8 9 lw $8, 4($9) sub $, $4, $5 and $9, $, $ or $6, $7, $8 add $3, $4, $ This diagram shows the execution of an ideal code fragment. Each instruction needs a total of five cycles for execution. One instruction begins on every clock cycle for the first five cycles. One instruction completes on each cycle from that time on. March 3, 3 Forwarding 3

Our examples are too simple Here is the example instruction sequence used to illustrate pipelining on the previous page. lw $8, 4($9) sub $, $4, $5 and $9, $, $ or $6, $7, $8 add $3, $4, $ The instructions in this example are independent. Each instruction reads and writes completely different registers. Our datapath handles this sequence easily, as we saw last time. But most sequences of instructions are not independent! March 3, 3 Forwarding 4

An example with dependencies sub $, $, $3 and $, $, $5 or $3, $6, $ add $4, $, $ sw $5, ($) There are several dependencies in this new code fragment. The first instruction, SUB, stores a value into $. That register is used as a source in the rest of the instructions. This is not a problem for the single-cycle and multicycle datapaths. Each instruction is executed completely before the next one begins. This ensures that instructions through 5 above use the new value of $ (the sub result), just as we expect. How would this code sequence fare in our pipelined datapath? March 3, 3 Forwarding 5

Data hazards in the pipeline diagram 3 Clock cycle 4 5 6 7 8 9 sub $, $, $3 and $, $, $5 or $3, $6, $ add $4, $, $ sw $5, ($) The SUB instruction does not write to register $ until clock cycle 5. This causes two data hazards in our current pipelined datapath. The AND reads register $ in cycle 3. Since SUB hasn t modified the register yet, this will be the old value of $, not the new one. Similarly, the OR instruction uses register $ in cycle 4, again before it s actually updated by SUB. March 3, 3 Forwarding 6

Things that are okay 3 Clock cycle 4 5 6 7 8 9 sub $, $, $3 and $, $, $5 or $3, $6, $ add $4, $, $ sw $5, ($) The ADD instruction is okay, because of the register file design. isters are written at the beginning of a clock cycle. The new value will be available by the end of that cycle. The SW is no problem at all, since it reads $ after the SUB finishes. March 3, 3 Forwarding 7

Dependency arrows 3 Clock cycle 4 5 6 7 8 9 sub $, $, $3 and $, $, $5 or $3, $6, $ add $4, $, $ sw $5, ($) Arrows indicate the flow of data between instructions. The tails of the arrows show when register $ is written. The heads of the arrows show when $ is read. Any arrow that points backwards in time represents a data hazard in our basic pipelined datapath. Here, hazards exist between instructions & and & 3. March 3, 3 Forwarding 8

A fancier pipeline diagram Clock cycle 3 4 5 6 7 8 9 sub $, $, $3 and $, $, $5 or $3, $6, $ add $4, $, $ sw $5, ($) March 3, 3 Forwarding 9

A more detailed look at the pipeline We have to eliminate the hazards, so the AND and OR instructions in our example will use the correct value for register $. Let s look at when the data is actually produced and consumed. The SUB instruction produces its result in its stage, during cycle 3 in the diagram below. The AND and OR need the new value of $ in their stages, during clock cycles 4-5 here. Clock cycle 3 4 5 6 7 sub $, $, $3 and $, $, $5 or $3, $6, $ March 3, 3 Forwarding

Bypassing the register file The actual result $ - $3 is computed in clock cycle 3, before it s needed in cycles 4 and 5. If we could somehow bypass the writeback and register read stages when needed, then we can eliminate these data hazards. Today we just study hazards involving arithmetic instructions like sub. On Wednesday we ll examine the lw instruction. Essentially we need to pass the ALU output from SUB directly to the AND and OR instructions, without going through the register file. Clock cycle 3 4 5 6 7 sub $, $, $3 and $, $, $5 or $3, $6, $ March 3, 3 Forwarding

Where to find the ALU result The ALU result generated in the stage is normally passed through the pipeline registers to the and stages, before it is finally written to the register file. This is an abridged diagram of our pipelined datapath. / / / / PC Instruction isters ALU Data Rt Rd March 3, 3 Forwarding

Forwarding Since the pipeline registers already contain the ALU result, we could just forward that value to subsequent instructions, to prevent data hazards. In clock cycle 4, the AND instruction can get the value $ - $3 from the / pipeline register used by sub. Then in cycle 5, the OR can get that same result from the / pipeline register being used by SUB. Clock cycle 3 4 5 6 7 sub $, $, $3 and $, $, $5 or $3, $6, $ March 3, 3 Forwarding 3

Outline of forwarding hardware A forwarding unit selects the correct ALU inputs for the stage. If there is no hazard, the ALU s operands will come from the register file, just like before. If there is a hazard, the operands will come from either the / or / pipeline registers instead. The ALU sources will be selected by two new multiplexers, with control signals named ForwardA and ForwardB. sub $, $, $3 and $, $, $5 or $3, $6, $ March 3, 3 Forwarding 4

Simplified datapath with forwarding muxes / / / / PC Instruction isters ForwardA ALU Data Rt Rd ForwardB March 3, 3 Forwarding 5

Detecting / data hazards So how can the hardware determine if a hazard exists? An / hazard occurs between the instruction currently in its stage and the previous instruction if:. The previous instruction will write to the register file, and. The destination is one of the ALU source registers in the stage. There is an / hazard between the two instructions below. sub $, $, $3 and $, $, $5 Data in a pipeline register can be referenced using a class-like syntax. For example, /.isterrt refers to the rt field stored in the / pipeline. March 3, 3 Forwarding 6

/ data hazard equations The first ALU source comes from the pipeline register when necessary. if (/.Write = and /.isterrd = /.isterrs) then ForwardA = The second ALU source is similar. if (/.Write = and /.isterrd = /.isterrt) then ForwardB = sub $, $, $3 and $, $, $5 March 3, 3 Forwarding 7

Detecting / data hazards A / hazard may occur between an instruction in the stage and the instruction from two cycles ago. One new problem is if a register is updated twice in a row. add $, $, $3 add $, $, $4 sub $5, $5, $ The SUB depends on both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded. add $, $, $3 add $, $, $4 sub $5, $5, $ March 3, 3 Forwarding 8

/ hazard equations Here is an equation for detecting and handling / hazards for the first ALU source. if (/.Write = and /.isterrd = /.isterrs and /.isterrd /.isterrs) then ForwardA = The second ALU operand is handled similarly. if (/.Write = and /.isterrd = /.isterrt and /.isterrd /.isterrt) then ForwardB = March 3, 3 Forwarding 9

Simplified datapath with forwarding / / / / PC Instruction isters ForwardA ALU Data Rt Rd Rs ForwardB /. isterrt /.isterrd Forwarding Unit /.isterrd /. isterrs March 3, 3 Forwarding

The forwarding unit The forwarding unit has several control signals as inputs. /.isterrs /.isterrd /.isterrd /.isterrt /.Write /.Write (The two Write signals are not shown in the diagram, but they come from the control unit.) The fowarding unit outputs are selectors for the ForwardA and ForwardB multiplexers attached to the ALU. These outputs are generated from the inputs using the equations on the previous pages. Some new buses route data from pipeline registers to the new muxes. March 3, 3 Forwarding

Example sub $, $, $3 and $, $, $5 or $3, $6, $ add $4, $, $ sw $5, ($) Assume again each register initially contains its number plus. After the first instruction, $ should contain - ( - 3). The other instructions should all use - as one of their operands. We ll try to keep the example short. Assume no forwarding is needed except for register $. We ll skip the first two cycles, since they re the same as before. March 3, 3 Forwarding

Clock cycle 3 : or $3, $6, $ : and $, $, $5 : sub $, $, $3 / / / / PC Instruction 5 X X isters 5 3 3 ALU - Data 5 (Rt) (Rd) (Rs) /. isterrt /.isterrd 3 Forwarding Unit /. isterrs /.isterrd March 3, 3 Forwarding 3

Clock cycle 4: forwarding $ from / : add $4, $, $ : or $3, $6, $ : and $, $, $5 : sub $, $, $3 / / / / PC Instruction 6 X X isters 6 5-5 ALU 4 - Data (Rt) 3 (Rd) 6 (Rs) /. isterrt /.isterrd 5 Forwarding Unit /. isterrs - /.isterrd March 3, 3 Forwarding 4

Clock cycle 5: forwarding $ from / : sw $5, ($) : add $4, $, $ : or $3, $6, $ : and $, $, $5 : sub $, $, $3 / / / / PC Instruction - isters - - - 6 6 - ALU - 4 Data X (Rt) 4 (Rd) (Rs) /. isterrt 3 3 /.isterrd - Forwarding Unit /. 6 isterrs 4 /.isterrd - March 3, 3 Forwarding 5

Lots of data hazards The first data hazard occurs during cycle 4. The forwarding unit notices that the ALU s first source register for the AND is also the destination of the SUB instruction. The correct value is forwarded from the / register, overriding the incorrect old value still in the register file. A second hazard occurs during clock cycle 5. The ALU s second source (for OR) is the SUB destination again. This time, the value has to be forwarded from the / pipeline register instead. There are no other hazards involving the SUB instruction. During cycle 5, SUB writes its result back into register $. The ADD instruction can read this new value from the register file in the same cycle. March 3, 3 Forwarding 6

Complete pipelined datapath...so far / / Control M / / M PC Addr Instr Instruction Read register Read register Write register Write data Instr [5 - ] Rt Rd Rs Read data Read data isters Extend ALUSrc ALU Zero Dst Result Address Data Write data Read data /.isterrd Forwarding Unit /.isterrd March 3, 3 Forwarding 7

Miscellaneous comments Each MIPS instruction writes to at most one register. This makes the forwarding hardware easier to design, since there is only one destination register that ever needs to be forwarded. How could you handle the lw+ instruction from the homework? Forwarding is especially important, but especially difficult, with deep pipelines like the ones in all current PC processors. Section 6.4 of the textbook has some additional material not shown here. Their hazard detection equations also ensure that the source register is not $, which can never be modified. There is a more complex example of forwarding, with several cases covered. Take a look at it! March 3, 3 Forwarding 8

Summary In real code, most instructions are dependent upon other ones. This can lead to data hazards in our original pipelined datapath. Instructions can t write back to the register file soon enough for the next two instructions to read. Forwarding eliminates data hazards involving arithmetic instructions. The forwarding unit detects hazards by comparing the destination registers of previous instructions to the source registers of the current instruction. Hazards are avoided by grabbing results from the pipeline registers before they are written back to the register file. Next time we ll finish up pipelining. Forwarding can t save us in some cases involving lw. We still haven t talked about branches for the pipelined datapath. March 3, 3 Forwarding 9