MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Similar documents
3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

Pipelining. CSC Friday, November 6, 2015

Processor (II) - pipelining. Hwansoo Han

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Full Datapath. Chapter 4 The Processor 2

Computer Architecture. Lecture 6.1: Fundamentals of

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Chapter 4. The Processor

LECTURE 3: THE PROCESSOR

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4. The Processor

ECE260: Fundamentals of Computer Engineering

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Chapter 4 (Part II) Sequential Laundry

Full Datapath. Chapter 4 The Processor 2

COMPUTER ORGANIZATION AND DESIGN

The Processor: Improving the performance - Control Hazards

ECE 154A Introduction to. Fall 2012

CSEE 3827: Fundamentals of Computer Systems

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN

COMP2611: Computer Organization. The Pipelined Processor

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

The Processor: Instruction-Level Parallelism

Outline Marquette University

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Chapter 4 The Processor 1. Chapter 4B. The Processor

Introduction to Pipelined Datapath

ECE232: Hardware Organization and Design

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

COMPUTER ORGANIZATION AND DESIGN

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

1 Hazards COMP2611 Fall 2015 Pipelined Processor

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

Lecture 7 Pipelining. Peng Liu.

Thomas Polzer Institut für Technische Informatik

EIE/ENE 334 Microprocessors

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Chapter 4. The Processor

Pipelining. Maurizio Palesi

Chapter 4. The Processor

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Instr. execution impl. view

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

CENG 3420 Lecture 06: Pipeline

Pipeline: Introduction

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Chapter 4. The Processor

EITF20: Computer Architecture Part2.2.1: Pipeline-1

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

Pipelining: Hazards Ver. Jan 14, 2014

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

SI232 Set #20: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Chapter 6 ADMIN. Reading for Chapter 6: 6.1,

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

14:332:331 Pipelined Datapath

ECEC 355: Pipelining

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Computer Architecture

Instruction Pipelining

Lecture 2: Processor and Pipelining 1

Modern Computer Architecture

What is Pipelining? RISC remainder (our assumptions)

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Instruction Pipelining

Pipeline Review. Review

COSC 6385 Computer Architecture - Pipelining

Pipelined Processor Design

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

5 th Edition. The Processor We will examine two MIPS implementations A simplified version A more realistic pipelined version

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz

MIPS An ISA for Pipelining

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Pipelined Processor Design

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Lecture 6: Pipelining

Computer Systems Architecture Spring 2016

ECE331: Hardware Organization and Design

ארכי טק טורת יחיד ת עיבוד מרכזי ת

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

ECE331: Hardware Organization and Design

Chapter 4. The Processor

CPE 335 Computer Organization. Basic MIPS Pipelining Part I

Transcription:

MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK and from Prof. Mary Jane Irwin, PSU

Summary Previous Class The MIPS architecture Today: Pipeline Pipeline hazards 2

MIPS Datapath 3

Making It Faster Start fetching and executing the next instruction before the current one has completed Pipelining modern processors are pipelined for performance CPU time = CPI * CC * IC Start the next instruction before the current one has completed improves throughput total amount of work done in a given time instruction latency (execution time, delay time, response time - time from the start of an instruction to its completion) is not reduced Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 lw IFetch Dec Exec Mem WB sw IFetch Dec Exec Mem WB R-type IFetch Dec Exec Mem WB 4

Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Speedup Four loads: Speedup = 8 / 3.5 = 2.3 Non-stop: Speedup = 2n / (0.5n + 1.5) 4 = number of stages 5

MIPS Pipeline Five stages, one step per stage 1. IF: Instruction fetch from memory 2. ID: Instruction decode & register read 3. EX: Execute operation or calculate address 4. MEM: Access memory operand 5. WB: Write result back to register Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 IF ID EX MEM WB IM Reg MEM Reg ALU 6

Pipeline Performance Assume time for stages is 100ps for register read or write 200ps for other stages Compare pipelined datapath with single-cycle datapath Instr Instr fetch Register read ALU op Memory access Register write Total time lw 200ps 100 ps 200ps 200ps 100 ps 800ps sw 200ps 100 ps 200ps 200ps 700ps R-format 200ps 100 ps 200ps 100 ps 600ps beq 200ps 100 ps 200ps 500ps 7

Pipeline Performance Single-cycle (T c = 800ps) Pipelined (T c = 200ps) 8

Pipeline for Performance Time (clock cycles) I n s t r. Inst 0 Inst 1 ALU IM Reg DM Reg ALU IM Reg DM Reg Once the pipeline is full, one instruction is completed every cycle, so CPI = 1 O r d e r Inst 2 Inst 3 Inst 4 ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg Time to fill the pipeline 9

Pipeline Speedup If all stages are balanced i.e., all take the same time Time between instructions pipelined = Time between instructions nonpipelined Number of stages If not balanced, speedup is less Speedup due to increased throughput Latency (time for each instruction) does not decrease 10

Pipelining and ISA Design MIPS ISA designed for pipelining All instructions are 32-bits easier to fetch and decode in one cycle c.f. x86: 1- to 17-byte instructions Few and regular instruction formats can decode and read registers in one step Memory operations occur only in loads and stores can use the execute stage to calculate memory addresses Each instruction writes at most one result and does it in the last few pipeline stages (MEM or WB) Operands must be aligned in memory a single data transfer takes only one data memory access 11

Pipeline Registers Need registers between stages To hold information produced in previous cycle 12

Pipeline Operation Cycle-by-cycle flow of instructions through the pipelined datapath Single-clock-cycle pipeline diagram Shows pipeline usage in a single cycle Highlight resources used c.f. multi-clock-cycle diagram Graph of operation over time We ll look at single-clock-cycle diagrams for load & store 13

IF for Load, Store, 14

ID for Load, Store, 15

EX for Load 16

MEM for Load 17

WB for Load Wrong register number 18

Corrected Datapath for Load 19

EX for Store 20

MEM for Store 21

WB for Store 22

Multi-Cycle Pipeline Diagram Form showing resource usage 23

Multi-Cycle Pipeline Diagram Traditional form 24

Single-Cycle Pipeline Diagram State of pipeline in a given cycle 25

Pipelined Control (Simplified) 26

Pipelined Control Control signals derived from instruction As in single-cycle implementation 27

Pipelined Control 28

Can Pipelining Get Us Into Trouble? Pipeline Hazards - Situations that prevent starting the next instruction in the next cycle structural hazards: attempt to use the same resource by two different instructions at the same time data hazards: deciding on control action depends on previous instruction An instruction s source operand(s) are produced by a prior instruction still in the pipeline control hazards: attempt to make a decision about program control flow before the condition has been evaluated by a previous instruction branch and jump instructions, exceptions Can usually resolve hazards by waiting (stall) pipeline control must detect the hazard and take action to resolve hazards 29

Structural Hazards Conflict for use of a resource In MIPS pipeline with a single memory Load/store requires data access Instruction fetch would have to stall for that cycle Would cause a pipeline bubble Hence, pipelined datapaths require separate instruction/data memories Or separate instruction/data caches 30

Data Hazards An instruction depends on completion of data access by a previous instruction add $s0, $t0, $t1 sub $t2, $s0, $t3 31

Forwarding (aka Bypassing) Use result when it is computed Don t wait for it to be stored in a register Requires extra connections in the datapath 32

Load-Use Data Hazard Can t always avoid stalls by forwarding If value not computed when needed Can t forward backward in time! 33

Code Scheduling to Avoid Stalls Reorder code to avoid use of load result in the next instruction C code for A = B + E; C = B + F; stall stall lw $t1, 0($t0) lw $t2, 4($t0) add $t3, $t1, $t2 sw $t3, 12($t0) lw $t4, 8($t0) add $t5, $t1, $t4 sw $t5, 16($t0) 13 cycles lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0) add $t3, $t1, $t2 sw $t3, 12($t0) add $t5, $t1, $t4 sw $t5, 16($t0) 11 cycles 34

Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always fetch correct instruction Still working on ID stage of branch In MIPS pipeline Need to compare registers and compute target early in the pipeline Add hardware to do it in ID stage 35

Stall on Branch Wait until branch outcome determined before fetching next instruction 36

Branch Prediction Longer pipelines can t readily determine branch outcome early Stall penalty becomes unacceptable Predict outcome of branch Only stall if prediction is wrong In MIPS pipeline Can predict branches not taken Fetch instruction after branch, with no delay 37

MIPS with Predict Not Taken Prediction correct Prediction incorrect 38

More-Realistic Branch Prediction Static branch prediction Based on typical branch behavior Example: loop and if-statement branches Predict backward branches taken Predict forward branches not taken Dynamic branch prediction Hardware measures actual branch behavior e.g., record recent history of each branch Assume future behavior will continue the trend When wrong, stall while re-fetching, and update history 39

Pipeline in a nutshell The BIG Picture Pipelining improves performance by increasing instruction throughput Executes multiple instructions in parallel Each instruction has the same latency Subject to hazards Structure, data, control Instruction set design affects complexity of pipeline implementation 40

Other Pipeline Structures Are Possible What about the (slow) multiply operation? make the clock twice as slow or let it take two cycles (since it doesn t MUL use the DM stage) IM Reg DM Reg ALU What if the data memory access is twice as slow as the instruction memory? make the clock twice as slow or let data memory access take two cycles (and keep the same clock rate) IM Reg DM1 DM2 Reg ALU 41

Other Sample Pipeline Alternatives ARM7 IM Reg EX PC update IM access decode reg access ALU op DM access shift/rotate commit result (write back) XScale IM access shift/rotate reg 2 access IM1 IM2 Reg SHFT DM1 ALU start DM access exception Reg DM2 PC update BTB access start IM access decode reg 1 access ALU op DM write reg write 42

Conclusion All modern day processors use pipelining Pipelining doesn t help latency of single instruction, it helps throughput of entire workload Potential speedup: a CPI of 1 and faster CC Pipeline rate limited by slowest pipeline stage Unbalanced pipe stages makes for inefficiencies The time to fill pipeline and time to drain it can impact speedup for deep pipelines and short code runs Must detect and resolve hazards Stalling negatively affects CPI (makes CPI higher than the ideal of 1) 43

Next Classes Reducing pipeline data and branch hazards Forwarding Delayed Branch & Branch Prediction Exceptions and Interrupts 44

MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK and from Prof. Mary Jane Irwin, PSU