The Processor: Improving the performance - Control Hazards

Similar documents
LECTURE 3: THE PROCESSOR

Chapter 4 The Processor 1. Chapter 4B. The Processor

Full Datapath. Chapter 4 The Processor 2

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

COMPUTER ORGANIZATION AND DESIGN

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

COMPUTER ORGANIZATION AND DESI

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Thomas Polzer Institut für Technische Informatik

Chapter 4. The Processor

EIE/ENE 334 Microprocessors

CSEE 3827: Fundamentals of Computer Systems

Chapter 4. The Processor

Chapter 4. The Processor

Lecture 9 Pipeline and Cache

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CENG 3420 Lecture 06: Pipeline

Chapter 4. The Processor

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

MIPS An ISA for Pipelining

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Chapter 4. The Processor. Jiang Jiang

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

1 Hazards COMP2611 Fall 2015 Pipelined Processor

The Processor: Instruction-Level Parallelism

There are different characteristics for exceptions. They are as follows:

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

Computer Architecture

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

ECE232: Hardware Organization and Design

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 2: Processor and Pipelining 1

Instr. execution impl. view

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

Pipelining. CSC Friday, November 6, 2015

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Pipelining - Part 2. CSE Computer Systems October 19, 2001

Control Hazards. Prediction

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

LECTURE 10. Pipelining: Advanced ILP

Instruction-Level Parallelism and Its Exploitation (Part III) ECE 154B Dmitri Strukov

Modern Computer Architecture

Control Hazards. Branch Prediction

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

5 th Edition. The Processor We will examine two MIPS implementations A simplified version A more realistic pipelined version

Homework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures

Chapter 4 The Processor (Part 4)

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

Complex Pipelines and Branch Prediction

(Refer Slide Time: 00:02:04)

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

ECE331: Hardware Organization and Design

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Instruction Pipelining Review

ELE 655 Microprocessor System Design

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University

Processor (II) - pipelining. Hwansoo Han

What about branches? Branch outcomes are not known until EXE What are our options?

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Instruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties

Full Datapath. Chapter 4 The Processor 2

CS311 Lecture: Pipelining and Superscalar Architectures

CSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;

ECE 486/586. Computer Architecture. Lecture # 12

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor

ECE260: Fundamentals of Computer Engineering

Assignment 1 solutions

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

ECE 4750 Computer Architecture, Fall 2017 T13 Advanced Processors: Branch Prediction

Quiz 5 Mini project #1 solution Mini project #2 assigned Stalling recap Branches!

CS 152 Computer Architecture and Engineering. Lecture 5 - Pipelining II (Branches, Exceptions)

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

EITF20: Computer Architecture Part2.2.1: Pipeline-1

COSC121: Computer Systems. ISA and Performance

COMPUTER ORGANIZATION AND DESIGN

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Chapter 4. The Processor

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Quiz 5 Mini project #1 solution Mini project #2 assigned Stalling recap Branches!

Multi-cycle Instructions in the Pipeline (Floating Point)

Transcription:

The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU

Summary Previous Class Reducing pipeline data hazards Today: Reducing pipeline branch hazards Delayed Branch Branch Prediction Exceptions and Interrupts 2

Branch Hazards When the flow of instruction addresses is not sequential (i.e., PC = PC + 4); incurred by change of flow instructions Unconditional branches (j, jal, jr) Conditional branches (beq, bne) Exceptions Possible approaches Stall (impacts CPI) Move decision point as early in the pipeline as possible, thereby reducing the number of stall cycles Delay decision (requires compiler support) Predict and hope for the best! Control hazards occur less frequently than data hazards, but there is nothing as effective against control hazards as forwarding is for data hazards 3

Branch Instructions Cause Control Hazards Dependencies backward in time cause hazards I n s t r. O r d e r beq Inst_target Inst 3 Inst 4 IM Reg DM Reg IM Reg DM Reg IM Reg DM Reg IM Reg DM Reg 4

Pipelined Control (Simplified) 5

Branch Hazards nop instruction (or bubble) inserted between two instructions in the pipeline Keep the instructions earlier in the pipeline (later in the code) from progressing down the pipeline for a cycle ( bounce them in place with write control signals) Insert nop by zeroing control bits in the pipeline register at the appropriate stage Let the instructions later in the pipeline (earlier in the code) progress normally down the pipeline Flushes (or instruction squashing) were an instruction in the pipeline is replaced with a nop instruction (as done for instructions located sequentially after j instructions) Zero the control bits for the instruction to be flushed 6

One Way to Fix a Branch Control Hazard I n s t r. beq flush IM Reg DM Reg IM Reg DM Reg Fix branch hazard by waiting flush but affects CPI O r d e r flush flush beq target IM Reg DM Reg IM Reg DM Reg IM Reg DM Reg Inst 3 IM Reg DM 7

Pipelined Control (Simplified) 8

Branch Hazards Move branch decision hardware back to as early in the pipeline as possible i.e., during the decode cycle I n s t r. beq flush IM Reg DM Reg IM Reg DM Reg Fix remaining jump hazard by waiting flush O r d e r beq target One stall is still needed IM Reg DM Reg 9

Reducing the Delay of Branches Move the branch decision hardware back to the EX stage Add hardware to compute the branch target address and evaluate the branch decision to the ID stage Reduces the number of stall (flush) cycles to one like with jumps but now need to add forwarding hardware in ID stage For deeper pipelines, branch decision points can be even later in the pipeline, incurring more stalls 10

Delayed Branches If the branch hardware has been moved to the ID stage, then we can eliminate all branch stalls with delayed branches which are defined as always executing the next sequential instruction after the branch instruction the branch takes effect after that next instruction MIPS compiler moves an instruction to immediately after the branch that is not affected by the branch (a safe instruction) thereby hiding the branch delay With deeper pipelines, the branch delay grows requiring more than one delay slot Delayed branches have lost popularity compared to more expensive but more flexible (dynamic) hardware branch prediction Growth in available transistors has made hardware branch prediction relatively cheaper 11

Scheduling Branch Delay Slots A. From before branch B. From branch target C. From fall through add $1,$2,$3 bne $2,$0,40 delay slot sub $4,$5,$6 add $1,$2,$3 bne $2,$0,-40 delay slot add $1,$2,$3 bne $2,$0,40 delay slot sub $4,$5,$6 becomes becomes becomes sub $4,$5,$6 add $1,$2,$3 bne $2,$0,40 bne $2,$0,44 add $1,$2,$3 add $1,$2,$3 bne $2,$0,-36 sub $4,$5,$6 A is the best choice, fills delay slot and reduces IC sub $4,$5,$6 In B and C, the sub instruction may need to be copied, increasing IC In B and C, must be okay to execute sub when branch fails 12

Branch Prediction Longer pipelines can t readily determine branch outcome early Stall penalty becomes unacceptable Predict outcome of branch Only stall if prediction is wrong In MIPS pipeline Can predict branches not taken Fetch instruction after branch, with no delay 13

Static Branch Prediction Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome Predict not taken always predict branches will not be taken, continue to fetch from the sequential instruction stream, only when branch is taken does the pipeline stall If taken, flush instructions after the branch (earlier in the pipeline) ensure that those flushed instructions haven t changed the machine state restart the pipeline at the branch destination 14

MIPS with Predict Not Taken Prediction correct Prediction incorrect 15

Flushing with Misprediction (Not Taken) I n s t r. O r d e r 4 beq $1,$2,2 8 sub $4,$1,$5 flush 16 and $6,$1,$7 20 or r8,$1,$9 IM Reg DM Reg IM Reg DM Reg IM Reg DM Reg IM Reg DM Reg 16

Static Branch Prediction Predict not taken works well for top of the loop branching structures But such loops have jumps at the bottom of the loop to return to the top of the loop Loop: beq $1,$2,Out 1 nd loop instr... last loop instr j Loop Out: fall out instr Predict not taken doesn t work well for bottom of the loop branching structures Loop: 1 st loop instr 2 nd loop instr... last loop instr bne $1,$2,Loop fall out instr 17

More-Realistic Branch Prediction Static branch prediction Based on typical branch behavior Example: loop and if-statement branches Predict backward branches taken Predict forward branches not taken Dynamic branch prediction Hardware measures actual branch behavior e.g., record recent history of each branch Assume future behavior will continue the trend When wrong, stall while re-fetching, and update history 18

Dynamic Branch Prediction Dynamic prediction: Branch prediction buffer (aka branch history table) Indexed by recent branch instruction addresses Stores outcome (taken/not taken) To execute a branch Check table, expect the same outcome Start fetching from fall-through or target If wrong, flush pipeline and flip prediction 19

1-Bit Predictor: Shortcoming Inner loop branches mispredicted twice! outer: inner: beq,, inner beq,, outer Mispredict as taken on last iteration of inner loop Then mispredict as not taken on first iteration of inner loop next time around 20

2-Bit Predictor Only change prediction on two successive mispredictions 21

Exceptions and Interrupts Unexpected events requiring change in flow of control Different ISAs use the terms differently Exception Arises within the CPU e.g., undefined opcode, overflow, syscall, Interrupt From an external I/O controller Dealing with them without sacrificing performance is hard. 22

Handling Exceptions In MIPS, exceptions managed by a System Control Coprocessor (CP0) Save PC of offending (or interrupted) instruction In MIPS: Exception Program Counter (EPC) Save indication of the problem In MIPS: Cause register We ll assume 1-bit 0 for undefined opcode, 1 for overflow Jump to handler at 8000 0180 23

An Alternate Mechanism Vectored Interrupts Handler address determined by the cause Example: Undefined opcode: C000 0000 Overflow: C000 0020 : C000 0040 Instructions either Deal with the interrupt, or Jump to real handler 24

Handler Actions Read cause, and transfer to relevant handler Determine action required If restartable Take corrective action Use EPC to return to program Otherwise Terminate program Report error using EPC, cause, 25

Exceptions in a Pipeline Another form of control hazard Consider overflow on add in EX stage add $1, $2, $1 Prevent $1 from being clobbered Complete previous instructions Flush add and subsequent instructions Set Cause and EPC register values Transfer control to handler Similar to mispredicted branch Use much of the same hardware 26

Exception Properties Restartable exceptions Pipeline can flush the instruction Handler executes, then returns to the instruction Refetched and executed from scratch PC saved in EPC register Identifies causing instruction Actually PC + 4 is saved Handler must adjust 27

Multiple Exceptions Pipelining overlaps multiple instructions Could have multiple exceptions at once Simple approach: deal with exception from earliest instruction Flush subsequent instructions Precise exceptions In complex pipelines Multiple instructions issued per cycle Out-of-order completion Maintaining precise exceptions is difficult! 28

Imprecise Exceptions Just stop pipeline and save state Including exception cause(s) Let the handler work out Which instruction(s) had exceptions Which to complete or flush May require manual completion Simplifies hardware, but more complex handler software Not feasible for complex multiple-issue out-of-order pipelines 29

Conclusion Must detect and resolve hazards Control hazards put the branch decision hardware in as early a stage in the pipeline as possible Stall (impacts CPI) Delay decision (requires compiler support) Static and dynamic prediction (requires hardware support) Pipelining complicates exception handling 30

Next Class Multiple issue processors Instruction-Level Parallelism (ILP) 31

The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU