CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

Similar documents
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

ECE 486/586. Computer Architecture. Lecture # 12

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Instruction Pipelining Review

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

There are different characteristics for exceptions. They are as follows:

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch

Lecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions

COSC4201 Pipelining. Prof. Mokhtar Aboelaze York University

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

CISC 662 Graduate Computer Architecture Lecture 7 - Multi-cycles

Overview. Appendix A. Pipelining: Its Natural! Sequential Laundry 6 PM Midnight. Pipelined Laundry: Start work ASAP

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

CISC 662 Graduate Computer Architecture Lecture 7 - Multi-cycles. Interrupts and Exceptions. Device Interrupt (Say, arrival of network message)

Appendix A. Overview

Exception Handling. Precise Exception Handling. Exception Types. Exception Handling Terminology

Appendix C: Pipelining: Basic and Intermediate Concepts

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Pipelining: Basic and Intermediate Concepts

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1

LECTURE 3: THE PROCESSOR

CS4617 Computer Architecture

The Processor: Improving the performance - Control Hazards

Instr. execution impl. view

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Appendix C. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

Anti-Inspiration. It s true hard work never killed anybody, but I figure, why take the chance? Ronald Reagan, US President

COSC 6385 Computer Architecture - Pipelining (II)

Chapter 3. Pipelining. EE511 In-Cheol Park, KAIST

Announcement. ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining. Edward Suh Computer Systems Laboratory

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Basic Pipelining Concepts

Modern Computer Architecture

COMPUTER ORGANIZATION AND DESIGN

MIPS An ISA for Pipelining

CSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;

Chapter 4. The Processor

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Full Datapath. Chapter 4 The Processor 2

EE 457 Unit 8. Exceptions What Happens When Things Go Wrong

Outline Marquette University

CS422 Computer Architecture

Chapter 4. The Processor

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations

ELE 655 Microprocessor System Design

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

POLITECNICO DI MILANO. Exception handling. Donatella Sciuto:

What are Exceptions? EE 457 Unit 8. Exception Processing. Exception Examples 1. Exceptions What Happens When Things Go Wrong

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Performance of tournament predictors In the last lecture, we saw the design of the tournament predictor used by the Alpha

LECTURE 10. Pipelining: Advanced ILP

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Chapter 4 The Processor 1. Chapter 4B. The Processor

Computer Architecture

Pipelining. CSC Friday, November 6, 2015

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

COMPUTER ORGANIZATION AND DESI

Computer Systems Architecture I. CSE 560M Lecture 5 Prof. Patrick Crowley

COMP 4211 Seminar Presentation

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Administrivia. CMSC 411 Computer Systems Architecture Lecture 6. When do MIPS exceptions occur? Review: Exceptions. Answers to HW #1 posted

Lecture 7: Static ILP and branch prediction. Topics: static speculation and branch prediction (Appendix G, Section 2.3)

Advanced issues in pipelining

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Computer Architecture. Lecture 6.1: Fundamentals of

Pipelining: Overview. CPSC 252 Computer Organization Ellen Walker, Hiram College

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations

Multi-cycle Instructions in the Pipeline (Floating Point)

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Pipelining. Maurizio Palesi

ECE 313 Computer Organization FINAL EXAM December 13, 2000

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

5008: Computer Architecture HW#2

Complications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

CSEE 3827: Fundamentals of Computer Systems

COMPUTER ORGANIZATION AND DESIGN

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Pipeline Architecture RISC

Pipeline Complexities. Pipeline Hazards

Dynamic Control Hazard Avoidance

Slide Set 7. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng

Appendix C. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

HY425 Lecture 05: Branch Prediction

The Big Picture Problem Focus S re r g X r eg A d, M lt2 Sub u, Shi h ft Mac2 M l u t l 1 Mac1 Mac Performance Focus Gate Source Drain BOX

Transcription:

CMCS 611-101 Advanced Computer Architecture Lecture 8 Control Hazards and Exception Handling September 30, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Computer Architecture 1

Previous Lecture: Data Hazards Lecture s Overview Forwarding techniques for simple data hazards resolution Data hazards classifications and detection logic Load-caused d pipeline stalls and how to limit it their scope Compiler-based instruction scheduling to avoid pipeline stalls Implementation of data hazard detection and forwarding logic This Lecture Control hazards Pipelining and exception handling Mohamed Younis CMCS 611, Advanced Computer Architecture 2

Pipeline Hazards Pipeline hazards are cases that affect instruction execution semantics and thus need to be detected and corrected Hazards types Structural hazard: attempt to use a resource two different ways at same time E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV) Single memory for instruction ti and data Data hazard: attempt to use item before it is ready E.g., one sock of pair in dryer and one in washer; can t fold until get sock from washer through dryer instruction depends on result of prior instruction still in the pipeline Control hazard: attempt to make a decision before condition is evaluated E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in branch instructions Hazards can always be resolved by waiting Mohamed Younis CMCS 611, Advanced Computer Architecture 3

Control Hazard Control hazards are caused by the uncertainty of the execution path, branch taken or untaken If the branch is to be taken, the PC is not normally changed until the end of the MEM stage The simplest way to deal with control hazards is to stall the pipeline Branch Instruction IF ID EX DM WB Branch successor IF Stall Stall IF ID EX DM WB ID Branch successor + 1 IF ID EX DM WB Branch successor + 2 IF ID EX DM Impact: 2 clock cycles wasted per branch instruction slow Performance penalty can be limited by: move up decision to 2 nd stage by adding hardware to check registers as being read (MIPS allows only comparison with zero in the condition) Compute the address of the branch target earlier in the pipeline Mohamed Younis CMCS 611, Advanced Computer Architecture 4

Taken address calculation Enhanced Pipeline Datapath Result of condition evaluation When taken, IF/ID is reset to zero to make it a nop Mohamed Younis CMCS 611, Advanced Computer Architecture 5

Impact of Modifications Condition evaluated and taken address calculated at ID stage I n s t r. Add Time (clock cycles) ALU Mem Reg Mem Reg O r d e r Beq Load ALU Mem Reg Mem Reg Stall Mem ALU Reg Mem Reg 2 clock cycles per branch instr. (1 cycle wasted) still slow * Figure is courtesy of Dave Patterson Mohamed Younis CMCS 611, Advanced Computer Architecture 6

Control Hazard Solution (I) Predict: guess one direction then back off if wrong Predict untaken: machine state must be updated to ensure correct semantics Impact: 1 clock cycles per branch instruction if right, 2 if wrong Taken Branch Instr. IF ID EX DM WB Branch successor IF idle idle idle idle Branch target IF ID EX DM WB Branch target + 1 IF ID EX DM WB Branch target + 2 IF ID EX DM WB I n s t r. O r d e r Add Beq Load Time (clock cycles) ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg Mem ALU Reg Mem Reg * Figure is courtesy of Dave Patterson Mohamed Younis CMCS 611, Advanced Computer Architecture 7

Branch Behavior in Programs Forward taken nditional branches Percentag ge of all co Backward taken k Benchmar Forward conditional branches Backward conditional branches Unconditional branches Benchmark Percentage of instructions executed 60% of forward and 15% of backward ard branches are taken (mainly loops) 67% of the branches (forward and backward) are taken on average Mohamed Younis CMCS 611, Advanced Computer Architecture 8

Control Hazard Solution (II) Redefine branch behavior (takes place after next instr.) delayed branch Compiler optimization plays an essential role in filling delay slots I n s t r. O r d e r Add Beq Misc Time (clock cycles) ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg Mem ALU Reg Mem Reg Load Mem AL LU Reg Mem Reg Impact: 0 clock cycles per branch instruction if can find instructions to put in the delay slot * Slide is courtesy of Dave Patterson Mohamed Younis CMCS 611, Advanced Computer Architecture 9

Scheduling Branch-Delay Slots Replication is necessary Must be OK to execute, e.g. R7 is temp reg Best scenario Good for loops Good taken strategy Mohamed Younis CMCS 611, Advanced Computer Architecture 10

Branch-Delay Scheduling Requirements Scheduling Improves performance Requirements Strategy when? (a) From before Branch must not depend on the rescheduled instructions (b) From target (c) From fall through Must be OK to execute rescheduled instructions if branch is not taken. May need to duplicate instructions. Must be okay to execute instructions if branch is taken. Always When branch is taken. May enlarge programs if instructions are duplicated. When branch is not taken. The limitation on delayed-branch scheduling arise from: 1. Restrictions on the instructions that are scheduled into the delay slots 2. Ability to predict at compile-time whether a branch is likely to be taken To improve the ability of the compiler to fill branch delay slots with little undo penalty, a capability for cancellation (turning to no-op) is added Additional PC is needed to allow safe operation in case of interrupts Mohamed Younis CMCS 611, Advanced Computer Architecture 11

Performance of Branch-Delay Percentage of Conditional Branche es Canceled delay slots Empty slot Benchmark On average 30% of the branch delay slots are wasted (integer programs are worse than floating point) Mohamed Younis CMCS 611, Advanced Computer Architecture 12

Static Branch Prediction Examination of program behavior Assume branch is usually taken based on statistics but misprediction rate still range from 9%-59% Predict on branch direction forward/backward d based on statistics ti ti and code generation convention Profile information from earlier program runs Instru ctions betw ween mi ispredictio on Predict taken Profile based Mohamed Younis CMCS 611, Advanced Computer Architecture 13

Profile-based Predictor Performance rate ediction r Mispre Benchmark Misprediction rate varies widely but is generally better for FP programs Actual performance depends on both prediction accuracy & branch frequency Mohamed Younis CMCS 611, Advanced Computer Architecture 14

Performance of MIPS Integer Pipeline of all ins structions s that sta all Per rcentage Branch stalls Load stalls Performance is based on a perfect memory system with all cache hits Basic delayed branch with cancellation support and thus costing one delay cycle Integer programs exhibit an average of 0.06 branch stalls per instruction and 0.05 load stalls per instruction Average CPI from MIPS pipelining with perfect memory is 1.11 SPECint92 performance can be enhanced by a factor of 5/1.11 = 4.5 Mohamed Younis CMCS 611, Advanced Computer Architecture 15

Exception Types and Requirements Famous Types of Exceptions I/O device request Breakpoint Integer arithmetic overflow FP arithmetic anomaly Page fault Misaligned memory accesses Memory-protection violation Undefined instruction Privilege violation Hardware and power failure Requirements Characterization Synchronous vs. asynchronous Async. Excep. are caused by I/O devices and allows completion of current instr. User requested vs. coerced requested exceptions are predictable and easier to handle User Maskable vs. unmaskable Within vs. between instructions Exceptions occurring within instructions are synchronous It is harder to deal with exceptions that occur within instructions Resume vs. terminate it is easier to implement exceptions that terminate program execution Mohamed Younis CMCS 611, Advanced Computer Architecture 16

Exception Categorization (Example) Synchronous User req. User Within Vs Resume Exception Type Vs Asynchronous vs coerced maskable Vs nonmaskable between instructions Vs terminate I/O device request Asynchronous Coerced nonmaskable between Resume Invoke operating system synchronous User req. nonmaskable between Resume Tracing instruction execution synchronous User req. maskable between Resume Breakpoint synchronous User req. maskable between Resume Integer arithmetic overflow synchronous Coerced maskable Within Resume FP overflow or underflow synchronous Coerced maskable Within Resume Page fault synchronous Coerced nonmaskable Within Resume Misaligned memory access synchronous Coerced maskable Within Resume Memory protection violations synchronous Coerced nonmaskable Within Resume Using undefined instructions synchronous Coerced nonmaskable Within terminate Hardware malfunctions Asynchronous Coerced nonmaskable Within terminate Power failure Asynchronous Coerced nonmaskable Within terminate Exceptions occurring within an instruction are the most difficult to handle since they require background saving of the program state Pipelines is called restartable, if it allows an instruction to be restarted after exception handling Mohamed Younis CMCS 611, Advanced Computer Architecture 17

Stopping & Restarting Execution Some exception, e.g. page fault, takes place while the instruction is in the MEM stage, requiring the instruction to be restarted When an exception occurs, the pipeline control can do the following: 1. Force a trap instruction into the pipeline on the next IF 2. Until the trap is taken, turn off all writes for the faulting instruction and all the instructions that follow 3. After the exception-handling routine in the operating system receives control, it saves the PC of the faulting instruction When using delayed branching, it is impossible to recreate the state using a single PC since the instructions may not be sequentially related A pipeline that allows instructions before the faulting one to complete and those after it to restart, is said to have precise exception Ideally faulting instructions will not change the state of the machine before the exception occur, however if it does, exception handling gets complex Supporting precise exceptions simplifies the operating system interface and is required in machines that implements demand paging Mohamed Younis CMCS 611, Advanced Computer Architecture 18

Exceptions in MIPS Pipeline Stage Problem exceptions occurring IF ID EX MEM WB Page fault on instruction fetch; misaligned memory access; memory protection violation Undefined or illegal opcode Arithmetic exception Page fault on data fetch; misaligned memory access; memory protection violation None Multiple exceptions might occur since multiple instructions are executing (LW followed by DIV might cause page fault and an arith. exceptions in same cycle) Exceptions can even occur out of order (e.g. a page fault of instr. memory can occur earlier than a page fault of data memory caused by the proceeding instruction in the pipeline) Pipeline exceptions has to be handled in order of execution of faulting instructions not according to the time they occur Mohamed Younis CMCS 611, Advanced Computer Architecture 19

Precise Exception Handling The MIPS Approach: Hardware posts all exceptions caused by a given instruction in a status vector associated with the instruction The exception status vector is carried along as the instruction goes down the pipeline Once an exception indication is set in the exception status vector, any control signal that may cause a data value to be written is turned off Upon entering the WB stage the exception status vector is checked and the exceptions, if any, will be handled according to the time they occurred Allowing an instruction to continue execution till the WB stage is not a problem since all write operations for that instruction will be disallowed Notes: The MIPS machine design does not allow exception to occur at the WB stage All write operations in the MIPS pipeline are in late stages Machines that allow writing in early pipeline stages are difficult to handle since exceptions can occur after the machine state has been already changed Mohamed Younis CMCS 611, Advanced Computer Architecture 20

Instruction Set Complications Early-Write Instructions An instruction that has a single write at the last stage of the pipeline, e.g. MIPS, can easily handle exceptions Machines that allow for multiple writes, e.g. VAX auto-increment, during instr. execution, usually require a capability to rollback the effect of an instruction Instructions that update the memory state during execution, e.g. string copy, temporary registers are saved and restored to allow instruction to continue Branching mechanisms Code based branching set by non-branch instructions requires special care if an exception happen prior to executing the branch instruction Multi-cycle operations In machines with widely different instruction cycles, an instruction can make multiple l writes and cause complex data hazard, and thus exception become very hard to handle If architects realize the relationship between instruction set design and pipelining, they can enable efficient pipelining Mohamed Younis CMCS 611, Advanced Computer Architecture 21

Summary Control hazards Conclusion Limiting the effect of control hazards via branch prediction and delay Static branch prediction techniques and performance Branch delay issues and performance Exceptions handling Categorizing of exception based on types and handling requirements Issues of stopping and restarting instructions in a pipeline Precise exception handling and the conditions that enable it Instructions ti set effects on complicating pipeline design Next Lecture Pipelining floating point operations An example pipeline: MIPS R4000 Reading assignment includes Appendix A.4 & A.5 in the textbook Mohamed Younis CMCS 611, Advanced Computer Architecture 22