A Fault Tolerant Superscalar Processor


A Fault Tolerant Superscalar Processor [Based on "Coverage of a Microarchitecture-level Fault Check Regimen in a Superscalar Processor" by V. Reddy and E. Rotenberg (2008)] Presented by Nan Zheng [Part of the slides borrowed from V. Reddy's slides at DSN 2008]

Outline
Introduction: why fault tolerance (FT) in processors
Superscalar processors: what and why
Conventional processor FT and its drawbacks: hardware, information, and time redundancy
The need for a regimen-based FT

Outline (Cont.)
Regimen-based FT (RFT) by Reddy and Rotenberg (2008)
The FT regimen: Inherent Time Redundancy (ITR), Register Name Authentication (RNA), Timestamp-based Assertion Check (TAC), Sequential PC Check (SPC), Register Consumer Counter (CC), Confident Branch Misprediction (ConfBr), BTB Verify (BTBV)
Simulation approach and results
Summary

Introduction
Why fault tolerance (FT) in processors: critical charge decreases (quadratically) as device dimensions shrink, making it easier for a particle strike to flip a bit; cosmic rays in the atmosphere are one such source.
Superscalar processors: what and why.
What? Processors that exploit ILP by fetching and executing multiple instructions per cycle from a sequential instruction stream.
Why? Almost all modern processors are superscalar.

Introduction (Cont.) [figure slide]

Introduction (Cont.)
Conventional FT schemes in processors; the basic idea is some form of redundancy.
Hardware redundancy: additional functional units (FUs) dedicated to redundant execution. Drawbacks: silicon area overhead; not used in commercial processors.
Information redundancy: error-correcting codes (ECC) in memory, control-flow checking signatures, checksums for algorithm-based FT.
Time redundancy: instruction re-execution, retransmission of data.
Note: these schemes add overhead in silicon area and pipeline stalls, and schemes focused only on the FUs miss errors that can also occur in the DU, DS and RF. What is needed is a systematic suite of fault checks that achieves maximum coverage over all pipeline stages with minimum overhead.

Regimen-based FT
Overview of the FT regimen:
Inherent Time Redundancy (ITR)
Register Name Authentication (RNA)
Timestamp-based Assertion Check (TAC)
Sequential PC Check (SPC)
Register Consumer Counter (CC)
Confident Branch Misprediction (ConfBr)
BTB Verify (BTBV)
Each check is explained next.

Inherent Time Redundancy (ITR) [figure: conventional time redundancy compares a program against a duplicate program; inherent time redundancy compares repeated executions within the same program]

Inherent Time Redundancy (ITR)
A decode signature is maintained per instruction; the signature is updated at the last use of each decode signal.
At retirement, instruction signatures are combined into trace signatures. A trace ends at a branch or after 16 instructions.
Trace signatures are stored in an ITR cache, and each new trace signature is checked against the copy in the ITR cache.
A cache miss does not directly cause fault-coverage loss: a later hit to a previously missed signature detects faults in either the current or the previous signature.
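The steps above can be modeled in a few lines. The following is a minimal Python sketch of ITR checking, assuming each retiring instruction supplies a small integer decode signature; the CRC-based combining function, the class name, and the PC-indexed cache are illustrative assumptions, not details from the paper.

```python
import zlib

TRACE_LIMIT = 16  # a trace ends at a branch or after 16 instructions

class ITRChecker:
    def __init__(self):
        self.cache = {}       # ITR cache: trace start PC -> trace signature
        self.current = []     # decode signatures of the in-progress trace
        self.trace_pc = None  # PC at which the current trace started

    def retire(self, pc, decode_signature, is_branch):
        """Accumulate one instruction's decode signature; close the trace
        at a branch or at TRACE_LIMIT instructions. Returns False only when
        a closed trace's signature mismatches the cached copy."""
        if not self.current:
            self.trace_pc = pc
        self.current.append(decode_signature)
        if is_branch or len(self.current) == TRACE_LIMIT:
            return self._close_trace()
        return True  # trace still open, no check performed yet

    def _close_trace(self):
        # Combine per-instruction signatures into one trace signature.
        sig = zlib.crc32(bytes(b & 0xFF for b in self.current))
        ok = True
        if self.trace_pc in self.cache:
            # Hit: a mismatch flags a fault in this or the cached execution.
            ok = (self.cache[self.trace_pc] == sig)
        self.cache[self.trace_pc] = sig  # a miss just fills the cache
        self.current = []
        return ok
```

Note how a miss merely fills the cache: the check is deferred until the same trace retires again, which is exactly why a miss does not directly lose coverage.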

RNA & TAC
Register Name Authentication (RNA): detects faults in the destination register mappings of instructions by checking for consistency in the rename unit.
Timestamp-based Assertion Check (TAC): detects faults in the issue unit by checking that sequential order holds among data-dependent instructions. Implementation check: an instruction's timestamp >= its producers' timestamps.
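The TAC assertion reduces to a single comparison per producer. A minimal sketch, assuming each instruction carries an issue-cycle timestamp and a list of its producers' timestamps (the function name is illustrative):

```python
def tac_check(instr_timestamp, producer_timestamps):
    """TAC: a data-dependent instruction must not issue before any of its
    producers, so its timestamp must be >= every producer timestamp.
    An empty producer list (no register sources) trivially passes."""
    return all(instr_timestamp >= t for t in producer_timestamps)
```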

Sequential PC Check (SPC)
Detects faults affecting sequential control flow by asserting that a committing instruction's PC matches the retirement PC.
Implementation:
Maintain a retirement program counter (PC).
For a non-branch instruction, increment the retirement PC by the instruction size.
For a branch instruction, update the retirement PC with the calculated PC.
Check: the committing instruction's PC matches the retirement PC.
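The retirement-PC bookkeeping above can be sketched directly; this is a minimal Python model assuming a fixed instruction size and that branches supply their calculated next PC (class and parameter names are illustrative):

```python
class SPCChecker:
    def __init__(self, start_pc, instr_size=4):
        self.retire_pc = start_pc      # the independently maintained retirement PC
        self.instr_size = instr_size

    def commit(self, pc, is_branch=False, target=None):
        """Assert the committing instruction's PC matches the retirement PC,
        then advance the retirement PC: sequentially for non-branches, to the
        calculated target for branches."""
        ok = (pc == self.retire_pc)    # mismatch indicates a control-flow fault
        if is_branch and target is not None:
            self.retire_pc = target
        else:
            self.retire_pc += self.instr_size
        return ok
```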

CC & ConfBr
Register Consumer Counter (CC): detects faults in the source register mappings after register renaming.
Implementation: one counter per physical register.
Increment the counter of each source register at the rename stage.
Assert that the counter of each source register is > 0 at the register read stage.
Decrement the counter of each source register after the register read.
Confident Branch Misprediction (ConfBr): detects faults affecting values that influence branch outcomes.
Implementation: identify highly predictable branches using confidence counters; the misprediction of a confident branch may be symptomatic of a fault.
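The CC increment/assert/decrement protocol is a small amount of state. A minimal sketch, assuming one counter per physical register (the class name is illustrative):

```python
class ConsumerCounters:
    def __init__(self, num_phys_regs):
        self.count = [0] * num_phys_regs   # one consumer counter per physical register

    def rename(self, src_reg):
        # Rename stage: a new consumer of this physical register was created.
        self.count[src_reg] += 1

    def register_read(self, src_reg):
        # Register read stage: assert an outstanding consumer exists,
        # then consume it.
        ok = self.count[src_reg] > 0       # zero here means a corrupted source mapping
        self.count[src_reg] -= 1
        return ok
```

A read that finds a zero counter means the reader's source mapping was never set up at rename, i.e. the mapping was corrupted somewhere between rename and read.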

BTB Verify (BTBV)
Detects faults in the BTB and the decode logic by exploiting the inherent redundancy between the BTB and the decode stage.
A BTB hit produces decode information about a branch one cycle earlier than the decode stage does, and the BTB information should match the decode information.
A mismatch indicates a fault in the BTB logic (a false hit, a BTB fault, etc.) or in the decode stage.
BTB aliasing mismatches are handled in the same manner (flush the instruction and the instructions after it; don't trust the decoder).
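The check itself is a comparison of two independently produced records. A minimal sketch, assuming the branch information is modeled as comparable dictionaries (the function name and the "ok"/"flush" outcomes are illustrative):

```python
def btb_verify(btb_info, decode_info):
    """Compare the branch info the BTB produced at fetch with the info the
    decoder produces one cycle later. Agreement means no fault was observed;
    any mismatch (fault or BTB aliasing) triggers a flush of the instruction
    and everything younger."""
    if btb_info == decode_info:
        return "ok"
    return "flush"
```

Because BTB aliasing produces mismatches in fault-free operation, a mismatch cannot simply be declared a fault; the uniform flush-and-refetch response recovers from both cases.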

RFT: Simulation Approach
Evaluation using fault injection; goals:
Measure the processor fault coverage of a µarch-level fault-check regimen.
Leverage C/C++ cycle-level µarch simulators: cost- and time-efficient.
Ensure high fault-modeling coverage.
Fault injection approach:
Analyze the high-level (µarch-level) effects of faults in each pipeline stage.
Randomly inject µarch-level faults in the simulator.
Example: fetch stage (IF) [figure: panels (a) and (b)]
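Random µarch-level fault injection amounts to perturbing one piece of simulator state per run. A minimal sketch, assuming the simulator exposes its state elements as a name-to-integer dictionary and modeling a single-event upset as one random bit flip (all names and the 32-bit width are illustrative assumptions):

```python
import random

def inject_bit_flip(state, rng=random):
    """Pick a random state element and flip one random bit in it,
    modeling a single-event upset at the microarchitecture level."""
    name = rng.choice(sorted(state))   # which state element to corrupt
    bit = rng.randrange(32)            # assume 32-bit state elements
    state[name] ^= (1 << bit)          # the actual single-bit upset
    return name, bit
```

In a campaign, each simulation run would inject one such flip at a random cycle and then observe whether the regimen's checks fire, the watchdog fires, the fault is masked, or the fault silently corrupts architectural state.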

Fetch stage fault analysis for fault detection [figure slide]

RFT: Simulation Approach (Cont.) [figure slide]

RFT: Results
Fault locations: fetch 9%, decode 39%, rename 24%, dispatch 7%, backend 21%.

RFT: Results
Fault outcomes: detected by the regimen 60%, detected by the watchdog 9%, undetected 31%.

RFT: Results (Cont.)
[figure: detailed breakdown of fault outcomes]
Non-masked faults = 40.2%.
Non-masked faults detected by the regimen = 24% (a 60% reduction in vulnerability).
Non-masked faults detected by the watchdog = 9% (a 23% reduction in vulnerability).
Non-masked faults detected by regimen + watchdog = 33% (~83% of non-masked faults are detected).

Summary
RFT presents a regimen of µarch-level fault checks to protect a superscalar processor.
The evaluation injected a broad spectrum of fault types across all pipeline stages.
The regimen-based approach provides substantial fault protection, detecting ~83% of non-masked faults.

Thank you!