ARCHITECTURE DESIGN FOR SOFT ERRORS

Shubu Mukherjee

AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO

ELSEVIER
Morgan Kaufmann Publishers is an imprint of Elsevier

Foreword
Preface

1 Introduction
  Overview
  Evidence of Soft Errors
  Types of Soft Errors
  Cost-Effective Solutions to Mitigate the Impact of Soft Errors
  Faults
  Errors
  Metrics
  Dependability Models
  Reliability
  Availability
  Miscellaneous Models
  Permanent Faults in Complementary Metal Oxide Semiconductor Technology
  Metal Failure Modes
  Gate Oxide Failure Modes
  Radiation-Induced Transient Faults in CMOS Transistors
  The Alpha Particle
  The Neutron
  Interaction of Alpha Particles and Neutrons with Silicon Crystals
  Architectural Fault Models for Alpha Particle and Neutron Strikes
  Silent Data Corruption and Detected Unrecoverable Error
  Basic Definitions: SDC and DUE
  SDC and DUE Budgets
  Soft Error Scaling Trends
  SRAM and Latch Scaling Trends
  DRAM Scaling Trends
  Summary
  Historical Anecdote
  References

2 Device- and Circuit-Level Modeling, Measurement, and Mitigation
  Overview
  Modeling Circuit-Level SERs
  Impact of Alpha Particle or Neutron on Circuit Elements
  Critical Charge (Qcrit)
  Timing Vulnerability Factor
  Masking Effects in Combinatorial Logic Gates
  Vulnerability of Clock Circuits
  Measurement
  Field Data Collection
  Accelerated Alpha Particle Tests
  Accelerated Neutron Tests
  Mitigation Techniques
  Device Enhancements
  Circuit Enhancements
  Summary
  Historical Anecdote
  References

3 Architectural Vulnerability Analysis
  Overview
  AVF Basics
  Does a Bit Matter?
  SDC and DUE Equations
  Bit-Level SDC and DUE FIT Equations
  Chip-Level SDC and DUE FIT Equations
  False DUE AVF
  Case Study: False DUE from Lockstepped Checkers
  Process-Kill versus System-Kill DUE AVF
  ACE Principles
  Types of ACE and Un-ACE Bits
  Point-of-Strike Model versus Propagated Fault Model
  Microarchitectural Un-ACE Bits
  Idle or Invalid State
  Misspeculated State
  Predictor Structures
  Ex-ACE State
  Architectural Un-ACE Bits
  NOP Instructions
  Performance-Enhancing Operations
  Predicated False Instructions
  Dynamically Dead Instructions
  Logical Masking
  AVF Equations for a Hardware Structure
  Computing AVF with Little's Law
  Implications of Little's Law for AVF Computation
  Computing AVF with a Performance Model
  Limitations of AVF Analysis with Performance Models
  ACE Analysis Using the Point-of-Strike Fault Model
  AVF Results from an Itanium 2 Performance Model
  ACE Analysis Using the Propagated Fault Model
  Summary
  Historical Anecdote
  References

4 Advanced Architectural Vulnerability Analysis
  Overview
  Lifetime Analysis of RAM Arrays
  Basic Idea of Lifetime Analysis
  Accounting for Structural Differences in Lifetime Analysis
  Impact of Working Set Size for Lifetime Analysis
  Granularity of Lifetime Analysis
  Computing the DUE AVF
  Lifetime Analysis of CAM Arrays
  Handling False-Positive Matches in a CAM Array
  Handling False-Negative Matches in a CAM Array
  Effect of Cooldown in Lifetime Analysis
  AVF Results for Cache, Data Translation Buffer, and Store Buffer
  Unknown Components
  RAM Arrays
  CAM Arrays
  DUE AVF
  Computing AVFs Using SFI into an RTL Model
  Comparison of Fault Injection and ACE Analyses
  Random Sampling in SFI
  Determining if an Injected Fault Will Result in an Error
  Case Study of SFI
  The Illinois SFI Study
  SFI Methodology
  Transient Faults in Pipeline State
  Transient Faults in Logic Blocks
  Summary
  Historical Anecdote
  References

5 Error Coding Techniques
  Overview
  Fault Detection and ECC for State Bits
  Basics of Error Coding
  Error Detection Using Parity Codes
  Single-Error Correction Codes
  Single-Error Correct Double-Error Detect Code
  Double-Error Correct Triple-Error Detect Code
  Cyclic Redundancy Check
  Error Detection Codes for Execution Units
  AN Codes
  Residue Codes
  Parity Prediction Circuits
  Implementation Overhead of Error Detection and Correction Codes
  Number of Logic Levels
  Overhead in Area
  Scrubbing Analysis
  DUE FIT from Temporal Double-Bit Error with No Scrubbing
  DUE Rate from Temporal Double-Bit Error with Fixed-Interval Scrubbing
  Detecting False Errors
  Sources of False DUE Events in a Microprocessor Pipeline
  Mechanism to Propagate Error Information
  Distinguishing False Errors from True Errors
  Hardware Assertions
  Machine Check Architecture
  Informing the OS of an Error
  Recording Information about the Error
  Isolating the Error
  Summary
  Historical Anecdote
  References

6 Fault Detection via Redundant Execution
  Overview
  Sphere of Replication
  Components of the Sphere of Replication
  The Size of Sphere of Replication
  Output Comparison and Input Replication
  Fault Detection via Cycle-by-Cycle Lockstepping
  Advantages of Lockstepping
  Disadvantages of Lockstepping
  Lockstepping in the Stratus ftServer
  Lockstepping in the Hewlett-Packard NonStop Himalaya Architecture
  Lockstepping in the IBM Z-series Processors
  Fault Detection via RMT
  RMT in the Marathon Endurance Server
  RMT in the Hewlett-Packard NonStop Advanced Architecture
  RMT Within a Single-Processor Core
  A Simultaneous Multithreaded Processor
  Design Space for SMT in a Single Core
  Output Comparison in an SRT Processor
  Input Replication in an SRT Processor
  Input Replication of Cached Load Data
  Two Techniques to Enhance Performance of an SRT Processor
  Performance Evaluation of an SRT Processor
  Alternate Single-Core RMT Implementation
  RMT in a Multicore Architecture
  DIVA: RMT Using Specialized Checker Processor
  RMT Enhancements
  Relaxed Input Replication
  Relaxed Output Comparison
  Partial RMT
  Summary
  Historical Anecdote
  References

7 Hardware Error Recovery
  Overview
  Classification of Hardware Error Recovery Schemes
  Reboot
  Forward Error Recovery
  Backward Error Recovery
  Forward Error Recovery
  Fail-Over Systems
  DMR with Recovery
  Triple Modular Redundancy
  Pair-and-Spare
  Backward Error Recovery with Fault Detection Before Register Commit
  Fujitsu SPARC64 V: Parity with Retry
  IBM Z-Series: Lockstepping with Retry
  Simultaneous and Redundantly Threaded Processor with Recovery
  Chip-Level Redundantly Threaded Processor with Recovery (CRTR)
  Exposure Reduction via Pipeline Squash
  Fault Screening with Pipeline Squash and Re-execution
  Backward Error Recovery with Fault Detection before Memory Commit
  Incremental Checkpointing Using a History Buffer
  Periodic Checkpointing with Fingerprinting
  Backward Error Recovery with Fault Detection before I/O Commit
  LVQ-Based Recovery in an SRT Processor
  ReVive: Backward Error Recovery Using Global Checkpoints
  SafetyNet: Backward Error Recovery Using Local Checkpoints
  Backward Error Recovery with Fault Detection after I/O Commit
  Summary
  Historical Anecdote
  References

8 Software Detection and Recovery
  Overview
  Fault Detection Using SIS
  Fault Detection Using Software RMT
  Error Detection by Duplicated Instructions
  Software-Implemented Fault Tolerance
  Configurable Transient Fault Detection via Dynamic Binary Translation
  Fault Detection Using Hybrid RMT
  CRAFT: A Hybrid RMT Implementation
  CRAFT Evaluation
  Fault Detection Using RVMs
  Application-Level Recovery
  Forward Error Recovery Using Software RMT and AN Codes for Fault Detection
  Log-Based Backward Error Recovery in Database Systems
  Checkpoint-Based Backward Error Recovery for Shared-Memory Programs
  OS-Level and VMM-Level Recoveries
  Summary
  References

Index


Redundancy in fault tolerant computing. D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992 Redundancy in fault tolerant computing D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992 1 Redundancy Fault tolerance computing is based on redundancy HARDWARE REDUNDANCY Physical

More information

Moving to the Cloud. Developing Apps in. the New World of Cloud Computing. Dinkar Sitaram. Geetha Manjunath. David R. Deily ELSEVIER.

Moving to the Cloud. Developing Apps in. the New World of Cloud Computing. Dinkar Sitaram. Geetha Manjunath. David R. Deily ELSEVIER. Moving to the Cloud Developing Apps in the New World of Cloud Computing Dinkar Sitaram Geetha Manjunath Technical Editor David R. Deily AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO

More information

Checker Processors. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India

Checker Processors. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India Advanced Department of Computer Science Indian Institute of Technology New Delhi, India Outline Introduction Advanced 1 Introduction 2 Checker Pipeline Checking Mechanism 3 Advanced Core Checker L1 Failure

More information

Coding for Penetration Testers Building Better Tools

Coding for Penetration Testers Building Better Tools Coding for Penetration Testers Building Better Tools Second Edition Jason Andress Ryan Linn Clara Hartwell, Technical Editor ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO

More information

The Essential Guide to Video Processing

The Essential Guide to Video Processing The Essential Guide to Video Processing Second Edition EDITOR Al Bovik Department of Electrical and Computer Engineering The University of Texas at Austin Austin, Texas AMSTERDAM BOSTON HEIDELBERG LONDON

More information

DATABASE SYSTEM CONCEPTS

DATABASE SYSTEM CONCEPTS DATABASE SYSTEM CONCEPTS HENRY F. KORTH ABRAHAM SILBERSCHATZ University of Texas at Austin McGraw-Hill, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico Milan Montreal

More information

REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs

REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs Daniel Sánchez, Juan L. Aragón and José M. García Departamento de Ingeniería y Tecnología de Computadores Universidad de Murcia, 30071

More information

Foundations of Multidimensional and Metric Data Structures

Foundations of Multidimensional and Metric Data Structures Foundations of Multidimensional and Metric Data Structures Hanan Samet University of Maryland, College Park ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor

A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor Jason Blome 1, Scott Mahlke 1, Daryl Bradley 2 and Krisztián Flautner 2 1 Advanced Computer Architecture

More information

Fine-Grain Redundancy Techniques for High- Reliable SRAM FPGA`S in Space Environment: A Brief Survey

Fine-Grain Redundancy Techniques for High- Reliable SRAM FPGA`S in Space Environment: A Brief Survey Fine-Grain Redundancy Techniques for High- Reliable SRAM FPGA`S in Space Environment: A Brief Survey T.Srinivas Reddy 1, J.Santosh 2, J.Prabhakar 3 Assistant Professor, Department of ECE, MREC, Hyderabad,

More information

Modern Embedded Computing Designing Connected, Pervasive, Media-Rich Systems

Modern Embedded Computing Designing Connected, Pervasive, Media-Rich Systems Modern Embedded Computing Designing Connected, Pervasive, Media-Rich Systems Peter Barry Patrick Crowley ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

Networked Graphics 01_P374423_PRELIMS.indd i 10/27/2009 6:57:42 AM

Networked Graphics 01_P374423_PRELIMS.indd i 10/27/2009 6:57:42 AM Networked Graphics Networked Graphics Building Networked Games and Virtual Environments Anthony Steed Manuel Fradinho Oliveira AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

PTC Mathcad Prime 3.0

PTC Mathcad Prime 3.0 Essential PTC Mathcad Prime 3.0 A Guide for New and Current Users Brent Maxfield, P.E. AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO @ Academic

More information

Computer Architecture

Computer Architecture Informatics 3 Computer Architecture Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh (thanks to Prof. Nigel Topham) General Information Instructor

More information

DISTRIBUTED SYSTEMS. Second Edition. Andrew S. Tanenbaum Maarten Van Steen. Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON.

DISTRIBUTED SYSTEMS. Second Edition. Andrew S. Tanenbaum Maarten Van Steen. Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON. DISTRIBUTED SYSTEMS 121r itac itple TAYAdiets Second Edition Andrew S. Tanenbaum Maarten Van Steen Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON Prentice Hall Upper Saddle River, NJ 07458 CONTENTS

More information

Puey Wei Tan. Danny Lee. IBM zenterprise 196

Puey Wei Tan. Danny Lee. IBM zenterprise 196 Puey Wei Tan Danny Lee IBM zenterprise 196 IBM zenterprise System What is it? IBM s product solutions for mainframe computers. IBM s product models: 700/7000 series System/360 System/370 System/390 zseries

More information

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define

More information

Designing Enterprise SSDs with Low Cost Media

Designing Enterprise SSDs with Low Cost Media Designing Enterprise SSDs with Low Cost Media Jeremy Werner Director of Marketing SandForce Flash Memory Summit August 2011 Santa Clara, CA 1 Everyone Knows Flash is migrating: To smaller nodes 2-bit and

More information

INTRODUCING ABSTRACTION TO VULNERABILITY ANALYSIS

INTRODUCING ABSTRACTION TO VULNERABILITY ANALYSIS INTRODUCING ABSTRACTION TO VULNERABILITY ANALYSIS A Dissertation Presented by Vilas Keshav Sridharan to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements

More information

Reliability Improvement in Reconfigurable FPGAs

Reliability Improvement in Reconfigurable FPGAs Reliability Improvement in Reconfigurable FPGAs B. Chagun Basha Jeudis de la Comm 22 May 2014 1 Overview # 2 FPGA Fabrics BlockRAM resource Dedicated multipliers I/O Blocks Programmable interconnect Configurable

More information

COMP3221: Microprocessors and. and Embedded Systems. Overview. Lecture 23: Memory Systems (I)

COMP3221: Microprocessors and. and Embedded Systems. Overview. Lecture 23: Memory Systems (I) COMP3221: Microprocessors and Embedded Systems Lecture 23: Memory Systems (I) Overview Memory System Hierarchy RAM, ROM, EPROM, EEPROM and FLASH http://www.cse.unsw.edu.au/~cs3221 Lecturer: Hui Wu Session

More information

Arecent study [24] shows that the soft-error rate [16] per

Arecent study [24] shows that the soft-error rate [16] per IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 18, NO. 8, AUGUST 2007 1 Power-Efficient Approaches to Redundant Multithreading Niti Madan, Student Member, IEEE, and Rajeev Balasubramonian,

More information

Commercial-Off-the-shelf Hardware Transactional Memory for Tolerating Transient Hardware Errors

Commercial-Off-the-shelf Hardware Transactional Memory for Tolerating Transient Hardware Errors Commercial-Off-the-shelf Hardware Transactional Memory for Tolerating Transient Hardware Errors Rasha Faqeh TU- Dresden 19.01.2015 Dresden, 23.09.2011 Transient Error Recovery Motivation Folie Nr. 12 von

More information

Real World Multicore Embedded Systems

Real World Multicore Embedded Systems Real World Multicore Embedded Systems A Practical Approach Expert Guide Bryon Moyer AMSTERDAM BOSTON HEIDELBERG LONDON I J^# J NEW YORK OXFORD PARIS SAN DIEGO S V J SAN FRANCISCO SINGAPORE SYDNEY TOKYO

More information

Analysis of Soft Error Mitigation Techniques for Register Files in IBM Cu-08 90nm Technology

Analysis of Soft Error Mitigation Techniques for Register Files in IBM Cu-08 90nm Technology Analysis of Soft Error Mitigation Techniques for s in IBM Cu-08 90nm Technology Riaz Naseer, Rashed Zafar Bhatti, Jeff Draper Information Sciences Institute University of Southern California Marina Del

More information

CS 470 Spring Fault Tolerance. Mike Lam, Professor. Content taken from the following:

CS 470 Spring Fault Tolerance. Mike Lam, Professor. Content taken from the following: CS 47 Spring 27 Mike Lam, Professor Fault Tolerance Content taken from the following: "Distributed Systems: Principles and Paradigms" by Andrew S. Tanenbaum and Maarten Van Steen (Chapter 8) Various online

More information

SOFTWARE-IMPLEMENTED HARDWARE FAULT TOLERANCE

SOFTWARE-IMPLEMENTED HARDWARE FAULT TOLERANCE SOFTWARE-IMPLEMENTED HARDWARE FAULT TOLERANCE SOFTWARE-IMPLEMENTED HARDWARE FAULT TOLERANCE O. Goloubeva, M. Rebaudengo, M. Sonza Reorda, and M. Violante Politecnico di Torino - Dipartimento di Automatica

More information