Eliminating Single Points of Failure in Software Based Redundancy

Similar documents
Software-based Fault Tolerance Mission (Im)possible?

SIListra. Coded Processing in Medical Devices. Dr. Martin Süßkraut (TU-Dresden / SIListra Systems)

Redundancy in fault tolerant computing. D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992

Improving the Fault Tolerance of a Computer System with Space-Time Triple Modular Redundancy

Dependability. IC Life Cycle

Fault-Injection testing and code coverage measurement using Virtual Prototypes on the context of the ISO standard

Redundancy in fault tolerant computing. D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992

Automated Application of Fault Tolerance Mechanisms in a Component-Based System

CprE 458/558: Real-Time Systems. Lecture 17 Fault-tolerant design techniques

ARCHITECTURE DESIGN FOR SOFT ERRORS

OPERATING SYSTEM SUPPORT FOR REDUNDANT MULTITHREADING. Björn Döbel (TU Dresden)

Software Techniques for Dependable Computer-based Systems. Matteo SONZA REORDA

KESO Functional Safety and the Use of Java in Embedded Systems

A JVM for Soft-Error-Prone Embedded Systems

TU Wien. Fault Isolation and Error Containment in the TT-SoC. H. Kopetz. TU Wien. July 2007

Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance

Evaluating the Fault Tolerance Capabilities of Embedded Systems via BDM

Safety Architecture Patterns

AN-Encoding Compiler: Building Safety-Critical Systems with Commodity Hardware

FAULT TOLERANT SYSTEMS

ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Availability. Copyright 2010 Daniel J. Sorin Duke University

Communication Patterns in Safety Critical Systems for ADAS & Autonomous Vehicles Thorsten Wilmer Tech AD Berlin, 5. March 2018

Is This What the Future Will Look Like?

Analysis of Soft Error Mitigation Techniques for Register Files in IBM Cu-08 90nm Technology

Safety and Reliability of Software-Controlled Systems Part 14: Fault mitigation

Dependability tree 1

A Diversity of Duplications

FAULT TOLERANCE. Fault Tolerant Systems. Faults Faults (cont d)

TSW Reliability and Fault Tolerance

Error Mitigation of Point-to-Point Communication for Fault-Tolerant Computing

A Low-Cost Correction Algorithm for Transient Data Errors

Elzar Triple Modular Redundancy using Intel AVX

Ilan Beer. IBM Haifa Research Lab 27 Oct IBM Corporation

Distributed Systems 24. Fault Tolerance

Defect Tolerance in VLSI Circuits

CS 470 Spring Fault Tolerance. Mike Lam, Professor. Content taken from the following:

Duke University Department of Electrical and Computer Engineering

Distributed Systems. 19. Fault Tolerance Paul Krzyzanowski. Rutgers University. Fall 2013

Distributed Real-Time Fault Tolerance on a Virtualized Multi-Core System

FAULT TOLERANT SYSTEMS

Mixed Critical Architecture Requirements (MCAR)

Commercial-Off-the-shelf Hardware Transactional Memory for Tolerating Transient Hardware Errors

Fault-tolerant techniques

On the Consolidation of Mixed Criticalities Applications on Multicore Architectures

HW/SW Co-Detection of Transient and Permanent Faults with Fast Recovery in Statically Scheduled Data Paths

Issues in Programming Language Design for Embedded RT Systems

Reliable Architectures

Last Class:Consistency Semantics. Today: More on Consistency

Basic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication. Distributed commit.

FAULT TOLERANT SYSTEMS

Transient Fault Detection and Reducing Transient Error Rate. Jose Lugo-Martinez CSE 240C: Advanced Microarchitecture Prof.

Part 2: Basic concepts and terminology

Dep. Systems Requirements

Implementation Issues. Remote-Write Protocols

AUTOBEST: A United AUTOSAR-OS And ARINC 653 Kernel. Alexander Züpke, Marc Bommert, Daniel Lohmann

Distributed Operating Systems

Fault Tolerance. it continues to perform its function in the event of a failure example: a system with redundant components

A Fault Tolerant Superscalar Processor

Understanding SW Test Libraries (STL) for safetyrelated integrated circuits and the value of white-box SIL2(3) ASILB(D) YOGITECH faultrobust STL

ANB- and ANBDmem-Encoding: Detecting Hardware Errors in Software

Soft Error Fault Tolerant Systems: CS456 Survey

Today: Fault Tolerance

SOFTWARE-IMPLEMENTED HARDWARE FAULT TOLERANCE

6.033 Lecture Fault Tolerant Computing 3/31/2014

Distributed Systems. Fault Tolerance. Paul Krzyzanowski

DEPENDABLE PROCESSOR DESIGN

Distributed Systems COMP 212. Lecture 19 Othon Michail

Module 8 Fault Tolerance CS655! 8-1!

Diagnosis in the Time-Triggered Architecture

Compiler-Assisted Software Fault Tolerance for Microcontrollers

Fault Tolerance Dealing with an imperfect world

Towards Transactional Memory for Safety-Critical Embedded Systems

A Low-Cost SEE Mitigation Solution for Soft-Processors Embedded in Systems On Programmable Chips

Functional Safety on Multicore Microcontrollers for Industrial Applications

Error Resilience in Digital Integrated Circuits

Multiple Event Upsets Aware FPGAs Using Protected Schemes

Practical Byzantine Fault Tolerance (The Byzantine Generals Problem)

Error Detection and Recovery for Transient Faults in Elliptic Curve Cryptosystems

AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors

DESIGN AND ANALYSIS OF TRANSIENT FAULT TOLERANCE FOR MULTI CORE ARCHITECTURE

Today: Fault Tolerance. Fault Tolerance

Functional Safety Architectural Challenges for Autonomous Drive

ID 025C: An Introduction to the OSEK Operating System

The Use of Java in the Context of AUTOSAR 4.0

Soft-error Detection Using Control Flow Assertions

Performance Estimation of Virtual Duplex Systems on Simultaneous Multithreaded Processors

Fault Tolerant Computing CS 530

Outline of Presentation Field Programmable Gate Arrays (FPGAs(

A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor

Intel iapx 432-VLSI building blocks for a fault-tolerant computer

Riccardo Mariani, Intel Fellow, IOTG SEG, Chief Functional Safety Technologist

Migration of SES to FPGA Based Architectural Concepts

Dependability Threats

Fast SEU Detection and Correction in LUT Configuration Bits of SRAM-based FPGAs

Reliable Computing I

Practical Byzantine Fault Tolerance

TU Wien. Shortened by Hermann Härtig The Rationale for Time-Triggered (TT) Ethernet. H Kopetz TU Wien December H. Kopetz 12.

An Integrated ECC and BISR Scheme for Error Correction in Memory

Building High Reliability into Microsemi Designs with Synplify FPGA Tools

6. Fault Tolerance. CS 313 High Integrity Systems; CS M13 Critical Systems; Michaelmas Term 2009, Sect

Transcription:

Eliminating Single Points of Failure in Software Based Redundancy Peter Ulbrich, Martin Hoffmann, Rüdiger Kapitza, Daniel Lohmann, Reiner Schmid and Wolfgang Schröder-Preikschat EDCC May 9, 2012 SYSTEM SOFTWARE GROUP http://www4.cs.fau.de

Transient Hardware Faults A Growing Problem [3] (Shivakumar, 2002) Transient hardware faults (Soft-Errors)! Induced by e.g., radiation, glitches, insufficient signal integrity Increasingly affecting microcontroller logic Future hardware designs: Even more performance and parallelism On the price of being less and less reliable! 2

Countermeasures - Hardware Safety-Critical System! Sensors SafetyCri+cal.Applica+on. Actuators Hardware-based countermeasures! Application-specific design or specialised hardware For example ECC, lock-step! Pragmatic approach (tackles problem right at source) Hardware costs (e.g., redundancy, checker, ) Selectivity (e.g., multi-application systems) Development costs (diverse safety concepts and HW, (re-)certification) 3

! Countermeasures - Software Safety-Critical System! Sensors SafetyCri+cal.Applica+on. Actuators Different approaches to address transient hardware faults! Hardware vs. software measures Applicability and costs 4

Countermeasures - Software Sensors Safety-Critical System! SafetyCri+cal.Applica+on.(1). SafetyCri+cal.Applica+on.(2). Actuators SafetyCri+cal.Applica+on.(3). Different approaches to address transient hardware faults! Hardware vs. software measures Applicability and costs Software-based triple modular redundancy (TMR)! Accepted and proven (e.g., recommended for ASIL D error handling) Selective (e.g., multi-application systems) 4

! Software-Based Redundancy in Detail Safety-Critical System! Replica.1. Sensors Interface. Replica.2. Majority. Voter. Actuators Replica.3. Software-based TMR requires:! Temporal and spatial isolation (isolation domains) Interface and Majority Voter Isola/ondomain Sphereofreplica/on(SOR) 5

Software-Based Redundancy in Detail Sensors Safety-Critical System! P = 1 1000 Interface. P = 998 1000 Replica.1. Replica.2. P = 1 1000 Majority. Voter. Actuators Replica.3. Software-based TMR requires:! Temporal and spatial isolation (isolation domains) Interface and Majority Voter Single points of failure! No error detection Very small Certain probability Isola/ondomain Sphereofreplica/on(SOR) 5

Software-Based Redundancy in Detail Sensors Safety-Critical System! P =? Interface. P =? Replica.1. Replica.2. P =? Majority. Voter. Actuators Replica.3. Isola/ondomain Software-based TMR requires:! Temporal and spatial isolation (isolation domains) Interface and Majority Voter Single points of failure! No error detection Very small Certain probability Risk analysis Inherently complex Random error distribution? (Nightingale, 2011) Sphereofreplica/on(SOR) 5

Software-Based Redundancy in Detail Safety-Critical System! Replica.1. Sensors Interface. Replica.2. Majority. Voter. Actuators Replica.3. Isola/ondomain Sphereofreplica/on(SOR) Software-based TMR requires:! Temporal and spatial isolation (isolation domains) Interface and Majority Voter Single points of failure! No error detection Very small Certain probability Risk analysis Inherently complex Random error distribution? (Nightingale, 2011) 5

Agenda Introduction! The Combined Redundancy Approach! Eliminating Vulnerabilities High-Reliability Voters Example: UAV Flight Control! CoRed Implementation Target System: I4Copter Evaluation! Experimental Setup Results Conclusion! 6

! CoRed Overview Holistic Protection Approach Isola/ondomain The Combined Redundancy Approach (CoRed)! TMR +{ Encodedopera/on Sphereofreplica/on(SOR) 7

! CoRed Overview Holistic Protection Approach Isola/ondomain The Combined Redundancy Approach (CoRed)! Data-flow encoding TMR +{ Encodedopera/on Sphereofreplica/on(SOR) 7

! CoRed Overview Holistic Protection Approach Isola/ondomain The Combined Redundancy Approach (CoRed)! Data-flow encoding TMR +{ High-reliability voters Encodedopera/on Sphereofreplica/on(SOR) 7

CoRed Overview Holistic Protection Approach 1 2 3 Isola/ondomain The Combined Redundancy Approach (CoRed)! Data-flow encoding TMR +{ High-reliability voters Encodedopera/on Holistic Protection Approach! Input to output protection 1 Reading inputs 2 Processing 3 Distributing outputs Composability On application and system level Sphereofreplica/on(SOR) 7

! Eliminating Input and Output Vulnerabilities SOR Y (Value) Encode. Y (EncodedValue) Decode. Y X (Value) Encode. X (EncodedValue) Decode. X Inter-domain data-flow protection! Checksum vs. Arithmetic code (AN code) AN Code Encoded data operations Enabler for high-reliability voter 8

Eliminating Input and Output Vulnerabilities SOR Y (Value) Encode. Y (EncodedValue) Decode. Y X (Value) Encode. X (EncodedValue) Decode. X Inter-domain data-flow protection! Checksum vs. Arithmetic code (AN code) AN Code Encoded data operations Enabler for high-reliability voter CoRed: Extended AN code ( code)! Based on VCP (Forin, 1989) 8

Eliminating Input and Output Vulnerabilities SOR Y (Value) Encode. Y (EncodedValue) Decode. Y X (Value) Encode. X (EncodedValue) Decode. X A (Prime) B X (Signature) D (Timestamp) Inter-domain data-flow protection! Checksum vs. Arithmetic code (AN code) AN Code Encoded data operations Enabler for high-reliability voter CoRed: Extended AN code ( code)! Based on VCP (Forin, 1989) Data integrity: Prime Address integrity: Per variable signature Outdated data: Timestamp } X = X A + B X + D 8

Eliminating Input and Output Vulnerabilities SOR Y (Value) X (Value) Encode. Encode.. Z Decode. Z = X Y A (Prime) B X (Signature) D (Timestamp) Inter-domain data-flow protection! Checksum vs. Arithmetic code (AN code) AN Code Encoded data operations Enabler for high-reliability voter CoRed: Extended AN code ( code)! Based on VCP (Forin, 1989) Data integrity: Prime Address integrity: Per variable signature Outdated data: Timestamp Set of arithmetic operands (+, -, *, =, ) Tailored for efficient encoded data voting } X = X A + B X + D 8

High-Reliability Voter Basics (1) Replica.1 X Encode. X Replica.2. Y Encode. Y Encoded.Voter. {E, W} Replica.3. Z Encode. Z ProviderEncodedVoter CoRed Encoded Voter! Input: variants ( X, Y, Z ) Output: Equality set (E) and winner (W) Based on operations No decoding necessary Branch decisions (equality) on encoded data! IFF difference of encoded values equals difference of static signatures X = Y X Y = B X B Y Each branch decision Unique signature! 9

High-Reliability Voter Basics (2) Replica.1 X Encode. X Replica.2. Y Encode. Y Encoded.Voter. {E, W} Check.(Decode). X Replica.3. Z Encode. Z e.g.,x isthewinner ProviderEncodedVoterConsumer Correct control-flow! Valid decision Unique control-flow path Each path Unique signature Control-flow signatures! Static signature (expected value): Compile-time Used as return value E Dynamic signature (actual value): Runtime, computed from variants Applied to winner W Validation: Subsequent check (decode) 10

CoRed Encoded Voter Example X = Y false Y = Z false X = Z false true true true apply(y, sig dyn {Y,Z}) return sig static {Y,Z} apply(x, sig dyn {X,Z}) return sig static {X,Z} X = Z false apply(x, sig dyn {X,Y}) return sig static {X,Y} return sig static {} true apply(x, sig dyn {X,Y,Z}) return sig static {X,Y,Z} Control-flow monitoring! Finding quorum Static signature Reapply path specific operations Sign winner with dynamic signature Check Subsequent decode 11

! CoRed Encoded Voter Example X = Y false Y = Z true false 1 X = Z true false true apply(y, sig dyn {Y,Z}) return sig static {Y,Z} apply(x, sig dyn {X,Z}) return sig static {X,Z} X = Z false apply(x, sig dyn {X,Y}) return sig static {X,Y} return sig static {} true apply(x, sig dyn {X,Y,Z}) return sig static {X,Y,Z} 1. Improper branch decision: Y Z Voter elects Y as winner (which is incorrect) Returns E and W correctly Subsequent decode will fail! sig static sig dyn 12

CoRed Encoded Voter Example X = Y true false Y = Z true apply(y, sig dyn {Y,Z}) return sig static {Y,Z} false 1 X = Z true apply(x, sig dyn {X,Z}) return sig static {X,Z} false X = Z false apply(x, sig dyn {X,Y}) return sig static {X,Y} 2 return sig static {} true apply(x, sig dyn {X,Y,Z}) return sig static {X,Y,Z} 1. Improper branch decision: Y Z Voter elects Y as winner (which is incorrect) Returns E and W correctly Subsequent decode will fail! sig static sig dyn 2. Faulty jump! Voter elects X and computes W correctly Returns incorrect E Again subsequent decode will fail! 12

Implementation CoRed implementation! Easy-to-use C++ templates and libraries Hardware independent: Code and Encoded Voter Thin OS integration layer PXROS-HR (Industry-strength commercial RTOS) CiAO (AUTOSAR-OS compatible) CoRed artefacts Real-time tasks and jobs Runtime-environment requirements! Temporal isolation static schedule (time triggered) Spatial isolation HW-based memory protection 13

CoRed Protected Flight Control Infineon TriCore TC1796 Redundant Sensor Setting Target System: I4Copter quadrotor platform Industry-grade hardware and software Triple redundant sensor setting Multi-application system Flight control application! Safety-critical Model-based: MATLAB Simulink Embedded Coder C++ code 14

Evaluation Experimental Setup FlightIControlApplica/on [Rebaudengo, 1999] Sensor System Sensor 1 Encode Sensor 2 Encode Sensor 3 Encode CoRed Encoded Tolerance Voter Decode Decode Decode Replica 1 Replica 2 Replica 3 Encode Encode Encode CoRed Encoded (Exact) Voter Network Interface Decode Actuator Remote Node FaultIInjec/on CampaignManager FaultDB ResultsDB SystemUnderTest HardwareDebugger HostComputer Fault injection Using hardware debugger! Injection of arbitrary fault patterns Minimal-intrusive Minimizing probe effects Fault list generation (Rebaudengo, 1999) Bits registers instructions Potentially huge fault space Vast majority of faults are non-effective Systematic elimination 15

Evaluation Experimental Setup FlightIControlApplica/on [Rebaudengo, 1999] Sensor System Sensor 1 Encode Sensor 2 Encode Sensor 3 Encode CoRed Encoded Tolerance Voter Decode Decode Decode Replica 1 Replica 2 Replica 3 Encode Encode Encode CoRed Encoded (Exact) Voter Network Interface Decode Actuator Remote Node FaultIInjec/on CampaignManager FaultDB ResultsDB SystemUnderTest HardwareDebugger HostComputer Fault injection Using hardware debugger! Injection of arbitrary fault patterns Minimal-intrusive Minimizing probe effects Fault list generation (Rebaudengo, 1999) Bits registers instructions Potentially huge fault space Vast majority of faults are non-effective Systematic elimination Outcome: 401,592experiments Effec+ve: 67,617errors Categories:.FailSilent,Masked,HardwareDetected,ICode, ControlIFlow,SilentDataCorrup/on 15

Evaluation Experimental Results (1) Interface. Replica.1. Replica.2. 90 % 80 % Data Address Replica.3. Silent Data Corruptions Hardware Detected -Code Detected Masked Distribution of Effective Faults 70 % 60 % 50 % 40 % 30 % 20 % 10 % 0 % Redundant execution campaign (Interface)! Total: ~45,000 Errors SDC HW Mask SDC HW Unprotected Plain TMR CoRed TMR Mask SDC HW Mask 16

Evaluation Experimental Results (1) Interface. Replica.1. Replica.2. 90 % 80 % Data Address Replica.3. Silent Data Corruptions Hardware Detected -Code Detected Masked Distribution of Effective Faults 70 % 60 % 50 % 40 % 30 % 20 % 10 % 0 % Redundant execution campaign (Interface)! Total: ~45,000 Errors Unprotected: Suffers from 3,622 corruptions! SDC HW Mask SDC HW Unprotected Plain TMR CoRed TMR Mask SDC HW Mask 16

Evaluation Experimental Results (1) Interface. Replica.1. Replica.2. 90 % 80 % Data Address Replica.3. Silent Data Corruptions Hardware Detected -Code Detected Masked Distribution of Effective Faults 70 % 60 % 50 % 40 % 30 % 20 % 10 % 0 % Redundant execution campaign (Interface)! Total: ~45,000 Errors Unprotected: Suffers from 3,622 corruptions! TMR: Suffers from 71 corruptions! SDC HW Mask SDC HW Unprotected Plain TMR CoRed TMR Mask SDC HW Mask 16

Evaluation Experimental Results (1) Interface. Replica.1. Replica.2. 90 % 80 % Data Address Replica.3. Silent Data Corruptions Hardware Detected -Code Detected Masked Distribution of Effective Faults 70 % 60 % 50 % 40 % 30 % 20 % 10 % 0 % SDC Redundant execution campaign (Interface)! Total: ~45,000 Errors Unprotected: Suffers from 3,622 corruptions! TMR: Suffers from 71 corruptions! CoRed: Remaining corruptions are covered 0 corruptions HW Mask SDC HW Unprotected Plain TMR CoRed TMR Mask SDC HW Mask 16

Evaluation Experimental Results (2) Replica.1. Replica.2. Voter. 90 % 80 % Data Address Replica.3. 70 % 60 % 50 % 40 % Silent Data Corruptions 30 % Hardware Detected -Code Detected 20 % Control-flow Monitoring 10 % Masked 0 % SDC HW CFM Mask SDC HW CFM Mask Voter campaign! Plain Voter CoRed Encoded Voter 17

Evaluation Experimental Results (2) Replica.1. Replica.2. Voter. 90 % 80 % Data Address Replica.3. 70 % 60 % Silent Data Corruptions Hardware Detected -Code Detected Control-flow Monitoring Masked 50 % 40 % 30 % 20 % 10 % 0 % SDC Voter campaign! Plain voter: Total ~11,000 2,465 masked 7,245 retry 1,223 corruptions HW Plain Voter CFM Mask SDC HW CFM CoRed Encoded Voter Mask 17

Evaluation Experimental Results (2) Replica.1. Replica.2. Voter. 90 % 80 % Data Address Replica.3. 70 % 60 % Silent Data Corruptions Hardware Detected -Code Detected Control-flow Monitoring Masked 50 % 40 % 30 % 20 % 10 % 0 % SDC Voter campaign! Plain voter: Total ~11,000 2,465 masked 7,245 retry 1,223 corruptions CoRed Encoded Voter: Total ~26,000 1,228 masked 24,682 retry 0 corruptions HW Plain Voter CFM Mask SDC HW CFM CoRed Encoded Voter Mask 17

Evaluation Experimental Results (2) Replica.1. Replica.2. Replica.3. 90 % 60 % Data Address Voter. 80 % 100 µs Evaluation Overhead 70 % 107.1% 80 µs 50 % I4Copter Flight-Control: 7.1% overhead (compared to 40 plain % TMR) 60 µs Absolute numbers: 1,842µs (application) Silent Data Corruptions 30 % Hardware Detected WCET 20 % -Code 40 Detected Selectivity! µs Control-flow Monitoring I4Copter system 10 % CPU utilisation: 41% Masked Full replication impossible, CPU: 120% 0 % 20 µs Mission-critical replication of flight control possible with CoRed, Plain CPU: Voter 60% CoRed Encoded Voter Voter campaign! 0 µs Plain voter: CoRed Pair Plain TMR CoRed TMR CoRed Pair + Spare Total ~11,000 2,465 masked 7,245 retry 1,223 corruptions CoRed Voter: Total ~26,000 1,228 masked 24,682 retry 0 corruptions Replication 100.0% Overhead Analysis! Voting Interface State Recovery SDC HW 103.2% 10.2µs (plain voter) vs. 77.6µs (CoRed voter) CFM Mask SDC HW 115.1% CFM 120 % 100 % Flight-Control Execution Time Mask 18

!!! Conclusion Safety-Critical System! Replica.1. Sensors Interface. Replica.2. Majority. Voter. Actuators Replica.3. 19

!! Conclusion Safety-Critical System! Decode. Replica.1. Encode. Sensors Encoded. Interface. Decode. Replica.2. Encode. Encoded Voter. Actuators Decode. Replica.3. Encode. The Combined Software Redundancy Approach (CoRed)! Eliminate Single Points of Failure in software-based TMR No specific application knowledge necessary Holistic approach: input-to-output protection 19

!! Conclusion Safety-Critical System! Decode. Replica.1. Encode. Sensors Encoded. Interface. Decode. Replica.2. Encode. Encoded Voter. Actuators Decode. Replica.3. Encode. The Combined Software Redundancy Approach (CoRed)! Eliminate Single Points of Failure in software-based TMR No specific application knowledge necessary Holistic approach: input-to-output protection 19

! Conclusion The Combined Software Redundancy Approach (CoRed)! Eliminate Single Points of Failure in software-based TMR No specific application knowledge necessary Holistic approach: input-to-output protection Applicability: Flight control! I4Copter MAV Selective and composable 19

Conclusion The Combined Software Redundancy Approach (CoRed)! Eliminate Single Points of Failure in software-based TMR No specific application knowledge necessary Holistic approach: input-to-output protection Applicability: Flight control! I4Copter MAV Selective and composable Experimental Results! CoRed is effective Silent data corruptions can be eliminated Only 7.1% overhead (flight control example) 19

Thank you! SYSTEM SOFTWARE GROUP http://www4.cs.fau.de

References (1) International Roadmap for Semiconductors, 2001 (2) Implications of microcontroller software on safety-critical automotive systems (Infineon 2008) (3) P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, Modelling the effect of technology trends on the soft error rate of combinational logic, in DSN 02: Proceedings of the 2002 International Conference on Dependable Systems and Networks (4) Edmund B. Nightingale, John R Douceur, and Vince Orgovan, Cycles, Cells and Platters: An Empirical Analysis of Hardware Failures on a Million Consumer PCs, in Proceedings of EuroSys 2011, Awarded "Best Paper", ACM, April 2011 (5) M. Rebaudengo and M. S. Reorda, Evaluating the fault tolerance capabilities of embedded systems via bdm, VTS 1999 (6) Forin, Vital coded microprocessor principles and application for various transit systems, 1989 21