Safety and Reliability of Software-Controlled Systems Part 14: Fault mitigation

Prof. Dr.-Ing. Stefan Kowalewski
Chair Informatik 11, Embedded Software Laboratory
RWTH Aachen University
Summer Semester 2011

MITIGATION OF HARDWARE & PROGRAMMING FAULTS

Hardware Reliability
How do you avoid system failures due to random hardware faults?
- Fault prevention?
  - Increase the reliability of hardware components
  - Common target failure rate: 10^-9 h^-1
  - Often not sufficient
- Fault removal?
  - Testing, verification, simulation
  - Detects production and design faults only
- Fault tolerance?
  - Make use of redundancy
  - Makes it possible to achieve the safety and/or reliability goal

Hardware Fault Tolerance
- Tolerance of hardware faults by means of hardware replication?
  - Triple modular redundancy is often too expensive!
  - Restricted use of hardware redundancy
- Software-implemented hardware fault tolerance: use software to monitor the hardware
- Sophisticated monitoring concepts: combine hardware and software techniques
  - See, e.g., the E-Gas monitoring concept

Hardware Components
[Figure: block diagram of the hardware components. Sensors feed the digital and analogue inputs via connectors; the processing unit, with clock, power supply, RAM, ROM and serial bus interface, drives the actuators via the digital and analogue outputs and connectors. Legend: information source / sensor; information sink / actuator; hardware component; function (internal data flow not specified); data flow.]

Fault Tolerance
Two aspects of fault tolerance:
1. Error detection: a deviation from the expected service is detected.
2. System recovery: the system is transformed into an error-free state or a state in which the error does not occur again.
Design for safety: initiate a transition to a safe state.
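The two aspects can be sketched in C as a plausibility check on a sensor reading (error detection) followed by a latched transition to a safe state (recovery). This is only an illustration: the limits, the type `monitor_t` and the functions `monitor_init` and `monitor_process` are hypothetical and do not come from the lecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits for a sensor delivering raw ADC counts. */
#define SENSOR_MIN      100
#define SENSOR_MAX      3900
#define SENSOR_MAX_STEP 200   /* largest plausible change between two samples */

typedef enum { STATE_NORMAL, STATE_SAFE } system_state_t;

typedef struct {
    system_state_t state;
    int32_t        last_value;
    bool           has_last;
} monitor_t;

void monitor_init(monitor_t *m)
{
    m->state = STATE_NORMAL;
    m->last_value = 0;
    m->has_last = false;
}

/* Error detection: range check plus rate-of-change check. */
static bool value_is_plausible(const monitor_t *m, int32_t value)
{
    if (value < SENSOR_MIN || value > SENSOR_MAX) {
        return false;
    }
    if (m->has_last) {
        int32_t step = value - m->last_value;
        if (step < 0) {
            step = -step;
        }
        if (step > SENSOR_MAX_STEP) {
            return false;
        }
    }
    return true;
}

/* Recovery: on a detected error the monitor enters the safe state and
 * stays there (latched) until it is re-initialised. */
system_state_t monitor_process(monitor_t *m, int32_t value)
{
    if (m->state == STATE_SAFE) {
        return STATE_SAFE;
    }
    if (!value_is_plausible(m, value)) {
        m->state = STATE_SAFE;
        return STATE_SAFE;
    }
    m->last_value = value;
    m->has_last = true;
    return STATE_NORMAL;
}
```

Latching the safe state is a design choice of this sketch: once an error has been detected, later plausible readings do not silently bring the system back into normal operation.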

IEC 61508: Safe Failure Fraction
Determine for each safety-related component:
- Failure rate of safe failures: λ_S
- Failure rate of undetected dangerous failures: λ_DU
- Failure rate of detected dangerous failures: λ_DD
Safe Failure Fraction:
SFF = (λ_S + λ_DD) / (λ_S + λ_DD + λ_DU)
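As a small worked example, the safe failure fraction can be computed directly from the three failure rates. The struct `failure_rates_t` and the function `safe_failure_fraction` are hypothetical names chosen for this sketch, and the numbers in the usage note are made up.

```c
/* Failure rates of one component, in failures per hour. */
typedef struct {
    double lambda_s;   /* safe failures */
    double lambda_dd;  /* detected dangerous failures */
    double lambda_du;  /* undetected dangerous failures */
} failure_rates_t;

/* Returns the safe failure fraction in the range [0, 1],
 * or -1.0 if the total failure rate is not positive. */
double safe_failure_fraction(failure_rates_t r)
{
    double total = r.lambda_s + r.lambda_dd + r.lambda_du;
    if (total <= 0.0) {
        return -1.0;
    }
    return (r.lambda_s + r.lambda_dd) / total;
}
```

For example, with the made-up rates λ_S = 4e-7 h^-1, λ_DD = 5e-7 h^-1 and λ_DU = 1e-7 h^-1, the function returns approximately 0.9, i.e. an SFF of about 90 %.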

Functional Tests of RAM cells
Correct functioning of a RAM cell means:
- reading a 1 and a 0 correctly,
- changing a 1 into a 0 correctly and vice versa, and
- writing a 1 and a 0 correctly,
each independently of the states of other cells.
Functional test: a sequence of write and read accesses
- Complexity of a complete test of n cells: 2^n
- Use a fault model, e.g. stuck-at faults, coupling faults
- Popular tests: March tests, Abraham test

March Test
A march test consists of a sequence of march elements.
A march element consists of a sequence of operations applied to each cell:
- Operations: w0, w1, r0, r1
- Possible address orders: ⇑ increasing order, ⇓ decreasing order, ⇕ arbitrary order
Example: March C-
{ ⇕(w0); ⇑(r0, w1); ⇑(r1, w0); ⇓(r0, w1); ⇓(r1, w0); ⇕(r0) }
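A minimal sketch of March C- in C, using the standard address orders of that test (write 0 in any order, two ascending read/write passes, two descending read/write passes, final read of 0 in any order). It runs on a simulated one-bit-per-cell memory with an optional injected stuck-at fault; the type `sim_mem_t` and all function names are hypothetical test fixtures, not part of the lecture.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_CELLS 64u

/* Simulated memory, one bit per cell, with an optional stuck-at fault. */
typedef struct {
    uint8_t cell[MEM_CELLS];
    bool    fault_enabled;
    size_t  fault_addr;
    uint8_t stuck_value;
} sim_mem_t;

static void mem_write(sim_mem_t *m, size_t addr, uint8_t bit)
{
    m->cell[addr] = bit;
}

static uint8_t mem_read(const sim_mem_t *m, size_t addr)
{
    if (m->fault_enabled && addr == m->fault_addr) {
        return m->stuck_value;   /* the faulty cell ignores what was written */
    }
    return m->cell[addr];
}

/* One march element of the form (r<expect>, w<write>), applied to every
 * cell in ascending or descending address order. */
static bool march_rw(sim_mem_t *m, bool ascending, uint8_t expect, uint8_t write)
{
    for (size_t i = 0; i < MEM_CELLS; i++) {
        size_t addr = ascending ? i : (MEM_CELLS - 1u - i);
        if (mem_read(m, addr) != expect) {
            return false;
        }
        mem_write(m, addr, write);
    }
    return true;
}

/* Returns true if the memory passes March C-. */
bool march_c_minus(sim_mem_t *m)
{
    for (size_t a = 0; a < MEM_CELLS; a++) {      /* any order: (w0) */
        mem_write(m, a, 0u);
    }
    if (!march_rw(m, true, 0u, 1u))  return false; /* up:   (r0, w1) */
    if (!march_rw(m, true, 1u, 0u))  return false; /* up:   (r1, w0) */
    if (!march_rw(m, false, 0u, 1u)) return false; /* down: (r0, w1) */
    if (!march_rw(m, false, 1u, 0u)) return false; /* down: (r1, w0) */
    for (size_t a = 0; a < MEM_CELLS; a++) {      /* any order: (r0) */
        if (mem_read(m, a) != 0u) {
            return false;
        }
    }
    return true;
}
```

The test performs 10 accesses per cell, so its cost grows linearly with the number of cells rather than exponentially as for a complete test.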

Classification of Hardware Faults
Classification with respect to persistence:
- Permanent faults: presence is assumed to be continuous in time
- Transient faults: presence is bounded in time
What happens when a transient fault occurs?
- logical 0 → logical 1
- logical 1 → logical 0
- Called a bit flip; can lead to a soft error
Causes?
- Radiation
- Crosstalk
- Noise

Detecting memory faults
Which class of faults is detected by functional tests?
- Permanent and transient faults
- But useful for permanent faults only
Concurrent fault detection? Use redundancy:
- Parity bit
- Block replication
- Error correction code (ECC)
Fault detection in invariable memory?
- Cyclic redundancy checks (CRC)
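A CRC check of invariable memory can be sketched as follows. The choice of CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) is an assumption made for this example; the slides do not name a particular CRC. The function `rom_is_intact` and the idea of a reference CRC stored at build time are likewise illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
 * no bit reflection, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Compares the CRC of a memory image with a reference value that is
 * assumed to have been computed and stored when the image was built. */
bool rom_is_intact(const uint8_t *image, size_t len, uint16_t reference_crc)
{
    return crc16_ccitt(image, len) == reference_crc;
}
```

For the nine ASCII bytes "123456789" this CRC variant yields the well-known check value 0x29B1; flipping any single bit of the image changes the CRC, so `rom_is_intact` then returns false.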

Dependent failures
Condition for independent events: P(A and B) = P(A) · P(B)
Condition for dependent failures: P(Failure A and Failure B) ≠ P(Failure A) · P(Failure B)
- Common cause failures: a single event leads to both Failure A and Failure B
- Cascading failures: Failure A leads to Failure B

Dependent Failures
Typical events or root causes:
- Common and shared resources
  - Hardware
  - Power supply
  - Input data
  - Specification
- Environmental factors
  - Temperature
  - Humidity
  - Electromagnetic compatibility

Detecting faults in the Processing Unit
Self-test by software. Test of:
- the registers and internal RAM
- the coding and execution, including the flag register
- the address calculation
- the program counter and stack pointer
Can a processing unit determine its own state of health?
- Common cause failures are possible
- Increase fault coverage: trigger and evaluate the test by an external hardware unit

Detecting faults in the Processing Unit
Time redundancy
- Using the same software: detects transient faults only
- Using diverse software versions: detects transient and some permanent faults
Control flow checking
- Define valid program paths at design time: compute a golden signature
- Check compliance at run time: compute the signature and check it against the golden signature
- Implemented either exclusively in software or using a watchdog processor
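Software-only control flow checking can be sketched with a running signature that every executed block updates. The block identifiers, the rotate-and-XOR update rule and the function names below are assumptions of this sketch; real schemes differ in how signatures are assigned and where they are checked. In an instrumented program each block would call the update as it executes; here the executed path is passed in as an array to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers of the basic blocks on the valid path. */
enum {
    BLOCK_READ_INPUT   = 0x11,
    BLOCK_COMPUTE      = 0x22,
    BLOCK_WRITE_OUTPUT = 0x33
};

/* Order-sensitive update: rotate the signature left by 5 bits,
 * then XOR in the identifier of the block that was just executed. */
static uint32_t sig_update(uint32_t sig, uint32_t block_id)
{
    return ((sig << 5) | (sig >> 27)) ^ block_id;
}

/* Signature of a sequence of executed blocks. At design time this is
 * applied to the valid path to obtain the golden signature. */
uint32_t signature_of_path(const uint32_t *blocks, size_t n)
{
    uint32_t sig = 0u;
    for (size_t i = 0; i < n; i++) {
        sig = sig_update(sig, blocks[i]);
    }
    return sig;
}

/* Run-time check: the signature of the executed path must match
 * the golden signature. */
bool path_is_valid(const uint32_t *executed, size_t n, uint32_t golden)
{
    return signature_of_path(executed, n) == golden;
}
```

With this update rule a skipped block or two swapped blocks on the three-block path produce a signature different from the golden one, so the check fails. Like any signature, it can in principle alias for other invalid paths.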

E-Gas
E-Gas: throttle-by-wire
- Drive-by-wire application: no mechanical link between the control element and the actuator
Required computations:
- Metering fuel
- Adjusting the ignition timing
- Controlling the air supply
Possibility of increasing the power of the engine
- Safety-critical system! Ensure the correct function

Controlling the Drive Unit of a Vehicle
[Figure; source: US Patent 5880568]

Dual Core Microcontroller
Two driving forces:
1. Performance
   - Same performance at 200 MHz as a single-core MCU operating at 500 MHz
   - Lower power consumption
   - Lower heat generation
2. Safety
   - Redundancy: two processors
   - Different monitoring concepts possible

Dual Core Architectures
- Homogeneous redundancy: Core 1 and Core 2 are identical
- Heterogeneous redundancy: Core 1 and Core 2 are different
- Symmetric execution: Core 1 and Core 2 execute the same program
- Asymmetric execution: Core 1 executes Program1, Core 2 executes Program2

Dual-Core Lockstep
- Dual-core lockstep: homogeneous, synchronous dual-core architecture
- Lockstep principle: both processors respond to the same data in the same way
- Fault detection unit: a comparator comparing the output data of the processors
[Figure: master and checker cores connected to the bus and peripherals; the comparator signals an error.]

Dual-Core Lockstep
Disadvantages:
- No additional performance from the second core
- Detection of processor faults only: susceptible to systematic and cascading failures
- High costs: special dual-core architecture required
- Common cause failures?

Software Faults
How do you avoid system failures due to software faults?
- Fault avoidance: apply different techniques, e.g. (semi-)formal methods, graphical modeling, coding guidelines
- Fault removal: reviewing, testing, simulation, verification
- Fault tolerance: assertions, plausibility checks, N-version programming
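N-version programming needs a decision element that compares the results of the independently developed versions. A minimal sketch for three versions is a 2-out-of-3 majority voter; the type `vote_result_t` and the function `vote_2oo3` are hypothetical names, and exact equality of integer results is an assumption of this example (floating-point results would need a tolerance).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    valid;   /* true if at least two versions agree */
    int32_t value;   /* the agreed value, meaningful only if valid */
} vote_result_t;

/* 2-out-of-3 majority vote over the results of three software versions. */
vote_result_t vote_2oo3(int32_t a, int32_t b, int32_t c)
{
    vote_result_t r = { false, 0 };
    if (a == b || a == c) {
        r.valid = true;
        r.value = a;
    } else if (b == c) {
        r.valid = true;
        r.value = b;
    }
    return r;
}
```

If no two versions agree, the voter reports an invalid result and the caller has to react, for example by initiating the transition to a safe state.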

Choice of Programming Language
For SIL 3 and SIL 4, the use of a language subset is highly recommended.
[IEC 61508-7, Annex C (informative)]

Why can C cause problems?
Example:
    if (a = b) { /* some instruction */ }
What does it refer to?
    if (a == b) { /* some instruction */ }
or
    a = b;
    if (a != 0) { /* some instruction */ }
Rule: Do not use assignments in conditions!
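The difference between the two readings can be made concrete with three small functions. The function names are hypothetical; the doubled parentheses in the first function only serve to silence the warning that many compilers emit for an assignment used as a condition.

```c
#include <stdbool.h>

/* What the compiler actually does with "if (a = b)":
 * assign b to a, then test the assigned value against zero. */
bool assignment_in_condition(int a, int b)
{
    bool taken = false;
    if ((a = b)) {
        taken = true;
    }
    return taken;
}

/* The same behaviour written out explicitly, following the rule. */
bool explicit_assignment(int a, int b)
{
    bool taken = false;
    a = b;
    if (a != 0) {
        taken = true;
    }
    return taken;
}

/* What the programmer may have intended instead: a comparison. */
bool comparison(int a, int b)
{
    return a == b;
}
```

For a = 1, b = 2 the assignment forms take the branch although the values differ, and for a = 0, b = 0 they skip it although the values are equal. The comparison gives the opposite result in both cases.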

Design Recommendations
[IEC 61508-3, Annex B (normative)]

Coding Guidelines
Goals of coding guidelines:
- Avoid misunderstandings
- Avoid undefined behaviour
- Increase code readability
  - Avoids the introduction of defects
  - Makes debugging easier
  - Simplifies adding new features
Coding guidelines can be a controversial issue, e.g. naming conventions and style conventions.

MISRA-C
- MISRA: Motor Industry Software Reliability Association
- MISRA-C: development guideline for vehicle-based software implemented in C
- Popular guidelines, not only in the automotive industry
- There are tools, e.g. PC-Lint, that offer MISRA compliance checking. However, not all rules can be checked automatically.

Satisfying the Tool
Original code:
    if (a = b) { /* some instruction */ }
Tool reports a violation: the condition should be of Boolean type.
What the programmer did:
    if (!!(a = b)) { /* some instruction */ }
This satisfies the tool, but the assignment in the condition remains.

IEC 61508: Techniques & measures according to SIL