SAE AADL Error Model Annex: Discussion Items

Peter Feiler (phf@sei.cmu.edu)
Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213
April 2012
Sponsored by the U.S. Department of Defense
Copyright 2011 by Carnegie Mellon University

Outline
- Recent changes
  - Error annex name
  - Observable error propagations
  - Detection and recovery
  - Transient errors
  - Composite and abstracted error models
  - Error type sets as conditions
  - Unified transitions, outgoing propagations, detections, composite states
  - Mappings and transformations
  - Stochastic error flows
  - Error type ontology
- Additional discussion items
- Three examples in document

Annex Name
- The annex is identified by a name in annex library and subclause declarations:
    Annex EMV2 {** error model V1 syntax **};
- Allows the original and EM V2 to co-exist

Error Propagation Paths: Observable Error Propagations
- Example scenario: co-located processors
  - Without propagation path in core model: no connection or binding relationship
  - Heat dissipation as observable error propagation
- [Figure labels: Component A, Component B, Overheating Error, Processor 1, Processor 2, Observable Error Propagation Path, Observable Error Propagation Point]
- Observable error propagation points and paths
  - Observable propagation points and paths added via annex subclause

    observable_propagation ::=
      defining_observable_error_propagation_point_identifier :
        ( in | out ) observable propagation error_type_set ;

    observable_propagation_connection ::=
      defining_observable_error_propagation_connection_identifier : observable
        source_qualified_observable_error_propagation_point ->
        target_qualified_observable_error_propagation_point ;

Error Source Specification
- Error source of outgoing propagation
- Indication of intrinsic error types without explicitly declaring error events
    Es1: error source outp1{itemomission} when {ValueError}
- Alternative: error event declaration as part of propagations view
    Es2: error source outp1{itemomission} when ErrEv1{ValueError}
- [Figure labels: Component A, Component B]

Error Detection and Error Recovery
- Error detection and reporting
  - Example: NoValue as detectable persistent fault
  - Self detection and reporting (internal to component)
  - Detection of error propagation by recipient component
- Error recovery with probability of success and failure
  - Branch transitions
- [Figure labels: Operational, BadValueEvent, BadValueState, RecoverEvent, Recover Success, Recover Failure, PersistentFaultState, out propagation NoValue, ErrorOutport]
- Branch transition
    BadValueState -[RecoverEvent]-> ( Operational with 0.99, PersistentFaultState with 0.01);
- Component internal detection and reporting
    detections
      PersistentFaultState -[]-> ErrorOutPort {NoValue} ;
- [Figure labels: out propagation BadValue, Outport2, Inport1, ErrorOutport]
- External detection and reporting
    detections
      all -[Inport1{NoValue}]-> ErrorOutPort {NoValue} ;
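The branch transition on this slide can be read as a weighted choice between two target states. The following Python sketch is only an illustration of that reading: the state and event names and the 0.99/0.01 split come from the slide, while `BRANCHES`, `take_branch`, and `estimate_recovery_rate` are hypothetical names introduced here and are not part of the annex.

```python
import random

# Branch probabilities copied from the slide; the table layout is an assumption.
BRANCHES = {
    ("BadValueState", "RecoverEvent"): [
        ("Operational", 0.99),
        ("PersistentFaultState", 0.01),
    ],
}


def take_branch(state, event, rng):
    """Pick the target state of a branch transition according to its probabilities."""
    branches = BRANCHES[(state, event)]
    total = sum(p for _, p in branches)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("branch probabilities must add up to 1")
    draw = rng.random()
    cumulative = 0.0
    for target, p in branches:
        cumulative += p
        if draw < cumulative:
            return target
    return branches[-1][0]  # guard against floating-point round-off


def estimate_recovery_rate(trials, seed=0):
    """Fraction of simulated recoveries that end in Operational."""
    rng = random.Random(seed)
    hits = sum(
        take_branch("BadValueState", "RecoverEvent", rng) == "Operational"
        for _ in range(trials)
    )
    return hits / trials
```

Calling `estimate_recovery_rate(10000)` should give a value close to 0.99, which is a quick sanity check that the branch weights are applied as intended.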

Transient Errors
- In error behavior state machine
  - Probability that error event results in transient error behavior
    - Branching transition with branch probability
  - Error state with transition triggered by recover event
  - Recover event has property to indicate time (range) and distribution over time (range)
    - Distribution over single value vs. range
- Need for explicit characterization of propagation as transient?
  - Transient nature of propagation in impact analysis
  - Inferred from state machine vs. specification w/o state machine as part of error source specification

Composite Error Behavior Specification
- Composite error behavior specification for a component
  - Abstracted error behavior model in terms of the subcomponent error models

    composite error behavior
    use behavior qualified_error_state_machine_reference
    composite states
      [2 ormore (gps1{operational}, gps2.operational, gps3.operational) ]-> Working ;
      [1 ormore (gps1.failed{critical}, gps2.failed{critical}, gps3.failed{critical})]-> CriticalFailure ;
      [2 orless (gps1.failed{critical}, gps2.failed{critical}, gps3.failed{critical}) ]-> NonCriticalFailure ;
    end composite;

- Incoming and outgoing propagations in terms of the subcomponents
- Events in abstracted model: relation to composite model
- Transitions in abstracted model: relation to composite model
- Diagnosis of subcomponent failures or masked failures
- Availability in critical and non-critical mode
- [Figure labels: GPS, gps1, gps2, gps3, sub1, sub2, sub3]
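To make the `ormore` / `orless` conditions concrete, here is a minimal Python sketch that counts subcomponents in a given state and reports which composite states match. It assumes a literal reading in which `k ormore` means "at least k" and `k orless` means "at most k"; folding the `{critical}` type into the state name, and the names `RULES` and `matching_composite_states`, are simplifications made for this illustration only.

```python
from typing import Dict, List, Tuple

# (count operator, threshold, subcomponent state to count, resulting composite state)
# The three rules mirror the slide; the tuple encoding is an assumption.
Rule = Tuple[str, int, str, str]
RULES: List[Rule] = [
    ("ormore", 2, "operational", "Working"),
    ("ormore", 1, "failed{critical}", "CriticalFailure"),
    ("orless", 2, "failed{critical}", "NonCriticalFailure"),
]


def matching_composite_states(sub_states: Dict[str, str]) -> List[str]:
    """Return every composite state whose counting condition holds."""
    results = []
    for op, k, wanted, composite in RULES:
        count = sum(1 for s in sub_states.values() if s == wanted)
        if (op == "ormore" and count >= k) or (op == "orless" and count <= k):
            results.append(composite)
    return results
```

Under this literal reading several conditions can hold at once. For example, one critically failed GPS with two operational ones matches all three rules. The slide does not say how such overlaps are resolved, so the sketch simply returns every match.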

Hierarchies of Independent Error Types
- Independent error type hierarchies
  - Below are some canonical error type hierarchies
  - Error types from different hierarchies can occur simultaneously
- Additional error type hierarchies
  - Concurrency (Race Condition, Deadlock, Starvation)
- Parameters to types
  - Range property for OutOfRange
  - Rate for Rate related
  - Bound (k) for Bounded Item Sequence Omission and Bounded Item Omission Interval

Error Type Sets
- Specification of error type (sets) on events, states, propagations

    Error types
      ValueError: type;
      TimingError: type;
      NoValue: type extends ValueError;
      BadValue: type extends ValueError;
      EarlyValue: type extends TimingError;
      LateValue: type extends TimingError;
      ETS1: type set {ValueError, TimingError};
    E1: error event {ETS1};
    S: state {ValueError, TimingError};
    Trans1: S -[E1]-> S1;
    Trans2: S{ValueError, EarlyValue} -[ETS1]-> S2;

- Annotations on the declarations
  - ValueError, TimingError: two independent type hierarchies
  - ETS1: error type set, the powerset of types from two type hierarchies
  - E1: reference to named error type set
  - S: declaration of equivalent error type set constructor
  - Trans1: transition triggered by E1
  - Trans2: constrain transition to {EarlyValue}, {BadValue}, {NoValue} or {EarlyValue+BadValue}, {EarlyValue+NoValue} tuples of S
- Error Type Set as Constraint
  - {T1, T2}: power set of two error type hierarchies
  - {T1}: error type set of one type
  - {T1+T2}: product type (one error type from each error type hierarchy)
  - {T1,*}: error type set with at least one element from type (hierarchy) T1
  - {NoError}: represents the empty set
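The constraint on `Trans2` can be checked mechanically. The Python sketch below encodes the six error types from this slide and tests whether a type token satisfies a constraint such as `{ValueError, EarlyValue}`. The matching rule used here (a non-empty token with at most one type per hierarchy, each covered by some constraint element or one of its subtypes) is an assumption reconstructed from the slide's list of admitted tuples; `PARENT`, `matches`, and `leaf_tokens` are hypothetical names.

```python
from itertools import combinations

# Type hierarchy from the slide: child -> parent (None marks a hierarchy root).
PARENT = {
    "ValueError": None, "TimingError": None,
    "NoValue": "ValueError", "BadValue": "ValueError",
    "EarlyValue": "TimingError", "LateValue": "TimingError",
}


def is_subtype(t, ancestor):
    """True if t equals ancestor or extends it, directly or indirectly."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT[t]
    return False


def root(t):
    """Root of the hierarchy that t belongs to."""
    while PARENT[t] is not None:
        t = PARENT[t]
    return t


def matches(token, constraint):
    """Assumed rule: non-empty token, one type per hierarchy, all covered."""
    if not token:
        return False
    roots = [root(t) for t in token]
    if len(set(roots)) != len(roots):
        return False  # two types from the same hierarchy cannot co-occur
    return all(any(is_subtype(t, c) for c in constraint) for t in token)


# Leaf types are those that no other type extends.
LEAVES = [t for t in PARENT if t not in PARENT.values()]


def leaf_tokens(constraint):
    """Enumerate the leaf-level tokens (singles and pairs) a constraint admits."""
    found = []
    for size in (1, 2):
        for combo in combinations(LEAVES, size):
            if matches(frozenset(combo), constraint):
                found.append("+".join(sorted(combo)))
    return sorted(found)
```

For the constraint `{ValueError, EarlyValue}`, `leaf_tokens` yields the same five tuples the slide lists for `Trans2`, and `LateValue` is excluded because it is not covered by `EarlyValue`.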

Common Syntax
- Transitions
    S -[Inp1{BadValue}]-> S2
  - Component specific transition: incoming propagation can trigger transition
    S -[Inp1{BadValue}]-> mask
  - Explicit specification that an event (or incoming propagation) is ignored/masked in a given state
  - Could be expressed as transition to itself
    all -[inbinding{nopower}]-> Failed
  - Transition from any state to Failed state for given incoming propagation (or event)
- Outgoing propagations
    S{BadValue} -[]-> OutP1{BadValue}
  - State is reflected in outgoing propagation
    S{BadValue} -[InP1{LateValue}]-> OutP1{BadValue,LateValue}
  - Outgoing propagation determined by incoming propagation in a specific state
    S -[InP1{LateValue}]-> mask
  - Incoming propagation is masked in a given state
    all -[InP1{LateValue}]-> OutP1{BadValue,LateValue}
  - Outgoing propagation of incoming propagation independent of any state
- Detections
    S{BadValue} -[]-> OutErrorPort1{BadValue}
    S{BadValue} -[]-> self.event_identifier{badvalue}
- Composite state
    [2 ormore (gps1{operational}, gps2.operational, gps3.operational) ]-> Working ;
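The interplay of state-specific transitions, `all`, and `mask` can be illustrated with a small lookup table. This Python sketch is not the annex semantics; in particular, letting a state-specific transition take precedence over an `all` transition is an assumption made here, and `TRANSITIONS` and `step` are hypothetical names. The three table rows reuse triggers shown on the slide.

```python
MASK = "mask"
ALL = "all"

# (source state, trigger, error type, target); rows adapted from the slide.
TRANSITIONS = [
    ("S", "Inp1", "BadValue", "S2"),
    ("S", "Inp1", "LateValue", MASK),
    (ALL, "inbinding", "nopower", "Failed"),
]


def step(state, trigger, error_type):
    """Return the next state for a trigger carrying the given error type.

    Assumption: a transition naming the current state is tried before `all`.
    A `mask` target leaves the state unchanged.
    """
    for wanted in (state, ALL):
        for source, trig, etype, target in TRANSITIONS:
            if source == wanted and (trig, etype) == (trigger, error_type):
                return state if target == MASK else target
    # No matching transition: the sketch reports the gap instead of guessing.
    raise LookupError(f"no transition for {trigger}{{{error_type}}} in state {state}")
```

For example, `step("S", "Inp1", "LateValue")` returns `"S"` because the propagation is masked, while `step("S2", "inbinding", "nopower")` returns `"Failed"` through the `all` rule.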

Completeness of Error Behavior Specification
- Handling of error events in Failure state
  - Unspecified: incomplete model
  - Explicitly specify that event is ignored in a given state
  - Option: specify loopback transition
      Trans2: Failed -[Fault]-> Failed;
    - Default resulting state type is value of last error event
    - Specify type transformation with result type same as source type
    - Component transition condition with result type (contributor)
- Need for Mask?
  - Mask has no alternative for propagation declarations
  - Masking of incoming propagations and outgoing still affected by current state vs. outgoing propagation being NoError
      Failed -[Fault or Inport1]-> mask;
- Need for others keyword?
  - Transition branch allows others to add up to 1
  - As transition condition to complement all others
    - Catchall outgoing transition condition (used in original large example)
  - As composite state specification meaning: not in any other state

{NoError} aka {}
- NoError as error type constraint: {TimingError} means other error types in error type set are expected to be NoError
  - No need to explicitly specify NoError for all other error types in same type set
  - Expected presence of multiple types in tuple via product type declaration
- NoError as incoming propagation constraint
  - Inport1{LateValue}: don't care or no error on other incoming?
  - Inport1{LateValue} and Inport2{EarlyValue}: error on both
  - Inport1{LateValue} and Inport2{NoError}:
- Typed error state without a type token
  - Can error state represent an empty type, i.e., {NoError}?
- Error Type Set as Constraint
  - {T1, T2}: power set of two error type hierarchies
  - {T1}: error type set of one type
  - {T1+T2}: product type (one error type from each error type hierarchy)
  - {T1,*}: error type set with at least one element from type (hierarchy) T1
  - {NoError}: represents the empty set

Type Mappings and Transformations
- Element type mapping and transformation
  - Separate rules for each element type
  - Mapping/transformation within the same type hierarchy
      BadValue -> NoValue
      BadValue -[NoValue]-> NoValue
- Type tuple mapping and transformation
  - The left hand type set constraint must match
  - Mapping/transformation to any result tuple
      {BadValue} -> {NoValue}
      {ValueError,TimingError} -[{NoValue}]-> {NoValue}
- Handling of corner cases
  - Source is {NoError}: default is target = contributor
  - Contributor is {NoError}: default is target = source
  - Target becomes {NoError}:
  - Target as Mask: target is unaffected by contributor
- Overlap in mapping or transformation rules
  - Only if same result type
  - We do not assume ordering and overriding rules (as was done in York FPTC)
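A small Python sketch can show how element-type transformation rules and the two {NoError} defaults from this slide might fit together. It covers only a single type hierarchy, represents {NoError} as `None`, and rejects overlapping rules that disagree on the result, in line with "only if same result type". The names `RULES` and `transform` are hypothetical, and leaving the no-matching-rule case as an error is a choice made for the sketch, not something the slide specifies.

```python
# Single hierarchy from the slides: child -> parent (None marks the root).
PARENT = {"ValueError": None, "NoValue": "ValueError", "BadValue": "ValueError"}

NO_ERROR = None  # stand-in for {NoError}

# (source type, contributor type, target type)
RULES = [
    ("BadValue", "NoValue", "NoValue"),  # BadValue -[NoValue]-> NoValue
]


def is_subtype(t, ancestor):
    """True if t equals ancestor or extends it."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT[t]
    return False


def transform(source, contributor, rules=RULES):
    """Apply the matching transformation rule, with the slide's NoError defaults."""
    if source is NO_ERROR:
        return contributor  # default: target = contributor
    if contributor is NO_ERROR:
        return source       # default: target = source
    targets = {
        tgt for src, con, tgt in rules
        if is_subtype(source, src) and is_subtype(contributor, con)
    }
    if len(targets) > 1:
        raise ValueError("overlapping rules with different result types")
    if not targets:
        raise LookupError("no transformation rule matches")
    return targets.pop()
```

No ordering among rules is assumed: two rules may both match as long as they agree on the target, and a disagreement is reported rather than resolved by position.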

Result Type Determination
- Default result rule for error type sets
  - Contributor overrides source for common element type hierarchy
  - Union of error types for non-overlapping element types
  - Multiple contributors applied in order: Inp1 and Inp2
- By transformation rules
- Explicit result tuple
    s{NoValue} -[inp1{ValueError}]-> OutP{NoValue}
    all -[inp1{BadValue}]-> ErrorPort{NoValue}
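The default result rule can be sketched in a few lines of Python if a type tuple is represented as a mapping from hierarchy root to the error type currently held in that hierarchy. That representation and the name `default_result` are assumptions for illustration; the override, union, and in-order behavior follow the three bullets on this slide.

```python
def default_result(source, *contributors):
    """Combine a source tuple with contributors using the default result rule.

    Each tuple is a dict {hierarchy root: error type}. A contributor overrides
    the source in a shared hierarchy; types in other hierarchies are unioned.
    Contributors are applied in the order given (e.g. Inp1, then Inp2).
    """
    result = dict(source)
    for contributor in contributors:
        result.update(contributor)
    return result


# Source holds a value error; Inp1 overrides it, Inp2 adds a timing error.
combined = default_result(
    {"ValueError": "BadValue"},
    {"ValueError": "NoValue"},
    {"TimingError": "EarlyValue"},
)
```

Because later contributors overwrite earlier ones in the same hierarchy, swapping the order of two contributors that both carry a value error would change the result.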

Occurrence Properties on Error Flows
- Occurrence probability on flows
  - On flow sources to represent probability of an error event of a given type within the component resulting in a propagation
    - Occurrence probability can be specified for each error type
  - On flow sinks to represent the probability of an incoming propagation being masked
    - The same incoming propagation can also participate in a flow path for the same type or different types
  - On flow path to represent probability of an incoming propagation of a specific type (or all types) following a given path
- Occurrence probability of outgoing/incoming propagation
  - Determined from flows to same propagation point
  - Explicitly specified: ??
- [Figure labels: Component A, P1, P2, NoData, BadData, LateData, Processor, Memory, Bus, NoResource]
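The slide leaves open how the probabilities of several flows into the same propagation point are combined. Purely as one possible reading, the Python sketch below assumes the contributions are statistically independent, that a flow path contributes the product of the incoming propagation probability and the path probability, and that the outgoing probability is the chance that at least one contribution occurs. Both function names and all numeric values are hypothetical.

```python
def combined_occurrence(contributions):
    """Probability that at least one of several independent contributions occurs."""
    none_occur = 1.0
    for p in contributions:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        none_occur *= 1.0 - p
    return 1.0 - none_occur


def outgoing_probability(source_probs, path_flows):
    """Combine flow sources and flow paths that end at one propagation point.

    source_probs: occurrence probabilities on flow sources.
    path_flows:   (incoming propagation probability, path probability) pairs.
    """
    contributions = list(source_probs)
    contributions += [p_in * p_path for p_in, p_path in path_flows]
    return combined_occurrence(contributions)


# Hypothetical numbers: one source at 0.1, one path with 0.5 * 0.2 = 0.1.
p_out = outgoing_probability([0.1], [(0.5, 0.2)])
```

With these illustrative inputs the result is 1 - 0.9 * 0.9 = 0.19. A different combination rule, or an explicitly specified value, would be equally compatible with what the slide states.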

Discussion Items - 1
- Simultaneous error event occurrences
  - Separate (untyped) error events representing value and timing errors
    - Occurrence of timing and value error simultaneously: and condition on error events
  - Error event with type set {ValueError, TimingError}
    - Error event occurrence with tuple: single element or pair according to probability for each error type
  - Separate error event declaration => separate events
    - Simultaneous occurrence: arbitrary order
    - Transition in new state handles second event
- Error events in separate components
  - Concurrent state machine behavior
- Fanout & multi-step error propagation
  - Propagation by discrete data communication
  - Multiple component EBSM affected by error propagation

Discussion Items - 2
- Outgoing propagation probability
  - Probability that the error state or incoming propagation is getting propagated
- Propagation delay: time (range) & distribution as property?
- Inherently absent error events or propagations
  - Occurrence probability of 0 for event => prevention/absence
    - Inherent absence: no replication error because no replication
    - Prevention/elimination: via testing => low to zero probability
  - Occurrence probability of 0 for propagation => masking
  - Error source declaration referring to Contained Error

Three Examples
- Dual Redundant FGS
  - Application with operational modes
  - Application level redundancy
  - Composite error model
  - Link to behavior specification
- Network Protocols
  - Network protocol layering
  - Separation of logical protocol and physical network
  - Link to behavior specification
- Triple peer voting in quad redundant system (large V1 example)
  - Detection and decision making
  - Assumption in fault management architecture

Error Model Definitions
- The terms fault, error, failure, and latency of a fault or error are those of [IFIP WG10.4-1992] (this reference is 20 years old).
- These terms were revisited when the SAE adopted SAE ARP 4754A (2010-12).
- It would make sense to define error, fault, fault condition, and other terms per SAE ARP 4754A.
- Note that SAE ARP 4761 provides other definitions; however, it is old and is being revised by the SAE.
- RTCA DO-178C provides additional definitions and is current.
