SAE AADL Error Model Annex: Discussion Items

Peter Feiler (phf@sei.cmu.edu)
Software Engineering Institute
Carnegie Mellon University
Pittsburgh, PA 15213

April 2012

Sponsored by the U.S. Department of Defense
© 2011 by Carnegie Mellon University

Outline
- Recent changes
- Error annex name
- Observable error propagations
- Detection and recovery
- Transient errors
- Composite and abstracted error models
- Error type sets as conditions
- Unified transitions, outgoing propagations, detections, composite states
- Mappings and transformations
- Stochastic error flows
- Error type ontology
- Additional discussion items
- Three examples in document

Annex Name
- The annex is identified by a name in annex library and subclause declarations:
    annex EMV2 {** error model V2 syntax **};
- This allows the original error model annex and EM V2 to co-exist.

Error Propagation Paths: Observable Error Propagations
- Example scenario: co-located processors
  - Without a propagation path in the core model: no connection or binding relationship
  - Heat dissipation as an observable error propagation
- Observable error propagation points and paths
  - Observable propagation points and paths are added via the annex subclause
[Figure: Component A and Component B on Processor 1 and Processor 2, with an overheating error following an observable error propagation path between observable error propagation points.]

    observable_propagation ::=
        defining_observable_error_propagation_point_identifier :
        ( in | out ) observable propagation error_type_set ;

    observable_propagation_connection ::=
        defining_observable_error_propagation_connection_identifier :
        observable source_qualified_observable_error_propagation_point
        -> target_qualified_observable_error_propagation_point ;

Error Source Specification
- Error source of an outgoing propagation
  - Indicates intrinsic error types without explicitly declaring error events:
      Es1: error source outp1{itemomission} when {ValueError}
  - Alternative: error event declaration as part of the propagations view:
      Es2: error source outp1{itemomission} when ErrEv1{ValueError}
[Figure: Component A and Component B.]

Error Detection and Error Recovery
- Error detection and reporting
  - Example: NoValue as a detectable persistent fault
  - Self detection and reporting (internal to the component)
  - Detection of an error propagation by the recipient component
- Error recovery with probability of success and failure
  - Branch transitions
[Figure: state machine with states Operational, BadValueState, and PersistentFaultState; events BadValueEvent and RecoverEvent; branches Recover Success and Recover Failure; out propagation BadValue on Outport2, out propagation NoValue on ErrorOutport, and incoming Inport1.]
- Branch transition:
    BadValueState -[RecoverEvent]-> (Operational with 0.99, PersistentFaultState with 0.01);
- Component-internal detection and reporting:
    detections
    PersistentFaultState -[]-> ErrorOutPort{NoValue};
- External detection and reporting:
    detections
    all -[Inport1{NoValue}]-> ErrorOutPort{NoValue};
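
The branch transition above can be read as a weighted random choice among target states. The following is a minimal Python sketch of that reading, not part of the annex: the function names `take_branch` and `simulate_recovery` are hypothetical, and only the state names and the 0.99 / 0.01 probabilities come from the slide.

```python
import random

# Branch probabilities taken from the slide's example transition.
BRANCHES = {"Operational": 0.99, "PersistentFaultState": 0.01}


def take_branch(branches, rng):
    """Pick a target state according to the branch probabilities."""
    total = sum(branches.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("branch probabilities must add up to 1")
    draw = rng.random()
    cumulative = 0.0
    for state, prob in branches.items():
        cumulative += prob
        if draw < cumulative:
            return state
    # Guards against floating-point round-off on the last branch.
    return state


def simulate_recovery(trials, seed=0):
    """Count how often RecoverEvent leads to each target state."""
    rng = random.Random(seed)
    counts = {state: 0 for state in BRANCHES}
    for _ in range(trials):
        counts[take_branch(BRANCHES, rng)] += 1
    return counts


if __name__ == "__main__":
    print(simulate_recovery(10_000))
```

The check that the probabilities add up to 1 mirrors the requirement, raised later in the discussion of an `others` keyword, that the branches of a transition sum to 1.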

Transient Errors
- In the error behavior state machine
  - Probability that an error event results in transient error behavior
    - Branching transition with branch probability
  - Error state with a transition triggered by a recover event
    - The recover event has a property to indicate time (range) and distribution over time (range)
    - Distribution over a single value vs. a range
- Need for explicit characterization of a propagation as transient?
  - Transient nature of a propagation in impact analysis
  - Inferred from the state machine vs. specified without a state machine as part of the error source specification

Composite Error Behavior Specification
- Composite error behavior specification for a component
  - Abstracted error behavior model in terms of the subcomponent error models

    composite error behavior
    use behavior qualified_error_state_machine_reference
    composite states
      [2 ormore (gps1.operational, gps2.operational, gps3.operational)]-> Working ;
      [1 ormore (gps1.failed{critical}, gps2.failed{critical}, gps3.failed{critical})]-> CriticalFailure ;
      [2 orless (gps1.failed{critical}, gps2.failed{critical}, gps3.failed{critical})]-> NonCriticalFailure ;
    end composite;

- Incoming and outgoing propagations in terms of the subcomponents
- Events in the abstracted model: relation to the composite model
- Transitions in the abstracted model: relation to the composite model
- Diagnosis of subcomponent failures or masked failures
- Availability in critical and non-critical mode
[Figure: component GPS with subcomponents gps1, gps2, and gps3 (sub1, sub2, sub3).]
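
The `ormore` and `orless` conditions count how many subcomponent conditions hold. The Python sketch below illustrates that counting for the GPS example. It rests on two assumptions that the slide does not state: the three conditions, which are not mutually exclusive as written, are evaluated top to bottom with the first match winning, and a failed subcomponent carries either a `critical` or a `noncritical` type. The names `or_more`, `or_less`, and `composite_state` are hypothetical.

```python
def or_more(k, conditions):
    """True if at least k of the conditions hold."""
    return sum(conditions) >= k


def or_less(k, conditions):
    """True if at most k of the conditions hold."""
    return sum(conditions) <= k


def composite_state(sub_states):
    """Derive the composite state from subcomponent states.

    sub_states maps a subcomponent name to a (state, error_type) pair,
    e.g. {"gps1": ("operational", None), "gps2": ("failed", "critical")}.
    Assumption: conditions are tried in declaration order, first match wins.
    """
    operational = [s == "operational" for s, _ in sub_states.values()]
    failed_critical = [
        s == "failed" and t == "critical" for s, t in sub_states.values()
    ]
    if or_more(2, operational):
        return "Working"
    if or_more(1, failed_critical):
        return "CriticalFailure"
    if or_less(2, failed_critical):
        return "NonCriticalFailure"
    return None


if __name__ == "__main__":
    print(composite_state({
        "gps1": ("failed", "critical"),
        "gps2": ("failed", "critical"),
        "gps3": ("operational", None),
    }))
```

Without the ordering assumption, a single critically failed GPS with two operational units would satisfy all three conditions at once, which is one reason the completeness of composite state specifications is a discussion item.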

Hierarchies of Independent Error Types
- Independent error type hierarchies
  - Below are some canonical error type hierarchies
  - Error types from different hierarchies can occur simultaneously
- Additional error type hierarchies
  - Concurrency (Race Condition, Deadlock, Starvation)
- Parameters to types
  - Range property for OutOfRange
  - Rate for rate-related types
  - Bound (k) for Bounded Item Sequence Omission and Bounded Item Omission Interval

Error Type Sets
- Specification of error types (sets) on events, states, and propagations

    error types
      ValueError: type;
      TimingError: type;
      NoValue: type extends ValueError;
      BadValue: type extends ValueError;
      EarlyValue: type extends TimingError;
      LateValue: type extends TimingError;
      ETS1: type set {ValueError, TimingError};

    E1: error event {ETS1};
    S: state {ValueError, TimingError};
    Trans1: S -[E1]-> S1;
    Trans2: S{ValueError, EarlyValue} -[ETS1]-> S2;

- Notes on the declarations
  - ValueError and TimingError: two independent type hierarchies
  - ETS1: error type set, the power set of types from two type hierarchies
  - E1: reference to a named error type set
  - S: declaration of an equivalent error type set constructor
  - Trans1: transition triggered by E1
  - Trans2: constrains the transition to the {EarlyValue}, {BadValue}, {NoValue}, {EarlyValue+BadValue}, or {EarlyValue+NoValue} tuples of S
- Error type set as constraint
  - {T1, T2}: power set of two error type hierarchies
  - {T1}: error type set of one type
  - {T1+T2}: product type (one error type from each error type hierarchy)
  - {T1,*}: error type set with at least one element from type (hierarchy) T1
  - {NoError}: represents the empty set
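
The tuples listed for the constraint S{ValueError, EarlyValue} can be reproduced mechanically. The Python sketch below expands a type set constraint into its type tokens, assuming that a token holds at most one leaf type per hierarchy and that the empty token ({NoError}) is excluded, which is what the slide's list of five tuples suggests. The names `PARENT`, `root`, `leaves_under`, and `tokens` are hypothetical.

```python
from itertools import product

# Type hierarchy from the slide: child -> parent (None marks a root).
PARENT = {
    "ValueError": None,
    "TimingError": None,
    "NoValue": "ValueError",
    "BadValue": "ValueError",
    "EarlyValue": "TimingError",
    "LateValue": "TimingError",
}


def root(error_type):
    """Return the root of the hierarchy that error_type belongs to."""
    while PARENT[error_type] is not None:
        error_type = PARENT[error_type]
    return error_type


def leaves_under(error_type):
    """Return the leaf types at or below error_type."""
    children = [c for c, p in PARENT.items() if p == error_type]
    if not children:
        return [error_type]
    result = []
    for child in children:
        result.extend(leaves_under(child))
    return result


def tokens(type_set):
    """Expand a constraint such as {ValueError, EarlyValue} into type tokens.

    Assumption: each token has zero or one leaf type per hierarchy,
    and the empty token is not part of the result.
    """
    by_root = {}
    for error_type in type_set:
        by_root.setdefault(root(error_type), set()).update(
            leaves_under(error_type)
        )
    # Per hierarchy: either no type (None) or exactly one leaf type.
    choices = [[None] + sorted(leaves) for leaves in by_root.values()]
    result = set()
    for combo in product(*choices):
        token = frozenset(t for t in combo if t is not None)
        if token:
            result.add(token)
    return result


if __name__ == "__main__":
    for token in sorted(tokens({"ValueError", "EarlyValue"}), key=sorted):
        print("+".join(sorted(token)))
```

For {ValueError, EarlyValue} this yields the five tuples named on the slide; for {ValueError, TimingError} it yields eight.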

Common Syntax
- Transitions
    S -[Inp1{BadValue}]-> S2
      Component-specific transition: an incoming propagation can trigger a transition
    S -[Inp1{BadValue}]-> mask
      Explicit specification that an event (or incoming propagation) is ignored/masked in a given state; could be expressed as a transition to itself
    all -[inbinding{nopower}]-> Failed
      Transition from any state to the Failed state for a given incoming propagation (or event)
- Outgoing propagations
    S{BadValue} -[]-> OutP1{BadValue}
      State is reflected in the outgoing propagation
    S{BadValue} -[InP1{LateValue}]-> OutP1{BadValue,LateValue}
      Outgoing propagation determined by an incoming propagation in a specific state
    S -[InP1{LateValue}]-> mask
      Incoming propagation is masked in a given state
    all -[InP1{LateValue}]-> OutP1{BadValue,LateValue}
      Outgoing propagation of an incoming propagation independent of any state
- Detections
    S{BadValue} -[]-> OutErrorPort1{BadValue}
    S{BadValue} -[]-> self.event_identifier{badvalue}
- Composite state
    [2 ormore (gps1.operational, gps2.operational, gps3.operational)]-> Working ;

Completeness of Error Behavior Specification
- Handling of error events in the Failure state
  - Unspecified: incomplete model
  - Explicitly specify that the event is ignored in a given state
  - Option: specify a loopback transition
      Trans2: Failed -[Fault]-> Failed;
    - Default resulting state type is the value of the last error event
    - Specify a type transformation with result type same as source type
    - Component transition condition with result type (contributor)
- Need for mask?
  - Mask has no alternative for propagation declarations.
  - Masking of incoming propagations, with outgoing still affected by the current state, vs. outgoing propagation being NoError
      Failed -[Fault or Inport1]-> mask;
- Need for an others keyword?
  - Transition branch: allows others to add up to 1
  - As a transition condition to complement all others
  - Catch-all outgoing transition condition (used in the original large example)
  - As a composite state specification, meaning: not in any other state

{NoError} aka {}
- NoError as error type constraint
  - {TimingError} means other error types in the error type set are expected to be NoError
  - No need to explicitly specify NoError for all other error types in the same type set
  - Expected presence of multiple types in a tuple via product type declaration
- NoError as incoming propagation constraint
  - Inport1{LateValue}: don't care or no error on other incoming propagations?
  - Inport1{LateValue} and Inport2{EarlyValue}: error on both
  - Inport1{LateValue} and Inport2{NoError}
- Typed error state without a type token
  - Can an error state represent an empty type, i.e., {NoError}?
- Error type set as constraint
  - {T1, T2}: power set of two error type hierarchies
  - {T1}: error type set of one type
  - {T1+T2}: product type (one error type from each error type hierarchy)
  - {T1,*}: error type set with at least one element from type (hierarchy) T1
  - {NoError}: represents the empty set

Type Mappings and Transformations
- Element type mapping and transformation
  - Separate rules for each element type
  - Mapping/transformation within the same type hierarchy
      BadValue -> NoValue
      BadValue -[NoValue]-> NoValue
- Type tuple mapping and transformation
  - The left-hand type set constraint must match
  - Mapping/transformation to any result tuple
      {BadValue} -> {NoValue}
      {ValueError,TimingError} -[{NoValue}]-> {NoValue}
- Handling of corner cases
  - Source is {NoError}: default is target = contributor
  - Contributor is {NoError}: default is target = source
  - Target becomes {NoError}
  - Target as mask: target is unaffected by contributor
- Overlap in mapping or transformation rules
  - Only if same result type
  - We do not assume ordering and overriding rules (as was done in York FPTC)

Result Type Determination
- Default result rule for error type sets
  - Contributor overrides source for a common element type hierarchy
  - Union of error types for non-overlapping element types
  - Multiple contributors applied in order: Inp1 and Inp2
- By transformation rules
- Explicit result tuple
    s{novalue} -[inp1{valueerror}]-> OutP{NoValue}
    all -[inp1{badvalue}]-> ErrorPort{NoValue}
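
The default result rule can be sketched as a merge keyed by type hierarchy. The Python example below is an illustration under stated assumptions rather than the annex semantics: a type token is modeled as a set with at most one type per hierarchy, the empty set stands for {NoError}, and the names `ROOT` and `default_result` are hypothetical. Explicit transformation rules or an explicit result tuple would take the place of this default.

```python
# Root hierarchy of each error type (types from the error type set slide).
ROOT = {
    "ValueError": "ValueError",
    "NoValue": "ValueError",
    "BadValue": "ValueError",
    "TimingError": "TimingError",
    "EarlyValue": "TimingError",
    "LateValue": "TimingError",
}


def default_result(source, *contributors):
    """Apply the default result rule to a source token and contributors.

    Each argument is a set of error type names; the empty set is {NoError}.
    A contributor overrides the source within a shared hierarchy, types from
    different hierarchies are combined, and contributors apply in order.
    """
    result = {ROOT[t]: t for t in source}
    for contributor in contributors:
        for error_type in contributor:
            result[ROOT[error_type]] = error_type
    return set(result.values())


if __name__ == "__main__":
    # Same hierarchy: the contributor overrides the source.
    print(default_result({"BadValue"}, {"NoValue"}))
    # Different hierarchies: the result is the union.
    print(sorted(default_result({"BadValue"}, {"LateValue"})))
```

The two {NoError} corner cases from the mapping rules fall out of the same merge: an empty source yields the contributor, and an empty contributor leaves the source unchanged.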

Occurrence Properties on Error Flows
- Occurrence probability on flows
  - On flow sources, to represent the probability of an error event of a given type within the component resulting in a propagation
    - Occurrence probability can be specified for each error type
  - On flow sinks, to represent the probability of an incoming propagation being masked
    - The same incoming propagation can also participate in a flow path for the same type or different types
  - On flow paths, to represent the probability of an incoming propagation of a specific type (or all types) following a given path
- Occurrence probability of an outgoing/incoming propagation
  - Determined from flows to the same propagation point
  - Explicitly specified: ??
[Figures: Component A with NoData, BadData, and LateData propagations on ports P1 and P2; Processor, Memory, and Bus with NoResource propagations.]

Discussion Items - 1
- Simultaneous error event occurrences
  - Separate (untyped) error events representing value and timing errors
    - Occurrence of timing and value error simultaneously: "and" condition on error events
  - Error event with type set {ValueError, TimingError}
    - Error event occurrence with tuple: single element or pair according to the probability for each error type
  - Separate error event declaration => separate events
    - Simultaneous occurrence: arbitrary order
    - Transition in new state handles the second event
  - Error events in separate components
    - Concurrent state machine behavior
- Fanout and multi-step error propagation
  - Propagation by discrete data communication
  - Multiple component EBSMs affected by an error propagation

Discussion Items - 2
- Outgoing propagation probability
  - Probability that the error state or incoming propagation is propagated
- Propagation delay: time (range) and distribution as a property?
- Inherently absent error events or propagations
  - Occurrence probability of 0 for an event => prevention/absence
    - Inherent absence: no replication error because there is no replication
    - Prevention/elimination: via testing => low to zero probability
  - Occurrence probability of 0 for a propagation => masking
  - Error source declaration referring to Contained Error

Three Examples
- Dual redundant FGS
  - Application with operational modes
  - Application-level redundancy
  - Composite error model
  - Link to behavior specification
- Network protocols
  - Network protocol layering
  - Separation of logical protocol and physical network
  - Link to behavior specification
- Triple peer voting in a quad redundant system (large V1 example)
  - Detection and decision making
  - Assumption in fault management architecture

Error Model Definitions
- The terms fault, error, failure, and latency of a fault or error are those of [IFIP WG10.4-1992]. This reference is 20 years old.
- These terms were revisited when the SAE adopted SAE ARP 4754A (2010-12).
- It would make sense to define error, fault, fault condition, and other terms per SAE ARP 4754A.
- Note that SAE ARP 4761 provides other definitions; however, it is old and is being revised by the SAE.
- RTCA DO-178C provides additional definitions and is current.