Model-based Architectural Verification & Validation
Peter H. Feiler
Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213
Feb. 2009
© 2006 Carnegie Mellon University
Outline: Architecture-Centric Model-based Engineering · Multi-fidelity Model-based Analysis · Validation of Implementations
Airbus Auto-Pilot Problem: the fallback solution is also computer-controlled.
Mismatched Assumptions. System Engineer: physical plant characteristics (system under control). Control Engineer: control system. Hardware Engineer: compute platform (data stream characteristics, distribution, redundancy). Embedded SW System Engineer: runtime architecture. Application Developer: application software (concurrency, communication, precision, units). Why do system-level failures still occur despite fault tolerance techniques being deployed in systems?
System-Level Fault Root Causes
- Data (stream) consistency: stream miss rates, mismatched data representation, latency jitter & age, violation of data stream assumptions (addressed by end-to-end latency analysis)
- Partitions as isolation regions: space, time, and bandwidth partitioning; isolation not guaranteed due to undocumented resource sharing (fault containment, security levels, safety levels, distribution); fault propagation (addressed by security analysis)
- Virtualization of time & resources: logical vs. physical redundancy, redundancy patterns, time stamping of data & asynchronous systems
- Inconsistent system states & interactions: modal systems with modal components, concurrency & redundancy management, application-level interaction protocols (addressed by modeling of partitioned architectures, validation by model checking & proofs)
Potential Model-based Engineering Pitfalls: inconsistency between independently developed analytical models (the system vs. system models), and confidence that the model reflects the implementation (system models vs. system implementation). Models are more than PowerPoint pictures or block diagrams.
Architecture-Centric Modeling Approach: a single-source annotated architecture model from which analytical models are auto-generated, with impact across quality dimensions: Availability & Reliability (MTBF, FMEA, hazard analysis); Security (intrusion, integrity, confidentiality); Data Quality (data precision/accuracy, temporal correctness); Confidence; Real-time Performance (execution time/deadline, deadlock/starvation, latency); Resource Consumption (bandwidth, CPU time, power consumption).
Outline: Architecture-Centric Model-based Engineering · Multi-fidelity Model-based Analysis · Validation of Implementations
Latency Contributors: physical signal latency in the system under control (operational environment, system engineer), plus sampling latency and processing latency in the control system (control engineer).
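The three contributors can be combined into a simple worst-case budget. A minimal sketch, with the function name and the 20 Hz numbers chosen for illustration (they are not from the slide):

```python
def end_to_end_latency_ms(sampling_period_ms, processing_ms, signal_ms):
    """Worst case: a physical change occurs just after a sampling instant,
    so sampling contributes up to one full period on top of processing
    latency and physical signal latency."""
    return sampling_period_ms + processing_ms + signal_ms

# A 20 Hz sampler (50 ms period) with 8 ms processing and 2 ms signal delay:
print(end_to_end_latency_ms(50.0, 8.0, 2.0))  # 60.0
```

The point of the sketch is that the sampling term is bounded by the period, not by the average sampling offset, which is why rate choices dominate end-to-end latency.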
Impact of Sampling Latency Jitter: impact of scheduler choice on controller stability (A. Cervin, Lund U., CCACSD 2006). Sampling jitter is due to execution time jitter and application-driven send/receive.
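Execution-time jitter turns into sampling jitter when a task sends its output at completion time rather than at a fixed frame boundary. A sketch of the resulting interval bounds, with illustrative names and timing values:

```python
def output_interval_bounds_ms(period_ms, bcet_ms, wcet_ms):
    """If outputs occur at completion, the spacing between consecutive
    outputs ranges from period - (wcet - bcet), when a slow run is
    followed by a fast one, up to period + (wcet - bcet) in the
    opposite case."""
    jitter = wcet_ms - bcet_ms
    return (period_ms - jitter, period_ms + jitter)

# 10 ms period, execution time between 2 ms and 5 ms:
print(output_interval_bounds_ms(10.0, 2.0, 5.0))  # (7.0, 13.0)
```

A controller designed for a fixed 10 ms interval then sees intervals anywhere in that range, which is the stability concern the Cervin result quantifies.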
Partition-Level Flow Latency: the subsystem latency exceeds the expected latency; a lower-bound latency is inherent to the partition architecture.
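The inherent lower bound can be modeled roughly: in a strictly time-partitioned schedule, data crossing from one partition to the next is only picked up at the next major frame, so each crossing can add up to one frame. A sketch under that assumption (an illustrative model, not a tool's formula):

```python
def partition_flow_latency_ms(partitions_crossed, major_frame_ms):
    """Rough lower-bound model: each partition crossing can add up to one
    major frame of sampling delay in a time-partitioned schedule."""
    return partitions_crossed * major_frame_ms

# A flow through 3 partitions at a 50 ms major frame:
print(partition_flow_latency_ms(3, 50))  # 150
```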
Managed Latency Jitter through Deterministic Sampling: input-compute-output (ICO) AADL thread semantics, with immediate and delayed data port connections for deterministic sampling. Example partition flow: nav signal and sensor data from partitions into Navigation Sensor Processing (20 Hz) → Integrated Navigation (10 Hz) → Guidance Processing (20 Hz) → Periodic I/O (20 Hz), with guidance back to partitions; Flight Plan Processing (5 Hz) and Aircraft Performance Calculation (2 Hz) exchange flight plan, fuel flow, and performance data.
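A delayed data port connection makes sampling deterministic because the value is published at the producer's frame boundary rather than at its variable completion time. A sketch of that semantics for harmonic periods (the 20 Hz/10 Hz rates mirror the example, but the model is simplified):

```python
def delayed_port_data_age(consumer_dispatch_ms, producer_period_ms):
    """Delayed-connection sketch: a value produced in the frame starting
    at p becomes available at p + period, so the consumer always samples
    the previous frame's value."""
    frame_start = (consumer_dispatch_ms // producer_period_ms) * producer_period_ms
    newest_available = frame_start - producer_period_ms  # previous frame's output
    return consumer_dispatch_ms - newest_available

# A 10 Hz consumer (dispatches every 100 ms) of a 20 Hz producer (50 ms period):
ages = [delayed_port_data_age(t, 50) for t in (100, 200, 300)]
print(ages)  # [50, 50, 50]
```

The data age is the same at every dispatch: latency may grow by a frame, but the jitter is gone, which is the trade the slide describes.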
Latency Revisited: latency has increased.
Software-Based Latency Contributors:
- Execution time variation: algorithm, use of cache
- Processor speed
- Resource contention
- Preemption
- Legacy & shared-variable communication
- Rate group optimization
- Protocol-specific communication delay
- Partitioned architecture
- Migration of functionality
- Fault tolerance strategy
Outline: Architecture-Centric Model-based Engineering · Multi-fidelity Model-based Analysis · Validation of Implementations
Options: generate the implementation (the generator implements the model semantics, so validate the generator) vs. validate source code against the model.
Double Buffering. From the customer design document: "The 200 Hz update rate was used because the MUX data needed to be processed at twice the rate of the fastest channel to avoid a race condition. Because channel 3 operates at 100 Hz, the IO processor had to operate at 200 Hz. The race condition has been fixed by double-buffering data, but the IO processor execution rate was left at 200 Hz to reduce latency of MUX data." Did double buffering solve the problem, or do we need to do more buffering?
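The mechanism in question can be sketched minimally. This is an illustration of the double-buffering idea, not the customer's code: the writer fills the inactive buffer and publishes it with a single index flip, so a reader never observes a half-written record.

```python
class DoubleBuffer:
    """Illustrative double buffer: write to the back buffer, then
    publish it atomically by flipping the active index."""

    def __init__(self, initial):
        self.bufs = [initial, initial]
        self.active = 0

    def write(self, record):
        back = 1 - self.active
        self.bufs[back] = record  # may take many steps; readers unaffected
        self.active = back        # one flip publishes the whole record

    def read(self):
        return self.bufs[self.active]

db = DoubleBuffer((0, 0))
db.write((1, 1))
print(db.read())  # (1, 1) -- always a complete pair, never a torn (1, 0)
```

Double buffering removes torn reads, but a reader can still miss or re-read samples; whether more buffers are needed depends on the producer/consumer rate and phasing relation, which is exactly the question the slide raises.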
Application-Based Send and Receive (ASR), (τP, τC): 3 buffers are needed for an input-compute-output (ICO) guarantee. [Timing diagram: α marks the actual execution start time and Ω the actual completion time of producer τP (period TP, deadline DP) and consumer τC (period TC, deadline DC); because send (S) and receive (R) occur at application-determined points within αP–ΩP and αC–ΩC, the sampling (S/R) order is non-deterministic.]
Periodic Task Communication Summary (columns: ASR; DSR with IMT and PMT; DMT with IMT and PMT; rows by task phasing):
- τP ; τC (same period): MF:1B PD:2B S X R PD:2B R PD:2B S X/R MF:1B
- τC ; τP: PD:1B PD:1B PD:1B PD:1B PD:1B
- τP τC: ND:1B PD:2B X ND:1B PD:2B PD:2B PD:2B ND:1B R X/R
- τP τC: ND:3B S/X C R C PD:2B X PD:2B R PD:2B X/R NDI:2B S/X/R C
Legend: MF = mid-frame; PD = period delay; ND = non-deterministic; NDI = no data integrity; 1B/2B/3B/4B = one to four buffers; S, X, R = data copy operations; S/X = IMT combined send/transfer; S/X/R = DMT combined send/transfer/receive; X/R = DSR/PMT combined transfer/receive; o1, o2 = one-operation copy.
Dual Redundant Flight Guidance System
Mode Logic Specification: synchronous system case, asynchronous system case, component failure case. The implementation samples distributed mode state. To validate: 1) at least one output; 2) exactly one output; 3) two outputs in critical mode. Increased complexity of the property.
Modeling & Validation Issues (modeling in AADL helped identify issues):
- Observation of events by sampling state: the corrected asynchronous solution ignores pilot input events; a push button requires a complete event stream
- Distributed processing of the mode state machine: central vs. distributed logic; fail-safe coordination of state transitions; properties during mode transition
- Clock drift in the asynchronous system: acceptable drift (bounded) vs. fault; built-in drift bound through the period (period wrap-around); loss of a data stream element due to wrap-around
- Button input as event to data port; reduced property complexity by modeling a mode transition as a state
- Synchronization domains & fault conditions; event processing vs. data sampling
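The period wrap-around can be quantified: with a relative drift of d parts per million, two unsynchronized clocks slip one full period every period/d. A sketch with an illustrative drift value:

```python
def seconds_until_period_wrap(period_ms, relative_drift_ppm):
    """Time until two free-running clocks with the given relative drift
    have slipped by one full period, after which a stream element is
    lost or duplicated."""
    slip_ms = period_ms / (relative_drift_ppm * 1e-6)
    return slip_ms / 1000.0

# 20 ms frames with 100 ppm relative drift:
print(seconds_until_period_wrap(20.0, 100.0))  # ~200 seconds
```

A wrap every few minutes explains why a bounded-drift assumption, rather than perfect synchrony, has to appear explicitly in the validated properties.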
Quantified Out-of-sync Modes
Subtle Errors: LM Aero UAV Sensor Voting OFP. Triplex voter: 96 Simulink subsystems, 3 Stateflow diagrams, 6×10^13 reachable states (10^100+ reachable states in Rockwell examples, 40 minutes to analyze). Formal verification turned informal requirements into 57 formal properties, producing 24 counterexamples, 10 design modifications, and requirements clarifications. [Simulink diagram of the voter: triplex_input_monitor, Failure_Isolation, triplex_input_selector, and Failure_Processing blocks, with inputs sync, input_a/b/c, status_a/b/c, and dst_index.] "Formal verification has found several subtle errors that would likely be missed by traditional testing." - Lockheed Martin
Towards Architecture-Centric Engineering
- Build on architecture tradeoff analysis (e.g., SEI ATAM): provides a focused evaluation method; MBE/AADL provides quantitative analysis & starter models to build on
- Facilitate pattern-based technical architecture root-cause analysis: identify systemic risks in technology migration and refresh; AADL provides a semantic framework to reason about technical problem areas and potential mitigation strategies
- Scalability through architecture extraction: leverage existing design databases; the challenge is to abstract away from design details, focusing on what instead of how
- Support system and software assurance: provides a structured approach to safety/dependability assurance; MBE/AADL provides evidence based on validated models
Peter H. Feiler, phf@sei.cmu.edu