IFAC 2014
Mixed-Criticality Systems based on a CAN Router with Support for Fault Isolation and Selective Fault-Tolerance
Roland Kammerer¹, Roman Obermaisser², Mino Sharkhawy¹
¹ Vienna University of Technology, Austria
² University of Siegen, Germany
Overview
- Introduction
- Fault-Tolerant CAN Router
- Redundant Routers and Mixed-Criticality
- Example
- Discussion
Controller Area Network (CAN)
- Event-triggered communication protocol
- Widely used in automotive networks
- Strengths
  - Flexibility and migration support
  - Resource efficiency
  - Low hardware cost
- Disadvantages
  - Large variability in transmission latencies
  - Arbitration logic limits throughput
  - No consistent atomic multicast
  - No handling of babbling idiot failures
[Figure: in-vehicle network of a car, with a body computer gateway connecting bus segments for functions such as engine control, automatic gear, adaptive cruise control, door control, instrument cluster, airbag, and telematics.]
Source: R. Obermaisser, P. Peti, F. Tagliabo: "An Integrated Architecture for Future Car Generations". In: Real-Time Systems Journal, Volume 36, pp. 101-133, Springer, July 2007.
Challenges in CAN-based Systems: Complexity Management
- Inherent application complexity
- Additional complexity ("accidental complexity") and fault propagation through integration
- Example
  - Integration of two application subsystems using Controller Area Network (CAN)
  - Invalidation of existing services
[Figure: nodes of the two integrated subsystems, some marked H or L.]
Challenges in CAN-based Systems: Diagnosis
- Faulty node masquerades as another node
  - Data integrity is affected at the receivers when the message of a correct node is overwritten
  - Misinterpretation of message semantics
- Diagnostic deficiency
  - Plausibility checks correlate message contents with contents of other messages
  - Correct node will appear as faulty at the receiver
[Figure: hosts with CCs attached to a shared bus, labeled ID: 1 to ID: 6; the label ID: 5 appears twice.]
Fault-Tolerant CAN Router
- Fault isolation to improve robustness and facilitate complexity management
  - Babbling idiot failures
  - Masquerading failures
- Exceed limits of CAN
  - Bandwidth
  - Cable length
  - Namespace
- Improved diagnosis
  - Tackle diagnostic deficiencies
  - Trace back errors to faulty Field Replaceable Units (FRUs)
- Legacy support
  - Electrically compatible CAN ports
  - Interaction with CAN nodes according to the CAN protocol (e.g., use of CSMA/CA, acknowledgment, etc.)
CAN Router Overall Architecture
[Figure: CAN router based on the Time-Triggered MPSoC, with an Ethernet management port and several CAN ports; one port connects a CAN segment with multiple CAN nodes, another a CAN segment with a single CAN node.]
- Replacement of the CAN bus with a star topology
- CAN router redirects messages between CAN ports
- CAN ports provide the connection to CAN segments, each consisting of one or more CAN nodes and a CAN bus
- Management port enables the configuration of the router and the retrieval of diagnostic information
CAN Router Overall Architecture
[Figure: central gateway connecting the segments Powertrain, Comfort 1 and Comfort 2; labeled bit rates: 500 kbps, 125 kbps, 125 kbps.]
Functions of the CAN Router
- Message Rate Control
- Message Multicasting
- Message Scheduling
- Identifier Authentication and Translation
- Diagnosis and Maintenance
Message Rate Control
- Each CAN node sends messages using a subset of the overall identifier range
- Interarrival time between successive message transmissions for a given identifier is a stochastic variable
- For each identifier, a minimum message interarrival time constrains the rate of message transmissions
- CAN router enforces minimum message interarrival times
- Benefits
  - Isolate the failure of a CAN node
  - Limit the effect of a CAN node on the temporal behavior of other CAN nodes
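The enforcement of minimum interarrival times can be sketched as follows. This is a minimal illustration, not the router's actual implementation: the class name `RateControl`, the microsecond time unit, the choice to measure from the last accepted message, and the rejection of identifiers without a configured limit are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class RateControl:
    """Hypothetical per-port rate control based on minimum interarrival times."""
    # identifier -> minimum interarrival time in microseconds (assumed unit)
    min_interarrival_us: dict
    last_accepted_us: dict = field(default_factory=dict)
    violations: dict = field(default_factory=dict)

    def accept(self, identifier: int, arrival_us: int) -> bool:
        limit = self.min_interarrival_us.get(identifier)
        if limit is None:
            # Assumption: identifiers without a configured limit are rejected.
            return False
        last = self.last_accepted_us.get(identifier)
        if last is not None and arrival_us - last < limit:
            # Too early: drop the message and count the violation for diagnosis.
            self.violations[identifier] = self.violations.get(identifier, 0) + 1
            return False
        self.last_accepted_us[identifier] = arrival_us
        return True


rc = RateControl({0x100: 1000})
results = [rc.accept(0x100, t) for t in (0, 400, 1000)]
```

With a limit of 1000 µs for identifier 0x100, the message at 400 µs is dropped, so a babbling node cannot consume more than its configured share of the destination segments.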
Message Multicasting
- Messages with a given identifier are relayed to a subset of the other CAN nodes
- CAN router is equipped with routing tables, which specify the destination ports for each identifier of a received message
- Broadcasting and point-to-point communication are special use cases of multicasting
- Benefits
  - Multicasting provides a more efficient use of communication resources than broadcasting
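A routing table of this kind could look like the sketch below. The table contents, the port numbers, and the rule that a message is never sent back to its ingress port are assumptions made for illustration.

```python
# Hypothetical routing table: identifier -> set of destination ports.
ROUTING_TABLE = {
    0x100: {1, 2},        # multicast to two segments
    0x200: {0, 1, 2, 3},  # broadcast to all four ports of this example
    0x300: {3},           # point-to-point
}


def destination_ports(identifier, ingress_port, table=ROUTING_TABLE):
    """Return the sorted ports a message is relayed to.

    Assumption: the ingress port is excluded, and unknown identifiers
    are relayed nowhere.
    """
    return sorted(table.get(identifier, set()) - {ingress_port})


broadcast = destination_ports(0x200, ingress_port=0)
unknown = destination_ports(0x555, ingress_port=0)
```

Broadcast and point-to-point communication appear here simply as table entries with all ports or a single port.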
Message Scheduling
- On a conventional CAN bus, high-priority messages are transmitted before competing messages with lower priority
- The router controls the order of message transmissions based on the message identifiers
  - Priority queue at each CAN port with messages to be sent to the respective CAN segment
  - Multiple outgoing messages are ordered using the CAN priorities
  - A message overwrites any previous message with the same identifier in the priority queue, because receivers are typically interested in the most recent version of a variable
- Benefits
  - Compatibility to existing CAN networks and CAN nodes
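The overwrite-on-same-identifier behavior of the per-port queue can be sketched as follows. The class name `PortQueue` is hypothetical, and the dictionary with a linear search for the minimum is chosen for brevity rather than speed. The sketch relies on the CAN convention that a numerically lower identifier has higher priority.

```python
class PortQueue:
    """Hypothetical egress queue holding at most one message per identifier."""

    def __init__(self):
        self._pending = {}  # identifier -> most recent payload

    def put(self, identifier, payload):
        # A newer message replaces any queued message with the same identifier.
        self._pending[identifier] = payload

    def pop(self):
        """Remove and return the highest-priority message, or None if empty."""
        if not self._pending:
            return None
        identifier = min(self._pending)  # lowest identifier wins, as on a CAN bus
        return identifier, self._pending.pop(identifier)


q = PortQueue()
q.put(0x200, b"old")
q.put(0x100, b"speed")
q.put(0x200, b"new")  # overwrites b"old"
order = [q.pop(), q.pop(), q.pop()]
```

Because stale values are overwritten, the queue length is bounded by the number of identifiers routed to the port.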
Identifier Authentication and Translation
- CAN-based systems use a particular message identifier exclusively for a single CAN node to transport a corresponding variable
- CAN router performs identifier authentication by reserving identifiers for CAN segments and restricting the use of identifiers
- CAN router performs identifier translation by converting between message identifiers
- Benefits
  - Prevention of masquerading failures
  - Ability to integrate CAN nodes with incompatible architectural styles (e.g., groups of CAN nodes that use the same identifier for different communicated information)
  - Resolving of naming incoherence (e.g., same identifier for different semantic entities, different identifiers for the same semantic entities)
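Authentication and translation can be pictured as two table lookups at the ingress port. The tables, port numbers, and identifiers below are invented for illustration; in particular, dropping messages with unreserved identifiers is an assumption about how the restriction is applied.

```python
from typing import Optional

# Hypothetical configuration: identifiers reserved for each ingress port.
ALLOWED = {0: {0x10, 0x11}, 1: {0x10}}

# Hypothetical translation: (ingress port, local identifier) -> router-wide identifier.
# Both segments use 0x10 for different information, so port 1 is remapped.
TRANSLATION = {(1, 0x10): 0x20}


def authenticate_and_translate(port: int, identifier: int) -> Optional[int]:
    """Return the translated identifier, or None if the message must be blocked."""
    if identifier not in ALLOWED.get(port, set()):
        return None  # identifier not reserved for this segment: possible masquerading
    return TRANSLATION.get((port, identifier), identifier)


from_port0 = authenticate_and_translate(0, 0x10)
from_port1 = authenticate_and_translate(1, 0x10)
masquerade = authenticate_and_translate(1, 0x11)
```

The same local identifier 0x10 ends up as two distinct identifiers inside the router, which is one way to resolve the naming incoherence mentioned above.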
Diagnosis and Management
- CAN router collects diagnostic information
  - Violations of minimum message interarrival times
  - Use of invalid message identifiers
  - Error conditions at CAN ports
- Retrieval of diagnostic information using the management port
- Benefits
  - Elementary diagnostic capabilities as a foundation for higher diagnostic services (e.g., maintenance-oriented diagnostic analysis algorithms)
  - Decreased fault-not-found ratio
Diagnosis and Management (2)
- Reconfiguration of the CAN router at run-time using the management port
  - Addition/removal of messages in the routing tables
  - Modification of minimum message interarrival times
  - Modification of multicast patterns
  - Enabling/disabling of CAN ports to save energy
  - Modification of the identifier translation
- New configuration becomes active at a consistent instant at all CAN ports
  - No inconsistent or intermediate configurations are used in the redirection of messages through the router
- Benefits
  - Support for system evolution (e.g., addition or modification of CAN nodes)
  - Recovery from failures (e.g., activation of stand-by nodes)
  - Basis for energy saving modes and degraded service modes
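One way to obtain a consistent activation instant is to stage the new configuration together with a future cycle number and let every port switch when that cycle starts. The sketch below illustrates the idea under that assumption; the class and method names are hypothetical, and a single shared object stands in for the distribution of the configuration to the ports.

```python
class RouterConfig:
    """Hypothetical double-buffered configuration switched at a cycle boundary."""

    def __init__(self, active):
        self._active = active
        self._pending = None  # (activation_cycle, new_config) or None

    def stage(self, new_config, activation_cycle):
        # Written via the management port; not yet visible to message redirection.
        self._pending = (activation_cycle, new_config)

    def config_for_cycle(self, cycle):
        # Every port asks with the same global cycle number, so all ports
        # switch in the same cycle and never see an intermediate state.
        if self._pending is not None and cycle >= self._pending[0]:
            self._active = self._pending[1]
            self._pending = None
        return self._active


cfg = RouterConfig({"min_interarrival_us": {0x100: 1000}})
cfg.stage({"min_interarrival_us": {0x100: 500}}, activation_cycle=5)
seen = [cfg.config_for_cycle(c)["min_interarrival_us"][0x100] for c in (3, 4, 5, 6)]
```

The switch depends on all ports sharing the same notion of the current cycle, which is what the global time base provides.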
Realization of the Router
- CAN interface subsystem for each CAN port
  - CAN controller interacts with the CAN bus of a segment
  - Interface to NoC for communication with other CAN interface subsystems
  - Softcore CPU for message filtering, multicasting and ordering
[Figure: block diagram with a management subsystem (Ethernet transceiver) and CAN interface subsystems 0 to 7, each with a CAN transceiver, a CAN controller (SJA 1000), and a Nios II softcore with filtering logic, routing configuration, priority queue and NoC interface. The subsystems are connected through the on-chip interconnect, a Time-Triggered Network-on-a-Chip (TTNoC), via in and out ports 0 to 7 and management in and out ports.]
Cyclic Temporal Behavior of the Router
- Cyclic behavior of the router with a period of 2^-15 s ≈ 30.52 µs
  - Message transmission on time-triggered on-chip network
  - Processing by softcore CPU
  - Interaction with CAN controller
- Sufficient for a maximum load of 1 Mbps from every CAN segment
- Temporal alignment of communication and computational activities w.r.t. the global time base
  - Implicit synchronization between CAN and TTNoC interfaces
  - Essential for consistent switching to a new configuration
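The period and a rough plausibility check for the 1 Mbps claim can be worked out as below. The 47-bit minimum frame length is an assumption: 44 bits for a standard-format data frame without data bytes plus 3 bits of intermission, ignoring stuff bits. The slide itself does not give this reasoning.

```python
CYCLE_S = 2 ** -15            # router period from the slide, in seconds
BIT_RATE_BPS = 1_000_000      # maximum CAN bit rate per segment
MIN_FRAME_BITS = 47           # assumption: shortest standard frame plus intermission

cycle_us = CYCLE_S * 1e6                                   # about 30.52 microseconds
min_frame_us = MIN_FRAME_BITS * 1_000_000 / BIT_RATE_BPS   # 47 microseconds

# If the shortest frame takes longer than one cycle, a segment can deliver
# at most one complete frame per router cycle.
at_most_one_frame_per_cycle = min_frame_us > cycle_us

print(f"cycle = {cycle_us:.2f} us, shortest frame = {min_frame_us:.0f} us")
```

Under these assumptions each interface subsystem has to handle at most one incoming frame per cycle, even on a fully loaded segment.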
Redundant Routers
Agreement on Consistent State
- Redundant routers have to agree on the input they receive from connected nodes
  - Different cable lengths and slightly different sending times of the replicated CAN controllers of an HCN make input agreement necessary
  - Inability to perfectly synchronize the routers
  - Agreement on ingress timestamps for rate control
- Redundant routers have to agree on the output sent to destination nodes
  - Differences in the outcome of the arbitration race on redundant buses (due to imperfect synchronization)
Message Flow in a Redundant Router Setup
- Application forwards the message to the software layer residing at the HCN, and the message is delivered to at least one router (1)
- Source CISes redirect the critical message to the RedMU (2)
- RedMUs agree on the message and on the timestamp at which it was received (3, 4)
- Agreed information is sent back to the source CISes (5)
- Message checks based on the agreed time and redirection to the destination CISes (6)
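The slides do not spell out the agreement rule used in steps (3, 4). As a purely illustrative sketch, assume that the RedMUs exchange their local observations and each applies the same deterministic rule, here "take the earliest observation"; the function name `agree` and the tuple layout are hypothetical.

```python
def agree(observations):
    """Pick one agreed (timestamp_us, identifier, payload) from all routers.

    observations holds one entry per redundant router: a tuple
    (timestamp_us, identifier, payload), or None if that router did not
    receive the message. Assumption: every router sees the same list and
    applies the same rule, so all reach the same result.
    """
    received = [obs for obs in observations if obs is not None]
    if not received:
        return None
    # Tuples compare by timestamp first, so this selects the earliest observation.
    return min(received)


both = agree([(105, 0x100, b"\x01"), (103, 0x100, b"\x01")])
one_missing = agree([None, (103, 0x100, b"\x01")])
```

With an agreed timestamp, the rate-control checks in step (6) give the same verdict in every router even though their local receive times differ.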
Fault Hypothesis
- Fault containment regions
  - CAN nodes
  - CAN router
- Assumed failure mode of the router is fail-silence
  - Internal error detection mechanisms
  - Fault-tolerant multi-core architecture
- Arbitrary failure mode of a CAN node
Example
Example (2)
Discussion
- Full compatibility to existing CAN networks and CAN nodes
- Improved fault isolation and selective fault-tolerance
  - Required in safety-relevant applications (e.g., to prevent common mode failures of replicas)
  - Improved robustness and clear integration responsibilities
  - A priori knowledge is used to block faulty messages
  - Limited effect of a faulty CAN node in the temporal domain
  - Redundant message transmission with agreement between redundant routers
- Exceeding limits of CAN
  - Multicasting results in a more efficient use of the overall bandwidth
  - Overall bandwidth (messages in all segments) can exceed 1 Mbps, although the bandwidth in each individual segment is limited to 1 Mbps
  - Longer overall network length than a CAN bus
  - Extension of the namespace through identifier translation