A Diversity of Duplications

1 A Diversity of Duplications. David Powell. Special event «Dependability of computing systems, Memories and future» in honor of Jean-Claude Laprie, LAAS-CNRS, Toulouse, 16 April 2010.

2 Duplication: error detection, error tolerance

3 Outline Some memories on duplication Some recent and ongoing work on duplication

4 Some memories

5 Gordini: the first duplicated system built at LAAS. Detection of HW faults. Duplicated bit microprocessors. 8 kbytes of parity-checked memory.

6 Gordini

7 Bi-Gordini

8 1979 Hair! Gordini with people Gordini Jean-Claude Hiro Ihara

9 Armure: a hot standby duplicated system developed for the French space agency in the context of the "SURF national project". It was applied as part of the ground segment of the Cospas-Sarsat international satellite-based search-and-rescue system.

10 Armure A guy that worked on the project

11 Delta-4: pioneering work on duplication implemented by software in a CORBA-like environment: active replication, passive replication, semi-active replication.

12 1987 Delta-4 A guy that didn't work on the project

13 Some recent and ongoing work on duplication

14 A Railway Duplication. Context: duplication of fail-safe controllers (coded processors) in automatic subway systems. Problem: replica consistency despite unreliable communication.

15 Inter-section handover. [Figure labels: T1, Section A, Section B, block, lock, negative detectors, Controller A, Controller B.] Controller roles listed on the slide: unregisters trains leaving the lock; registers trains entering the lock; assigns target (next station or lock, block behind previous train).

16 Duplication = danger! [Figure labels: T1, T2, Section A, Section B, block, lock, negative detectors, Controller A (A1, A2), Controller B (B1, B2).] B1 registers T2, assigns its target, then fails (while T2 proceeds).

17 Duplication = danger! [Figure labels: T1, T2, T3, Section A, Section B, block, lock, negative detectors, Controller A (A1, A2), Controller B (B1, B2).] B2 registers T3 and assigns an incorrect target (since it missed T2).

18 Problem: consistency between duplicated units despite unreliable communication is provably impossible!
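The scenario of slides 16 and 17 can be replayed in a few lines. This is a minimal sketch, not the railway system's actual logic: the `Replica` class, the `handover` function, the `lost_for` parameter and the rule "the target is the block behind the last registered train" are simplifying assumptions made for illustration only.

```python
class Replica:
    """Hypothetical model of one replica of Controller B."""

    def __init__(self, name, trains_ahead):
        self.name = name
        self.alive = True
        # Trains this replica believes are in its section, oldest first.
        self.registered = list(trains_ahead)

    def register(self, train):
        """Register an entering train and return its assigned target."""
        previous = self.registered[-1] if self.registered else None
        self.registered.append(train)
        # Assumed rule: stop in the block behind the previous train.
        return f"block behind {previous}" if previous else "next station"


def handover(replicas, train, lost_for=()):
    """Deliver a registration to every live replica that does not lose it."""
    targets = {}
    for replica in replicas:
        if replica.alive and replica.name not in lost_for:
            targets[replica.name] = replica.register(train)
    return targets


b1 = Replica("B1", ["T1"])
b2 = Replica("B2", ["T1"])

# T2 enters: the message to B2 is lost, so only B1 registers T2.
t2_targets = handover([b1, b2], "T2", lost_for={"B2"})
b1.alive = False  # B1 fails while T2 proceeds

# T3 enters: only B2 is left, and it never heard of T2.
t3_targets = handover([b1, b2], "T3")
print(t2_targets)  # {'B1': 'block behind T1'}
print(t3_targets)  # {'B2': 'block behind T1'}, although T2 is now in between
```

In this toy run the surviving replica sends T3 towards the block behind T1, even though T2 occupies the track in between, which is the hazard the slides describe.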

19 Solution: PADRE, a fail-safe multicast Protocol for Asynchronous Duplex REdundancy. [State diagram labels: nominal duplex config.; safe duplex config.; simplex config.; safe; unsafe; nominal service; catastrophic failure; fault of primary or secondary; fault of primary; fault of secondary (benign failure); potential inconsistency (transmission error); state restoration; repair.]

20 Solution: PADRE, a fail-safe multicast Protocol for Asynchronous Duplex REdundancy. Deployed by Siemens Transportation Systems (previously Matra Transport) in New York (Canarsie line), Barcelona, Paris (line 3), Roissy. Soon in São Paulo (line 4), Paris (line 1), Budapest (lines 2 & 4), Helsinki, Algiers, New York (PATH line).

21 A Robotics Duplication. Context: temporal planning for an autonomous robot. Problem: insufficient or erroneous knowledge encoded in domain models.

22 Software Architecture. Goals enter at the Decisional Layer. Decisional Layer: decision making, planning. Executive Layer: decompose plan actions into elementary tasks; execution control of elementary tasks. Functional Layer: environment sensing; execution of elementary tasks.

23 How Do Robots Plan? Planning with IxTeT: planning in a plan space. [Diagram labels: declarative model (objects, actions, constraints); domain knowledge; heuristics; goals; search engine; current situation; initial partial plan; possible final plans; Executive Layer; Functional Layer.]

24 IxTeT Example

25 Problem: domain knowledge (models, heuristics) may be incomplete or wrong. Validation is intrinsically difficult. Can tolerance be envisaged? Multiplicity of valid but incomparable plans. What means can be used for detection?

26 Solution: FTplan. [Diagram labels: goals; Model 1; Model 2; IxTeT (two instances); FTplan; Executive Layer; Functional Layer.] Detection before execution: temporal watchdog, plan analyzer. Detection during/after execution: online goal checker, action failure detection. Recovery: sequential planning, concurrent planning. Dala robot implementation.

27 Solution: FTplan. Prototype implementation: first diversification of declarative programs. Validated by fault injection (model mutation) on simulated Dala robot: first fault injection into declarative programs. 30-40% goal reliability improvement in presence of injected faults. Larger gains to be expected with a plan analyzer.
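The sequential-planning recovery named on slide 26 can be sketched as follows. This is a toy illustration under stated assumptions: `plan_with`, `plan_is_acceptable`, `ft_plan`, the dictionary-based models and the goal names are hypothetical stand-ins, not the IxTeT or FTplan interfaces. The time budget plays the role of the temporal watchdog, and the plan check is only a placeholder for a plan analyzer.

```python
import time


def plan_with(model, goals):
    """Hypothetical planner: returns a list of actions, or None on failure."""
    plan = []
    for goal in goals:
        action = model.get(goal)  # a faulty model may lack an entry
        if action is None:
            return None
        plan.append(action)
    return plan


def plan_is_acceptable(plan, goals):
    """Placeholder check: one action per goal, each action naming its goal."""
    return len(plan) == len(goals) and all(g in a for g, a in zip(goals, plan))


def ft_plan(models, goals, budget_s=1.0):
    """Try each diversified model in turn until one yields an acceptable plan."""
    for index, model in enumerate(models):
        start = time.monotonic()
        plan = plan_with(model, goals)
        elapsed = time.monotonic() - start
        if plan is None or elapsed > budget_s:
            continue  # detection before execution: switch to the next model
        if plan_is_acceptable(plan, goals):
            return index, plan
    return None, None


# Model 1 carries an injected fault (missing entry); model 2 is a diverse variant.
model_1 = {"goto": "move(goto)"}
model_2 = {"goto": "navigate(goto)", "photo": "camera(photo)"}
used, plan = ft_plan([model_1, model_2], ["goto", "photo"])
print(used, plan)  # 1 ['navigate(goto)', 'camera(photo)']
```

Concurrent planning would instead run both planners in parallel and take the first acceptable plan; the sequential form is shown here only because it is the shorter of the two to sketch.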

28 An Avionics Duplication. Context: connection of a commercial laptop to a life-critical system (i.e., an aircraft). Problem: malicious intrusion into the laptop's COTS operating system.

29 Maintenance laptop Pilot Maintenance engineer Onboard equipment Flight logbook Maintenance terminal Paper manuals Electronic manuals

30 Maintenance laptop Pilot Maintenance engineer Onboard equipment Flight logbook Maintenance terminal Paper manuals Maintenance laptop

31 Connecting a laptop Flight management Aircraft management Aircraft information system "Off-board"

32 Connecting a laptop Flight management Aircraft management Aircraft information system? "Off-board"

33 Enabling technologies. Totel et al.'s "multi-level integrity" model: framework for multiple criticality levels in a single system; trusted computing base (TCB) for isolation and mediation; fault tolerance to allow data to flow from low to high. Platform virtualization techniques: isolation between virtual machines; attractive approach for implementing the TCB.

34 Solution: Virtual Duplication. ArSec «Architecture de Sécurités». [Diagram labels: Model, View, Controller, Controller', VO, AspectJ, SWING, JVM (three instances), Safe VM, hypervisor (XEN), hardware, error, link to aircraft equipment; step numbers 1, 2, 3, 6', 6", 7.]

35 Corruption attack. ArSec «Architecture de Sécurités». [Same diagram labels as slide 34, plus the marker "?!".]

36 Timing attack. ArSec «Architecture de Sécurités». [Same diagram labels as slide 34, plus the marker "?!".]

37 Reaction to attack. ArSec «Architecture de Sécurités». [Same diagram labels as slide 34, with some elements crossed out (X).] Reactions: reboot; change laptops; revert to maintenance terminal.
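The detection idea behind slides 35 to 37 can be sketched as a comparison of the outputs of two diversified replicas. The slides do not say how ArSec compares or times replica outputs, so everything here is an assumption for illustration: the names `Verdict`, `compare_replicas` and `react`, the `(value, timestamp)` output format and the skew threshold are all hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    CORRUPTION = "corruption"
    TIMING = "timing"


def compare_replicas(out_a, out_b, max_skew_s=0.5):
    """Each output is (value, timestamp_s), or None if the replica stayed silent."""
    if out_a is None or out_b is None:
        return Verdict.TIMING
    (value_a, t_a), (value_b, t_b) = out_a, out_b
    if abs(t_a - t_b) > max_skew_s:
        return Verdict.TIMING
    if value_a != value_b:
        return Verdict.CORRUPTION
    return Verdict.OK


def react(verdict):
    """Assumed reaction: forward only agreed outputs to the aircraft side."""
    return "forward" if verdict is Verdict.OK else "block and raise error"


agree = compare_replicas(("set X", 1.0), ("set X", 1.1))
corrupted = compare_replicas(("set X", 1.0), ("set Y", 1.1))
late = compare_replicas(("set X", 1.0), ("set X", 2.0))
silent = compare_replicas(("set X", 1.0), None)
print(agree, corrupted, late, silent)
```

Once an error is raised, the operator-level reactions listed on slide 37 (reboot, change laptops, revert to the maintenance terminal) would apply.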

38 Summary
PADRE. Context: railways. Objective: availability & safety. Problem: unreliable communication (bad diversity). Solution: fail-safe asynchronous multicast.
FTplan. Context: robotics. Objective: availability. Problem: domain knowledge deficiencies. Solution: diversified domain models (good diversity).
ArSec. Context: avionics. Objective: security & safety. Problem: malicious intrusion. Solution: virtualization & diversified OSs (good diversity).

39 The Future. ArSec «Architecture de Sécurités». Dealing with the dichotomy between good diversity, which favors independent manifestation of design faults (including vulnerabilities), allowing their tolerance, and bad diversity, which causes non-deterministic behavior that gives rise to false positives. Research directions for dealing with bad diversity: constraints on internal operation of virtual machines (e.g., thread scheduling) without reducing good diversity; constraints on programmers (e.g., programming styles) without reducing ease-of-programming.
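A small example may help to see how bad diversity produces a false positive. This is a sketch under assumptions, not something taken from ArSec: the two comparison functions and the sample outputs are hypothetical, and ordering differences are only one possible source of non-determinism between replicas.

```python
def naive_compare(out_a, out_b):
    """Return True only if both output sequences are identical, order included."""
    return out_a == out_b


def canonical_compare(out_a, out_b):
    """Compare after imposing a canonical order on both outputs."""
    return sorted(out_a) == sorted(out_b)


# Same results, emitted in a different order by two differently scheduled replicas.
replica_a = ["log opened", "test 12 passed"]
replica_b = ["test 12 passed", "log opened"]

false_positive = not naive_compare(replica_a, replica_b)
still_flagged = not canonical_compare(replica_a, replica_b)
print(false_positive, still_flagged)  # True False
```

Here no replica is faulty, yet the naive comparison raises an alarm. Constraining the order (on the comparison side as above, or at the source through scheduling or programming-style constraints as the slide suggests) removes the false alarm while genuinely different values would still be detected.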

40 Dependability : a Unifying Concept for Reliable Computing (FTCS-12)

41 A Diversity of Duplications "35 years of duplication without doing the same thing twice"
