A Diversity of Duplications

1 A Diversity of Duplications
David Powell
Special event «Dependability of computing systems, Memories and future» in honor of Jean-Claude Laprie
LAAS-CNRS, Toulouse, 16 April 2010

2 Duplication
Diagram: duplication used for error detection and for error tolerance.
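The two uses of duplication on this slide can be contrasted in a few lines. This is a minimal sketch, assuming two replicas of the same deterministic function. The names `run_duplex_detect`, `run_duplex_tolerate` and `ErrorDetected` are hypothetical.

- **Detection:** comparing the two outputs exposes an error, but two copies alone cannot say which one is wrong.
- **Tolerance:** the sketch additionally assumes that each unit is self-checking (fail-silent), so the surviving output can be trusted.

```python
from typing import Callable, Optional


class ErrorDetected(Exception):
    """Raised when a duplex pair cannot deliver a trustworthy result."""


def run_duplex_detect(f1: Callable[[int], int], f2: Callable[[int], int], x: int) -> int:
    # Detection: both replicas compute the same function and the results are compared.
    a, b = f1(x), f2(x)
    if a != b:
        # A mismatch reveals an error, but not which replica is faulty.
        raise ErrorDetected(f"replica outputs differ: {a} != {b}")
    return a


def run_duplex_tolerate(
    f1: Callable[[int], Optional[int]], f2: Callable[[int], Optional[int]], x: int
) -> int:
    # Tolerance (assumption): each replica is fail-silent, i.e. it returns None
    # rather than a wrong value, so the first non-None result can be used.
    for replica in (f1, f2):
        result = replica(x)
        if result is not None:
            return result
    raise ErrorDetected("both replicas failed silently")
```

For example, pairing `lambda x: x * x` with a copy that adds 1 makes `run_duplex_detect` raise. Pairing a silent replica with a correct one still lets `run_duplex_tolerate` return the square.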

3 Outline
Some memories of duplication
Some recent and ongoing work on duplication

4 Some memories

5 Gordini
First duplicated system built at LAAS
Detection of hardware faults
Duplicated […]-bit microprocessors
8 kbytes of parity-checked memory
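Parity checking, as used for Gordini's memory, detects any single-bit error in a stored byte. This is a minimal sketch, assuming even parity with one parity bit per byte; the real Gordini hardware organization is not described on the slide. `ParityMemory` and `ParityError` are hypothetical names. The example also shows the technique's known limit: a double-bit error keeps the parity unchanged and goes unnoticed.

```python
class ParityError(Exception):
    """Raised when a stored byte no longer matches its parity bit."""


def parity_bit(byte: int) -> int:
    # Even parity: the extra bit makes the total number of 1s even.
    return bin(byte & 0xFF).count("1") % 2


class ParityMemory:
    """Hypothetical byte-addressed memory keeping a parity bit next to each byte."""

    def __init__(self, size: int = 8 * 1024) -> None:  # 8 kbytes, as on the slide
        self.data = bytearray(size)
        self.parity = bytearray(size)  # one parity bit per byte (stored in a byte for simplicity)

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = value & 0xFF
        self.parity[addr] = parity_bit(value)

    def read(self, addr: int) -> int:
        value = self.data[addr]
        if parity_bit(value) != self.parity[addr]:
            raise ParityError(f"parity error at address {addr:#06x}")
        return value
```

Flipping one bit of `data` after a write makes the next `read` raise `ParityError`. Flipping two bits returns the corrupted value silently.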

6 Gordini

7 Bi-Gordini

8 1979. Hair!
Photo: Gordini with people (labels: Gordini, Jean-Claude, Hiro Ihara)

9 Armure
A hot-standby duplicated system developed for the French space agency as part of the "SURF national project". It was used in the ground segment of Cospas-Sarsat, the international satellite-based search-and-rescue system.
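In a hot-standby pair, the standby runs alongside the primary and receives every state update. When the primary stops responding, the standby can therefore take over immediately.

This is a minimal sketch of that idea, not Armure's design. It assumes the following:

- Failure is detected by a count of missed heartbeats (`max_missed`).
- Updates reach both units reliably.

`Unit`, `HotStandbyPair` and the state key used below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    state: dict = field(default_factory=dict)
    alive: bool = True


class HotStandbyPair:
    """Hypothetical hot-standby pair: the standby mirrors every update so that
    it can take over as soon as the primary is declared failed."""

    def __init__(self, primary: Unit, standby: Unit, max_missed: int = 3) -> None:
        self.primary, self.standby = primary, standby
        self.max_missed = max_missed
        self.missed = 0

    def update(self, key: str, value: object) -> None:
        # The primary applies each update and forwards it, keeping the standby "hot".
        self.primary.state[key] = value
        self.standby.state[key] = value

    def tick(self) -> str:
        """Called once per heartbeat period; returns the name of the unit in charge."""
        if self.primary.alive:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed and self.standby.alive:
                # Switch-over: the standby already holds the current state.
                self.primary, self.standby = self.standby, self.primary
                self.missed = 0
        return self.primary.name
```

The heartbeat threshold trades detection latency against false switch-overs caused by a single late heartbeat.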

10 Armure A guy that worked on the project

11 Delta-4
Pioneering work on software-implemented duplication in a CORBA-like environment: active replication, passive replication and semi-active replication.
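The three replication strategies differ in who processes each input and how replicas stay consistent when processing involves a non-deterministic decision. This is a toy illustration of the strategies as they are commonly described, not the Delta-4 implementation. Local non-determinism is modelled by a seeded random choice, and all names are hypothetical.

- **Active:** every replica decides on its own, so states may diverge unless processing is deterministic.
- **Semi-active:** a leader takes each decision and forwards it to the followers.
- **Passive:** only the primary processes, and backups receive checkpoints of its state.

```python
import random
from typing import List, Optional, Sequence


class Replica:
    """Replica whose processing of each input involves one non-deterministic choice."""

    def __init__(self, name: str, seed: int) -> None:
        self.name = name
        self.rng = random.Random(seed)  # stands in for local non-determinism
        self.log: List[int] = []

    def process(self, x: int, choice: Optional[int] = None) -> int:
        # Use the imposed decision if there is one, otherwise decide locally.
        c = self.rng.randint(0, 9) if choice is None else choice
        self.log.append(x + c)
        return c


def active(replicas: Sequence[Replica], x: int) -> None:
    # Every replica processes the input independently: states may diverge.
    for r in replicas:
        r.process(x)


def semi_active(leader: Replica, followers: Sequence[Replica], x: int) -> None:
    # The leader decides and notifies followers, which apply the same decision.
    c = leader.process(x)
    for f in followers:
        f.process(x, choice=c)


def passive(primary: Replica, backups: Sequence[Replica], x: int) -> None:
    # Only the primary processes; backups receive a checkpoint of its state.
    primary.process(x)
    for b in backups:
        b.log = list(primary.log)
```

After the same inputs, semi-active and passive replicas end with identical logs. Active replicas agree only if their processing is deterministic.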

12 1987 Delta-4 A guy that didn't work on the project

13 Some recent and ongoing work on duplication

14 A Railway Duplication
Context: duplication of fail-safe controllers (coded processors) in automatic subway systems
Problem: replica consistency despite unreliable communication

15 Inter-section handover
Diagram: train T1 moving from Section A to Section B (labels: block, lock, negative detectors).
Controller A unregisters trains leaving the lock.
Controller B registers trains entering the lock and assigns each one a target: the next station or lock, or the block behind the previous train.

16 Duplication = danger!
Each controller is duplicated (A1/A2, B1/B2). B1 registers train T2 and assigns it a target, then fails while T2 proceeds.

17 Duplication = danger!
B2, now in charge, registers train T3 and assigns it an incorrect target, since it missed the registration of T2.
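The danger scenario of slides 15-17 can be replayed with a toy model of the section B controller. This is a hypothetical simplification: each replica keeps the ordered list of trains it has registered, and targets are plain strings.

In the replay, the update telling B2 about T2 is lost, B1 fails, and B2 gives T3 a target behind T1. That is a space T2 actually occupies.

```python
from typing import List


class SectionController:
    """Hypothetical model of one replica of a section controller (B1 or B2)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.trains: List[str] = []  # trains registered, in order of entry

    def register(self, train: str) -> str:
        # Target: the block behind the previous train, or the next station or
        # lock if no train is registered.
        target = f"behind {self.trains[-1]}" if self.trains else "next station or lock"
        self.trains.append(train)
        return target

    def unregister(self, train: str) -> None:
        self.trains.remove(train)


b1, b2 = SectionController("B1"), SectionController("B2")
for replica in (b1, b2):
    replica.register("T1")       # both replicas know about T1
b1.register("T2")                # B1 registers T2; the update to B2 is lost, then B1 fails
target_for_t3 = b2.register("T3")
print(target_for_t3)             # B2 believes T1 is the last train
```

A consistent replica would have answered `behind T2`. The inconsistency is invisible to B2 itself, which is why it cannot simply be allowed to take over.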

18 Problem
Consistency between duplicated units despite unreliable communication: provably impossible!

19 Solution: PADRE, a fail-safe multicast Protocol for Asynchronous Duplex REdundancy
State diagram: configurations "nominal duplex", "simplex" and "safe duplex" (grouped as nominal service); transitions labelled fault of primary or secondary, fault of primary, fault of secondary, potential inconsistency (transmission error), state restoration and repair; outcomes benign failure (safe) and catastrophic failure (unsafe).
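Not all of the diagram's transitions can be recovered from the transcription. This sketch encodes one assumed reading of it as a table-driven state machine. It is not taken from the PADRE specification, and `Config`, `step` and the event names are hypothetical.

The key idea in this reading:

- A transmission error moves the pair into safe duplex, because the secondary may have missed an update.
- In safe duplex, a fault of the primary leads to a safe stop rather than to a takeover by a possibly stale secondary (the scenario of slides 16-17).

```python
from enum import Enum


class Config(Enum):
    NOMINAL_DUPLEX = "nominal duplex"
    SAFE_DUPLEX = "safe duplex"
    SIMPLEX = "simplex"
    BENIGN_FAILURE = "benign failure (safe stop)"


# Assumed reading of the slide's diagram, not the PADRE specification.
TRANSITIONS = {
    (Config.NOMINAL_DUPLEX, "fault_primary"): Config.SIMPLEX,           # consistent secondary takes over
    (Config.NOMINAL_DUPLEX, "fault_secondary"): Config.SIMPLEX,
    (Config.NOMINAL_DUPLEX, "transmission_error"): Config.SAFE_DUPLEX,  # secondary may be stale
    (Config.SAFE_DUPLEX, "state_restoration"): Config.NOMINAL_DUPLEX,
    (Config.SAFE_DUPLEX, "fault_secondary"): Config.SIMPLEX,
    (Config.SAFE_DUPLEX, "fault_primary"): Config.BENIGN_FAILURE,       # never hand over to a stale secondary
    (Config.SIMPLEX, "repair"): Config.NOMINAL_DUPLEX,
    (Config.SIMPLEX, "fault_primary"): Config.BENIGN_FAILURE,           # fail-safe unit stops safely
}


def step(config: Config, event: str) -> Config:
    # Events not listed for a configuration leave it unchanged in this sketch.
    return TRANSITIONS.get((config, event), config)
```

The unsafe "catastrophic failure" outcome corresponds to the transition this table deliberately omits: a stale secondary taking over in safe duplex.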

20 Solution: PADRE, a fail-safe multicast Protocol for Asynchronous Duplex REdundancy
Deployed by Siemens Transportation Systems (previously Matra Transport) in New York (Canarsie line), Barcelona, Paris (line 3) and Roissy.
Soon in São Paulo (line 4), Paris (line 1), Budapest (lines 2 & 4), Helsinki, Algiers and New York (PATH line).

21 A Robotics Duplication
Context: temporal planning for an autonomous robot
Problem: insufficient or erroneous knowledge encoded in domain models

22 Software Architecture
Decisional layer (receives the goals): decision making, planning
Executive layer: decomposes plan actions into elementary tasks and controls their execution
Functional layer: environment sensing, execution of elementary tasks

23 How Do Robots Plan?
Planning with IxTeT: planning in a plan space.
Diagram: domain knowledge (a declarative model of objects, actions and constraints, plus heuristics), the goals and the current situation (from the executive and functional layers) give an initial partial plan, which the search engine refines into possible final plans.

24 IxTeT Example

25 Problem
Domain knowledge (models, heuristics) may be incomplete or wrong, and its validation is intrinsically difficult.
Can tolerance be envisaged?
Given the multiplicity of valid but incomparable plans, what means can be used for detection?

26 Solution: FTplan
Diagram: FTplan coordinates two IxTeT planners that share the same goals but use different domain models (Model 1, Model 2), above the executive and functional layers.
Detection before execution: temporal watchdog, plan analyzer
Detection during/after execution: online goal checker, action failure detection
Recovery: sequential planning, concurrent planning
Implementation on the Dala robot
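Sequential planning with diversified models can be sketched as a simple recovery loop. Each planner, built on a different domain model, is tried in turn. A planner that produces no plan in time (the watchdog case, modelled here as `None`) or whose plan fails the goal check is abandoned in favour of the next.

This is a minimal sketch under strong assumptions: plans are lists of action names, and `execute` reports whether the goals were reached. It also ignores re-planning from the new world state after a failed execution, which FTplan must handle. Concurrent planning is not shown. All names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Plan = List[str]
Planner = Callable[[Sequence[str]], Optional[Plan]]


def sequential_planning(
    goals: Sequence[str],
    planners: Sequence[Planner],
    execute: Callable[[Plan], bool],
) -> Optional[int]:
    """Try each diversified planner in turn; return the index of the first one
    whose plan satisfied the goal check, or None if all of them failed."""
    for i, plan_with_model in enumerate(planners):
        plan = plan_with_model(goals)
        if plan is None:
            # No plan within the deadline (temporal-watchdog style detection).
            continue
        if execute(plan):
            # Online goal check passed.
            return i
    return None
```

For instance, with the goal `photo(A)`, a flawed model whose plan only contains `move(A)` fails the goal check. A second model planning `move(A)` then `photo(A)` succeeds, so the function returns 1.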

27 Solution: FTplan
Prototype implementation: first diversification of declarative programs
Validated by fault injection (model mutation) on a simulated Dala robot: first fault injection into declarative programs
30-40% goal reliability improvement in the presence of injected faults
Larger gains expected with a plan analyzer
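Fault injection by model mutation means deliberately introducing a fault into a declarative domain model and observing how the planner behaves. This is a minimal sketch, assuming a toy model represented as a dictionary that maps each action to a duration and a list of preconditions.

The two mutation operators are hypothetical illustrations, not the operators used in the FTplan experiments:

- delete one precondition;
- scale one duration constant by 0.1 or 10.

```python
import copy
import random
from typing import Any, Dict

Model = Dict[str, Dict[str, Any]]


def mutate(model: Model, rng: random.Random) -> Model:
    """Return a copy of `model` with exactly one injected fault (hypothetical operators)."""
    mutant = copy.deepcopy(model)  # the original model is left untouched
    action = rng.choice(sorted(mutant))
    spec = mutant[action]
    if spec["preconditions"] and rng.random() < 0.5:
        # Operator 1: delete one precondition of the chosen action.
        spec["preconditions"].remove(rng.choice(spec["preconditions"]))
    else:
        # Operator 2: make a duration constant much too short or too long.
        spec["duration"] = spec["duration"] * rng.choice([0.1, 10])
    return mutant
```

Seeding the generator makes each mutant reproducible, which is convenient when the same faulty model must be replayed against several planner configurations.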

28 An Avionics Duplication
Context: connection of a commercial laptop to a life-critical system (i.e., an aircraft)
Problem: malicious intrusion into the laptop's COTS operating system

29 Maintenance laptop
Diagram: pilot, maintenance engineer, onboard equipment, flight logbook, maintenance terminal, paper manuals, electronic manuals.

30 Maintenance laptop
Diagram: pilot, maintenance engineer, onboard equipment, flight logbook, maintenance terminal, paper manuals, maintenance laptop.

31 Connecting a laptop
Diagram: flight management, aircraft management, aircraft information system, "off-board".

32 Connecting a laptop
Diagram: flight management, aircraft management, aircraft information system (marked "?"), "off-board".

33 Enabling technologies
Totel et al.'s "multi-level integrity" model: a framework for multiple criticality levels in a single system, with a trusted computing base (TCB) for isolation and mediation, and fault tolerance to allow data to flow from low to high integrity levels.
Platform virtualization techniques: isolation between virtual machines, an attractive approach for implementing the TCB.

34 Solution: Virtual Duplication, ArSec «Architecture de Sécurités»
Diagram (labels recoverable from the slide): Model, View, Controller, Controller', VO; AspectJ; three Swing/JVM stacks, one of them the Safe VM; the Xen hypervisor on the hardware; flows numbered 1, 2, 3, 6', 6" and 7; an Error output; a link to aircraft equipment.

35 Corruption attack
Same virtual-duplication diagram, with the Controller marked "?!".

36 Timing attack
Same virtual-duplication diagram, with the Controller marked "?!".
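The two attack slides suggest what a comparison between diversified replicas has to catch:

- a corrupted value, where the replicas disagree;
- a delayed or suppressed one, where a replica misses its deadline.

This is a minimal sketch of such a check, assuming each replica's output reaches a checker in the safe VM with an arrival time. `Output`, `check`, the deadline parameter and the sample values are hypothetical, not the ArSec interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Output:
    value: str
    time_ms: float  # arrival time at the checker, relative to the request


def check(primary: Optional[Output], secondary: Optional[Output], deadline_ms: float) -> str:
    """Hypothetical safe-VM check of the outputs of two diversified replicas."""
    for out in (primary, secondary):
        if out is None or out.time_ms > deadline_ms:
            return "error: timing"      # a replica was delayed or silenced
    if primary.value != secondary.value:
        return "error: corruption"      # the replicas disagree on the value
    return "ok"
```

On an error, the reactions listed on the next slide (reboot, change laptops, revert to the maintenance terminal) would be triggered instead of forwarding the output.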

37 Reaction to attack
Same virtual-duplication diagram, with the flows 6' and 6 crossed out and the Error output raised.
Possible reactions: reboot, change laptops, revert to the maintenance terminal.

38 Summary
PADRE. Context: railways. Objective: availability & safety. Problem: unreliable communication. Solution: fail-safe asynchronous multicast. Diversity: bad.
FTplan. Context: robotics. Objective: availability. Problem: domain knowledge deficiencies. Solution: diversified domain models. Diversity: good.
ArSec. Context: avionics. Objective: security & safety. Problem: malicious intrusion. Solution: virtualization & diversified OSs. Diversity: good.

39 The Future
Dealing with the dichotomy between:
Good diversity: favors the independent manifestation of design faults (including vulnerabilities), which allows them to be tolerated.
Bad diversity: causes non-deterministic behavior that gives rise to false positives.
Research directions for dealing with bad diversity:
Constraints on the internal operation of virtual machines (e.g., thread scheduling) without reducing good diversity.
Constraints on programmers (e.g., programming styles) without reducing ease of programming.
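"Bad diversity" can be seen in a tiny example. Two replicas both process every request correctly, but their thread schedulers emit the results in different orders. An item-by-item comparison then flags an error that is not there: a false positive.

The first research direction on the slide, constraining internal operation such as thread scheduling, removes it. This is a minimal sketch, assuming the scheduler's choice can be modelled as a permutation `schedule` and that the constraint is "serve in arrival order". The names are hypothetical.

```python
from typing import List, Optional


def replica_run(requests: List[str], schedule: Optional[List[int]] = None) -> List[str]:
    """Hypothetical replica: processes every request correctly, but emits results
    in the order chosen by its thread scheduler (`schedule`, a permutation)."""
    if schedule is None:
        # Constrained operation: both replicas serve requests in arrival order.
        schedule = list(range(len(requests)))
    return [requests[i].upper() for i in schedule]


def outputs_agree(a: List[str], b: List[str]) -> bool:
    # The checker compares the two output streams item by item.
    return a == b


reqs = ["read", "log", "sync"]
free_a = replica_run(reqs, [0, 1, 2])
free_b = replica_run(reqs, [2, 0, 1])
print(outputs_agree(free_a, free_b))                      # unconstrained: false positive
print(outputs_agree(replica_run(reqs), replica_run(reqs)))  # constrained: agreement
```

The constraint has to be chosen with care. Forcing identical behaviour everywhere would also remove the good diversity that lets design faults manifest independently.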

40 Dependability: a Unifying Concept for Reliable Computing (FTCS-12)

41 A Diversity of Duplications "35 years of duplication without doing the same thing twice"

ENSURING SAFETY AND SECURITY FOR AVIONICS: A CASE STUDY

ENSURING SAFETY AND SECURITY FOR AVIONICS: A CASE STUDY ENSURING SAFETY AND SECURITY FOR AVIONICS: A CASE STUDY Youssef Laarouchi 1,2, Yves Deswarte 1,2, David Powell 1,2, Jean Arlat 1,2, Eric De Nadai 3 1 CNRS ; LAAS ; 7 avenue du colonel Roche, F-31077 Toulouse,

More information

CprE 458/558: Real-Time Systems. Lecture 17 Fault-tolerant design techniques

CprE 458/558: Real-Time Systems. Lecture 17 Fault-tolerant design techniques : Real-Time Systems Lecture 17 Fault-tolerant design techniques Fault Tolerant Strategies Fault tolerance in computer system is achieved through redundancy in hardware, software, information, and/or computations.

More information

PADRE : A Protocol for Asymmetric Duplex REdundancy

PADRE : A Protocol for Asymmetric Duplex REdundancy PADRE : A Protocol for Asymmetric Duplex REdundancy D. Essamé, J. Arlat, D. Powell LAAS-CNRS, 7 avenue du colonel Roche, 31077 Toulouse cedex 4, France {essame, arlat, dpowell}@laas.fr Abstract Safety

More information

CS603: Distributed Systems

CS603: Distributed Systems CS603: Distributed Systems Lecture 1: Basic Communication Services Cristina Nita-Rotaru Lecture 1/ Spring 2006 1 Reference Material Textbooks Ken Birman: Reliable Distributed Systems Recommended reading

More information

Providing Real-Time and Fault Tolerance for CORBA Applications

Providing Real-Time and Fault Tolerance for CORBA Applications Providing Real-Time and Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS University Pittsburgh, PA 15213-3890 Sponsored in part by the CMU-NASA High Dependability Computing

More information

Fault Tolerant Computing CS 530

Fault Tolerant Computing CS 530 Fault Tolerant Computing CS 530 Lecture Notes 1 Introduction to the class Yashwant K. Malaiya Colorado State University 1 Instructor, TA Instructor: Yashwant K. Malaiya, Professor malaiya @ cs.colostate.edu

More information

Fault Tolerance. The Three universe model

Fault Tolerance. The Three universe model Fault Tolerance High performance systems must be fault-tolerant: they must be able to continue operating despite the failure of a limited subset of their hardware or software. They must also allow graceful

More information

Eliminating Single Points of Failure in Software Based Redundancy

Eliminating Single Points of Failure in Software Based Redundancy Eliminating Single Points of Failure in Software Based Redundancy Peter Ulbrich, Martin Hoffmann, Rüdiger Kapitza, Daniel Lohmann, Reiner Schmid and Wolfgang Schröder-Preikschat EDCC May 9, 2012 SYSTEM

More information

Priya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA

Priya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA OMG Real-Time and Distributed Object Computing Workshop, July 2002, Arlington, VA Providing Real-Time and Fault Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS Carnegie

More information

Today: Fault Tolerance

Today: Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems COMP 212. Lecture 19 Othon Michail Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails

More information

Fault Tolerance. Distributed Systems IT332

Fault Tolerance. Distributed Systems IT332 Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to

More information

FAULT TOLERANT SYSTEMS

FAULT TOLERANT SYSTEMS FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 18 Chapter 7 Case Studies Part.18.1 Introduction Illustrate practical use of methods described previously Highlight fault-tolerance

More information

Today: Fault Tolerance. Fault Tolerance

Today: Fault Tolerance. Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Physical Storage Media

Physical Storage Media Physical Storage Media These slides are a modified version of the slides of the book Database System Concepts, 5th Ed., McGraw-Hill, by Silberschatz, Korth and Sudarshan. Original slides are available

More information

TSW Reliability and Fault Tolerance

TSW Reliability and Fault Tolerance TSW Reliability and Fault Tolerance Alexandre David 1.2.05 Credits: some slides by Alan Burns & Andy Wellings. Aims Understand the factors which affect the reliability of a system. Introduce how software

More information

Fault Tolerance for Highly Available Internet Services: Concept, Approaches, and Issues

Fault Tolerance for Highly Available Internet Services: Concept, Approaches, and Issues Fault Tolerance for Highly Available Internet Services: Concept, Approaches, and Issues By Narjess Ayari, Denis Barbaron, Laurent Lefevre and Pascale primet Presented by Mingyu Liu Outlines 1.Introduction

More information

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems Fault Tolerance Fault cause of an error that might lead to failure; could be transient, intermittent, or permanent Fault tolerance a system can provide its services even in the presence of faults Requirements

More information

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski Distributed Systems 09. State Machine Replication & Virtual Synchrony Paul Krzyzanowski Rutgers University Fall 2016 1 State machine replication 2 State machine replication We want high scalability and

More information

Data Backup for Mobile Nodes : a Cooperative Middleware and an Experimentation Platform

Data Backup for Mobile Nodes : a Cooperative Middleware and an Experimentation Platform Data Backup for Mobile Nodes : a Cooperative Middleware and an Experimentation Platform Marc-Olivier Killijian Matthieu Roy Gaétan Séverac Christophe Zanon roy@laas.fr http://theresumeexperience.blogspot.com/

More information

Computer-Based Control System Safety Requirements

Computer-Based Control System Safety Requirements Computer-Based Control System Safety Requirements International Space Station Program Revision B November 17, 1995 National Aeronautics and Space Administration International Space Station Program Johnson

More information

Part 2: Basic concepts and terminology

Part 2: Basic concepts and terminology Part 2: Basic concepts and terminology Course: Dependable Computer Systems 2012, Stefan Poledna, All rights reserved part 2, page 1 Def.: Dependability (Verlässlichkeit) is defined as the trustworthiness

More information

Software Diversity and Fault-Tolerance: An Overview

Software Diversity and Fault-Tolerance: An Overview Software Diversity and Fault-Tolerance: An Overview Daniel Rodriguez Retamosa and Mehrdad Saadatmand Mälardalen Real-Time Research Centre (MRTC) Mälardalen University Västerås, Sweden dra05002@student.mdh.se,

More information

Dep. Systems Requirements

Dep. Systems Requirements Dependable Systems Dep. Systems Requirements Availability the system is ready to be used immediately. A(t) = probability system is available for use at time t MTTF/(MTTF+MTTR) If MTTR can be kept small

More information

High Availability and Disaster Recovery Solutions for Perforce

High Availability and Disaster Recovery Solutions for Perforce High Availability and Disaster Recovery Solutions for Perforce This paper provides strategies for achieving high Perforce server availability and minimizing data loss in the event of a disaster. Perforce

More information

Practical Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance Practical Byzantine Fault Tolerance Robert Grimm New York University (Partially based on notes by Eric Brewer and David Mazières) The Three Questions What is the problem? What is new or different? What

More information

Planning with Diversified Models for Fault-Tolerant Robots

Planning with Diversified Models for Fault-Tolerant Robots Planning with Diversified Models for Fault-Tolerant Robots Benjamin Lussier, Matthieu Gallien, Jérémie Guiochet, Félix Ingrand, Marc-Olivier Killijian, David Powell LAAS-CNRS, University of Toulouse, France

More information

Dependability tree 1

Dependability tree 1 Dependability tree 1 Means for achieving dependability A combined use of methods can be applied as means for achieving dependability. These means can be classified into: 1. Fault Prevention techniques

More information

Fault Tolerance. Basic Concepts

Fault Tolerance. Basic Concepts COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time

More information

Today: Fault Tolerance. Replica Management

Today: Fault Tolerance. Replica Management Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery

More information

Redundancy in fault tolerant computing. D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992

Redundancy in fault tolerant computing. D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992 Redundancy in fault tolerant computing D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992 1 Redundancy Fault tolerance computing is based on redundancy HARDWARE REDUNDANCY Physical

More information

2014 Software Global Client Conference

2014 Software Global Client Conference WW HMI SCADA-10 Best practices for distributed SCADA Stan DeVries Senior Director Solutions Architecture What is Distributed SCADA? It s much more than a distributed architecture (SCADA always has this)

More information

CPLD Developement & Nuclear Safety (NS) Constraints

CPLD Developement & Nuclear Safety (NS) Constraints CPLD Developement & Nuclear Safety (NS) Constraints SUMMARY NEXEYA EQUIPEMENT ARCHITECTURE DEVELOPEMENT VALIDATION TEST MAINTENANCE TEST BENCH REX (return of experience) COTS NEXEYA 3 NEXEYA Staff > 1

More information

Critical Systems. Objectives. Topics covered. Critical Systems. System dependability. Importance of dependability

Critical Systems. Objectives. Topics covered. Critical Systems. System dependability. Importance of dependability Objectives Critical Systems To explain what is meant by a critical system where system failure can have severe human or economic consequence. To explain four dimensions of dependability - availability,

More information

Time-Triggered Ethernet

Time-Triggered Ethernet Time-Triggered Ethernet Chapters 42 in the Textbook Professor: HONGWEI ZHANG CSC8260 Winter 2016 Presented By: Priyank Baxi (fr0630) fr0630@wayne.edu Outline History Overview TTEthernet Traffic Classes

More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Riccardo Mariani, Intel Fellow, IOTG SEG, Chief Functional Safety Technologist

Riccardo Mariani, Intel Fellow, IOTG SEG, Chief Functional Safety Technologist Riccardo Mariani, Intel Fellow, IOTG SEG, Chief Functional Safety Technologist Internet of Things Group 2 Internet of Things Group 3 Autonomous systems: computing platform Intelligent eyes Vision. Intelligent

More information

TU Wien. Fault Isolation and Error Containment in the TT-SoC. H. Kopetz. TU Wien. July 2007

TU Wien. Fault Isolation and Error Containment in the TT-SoC. H. Kopetz. TU Wien. July 2007 TU Wien 1 Fault Isolation and Error Containment in the TT-SoC H. Kopetz TU Wien July 2007 This is joint work with C. El.Salloum, B.Huber and R.Obermaisser Outline 2 Introduction The Concept of a Distributed

More information

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Fault Tolerance Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Basic Concepts Process Resilience Reliable

More information

Software Architecture. Lecture 4

Software Architecture. Lecture 4 Software Architecture Lecture 4 Last time We discussed tactics to achieve architecture qualities We briefly surveyed architectural styles 23-Jan-08 http://www.users.abo.fi/lpetre/sa08/ 2 Today We check

More information

CrashOS: Hypervisor testing tool

CrashOS: Hypervisor testing tool ISSRE 2017 Anaïs GANTET - Airbus Digital Security October 2017 Outline 1 Why CrashOS? 2 CrashOS presentation 3 Vulnerability research and results October 2017 2 ISSRE Outline 1 Why CrashOS? 2 CrashOS presentation

More information

Virtually Eliminating Router Bugs

Virtually Eliminating Router Bugs Virtually Eliminating Router Bugs Eric Keller, Minlan Yu, Matt Caesar, Jennifer Rexford Princeton University, UIUC NANOG 46: Philadelphia, PA Dealing with router bugs Internet s complexity implemented

More information

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson Distributed systems Lecture 6: Elections, distributed transactions, and replication DrRobert N. M. Watson 1 Last time Saw how we can build ordered multicast Messages between processes in a group Need to

More information

Reliable Statements about a Fault-Tolerant X-by-Wire ecar. Reliable Statements about a Fault-Tolerant X-by-Wire ecar Unrestricted 2017 Siemens AG

Reliable Statements about a Fault-Tolerant X-by-Wire ecar. Reliable Statements about a Fault-Tolerant X-by-Wire ecar Unrestricted 2017 Siemens AG Reliable Statements about a Fault-Tolerant X-by-Wire ecar Reliable Statements about a Fault-Tolerant X-by-Wire ecar Unrestricted 2017 Siemens AG Reliable Statements about a Fault-Tolerant X-by-Wire ecar

More information

CDA 5140 Software Fault-tolerance. - however, reliability of the overall system is actually a product of the hardware, software, and human reliability

CDA 5140 Software Fault-tolerance. - however, reliability of the overall system is actually a product of the hardware, software, and human reliability CDA 5140 Software Fault-tolerance - so far have looked at reliability as hardware reliability - however, reliability of the overall system is actually a product of the hardware, software, and human reliability

More information

Advanced Systems Security: Virtual Machine Systems

Advanced Systems Security: Virtual Machine Systems Systems and Internet Infrastructure Security Network and Security Research Center Department of Computer Science and Engineering Pennsylvania State University, University Park PA Advanced Systems Security:

More information

Basic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication. Distributed commit.

Basic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication. Distributed commit. Basic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication One-one communication One-many communication Distributed commit Two phase commit Failure recovery

More information

Pattern-Based Analysis of an Embedded Real-Time System Architecture

Pattern-Based Analysis of an Embedded Real-Time System Architecture Pattern-Based Analysis of an Embedded Real-Time System Architecture Peter Feiler Software Engineering Institute phf@sei.cmu.edu 412-268-7790 Outline Introduction to SAE AADL Standard The case study Towards

More information

Introduction to Software Fault Tolerance Techniques and Implementation. Presented By : Hoda Banki

Introduction to Software Fault Tolerance Techniques and Implementation. Presented By : Hoda Banki Introduction to Software Fault Tolerance Techniques and Implementation Presented By : Hoda Banki 1 Contents : Introduction Types of faults Dependability concept classification Error recovery Types of redundancy

More information

Reliable Distributed System Approaches

Reliable Distributed System Approaches Reliable Distributed System Approaches Manuel Graber Seminar of Distributed Computing WS 03/04 The Papers The Process Group Approach to Reliable Distributed Computing K. Birman; Communications of the ACM,

More information

Survey of Cyber Moving Targets. Presented By Sharani Sankaran

Survey of Cyber Moving Targets. Presented By Sharani Sankaran Survey of Cyber Moving Targets Presented By Sharani Sankaran Moving Target Defense A cyber moving target technique refers to any technique that attempts to defend a system and increase the complexity of

More information

A FAULT- AND INTRUSION-TOLERANT ARCHITECTURE FOR THE PORTUGUESE POWER DISTRIBUTION SCADA

A FAULT- AND INTRUSION-TOLERANT ARCHITECTURE FOR THE PORTUGUESE POWER DISTRIBUTION SCADA A FAULT- AND INTRUSION-TOLERANT ARCHITECTURE FOR THE PORTUGUESE POWER DISTRIBUTION SCADA Nuno Medeiros Alysson Bessani 1 Context: EDP Distribuição EDP Distribuição is the utility responsible for the distribution

More information

CS 470 Spring Fault Tolerance. Mike Lam, Professor. Content taken from the following:

CS 470 Spring Fault Tolerance. Mike Lam, Professor. Content taken from the following: CS 47 Spring 27 Mike Lam, Professor Fault Tolerance Content taken from the following: "Distributed Systems: Principles and Paradigms" by Andrew S. Tanenbaum and Maarten Van Steen (Chapter 8) Various online

More information

Fault-tolerant techniques

Fault-tolerant techniques What are the effects if the hardware or software is not fault-free in a real-time system? What causes component faults? Specification or design faults: Incomplete or erroneous models Lack of techniques

More information

Green Lights Forever: Analyzing the Security of Traffic Infrastructure

Green Lights Forever: Analyzing the Security of Traffic Infrastructure Green Lights Forever: Analyzing the Security of Traffic Infrastructure RAJSHAKHAR PAUL Outline Introduction Anatomy of a Traffic Infrastructure Case Study Threat Model Types of Attack Recommendation Broader

More information

Toward Intrusion Tolerant Clouds

Toward Intrusion Tolerant Clouds Toward Intrusion Tolerant Clouds Prof. Yair Amir, Prof. Vladimir Braverman Daniel Obenshain, Tom Tantillo Department of Computer Science Johns Hopkins University Prof. Cristina Nita-Rotaru, Prof. Jennifer

More information

Applying MILS to multicore avionics systems

Applying MILS to multicore avionics systems Applying MILS to multicore avionics systems Eur Ing Paul Parkinson FIET Principal Systems Architect, A&D EuroMILS Workshop, Prague, 19 th January 2016 2016 Wind River. All Rights Reserved. Agenda A Brief

More information

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf Distributed systems Lecture 6: distributed transactions, elections, consensus and replication Malte Schwarzkopf Last time Saw how we can build ordered multicast Messages between processes in a group Need

More information

Safety SPL/2010 SPL/20 1

Safety SPL/2010 SPL/20 1 Safety 1 system designing for concurrent execution environments system: collection of objects and their interactions system properties: Safety - nothing bad ever happens Liveness - anything ever happens

More information

Cyber Moving Targets. Yashar Dehkan Asl

Cyber Moving Targets. Yashar Dehkan Asl Cyber Moving Targets Yashar Dehkan Asl Introduction An overview of different cyber moving target techniques, their threat models, and their technical details. Cyber moving target technique: Defend a system

More information

Replace Single Server or Cluster

Replace Single Server or Cluster Caution Because this process is designed to work as a server replacement, you must perform it in the live environment. Cisco does not recommend doing this process on a dead net because a duplication of

More information

Mixed Critical Architecture Requirements (MCAR)

Mixed Critical Architecture Requirements (MCAR) Superior Products Through Innovation Approved for Public Release; distribution is unlimited. (PIRA AER200905019) Mixed Critical Architecture Requirements (MCAR) Copyright 2009 Lockheed Martin Corporation

More information

FP7-4: Introduction to Reliability and Fault Tolerance. FP7-4: Introduction to Reliability and Fault Tolerance. The NASA Mars Space Mission

FP7-4: Introduction to Reliability and Fault Tolerance. FP7-4: Introduction to Reliability and Fault Tolerance. The NASA Mars Space Mission FP7-4: Introduction to Reliability and Fault Tolerance Youmin Zhang Phone: 7912 7741 Office Location: FUV 0.22 Email: ymzhang@cs.aaue.dk http://www.cs.aaue.dk/~ymzhang/courses/reliability/index.html FP7-4:

More information

Leslie Lamport. April 20, Leslie Lamport. Jenny Tyrväinen. Introduction. Education and Career. Most important works.
