A Diversity of Duplications
1 A Diversity of Duplications David Powell Special event «Dependability of computing systems, Memories and future» in honor of Jean-Claude Laprie LAAS-CNRS, Toulouse, 16 April 2010
2 Duplication: error detection, error tolerance
3 Outline Some memories on duplication Some recent and ongoing work on duplication
4 Some memories
5 Gordini: first duplicated system built at LAAS; detection of HW faults; duplicated bit microprocessors; 8 kbytes of parity-checked memory
6 Gordini
7 Bi-Gordini
8 1979: Hair! Gordini with people [Photo labels: Gordini, Jean-Claude, Hiro Ihara]
9 Armure: a hot standby duplicated system developed for the French space agency in the context of the "SURF national project". It was used in the ground segment of the Cospas-Sarsat international satellite-based search-and-rescue system.
10 Armure A guy that worked on the project
11 Delta-4: pioneering work on duplication implemented by software in a CORBA-like environment: active replication, passive replication, semi-active replication
12 1987 Delta-4 A guy that didn't work on the project
13 Some recent and ongoing work on duplication
14 A Railway Duplication Context: duplication of fail-safe controllers (coded processors) in automatic subway systems Problem: replica consistency despite unreliable communication
15 Inter-section handover [Diagram: train T1 moving from Section A to Section B; labels: block, lock, negative detectors, Controller A, Controller B] Controller A unregisters trains leaving the lock. Controller B registers trains entering the lock and assigns a target: next station or lock, or the block behind the previous train.
16 Duplication = danger! [Diagram: trains T2 and T1; Controller A duplicated as A1/A2, Controller B as B1/B2] B1 registers T2, assigns its target, then fails (while T2 proceeds).
17 Duplication = danger! [Diagram: trains T3, T2 and T1; same duplicated controllers] B2 registers T3 and assigns an incorrect target (since it missed T2).
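The scenario on slides 15 to 17 can be replayed in a few lines. This is only a minimal sketch under stated assumptions, not PADRE and not the real controller logic: the `Replica` class, its `register` and `assign_target` methods, the `run_scenario` helper, and the simplified rule "stop behind the most recently registered train ahead" are all hypothetical names and behavior introduced for illustration. It also assumes T1 is already known to both B1 and B2.

```python
class Replica:
    """One unit of a duplicated section controller (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.trains = []  # trains this replica believes are in the section, oldest first

    def register(self, train):
        # Record a train entering the section.
        self.trains.append(train)

    def assign_target(self, train):
        # Assumed rule: stop behind the train registered just before this one,
        # or run to the next station if no earlier train is known.
        idx = self.trains.index(train)
        if idx == 0:
            return "next station"
        return f"block behind {self.trains[idx - 1]}"


def run_scenario(update_reaches_b2):
    """Replay the handover; return the target B2 assigns to T3."""
    b1, b2 = Replica("B1"), Replica("B2")
    for replica in (b1, b2):
        replica.register("T1")      # assumption: T1 is known to both replicas
    b1.register("T2")               # B1 registers T2
    if update_reaches_b2:
        b2.register("T2")           # the update also reaches B2 (no message loss)
    # B1 fails here while T2 proceeds; from now on only B2 answers.
    b2.register("T3")
    return b2.assign_target("T3")


if __name__ == "__main__":
    print(run_scenario(update_reaches_b2=True))
    print(run_scenario(update_reaches_b2=False))
```

When the update about T2 is lost, B2 computes T3's target from a state that omits T2, which is the hazard the slides describe: the surviving replica answers, but from an inconsistent view of the section.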
18 Problem: consistency between duplicated units despite unreliable communication is provably impossible!
19 Solution: PADRE, a fail-safe multicast Protocol for Asynchronous Duplex REdundancy [State diagram. States: nominal duplex config. (nominal service), simplex config., potential inconsistency (transmission error), safe duplex config., catastrophic failure. Transition labels: fault of primary or secondary, fault of primary, fault of secondary (benign failure), state restoration, repair. Regions: safe, unsafe.]
20 Solution: PADRE, a fail-safe multicast Protocol for Asynchronous Duplex REdundancy. Deployed by Siemens Transportation Systems (previously Matra Transport) in New York (Canarsie line), Barcelona, Paris (line 3) and Roissy. Soon in São Paulo (line 4), Paris (line 1), Budapest (lines 2 & 4), Helsinki, Algiers and New York (PATH line).
21 A Robotics Duplication Context: temporal planning for an autonomous robot Problem: insufficient or erroneous knowledge encoded in domain models
22 Software Architecture. Goals feed the Decisional Layer (decision making, planning). Executive Layer: decompose plan actions into elementary tasks; execution control of elementary tasks. Functional Layer: environment sensing; execution of elementary tasks.
23 How Do Robots Plan? Planning with IxTeT: planning in a plan space [Diagram labels: declarative model (objects, actions, constraints), domain knowledge, heuristics, goals, search engine, current situation, initial partial plan, possible final plans, Executive Layer, Functional Layer]
24 IxTeT Example
25 Problem: domain knowledge (models, heuristics) may be incomplete or wrong. Validation is intrinsically difficult. Can tolerance be envisaged? There is a multiplicity of valid but incomparable plans. What means can be used for detection?
26 Solution: FTplan [Diagram labels: goals, Model 1, Model 2, two IxTeT planners, FTplan, Executive Layer, Functional Layer; Dala robot implementation] Detection before execution: temporal watchdog, plan analyzer. Detection during/after execution: online goal checker, action failure detection. Recovery: sequential planning, concurrent planning.
27 Solution: FTplan. Prototype implementation: first diversification of declarative programs. Validated by fault injection (model mutation) on simulated Dala robot: first fault injection into declarative programs; 30-40% goal reliability improvement in presence of injected faults. Larger gains to be expected with a plan analyzer.
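The "sequential planning" recovery with a temporal watchdog can be sketched as follows. This is an illustration under assumptions, not FTplan's implementation or the IxTeT interface: planners are modelled as plain Python callables, and `sequential_planning`, `planner_model1`, `planner_model2` and the optional `analyzer` hook are hypothetical names. The watchdog is approximated with a timeout on a worker thread.

```python
import concurrent.futures as cf
import time


def sequential_planning(planners, goals, timeout_s, analyzer=None):
    """Try each diversified planner in turn until one yields an accepted plan.

    planners: list of (name, callable) pairs, one per diversified model.
    analyzer: optional callable (plan, goals) -> bool, checked before execution.
    Returns (name, plan), or (None, None) if every planner fails.
    """
    for name, planner in planners:
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(planner, goals)
        try:
            plan = future.result(timeout=timeout_s)  # temporal watchdog
        except cf.TimeoutError:
            plan = None                              # no plan produced in time
        except Exception:
            plan = None                              # the planner itself crashed
        finally:
            pool.shutdown(wait=False)                # do not block on a stuck planner
        if plan is not None and (analyzer is None or analyzer(plan, goals)):
            return name, plan
    return None, None


def planner_model1(goals):
    # Stand-in for a faulty model whose search does not finish in time.
    time.sleep(0.6)
    return list(goals)


def planner_model2(goals):
    # Stand-in for a diversified model that produces a plan quickly.
    return [f"goto({g})" for g in goals]


if __name__ == "__main__":
    chosen = sequential_planning(
        [("model1", planner_model1), ("model2", planner_model2)],
        goals=["rockA"],
        timeout_s=0.2,
    )
    print(chosen)
```

Concurrent planning, the other recovery option named on slide 26, would instead start both planners at once and take the first acceptable plan; the sequential form is shown because it needs no arbitration between competing results.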
28 An Avionics Duplication Context: connection of a commercial laptop to a life-critical system (i.e., an aircraft) Problem: malicious intrusion into the laptop's COTS operating system
29 Maintenance laptop [Diagram labels: pilot, maintenance engineer, onboard equipment, flight logbook, maintenance terminal, paper manuals, electronic manuals]
30 Maintenance laptop [Diagram labels: pilot, maintenance engineer, onboard equipment, flight logbook, maintenance terminal, paper manuals, maintenance laptop]
31 Connecting a laptop [Diagram labels: flight management, aircraft management, aircraft information system, "off-board"]
32 Connecting a laptop [Diagram labels: flight management, aircraft management, aircraft information system?, "off-board"]
33 Enabling technologies. Totel et al.'s "multi-level integrity" model: framework for multiple criticality levels in a single system; trusted computing base (TCB) for isolation and mediation; fault-tolerance to allow data to flow from low to high. Platform virtualization techniques: isolation between virtual machines; attractive approach for implementing the TCB.
34 Solution: Virtual Duplication. ArSec «Architecture de Sécurités» [Architecture diagram; recoverable labels: Model, View, Controller, Controller', VO, AspectJ, SWING, JVM, Safe VM, Hypervisor (XEN), Hardware, Error, link to aircraft equipment]
35 Corruption attack [Same architecture diagram, with a "?!" marker and an Error signal]
36 Timing attack [Same architecture diagram, with a "?!" marker and an Error signal]
37 Reaction to attack [Same architecture diagram, with crossed-out elements] Reboot; change laptops; revert to maintenance terminal.
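One way to picture the detection step behind slides 34 to 37 is a comparator that forwards a request to the aircraft equipment only when both duplicated, diversified replicas agree on it in content and in time. The sketch below is an assumption-laden illustration, not the ArSec implementation: `Output`, `IntrusionSuspected`, `validate` and the `max_skew_s` threshold are hypothetical, and reading the diagram's "VO" box as such a comparator is itself an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Output:
    command: str      # request destined for the aircraft equipment
    timestamp: float  # time at which the replica produced it, in seconds


class IntrusionSuspected(Exception):
    """Raised when the two replicas disagree; the caller decides the reaction."""


def validate(out_a: Optional[Output], out_b: Optional[Output],
             max_skew_s: float = 0.5) -> str:
    """Return the command to forward, or raise if the replicas disagree."""
    if out_a is None or out_b is None:
        # One replica stayed silent: treated like a timing attack.
        raise IntrusionSuspected("missing output from one replica")
    if abs(out_a.timestamp - out_b.timestamp) > max_skew_s:
        raise IntrusionSuspected("outputs too far apart in time")
    if out_a.command != out_b.command:
        # Contents differ: treated like a corruption attack.
        raise IntrusionSuspected("outputs differ")
    return out_a.command


if __name__ == "__main__":
    print(validate(Output("read_logbook", 1.0), Output("read_logbook", 1.1)))
    try:
        validate(Output("read_logbook", 1.0), Output("write_config", 1.1))
    except IntrusionSuspected as err:
        print("error raised:", err)
```

A fixed skew threshold such as `max_skew_s` is exactly where benign non-determinism between replicas can trigger a false alarm, so in practice its value would have to be chosen with the replicas' normal timing variation in mind.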
38 Summary
PADRE. Context: Railways. Objective: Availability & Safety. Problem: Unreliable communication (bad diversity). Solution: Fail-safe asynchronous multicast.
FTplan. Context: Robotics. Objective: Availability. Problem: Domain knowledge deficiencies. Solution: Diversified domain models (good diversity).
ArSec. Context: Avionics. Objective: Security & Safety. Problem: Malicious intrusion. Solution: Virtualization & diversified OSs (good diversity).
39 The Future. Dealing with the dichotomy between good diversity, which favors independent manifestation of design faults (including vulnerabilities), allowing their tolerance, and bad diversity, which causes non-deterministic behavior that gives rise to false positives. Research directions for dealing with bad diversity: constraints on internal operation of virtual machines (e.g., thread scheduling) without reducing good diversity; constraints on programmers (e.g., programming styles) without reducing ease-of-programming.
40 Dependability: a Unifying Concept for Reliable Computing (FTCS-12)
41 A Diversity of Duplications "35 years of duplication without doing the same thing twice"