A Distributed and Robust SDN Control Plane for Transactional Network Updates
|
|
- Bennett Patrick
- 5 years ago
- Views:
Transcription
1 A Distributed and Robust SDN Control Plane for Transactional Network Updates Marco Canini (UCL) with Petr Kuznetsov (Télécom ParisTech), Dan Levin (TU Berlin), Stefan Schmid (TU Berlin & T-Labs) 1
2 Network Policy Specification Routing Load Monitoring Access Waypoint Balancing Control Centralized Network Policy Controller Platform 2
3 Network Policy Specification forward from X to Y Routing Load Monitoring Access Waypoint Balancing Control Centralized Network Policy Controller Platform X 1.<Match, Action> 2.<Match, Action> 3.<Match, Action> 4.<Match, Action> 5.<Match, Action> Y 3
4 Network Policy Specification X Routing Load Monitoring Access Waypoint Balancing Control Centralized Network Policy Composition Controller Platform Y Policy composition assembles data plane updates as a semantically sound set of rules 4
5 Policy Composition Review Foster 11, Monsanto 13: Modular, parallel and sequential composition Srv Load Balancing >> Routing U Monitoring Ferguson 13: Policy trees for multi-authorship Composition Routing Waypoint Reitblatt 12: Consistent network updates 5
6 Conflicting Policies dst = H fwd(x) Routing Load Balancing Conflict X!= Y In the general case, policies might conflict Examples: Overlapping domains and same precedence Scarce flowtable resources Goal: Must avoid conflicting policies! dst = H fwd(y) 6
7 Centralized Network Control? Routing Load Monitoring Access Waypoint Balancing Control Centralized Network Policy Controller Platform Fully centralized Inadequate availability, scalability and responsiveness 7
8 Distributed Network Control Routing Load Monitoring Access Waypoint Balancing Control Centralized Network Policy Controller Platform Composition? Consistency and concurrency Faults and asynchronous communication 8
9 Now, consider policy composition in the distributed control plane... Routing Monitoring Waypoint Policy Controller Platform Routing Monitoring Waypoint Policy Controller Platform Switch Reader- Writer Model State Distribution Model Control Logic Factorization 9
10 Why should the programmer care? We believe the programmer should not! Enter Software Transactional Networking Let a dedicated component implement a general solution to all hard-to-solve, low-level concurrency and fault tolerance issues to policy composition apply(π) ack / nack(reason) Transactional Interface 10
11 Consistency: Linearizability of updates apply(π 1 ) ack apply(π 1 ) ack p 1 p 1 apply(π 2 ) ack apply(π 2 ) ack p 2 apply(π 3 ) nack p 2 p 3 p 3 sw 1 sw 1 sw 2 sw 2 sw 3 Original history We don t control traffic! 11 sw 3 Manipulate the network as though there is no concurrency Sequential equivalent
12 Software Transactional Networking: Consistent Policy Composition (CPC) Routing Monitoring Waypoint apply(π 1 ) ack apply(π 2 ) nack(reason) CPC Interface 1. All-or-nothing semantics 2. Tolerate up to f controller node crash failures 3. Non conflicting policies eventually installed and at least one policy commits (among conflicting ones) 4. Ensure policy updates affect traffic as a sequential composition of their policies 12
13 Conceptualizing CPC Routing Monitoring Waypoint Reliable but asynchronous channel CPC Node CPC Node Every controller node receives and participate in installing every policy update 13
14 Conceptualizing CPC Routing Monitoring Waypoint apply(π) ack CPC Node CPC Node Policy π unique tag τ 1. Internal ports match on tag τ 2. Ingress ports apply tag τ 14
15 Conceptualizing CPC Routing Monitoring Waypoint CPC Node CPC Node RMW RMW RMW RMW RMW Atomic read-modify-write primitive RMW RMW 15
16 RMW model Controllers access ports with atomic read-modify-write primitive RMW(g,v): read current state v apply and return g(v,v ) RMW(g,v) Controller g(v,v ) Intuition: do not update if policy update conflicts with currently installed policy In the paper: Theorem: 1-resilient read-write CPC is impossible 16
17 FixTag: upper bound algorithm Operation: 1. Unique tag per path 2. Broadcast policy π to all other controllers 3. Update ingress ports in predefined order 4. add rule to tag all packets matching dom(π) with the tag corresponding to the path π (i) for ingress port i Upsides: wait-free (tolerates all failure patterns) Controllers only synchronize through the data plane Downsides: tag complexity linear in # possible policies and paths May grow super-exponential in the size of the network 17
18 Can we lower the tax complexity? No, if we get no feedback from the network Tag τ cannot be reused if a packet tagged with τ is still in flight Suppose, we can correctly evaluate the set of active tags Correct (but asynchronous) oracle Single-controller scenario: one bit is enough! Upon policy update π i, wait until (i mod 2)-traffic is over, and use tag i mod 2 Two or more controllers: inherent price of concurrency? Between constant and super-exponential? Yes, if controllers coordinate the use of tags 18
19 ReuseTag: linear complexity Proportional to the level of resilience: Up to f failures: f+2 tags needed (proved optimal) Controllers use replicated state machine that imposes a total order on the policy updates and ensures coordinated use and reuse of tags All requests are serialized, even non-conflicting ones 19
20 Summary Software Transactional Networking framework for consistent policy composition (CPC) in distributed SDN control planes Transactional interface to manipulate the network as though there is no concurrency Policies compose or conflict (and abort) Formal model of the problem is in the paper Two CPC algorithms FixTag ReuseTag: f+2 tags (minimal number) 20
21 Backup 21
22 Acknowledgements Petr Kuznetsov Dan Levin Stefan Schmid Supported by ARC grant 13/ from Communauté française de Belgique, and European Union s Horizon 2020 ENDEAVOUR project (grant agreement ) 22
Software Transactional Networking: Concurrent and Consistent Policy Composition
Software Transactional Networking: Concurrent and Consistent Policy Composition Dan Levin Marco Canini, Petr Kuznetsov*, Stefan Schmid TU Berlin/T-Labs, *Telecom ParisTech Software Transactional Networking:
More informationarxiv: v3 [cs.dc] 20 May 2014
A Distributed SDN Control Plane for Consistent Policy Updates Marco Canini 1 Petr Kuznetsov 2 Dan Levin 3 Stefan Schmid 4 1 Université catholique de Louvain, Place Sainte Barbe 2, 1348 Louvain-la-Neuve,
More informationSoftware Transactional Networking: Concurrent and Consistent Policy Composition
Software Transactional Networking: Concurrent and Consistent Policy Composition Marco Canini 1 Petr Kuznetsov 1,2 Dan Levin 1 Stefan Schmid 1 1 TU Berlin / T-Labs 2 Télécom ParisTech @net.t-labs.tu-berlin.de
More informationIntroduction to Distributed Systems Seif Haridi
Introduction to Distributed Systems Seif Haridi haridi@kth.se What is a distributed system? A set of nodes, connected by a network, which appear to its users as a single coherent system p1 p2. pn send
More informationDistributed Systems Fault Tolerance
Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable
More informationLarge-Scale Key-Value Stores Eventual Consistency Marco Serafini
Large-Scale Key-Value Stores Eventual Consistency Marco Serafini COMPSCI 590S Lecture 13 Goals of Key-Value Stores Export simple API put(key, value) get(key) Simpler and faster than a DBMS Less complexity,
More informationarxiv: v3 [cs.dc] 17 Feb 2015
Inherent Limitations of Hybrid Transactional Memory Dan Alistarh 1 Justin Kopinsky 4 Petr Kuznetsov 2 Srivatsan Ravi 3 Nir Shavit 4,5 1 Microsoft Research, Cambridge 2 Télécom ParisTech 3 TU Berlin 4 Massachusetts
More informationReplication in Distributed Systems
Replication in Distributed Systems Replication Basics Multiple copies of data kept in different nodes A set of replicas holding copies of a data Nodes can be physically very close or distributed all over
More informationFatTire: Declarative Fault Tolerance for SDN
FatTire: Declarative Fault Tolerance for SDN Mark Reitblatt Marco Canini Arjun Guha Nate Foster (Cornell) (TU Berlin UC Louvain) (Cornell UMass Amherst) (Cornell) 1 In a Perfect World... 2 But in Reality...
More informationIn-Band Synchronization for Distributed SDN Control Planes
In-Band Synchronization for Distributed SDN Control Planes Liron Schiff 1 Stefan Schmid 2 Petr Kuznetsov 3 1 Tel Aviv University, Israel 2 Aalborg University, Denmark 3 Télécom ParisTech, France ABSTRACT
More informationCausal Consistency and Two-Phase Commit
Causal Consistency and Two-Phase Commit CS 240: Computing Systems and Concurrency Lecture 16 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Consistency
More informationCSE 5306 Distributed Systems
CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves
More informationCSE 5306 Distributed Systems. Fault Tolerance
CSE 5306 Distributed Systems Fault Tolerance 1 Failure in Distributed Systems Partial failure happens when one component of a distributed system fails often leaves other components unaffected A failure
More informationDistributed Algorithms Reliable Broadcast
Distributed Algorithms Reliable Broadcast Alberto Montresor University of Trento, Italy 2016/04/26 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Contents
More informationDistributed Systems. replication Johan Montelius ID2201. Distributed Systems ID2201
Distributed Systems ID2201 replication Johan Montelius 1 The problem The problem we have: servers might be unavailable The solution: keep duplicates at different servers 2 Building a fault-tolerant service
More informationTopics in Reliable Distributed Systems
Topics in Reliable Distributed Systems 049017 1 T R A N S A C T I O N S Y S T E M S What is A Database? Organized collection of data typically persistent organization models: relational, object-based,
More informationBasic vs. Reliable Multicast
Basic vs. Reliable Multicast Basic multicast does not consider process crashes. Reliable multicast does. So far, we considered the basic versions of ordered multicasts. What about the reliable versions?
More informationFault Tolerance. Distributed Systems. September 2002
Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend
More informationChapter 8 Fault Tolerance
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to
More informationSynchrony Weakened by Message Adversaries vs Asynchrony Enriched with Failure Detectors. Michel Raynal, Julien Stainer
Synchrony Weakened by Message Adversaries vs Asynchrony Enriched with Failure Detectors Michel Raynal, Julien Stainer Synchrony Weakened by Message Adversaries vs Asynchrony Enriched with Failure Detectors
More informationFault Tolerance. o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication. o Distributed Commit
Fault Tolerance o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication o Distributed Commit -1 Distributed Commit o A more general problem of atomic
More informationFault Tolerance. Basic Concepts
COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time
More informationSynchronization. Chapter 5
Synchronization Chapter 5 Clock Synchronization In a centralized system time is unambiguous. (each computer has its own clock) In a distributed system achieving agreement on time is not trivial. (it is
More informationThe Wait-Free Hierarchy
Jennifer L. Welch References 1 M. Herlihy, Wait-Free Synchronization, ACM TOPLAS, 13(1):124-149 (1991) M. Fischer, N. Lynch, and M. Paterson, Impossibility of Distributed Consensus with One Faulty Process,
More informationA lightweight protocol for consistent policy update on software-defined networking with multiple controllers
Accepted Manuscript A lightweight protocol for consistent policy update on software-defined networking with multiple controllers Diogo Menezes Ferrazani Mattos, Otto Carlos Muniz Bandeira Duarte, Guy Pujolle
More informationConsensus in Distributed Systems. Jeff Chase Duke University
Consensus in Distributed Systems Jeff Chase Duke University Consensus P 1 P 1 v 1 d 1 Unreliable multicast P 2 P 3 Consensus algorithm P 2 P 3 v 2 Step 1 Propose. v 3 d 2 Step 2 Decide. d 3 Generalizes
More informationChapter 4: Distributed Systems: Replication and Consistency. Fall 2013 Jussi Kangasharju
Chapter 4: Distributed Systems: Replication and Consistency Fall 2013 Jussi Kangasharju Chapter Outline n Replication n Consistency models n Distribution protocols n Consistency protocols 2 Data Replication
More informationDistributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf
Distributed systems Lecture 6: distributed transactions, elections, consensus and replication Malte Schwarzkopf Last time Saw how we can build ordered multicast Messages between processes in a group Need
More informationFault-Tolerant Distributed Services and Paxos"
Fault-Tolerant Distributed Services and Paxos" INF346, 2015 2015 P. Kuznetsov and M. Vukolic So far " " Shared memory synchronization:" Wait-freedom and linearizability" Consensus and universality " Fine-grained
More informationA Layered Architecture for Erasure-Coded Consistent Distributed Storage
A Layered Architecture for Erasure-Coded Consistent Distributed Storage 1 Kishori M. Konwar, N. Prakash, Nancy Lynch, Muriel Médard Department of Electrical Engineering & Computer Science Massachusetts
More informationDistributed Systems 11. Consensus. Paul Krzyzanowski
Distributed Systems 11. Consensus Paul Krzyzanowski pxk@cs.rutgers.edu 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value must be one
More informationOverview. Distributed Computing with Oz/Mozart (VRH 11) Mozart research at a glance. Basic principles. Language design.
Distributed Computing with Oz/Mozart (VRH 11) Carlos Varela RPI March 15, 2007 Adapted with permission from: Peter Van Roy UCL Overview Designing a platform for robust distributed programming requires
More informationReplications and Consensus
CPSC 426/526 Replications and Consensus Ennan Zhai Computer Science Department Yale University Recall: Lec-8 and 9 In the lec-8 and 9, we learned: - Cloud storage and data processing - File system: Google
More informationAnnouncements. me your survey: See the Announcements page. Today. Reading. Take a break around 10:15am. Ack: Some figures are from Coulouris
Announcements Email me your survey: See the Announcements page Today Conceptual overview of distributed systems System models Reading Today: Chapter 2 of Coulouris Next topic: client-side processing (HTML,
More informationDistributed Systems (5DV147)
Distributed Systems (5DV147) Replication and consistency Fall 2013 1 Replication 2 What is replication? Introduction Make different copies of data ensuring that all copies are identical Immutable data
More informationTransactum Business Process Manager with High-Performance Elastic Scaling. November 2011 Ivan Klianev
Transactum Business Process Manager with High-Performance Elastic Scaling November 2011 Ivan Klianev Transactum BPM serves three primary objectives: To make it possible for developers unfamiliar with distributed
More informationCoordination 2. Today. How can processes agree on an action or a value? l Group communication l Basic, reliable and l ordered multicast
Coordination 2 Today l Group communication l Basic, reliable and l ordered multicast How can processes agree on an action or a value? Modes of communication Unicast 1ç è 1 Point to point Anycast 1è
More informationData Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 14: Data Replication Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database Replication What is database replication The advantages of
More informationFIT: A Distributed Database Performance Tradeoff. Faleiro and Abadi CS590-BDS Thamir Qadah
FIT: A Distributed Database Performance Tradeoff Faleiro and Abadi CS590-BDS Thamir Qadah Desirable features in Distributed Databases Impossible to achieve Fairness Isolation Throughput It is impossible
More informationConcurrent Computing
Concurrent Computing Introduction SE205, P1, 2017 Administrivia Language: (fr)anglais? Lectures: Fridays (15.09-03.11), 13:30-16:45, Amphi Grenat Web page: https://se205.wp.imt.fr/ Exam: 03.11, 15:15-16:45
More informationZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems
ZooKeeper & Curator CS 475, Spring 2018 Concurrent & Distributed Systems Review: Agreement In distributed systems, we have multiple nodes that need to all agree that some object has some state Examples:
More informationDistributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski
Distributed Systems 09. State Machine Replication & Virtual Synchrony Paul Krzyzanowski Rutgers University Fall 2016 1 State machine replication 2 State machine replication We want high scalability and
More informationShared memory model" Shared memory guarantees" Read-write register" Wait-freedom: unconditional progress " Liveness" Shared memory basics"
Shared memory basics INF346, 2014 Shared memory model Processes communicate by applying operations on and receiving responses from shared objects! A shared object is a state machine ü States ü Operations/Responses
More informationFault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior
Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure
More informationChapter 8 Fault Tolerance
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance Fault Tolerance Basic Concepts Being fault tolerant is strongly related to what
More informationConsistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016
Consistency in Distributed Storage Systems Mihir Nanavati March 4 th, 2016 Today Overview of distributed storage systems CAP Theorem About Me Virtualization/Containers, CPU microarchitectures/caches, Network
More informationSCALABLE CONSISTENCY AND TRANSACTION MODELS
Data Management in the Cloud SCALABLE CONSISTENCY AND TRANSACTION MODELS 69 Brewer s Conjecture Three properties that are desirable and expected from realworld shared-data systems C: data consistency A:
More informationConsistency. CS 475, Spring 2018 Concurrent & Distributed Systems
Consistency CS 475, Spring 2018 Concurrent & Distributed Systems Review: 2PC, Timeouts when Coordinator crashes What if the bank doesn t hear back from coordinator? If bank voted no, it s OK to abort If
More informationDistributed System. Gang Wu. Spring,2018
Distributed System Gang Wu Spring,2018 Lecture4:Failure& Fault-tolerant Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the
More informationSection 4 Concurrent Objects Correctness, Progress and Efficiency
Section 4 Concurrent Objects Correctness, Progress and Efficiency CS586 - Panagiota Fatourou 1 Concurrent Objects A concurrent object is a data object shared by concurrently executing processes. Each object
More informationDistributed KIDS Labs 1
Distributed Databases @ KIDS Labs 1 Distributed Database System A distributed database system consists of loosely coupled sites that share no physical component Appears to user as a single system Database
More informationComputing Parable. The Archery Teacher. Courtesy: S. Keshav, U. Waterloo. Computer Science. Lecture 16, page 1
Computing Parable The Archery Teacher Courtesy: S. Keshav, U. Waterloo Lecture 16, page 1 Consistency and Replication Today: Consistency models Data-centric consistency models Client-centric consistency
More information6.033: Fault Tolerance: Isolation Lecture 17 Katrina LaCurts,
6.033: Fault Tolerance: Isolation Lecture 17 Katrina LaCurts, lacurts@mit.edu 0. Introduction - Last time: Atomicity via logging. We're good with atomicity now. - Today: Isolation - The problem: We have
More informationDistributed Systems 24. Fault Tolerance
Distributed Systems 24. Fault Tolerance Paul Krzyzanowski pxk@cs.rutgers.edu 1 Faults Deviation from expected behavior Due to a variety of factors: Hardware failure Software bugs Operator errors Network
More informationRESEARCH ARTICLE. A Simple Byzantine Fault-Tolerant Algorithm for a Multi-Writer Regular Register
The International Journal of Parallel, Emergent and Distributed Systems Vol., No.,, 1 13 RESEARCH ARTICLE A Simple Byzantine Fault-Tolerant Algorithm for a Multi-Writer Regular Register Khushboo Kanjani,
More informationTAPIR. By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton
TAPIR By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton Outline Problem Space Inconsistent Replication TAPIR Evaluation Conclusion Problem
More informationNONBLOCKING COMMIT PROTOCOLS
Dale Skeen NONBLOCKING COMMIT PROTOCOLS MC714 Sistemas Distribuídos Nonblocking Commit Protocols Dale Skeen From a certain point onward there is no longer any turning back. That is the point that must
More informationTHE AUSTRALIAN NATIONAL UNIVERSITY Mid-semester Quiz, Second Semester COMP2310 (Concurrent and Distributed Systems)
THE AUSTRALIAN NATIONAL UNIVERSITY Mid-semester Quiz, Second Semester 2001 COMP2310 (Concurrent and Distributed Systems) Writing Period: 65 minutes duration Study Period: 0 minutes duration Permitted Materials:
More informationApplication of SDN: Load Balancing & Traffic Engineering
Application of SDN: Load Balancing & Traffic Engineering Outline 1 OpenFlow-Based Server Load Balancing Gone Wild Introduction OpenFlow Solution Partitioning the Client Traffic Transitioning With Connection
More informationNetwork Programming Languages. Nate Foster
Network Programming Languages Nate Foster We are at the start of a revolution! Network architectures are being opened up giving programmers the freedom to tailor their behavior to suit applications!
More informationGlobal atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol
Global atomicity In distributed systems a set of processes may be taking part in executing a task Their actions may have to be atomic with respect to processes outside of the set example: in a distributed
More informationLinearizability CMPT 401. Sequential Consistency. Passive Replication
Linearizability CMPT 401 Thursday, March 31, 2005 The execution of a replicated service (potentially with multiple requests interleaved over multiple servers) is said to be linearizable if: The interleaved
More informationFault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all
Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all failures or predictable: exhibit a well defined failure behavior
More informationDistributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance
Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Fault Tolerance Version: December 2, 2010 2 / 65 Contents Chapter
More informationUnderstanding Non-Blocking Atomic Commitment
Understanding Non-Blocking Atomic Commitment Özalp Babaoğlu Sam Toueg Technical Report UBLCS-93-2 January 1993 Department of Computer Science University of Bologna Mura Anteo Zamboni 7 40127 Bologna (Italy)
More informationOptimal Resilience for Erasure-Coded Byzantine Distributed Storage
Optimal Resilience for Erasure-Coded Byzantine Distributed Storage Christian Cachin IBM Research Zurich Research Laboratory CH-8803 Rüschlikon, Switzerland cca@zurich.ibm.com Stefano Tessaro ETH Zurich
More informationThe Power of Locality
GIAN Course on Distributed Network Algorithms The Power of Locality Case Study: Graph Coloring Stefan Schmid @ T-Labs, 2011 Case Study: Graph Coloring Case Study: Graph Coloring Assign colors to nodes.
More informationAtomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?
CPSC-4/6: Operang Systems Atomic Transactions The Transaction Model / Primitives Serializability Implementation Serialization Graphs 2-Phase Locking Optimistic Concurrency Control Transactional Memory
More informationToday: Fault Tolerance
Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing
More informationFrom eventual to strong consistency. Primary-Backup Replication. Primary-Backup Replication. Replication State Machines via Primary-Backup
From eventual to strong consistency Replication s via - Eventual consistency Multi-master: Any node can accept operation Asynchronously, nodes synchronize state COS 418: Distributed Systems Lecture 10
More informationPrimary-Backup Replication
Primary-Backup Replication CS 240: Computing Systems and Concurrency Lecture 7 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Simplified Fault Tolerance
More informationUsing Optimistic Atomic Broadcast in Transaction Processing Systems
Using Optimistic Atomic Broadcast in Transaction Processing Systems Bettina Kemme Fernando Pedone Gustavo Alonso André Schiper Matthias Wiesmann School of Computer Science McGill University Montreal, Canada,
More informationDistributed Data Management Transactions
Felix Naumann F-2.03/F-2.04, Campus II Hasso Plattner Institut must ensure that interactions succeed consistently An OLTP Topic Motivation Most database interactions consist of multiple, coherent operations
More informationToday: Fault Tolerance. Reliable One-One Communication
Today: Fault Tolerance Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing Message logging Lecture 17, page 1 Reliable One-One Communication Issues
More informationChapter 2 Distributed Information Systems Architecture
Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline
More informationCoordinating distributed systems part II. Marko Vukolić Distributed Systems and Cloud Computing
Coordinating distributed systems part II Marko Vukolić Distributed Systems and Cloud Computing Last Time Coordinating distributed systems part I Zookeeper At the heart of Zookeeper is the ZAB atomic broadcast
More informationReview of last lecture. Goals of this lecture. DPHPC Overview. Lock-based queue. Lock-based queue
Review of last lecture Design of Parallel and High-Performance Computing Fall 2013 Lecture: Linearizability Instructor: Torsten Hoefler & Markus Püschel TA: Timo Schneider Cache-coherence is not enough!
More informationResource-bound process algebras for Schedulability and Performance Analysis of Real-Time and Embedded Systems
Resource-bound process algebras for Schedulability and Performance Analysis of Real-Time and Embedded Systems Insup Lee 1, Oleg Sokolsky 1, Anna Philippou 2 1 RTG (Real-Time Systems Group) Department of
More informationTransactions in a Distributed Key-Value Store
Transactions in a Distributed Key-Value Store 6.824 Final Project James Thomas, Deepak Narayanan, Arjun Srinivasan May 11, 2014 Introduction Over the last several years, NoSQL databases such as Redis,
More informationCommit-On-Sharing High Level Design
Commit-On-Sharing High Level Design Mike Pershin, Alexander Zam Zarochentsev 8 August 2008 1 Introduction 1.1 Definitions VBR version-based recovery transaction an atomic operation which reads and possibly
More informationNetwork Protocols. Sarah Diesburg Operating Systems CS 3430
Network Protocols Sarah Diesburg Operating Systems CS 3430 Protocol An agreement between two parties as to how information is to be transmitted A network protocol abstracts packets into messages Physical
More informationShared Memory Seif Haridi
Shared Memory Seif Haridi haridi@kth.se Real Shared Memory Formal model of shared memory No message passing (No channels, no sends, no delivers of messages) Instead processes access a shared memory Models
More informationReplication and Consistency
Replication and Consistency Today l Replication l Consistency models l Consistency protocols The value of replication For reliability and availability Avoid problems with disconnection, data corruption,
More informationA coded shared atomic memory algorithm for message passing architectures
Distrib. Comput. (2017) 30:49 73 DOI 10.1007/s00446-016-0275-x A coded shared atomic memory algorithm for message passing architectures Viveck R. Cadambe 1 ancy Lynch 2 Muriel Mèdard 3 Peter Musial 4 Received:
More informationErasure Coding in Object Stores: Challenges and Opportunities
Erasure Coding in Object Stores: Challenges and Opportunities Lewis Tseng Boston College July 2018, PODC Acknowledgements Nancy Lynch Muriel Medard Kishori Konwar Prakash Narayana Moorthy Viveck R. Cadambe
More informationCSE 344 MARCH 5 TH TRANSACTIONS
CSE 344 MARCH 5 TH TRANSACTIONS ADMINISTRIVIA OQ6 Out 6 questions Due next Wednesday, 11:00pm HW7 Shortened Parts 1 and 2 -- other material candidates for short answer, go over in section Course evaluations
More informationThere Is More Consensus in Egalitarian Parliaments
There Is More Consensus in Egalitarian Parliaments Iulian Moraru, David Andersen, Michael Kaminsky Carnegie Mellon University Intel Labs Fault tolerance Redundancy State Machine Replication 3 State Machine
More informationImportant Lessons. A Distributed Algorithm (2) Today's Lecture - Replication
Important Lessons Lamport & vector clocks both give a logical timestamps Total ordering vs. causal ordering Other issues in coordinating node activities Exclusive access to resources/data Choosing a single
More informationModule 8 - Fault Tolerance
Module 8 - Fault Tolerance Dependability Reliability A measure of success with which a system conforms to some authoritative specification of its behavior. Probability that the system has not experienced
More informationCS 138: Practical Byzantine Consensus. CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Practical Byzantine Consensus CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Scenario Asynchronous system Signed messages s are state machines It has to be practical CS 138
More informationPractice: Large Systems Part 2, Chapter 2
Practice: Large Systems Part 2, Chapter 2 Overvie Introduction Strong Consistency Crash Failures: Primary Copy, Commit Protocols Crash-Recovery Failures: Paxos, Chubby Byzantine Failures: PBFT, Zyzzyva
More informationHorizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage
Horizontal or vertical scalability? Scaling Out Key-Value Storage COS 418: Distributed Systems Lecture 8 Kyle Jamieson Vertical Scaling Horizontal Scaling [Selected content adapted from M. Freedman, B.
More informationArchitecture or Parallel Computers CSC / ECE 506
Architecture or Parallel Computers CSC / ECE 506 Summer 2006 Scalable Programming Models 6/19/2006 Dr Steve Hunter Back to Basics Parallel Architecture = Computer Architecture + Communication Architecture
More informationNon-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei
Non-blocking Array-based Algorithms for Stacks and Queues Niloufar Shafiei Outline Introduction Concurrent stacks and queues Contributions New algorithms New algorithms using bounded counter values Correctness
More informationLecture X: Transactions
Lecture X: Transactions CMPT 401 Summer 2007 Dr. Alexandra Fedorova Transactions A transaction is a collection of actions logically belonging together To the outside world, a transaction must appear as
More informationDon t Give Up on Serializability Just Yet. Neha Narula
Don t Give Up on Serializability Just Yet Neha Narula Don t Give Up on Serializability Just Yet A journey into serializable systems Neha Narula MIT CSAIL GOTO Chicago May 2015 2 @neha PhD candidate at
More information6.852: Distributed Algorithms Fall, Class 21
6.852: Distributed Algorithms Fall, 2009 Class 21 Today s plan Wait-free synchronization. The wait-free consensus hierarchy Universality of consensus Reading: [Herlihy, Wait-free synchronization] (Another
More informationFailure Tolerance. Distributed Systems Santa Clara University
Failure Tolerance Distributed Systems Santa Clara University Distributed Checkpointing Distributed Checkpointing Capture the global state of a distributed system Chandy and Lamport: Distributed snapshot
More informationImplementing Shared Registers in Asynchronous Message-Passing Systems, 1995; Attiya, Bar-Noy, Dolev
Implementing Shared Registers in Asynchronous Message-Passing Systems, 1995; Attiya, Bar-Noy, Dolev Eric Ruppert, York University, www.cse.yorku.ca/ ruppert INDEX TERMS: distributed computing, shared memory,
More informationToday: Fault Tolerance. Failure Masking by Redundancy
Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing
More information