Verdi: A Framework for Implementing and Formally Verifying Distributed Systems
James R. Wilcox, Doug Woos, Pavel Panchekha, Zach Tatlock, Xi Wang, Michael D. Ernst, Thomas Anderson
Challenges
Distributed systems run in unreliable environments. Many types of failure can occur, and fault-tolerance mechanisms are challenging to implement correctly.

Contributions
Formalize the network as an operational semantics. Build semantics for a variety of fault models. Verify fault tolerance as a transformation between semantics.
Verdi Workflow
Build and verify the system in a simple semantics: client I/O against the key-value store. Apply a verified system transformer: the KV store is wrapped in consensus on each node. End-to-end correctness follows by composition.
General Approach
Find the environments in your problem domain; formalize those environments as operational semantics; verify layers as transformations between semantics.
Verdi Successes
Applications: key-value store, lock service.
Fault-tolerance mechanisms: sequence numbering, retransmission, primary-backup replication, consensus-based replication (with linearizability).
Important data
Replicated KV Store
The key-value store is replicated for availability, but the environment is unreliable: machines crash, messages are reordered, dropped, or duplicated, and the network can partition. Despite decades of research, such systems remain difficult to implement correctly, and implementations often have bugs.
Bug-free Implementations
There have been several inspiring successes in formal verification: CompCert, seL4, Jitk, Bedrock, Ironclad, Frenetic, Quark. Goal: formally verify distributed system implementations.
Formally Verify Distributed Implementations
Separate independent system components, and verify application logic independently from fault tolerance; for the replicated store, verify the key-value store logic independently from consensus. The plan:
1. Verify the application logic.
2. Verify the fault-tolerance mechanism.
3. Run the system!
1. Verify Application Logic
In a simple model of client I/O against the key-value store, prove that it implements a good map.
2. Verify Fault Tolerance Mechanism
Apply a verified system transformer to the key-value store, turning it into a consensus-replicated store, and prove that its properties are preserved; end-to-end correctness follows by composition.

3. Run the System!
Extract to OCaml, link against a thin unverified shim, and run on real networks.
Verifying Application Logic
Simple One-node Model
The key-value store starts with state {}. On input "Set k v", the state becomes { k : v } and the store responds with "Resp k v". The trace records the input and the output: [ "Set k v", "Resp k v" ].
More generally, the system has state σ. An input i is passed to the input handler H_inp, which produces a new state σ′ and an output o; the input and output are appended to the trace:

\[
\frac{H_{inp}(\sigma, i) = (\sigma', o)}
     {(\sigma, T) \leadsto_s (\sigma', \; T \mathbin{+\!\!+} \langle i, o \rangle)}
\ (\textsc{Input})
\]
Spec: operations have the expected behavior of a good map (e.g., a Get after a Set returns the value just set; a Get after a Del finds nothing). The system is verified against the semantics by induction over executions; the spec is a safety property.
Verifying Fault Tolerance
The Raft Transformer
Consensus provides a replicated state machine: a log of operations, with the same inputs applied on each node, calling into the original system. When an input is received, it is added to the log and sent to the other nodes; when an operation is replicated, it is applied to the state machine and the output is sent. For the KV store, the operations are Get, Set, and Del, and the state is the dictionary.
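The log-replication idea can be sketched as follows (a toy Python model, not Raft itself: leader election, terms, majorities, and failure handling are all omitted, and every name is hypothetical). Each node keeps a log of operations and applies committed entries, in log order, to the underlying state machine:

```python
# Toy model of consensus-as-a-transformer (NOT real Raft: no leaders,
# terms, or failure handling; hypothetical names for illustration).

def handle_input(state, inp):
    """The original, unreplicated KV state machine."""
    op, k, *rest = inp
    if op == "Set":
        return {**state, k: rest[0]}, ("Resp", k, rest[0])
    if op == "Get":
        return state, ("Resp", k, state.get(k))
    raise ValueError(op)

class Node:
    def __init__(self):
        self.log = []      # replicated log of operations
        self.applied = 0   # index of the next entry to apply
        self.state = {}    # the original system's state

    def replicate(self, op):
        """Receiving an input: append it to the log."""
        self.log.append(op)

    def apply_committed(self, commit_index):
        """Once replicated: apply log entries, in order, to the state machine."""
        outputs = []
        while self.applied < commit_index:
            self.state, out = handle_input(self.state, self.log[self.applied])
            self.applied += 1
            outputs.append(out)
        return outputs

# Same inputs reach every node's log; each node applies them in the same order.
nodes = [Node() for _ in range(3)]
for op in [("Set", "x", 1), ("Get", "x")]:
    for n in nodes:
        n.replicate(op)
outs = [n.apply_committed(2) for n in nodes]
# every replica ends with the same state and produces the same outputs
```

The point of the sketch is the invariant the transformer maintains: because all replicas apply the same log prefix in the same order, they stay in the same state, which is what lets the proof about the original single-node system carry over.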
Raft Correctness
The transformer correctly transforms systems: it preserves traces, which yields linearizability.
Fault Model
Model global state, model internal communication, model failure.
Fault Model: Global State
Machines have names, and Σ maps each name to that machine's state: Σ[1], Σ[2], Σ[3].
Fault Model: Messages
Node 1 sends Vote? to nodes 2 and 3, so the network holds the packets <1, 2, Vote?> and <1, 3, Vote?>. Delivering a packet (src, dst, m) runs the destination's network handler H_net on its state Σ[dst]; the handler produces a new state σ′ (so Σ[dst] is updated), an output o, and a set of new packets P′ (for example, node 2's reply <2, 1, +1>):

\[
\frac{H_{net}(dst, \Sigma[dst], src, m) = (\sigma', o, P') \qquad
      \Sigma' = \Sigma[dst \mapsto \sigma']}
     {(\{(src, dst, m)\} \uplus P, \; \Sigma, \; T)
      \leadsto_r
      (P \uplus P', \; \Sigma', \; T \mathbin{+\!\!+} \langle o \rangle)}
\ (\textsc{Deliver})
\]
Fault Model: Failures
With packets <1, 2, Vote?> and <1, 3, Vote?> in flight, the environment may drop a message, duplicate a message, or crash a machine.
Fault Model: Drop
Any in-flight packet p may simply be removed from the network:

\[
(\{p\} \uplus P, \; \Sigma, \; T) \leadsto_{drop} (P, \; \Sigma, \; T)
\ (\textsc{Drop})
\]
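The network semantics above can be sketched as a small-step interpreter over a bag of in-flight packets (a hedged Python model with hypothetical handlers; Verdi states these as inductive rules in Coq). A deliver step consumes one packet and runs the destination's handler; drop and duplicate steps model the failures:

```python
from collections import Counter

# Packets are (src, dst, msg); P is a multiset of in-flight packets,
# Sigma maps node name -> node state. Toy vote-counting handler.

def handle_net(dst, state, src, msg):
    """H_net: returns (new_state, output, new_packets)."""
    if msg == "Vote?":
        return state, None, [(dst, src, "+1")]     # reply to the requester
    if msg == "+1":
        return state + 1, ("got_vote", src), []    # count the vote
    return state, None, []

def deliver(P, Sigma, T, packet):
    """(Deliver): consume `packet`, run the handler, add its sends to the bag."""
    src, dst, msg = packet
    assert P[packet] > 0
    P = P - Counter([packet])
    state2, out, sends = handle_net(dst, Sigma[dst], src, msg)
    Sigma = {**Sigma, dst: state2}
    return P + Counter(sends), Sigma, T + ([out] if out else [])

def drop(P, Sigma, T, packet):
    """(Drop): any in-flight packet may vanish."""
    return P - Counter([packet]), Sigma, T

def duplicate(P, Sigma, T, packet):
    """(Duplicate): any in-flight packet may be copied."""
    return P + Counter([packet]), Sigma, T

P = Counter([(1, 2, "Vote?"), (1, 3, "Vote?")])
Sigma = {1: 0, 2: 0, 3: 0}
P, Sigma, T = deliver(P, Sigma, [], (1, 2, "Vote?"))
# the bag now holds <1,3,Vote?> and node 2's reply <2,1,+1>
```

Because any enabled step may fire in any order, proofs over this semantics must hold for every interleaving of delivers, drops, and duplicates, which is exactly what makes the fault model demanding.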
Toward Verifying Raft
A general theory of linearizability: 1k lines of implementation and 5k lines of proof for linearizability. State machine safety: 30k lines. Most state invariants are proved; some remain to be done.
Verified System Transformers
Transformers are functions on systems: they transform systems between semantics while maintaining equivalent traces, so correctness of the transformed system comes for free. Examples include Raft (consensus), primary-backup replication, sequence numbering and retransmission, and ghost variables.
Running Verdi Programs
Coq extraction to OCaml, linked against a thin, unverified shim. The trusted computing base: the shim, Coq, OCaml, and the OS.
Performance Evaluation
Compared with etcd, a similar open-source store: about 10% performance overhead, mostly disk- and network-bound. Notably, etcd has had linearizability bugs.
Previous Approaches
EventML [Schiper 2014]: verified Paxos using the Nuprl proof assistant. Mace [Killian 2007]: model checking of distributed systems in C++. TLA+ [Lamport 2002]: a specification language and logic.
Contributions
Formalize the network as operational semantics. Build semantics for a variety of fault models. Verify fault tolerance as a transformation between semantics.

Thanks!
Distributed Systems 24. Fault Tolerance Paul Krzyzanowski pxk@cs.rutgers.edu 1 Faults Deviation from expected behavior Due to a variety of factors: Hardware failure Software bugs Operator errors Network
More informationDistributed Consensus: Making Impossible Possible
Distributed Consensus: Making Impossible Possible Heidi Howard PhD Student @ University of Cambridge heidi.howard@cl.cam.ac.uk @heidiann360 hh360.user.srcf.net Sometimes inconsistency is not an option
More informationBuilding Consistent Transactions with Inconsistent Replication
DB Reading Group Fall 2015 slides by Dana Van Aken Building Consistent Transactions with Inconsistent Replication Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, Dan R. K. Ports
More informationConsensus, impossibility results and Paxos. Ken Birman
Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}
More informationRecall use of logical clocks
Causal Consistency Consistency models Linearizability Causal Eventual COS 418: Distributed Systems Lecture 16 Sequential Michael Freedman 2 Recall use of logical clocks Lamport clocks: C(a) < C(z) Conclusion:
More informationConsensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.
Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}
More informationPARALLEL CONSENSUS PROTOCOL
CANOPUS: A SCALABLE AND MASSIVELY PARALLEL CONSENSUS PROTOCOL Bernard Wong CoNEXT 2017 Joint work with Sajjad Rizvi and Srinivasan Keshav CONSENSUS PROBLEM Agreement between a set of nodes in the presence
More informationBeyond FLP. Acknowledgement for presentation material. Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen
Beyond FLP Acknowledgement for presentation material Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen Paper trail blog: http://the-paper-trail.org/blog/consensus-protocols-paxos/
More informationVeriCon: Towards Verifying Controller Programs in SDNs
VeriCon: Towards Verifying Controller Programs in SDNs Thomas Ball, Nikolaj Bjorner, Aaron Gember, Shachar Itzhaky, Aleksandr Karbyshev, Mooly Sagiv, Michael Schapira, Asaf Valadarsky 1 Guaranteeing network
More informationVerasco: a Formally Verified C Static Analyzer
Verasco: a Formally Verified C Static Analyzer Jacques-Henri Jourdan Joint work with: Vincent Laporte, Sandrine Blazy, Xavier Leroy, David Pichardie,... June 13, 2017, Montpellier GdR GPL thesis prize
More informationPractical Byzantine Fault Tolerance and Proactive Recovery
Practical Byzantine Fault Tolerance and Proactive Recovery MIGUEL CASTRO Microsoft Research and BARBARA LISKOV MIT Laboratory for Computer Science Our growing reliance on online services accessible on
More informationLarge-Scale Key-Value Stores Eventual Consistency Marco Serafini
Large-Scale Key-Value Stores Eventual Consistency Marco Serafini COMPSCI 590S Lecture 13 Goals of Key-Value Stores Export simple API put(key, value) get(key) Simpler and faster than a DBMS Less complexity,
More informationImproving Interrupt Response Time in a Verifiable Protected Microkernel
Improving Interrupt Response Time in a Verifiable Protected Microkernel Bernard Blackham Yao Shi Gernot Heiser The University of New South Wales & NICTA, Sydney, Australia EuroSys 2012 Motivation The desire
More informationToday: Fault Tolerance. Fault Tolerance
Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing
More informationAn Extensible Programming Language for Verified Systems Software. Adam Chlipala MIT CSAIL WG 2.16 meeting, 2012
An Extensible Programming Language for Verified Systems Software Adam Chlipala MIT CSAIL WG 2.16 meeting, 2012 The status quo in computer system design DOM JS API Web page Network JS API CPU Timer interrupts
More informationDistributed Systems. Day 11: Replication [Part 3 Raft] To survive failures you need a raft
Distributed Systems Day : Replication [Part Raft] To survive failures you need a raft Consensus Consensus: A majority of nodes agree on a value Variations on the problem, depending on assumptions Synchronous
More information340 Review Fall Midterm 1 Review
340 Review Fall 2016 Midterm 1 Review Concepts A. UML Class Diagrams 1. Components: Class, Association (including association name), Multiplicity Constraints, General Constraints, Generalization/Specialization,
More informationHow Can You Trust Formally Verified Software?
How Can You Trust Formally Verified Software? Alastair Reid Arm Research @alastair_d_reid Formal verification Of libraries and apps Of compilers Of operating systems 2 Fonseca et al., An Empirical Study
More informationREPLICATED STATE MACHINES
5/6/208 REPLICATED STATE MACHINES George Porter May 6, 208 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons license These
More informationVerification of Fault-Tolerant Protocols with Sally
Verification of Fault-Tolerant Protocols with Sally Bruno Dutertre, Dejan Jovanović, and Jorge A. Navas Computer Science Laboratory, SRI International Abstract. Sally is a model checker for infinite-state
More informationHow much is a mechanized proof worth, certification-wise?
How much is a mechanized proof worth, certification-wise? Xavier Leroy Inria Paris-Rocquencourt PiP 2014: Principles in Practice In this talk... Some feedback from the aircraft industry concerning the
More informationCertified compilers. Do you trust your compiler? Testing is immune to this problem, since it is applied to target code
Certified compilers Do you trust your compiler? Most software errors arise from source code But what if the compiler itself is flawed? Testing is immune to this problem, since it is applied to target code
More informationCoordinating distributed systems part II. Marko Vukolić Distributed Systems and Cloud Computing
Coordinating distributed systems part II Marko Vukolić Distributed Systems and Cloud Computing Last Time Coordinating distributed systems part I Zookeeper At the heart of Zookeeper is the ZAB atomic broadcast
More informationDatabase Management Systems
Database Management Systems Distributed Databases Doug Shook What does it mean to be distributed? Multiple nodes connected by a network Data on the nodes is logically related The nodes do not need to be
More informationDistributed System. Gang Wu. Spring,2018
Distributed System Gang Wu Spring,2018 Lecture4:Failure& Fault-tolerant Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the
More informationNegations in Refinement Type Systems
Negations in Refinement Type Systems T. Tsukada (U. Tokyo) 14th March 2016 Shonan, JAPAN This Talk About refinement intersection type systems that refute judgements of other type systems. Background Refinement
More informationHandling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler
Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler Joshua Auerbach, Martin Hirzel, Louis Mandel, Avi Shinnar and Jérôme Siméon IBM
More information