Verdi: A Framework for Implementing and Formally Verifying Distributed Systems

James R. Wilcox, Doug Woos, Pavel Panchekha, Zach Tatlock, Xi Wang, Michael D. Ernst, Thomas Anderson

Challenges

Distributed systems run in unreliable environments. Many types of failure can occur. Fault-tolerance mechanisms are challenging to implement correctly.

Challenges and Contributions

Each challenge is met by a contribution: distributed systems run in unreliable environments, so we formalize the network as an operational semantics; many types of failure can occur, so we build semantics for a variety of fault models; fault-tolerance mechanisms are challenging to implement correctly, so we verify fault tolerance as a transformation between semantics.

Verdi Workflow

Build and verify the system in a simple semantics (a key-value store handling client I/O). Then apply a verified system transformer (VST) to obtain a consensus-replicated key-value store. End-to-end correctness follows by composition.

Contributions and General Approach

Formalize the network as an operational semantics; in general, find the environments in your problem domain. Build semantics for a variety of fault models; in general, formalize these environments as operational semantics. Verify fault tolerance as a transformation between semantics; in general, verify layers as transformations between semantics.

Verdi Successes

Applications: a key-value store and a lock service. Fault-tolerance mechanisms: sequence numbering, retransmission, primary-backup replication, and consensus-based replication (with linearizability).

Important Data

Important data is replicated across multiple KV store nodes for availability. But the environment is unreliable: machines crash, messages are reordered, dropped, or duplicated, and the network partitions. Despite decades of research, such systems are still difficult to implement correctly, and implementations often have bugs.

Bug-free Implementations

There have been several inspiring successes in formal verification: CompCert, seL4, Jitk, Bedrock, Ironclad, Frenetic, Quark. Goal: formally verify distributed system implementations.

Formally Verify Distributed Implementations

Separate independent system components, and verify the application logic (here, the key-value store) independently from the fault-tolerance mechanism (here, consensus). The plan: 1. verify the application logic; 2. verify the fault-tolerance mechanism; 3. run the system!

1. Verify Application Logic

In a simple one-node model of client I/O, prove that the key-value store implements a good map.

2. Verify Fault Tolerance Mechanism

Starting from the key-value store verified in the simple model, apply a verified system transformer (VST) and prove that its properties are preserved: the transformer yields a consensus-replicated KV store, and end-to-end correctness follows by composition.

3. Run the System! Extract the consensus-replicated KV store to OCaml, link it against an unverified shim, and run it on real networks.

Verifying Application Logic

Simple One-node Model

Example: the key-value node starts in state {}. On input "Set k v", the state becomes { k : v } and the node responds "Resp k v". The trace so far is ["Set k v", "Resp k v"].
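
To make the one-node model concrete, here is a minimal Coq sketch of such a key-value node. The names (input, output, state, handle) are illustrative, not Verdi's actual definitions: each input deterministically produces a new state and an output.

    Require Import List String.
    Import ListNotations.
    Open Scope string_scope.

    Inductive input :=
    | Set_ (k v : string)   (* Set_ avoids Coq's "Set" keyword *)
    | Get (k : string)
    | Del (k : string).

    Inductive output :=
    | Resp (k : string) (v : option string).

    (* The store is an association list from keys to values. *)
    Definition state := list (string * string).

    Fixpoint lookup (k : string) (s : state) : option string :=
      match s with
      | [] => None
      | (k', v) :: rest => if string_dec k k' then Some v else lookup k rest
      end.

    Definition handle (s : state) (i : input) : state * output :=
      match i with
      | Set_ k v => ((k, v) :: s, Resp k (Some v))
      | Get k    => (s, Resp k (lookup k s))
      | Del k    => (filter (fun p => if string_dec k (fst p) then false else true) s,
                     Resp k None)
      end.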

Simple One-node Model

Formally, the system is a pair of a state σ and a trace T, and each step consumes an input i. The input handler H_inp maps the current state and input to a new state σ' and an output o, and the step relation records both in the trace:

    \[
      \frac{H_{\mathit{inp}}(\sigma, i) = (\sigma', o)}
           {(\sigma, T) \to_s (\sigma', T \mathbin{+\!\!+} \langle i, o \rangle)}
      \quad (\textsc{Input})
    \]

So on input i, the state moves from σ to σ', the output o is produced, and ⟨i, o⟩ is appended to the trace.
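
Rendered in the same illustrative Coq as the handler sketch above (reusing input, output, state, and handle), the one-node step relation might look like this:

    Inductive trace_entry :=
    | In_ (i : input)
    | Out (o : output).

    (* One step: consume input i, update the state via the handler,
       and append both the input and the output to the trace. *)
    Inductive step_node : state * list trace_entry ->
                          state * list trace_entry -> Prop :=
    | StepInput : forall s i s' o T,
        handle s i = (s', o) ->
        step_node (s, T) (s', T ++ [In_ i; Out o]).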

Simple One-node Model

Spec: operations have the expected behavior of a good map. For example, a Get after a Set returns the value just written, and a Get after a Del finds nothing. The system is verified against the semantics by induction, yielding the safety property.
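
As a hedged sketch of what one such "good map" lemma looks like against the handler sketch above (illustrative names, not Verdi's actual statements):

    (* After handling "Set k v", a Get on k returns v. *)
    Theorem set_then_get : forall s k v s' o,
        handle s (Set_ k v) = (s', o) ->
        snd (handle s' (Get k)) = Resp k (Some v).
    Proof.
      intros s k v s' o H. simpl in H. inversion H; subst. simpl.
      destruct (string_dec k k); congruence.
    Qed.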

Verifying Fault Tolerance

The Raft Transformer

Consensus provides a replicated state machine: each node keeps a log of operations, the same inputs are applied on each node, and Raft calls into the original system. When an input is received, it is added to the log and sent to the other nodes; when an operation has been replicated, it is applied to the state machine and the output is sent. For the KV store, the operations are Get, Set, and Del, and the state is the dictionary.
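
A rough sketch of this shape, reusing the illustrative one-node definitions above; this is a toy outline of the replication idea, not Verdi's Raft implementation:

    Record raft_state := RaftState {
      log       : list input;   (* operations received so far *)
      committed : nat;          (* length of the replicated prefix *)
      inner     : state         (* state of the original system *)
    }.

    (* On client input: append to the log (the real protocol also
       broadcasts the entry to the other nodes). *)
    Definition on_input (rs : raft_state) (i : input) : raft_state :=
      RaftState (log rs ++ [i]) (committed rs) (inner rs).

    (* Once an entry is known to be replicated: run it through the
       original handler and emit its output. *)
    Definition on_commit (rs : raft_state) (i : input) : raft_state * output :=
      let '(st', o) := handle (inner rs) i in
      (RaftState (log rs) (S (committed rs)) st', o).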

Raft Correctness

The Raft verified system transformer correctly transforms systems: it preserves traces, and linearizability follows.

Fault Model

The fault model must capture the global state, internal communication, and failure.

Fault Model: Global State

Machines have names, and Σ maps each name to its local state: machines 1, 2, and 3 have states Σ[1], Σ[2], and Σ[3].

Fault Model: Messages

Messages in flight live in the network as packets ⟨src, dst, m⟩. For example, node 1 broadcasts vote requests as ⟨1,2,"Vote?"⟩ and ⟨1,3,"Vote?"⟩, and node 2 answers with ⟨2,1,"+1"⟩. Delivering a packet runs the network handler H_net on the destination's state Σ[dst]; the handler produces a new state σ', an output o, and new packets P', and the step updates Σ and appends o to the trace:

    \[
      \frac{H_{\mathit{net}}(\mathit{dst}, \Sigma[\mathit{dst}], \mathit{src}, m)
              = (\sigma', o, P')
            \qquad
            \Sigma' = \Sigma[\mathit{dst} \mapsto \sigma']}
           {(\{(\mathit{src}, \mathit{dst}, m)\} \uplus P,\ \Sigma,\ T)
              \to_r
            (P \uplus P',\ \Sigma',\ T \mathbin{+\!\!+} \langle o \rangle)}
      \quad (\textsc{Deliver})
    \]
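
In the same illustrative Coq, the network semantics can be sketched as a step over a "world" of in-flight packets, per-node states, and a trace. Hedges: msg and handle_net are left abstract, the multiset of packets is approximated by a list whose head is delivered, and none of these names are Verdi's.

    Parameter msg : Type.
    Record packet := Pkt { src : nat; dst : nat; payload : msg }.

    Record world := World {
      packets : list packet;    (* P: messages in flight *)
      states  : nat -> state;   (* Sigma: name -> local state *)
      trace   : list output     (* T: externally visible outputs *)
    }.

    (* Per-node network handler: new state, outputs, new packets. *)
    Parameter handle_net :
      nat -> state -> nat -> msg -> state * list output * list packet.

    (* Update Sigma at one name. *)
    Definition upd (Sigma : nat -> state) (n : nat) (s : state) :=
      fun m => if Nat.eqb m n then s else Sigma m.

    Inductive step_net : world -> world -> Prop :=
    | StepDeliver : forall p P Sigma T st' os P',
        handle_net (dst p) (Sigma (dst p)) (src p) (payload p) = (st', os, P') ->
        step_net (World (p :: P) Sigma T)
                 (World (P ++ P') (upd Sigma (dst p) st') (T ++ os)).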

Fault Model: Failures

The fault model also covers failures: a message such as ⟨1,2,"Vote?"⟩ may be dropped from the network, a message such as ⟨1,3,"Vote?"⟩ may be duplicated, and a machine may crash.

Fault Model: Drop

Dropping a message simply removes one in-flight packet, leaving the node states and the trace unchanged:

    \[
      (\{p\} \uplus P,\ \Sigma,\ T) \to_{\mathit{drop}} (P,\ \Sigma,\ T)
      \quad (\textsc{Drop})
    \]
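
Continuing the world sketch above, drop and duplicate are one constructor each, neither touching states nor trace (again illustrative, with the same head-of-list approximation):

    Inductive step_failure : world -> world -> Prop :=
    | StepDrop : forall p P Sigma T,
        (* an in-flight packet silently disappears *)
        step_failure (World (p :: P) Sigma T) (World P Sigma T)
    | StepDup : forall p P Sigma T,
        (* an in-flight packet is delivered twice *)
        step_failure (World (p :: P) Sigma T) (World (p :: p :: P) Sigma T).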

Toward Verifying Raft

A general theory of linearizability: 1k lines of implementation and 5k lines of proof for linearizability. State machine safety took 30k lines. Most state invariants are proved; some remain to do.

Verified System Transformers

Verified system transformers are functions on systems: they transform a system from one semantics to another while maintaining equivalent traces, so the correctness of the transformed system comes for free.

Verified System Transformers

Verdi's transformers stack on top of an application: Raft consensus, primary-backup replication, sequence numbering and retransmission, and ghost variables.
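
As a type-level sketch of what "functions on systems" means, with field and type names invented for illustration (Verdi's actual interface is richer):

    Record system := System {
      sys_state  : Type;
      sys_init   : sys_state;
      sys_handle : sys_state -> input -> sys_state * output
    }.

    (* A transformer maps systems to systems; in Verdi it also
       carries a proof that every trace of the transformed system
       under the faulty semantics corresponds to a trace of the
       original system under the simple semantics. *)
    Definition transformer := system -> system.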

Running Verdi Programs

Verdi programs are extracted from Coq to OCaml and linked against a thin, unverified shim. The trusted computing base is the shim, Coq, OCaml, and the OS.
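
For example, the handler sketch above could be extracted like this (the output file name is hypothetical); the resulting OCaml module is what the shim drives:

    (* Extract the handler and its dependencies to OCaml. *)
    Require Extraction.
    Extraction Language OCaml.
    Extraction "kvstore.ml" handle.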

Performance Evaluation

Compared with etcd, a similar open-source store, the Verdi KV store has 10% performance overhead; the workload is mostly disk- and network-bound. Notably, etcd itself has had linearizability bugs.

Previous Approaches

EventML [Schiper 2014]: verified Paxos using the Nuprl proof assistant. Mace [Killian 2007]: model checking distributed systems in C++. TLA+ [Lamport 2002]: a specification language and logic.

Contributions

Formalize the network as an operational semantics. Build semantics for a variety of fault models. Verify fault tolerance as a transformation between semantics.

Thanks! http://verdi.uwplse.org