Reducing Crash Recoverability to Reachability

1 Reducing Crash Recoverability to Reachability Eric Koskinen Yale University Junfeng Yang Columbia University Principles of Programming Languages St. Petersburg, Florida 20 January 2016

2 We are pretty good at writing programs

7 We are pretty good at writing programs?

8 1. What do we mean by crash and recovery? Specification 2. Can we prove (automatically) that a program recovers from a crash? 3. Does this actually work on real examples?

9 What do we mean by crash and recovery?
[Diagram: program states 0, 1, 2, 3 with a CRASH transition]

10 What do we mean by crash and recovery?
1. Boot machine
2. Establish program env.
3. Execute program
4. Crash mid-execution
5. Re-boot computer
6. Execute Recovery Script
7. Establish program env.
8. Re-execute program
[Diagram labels added over the following builds: Initial State, Crash, Recover]

14 What do we mean by crash and recovery?
in = open(input)
read(in, buf);
CRASH

15 in = open(input)
out = open(output, O_CREAT | O_WRONLY | O_TRUNC)
write(out, "A")
CRASH
...

17 Is this new trace ok?
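To make the question on this slide concrete, here is a minimal sketch in Python that models the disk as a dictionary and replays the three calls with a crash at every possible point. The names `run`, `crash_after`, and the file contents are hypothetical, and the model assumes every write reaches the disk immediately, which the slides do not claim.

```python
def run(disk, crash_after=None):
    """Execute the slide's three calls on a copy of `disk` (file name -> contents).

    `crash_after` is a hypothetical knob: stop after that many calls to model CRASH.
    """
    disk = dict(disk)
    calls = [
        lambda d: d["input"],                          # in = open(input)
        lambda d: d.update(output=""),                 # open(output, O_CREAT|O_WRONLY|O_TRUNC)
        lambda d: d.update(output=d["output"] + "A"),  # write(out, "A")
    ]
    for call in calls[:crash_after]:
        call(disk)
    return disk  # only the disk survives; in, out and buf are volatile


initial = {"input": "data"}
uncrashed = run(initial)
# Crash at every possible point, then re-execute from the disk left behind.
reruns = [run(run(initial, crash_after=k)) for k in range(4)]
print(all(r == uncrashed for r in reruns))
```

In this model the re-executed run always ends in the same disk state as the uncrashed run, because the truncating open discards whatever the crashed run left in output.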

18 Program states

19 Program states
With possibility of crashes... possibility of new behaviors
Definition: If the program crashes, then when it is re-executed it should not have new behaviors that weren't in the original program.
Matches what the program does
Program must handle new initial states

20 Program states
With possibility of crashes... possibility of new behaviors
Would like to prove that they are already included in the original program.
Therefore... We can use the original program as the specification for how the program should behave in the presence of crashes.

22 Non-determinism
in = open(input)
out = open(output, O_CREAT | O_WRONLY | O_TRUNC)
if (rand()) { write(out, "A"); CRASH } else { write(out, "B"); }
...
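The inclusion idea (behaviors after a crash must already be behaviors of the original program) can be sketched for this non-deterministic example as a set comparison. This is an illustrative model only: `program`, `behaviors`, `behaviors_after_crash`, and the `truncate` flag are hypothetical names, `choice` stands in for rand(), and the non-truncating variant is an added contrast case that does not appear on the slides.

```python
def program(disk, choice, truncate=True):
    """One uncrashed run; `choice` stands in for rand()."""
    disk = dict(disk)
    if truncate or "output" not in disk:
        disk["output"] = ""                    # O_CREAT | O_TRUNC
    disk["output"] += "A" if choice else "B"   # write(out, ...)
    return disk


def behaviors(disk, truncate=True):
    """Final contents of output over all uncrashed runs."""
    return {program(disk, c, truncate)["output"] for c in (0, 1)}


def behaviors_after_crash(disk, truncate=True):
    """Crash right after write(out, "A"), then re-execute from the surviving disk."""
    crashed = program(disk, 1, truncate)
    return {program(crashed, c, truncate)["output"] for c in (0, 1)}


initial = {"input": "data"}
ok = behaviors_after_crash(initial) <= behaviors(initial)
bad = behaviors_after_crash(initial, truncate=False) <= behaviors(initial, truncate=False)
print(ok, bad)
```

With truncation, the re-executed run can only produce outputs the original program could also produce. Without it, the crashed run's "A" leaks into the re-execution and the inclusion fails.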

24 Recovery Scripts (Described in the Paper)
in = open(input)
out = creat(output)
if (rand()) { write(out, "A"); CRASH; RECOVER() } else { write(out, "B"); }
...
RECOVER() { if (exists(output)) unlink(output); }
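One way to read the recovery script on this slide is that it restores a disk state the original program could have started from. The sketch below illustrates that reading under a dictionary model of the disk. The helper names `crash_in_branch_a` and `recover` and the file contents are hypothetical, and the model only tracks whether output exists and what it contains.

```python
def crash_in_branch_a(disk):
    """Disk state when CRASH hits right after write(out, "A")."""
    disk = dict(disk)
    disk["output"] = "A"      # out = creat(output); write(out, "A")
    return disk


def recover(disk):
    """RECOVER(): if (exists(output)) unlink(output);"""
    disk = dict(disk)
    disk.pop("output", None)  # unlink only if the file exists
    return disk


initial = {"input": "data"}
crashed = crash_in_branch_a(initial)
recovered = recover(crashed)
print(crashed != initial, recovered == initial)
```

Because the recovered disk equals the initial one in this model, any re-execution after recovery is just an ordinary run of the original program.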

25 Specification Checkpoints (Described in the Paper)
in = open(input)
out = creat(output)
write(out, "pre"); fsync_commit(out);
chkpt: if (rand()) { CRASH; RECOVER() ... } else { ... }
RECOVER() { if (committed) { in = open(input); out = open(output); goto chkpt; } }

26 Hierarchy of Crash Recoverability
0-recoverability
1-recoverability
N-recoverability
infinite-recoverability
[Diagram labels added across builds: Simulation, Recoverability]

32 1. What do we mean by crash and recovery? 2. Can we prove (automatically) that a program recovers from a crash? 3. Does this actually work on real examples?

33 Key Idea: Transformation
[Figure: control-flow automaton of the example program, states 0 to 7, with edge guards {fd < 0} and {fd ≥ 0} after the open, and two guards on whether joe is in buf after the read]
fd = open(pw);
buf = read(fd); close(fd);
d = readdir(/u);
creat(pw2); append(pw2,buf); append(pw2,joe); fsync(pw2); close(pw2);
rename(pw2,pw); psync(pw); mkdir(/u/joe);
Reduce to reachability: if the transformed program cannot reach qerr, then the original program is crash-recoverable.
Well-founded relation.

36 Main Theorem. If the transformed program cannot reach qerr, then the original program is crash-recoverable.
[Figure, built up over slides 36 to 45: the transformed automaton, with states σ, σ2, σ3, σ4, σ5 and the error state qerr]
Original statements: r = m(a); s = n(b);
Create Snapshot: pw2 := pw; mem2 := mem; (and later pw3 := pw; mem3 := mem;)
Crash
Recovery (Termination)
Load Snapshot: `pw := pw2; `mem := mem2;
Execute uncrashed snapshot: `s := n(`b);
And recovered state: s := n(b);
The resulting states `t and t are compared; qerr is the error state.
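The snapshot, crash, recover, and compare steps can be imitated by a brute-force check on a tiny model of the pw example. This is only a sketch under strong assumptions and not the transformation from the paper: the snapshot is always taken at the initial state, enumeration of crash points stands in for the reachability query a model checker would answer, recovery is the identity, and every name (`run`, `reaches_qerr`, the step functions, the tuple encoding of files) is hypothetical.

```python
def step_read(disk, mem):
    mem["buf"] = disk["pw"]                    # buf = read(fd)

def step_write_tmp(disk, mem):
    if "joe" not in mem["buf"]:                # guard: joe not yet in buf
        disk["pw2"] = mem["buf"] + ("joe",)    # creat/append/fsync pw2

def step_rename(disk, mem):
    if "pw2" in disk:
        disk["pw"] = disk.pop("pw2")           # rename(pw2, pw)

def step_append_in_place(disk, mem):
    disk["pw"] = disk["pw"] + ("joe",)         # contrast case: unguarded append

def run(steps, disk, upto=None):
    """Run the first `upto` steps (all if None) from a fresh, empty memory."""
    disk, mem = dict(disk), {}
    for step in steps[:upto]:
        step(disk, mem)
    return disk

def reaches_qerr(steps, recover, initial):
    expected = run(steps, dict(initial))       # execute the uncrashed snapshot
    for k in range(len(steps) + 1):            # crash after k steps
        crashed = run(steps, initial, upto=k)  # memory is lost, disk survives
        if run(steps, recover(crashed)) != expected:
            return True                        # mismatch: the error state
    return False

initial = {"pw": ("root",)}
identity = lambda disk: dict(disk)             # hypothetical no-op recovery
safe = reaches_qerr([step_read, step_write_tmp, step_rename], identity, initial)
unsafe = reaches_qerr([step_append_in_place], identity, initial)
print(safe, unsafe)
```

In this model the write-to-temporary-then-rename protocol never reaches the error state, while the unguarded in-place append does, because re-execution after a late crash appends joe a second time.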

46 1. What do we mean by crash and recovery? 2. Can we prove (automatically) that a program recovers from a crash? 3. Does this actually work on real examples?

47 Eleven 82
[Figure: Eleven 82 produces either a counterexample or a proof]

49 Notes
Specification
Built on CPAchecker
Compiler Macros
Model of the filesystem with arrays and integers
Copying with arrays

50 Benchmarks
Simple examples from earlier in this talk
Crash recovery protocols from real-world systems [Pillai et al. OSDI '14]
Google's LevelDB
PostgreSQL - Used by 30% of tech companies
SQLite - Used by probably every Android app (1B users)
VMware
ZooKeeper - Distributed applications, used by Yahoo

53 Related Work
Chen et al. Using Crash Hoare logic for certifying the FSCQ file system. SOSP 2015
  Broadly complementary: verified FS versus verifying user-level programs
  Specifically different: we focus on automation while they focus on proof modularity/reusability (require user-provided CHL specifications and user help in proof obligations)
Ntzik et al. Fault-Tolerant Resource Reasoning. APLAS
  Novel logic explicitly tracking volatile/persistent
  Supports concurrency, not automated
Gardner et al. Local Reasoning for the POSIX filesystem. ESOP
Ridge et al. SibylFS: formal specification and oracle-based testing for POSIX and real-world file systems. SOSP 2015

54 Reducing Crash Recoverability to Reachability
Eric Koskinen (Yale University), Junfeng Yang (Columbia University), POPL 2016
Contributions
Specification - Definitions of what it means for a program to recover from a crash
Automatic - Reduction to automaton reachability
  Proved recoverability of commit protocols from real systems (SQLite, LevelDB, ZooKeeper, etc.)
Open Challenges
Code scope, O/S layers
N-recoverability, infinite-recoverability
Timing - Does recovery happen promptly?
Concurrency

55 Thank you!


Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017 Hadoop File System 1 S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y Moving Computation is Cheaper than Moving Data Motivation: Big Data! What is BigData? - Google

More information

Fault Tolerance. Chapter 7

Fault Tolerance. Chapter 7 Fault Tolerance Chapter 7 Basic Concepts Dependability Includes Availability Reliability Safety Maintainability Failure Models Type of failure Crash failure Omission failure Receive omission Send omission

More information

COMMENTS. AC-1: AC-1 does not require all processes to reach a decision It does not even require all correct processes to reach a decision

COMMENTS. AC-1: AC-1 does not require all processes to reach a decision It does not even require all correct processes to reach a decision ATOMIC COMMIT Preserve data consistency for distributed transactions in the presence of failures Setup one coordinator a set of participants Each process has access to a Distributed Transaction Log (DT

More information

Hiding local state in direct style: a higher-order anti-frame rule

Hiding local state in direct style: a higher-order anti-frame rule 1 / 65 Hiding local state in direct style: a higher-order anti-frame rule François Pottier January 28th, 2008 2 / 65 Contents Introduction Basics of the type system A higher-order anti-frame rule Applications

More information

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks. Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}

More information

Toward (SOS) Self-stabilizing Operating System

Toward (SOS) Self-stabilizing Operating System Toward (SOS) Self-stabilizing Operating System Shlomi Dolev and Reuven Yagel, Ben-Gurion University, Israel Sep. 1st SAACS 04 Workshop, Zaragoza Outline Motivation: current operating systems do not stabilize!

More information

Next-Generation Cloud Platform

Next-Generation Cloud Platform Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology

More information

Consensus, impossibility results and Paxos. Ken Birman

Consensus, impossibility results and Paxos. Ken Birman Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}

More information

Self-stabilizing Byzantine Digital Clock Synchronization

Self-stabilizing Byzantine Digital Clock Synchronization Self-stabilizing Byzantine Digital Clock Synchronization Ezra N. Hoch, Danny Dolev and Ariel Daliot The Hebrew University of Jerusalem We present a scheme that achieves self-stabilizing Byzantine digital

More information

Turning proof assistants into programming assistants

Turning proof assistants into programming assistants Turning proof assistants into programming assistants ST Winter Meeting, 3 Feb 2015 Magnus Myréen Why? Why combine proof- and programming assistants? Why proofs? Testing cannot show absence of bugs. Some

More information

Main Goal. Language-independent program verification framework. Derive program properties from operational semantics

Main Goal. Language-independent program verification framework. Derive program properties from operational semantics Main Goal Language-independent program verification framework Derive program properties from operational semantics Questions: Is it possible? Is it practical? Answers: Sound and complete proof system,

More information

CS 550 Operating Systems Spring Operating Systems Overview

CS 550 Operating Systems Spring Operating Systems Overview 1 CS 550 Operating Systems Spring 2018 Operating Systems Overview 2 What is an OS? Applications OS Hardware A software layer between the hardware and the application programs/users which provides a virtualization

More information

GFS: The Google File System

GFS: The Google File System GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one

More information

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures GFS Overview Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures Interface: non-posix New op: record appends (atomicity matters,

More information

Database!! Structured data collection!! Records!! Relationships. Enforces that data maintains certain consistency properties

Database!! Structured data collection!! Records!! Relationships. Enforces that data maintains certain consistency properties Relational Databases Sam Madden Key ideas: Declarative programming Transactions Database Structured data collection Records Relationships Database management system (DBMS) Why? 1) Widely used 2) Several

More information

Reasoning about modules: data refinement and simulation

Reasoning about modules: data refinement and simulation Reasoning about modules: data refinement and simulation David Naumann naumann@cs.stevens-tech.edu Stevens Institute of Technology Naumann - POPL 02 Java Verification Workshop p.1/17 Objectives of talk

More information

Lecture Notes: Hoare Logic

Lecture Notes: Hoare Logic Lecture Notes: Hoare Logic 17-654/17-754: Analysis of Software Artifacts Jonathan Aldrich (jonathan.aldrich@cs.cmu.edu) Lecture 3 1 Hoare Logic The goal of Hoare logic is to provide a formal system for

More information

p x i 1 i n x, y, z = 2 x 3 y 5 z

p x i 1 i n x, y, z = 2 x 3 y 5 z 3 Pairing and encoding functions Our aim in this part of the course is to show that register machines can compute everything that can be computed, and to show that there are things that can t be computed.

More information

Programming Languages Third Edition

Programming Languages Third Edition Programming Languages Third Edition Chapter 12 Formal Semantics Objectives Become familiar with a sample small language for the purpose of semantic specification Understand operational semantics Understand

More information

An Empirical Study of High Availability in Stream Processing Systems

An Empirical Study of High Availability in Stream Processing Systems An Empirical Study of High Availability in Stream Processing Systems Yu Gu, Zhe Zhang, Fan Ye, Hao Yang, Minkyong Kim, Hui Lei, Zhen Liu Stream Processing Model software operators (PEs) Ω Unexpected machine

More information

C 1. Recap. CSE 486/586 Distributed Systems Failure Detectors. Today s Question. Two Different System Models. Why, What, and How.

C 1. Recap. CSE 486/586 Distributed Systems Failure Detectors. Today s Question. Two Different System Models. Why, What, and How. Recap Best Practices Distributed Systems Failure Detectors Steve Ko Computer Sciences and Engineering University at Buffalo 2 Today s Question Two Different System Models How do we handle failures? Cannot

More information

Concurrent specifications beyond linearizability

Concurrent specifications beyond linearizability Concurrent specifications beyond linearizability Éric Goubault Jérémy Ledent Samuel Mimram École Polytechnique, France OPODIS 2018, Hong Kong December 19, 2018 1 / 14 Objects Processes communicate through

More information

Material from Recitation 1

Material from Recitation 1 Material from Recitation 1 Darcey Riley Frank Ferraro January 18, 2011 1 Introduction In CSC 280 we will be formalizing computation, i.e. we will be creating precise mathematical models for describing

More information

DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD

DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD Department of Computer Science Institute of System Architecture, Operating Systems Group DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD OUTLINE Classical distributed file systems NFS: Sun Network File System

More information

CS505: Distributed Systems

CS505: Distributed Systems Department of Computer Science CS505: Distributed Systems Lecture 13: Distributed Transactions Outline Distributed Transactions Two Phase Commit and Three Phase Commit Non-blocking Atomic Commit with P

More information

Byzantine Fault Tolerance

Byzantine Fault Tolerance Byzantine Fault Tolerance CS 240: Computing Systems and Concurrency Lecture 11 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. So far: Fail-stop failures

More information

File Systems: Consistency Issues

File Systems: Consistency Issues File Systems: Consistency Issues File systems maintain many data structures Free list/bit vector Directories File headers and inode structures res Data blocks File Systems: Consistency Issues All data

More information

Introduction to Database Systems

Introduction to Database Systems Introduction to Database Systems Based on slides by Dan Suciu Adapted by Michael Hahsler 1 / 16 Database What is a database? Physical storage: A collection of files storing related data. Logical: A collection

More information

Streaming Analytics with Apache Flink. Stephan

Streaming Analytics with Apache Flink. Stephan Streaming Analytics with Apache Flink Stephan Ewen @stephanewen Apache Flink Stack Libraries DataStream API Stream Processing DataSet API Batch Processing Runtime Distributed Streaming Data Flow Streaming

More information

Lecture 1: Overview

Lecture 1: Overview 15-150 Lecture 1: Overview Lecture by Stefan Muller May 21, 2018 Welcome to 15-150! Today s lecture was an overview that showed the highlights of everything you re learning this semester, which also meant

More information