Reducing Crash Recoverability to Reachability
1 Reducing Crash Recoverability to Reachability. Eric Koskinen (Yale University), Junfeng Yang (Columbia University). Principles of Programming Languages, St. Petersburg, Florida, 20 January 2016.
2 We are pretty good at writing programs
7 We are pretty good at writing programs?
8
1. What do we mean by crash and recovery? (Specification)
2. Can we prove (automatically) that a program recovers from a crash?
3. Does this actually work on real examples?
9 What do we mean by crash and recovery? [Diagram: states 0, 1, 2, 3 with a CRASH transition]
10-13 What do we mean by crash and recovery?
1. Boot machine
2. Establish program env.
3. Execute program
4. Crash mid-execution
5. Re-boot computer
6. Execute Recovery Script
7. Establish program env.
8. Re-execute program
[Diagram labels added across the builds: Initial State, Crash, Recover]
14 What do we mean by crash and recovery? [Diagram: states 0 and 1 with a CRASH transition]
    in = open(input)
    read(in, buf);
    CRASH
15
    in = open(input)
    out = open(output, O_CREAT | O_WRONLY | O_TRUNC)
    write(out, "A")
    CRASH
    ...
16-17 Is this new trace ok?
    CRASH
    in = open(input)
    out = open(output, O_CREAT | O_WRONLY | O_TRUNC)
    write(out, "A")
    CRASH
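The question on this slide can be explored with a small model. The following Python sketch is an illustration only and not the authors' tool. It assumes the file system is a dict from file names to contents, that every write reaches disk immediately, and that a crash can be injected after any number of completed steps. The names `Crash`, `program`, and `run_with_crash` are hypothetical.

```python
class Crash(Exception):
    """Raised to model a crash in the middle of the program."""


def program(fs, crash_after=None):
    """Run the slide's program on `fs` (dict: file name -> contents).

    If `crash_after` is k, raise Crash once k steps have completed.
    ASSUMPTION: each step is atomic and immediately persistent.
    """
    steps_done = 0

    def maybe_crash():
        nonlocal steps_done
        if crash_after is not None and steps_done == crash_after:
            raise Crash()
        steps_done += 1

    maybe_crash()
    _ = fs["input"]        # in = open(input)
    maybe_crash()
    fs["output"] = ""      # out = open(output, O_CREAT | O_WRONLY | O_TRUNC)
    maybe_crash()
    fs["output"] += "A"    # write(out, "A")
    maybe_crash()          # a crash may also happen right after the write


def run_with_crash(k):
    """Crash after k steps, then re-execute on whatever state persisted."""
    fs = {"input": "data"}
    try:
        program(fs, crash_after=k)
    except Crash:
        program(fs)        # re-execution, no second crash
    return fs


if __name__ == "__main__":
    for k in range(4):
        print(k, run_with_crash(k))
```

Under these assumptions, every crash point followed by a re-execution ends in the same file system as the crash-free run, because `O_TRUNC` discards whatever the crashed run had written.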
18 Program states
19 Program states. With possibility of crashes... possibility of new behaviors. Definition: if the program crashes, then when it is re-executed it should not have new behaviors that weren't in the original program. [Diagram labels: Matches what the program does; Program must handle new initial states]
20-21 Program states. With possibility of crashes... possibility of new behaviors. We would like to prove that they are already included in the original program. [Diagram: crash states marked C] Therefore, we can use the original program as the specification for how the program should behave in the presence of crashes.
22-23 Non-determinism
    in = open(input)
    out = open(output, O_CREAT | O_WRONLY | O_TRUNC)
    if (rand()) { write(out, "A"); CRASH } else { write(out, "B"); }
    ...
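With non-determinism, the specification becomes a set of behaviors: the crashing program is acceptable when everything it can produce is already producible by the original program. A minimal Python sketch of that inclusion check follows. It is not the talk's verification method, which is automatic and symbolic. It assumes a dict-based file system, immediate persistence, and at most one crash. The `truncate=False` variant (append instead of `O_TRUNC`) is a hypothetical contrast case that does not appear on the slides.

```python
def program(fs, choice, crash=False, truncate=True):
    """One run of the slide's program; `choice` resolves rand()."""
    if truncate:
        fs["output"] = ""            # open(output, O_CREAT | O_WRONLY | O_TRUNC)
    else:
        fs.setdefault("output", "")  # hypothetical variant: append, no O_TRUNC
    if choice:
        fs["output"] += "A"          # write(out, "A")
        if crash:
            return "crashed"         # CRASH
    else:
        fs["output"] += "B"          # write(out, "B")
    return "done"


def behaviors(with_crash, truncate=True):
    """Set of final `output` contents over all choices of rand()."""
    results = set()
    for first in (True, False):
        fs = {"input": "data"}
        status = program(fs, first, crash=with_crash, truncate=truncate)
        if status == "crashed":
            for second in (True, False):   # re-execution picks again
                fs2 = dict(fs)
                program(fs2, second, truncate=truncate)
                results.add(fs2["output"])
        else:
            results.add(fs["output"])
    return results


if __name__ == "__main__":
    print(behaviors(True) <= behaviors(False))
    print(behaviors(True, truncate=False) <= behaviors(False, truncate=False))
```

In this model the truncating program introduces no new behaviors, while the appending variant can leave `output` holding the contents of two writes, which no crash-free run produces.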
24 Recovery Scripts (described in the paper) [Diagram: CRASH and RECOVER transitions]
    in = open(input)
    out = creat(output)
    if (rand()) { write(out, "A"); CRASH; RECOVER() } else { write(out, "B"); }
    ...
    RECOVER() { if (exists(output)) unlink(output); }
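The role of the recovery script can be illustrated with the same kind of toy model. The Python sketch below is an assumption-laden illustration, not the paper's semantics: it assumes, purely for the example, that `creat` fails when the file already exists (as `open` with `O_CREAT | O_EXCL` would; POSIX `creat` itself truncates), that writes persist immediately, and that at most one crash occurs. All function names are hypothetical.

```python
def program(fs, choice, crash=False):
    """Returns 'crashed', 'error', or 'done'.

    ASSUMPTION: creat is modeled as exclusive, so it fails on an
    existing file. This is chosen only to make recovery matter.
    """
    if "output" in fs:
        return "error"           # assumed: creat fails, file exists
    fs["output"] = ""            # out = creat(output)
    if choice:
        fs["output"] += "A"      # write(out, "A")
        if crash:
            return "crashed"     # CRASH
    else:
        fs["output"] += "B"      # write(out, "B")
    return "done"


def recover(fs):
    """RECOVER() { if (exists(output)) unlink(output); }"""
    fs.pop("output", None)


def behaviors(with_crash, use_recovery):
    """Set of (status, output contents) pairs over all choices."""
    results = set()
    for first in (True, False):
        fs = {"input": "data"}
        status = program(fs, first, crash=with_crash)
        if status == "crashed":
            if use_recovery:
                recover(fs)
            for second in (True, False):   # re-execution
                fs2 = dict(fs)
                status2 = program(fs2, second)
                results.add((status2, fs2.get("output")))
        else:
            results.add((status, fs.get("output")))
    return results


if __name__ == "__main__":
    spec = behaviors(False, False)
    print(behaviors(True, False) <= spec)   # without RECOVER()
    print(behaviors(True, True) <= spec)    # with RECOVER()
```

Without the recovery step, the re-executed program in this model hits an error that the original program never exhibits. With `recover` run first, the crashing behaviors fall back inside the original set.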
25 Specification Checkpoints (described in the paper) [Diagram: CRASH and RECOVER transitions]
    in = open(input)
    out = creat(output)
    write(out, "pre"); fsync_commit(out);
    chkpt:
    if (rand()) { CRASH; RECOVER() ... } else { ... }
    RECOVER() { if (committed) { in = open(input); out = open(output); goto chkpt; } }
26 Hierarchy of Crash Recoverability 0-recoverability 1-recoverability N-recoverability
29 Hierarchy of Crash Recoverability 0-recoverability 1-recoverability N-recoverability Simulation
30 Hierarchy of Crash Recoverability 0-recoverability 1-recoverability N-recoverability Recoverability
31 Hierarchy of Crash Recoverability 0-recoverability 1-recoverability N-recoverability ∞-recoverability
32
1. What do we mean by crash and recovery?
2. Can we prove (automatically) that a program recovers from a crash?
3. Does this actually work on real examples?
33-35 Key Idea: Transformation
[Diagram: control-flow automaton of the example program with numbered states. Guards and statements on the edges:]
    fd = open(pw);
    {fd < 0}
    {fd ≥ 0} buf = read(fd); close(fd);
    {joe ∈ buf} / {joe ∉ buf}
    d = readdir(/u);
    creat(pw2); append(pw2, buf); append(pw2, joe); fsync(pw2); close(pw2);
    rename(pw2, pw); psync(pw); mkdir(/u/joe);
Reduce to reachability, using a well-founded relation.
Theorem: if the transformed program cannot reach qerr, then the original program is crash-recoverable.
36-45 Main Theorem (illustration built up over ten slides)
Program fragment: r = m(a); s = n(b), with states σ1, σ2, σ3, σ4, σ5.
Labels added across the builds, in order:
- Create Snapshot: pw2 := pw; mem2 := mem;
- Create Snapshot: pw3 := pw; mem3 := mem;
- Crash; Recovery Termination
- Load Snapshot: `pw := pw2; `mem := mem2;
- Execute uncrashed snapshot: `s := n(`b); and recovered state: s := n(b);
- Resulting states `t and t; error state qerr
Main Theorem: if the transformed program cannot reach qerr, then the original program is crash-recoverable.
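The idea of phrasing recoverability as "an error state is unreachable" can be sketched with an explicit-state search. The Python below is a deliberately simplified stand-in and not the snapshot-based construction from the talk. It assumes a deterministic program given as a list of atomic, immediately persistent steps, a crash that may occur before any step and restarts the program after running `recover`, and it treats "q_err" as reaching a final file system that the crash-free run does not produce. All names are hypothetical.

```python
from collections import deque


def reaches_error(steps, init_fs, recover):
    """Breadth-first search over (pc, file system) states.

    Returns True if some run with crashes ends in a final file system
    that the crash-free run does not produce (our stand-in for q_err).
    """
    # The crash-free run acts as the specification.
    fs = dict(init_fs)
    for step in steps:
        fs = step(fs)
    allowed = {frozenset(fs.items())}

    start = (0, frozenset(init_fs.items()))
    seen, queue = {start}, deque([start])
    while queue:
        pc, frozen = queue.popleft()
        if pc == len(steps):
            if frozen not in allowed:
                return True                       # error state reachable
            continue
        successors = [
            (pc + 1, steps[pc](dict(frozen))),    # normal step
            (0, recover(dict(frozen))),           # crash, recover, restart
        ]
        for next_pc, next_fs in successors:
            state = (next_pc, frozenset(next_fs.items()))
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


# Protocol in the spirit of the slides: stage into pw2, then rename.
def stage(fs):
    if "joe" not in fs["pw"]:                     # guard: joe not yet in pw
        fs["pw2"] = fs["pw"] + ",joe"
    return fs


def rename(fs):
    if "pw2" in fs:
        fs["pw"] = fs.pop("pw2")                  # rename(pw2, pw)
    return fs


# Hypothetical buggy variant: update pw in place with two partial writes.
def write_part1(fs):
    fs["pw"] += ",jo"
    return fs


def write_part2(fs):
    fs["pw"] += "e"
    return fs


if __name__ == "__main__":
    no_recovery = lambda fs: fs
    print(reaches_error([stage, rename], {"pw": "root"}, no_recovery))
    print(reaches_error([write_part1, write_part2], {"pw": "root"}, no_recovery))
```

In this toy setting the stage-and-rename protocol never reaches the error condition, while the in-place variant does, since a crash between the two partial writes followed by a restart leaves a duplicated fragment in `pw`. The search only terminates on the correct protocol because its state space is finite. A real tool would reason symbolically rather than enumerate states.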
46
1. What do we mean by crash and recovery?
2. Can we prove (automatically) that a program recovers from a crash?
3. Does this actually work on real examples?
47-49 Eleven 82 [Diagram: Eleven 82 produces either a counterexample or a proof]
Notes:
- Specification
- Built on CPAchecker
- Compiler macros
- Model of the filesystem with arrays and integers
- Copying with arrays
50 Benchmarks
- Simple examples from earlier in this talk
- Crash recovery protocols from real-world systems [Pillai et al. OSDI '14]:
  - Google's LevelDB
  - PostgreSQL (used by 30% of tech companies)
  - SQLite (used by probably every Android app, 1B users)
  - VMware
  - ZooKeeper (distributed applications, used by Yahoo)
53 Related Work
- Chen et al. Using Crash Hoare logic for certifying the FSCQ file system. SOSP 2015.
  Broadly complementary: verified FS versus verifying user-level programs.
  Specifically different: we focus on automation while they focus on proof modularity/reusability (they require user-provided CHL specifications and user help in proof obligations).
- Ntzik et al. Fault-Tolerant Resource Reasoning. APLAS.
  Novel logic explicitly tracking volatile/persistent state. Supports concurrency. Not automated.
- Gardner et al. Local Reasoning for the POSIX filesystem. ESOP.
- Ridge et al. SibylFS: formal specification and oracle-based testing for POSIX and real-world file systems. SOSP 2015.
54 Reducing Crash Recoverability to Reachability. Eric Koskinen (Yale University), Junfeng Yang (Columbia University). POPL 2016.
Contributions
- Specification: definitions of what it means for a program to recover from a crash
- Automatic: reduction to automaton reachability
- Proved recoverability of commit protocols from real systems (SQLite, LevelDB, ZooKeeper, etc.)
Open Challenges
- Code scope, O/S layers
- N-recoverability, infinite-recoverability
- Timing: does recovery happen promptly?
- Concurrency
55 Thank you!
More informationProgramming Languages Third Edition
Programming Languages Third Edition Chapter 12 Formal Semantics Objectives Become familiar with a sample small language for the purpose of semantic specification Understand operational semantics Understand
More informationAn Empirical Study of High Availability in Stream Processing Systems
An Empirical Study of High Availability in Stream Processing Systems Yu Gu, Zhe Zhang, Fan Ye, Hao Yang, Minkyong Kim, Hui Lei, Zhen Liu Stream Processing Model software operators (PEs) Ω Unexpected machine
More informationC 1. Recap. CSE 486/586 Distributed Systems Failure Detectors. Today s Question. Two Different System Models. Why, What, and How.
Recap Best Practices Distributed Systems Failure Detectors Steve Ko Computer Sciences and Engineering University at Buffalo 2 Today s Question Two Different System Models How do we handle failures? Cannot
More informationConcurrent specifications beyond linearizability
Concurrent specifications beyond linearizability Éric Goubault Jérémy Ledent Samuel Mimram École Polytechnique, France OPODIS 2018, Hong Kong December 19, 2018 1 / 14 Objects Processes communicate through
More informationMaterial from Recitation 1
Material from Recitation 1 Darcey Riley Frank Ferraro January 18, 2011 1 Introduction In CSC 280 we will be formalizing computation, i.e. we will be creating precise mathematical models for describing
More informationDISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD
Department of Computer Science Institute of System Architecture, Operating Systems Group DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD OUTLINE Classical distributed file systems NFS: Sun Network File System
More informationCS505: Distributed Systems
Department of Computer Science CS505: Distributed Systems Lecture 13: Distributed Transactions Outline Distributed Transactions Two Phase Commit and Three Phase Commit Non-blocking Atomic Commit with P
More informationByzantine Fault Tolerance
Byzantine Fault Tolerance CS 240: Computing Systems and Concurrency Lecture 11 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. So far: Fail-stop failures
More informationFile Systems: Consistency Issues
File Systems: Consistency Issues File systems maintain many data structures Free list/bit vector Directories File headers and inode structures res Data blocks File Systems: Consistency Issues All data
More informationIntroduction to Database Systems
Introduction to Database Systems Based on slides by Dan Suciu Adapted by Michael Hahsler 1 / 16 Database What is a database? Physical storage: A collection of files storing related data. Logical: A collection
More informationStreaming Analytics with Apache Flink. Stephan
Streaming Analytics with Apache Flink Stephan Ewen @stephanewen Apache Flink Stack Libraries DataStream API Stream Processing DataSet API Batch Processing Runtime Distributed Streaming Data Flow Streaming
More informationLecture 1: Overview
15-150 Lecture 1: Overview Lecture by Stefan Muller May 21, 2018 Welcome to 15-150! Today s lecture was an overview that showed the highlights of everything you re learning this semester, which also meant
More information