Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015


Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015 Page 1

Example Replicated File Systems NFS Coda Ficus Page 2

NFS Originally NFS did not have any replication capability Replication of read-only file systems added later Primary copy read/write replication added later Page 3

NFS Read-Only Replication Almost by hand Sysadmin ensures multiple copies of file systems are identical Typically on different machines Avoid writing to any replica E.g., mount them read-only Use automounting facilities to handle failover and load balancing Page 4

Primary Copy NFS Replication Commonly provided via DRBD Typically two replicas Primarily for reliability One replica is the primary It can be written The other replica mirrors the primary Provides service if the primary is unavailable Page 5

Some Primary Copy Issues Handling updates How and when do they propagate? Determining failure Of the secondary copy Of the primary copy Handling recovery Page 6

Update Issues In DRBD Two choices: Synchronous Writes don't return until both copies are updated Asynchronous Writes return once the primary is updated The secondary is updated later Page 7
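To make the two modes concrete, here is a minimal sketch (hypothetical classes, not DRBD's actual code) of a primary/secondary pair offering both a synchronous and an asynchronous write path.

```python
# Minimal sketch (hypothetical classes, not DRBD's actual code) of the two
# write modes in a primary/secondary pair.

class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class PrimaryCopyPair:
    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # updates not yet mirrored (asynchronous mode)

    def write_sync(self, key, value):
        # Synchronous: don't report success until both copies are written.
        self.primary.apply(key, value)
        self.secondary.apply(key, value)  # imagine this crossing the network
        return "ok"

    def write_async(self, key, value):
        # Asynchronous: return once the primary is updated; mirror later.
        self.primary.apply(key, value)
        self.pending.append((key, value))
        return "ok"

    def flush_to_secondary(self):
        # Later propagation of the queued updates to the secondary.
        while self.pending:
            key, value = self.pending.pop(0)
            self.secondary.apply(key, value)
```

With write_sync, a reported success means both copies hold the new value; with write_async, the copies can differ until flush_to_secondary runs, which is where the consistency issues on the next slides come from.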

Implications of Synchronous Writes Slower, since you can't indicate success until both copies are written One is written across the network, ensuring slowness Fewer consistency issues If the write returned, both copies have it If not, neither does Only really bad timing requires some cleanup Page 8

Implications of Asynchronous Writes Faster, since you only wait for the primary copy Almost always works just fine Almost always Problems when it doesn't, though Different values of the same data at different copies May not be clear how it happened Perhaps even worse Page 9

Detecting Failures DRBD usually uses a heartbeat process The primary and secondary expect to communicate every few seconds E.g., every two seconds If too many heartbeats in a row are missed, declare the partner dead It might just be unreachable, though Page 10
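A heartbeat-based failure detector of this kind can be sketched as follows; HEARTBEAT_INTERVAL and MAX_MISSED are illustrative values, not DRBD's real configuration.

```python
# Hypothetical heartbeat monitor; the interval and miss threshold are
# illustrative values, not DRBD's real configuration.
import time

HEARTBEAT_INTERVAL = 2.0   # expected seconds between heartbeats
MAX_MISSED = 3             # consecutive misses before declaring failure

class HeartbeatMonitor:
    def __init__(self):
        self.last_heard = time.monotonic()

    def record_heartbeat(self):
        # Called whenever a heartbeat arrives from the partner.
        self.last_heard = time.monotonic()

    def partner_dead(self):
        # Declare the partner dead after MAX_MISSED silent intervals.
        # Caveat from the slide: it might only be unreachable, not crashed.
        silence = time.monotonic() - self.last_heard
        return silence > MAX_MISSED * HEARTBEAT_INTERVAL
```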

Responding To Failures Switch service from the primary to the secondary Which becomes the primary Including write service Ensures continued operation after the failure Update logging ensures the new primary is up to date Page 11

Recovery From Failures The recovered node becomes the secondary It receives the updates it missed from the primary Complications arise if a network partition, rather than a crash, caused the failure The split brain problem Page 12
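A hedged sketch of failover and recovery driven by an update log follows; the names (LoggedReplica, fail_over, recover) are hypothetical, and the logs are assumed to remain prefixes of one another (i.e., no split brain).

```python
# Hedged sketch of failover and recovery driven by an update log; the names
# (LoggedReplica, fail_over, recover) are hypothetical, and the logs are
# assumed to stay prefixes of one another (no split brain).

class LoggedReplica:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) updates this node has applied

    def apply(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

def fail_over(surviving_secondary):
    # The secondary is promoted and starts accepting writes as the primary.
    return surviving_secondary

def recover(returning_node, current_primary):
    # The recovered node rejoins as secondary and replays the updates it
    # missed: everything in the primary's log beyond its own log length.
    for key, value in current_primary.log[len(returning_node.log):]:
        returning_node.apply(key, value)
```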

The Split Brain Problem [Diagram: a network partition separates the primary from the secondary; the secondary is promoted to primary on its side, and both sides then accept updates (Update 1, Update 2, Update 3) Now what?] Page 13

The Simple Solution Prevent access to both Until sysadmin designates one of them as the new primary Throw away the other and reset to the designated primary Simple for the sysadmin, maybe not for the users Page 14

What Other Solution Is Possible? Try to figure out what the correct version of the data is In the NFS case, chances are good that the writes were to different files In which case, you probably just need the most recent copy of each file But there are complex cases NFS replication doesn't try to do this Page 15
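If you did want to attempt this repair (NFS replication does not), the common case reduces to keeping the most recently modified copy of each file; a rough sketch, with a hypothetical merge_by_mtime function and assuming per-file modification times are trustworthy:

```python
# Hypothetical repair after a split brain, assuming the common case where the
# two sides wrote different files: keep the most recently modified copy of
# each file.  (Real NFS replication does not attempt this.)

def merge_by_mtime(copy_a, copy_b):
    """copy_a, copy_b: dicts mapping path -> (mtime, contents)."""
    merged = {}
    for path in set(copy_a) | set(copy_b):
        a, b = copy_a.get(path), copy_b.get(path)
        if a is None:
            merged[path] = b
        elif b is None:
            merged[path] = a
        else:
            merged[path] = a if a[0] >= b[0] else b  # newer mtime wins
    return merged
```

The complex cases are exactly the ones this ignores: the same file written on both sides, or writes whose correctness depends on other files.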

Coda A follow-on to the Andrew File System (AFS) Using the basic philosophy of AFS But specifically to handle mobile computers Page 16

The AFS System [Diagram: a server pool and client workstations] Clients request files from the servers Page 17

AFS Characteristics Files permanently stored at exactly one server Clients keep cached copies Writes cached until file close Asynchronous writes Other copies then invalidated Stateful servers Unless write conflicts Page 18
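A rough sketch of this behavior, with hypothetical AFSServer/AFSClient classes (not the real AFS code): whole files are fetched and cached, writes go back on close, and the stateful server invalidates the other cached copies it is tracking.

```python
# Hypothetical sketch of AFS-style caching: fetch and cache whole files,
# write back on close, and have the stateful server invalidate other copies.

class AFSServer:
    def __init__(self):
        self.files = {}       # path -> contents
        self.cached_by = {}   # path -> set of clients caching it

    def fetch(self, path, client):
        self.cached_by.setdefault(path, set()).add(client)
        return self.files.get(path, "")

    def store_on_close(self, path, contents, writer):
        self.files[path] = contents
        # Invalidate every other cached copy the server knows about.
        for client in self.cached_by.get(path, set()) - {writer}:
            client.invalidate(path)
        self.cached_by[path] = {writer}

class AFSClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, path):
        if path not in self.cache:
            self.cache[path] = self.server.fetch(path, self)
        return self.cache[path]

    def close(self, path, contents):
        # Writes are cached until close, then propagated to the server.
        self.cache[path] = contents
        self.server.store_on_close(path, contents, self)

    def invalidate(self, path):
        self.cache.pop(path, None)
```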

Adding Mobile Computers [Diagram: a server pool and client workstations, as in AFS] Just like AFS, except... some of the clients are mobile Page 19

Why Does That Make a Difference? Mobile computers come and go Well, so do users at workstations But mobile computers take their files with them And expect to access them while they are gone What happens when they do? Page 20

The Mobile Problem for AFS The laptop downloads some files to its disk Then it disconnects from the network Then it uses the files And maybe writes them Now it reconnects Page 21

Why Is This Different From Normal AFS? We might get write conflicts here Normal AFS might, too But normal AFS conflicts have a small window Only truly concurrent writes Cache invalidation happens when someone closes the file For the laptop, the close could occur weeks before it reconnects Page 22

Handling Disconnected Operations Could use a solution like NFS's The server has the primary copy The client has a secondary copy If the client can't access the server, it can't write Or could use an optimistic solution Assume no one else is going to write your file, so go ahead yourself Detect problems and fix them as needed Page 23

The Coda Approach Essentially optimistic When connected, operates much like AFS When disconnected, the client is allowed to update cached files Access control permitting But unlike AFS, it can't propagate updates on file close After all, it's disconnected Instead, remember the pending update until later Page 24
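As a sketch of the optimistic approach (hypothetical names, not Coda's implementation): while disconnected, the client keeps writing its cache and appends each update to a replay log, which it tries to push to a server on reconnection.

```python
# Sketch of the optimistic, Coda-style client behavior (hypothetical names,
# not Coda's real code): while disconnected, updates go to the cache and a
# replay log, which is pushed to a server at reconnection time.

class Server:
    def __init__(self):
        self.files = {}

    def store(self, path, contents):
        self.files[path] = contents

class DisconnectableClient:
    def __init__(self, server):
        self.server = server
        self.connected = True
        self.cache = {}        # path -> contents
        self.replay_log = []   # updates made while disconnected

    def write(self, path, contents):
        self.cache[path] = contents
        if self.connected:
            self.server.store(path, contents)          # AFS-style propagation
        else:
            self.replay_log.append((path, contents))   # remember for later

    def reintegrate(self):
        # On reconnection, propagate the updates made while disconnected.
        # (If someone else wrote the same file meanwhile, that is a conflict,
        # handled a few slides later.)
        self.connected = True
        for path, contents in self.replay_log:
            self.server.store(path, contents)
        self.replay_log.clear()
```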

Ficus A more peer-oriented replicated file system A descendant of the Locus operating system Specifically designed for mobile computers Page 25

AFS, Coda, and Caching Like AFS, Coda client machines only cache files An AFS cache miss is just a performance penalty Get it from the server A Coda cache miss when disconnected is a disaster The user can't access the file Page 26

Avoiding Disconnected Cache Misses Really requires thinking ahead Initially Coda required users to do it Maintain a list of files they wanted to be sure to always cache In case of disconnected operations Eventually went to a hoarding solution We'll discuss hoarding later Page 27

Coda Reintegration When a disconnected Coda client reconnects Tries to propagate updates occurring during disconnection to a server If no one else updated that file, just like a normal AFS update If someone else updated the file during disconnection, what then? Page 28

Coda and Conflicts Such update problems on Coda reintegration are conflicts Two (or more) users made concurrent writes to a file The original solution was that the later update was (mostly) lost The update on the server wins The other update is put in a special conflict directory The owning user or sysadmin is notified to take action Or not take action... Page 29

Later Coda Conflict Solutions Automated reconciliation of conflicts When possible User tools to help handle them when automation doesn't work Can you think of particularly problematic issues here? Page 30

The Locus Computing Model System composed of many personal workstations Connected by a local area network And perhaps a few shared server machines All machines have dedicated storage But provide the illusion of... Shared by all! Page 31

The Ficus Computing Model Just like the Locus model, except... Some of the workstations are portable computers Which might disconnect from the network Taking their storage with them Page 32

Ficus Shares Some Problems With Coda Portable computers can only access local disks while disconnected Updates involving disconnected computers are complicated And can even cause conflicts Page 33

Ficus Has Some Unique Problems, Too What happens to this when the portable's storage goes away? It's really... And, unfortunately... Page 34

Handling the Problems Rely on replication Replicate the files that the portable needs while disconnected Replicate the files it's taking away when it departs So everyone else can still see them Page 35

Updates in Ficus Ficus uses peer replication No primary copy All replicas are equally good So if access permissions allow update And you can get to a replica You can update it How does Ficus handle that? Page 36

The Easy Case All replicas are present and available Allow update to one of the replicas Make a reasonable effort to propagate the update to all others But not synchronously On a good day, this works and everything is identical Page 37

The Hard Case The best effort to propagate an update from the original replica fails Perhaps because you can't reach one or more other replicas Perhaps because the portable computers holding them are elsewhere Page 38

Handling Updates With primary copies: if the copies are the same, no problem; if they're different, the primary always wins, since the only possible reason is that the secondary is old With peer copies: if they're the same, still no problem But what if they're different? Page 39
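The asymmetry can be stated as two tiny resolution functions (hypothetical, illustrative only): with a primary copy, disagreement always resolves toward the primary; with peers, a bare disagreement gives no basis for choosing, so extra information is needed.

```python
# Illustrative contrast between the two cases (hypothetical functions).

def resolve_primary_copy(primary_value, secondary_value):
    # With a primary copy, the primary always wins: the only way the copies
    # can differ is that the secondary is stale.
    return primary_value

def resolve_peer_copies(value_a, value_b):
    # With peer copies, equality is fine, but a bare difference gives no way
    # to tell which replica is newer -- extra information (version vectors,
    # below) is needed.
    if value_a == value_b:
        return value_a
    raise ValueError("peer copies differ: need version information to resolve")
```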

What Are the Possibilities? 1. One is old and the other is updated How do we tell which is the new one? Or... 2. Both have been updated Now what? Page 40

More Complicated If >2 Replicas Here's just one example [Diagram: update replica 1, propagate to replica 2, propagate to replica 3, somehow figure out replica 2 is newer than replica 3, update replica 1 again] What's the right thing to do? And how do you figure that out? Page 41

Reconciliation Always an option in Locus and Ficus Much more important with disconnected operation When a replica notices a previously unavailable replica, it checks for missing updates and they trade information about them This is the asynchronous operation that ensures eventual update propagation Page 42

Gossiping in Ficus Primary copy replication and systems like Coda always propagate updates the same way Other replicas give their updates to a single site And get new updates from that site Peer systems like Ficus have another option Any peer with later updates can pass them to you Even if they aren't the primary and didn't create the updates In file systems, this is called gossiping Page 43

How Does Ficus Track Updates? Ficus uses version vectors An applied type of vector clock These clocks keep one vector element per replica With a vector clock stored at each replica Clocks tick only on updates Page 44
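A minimal version-vector sketch (hypothetical VersionVector class, not Ficus's code): one counter per replica, ticked only on local updates, with a comparison that classifies another vector as equal, older, newer, or conflicting.

```python
# Minimal version-vector sketch (hypothetical VersionVector class, not
# Ficus's code): one counter per replica, ticked only on local updates.

class VersionVector:
    def __init__(self, counters=None):
        self.counters = dict(counters or {})  # replica id -> update count

    def tick(self, replica_id):
        # Called when the named replica applies a local update.
        self.counters[replica_id] = self.counters.get(replica_id, 0) + 1

    def dominates(self, other):
        # True if this vector is >= the other in every element.
        ids = set(self.counters) | set(other.counters)
        return all(self.counters.get(i, 0) >= other.counters.get(i, 0)
                   for i in ids)

    def compare(self, other):
        if self.dominates(other) and other.dominates(self):
            return "equal"
        if self.dominates(other):
            return "newer"
        if other.dominates(self):
            return "older"
        return "conflict"  # neither dominates: concurrent updates

# Usage: one update at replica 1 and one at replica 2, made independently,
# yield vectors that do not dominate each other -- the conflict case that
# comes up a few slides later.
a, b = VersionVector(), VersionVector()
a.tick("replica1")
b.tick("replica2")
assert a.compare(b) == "conflict"
```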

Version Vector Use Example [Diagram: version vectors at replicas 1, 2, and 3; an update occurs while replica 2 is unavailable, so its vector falls behind the others] When replica 2 comes back, its version will be recognized as old Compared to either replica 1 or replica 3 Page 45

Version Vectors and Conflicts Ficus recognizes concurrent (and thus conflicting) writes Using version vectors If neither of two version vectors dominates the other, there's a conflict Implying concurrent write Typically detected during reconciliation Page 46

For Example [Diagram: two replicas with version vectors (0,0,1) and (0,1,0); neither dominates the other] CONFLICT! Page 47

Now What? Conflicting files represent concurrent writes There is no correct order to apply them Use other techniques to resolve the conflicts Creating a semantically correct and/or acceptable version Page 48

Example Conflict Resolution Identical conflicts Same update made in two different places Easy to resolve Assuming updates in question are idempotent Conflicts involving append-only files Merge the appends Most Unix directory conflicts are automatically resolvable Page 49
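Two of the easy cases can be resolved mechanically; a sketch with hypothetical resolver functions, assuming idempotent identical updates and append-only files that share a known common base:

```python
# Hypothetical resolvers for the two easy cases above: identical (idempotent)
# updates made in two places, and append-only files with a known common base.

def resolve_identical(version_a, version_b):
    # The same update was made at both replicas: either copy will do.
    if version_a == version_b:
        return version_a
    raise ValueError("not an identical conflict")

def resolve_append_only(base, version_a, version_b):
    # Both replicas appended to the same base: keep the base, then both sets
    # of appended records (one possible merge order).
    appended_a = version_a[len(base):]
    appended_b = version_b[len(base):]
    return base + appended_a + appended_b
```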

Ficus Replication Granularity NFS replicates volumes Coda replicates individual files Ficus replicates volumes Later, selective replication of files within volumes added Page 50

Hoarding A portable machine off the network must operate off its own disk Only! So it had better replicate the files it needs If you know or can predict that the portable will disconnect, pre-replicate those files That's called hoarding Page 51

Mechanics of Hoarding Mechanically easy if you replicate at file granularity E.g., Coda or Ficus with selective replication Simply replicate what you need Inefficient if you replicate at volume granularity Page 52

What Do You Hoard? Could be done manually Doesn't work out well Could replicate every file the portable ever touches Might overfill its disk Could use LRU Experience shows that fails oddly Page 53

What Does Work Well? You might think clustering Identify files that are used together If one of them was recently used, hoard them all That's the basic approach in Seer Actually, LRU plus some sleazy tricks works equally well And is much cheaper Page 54
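A crude sketch of the LRU-flavored approach (hypothetical choose_hoard function, ignoring Seer's clustering and the "sleazy tricks"): before disconnection, keep the most recently used files that fit in the portable's disk budget.

```python
# Crude sketch of the LRU-flavored approach (hypothetical choose_hoard,
# ignoring Seer's clustering and the "sleazy tricks"): before disconnecting,
# keep the most recently used files that fit in the portable's disk budget.

def choose_hoard(files, disk_budget):
    """files: dict mapping path -> (last_access_time, size_bytes)."""
    hoard, used = [], 0
    # Consider files most recently used first.
    for path, (_, size) in sorted(files.items(),
                                  key=lambda kv: kv[1][0], reverse=True):
        if used + size <= disk_budget:
            hoard.append(path)
            used += size
    return hoard
```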