Distributed System. Gang Wu. Spring, 2018


1 Distributed System. Gang Wu. Spring, 2018

2 Lecture 7: DFS
What is DFS?
  A method of storing and accessing files based on a client/server architecture.
  A distributed file system is a client/server-based application that allows clients to access and process data stored on the servers as if it were on their own computers.
Fault tolerance in DFS
  High availability (HA)
  Data integrity

3 Accessing remote files
FTP
  Explicit access
  User-directed connection to access a remote resource
Need for transparency
  Allow users to access remote resources just like local ones

4 NFS
Built on RPC
Low performance
File consistency
Security issues

5 Google's view
Component failures are the norm rather than the exception.
Files are huge by traditional standards.
Most files are mutated by appending new data rather than overwriting existing data.
Co-designing the applications and the file system API benefits the overall system by increasing flexibility.

6 GFS: Design overview
Assumptions:
  Component monitoring
  Storing of huge data
  Reading and writing of data
  Well-defined semantics for multiple clients
  Importance of bandwidth
Interface:
  Not POSIX compliant
  Basic operations: create/delete/open/close/read/write
  Additional operations:
    Snapshot
    Append (allows multiple clients to append atomically without locking)

7 GFS: Architecture
Cluster computing
Single master
Multiple chunkservers
  Store 64 MB file chunks
Multiple clients

8 GFS: Architecture
Single master:
  Minimal master load.
  Fixed chunk size.
  The master also predictively provides the locations of the chunks immediately following those requested, identified by unique ID.
Chunk size:
  64 MB.
  Read and write operations on the same chunk need only one request to the master.
  Reduces network overhead and the size of the metadata in the master.
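
A fixed chunk size means a client can work out which chunk a byte offset falls in without asking anyone. A minimal sketch of that translation, assuming the 64 MB size from the slide; the function names `locate` and `chunks_for_range` are hypothetical and not part of any GFS API:

```python
from __future__ import annotations

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed chunk size from the slide


def locate(offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset: int, length: int) -> list[int]:
    """Chunk indexes touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first, _ = locate(offset)
    last, _ = locate(offset + length - 1)
    return list(range(first, last + 1))
```

A read that stays inside one chunk yields a single index, so only one location lookup at the master would be needed for it.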

9 GFS: Architecture: Metadata
Types of metadata:
  File and chunk namespaces
  Mapping from files to chunks
  Locations of each chunk's replicas
In-memory data structures:
  Master operations are fast.
  Periodically scanning the entire state is easy and efficient.
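
The three kinds of metadata can be pictured as three in-memory maps. This is only an illustrative sketch: the class `MasterMetadata`, its method names, and the use of plain dictionaries are assumptions, not the real master's data structures.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    # File namespace: the set of known path names (hypothetical representation).
    namespace: set[str] = field(default_factory=set)
    # Mapping from each file to its ordered list of chunk handles.
    file_chunks: dict[str, list[int]] = field(default_factory=dict)
    # Locations of each chunk's replicas, keyed by chunk handle.
    chunk_locations: dict[int, set[str]] = field(default_factory=dict)

    def create(self, path: str) -> None:
        self.namespace.add(path)
        self.file_chunks[path] = []

    def add_chunk(self, path: str, handle: int, servers: set[str]) -> None:
        self.file_chunks[path].append(handle)
        self.chunk_locations[handle] = set(servers)

    def lookup(self, path: str, chunk_index: int) -> tuple[int, set[str]]:
        """Return (chunk handle, replica locations) for one chunk of a file."""
        handle = self.file_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]
```

Because everything is held in memory, a lookup is two dictionary accesses, and a full scan is a loop over these maps.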

10 GFS: Architecture: Metadata
Chunk locations:
  The master polls the chunkservers for this information.
  Clients request data from the chunkservers.
Operation log:
  Keeps track of activities.
  It is central to GFS (used to restore the file system).
  It is stored on disk (persistence), with a copy on a remote site.
  Periodic checkpoints (B-tree) avoid playing back the entire log.
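
The point of a checkpoint is that recovery only replays the log records written after it. A minimal sketch under simplifying assumptions: the state is just a set of path names, log records are hypothetical `(sequence number, (operation, path))` tuples, and the checkpoint is a plain copy of the state rather than a B-tree.

```python
from __future__ import annotations


def apply(state: set[str], record: tuple[str, str]) -> None:
    """Apply one logged operation to the namespace state."""
    op, path = record
    if op == "create":
        state.add(path)
    elif op == "delete":
        state.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def recover(checkpoint: set[str], checkpoint_seq: int,
            log: list[tuple[int, tuple[str, str]]]) -> set[str]:
    """Rebuild state from a checkpoint plus the log records that follow it."""
    state = set(checkpoint)
    for seq, record in log:
        if seq > checkpoint_seq:  # older records are already in the checkpoint
            apply(state, record)
    return state
```

Replaying the whole log from an empty state gives the same result as loading the checkpoint and replaying only the tail, which is why the checkpoint shortens recovery without changing its outcome.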

11 GFS: Architecture: System interactions
Leases and mutation order:
  Leases maintain a consistent mutation order across the replicas.
  The master picks one replica as the primary.
  The primary defines a serial order for mutations.
  The other replicas follow the same serial order.
  Leases minimize management overhead at the master.
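
The ordering idea can be sketched as follows: the primary stamps each mutation with a serial number, and every replica applies mutations strictly in serial order, whatever order they arrive in. The classes `Replica` and `Primary` and their methods are hypothetical names for illustration; real lease handling, timeouts, and data transfer are left out.

```python
class Replica:
    def __init__(self):
        self.data = []          # mutations applied so far, in order
        self.next_serial = 0    # serial number this replica expects next
        self.pending = {}       # mutations that arrived early

    def receive(self, serial, mutation):
        """Buffer the mutation, then apply everything that is now in order."""
        self.pending[serial] = mutation
        while self.next_serial in self.pending:
            self.data.append(self.pending.pop(self.next_serial))
            self.next_serial += 1


class Primary(Replica):
    """The replica holding the lease: it alone assigns serial numbers."""

    def __init__(self):
        super().__init__()
        self._counter = 0

    def assign(self, mutation):
        serial = self._counter
        self._counter += 1
        self.receive(serial, mutation)
        return serial, mutation   # what would be forwarded to the secondaries
```

Because only the primary assigns numbers, the master does not need to take part in each individual mutation.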

12 GFS: Architecture: System interactions
Atomic record appends:
  GFS offers Record Append.
  Clients on different machines append to the same file concurrently.
  The data is written at least once as an atomic unit.
Snapshot:
  Creates a quick copy of a file or a directory.
  The master revokes the leases for that file.
  The master duplicates the metadata.
  On the first write to a chunk after the snapshot operation:
    All chunkservers holding a replica create a new chunk.
    Data can be copied locally.
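
The snapshot steps above amount to copy-on-write: the snapshot only duplicates metadata, and a chunk is physically copied the first time it is written afterwards. A single-process sketch, assuming a reference count per chunk; the class `ChunkStore` and its methods are hypothetical, and leases and replicas are not modelled.

```python
class ChunkStore:
    def __init__(self):
        self.chunks = {}     # chunk handle -> bytearray with the chunk data
        self.refcount = {}   # chunk handle -> number of files referencing it
        self.files = {}      # path -> list of chunk handles
        self._next = 0

    def _new_chunk(self, data=b""):
        handle = self._next
        self._next += 1
        self.chunks[handle] = bytearray(data)  # copies the given bytes
        self.refcount[handle] = 1
        return handle

    def create(self, path, data):
        self.files[path] = [self._new_chunk(data)]

    def snapshot(self, src, dst):
        """Duplicate only the metadata; both files share the same chunks."""
        self.files[dst] = list(self.files[src])
        for handle in self.files[dst]:
            self.refcount[handle] += 1

    def write(self, path, index, data):
        handle = self.files[path][index]
        if self.refcount[handle] > 1:          # shared with a snapshot
            self.refcount[handle] -= 1
            handle = self._new_chunk(self.chunks[handle])  # local copy
            self.files[path][index] = handle
        self.chunks[handle][:len(data)] = data

    def read(self, path, index):
        return bytes(self.chunks[self.files[path][index]])
```

Until the first write, the snapshot costs no data copying at all, which is what makes it quick.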

13 GFS: Architecture: Master operations
Namespace management and locking:
  GFS maps each full pathname to metadata in a table.
  Each master operation acquires a set of locks.
  The locking scheme allows concurrent mutations in the same directory.
  Locks are acquired in a consistent total order to prevent deadlock.
Replica placement:
  Maximizes reliability, availability, and network bandwidth utilization.
  Spreads chunk replicas across racks.
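
One way to picture the lock set is sketched below. It assumes, as in the published GFS design, read locks on every ancestor path and a read or write lock on the full target path, ordered by depth in the namespace tree and then lexicographically. The helpers `locks_for` and `conflicts` are hypothetical names; no real locking is performed.

```python
def locks_for(path, write=True):
    """Return the (pathname, mode) locks an operation on `path` would take,
    sorted in one global order so that two operations cannot deadlock."""
    parts = path.strip("/").split("/")
    prefixes = ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    locks = [(p, "read") for p in prefixes[:-1]]          # ancestors
    locks.append((prefixes[-1], "write" if write else "read"))  # target
    return sorted(locks, key=lambda lock: (lock[0].count("/"), lock[0]))


def conflicts(a, b):
    """Two lock sets conflict if they share a name and either side writes it."""
    held = dict(a)
    return any(name in held and "write" in (mode, held[name])
               for name, mode in b)
```

Two creations in the same directory take only read locks on that directory, so they do not conflict, while an operation that write-locks the directory itself blocks both.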

14 GFS: Architecture: Master operations
Creation:
  Equalize disk utilization.
  Limit the number of recent creations on each chunkserver.
  Spread replicas across racks.
Re-replication:
  Chunks are re-replicated in priority order.
Rebalancing:
  Move replicas for better disk space usage and load balancing.
  Remove replicas on chunkservers with below-average free space.

15 GFS: Architecture: Master operations
Stale replica detection:
  The chunk version number identifies stale replicas.
  The client or the chunkserver verifies the version number.

16 GFS: Fault tolerance
Components break down frequently.
We treat component failures as the norm rather than the exception.
Detect

17 GFS: Fault tolerance
High availability
  Elimination of single points of failure
  Reliable crossover
  Detection of failures as they occur
Data integrity
Users never see the failure
  How to replicate?
  How to discover?
  How to recover?

18 Architecture Review

19 Chunk replicas
1 chunk, 3 replicas (by default)
  Primary (holds the chunk lease)
  Secondary
Chunk replica placement
  Across racks
  Communication crosses network switches
  Trade-off between data reliability and availability on one side and network bandwidth utilization on the other

20 Creation of replicas
Chunk creation
  Disk utilization
  Limit the recent creations on each server
Re-replication
  Higher priority for chunks with fewer replicas
  Copy directly from a valid replica
Rebalancing
  Gradually fills up the chunkserver
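
The re-replication rule ("fewer replicas, higher priority") can be sketched as a sort. The replication goal of 3 comes from the default on the previous slide; the function name and the dictionary of live replica counts are hypothetical, and other priority factors are ignored.

```python
from __future__ import annotations

REPLICATION_GOAL = 3  # default number of replicas per chunk


def rereplication_order(live_replicas: dict[str, int],
                        goal: int = REPLICATION_GOAL) -> list[str]:
    """Return the chunks that are below the goal, most endangered first."""
    needy = [(count, handle) for handle, count in live_replicas.items()
             if count < goal]
    return [handle for count, handle in sorted(needy)]
```

For example, `rereplication_order({"a": 3, "b": 1, "c": 2, "d": 1})` puts the two single-replica chunks `b` and `d` ahead of `c` and leaves out `a`, which is already at the goal.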

21 Stale replicas
Caused by mutations missed while the chunkserver is down
Chunk version number
  Increased when the master grants a new lease
  Sent to all currently available replicas
Detected by the master when the chunkserver restarts
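
The version-number mechanism can be sketched in a few lines. Here each replica is modelled as a plain dictionary from chunk handle to version, and the class `Master` with `grant_lease` and `is_stale` is a hypothetical stand-in; only the bookkeeping is shown.

```python
class Master:
    def __init__(self):
        self.version = {}  # chunk handle -> latest version number

    def grant_lease(self, handle, live_replicas):
        """Bump the version and record it on every replica that is up."""
        self.version[handle] = self.version.get(handle, 0) + 1
        for replica in live_replicas:
            replica[handle] = self.version[handle]

    def is_stale(self, handle, reported_version):
        """A replica reporting an older version missed at least one lease."""
        return reported_version < self.version[handle]
```

A replica that was down during a lease grant keeps its old number, so when it reports its chunks after a restart the comparison exposes it.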

22 Master replication
Operation log + checkpoints
  Stored on multiple machines
Process fails
  Restart instantly
Machine fails
  Start the process elsewhere (relocated)

23 Shadow master
Read-only access to the file system
Shadow != mirror
  Lags the primary master slightly
  Metadata is probably stale

24 Disk failures
Checksums
  Detect replica corruption
  64 MB chunk ---> 64 KB blocks
  Each block has a 32-bit checksum
  Kept in memory and stored persistently with logging
  Verified before returning data to the requester
  Scanned during idle periods
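
With 64 KB blocks, a 64 MB chunk has 1024 blocks, so its 32-bit checksums take 4 KB. The sketch below shows per-block checksumming and verification before data is returned. Using CRC-32 from Python's `zlib` is an assumption made for illustration; the slide only says the checksum is 32 bits. The function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as on the slide


def checksums(data):
    """Compute one 32-bit checksum per 64 KB block of the chunk data."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def verified_read(data, sums, block_index):
    """Return one block, but only if it still matches its stored checksum."""
    start = block_index * BLOCK_SIZE
    block = data[start:start + BLOCK_SIZE]
    if zlib.crc32(block) != sums[block_index]:
        raise IOError(f"checksum mismatch in block {block_index}")
    return block
```

On a mismatch the chunkserver would report an error instead of returning the data, so corruption does not reach the requester.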

25 Failure discovery
Master failure: detected by outside monitoring infrastructure
  Start a new master process with the operation log
  Poll all chunkservers to discover chunk locations
Chunkserver failure: detected through HeartBeat messages
  Collect state
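
Heartbeat-based detection can be sketched as "remember when each chunkserver was last heard from, and suspect those that have been silent too long". The class `HeartbeatMonitor`, the timeout value, and passing the clock in as `now` (to keep the example deterministic) are all assumptions for illustration.

```python
class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before suspicion
        self.last_seen = {}      # chunkserver name -> time of last heartbeat

    def heartbeat(self, server, now):
        """Record a HeartBeat message received from `server` at time `now`."""
        self.last_seen[server] = now

    def suspected(self, now):
        """Chunkservers that have been silent for longer than the timeout."""
        return sorted(server for server, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```

In a real system `now` would come from a monotonic clock such as `time.monotonic()`, and a suspected server would trigger re-replication of its chunks.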

26 Chunkserver recovery
Time to restore depends on the amount of resources
1 chunkserver breaks down
  With a cloning limit: 15,000 chunks, 600 GB, 23.2 min
2 chunkservers break down
  0.8% of chunks have a single replica
  Fewer replicas = higher priority
  Quickly returns to a stable state
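
The figures for the single-chunkserver case imply an effective restore rate and an average chunk size. A small worked calculation, assuming 1 GB = 1024 MB; the function name `restore_stats` is hypothetical.

```python
MB_PER_GB = 1024  # assumption: binary units


def restore_stats(chunks, data_gb, minutes):
    """Return (effective restore rate in MB/s, average chunk size in MB)."""
    total_mb = data_gb * MB_PER_GB
    rate_mb_per_s = total_mb / (minutes * 60)
    avg_chunk_mb = total_mb / chunks
    return rate_mb_per_s, avg_chunk_mb


rate, avg_chunk = restore_stats(chunks=15_000, data_gb=600, minutes=23.2)
print(f"effective rate: {rate:.0f} MB/s, average chunk: {avg_chunk:.1f} MB")
```

Under that unit assumption the rate comes out at roughly 440 MB/s, and the average chunk holds about 41 MB, well under the 64 MB maximum.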

27 Google File System II: Colossus
Improved master fault tolerance
Better for small files (small chunks)