Georgia Institute of Technology, ECE6102, 4/20/2009
David Colvin, Jimmy Vuong

- Relatively recent; still applicable today
- GFS: Google's storage platform for the generation and processing of data used by services that require large data sets
- Goals: performance, scalability, reliability, and availability
- Design driven by observed application workloads
Assumptions
- Component failures are the norm with commodity parts
- File sizes are far larger than traditional FS standards: multi-GB files
- Files are mutated by appending data: few large, sequential writes; many producers
- High bandwidth is required for bulk data processing

Architecture
- Single master
- Multiple chunkservers
  - Store chunks on local disks
  - Files are divided into fixed-size chunks of 64 MB
  - Each chunk is replicated on multiple chunkservers
- Multiple clients
  - File data is not cached by the client
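A minimal sketch of the fixed-size chunking above: a client maps a file byte offset to a chunk index, asks the master for that chunk's locations, then reads at the remaining offset within the chunk. `to_chunk_coordinates` is a hypothetical helper name, not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks, as in GFS


def to_chunk_coordinates(byte_offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset within chunk).

    The client sends the chunk index to the master, which replies with the
    chunk handle and replica locations; the client then reads from a
    chunkserver directly at the within-chunk offset.
    """
    return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE
```

For example, a read at byte 150,000,000 lands in chunk index 2, so only one master lookup is needed for the whole 64 MB around it.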
The Master
- Maintains all file system metadata
- Makes chunk placement and replication decisions using global knowledge
- Minimal involvement in reads and writes
- Periodically sends HeartBeat messages to collect chunkserver state
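The HeartBeat exchange can be sketched as follows: the master records what each chunkserver last reported and treats servers that stop responding as dead. The class shape, field names, and 60-second timeout are illustrative assumptions, not the real protocol.

```python
import time


class Master:
    """Toy sketch: the master learns chunkserver state from HeartBeats."""

    TIMEOUT = 60.0  # assumed: seconds of silence before a server is presumed dead

    def __init__(self):
        self.last_seen = {}  # chunkserver id -> timestamp of last HeartBeat
        self.chunk_map = {}  # chunkserver id -> chunk handles it reported

    def heartbeat(self, server_id, chunk_handles, now=None):
        # A HeartBeat both proves liveness and refreshes chunk locations,
        # so the master never needs to persist chunk-location data.
        now = time.time() if now is None else now
        self.last_seen[server_id] = now
        self.chunk_map[server_id] = set(chunk_handles)

    def live_servers(self, now=None):
        now = time.time() if now is None else now
        return {s for s, t in self.last_seen.items() if now - t < self.TIMEOUT}
```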
!"#$ "#% Creation Place replicas on chunkservers with below-average disk space utilization Limit number of recent creations on each chunkserver Spread replicas across racks Re-replication Occurs when replicas fall below a threshold Takes priority into account Operates with throttled bandwidth for clone operations Rebalancing % Marks deletion rather than reclaim resources immediately; deletion operation occurs three days after being marked Marks files by turning them into hidden files Able to identify orphaned and stale chunks Advantages Simple and reliable Occurs when master is relatively free Three day safety net for deletion 4
Operation Log
- Historical record of critical metadata changes, logged using logical time
- Defines the order of concurrent operations
- Lets the master update its state simply, reliably, and without risking inconsistencies if the master crashes

Consistency Model
- File namespace mutations are atomic
- Consistent: all replicas have the same data
- Defined: all replicas contain the mutation in its entirety (and are also consistent)
- Successful mutations guarantee:
  - All mutations are applied in the same order on all replicas
  - All replicas are defined and consistent
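The consistent/defined distinction can be made concrete with a small classifier over replica contents: a region is consistent when every replica holds the same bytes, and defined when it is consistent and the mutation landed in its entirety. `region_state` is a hypothetical illustration, not code from the paper.

```python
def region_state(replicas, mutation):
    """Classify a file region per the GFS consistency model.

    replicas: list of byte strings, one per replica of the region.
    mutation: the bytes a successful mutation should have written.
    """
    consistent = len(set(replicas)) == 1          # all replicas identical
    defined = consistent and mutation in replicas[0]  # mutation intact too
    if defined:
        return "defined"
    if consistent:
        return "consistent"
    return "inconsistent"
```

A region can be consistent but undefined, e.g. when fragments of concurrent writes were interleaved identically on all replicas.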
Leases and Mutation Order
- Minimize master involvement
- Mutation: a change to data or metadata
- Leases maintain a consistent mutation order across replicas
  - Primary: the replica that receives the lease
    - Receives mutations from clients
    - Decides the order for all mutations
- Control flow and data flow are separate
- Data flow is pipelined to minimize latency and bottlenecks
  - Each chunkserver forwards data to the next closest chunkserver while still receiving it
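The pipelined data flow can be sketched as building a forwarding chain: each machine pushes to the closest chunkserver that has not yet received the data, so pushing to R replicas is a chain rather than a fan-out. Modeling "distance" as a single number is an assumption for illustration; GFS estimates it from IP addresses.

```python
def forwarding_chain(client_pos, server_positions):
    """Order chunkservers into a GFS-style data-forwarding chain.

    client_pos: position of the client pushing the data.
    server_positions: dict of server id -> position (distance model).
    """
    chain, here = [], client_pos
    remaining = dict(server_positions)
    while remaining:
        # Forward to the nearest server that has not received the data yet.
        nxt = min(remaining, key=lambda s: abs(remaining[s] - here))
        chain.append(nxt)
        here = remaining.pop(nxt)
    return chain
```

Because each server starts forwarding while still receiving, the chain's total latency stays close to that of a single transfer.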
Atomic Record Appends
- GFS serializes concurrent record appends and writes each record at least once atomically
- The order of appends does not matter
- Simplifies concurrent writes for the client application
- Failures: if an append fails, the client retries
- The client handles any resulting inconsistent regions
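The retry behavior implies at-least-once semantics: a failed attempt can leave a partial or duplicate record behind, and the retry appends the record again. This toy model (the function name and the `fail_times` knob simulating transient failures are assumptions) shows why readers must tolerate duplicates.

```python
def record_append(chunk, record, max_retries=3, fail_times=0):
    """Toy at-least-once record append.

    chunk: list standing in for a replica's chunk contents.
    fail_times: number of attempts that "fail" after writing, to
    simulate the client-retry path from the slide.
    """
    for attempt in range(max_retries + 1):
        chunk.append(record)        # the append itself always lands
        if attempt >= fail_times:   # success: report where it landed
            return len(chunk) - 1
        # failure after a partial write: leave the duplicate, retry
    raise IOError("record append failed after retries")
```

After one simulated failure the chunk holds the record twice; real clients filter duplicates with record-level checksums or unique IDs.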
Snapshots
- Used to create branch copies of files or directory trees
- The master receives a snapshot request and revokes (or waits out) outstanding leases
- The master copies the metadata
- The next client write triggers creation of the branch chunk (copy-on-write)
- The new chunk is created on the same chunkserver as the original
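The "next write triggers branch chunk creation" step is classic copy-on-write, which can be sketched with chunk reference counts: a snapshot only bumps counts, and the first subsequent write to a shared chunk makes the actual copy. The class and the `handle + "'"` naming are illustrative assumptions.

```python
class ChunkTable:
    """Copy-on-write sketch of GFS snapshots via chunk reference counts."""

    def __init__(self):
        self.refcount = {}  # chunk handle -> number of files referencing it

    def snapshot(self, chunk_handles):
        # Snapshotting bumps reference counts; no chunk data is copied yet.
        for h in chunk_handles:
            self.refcount[h] = self.refcount.get(h, 1) + 1

    def write(self, handle):
        # The first write after a snapshot triggers the real chunk copy.
        if self.refcount.get(handle, 1) > 1:
            new_handle = handle + "'"   # hypothetical new handle name
            self.refcount[handle] -= 1
            self.refcount[new_handle] = 1
            return new_handle           # client writes the private copy
        return handle                   # chunk not shared; write in place
```

Creating the copy on the same chunkserver keeps the clone a local disk copy instead of a network transfer.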
Fault Tolerance
- Hardware failures are inevitable
- Failures can make data unavailable or corrupt
- GFS goals: high availability, high data integrity

High Availability
- Fast recovery: the master and chunkservers can restart very quickly
- Replication: simple and easy to monitor, with automatic re-replication
- Master logs and checkpoints are replicated
- To recover the master, load the last checkpoint and replay the operation log from the point it was taken
- Clients access the master through an alias, so a replacement master can take over transparently
- Shadow masters serve read operations
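Master recovery as described above can be sketched as: restore the checkpointed state, then replay only the log records newer than the checkpoint. The serial-number scheme and the tiny create/delete op set here are illustrative assumptions standing in for the real metadata operations.

```python
def recover_master(checkpoint, op_log):
    """Rebuild master state from the last checkpoint plus the operation log.

    checkpoint: (last_serial, namespace dict) - the saved state and the
    serial number of the last record it reflects.
    op_log: list of (serial, op, path) records, oldest first.
    """
    last_serial, namespace = checkpoint
    namespace = dict(namespace)  # don't mutate the caller's checkpoint
    for serial, op, path in op_log:
        if serial <= last_serial:
            continue                  # already reflected in the checkpoint
        if op == "create":
            namespace[path] = {}
        elif op == "delete":
            namespace.pop(path, None)
    return namespace
```

Because replay is deterministic, repeating recovery after another crash yields the same state, which is what makes the scheme reliable.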
Data Integrity
- Chunkservers use checksumming
- Each chunk is divided into 64 KB blocks, each with a 32-bit checksum
- Checksums are verified on reads
- Checksums can be updated incrementally when appends fill partial blocks
- Inactive chunks are periodically scanned and verified

Measurements
- Test cluster: 1 master, 16 clients, 16 chunkservers
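The block-checksum scheme can be sketched directly: one 32-bit checksum per 64 KB block, re-computed and compared on every read. CRC32 is used here as the 32-bit function; that specific choice is an assumption, since the slide only specifies the checksum width.

```python
import zlib

BLOCK = 64 * 1024  # 64 KB checksum blocks


def block_checksums(chunk: bytes):
    """One 32-bit checksum (CRC32 here) per 64 KB block of the chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]


def verify(chunk: bytes, checksums):
    """Re-checksum on read; a mismatch means the block is corrupt and the
    read should be served from another replica while this one is repaired."""
    return block_checksums(chunk) == checksums
```

Checking per 64 KB block means a small read only has to checksum the blocks it touches, not the whole 64 MB chunk.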
Real-World Clusters
- Storage: many terabytes of data in hundreds of thousands of files
- Metadata: 380 MB per chunkserver
- Read rates: up to 580 MB/s
- Master load: 200 to 500 operations/s
- Recovery: re-replicating 15,000 chunks (600 GB) took 23 minutes

Conclusions
- The system is optimized for a unique set of application workloads: large files, hardware failures, many concurrent appends, large sequential reads
- Fault tolerance through automatic replication and checksums
- Master involvement minimized with large chunk sizes, metadata caching, and chunk leases
References
- http://labs.google.com/papers/gfs-sosp2003.pdf
- http://storagemojo.com/google-file-system-eval-part-i/
- http://communication.howstuffworks.com/google-file-system6.htm