GOOGLE FILE SYSTEM: MASTER
Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung

ECE7650 Scalable and Secure Internet Services and Architecture: A Systems Perspective (Winter 2015)
Presentation Report
Prepared For: Prof. Song Jiang
Prepared By: Suneel K. Oad (FL1146)
3/12/2015

Google File System (GFS) was designed and implemented by Google to meet the demands of its rapidly growing, highly distributed storage environment. The design has been deployed successfully and meets its objectives. A GFS cluster is managed by a single, deliberately simple master node (server), which stores all metadata for the cluster and controls system-wide activities such as chunk creation, chunk replication, garbage collection, and rebalancing of chunks across chunkservers. This report answers some key questions about the master in the Google File System.

Question # 1: We use leases to maintain a consistent mutation order across replicas. Could you show a scenario where an unexpected result may appear if the lease mechanism is not implemented? Also explain how leases help address the problem.

Solution: Without leases, the replicas of a chunk can become inconsistent. For example, if two clients mutate the same region of a chunk concurrently and no replica is responsible for ordering the mutations, one replica may apply them in the order A, B while another applies them in the order B, A, so the replicas end up with different contents. This is illustrated in the figure below (Question 1-1: No Lease).

Question 1-1: No Lease

The lease mechanism keeps the data consistent. Briefly:

I. The master grants a chunk lease to one of the replicas, which becomes the primary chunkserver for that chunk.
II. The GFS client pushes the data to all chunkservers holding a replica, and each one keeps it in a buffer.
III. The client sends the write request to the primary. The primary chooses a serial order for all mutations it receives, applies them to its own storage in that order, and forwards the request and the order to the secondary chunkservers.
IV. The secondary chunkservers apply the mutations in the same order as the primary.

Because every replica applies the mutations in the order chosen by the primary, the chunk data stays consistent across replicas. This is illustrated in the figure below (Question 1-2: With Lease).
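The effect of a single ordering authority can also be sketched in a few lines of Python. This is a toy model under stated assumptions, not GFS code: the classes `Replica` and `Primary`, the mutation tuples, and both helper functions are hypothetical names introduced for illustration, and lease expiry, serial numbers, and failures are not modeled.

```python
class Replica:
    """A chunk replica; `data` stands in for the chunk contents."""

    def __init__(self, name):
        self.name = name
        self.data = {}   # offset -> last value written at that offset
        self.log = []    # mutation ids in the order they were applied

    def apply(self, mutation):
        mutation_id, offset, value = mutation
        self.data[offset] = value
        self.log.append(mutation_id)


class Primary(Replica):
    """The replica holding the lease: it alone picks the mutation order."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries

    def write(self, mutation):
        # The order in which the primary applies mutations becomes the
        # order used by every secondary.
        self.apply(mutation)
        for secondary in self.secondaries:
            secondary.apply(mutation)


# Two concurrent mutations to the same offset of the same chunk.
MUTATION_A = ("A", 0, "written by client A")
MUTATION_B = ("B", 0, "written by client B")


def replicas_agree_without_lease():
    # No primary: each replica applies mutations in arrival order, and
    # here the two replicas receive them in opposite orders.
    r1, r2 = Replica("r1"), Replica("r2")
    for mutation in (MUTATION_A, MUTATION_B):
        r1.apply(mutation)
    for mutation in (MUTATION_B, MUTATION_A):
        r2.apply(mutation)
    return r1.data == r2.data


def replicas_agree_with_lease():
    # All writes go through the primary, which fixes one order.
    s1, s2 = Replica("s1"), Replica("s2")
    primary = Primary("p", [s1, s2])
    primary.write(MUTATION_A)
    primary.write(MUTATION_B)
    return primary.data == s1.data == s2.data


print(replicas_agree_without_lease())  # False
print(replicas_agree_with_lease())     # True
```

In the first case the final content of offset 0 depends on which replica is read; in the second case every replica ends with the value from client B, because the primary applied B last.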

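Question 1-2: With Lease

Question # 2: Without network congestion, the ideal elapsed time for transferring B bytes to R replicas is B/T + RL where T is the network throughput and L is latency to transfer bytes between two machines. Please explain the statement.

Solution: In GFS, data transfer has the following properties:

- Data flow is linear: the data is pushed along a chain of chunkservers, so each machine's full outbound bandwidth is used for a single transfer.
- Transfers are pipelined: a chunkserver starts forwarding data as soon as it begins to receive it.
- Each machine forwards the data to the closest machine that has not yet received it.
- Links are switched and full duplex, so a machine can send and receive data concurrently.

Because of pipelining, the R hops overlap in time. Pushing all B bytes through one link takes B/T, and each of the R hops adds only its latency L before the first bytes reach the next machine. This is illustrated in the figure below (Question 2: Data Transfer).

Time to transfer B bytes to R replicas = B/T + RL

E.g., with network link throughput T = 100 Mbps and latency L = 1 ms, B/T for 1 MB is 8 Mb / 100 Mbps = 80 ms, and RL adds only about 1 ms per replica. Therefore, 1 MB can ideally be distributed in about 80 ms.

Question 2: Data Transfer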
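The arithmetic in this example can be checked with a short Python sketch. The function names are hypothetical, R = 3 replicas is an assumed value (the example above does not state R), and 1 MB is taken as 10^6 bytes. The store-and-forward function is an assumed comparison case, added only to show what the pipeline saves.

```python
def ideal_transfer_seconds(num_bytes, replicas, throughput_bps, latency_s):
    """Pipelined push along a chain of replicas: B/T + R*L."""
    return num_bytes * 8 / throughput_bps + replicas * latency_s


def store_and_forward_seconds(num_bytes, replicas, throughput_bps, latency_s):
    """Assumed comparison: each replica receives all bytes before forwarding."""
    return replicas * (num_bytes * 8 / throughput_bps + latency_s)


B = 1_000_000   # bytes (1 MB, assuming 10^6 bytes)
R = 3           # replicas (assumption for this example)
T = 100e6       # bits per second (100 Mbps)
L = 0.001       # seconds (1 ms)

pipelined = ideal_transfer_seconds(B, R, T, L)
sequential = store_and_forward_seconds(B, R, T, L)

print(f"pipelined:         {pipelined * 1000:.0f} ms")   # 83 ms
print(f"store-and-forward: {sequential * 1000:.0f} ms")  # 243 ms
```

With these values the B/T term (80 ms) dominates and the RL term contributes only 3 ms, which is why the result is quoted as "about 80 ms".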

Question # 3: One nice property of this locking scheme is that it allows concurrent mutations in the same directory. Explain how this property is achieved in GFS, and not in traditional file systems.

Solution: Unlike traditional file systems, GFS has no per-directory data structure that lists the files in a directory. It logically represents its namespace as a lookup table that maps full pathnames to metadata, and each pathname has its own read-write lock. A file creation therefore does not need a write lock on the parent directory, because there is no directory structure to protect from modification.

As a result, multiple file creations can execute concurrently in the same directory. Each acquires a read lock on the directory name and a write lock on the file name. The read lock on the directory name suffices to prevent the directory from being deleted, renamed, or snapshotted, and the write lock on the file name serializes attempts to create the same file twice.

Example: the file /home/user/foo is prevented from being created while /home/user is being snapshotted to /save/user.

- The snapshot operation acquires read locks on /home and /save, and write locks on /home/user and /save/user.
- The file creation acquires read locks on /home and /home/user, and a write lock on /home/user/foo.
- The two operations are serialized properly because they try to obtain conflicting locks on /home/user.
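The lock sets in this example can be written down explicitly. The sketch below is a minimal illustration, assuming a namespace keyed by full pathname; the helper names (`ancestors`, `create_locks`, `snapshot_locks`, `conflicts`, `acquisition_order`) are hypothetical, and no real locking is performed. It only computes which locks each operation would request and whether two operations conflict.

```python
def ancestors(path):
    """Return the proper ancestor pathnames of an absolute path."""
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def create_locks(path):
    """File creation: read locks on all ancestors, write lock on the file."""
    locks = {p: "read" for p in ancestors(path)}
    locks[path] = "write"
    return locks


def snapshot_locks(source, destination):
    """Snapshot: read locks on ancestors, write locks on both endpoints."""
    locks = {}
    for path in (source, destination):
        for p in ancestors(path):
            locks[p] = "read"
    for path in (source, destination):
        locks[path] = "write"
    return locks


def conflicts(locks_a, locks_b):
    """Two operations conflict if they share a path and either wants to write."""
    shared = locks_a.keys() & locks_b.keys()
    return any("write" in (locks_a[p], locks_b[p]) for p in shared)


def acquisition_order(locks):
    """Order by depth in the namespace tree, then lexicographically."""
    return sorted(locks, key=lambda p: (p.count("/"), p))


foo = create_locks("/home/user/foo")
bar = create_locks("/home/user/bar")
snap = snapshot_locks("/home/user", "/save/user")

print(conflicts(foo, bar))   # False: both only read-lock /home and /home/user
print(conflicts(snap, foo))  # True: write lock vs. read lock on /home/user
print(acquisition_order(snap))
```

The two creations share only read locks, so they can proceed concurrently, while the snapshot and the creation collide on /home/user. Acquiring locks in one consistent total order, as `acquisition_order` does, is the usual way to avoid deadlock between operations that need overlapping lock sets.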

Question # 4: When the master creates a chunk, it chooses where to place the initially empty replicas. What are the criteria for choosing where to place the initially empty replicas?

Solution: The criteria for placing the initially empty replicas are:

I. New replicas are placed on chunkservers whose disk space utilization is below average. Over time this equalizes disk utilization across chunkservers.
II. The number of recent creations on each chunkserver is limited. A creation is cheap in itself, but a newly created chunk is usually followed by heavy write traffic, so the limit keeps a chunkserver from being overwhelmed.
III. Replicas of a chunk are spread across racks, so that the data survives the failure of an entire rack.

Question # 5: The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal. When a new chunkserver is added into the system, the master mostly uses chunk rebalancing rather than writing new chunks to fill it up. Why?

Solution: The master rebalances gradually. Filling a new chunkserver little by little, instead of immediately directing all new chunks to it, ensures that the chunkserver is not swamped by the heavy write traffic that comes with new chunks, and it balances disk space and load across chunkservers.
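The three placement criteria from Question 4, which also guide where the master moves replicas when it rebalances, can be combined into a simple selection routine. The sketch below is one possible reading, not the master's actual algorithm: the `ChunkServer` record, the `max_recent_creations` threshold, and the way the criteria are weighed against each other (rack diversity first, then lowest disk utilization) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChunkServer:
    name: str
    rack: str
    disk_utilization: float   # fraction of disk in use, 0.0 to 1.0
    recent_creations: int     # chunks created on this server recently


def choose_replica_servers(servers, replicas=3, max_recent_creations=5):
    """Pick servers for new replicas (hypothetical weighting of the criteria)."""
    # Criterion II: skip servers that already had many recent creations.
    candidates = [s for s in servers if s.recent_creations < max_recent_creations]
    # Criterion I: prefer the emptiest disks.
    candidates.sort(key=lambda s: (s.disk_utilization, s.recent_creations))

    chosen, used_racks = [], set()
    # Criterion III: first take at most one server per rack.
    for server in candidates:
        if len(chosen) == replicas:
            break
        if server.rack not in used_racks:
            chosen.append(server)
            used_racks.add(server.rack)
    # If there are fewer racks than replicas, fill up from the remaining servers.
    for server in candidates:
        if len(chosen) == replicas:
            break
        if server not in chosen:
            chosen.append(server)
    return chosen


servers = [
    ChunkServer("a", "rack1", 0.30, 0),
    ChunkServer("b", "rack1", 0.20, 1),
    ChunkServer("c", "rack2", 0.50, 0),
    ChunkServer("d", "rack3", 0.90, 0),
    ChunkServer("e", "rack2", 0.10, 9),   # too many recent creations
]

print([s.name for s in choose_replica_servers(servers)])  # ['b', 'c', 'd']
```

In this example server "e" is skipped despite its nearly empty disk because of its recent creations, and "d" is chosen over the emptier "a" because "a" shares a rack with "b". A different weighting of the criteria would give a different choice.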

Question # 6: After a file is deleted, GFS does not immediately reclaim the available physical storage. It does so only lazily during regular garbage collection at both the file and chunk levels. How are files and chunks deleted? What are the advantages of delayed space reclamation (garbage collection) over eager deletion?

Solution:

Garbage Collection Mechanism:

- When a file is deleted by the application, the master logs the deletion immediately.
- The file is not removed at once. It is renamed to a hidden name that includes the deletion timestamp.
- During the master's regular scan of the file namespace, it removes any such hidden files that have existed for more than three days.
- When a hidden file is removed from the namespace, its in-memory metadata is erased, which cuts its links to all of its chunks.
- At the chunk level, the master identifies chunks that are no longer reachable from any file and erases their metadata. Chunkservers report the chunks they hold in regular heartbeat messages, and the master replies with the chunks that are no longer in its metadata, so the chunkservers can delete those replicas.

Advantages:

- It is simple and reliable. With eager deletion, a replica deletion message may be lost, and the master has to remember to resend it. With garbage collection, logging the delete request is enough, and any replica the master no longer knows about is cleaned up later.
- It merges storage reclamation into the regular background activities of the master, so no separate handling is required.
- It is done only when the master is relatively free. Instant deletion could put too much workload on the master.
- The delay in reclaiming storage provides a safety net against accidental deletion.

References:
1. Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung. The Google File System, Google
2. http://en.wikipedia.org/wiki/google_file_system
3. Google Developers, https://www.youtube.com/watch?v=5eib_h_zcey
4. Dr. Song Jiang, Lecture 7-Part II, ECE 7650 Scalable and Secure Internet Services and Architecture, Winter 2015