The Google File System

The Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Google). Presented by Shivesh Kumar Sharma, fl4164@wayne.edu, Fall 2015, 004395771

Overview: The Google File System is a scalable distributed file system designed mainly for large, distributed, data-intensive applications. GFS provides fault tolerance while delivering high aggregate performance to a large number of clients. Its design was driven by observations of Google's application workloads and storage needs. As deployed at Google, it provides hundreds of terabytes of storage across thousands of machines, accessed by hundreds of clients. In GFS, throughput matters more than latency.

Topics Covered: Introduction; Design Overview (Assumptions, Interface, Architecture, Single Master, Chunk Size, Metadata, In-Memory Data Structures, Chunk Locations, Operation Log); System Interactions (Leases and Mutation Order, Data Flow); Questions Discussed; Conclusions.

Introduction: GFS is designed to meet the rapidly growing demands of Google's data processing needs. Several observations were kept in mind while designing GFS: Component failures are the norm, because the system consists of thousands of storage machines built from inexpensive commodity parts; constant monitoring, error detection, fault tolerance, and automatic recovery must therefore be integral to the system. Multi-GB files are now common, so the block size must be revisited: managing such data in KB-sized blocks is unwieldy even if the system could support it. Most files are mutated by appending rather than overwriting, so caching data blocks in the client loses its appeal. Finally, co-designing the applications and the file system API increases flexibility.

Design Assumptions: The system is built from inexpensive commodity components, so failure rates are high. Large files must be managed efficiently; small files must be supported, but we need not optimize for them. Workloads are dominated by large streaming reads and large sequential appends, and once written, files are seldom modified again. Atomic appends with minimal synchronization overhead are essential for the many clients that append to the same file concurrently. High sustained bandwidth is more important than low latency.

Interface: Files are organized hierarchically in directories and identified by path names. The usual operations are supported: create, delete, open, close, read, and write. GFS also provides two special operations, snapshot and record append.
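
What such a client-facing interface might look like is sketched below. This is purely illustrative: the paper does not publish the client library, so the class, the method names, and the FileHandle type here are assumptions; only the set of operations comes from the slide above.

```python
from abc import ABC, abstractmethod


class FileHandle:
    """Opaque handle returned by open() (hypothetical type)."""
    def __init__(self, path: str):
        self.path = path


class GFSClient(ABC):
    """Hypothetical client interface covering the operations listed above."""

    @abstractmethod
    def create(self, path: str) -> None: ...
    @abstractmethod
    def delete(self, path: str) -> None: ...
    @abstractmethod
    def open(self, path: str) -> FileHandle: ...
    @abstractmethod
    def close(self, handle: FileHandle) -> None: ...
    @abstractmethod
    def read(self, handle: FileHandle, offset: int, length: int) -> bytes: ...
    @abstractmethod
    def write(self, handle: FileHandle, offset: int, data: bytes) -> None: ...

    # The two GFS-specific operations:
    @abstractmethod
    def snapshot(self, src_path: str, dst_path: str) -> None:
        """Create a low-cost copy of a file or directory tree."""
    @abstractmethod
    def record_append(self, handle: FileHandle, data: bytes) -> int:
        """Append data atomically at an offset chosen by GFS; return that offset."""
```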

Architecture: A GFS cluster has a single master and multiple chunkservers, and is accessed by multiple clients. Files are divided into fixed-size chunks of 64 MB. Each chunk is identified by an immutable, globally unique 64-bit chunk handle assigned by the master at chunk creation time. Chunkservers store chunks on local disks as Linux files and read or write chunk data specified by a chunk handle and byte range. For reliability, each chunk is replicated on three chunkservers by default. The master holds all the metadata and controls system-wide activities such as chunk lease management, garbage collection, and chunk migration between chunkservers.
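
As a rough sketch of how these pieces interact on a read, the snippet below turns a (path, byte offset) pair into a chunk index, asks the master for the chunk handle and replica locations, and then fetches the byte range directly from a chunkserver. The find_chunk and read_chunk calls and the pick_replica helper are hypothetical names, not the real RPC interface, and only the single-chunk case is shown for brevity.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks


def to_chunk_index(offset: int) -> tuple[int, int]:
    """Translate a byte offset within a file into (chunk index, offset within that chunk)."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE


def read(master, pick_replica, path: str, offset: int, length: int) -> bytes:
    """Single-chunk read path: one metadata round trip, then data straight from a chunkserver."""
    chunk_index, chunk_offset = to_chunk_index(offset)
    # The master answers with the immutable 64-bit chunk handle and the
    # current replica locations for (path, chunk_index).
    handle, replicas = master.find_chunk(path, chunk_index)
    # File data never flows through the master.
    return pick_replica(replicas).read_chunk(handle, chunk_offset, length)
```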

The master collects each chunkserver's state through periodic HeartBeat messages. Neither the client nor the chunkserver caches file data. Not caching data simplifies the client and the overall system and removes cache coherence issues. Chunkservers do not need to cache file data because chunks are stored as local files, so the Linux buffer cache already keeps frequently accessed data in memory.

Single Master: A single master simplifies the design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge. Master involvement in reads and writes must be minimized so that it does not become a bottleneck. Could the single master still become a bottleneck or a single point of failure? GFS mitigates this with shadow masters, which serve read-only metadata requests and keep the file system readable even when the master is down. Master memory is not a problem either: each 64 MB chunk requires less than 64 bytes of metadata, so the chunks for roughly 200 TB of file data fit in about 200 MB of master memory.
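
As a back-of-the-envelope check of that claim (a rough estimate, using the paper's figure of under 64 bytes of metadata per 64 MB chunk and rounding freely):

```latex
\frac{200\ \text{MB of master memory}}{64\ \text{B of metadata per chunk}} \approx 3\times 10^{6}\ \text{chunks},
\qquad
3\times 10^{6}\ \text{chunks} \times 64\ \text{MB} \approx 200\ \text{TB of file data}.
```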

Chunk Size: Since GFS deals with large files, the chunk size was chosen to be 64 MB, far larger than typical file system block sizes. Lazy space allocation avoids wasting space to internal fragmentation. Advantages of a large chunk size: it reduces client-master interactions, since one metadata request covers many operations on the same chunk, and it reduces network overhead, since the client can keep a persistent TCP connection to the chunkserver. Disadvantage: a chunkserver storing the chunks of small files, perhaps just one chunk each, can become a hot spot when many clients access the same file. GFS fixed this in practice by storing such files, for example a widely needed executable, with a higher replication factor; a potential longer-term solution is to allow clients to read such data from other clients.
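
To make the first advantage concrete, consider a hypothetical client streaming a 1 GB file (the file size is chosen only for illustration):

```latex
\frac{1\ \text{GB file}}{64\ \text{MB per chunk}} = 16\ \text{chunks}
```

so the client needs chunk locations for only 16 chunks, which it can request in a few batched calls to the master and cache for the duration of the read.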

Metadata: The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas. All metadata is kept in the master's memory. The master does not store chunk location information persistently; instead, it asks each chunkserver about its chunks at master startup and whenever a chunkserver joins the cluster. The operation log lets us update the master's state simply and reliably, without risking inconsistency in the event of a master crash.
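
A minimal sketch of those three maps, assuming simplified Python types (the real master is written in C++ and stores namespaces with prefix compression); the field names are illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    # 1. File and chunk namespaces (kept persistent via the operation log).
    namespace: set[str] = field(default_factory=set)                # full path names
    # 2. File -> ordered list of 64-bit chunk handles (also persisted via the log).
    file_to_chunks: dict[str, list[int]] = field(default_factory=dict)
    # 3. Chunk handle -> current replica locations. Not persisted: rebuilt by
    #    polling chunkservers at startup and kept fresh through HeartBeats.
    chunk_locations: dict[int, list[str]] = field(default_factory=dict)
```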

In-Memory Data Structures: Since metadata is stored in memory, master operations are fast. It is also easy and cheap for the master to periodically scan its entire state in the background. This scanning is used for chunk garbage collection, re-replication when a chunkserver fails, and chunk migration to balance load and disk space. One concern is that the capacity of the whole system is limited by how much memory the master has.

Chunk Locations: The master simply polls the chunkservers for their chunks at startup rather than keeping a persistent record, and it monitors chunkserver status afterwards through regular HeartBeat messages.
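
Handling such a report is then just a refresh of the non-persistent location map. The sketch below assumes the map is a plain dict from chunk handle to replica hostnames; the function name is illustrative:

```python
def handle_heartbeat(chunk_locations: dict[int, list[str]],
                     server: str,
                     reported_handles: set[int]) -> None:
    """Refresh the master's replica-location map from one chunkserver's report."""
    # Forget this server for any chunk it no longer reports holding.
    for handle, servers in chunk_locations.items():
        if server in servers and handle not in reported_handles:
            servers.remove(server)
    # Record this server for every chunk it does report.
    for handle in reported_handles:
        servers = chunk_locations.setdefault(handle, [])
        if server not in servers:
            servers.append(server)
```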

Operation Log: The operation log contains a historical record of critical metadata changes and also serves as the logical timeline that defines the order of concurrent operations. The log must be stored reliably, or we could lose the whole file system along with recent client operations. The master recovers its file system state by replaying the operation log, so the log must be kept small to keep startup time short: whenever the log grows beyond a certain size, the master checkpoints its state, switching to a new log file and creating the checkpoint in a separate thread. Recovery needs only the latest complete checkpoint and the subsequent log files, so older ones can be deleted freely.
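
A sketch of that log-then-checkpoint discipline is below. It is heavily simplified and makes several assumptions: the paths, JSON record format, and size threshold are invented for illustration, the checkpoint is written inline rather than in a separate thread, and the remote replication of the log that GFS performs is omitted.

```python
import json
import os

LOG_PATH = "operation.log"          # illustrative paths, not GFS's actual layout
CHECKPOINT_PATH = "checkpoint.json"
MAX_LOG_BYTES = 64 * 1024 * 1024    # checkpoint once the log grows past this


def log_mutation(state: dict, record: dict) -> None:
    """Append a metadata change to the log and force it to disk *before*
    applying it, so it is never visible unless it is also durable."""
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())
    apply_mutation(state, record)
    if os.path.getsize(LOG_PATH) > MAX_LOG_BYTES:
        checkpoint(state)


def checkpoint(state: dict) -> None:
    """Dump the full metadata state; recovery then needs only this checkpoint
    plus the (short) log written after it."""
    with open(CHECKPOINT_PATH, "w") as cp:
        json.dump(state, cp)
    open(LOG_PATH, "w").close()     # start a fresh log; older logs can be discarded


def recover() -> dict:
    """Rebuild master state: load the latest checkpoint, then replay the log."""
    state: dict = {}
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as cp:
            state = json.load(cp)
    if os.path.exists(LOG_PATH):
        with open(LOG_PATH) as log:
            for line in log:
                apply_mutation(state, json.loads(line))
    return state


def apply_mutation(state: dict, record: dict) -> None:
    """Apply one logged change to the in-memory metadata (schema is made up)."""
    if record["op"] == "create":
        state[record["path"]] = []
    elif record["op"] == "add_chunk":
        state[record["path"]].append(record["handle"])
```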

System Interactions: The system is designed so that the master's involvement in every operation is minimized. With that goal in mind, we can describe how the client, master, and chunkservers interact to carry out data mutations. Leases and Mutation Order: A mutation is an operation that changes the contents or metadata of a chunk, such as a write or an append, and it is performed at all of the chunk's replicas.

So what is a lease? Leases are used to maintain a consistent mutation order across replicas. The master grants a chunk lease to one of the replicas, which becomes the primary. The primary picks a serial order for all mutations to the chunk, and all replicas apply mutations in that order. The global mutation order is therefore defined first by the lease grant order chosen by the master, and within a lease by the serial numbers assigned by the primary. This lease mechanism is designed to minimize management overhead at the master.
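
A sketch of that two-level ordering under stated assumptions: the 60-second lease term comes from the paper, while the class names, the trivial "first replica becomes primary" policy, and the in-memory bookkeeping are all invented for illustration.

```python
import time

LEASE_SECONDS = 60  # initial lease timeout used in the paper


class Master:
    """Coarse ordering: the master decides who holds the lease for each chunk."""

    def __init__(self):
        self.leases: dict[int, tuple[str, float]] = {}  # handle -> (primary, expiry)

    def grant_lease(self, handle: int, replicas: list[str]) -> str:
        primary, expiry = self.leases.get(handle, ("", 0.0))
        if not primary or time.time() >= expiry:
            primary = replicas[0]          # placement policy elided
        self.leases[handle] = (primary, time.time() + LEASE_SECONDS)
        return primary


class Primary:
    """Fine ordering: the lease holder serializes all mutations to its chunk."""

    def __init__(self):
        self.next_serial = 0

    def order_mutation(self, mutation: bytes) -> int:
        # Every replica applies mutations in increasing serial-number order.
        serial = self.next_serial
        self.next_serial += 1
        return serial
```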

Even if the master loses communication with a primary, it can safely grant a new lease to another replica after the old lease expires.

Data Flow: The flow of data is decoupled from the flow of control so that the network is used efficiently. Data is pushed linearly along a chain of chunkservers, which lets each machine use its full outbound network bandwidth. Latency is minimized by pipelining the data transfer over TCP connections: a chunkserver starts forwarding data as soon as it begins receiving it. Ignoring network congestion, the ideal elapsed time for transferring B bytes to R replicas is B/T + R*L, where T is the network throughput and L is the latency between two machines.
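
Plugging in the example numbers from the paper (100 Mbps links, per-hop latency well under 1 ms) for B = 1 MB pushed to R = 3 replicas:

```latex
\text{elapsed time} = \frac{B}{T} + R\,L
= \frac{8\ \text{Mbit}}{100\ \text{Mbps}} + 3 \times (<\!1\ \text{ms})
\approx 80\ \text{ms},
```

so with pipelining the replication factor adds only a few milliseconds rather than multiplying the transfer time by R.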

Questions by Professor: 1. "Its design has been driven by key observations of our application workloads and technological environment." What workload and technology characteristics did GFS assume in its design, and what are the corresponding design choices? Answer: covered in slides 4 and 5. 2. "Caching data blocks in the client loses its appeal," and GFS does not cache file data. Why does this design choice not lead to performance loss, and what benefit does it bring? Answer: Neither the client nor the chunkserver caches file data. Large streaming reads gain little from caching, because each block is typically read once and would be evicted before it could be reused. The client fetches data directly from the chunkservers, which keeps interactions with the master to a minimum, removes cache coherence problems, and simplifies the overall system.

3. "Small files must be supported, but we need not optimize for them." Why? Think of a scenario where this assumption about workloads is not valid. Answer: The assumption breaks down when the workload contains many small files, each occupying a single chunk, that are spread across different chunkservers and accessed heavily: if many clients read the same small file at the same time, its chunkserver becomes a hot spot and load balancing becomes an issue. 4. "Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers." How does this design help improve the system's performance? Answer: Clients contact the master only for metadata; reading and writing file data goes directly to the chunkservers. In addition, clients cache and prefetch metadata, which further reduces client-master interactions. This keeps the load on the master low and improves overall efficiency, while the master remains the central point of management and is involved only in control messages.

5. A GFS cluster consists of a single master. What is the benefit of having only a single master? What is its potential performance risk, and how does GFS minimize that risk? Answer: covered in slide 9. 6. "Each chunk replica is stored as a plain Linux file on a chunkserver and is extended only as needed." How does GFS use the chunkserver's local file system to store chunks, and what is lazy space allocation and its benefit? Answer: Each replica is an ordinary Linux file, and GFS spreads replicas across chunkservers to balance load and disk usage. With lazy space allocation, physical disk space is not reserved up front for the full chunk size (64 MB in GFS's case); the backing file is extended only as data is actually written to it. Delaying allocation this way minimizes internal fragmentation, i.e., unused portions of a 64 MB chunk, while keeping the benefits of the large logical chunk size.
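
A tiny demonstration of the idea, using only the standard library: a chunk stored as an ordinary local file occupies just the bytes written so far, not the full 64 MB. The file name and the overflow check are invented for the example.

```python
import os

CHUNK_SIZE = 64 * 1024 * 1024  # logical chunk size


def append_to_chunk(chunk_path: str, data: bytes) -> None:
    """Extend the chunk's backing file only as far as the data actually written."""
    current = os.path.getsize(chunk_path) if os.path.exists(chunk_path) else 0
    if current + len(data) > CHUNK_SIZE:
        raise ValueError("record would overflow the 64 MB chunk")
    with open(chunk_path, "ab") as f:
        f.write(data)


append_to_chunk("chunk_0x1234", b"x" * 1024)
print(os.path.getsize("chunk_0x1234"))  # -> 1024 bytes on disk, not 64 MB
```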

7. On the other hand, a large chunk size, even with lazy space allocation, has its disadvantages. Give an example. Answer: covered in slide 10 (hot spots on small, heavily accessed files). 8. "One potential concern for this memory-only approach is that the number of chunks, and hence the capacity of the whole system, is limited by how much memory the master has." Why is GFS's master able to keep the metadata in memory? Answer: The metadata is tiny relative to the data it describes: each 64 MB chunk requires less than 64 bytes of metadata, so the chunks for roughly 200 TB of file data fit in about 200 MB of master memory, and adding memory to the master is cheap compared to the capacity it buys. 9. "Since the operation log is critical, we must store it reliably and not make changes visible to clients until metadata changes are made persistent." Explain the role of the log. Answer: explained in slide 14.

Conclusions: GFS demonstrates how to support large-scale processing workloads on commodity hardware. It is designed to tolerate frequent component failures and is optimized for huge files that are mostly appended to and then read sequentially. Simple solutions, such as a single master, were chosen wherever possible. Finally, GFS has met Google's storage needs, so it must be good.