The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Google)
Presented by 정학수, 최주영

Outline
Introduction
Design Overview
System Interactions
Master Operation
Fault Tolerance and Diagnosis
Conclusions

Introduction
GFS was designed to meet the demands of Google's data processing needs. The design emphasizes three observations: component failures are the norm, files are huge by traditional standards, and most files are mutated by appending new data rather than overwriting existing data.

DESIGN OVERVIEW

Assumptions
The system is built from inexpensive commodity components that often fail
It stores a modest number of large files, typically 100 MB or larger
Workloads consist of large streaming reads and small random reads
Writes are large and sequential, appending data to files
Well-defined atomic append semantics with minimal synchronization overhead are essential
High sustained bandwidth is more important than low latency

Interface
Files are organized hierarchically in directories and identified by pathnames.

Operation       Function
Create          Create a file
Delete          Delete a file
Open            Open a file
Close           Close a file
Read            Read a file
Write           Write a file
Snapshot        Create a copy of a file or a directory tree
Record append   Allow multiple clients to append data to the same file
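
As a rough illustration of this interface, the sketch below writes it out as a hypothetical Python client API; the class name, method signatures, and handle type are assumptions for illustration, not the real GFS client library.

    from abc import ABC, abstractmethod

    class GFSClient(ABC):
        """Hypothetical client interface mirroring the operations listed above."""

        @abstractmethod
        def create(self, path: str) -> None: ...            # create a file
        @abstractmethod
        def delete(self, path: str) -> None: ...            # delete a file
        @abstractmethod
        def open(self, path: str) -> int: ...                # open a file, return a handle
        @abstractmethod
        def close(self, handle: int) -> None: ...            # close a file
        @abstractmethod
        def read(self, handle: int, offset: int, length: int) -> bytes: ...
        @abstractmethod
        def write(self, handle: int, offset: int, data: bytes) -> None: ...
        @abstractmethod
        def snapshot(self, src: str, dst: str) -> None: ...  # copy a file or directory tree
        @abstractmethod
        def record_append(self, handle: int, data: bytes) -> int: ...  # returns the offset GFS chose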

Architecture
A GFS cluster consists of a single master and multiple chunkservers, accessed by many clients. GFS is designed for system-to-system interaction, not for user-to-system interaction.

Single Master
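
The single-master design works because clients contact the master only for metadata and never for file data. Below is a minimal sketch of that read path, assuming plain dictionaries stand in for the master's chunk-location table and for chunkserver storage; all names are illustrative.

    CHUNK_SIZE = 64 * 1024 * 1024

    class Client:
        def __init__(self, master, chunkservers):
            self.master = master               # (filename, chunk index) -> (chunk handle, [chunkservers])
            self.chunkservers = chunkservers   # (chunkserver, chunk handle) -> chunk contents
            self.cache = {}                    # client-side cache of chunk locations

        def read(self, filename, offset, length):
            index = offset // CHUNK_SIZE
            key = (filename, index)
            if key not in self.cache:          # only this metadata lookup goes to the master
                self.cache[key] = self.master[key]
            handle, servers = self.cache[key]
            chunk = self.chunkservers[(servers[0], handle)]   # file data flows from a chunkserver
            start = offset - index * CHUNK_SIZE
            return chunk[start:start + length]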

Chunk Size
Large chunk size: 64 MB
Advantages: reduces client-master interaction, reduces network overhead, and reduces the size of the master's metadata
Disadvantage: hot spots can form when many clients access the same small file
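
One way to see why the large chunk size reduces client-master interaction: with fixed 64 MB chunks a client maps a byte offset to a chunk index locally, so a long sequential read needs at most one location lookup per 64 MB. A small sketch (the helper name is illustrative):

    CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB

    def chunk_index(byte_offset: int) -> int:
        """Index of the chunk that holds the given byte offset within a file."""
        return byte_offset // CHUNK_SIZE

    # Example: reading 1 GB sequentially touches only 16 chunks,
    # so at most 16 chunk-location requests reach the master.
    print(chunk_index(1 * 1024**3 - 1) + 1)   # -> 16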

Metadata
All metadata is kept in the master's memory
Less than 64 bytes of metadata per chunk
Three types: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas
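
A back-of-the-envelope check that keeping all metadata in memory is feasible, using the figures from this slide (64 MB chunks, under 64 bytes of chunk metadata); the 1 PB workload size is just an illustrative assumption:

    CHUNK_SIZE = 64 * 1024**2          # 64 MB chunks
    METADATA_PER_CHUNK = 64            # bytes, the upper bound from the slide

    total_data = 1024**5               # assume 1 PB of file data
    chunks = total_data // CHUNK_SIZE  # ~16.8 million chunks
    metadata_bytes = chunks * METADATA_PER_CHUNK
    print(metadata_bytes / 1024**3)    # -> 1.0 GB of master memory, easily held in RAM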

Metadata (Cont'd)
In-memory data structures: master operations are fast, and periodic scans of the entire state are easy and efficient
Operation log: contains a historical record of critical metadata changes and is replicated on multiple remote machines; the master responds to a client only after the corresponding log record has been flushed
Recovery is performed by replaying the operation log
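
The operation-log discipline can be sketched as a tiny write-ahead log: the master appends (and, in real GFS, replicates) the log record before applying the change and answering the client, and rebuilds its in-memory state by replaying the log on restart. The record format and state below are hypothetical simplifications, and checkpointing is omitted.

    import json

    class MasterLog:
        def __init__(self, path="oplog.jsonl"):
            self.path = path
            self.namespace = {}                    # filename -> list of chunk handles

        def _apply(self, rec):
            if rec["op"] == "create":
                self.namespace[rec["file"]] = []
            elif rec["op"] == "add_chunk":
                self.namespace[rec["file"]].append(rec["chunk"])

        def mutate(self, rec):
            with open(self.path, "a") as f:        # 1. durably log the change first
                f.write(json.dumps(rec) + "\n")    #    (real GFS also flushes it to remote replicas)
                f.flush()
            self._apply(rec)                       # 2. only then apply and respond to the client

        def recover(self):
            self.namespace = {}                    # rebuild state by replaying the log
            try:
                with open(self.path) as f:
                    for line in f:
                        self._apply(json.loads(line))
            except FileNotFoundError:
                pass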

Consistency Model
Consistent: all clients will always see the same data, regardless of which replicas they read from
Defined: consistent, and clients will see what the mutation wrote in its entirety
Inconsistent: different clients may see different data at different times

SYSTEM INTERACTIONS

Leases and Mutation Order
Leases maintain a consistent mutation order across replicas while minimizing the master's management overhead
The master grants a lease to one of the replicas, which becomes the primary
The primary picks a serial order for all mutations to the chunk
All replicas follow this order when applying mutations

Leases and Mutation Order (Cont'd)
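
A minimal sketch of the lease mechanism described above, with hypothetical Master and Primary classes. The 60-second initial lease duration is the figure used in the GFS paper; lease extensions (piggybacked on heartbeats) and revocation are omitted.

    import time

    LEASE_SECONDS = 60                         # initial lease timeout from the paper

    class Master:
        def __init__(self):
            self.leases = {}                   # chunk handle -> (primary, expiry time)

        def grant_lease(self, chunk, replicas):
            primary, expiry = self.leases.get(chunk, (None, 0.0))
            if time.time() >= expiry:          # no valid lease: pick a replica as the new primary
                primary = replicas[0]
                self.leases[chunk] = (primary, time.time() + LEASE_SECONDS)
            return primary

    class Primary:
        def __init__(self):
            self.next_serial = 0

        def order(self, mutation):
            serial = self.next_serial          # the primary picks the serial order;
            self.next_serial += 1              # all replicas apply mutations in this order
            return serial, mutation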

Data Flow
Fully utilize network bandwidth: control flow and data flow are decoupled
Avoid network bottlenecks and high-latency links: each machine forwards the data to the closest machine that has not yet received it
Minimize latency: the data transfer is pipelined
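
The benefit of pipelining can be quantified with the estimate from the GFS paper: pushing B bytes through a chain of R replicas over links of throughput T with per-hop latency L takes roughly B/T + R*L, because each chunkserver forwards data as soon as it starts receiving it. With the paper's example figures:

    def pipeline_time(nbytes, link_bps, replicas, hop_latency_s):
        """Ideal elapsed time B/T + R*L for a pipelined replica chain."""
        return (nbytes * 8) / link_bps + replicas * hop_latency_s

    # 1 MB over 100 Mbps links to 3 replicas with 1 ms per hop:
    print(pipeline_time(1_000_000, 100e6, 3, 0.001))   # ~0.083 s, i.e. about 80 ms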

Atomic Record Appends
Record append is an atomic append operation: the client specifies only the data, and GFS appends it at an offset of GFS's choosing and returns that offset to the client
Many clients can append to the same file concurrently; such files often serve as multiple-producer/single-consumer queues or contain merged results from many clients
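
A minimal sketch of the record-append contract at the primary: the client supplies only the data, the primary chooses the offset within the current chunk (or pads the chunk and asks the client to retry on a new chunk if the record would not fit), and that offset is applied at all replicas and returned to the client. The class below is an illustrative simplification.

    CHUNK_SIZE = 64 * 1024 * 1024

    class PrimaryChunk:
        def __init__(self):
            self.used = 0                      # bytes already appended to this chunk

        def record_append(self, data: bytes):
            if self.used + len(data) > CHUNK_SIZE:
                self.used = CHUNK_SIZE         # pad to the chunk boundary;
                return None                    # the client retries on the next chunk
            offset = self.used                 # the primary picks the offset ...
            self.used += len(data)             # ... and all replicas write at exactly this offset
            return offset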

Snapshot
Makes a copy of a file or a directory tree
Implemented with standard copy-on-write techniques
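
Copy-on-write snapshots can be sketched as reference-count bookkeeping at the master: the snapshot duplicates only metadata and bumps the reference count of each shared chunk; a chunk is physically copied only when it is next written while still shared. The data structures and handle generator below are illustrative.

    import itertools

    _new_handles = itertools.count(1000)        # hypothetical source of fresh chunk handles

    class CowMaster:
        def __init__(self):
            self.files = {}                     # path -> list of chunk handles
            self.refcount = {}                  # chunk handle -> reference count

        def snapshot(self, src, dst):
            self.files[dst] = list(self.files[src])          # share the same chunks
            for c in self.files[src]:
                self.refcount[c] = self.refcount.get(c, 1) + 1

        def before_write(self, path, index):
            c = self.files[path][index]
            if self.refcount.get(c, 1) > 1:                  # still shared: copy on first write
                new_c = next(_new_handles)
                self.refcount[c] -= 1
                self.refcount[new_c] = 1
                self.files[path][index] = new_c              # chunkservers copy the data locally
            return self.files[path][index]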

MASTER OPERATION

Namespace Management and Locking
Namespace: a lookup table mapping full pathnames to metadata
Locking: to ensure proper serialization, multiple operations can be active at once, each acquiring locks over regions of the namespace
This allows concurrent mutations in the same directory
Deadlock is prevented by acquiring locks in a consistent total order
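
The locking rule can be sketched as a function that computes the lock set for an operation: read locks on every ancestor directory name plus a read or write lock on the full pathname, acquired in one consistent order (sorted here purely by pathname for simplicity). Because the directory itself is only read-locked, concurrent file creations in the same directory are allowed.

    def lock_plan(path: str, write: bool):
        parts = path.strip("/").split("/")
        ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
        plan = [(p, "read") for p in ancestors] + [(path, "write" if write else "read")]
        return sorted(plan)                      # consistent total order prevents deadlock

    # Creating /home/user/foo: read locks on /home and /home/user, write lock on the new file.
    print(lock_plan("/home/user/foo", write=True))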

Replica Placement
Goals: maximize data reliability and availability, and maximize network bandwidth utilization
Spread replicas across machines
Spread chunk replicas across racks
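
A toy version of rack-aware placement, shown below: pick chunkservers round-robin across racks so that no single rack holds every replica. The rack map and selection policy are illustrative; the real policy also weighs factors such as disk utilization and recent chunk creations.

    def place_replicas(servers_by_rack: dict, n: int):
        """Pick up to n chunkservers, cycling across racks."""
        picks, racks = [], list(servers_by_rack.items())
        i = 0
        while len(picks) < n and any(servers for _, servers in racks):
            rack, servers = racks[i % len(racks)]
            if servers:
                picks.append(servers.pop(0))
            i += 1
        return picks

    print(place_replicas({"rackA": ["cs1", "cs2"], "rackB": ["cs3"], "rackC": ["cs4"]}, 3))
    # -> ['cs1', 'cs3', 'cs4']: one replica per rack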

Creation, Re-replication, Rebalancing
Creation: replica placement is chosen when a chunk is demanded by writers
Re-replication: triggered when the number of available replicas falls below a user-specified goal
Rebalancing: replicas are moved periodically for better disk space and load balancing

Garbage Collection
Lazy reclamation: the deletion is logged immediately, the file is renamed to a hidden name carrying the deletion timestamp, and it is removed only if it is still hidden three days later; until then it can be undeleted by renaming it back
Regular scan: heartbeat messages exchanged with each chunkserver identify orphaned chunks, and their metadata is erased
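
A minimal sketch of the lazy-reclamation scheme: deletion renames the file to a hidden name stamped with the deletion time, and a background sweep removes it only after the three-day grace period. The hidden-name convention and in-memory namespace here are hypothetical.

    import time

    GRACE_SECONDS = 3 * 24 * 3600                    # three days, as on the slide

    def hide(namespace: dict, path: str):
        hidden = f"{path}.deleted.{int(time.time())}"
        namespace[hidden] = namespace.pop(path)      # still restorable by renaming back

    def sweep(namespace: dict, now=None):
        now = now if now is not None else time.time()
        for name in list(namespace):
            if ".deleted." in name and now - int(name.rsplit(".", 1)[1]) > GRACE_SECONDS:
                del namespace[name]                  # orphaned chunks are reclaimed afterwards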

Stale Replica Detection
The master maintains a version number for each chunk
Stale replicas are detected by comparing version numbers
Stale replicas are removed during the regular garbage collection
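
Version checking can be sketched in a few lines: the master increments a chunk's version number whenever it grants a new lease, so a replica whose chunkserver was down during a mutation reports an older version and is flagged as stale. The class below is illustrative.

    class VersionTracker:
        def __init__(self):
            self.version = {}                        # chunk handle -> latest version number

        def grant_lease(self, chunk):
            self.version[chunk] = self.version.get(chunk, 0) + 1
            return self.version[chunk]

        def is_stale(self, chunk, reported_version):
            # Reported by the chunkserver; stale replicas are left for garbage collection.
            return reported_version < self.version.get(chunk, 0)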

FAULT TOLERANCE AND DIAGNOSIS

High Availability
Fast recovery: the master and chunkservers restore their state and start in seconds
Chunk replication: different replication levels can be set for different parts of the file namespace; the master clones existing replicas as chunkservers go offline or as corrupted replicas are detected through checksum verification

High Availability
Master replication: the operation log and checkpoints are replicated on multiple machines; if the master machine or its disk fails, monitoring infrastructure outside GFS starts a new master process
Shadow masters: provide read-only access while the primary master is down

Data Integrity
Checksums are used to detect corruption: one checksum for every 64 KB block in each chunk, kept in memory and stored persistently with logging
Read: the chunkserver verifies the checksums of the blocks covered by the read before returning any data
Write (record append): incrementally update the checksum for the last partial block and compute new checksums for any new blocks

Data Integrity (Cont'd)
Write (overwrite): read and verify the first and last blocks of the range being overwritten, then perform the write and compute and record the new checksums
During idle periods, chunkservers scan and verify inactive chunks
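
A minimal sketch of per-block checksumming on a chunkserver, assuming zlib.crc32 stands in for GFS's checksum function: each 64 KB block keeps its own checksum, reads verify every block they touch before returning data, and appends recompute checksums only for the blocks they touch (the real system updates the last partial block incrementally).

    import zlib

    BLOCK = 64 * 1024                                   # checksum granularity from the slide

    class ChecksummedChunk:
        def __init__(self):
            self.data = bytearray()
            self.crcs = []                              # one CRC per 64 KB block

        def append(self, payload: bytes):
            first_dirty = len(self.data) // BLOCK       # last (possibly partial) block is rewritten
            self.data.extend(payload)
            last = (len(self.data) - 1) // BLOCK
            del self.crcs[first_dirty:]
            for i in range(first_dirty, last + 1):
                block = bytes(self.data[i * BLOCK:(i + 1) * BLOCK])
                self.crcs.append(zlib.crc32(block))

        def read(self, offset: int, length: int) -> bytes:
            first, last = offset // BLOCK, (offset + length - 1) // BLOCK
            for i in range(first, last + 1):            # verify before returning any data
                block = bytes(self.data[i * BLOCK:(i + 1) * BLOCK])
                if zlib.crc32(block) != self.crcs[i]:
                    raise IOError(f"checksum mismatch in block {i}: read from another replica")
            return bytes(self.data[offset:offset + length])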

MEASUREMENTS

Micro-benchmarks
GFS cluster: 1 master, 2 master replicas, 16 chunkservers, and 16 clients
The server machines are connected to one switch and the client machines to another; the two switches are connected by a 1 Gbps link
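
One implication of this topology, worth noting for the next slide: traffic between clients and servers must cross the single 1 Gbps inter-switch link, which caps aggregate read bandwidth at about 125 MB/s regardless of how many chunkservers there are. Simple unit conversion:

    inter_switch_bps = 1 * 10**9            # 1 Gbps link between the two switches
    print(inter_switch_bps / 8 / 10**6)     # -> 125.0 MB/s aggregate ceiling across the link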

Micro-benchmarks
Figure 3: Aggregate throughputs. The top curves show the theoretical limits imposed by the network topology; the bottom curves show measured throughputs, with error bars for 95% confidence intervals (illegible in some cases because of low variance in the measurements).

Real World Clusters
Table 2: Characteristics of two GFS clusters

Real World Clusters
Table 3: Performance metrics for the two GFS clusters

Real World Clusters
Recovery experiments in cluster B:
Killed a single chunkserver holding 15,000 chunks (600 GB of data); all chunks were restored in 23.2 minutes, at an effective replication rate of 440 MB/s
Killed two chunkservers, each holding about 16,000 chunks (660 GB of data); 266 chunks were left with only a single replica, were re-replicated at higher priority, and were restored within 2 minutes
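
A quick arithmetic check of the first experiment, to show where the reported effective replication rate comes from:

    restored_mb = 600 * 1024          # roughly 600 GB of chunk data, in MB
    seconds = 23.2 * 60               # 23.2 minutes
    print(restored_mb / seconds)      # -> ~441 MB/s, matching the reported ~440 MB/s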

Conclusions
GFS demonstrates the qualities essential for supporting large-scale data processing workloads:
Treats component failures as the norm rather than the exception
Optimizes for huge files that are mostly appended to and then read sequentially
Extends and relaxes the standard file system interface
Provides fault tolerance through constant monitoring, replication of crucial data, and fast, automatic recovery
Uses checksums to detect data corruption
Delivers high aggregate throughput to many concurrent readers and writers