CSE 124: Networked Services Fall 2009 Lecture-19

CSE 124: Networked Services, Fall 2009, Lecture 19
Instructor: B. S. Manoj, Ph.D.
http://cseweb.ucsd.edu/classes/fa09/cse124
Some of these slides are adapted from various sources and individuals, including but not limited to images and text from IEEE/ACM digital libraries, Google File System documentation, and publicly available information gathered through the Google search engine. Use of these slides for anything other than the pedagogical purposes of CSE 124 may require explicit permission from the respective sources.

Announcements
- Programming Project 2 presentation/demo is scheduled for 12/03/2009.
  - Each team gets 10 minutes for presentation, demo, and Q&A.
  - The presentation should be fewer than 10 slides and should cover the purpose of the project, how the project functions, the challenges you faced, the problems you solved, and the expected commercial potential of your project.
  - The demo either follows the presentation or can run alongside it.
- Finals schedule: Thursday during finals week, 3:00pm-5:59pm, CENTER 218. Topic details will be announced soon.

Google File System
- A scalable distributed file system for large, distributed, data-intensive applications.
- Widely deployed within Google.
- Scalability: hundreds of terabytes, thousands of disks, thousands of machines.
- Main benefits: fault tolerance while running on commodity hardware, and high aggregate performance.

Why GFS?
- Component failures are common: application bugs, OS bugs, human errors, and failures of disks, memory, connectors, networking, or power.
- File sizes are huge: multi-GB files are common and TB-scale files are expected, so I/O operations and block sizes have to be reconsidered.
- Most files are appended to rather than overwritten: most operations append new data, overwrites are rare, and random writes within files are mostly non-existent. Typical workloads are large repositories scanned by data-analysis programs, data streams generated by continuously running programs, and archival data.
- Co-designing the file system with its applications is far more effective: API design must consider the application. Atomic append lets multiple clients concurrently append to the same file, which is useful for clusters of thousands of nodes.

Design objectives of GFS
- Run on inexpensive commodity hardware.
- Store a modest number of large files: a few million files, each 100 MB or even multi-GB in size. Small files must also be supported.
- Support two kinds of reads: large streaming reads (1 MB or more) and small random reads (a few KB), which may be batched and sorted when there are many of them.
- Support many large sequential writes, similar in size to the reads. Written files are seldom modified. Small writes must be supported, possibly with lower efficiency.
- Support concurrent appends to the same file by multiple clients.
- Provide high sustained throughput.

GFS API
- Similar to the standard POSIX file API: supports the usual create, delete, open, close, read, and write operations.
- Additional interfaces:
  - Snapshot: creates a copy of a file or directory tree very efficiently.
  - Record append: allows multiple clients to append to the same file simultaneously, with the atomicity of each append guaranteed. (A sketch of such an API surface follows below.)
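
For concreteness, here is a minimal sketch of what a GFS-like client-facing API could look like. The names and signatures are illustrative assumptions; the actual GFS client library is internal to Google and is not specified beyond the operations listed above.

```python
# Hypothetical sketch of a GFS-like client API surface (illustrative names only).

class GFSFile:
    def read(self, offset: int, length: int) -> bytes: ...
    def write(self, offset: int, data: bytes) -> None: ...
    def record_append(self, data: bytes) -> int:
        """Append atomically; returns the offset GFS chose for the record."""
    def close(self) -> None: ...

class GFSClient:
    def create(self, path: str) -> None: ...
    def delete(self, path: str) -> None: ...
    def open(self, path: str) -> GFSFile: ...
    def snapshot(self, src_path: str, dst_path: str) -> None:
        """Cheap copy of a file or directory tree (copy-on-write of chunks)."""
```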

GFS architecture (architecture diagram slide: GFS clients, a single GFS master, and multiple chunkservers)

GFS Master
- Single-master design, chosen to simplify the original design.
- Maintains: the namespace for files as well as chunks, access control information, the mapping from files to chunks, and the current locations of chunks.
- Responsibilities: mapping files to chunks, chunk lease management, garbage collection, and chunk migration between chunkservers.
- Scalability of the single-master design: at several petabytes of data, the metadata and its processing load become a problem. GFS later evolved to multiple GFS masters over a shared collection of chunkservers, with up to 8 masters placed over one pool of chunkservers. (A sketch of the master's core metadata follows below.)
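
A minimal sketch of the metadata the master keeps in memory, expressed as hypothetical Python structures (the real master also persists the namespace and file-to-chunk mapping in an operation log with checkpoints, while replica locations are rebuilt from chunkserver heartbeats):

```python
# Illustrative master metadata structures (not actual GFS code).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChunkInfo:
    handle: int                                         # globally unique chunk handle
    version: int = 0                                    # used to detect stale replicas
    replicas: List[str] = field(default_factory=list)   # chunkserver addresses (not persisted)

@dataclass
class FileInfo:
    chunks: List[int] = field(default_factory=list)     # chunk handles, in file order

class MasterMetadata:
    def __init__(self) -> None:
        self.namespace: Dict[str, FileInfo] = {}         # full pathname -> file metadata
        self.chunk_table: Dict[int, ChunkInfo] = {}      # chunk handle -> chunk metadata

    def chunk_for(self, path: str, chunk_index: int) -> ChunkInfo:
        """Resolve (file, chunk index) to a chunk handle and its replica locations."""
        handle = self.namespace[path].chunks[chunk_index]
        return self.chunk_table[handle]
```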

Master design
- The master sends instructions to chunkservers, for example to delete a given chunk or to create a new chunk.
- Periodic communication between the master and each chunkserver keeps state information up to date: Is the chunkserver down? Are there disk failures on the chunkserver? Are any replicas corrupted? Which chunk replicas does the chunkserver store?

Master bottleneck
- The master is typically fast, since the metadata is small: less than 64 bytes per file name, and about 64 bytes of metadata per 64 MB chunk.
- A single master therefore worked well in early designs, when files and data sets were smaller.
- As data and file counts grew (for example, Gmail's many files), the metadata became too large for the master's limited memory, and the master became a bottleneck. (See the back-of-the-envelope estimate below.)
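
As a rough back-of-the-envelope check (my arithmetic, not from the slides): at roughly 64 bytes of chunk metadata per 64 MB chunk, each petabyte of stored data needs on the order of a gigabyte of chunk metadata, which explains why a single in-memory master was viable early on but stops scaling as data and file counts grow.

```python
# Back-of-the-envelope estimate of master chunk metadata (illustrative only).
CHUNK_SIZE = 64 * 2**20      # 64 MB per chunk
META_PER_CHUNK = 64          # ~64 bytes of metadata per chunk

def chunk_metadata_bytes(total_data_bytes: int) -> int:
    """Approximate in-memory chunk metadata for a given amount of stored data."""
    num_chunks = total_data_bytes // CHUNK_SIZE
    return num_chunks * META_PER_CHUNK

one_pib = 2**50
print(chunk_metadata_bytes(one_pib) / 2**30, "GiB of chunk metadata per PiB of data")
# -> 1.0 GiB, ignoring per-file metadata and replication bookkeeping
```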

Chunks and chunkservers
- A chunk is analogous to a disk block, but very large: 64 MB. Each chunk is stored as a file on a chunkserver.
- A chunk handle (roughly, the chunk's file name) is used to locate it.
- Each chunk is replicated across multiple chunkservers, with a minimum of three replicas.
- Chunkservers store chunks but do not cache them.
- Large chunk size, pros: reduces the number of client-master interactions, allows a persistent TCP connection between client and chunkserver, and reduces the size of the metadata held by the master.
- Large chunk size, cons: can create hotspots, and is inefficient for storing smaller files (for example, Gmail files).

Client-chunkserver interactions (read)
- A read request originates in the application, and the GFS client receives it.
- The GFS client translates the request from <file name, byte offset> to <file name, chunk index>; this is straightforward because chunk sizes are fixed (64 MB).
- The GFS client queries the master with <file name, chunk index>; the master identifies the chunk handle and the locations of the chunkservers holding replicas.
- The GFS client requests the chunk data from one of the chunkservers, usually the nearest one.
- The chunkserver sends the requested data to the client, and the GFS client forwards it to the application. (A sketch of this flow follows below.)
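
A minimal sketch of the client-side read path just described. The helper functions are placeholders for the master and chunkserver RPCs (hypothetical names, not the real client library), and it assumes the read does not cross a chunk boundary.

```python
# Sketch of the GFS read path on the client side (illustrative, not actual GFS code).

CHUNK_SIZE = 64 * 2**20      # fixed 64 MB chunks, so byte offsets map directly to chunk indices

def query_master(path, chunk_index):
    """Placeholder master RPC: returns (chunk_handle, [replica addresses])."""
    raise NotImplementedError

def pick_closest(replicas):
    """Placeholder: choose the 'nearest' replica, e.g. one on the same rack."""
    return replicas[0]

def read_from_chunkserver(addr, handle, chunk_offset, length):
    """Placeholder chunkserver RPC: read `length` bytes of chunk `handle`."""
    raise NotImplementedError

def gfs_read(path, offset, length):
    chunk_index = offset // CHUNK_SIZE       # 1. translate byte offset -> chunk index
    chunk_offset = offset % CHUNK_SIZE
    handle, replicas = query_master(path, chunk_index)   # 2. control traffic to the master only
    # 3. data traffic goes directly to a nearby chunkserver; the master is not on the data path
    return read_from_chunkserver(pick_closest(replicas), handle, chunk_offset, length)
```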

Example of client-master interaction (diagram)
1. The application (a search indexer) asks the GFS client to read file crawl_index_99 at offset 2048 bytes.
2. The GFS client sends <file name: crawl_index_99, chunk index: 3> to the GFS master.
3. The master looks up crawl_index_99's chunk list: Chunk_001 (replicas R1, R5, R8), Chunk_002 (R8, R4, R6), Chunk_003 (R4, R3, R2). It replies with Chunk_003 and its chunkservers R4, R3, R2.

Example of client-chunkserver interaction (diagram, continuing the read)
4. The GFS client requests <Chunk_003, 2048 bytes> from one of the replica chunkservers (R4, R3, R2).
5. The chosen chunkserver returns the 2048 bytes of data.
6. The GFS client passes the 2048 bytes back to the application.

Example of a write operation (diagram)
1. The application (a search indexer) asks the GFS client to write DATA to file crawl_index_99.
2. The GFS client sends <file name: crawl_index_99, chunk index> to the GFS master.
3. The master replies with the chunk handle and the primary and secondary replica information.

Example write, continued (diagram)
4. The GFS client pushes DATA to all replicas: the primary chunkserver and both secondary chunkservers each hold the data in a buffer, without yet applying it to the chunk.

Example write, continued (diagram)
5. Once the data is buffered at the replicas, the GFS client sends a Write request to the primary chunkserver.
6. The primary assigns an order to the buffered data (D1, D2, D3) and applies it to its local chunk.
7. The primary forwards the Write request to the secondary chunkservers, which apply the buffered data in the same order.

Example write, completed (diagram)
8. The secondary chunkservers send responses to the primary once they have applied the data to their chunks.
9. The primary sends a response back to the GFS client.

Write control and data flow in GFS
- Figure from the original GFS publication.
- The data flow need not be one-to-many: data can be pushed along a chain of chunkservers to make better use of each machine's network bandwidth, taking the location of primaries and secondaries into account.
- Control flow and data flow are separated. (A sketch of the full write path follows below.)
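
A minimal sketch of the write path shown in the figure, with hypothetical helper names standing in for the RPCs. It highlights the separation of data flow (pushed along a chain of replicas) from control flow (ordered by the primary).

```python
# Sketch of the GFS write path, client side (illustrative, not actual GFS code).

CHUNK_SIZE = 64 * 2**20

def query_master_for_lease(path, chunk_index):
    """Placeholder master RPC: returns (chunk_handle, primary, [secondaries])."""
    raise NotImplementedError

def push_data_along_chain(data, replica_chain):
    """Data flow: send the data to the first replica, which pipelines it to the
    next, and so on, so that each machine's outbound bandwidth is used fully.
    Each replica only buffers the data at this point. Placeholder."""
    raise NotImplementedError

def send_write_request(primary, handle, chunk_offset):
    """Control flow: ask the primary to commit the buffered data. The primary
    picks a serial order, applies it locally, forwards the order to the
    secondaries, collects their replies, and responds. Placeholder."""
    raise NotImplementedError

def gfs_write(path, offset, data):
    chunk_index = offset // CHUNK_SIZE
    handle, primary, secondaries = query_master_for_lease(path, chunk_index)  # steps 1-3
    push_data_along_chain(data, [primary] + secondaries)                      # step 4
    ok = send_write_request(primary, handle, offset % CHUNK_SIZE)             # steps 5-9
    if not ok:
        raise IOError("write failed at one or more replicas; the client retries")
```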

Master operation novelties: locking and replica placement
- Locking: read locks and write locks are separate, which lets multiple master activities proceed efficiently.
- Replica placement: there are a large number of big chunks, and racks have limited bandwidth; the combined bandwidth of all servers in a rack far exceeds the rack's uplink bandwidth.
- Policies: maximize reliability and availability, and maximize network bandwidth utilization.
- Solution: place chunk replicas across racks, preferring nearby racks so that network bandwidth can be better utilized. (A placement sketch follows below.)
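
A simple sketch of rack-aware replica placement under these policies. This is my own illustrative heuristic, not the master's actual algorithm: spread replicas across racks for reliability and availability, while preferring lightly loaded chunkservers.

```python
# Illustrative rack-aware replica placement (not the actual GFS placement code).
from dataclasses import dataclass

@dataclass
class Chunkserver:
    addr: str
    rack: str
    disk_utilization: float      # fraction of disk space in use, 0.0 - 1.0

def place_replicas(servers: list, num_replicas: int = 3) -> list:
    """Pick chunkservers for a new chunk: prefer low disk utilization, and do not
    put two replicas in the same rack while unused racks remain."""
    all_racks = {s.rack for s in servers}
    chosen, used_racks = [], set()
    for s in sorted(servers, key=lambda s: s.disk_utilization):
        if s.rack in used_racks and len(used_racks) < len(all_racks):
            continue                         # spread across racks first
        chosen.append(s)
        used_racks.add(s.rack)
        if len(chosen) == num_replicas:
            break
    return chosen

# Usage: three replicas land on the emptiest servers in three different racks.
servers = [Chunkserver("cs1", "rackA", 0.2), Chunkserver("cs2", "rackA", 0.1),
           Chunkserver("cs3", "rackB", 0.5), Chunkserver("cs4", "rackC", 0.3)]
print([s.addr for s in place_replicas(servers)])    # -> ['cs2', 'cs4', 'cs3']
```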

Master operation novelties: creation and re-replication
- Creation: deciding where to place newly created (initially empty) chunks.
  - Place new chunks on chunkservers with below-average disk utilization.
  - Avoid chunkservers where the number of recent creations is already high, since heavy write traffic usually follows creation.
  - Spread new chunk replicas across racks.
- Re-replication refers to creating additional replicas when the existing number of replicas falls below the user's requirement, for example because a chunkserver is unavailable, a replica is corrupted, a replica reports errors, or the replication goal has been increased.
  - Strategy: the master picks the highest-priority chunk and replicates it onto additional chunkservers. A chunk's priority is boosted based on its impact; for example, a chunk that has lost more replicas gets higher priority. (See the sketch below.)
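
An illustrative sketch of priority-based re-replication (hypothetical code, not the master's actual logic): chunks missing the most replicas relative to their replication goal are repaired first.

```python
# Illustrative re-replication prioritization (not actual GFS code).
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class RepairTask:
    priority: int                       # more missing replicas -> smaller value -> popped first
    handle: int = field(compare=False)  # chunk handle, not used for ordering

def build_repair_queue(chunks):
    """chunks: iterable of (handle, live_replica_count, replication_goal)."""
    heap = []
    for handle, live, goal in chunks:
        missing = goal - live
        if missing > 0:
            # heapq is a min-heap, so negate: the most-damaged chunk is popped first
            heapq.heappush(heap, RepairTask(priority=-missing, handle=handle))
    return heap

queue = build_repair_queue([(1, 2, 3), (2, 1, 3), (3, 3, 3)])
worst = heapq.heappop(queue)
print(worst.handle)    # -> 2: it is missing two replicas, so it is repaired first
```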

Master operation novelties: rebalancing and garbage collection
- Rebalancing: the master examines the current replica distribution and moves replicas for better disk space usage, balancing both the network and the file-system space. It decides which replicas to move and gradually fills up new chunkservers.
- Garbage collection: the deletion of a file is logged instantly, but the actual chunk deletion is not done immediately. The deleted file is renamed to a hidden name carrying a timestamp. During regular scans of the file-system namespace, old hidden files are removed; until then the file can be undeleted. Orphaned chunks, those not reachable from any file, are deleted as well.
- The master-chunkserver HeartBeat message is used for garbage collection: the chunkserver reports the chunks it has, the master responds with the chunks for which it has no metadata, and the chunkserver deletes those unwanted chunks. (A sketch follows below.)
- Stale replicas are detected and removed using chunk version numbers.
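
A minimal sketch of heartbeat-driven garbage collection on the master side (hypothetical structures, not actual GFS code): the chunkserver reports what it stores, and the master answers with the handles it no longer has metadata for.

```python
# Illustrative heartbeat-based garbage collection, master side (not actual GFS code).
from typing import Dict, Set

class Master:
    def __init__(self) -> None:
        # chunk handle -> metadata; handles missing from this table are garbage
        self.chunk_table: Dict[int, dict] = {}

    def handle_heartbeat(self, chunkserver: str, reported: Set[int]) -> Set[int]:
        """The chunkserver reports the chunk handles it stores; the master replies
        with the subset it has no metadata for, and the chunkserver is then free
        to delete those replicas at its convenience."""
        return {h for h in reported if h not in self.chunk_table}

# Usage:
m = Master()
m.chunk_table = {101: {}, 102: {}}
to_delete = m.handle_heartbeat("cs-17", reported={101, 102, 103})
print(to_delete)      # -> {103}: no longer referenced by any file, safe to reclaim
```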

High availability
- Fast recovery.
- Chunk replication.
- Master replication: the metadata, operation logs, and checkpoints are replicated. Shadow masters provide read-only access to the file system even when the primary master is down.
- Data integrity: checksum-based. Each chunk is broken into 64 KB blocks, and each block has a 32-bit checksum. (A checksum sketch follows below.)
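
A sketch of per-block checksumming along these lines. CRC-32 is used here purely for illustration; the slides only specify 32-bit checksums over 64 KB blocks, not the exact checksum function.

```python
# Illustrative 64 KB-block checksumming for a chunk (not actual GFS code).
import zlib

BLOCK_SIZE = 64 * 1024     # checksum granularity: a 64 MB chunk has 1024 such blocks

def block_checksums(chunk_data: bytes) -> list:
    """Compute a 32-bit checksum for every 64 KB block of a chunk."""
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verify_block(chunk_data: bytes, checksums: list, block_index: int) -> bool:
    """Before returning data to a reader, the chunkserver verifies the blocks that
    overlap the read range; a mismatch indicates a corrupted replica."""
    block = chunk_data[block_index * BLOCK_SIZE:(block_index + 1) * BLOCK_SIZE]
    return zlib.crc32(block) == checksums[block_index]
```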

Performance setup
- 1 master, two master replicas, 16 chunkservers, and 16 clients.
- Each machine: 1.4 GHz Pentium III, 2 GB memory, 2 x 80 GB 5400 RPM disks, 100 Mbps full-duplex Ethernet.
- The 19 server machines and the 16 client machines are connected through switches linked by 1 Gbps.

Performance (read)
- Read rate measured for N clients, up to 16 readers. Each client randomly reads a 4 MB region from a 320 GB file set, repeating until the whole set has been read.
- Aggregate rates are compared against theoretical limits: the shared 1 Gbps link saturates at 125 MB/s, and each client's 100 Mbps NIC saturates at 12.5 MB/s.
- About 80% of the per-client limit is achieved with a single reader, and the aggregate reaches roughly 94 MB/s of the 125 MB/s limit with 16 readers.

Performance (write)
- N clients write to N distinct files; each client writes 1 GB of data to a file in a series of 1 MB writes.
- Aggregate rate and the theoretical limit are compared. The limit is 67 MB/s, because each byte must be written to multiple replicas and each chunkserver has a 12.5 MB/s NIC. (See the arithmetic below.)
- Measured aggregate: about 35 MB/s for 16 clients, or 2.2 MB/s per client.
- The main culprit was the network software stack: it does not pipeline the multiple replica writes for each file well. Real-world write performance is better than this benchmark suggests.
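
Where the 67 MB/s limit comes from (my arithmetic, consistent with the setup above): each byte is written to 3 of the 16 chunkservers, and each chunkserver can only ingest 12.5 MB/s through its 100 Mbps NIC.

```python
# Back-of-the-envelope aggregate write limit for this test setup (illustrative).
NUM_CHUNKSERVERS = 16
REPLICAS_PER_CHUNK = 3
INGEST_PER_SERVER_MBPS = 12.5     # 100 Mbps NIC = 12.5 MB/s of inbound data

# Total inbound capacity across all chunkservers, divided by the replication
# factor, bounds the aggregate rate at which clients can write new data.
aggregate_limit = NUM_CHUNKSERVERS * INGEST_PER_SERVER_MBPS / REPLICAS_PER_CHUNK
print(round(aggregate_limit, 1), "MB/s")   # -> 66.7 MB/s, i.e. the ~67 MB/s limit
```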

GFS performance: record append
- N clients append simultaneously to a single file.
- Performance is limited by the network bandwidth of the chunkserver storing the file's last chunk, independent of the number of clients.
- The rate drops from 6 MB/s with one client to 4.8 MB/s with 16 clients.

Real-world measurements
- Cluster A: used for research and development by over a hundred engineers. A typical task is initiated by a user and runs for a few hours; it reads MBs to TBs of data, transforms or analyzes the data, and writes the results back.
- Cluster B: used for production data processing. A typical task runs much longer than a Cluster A task, continuously generating and processing multi-TB data sets, with human users rarely involved.
- Both clusters had been running for about a week when the measurements were taken.

Performance with clusters (figure slide)

Performance of GFS (figure slide)

Reading: Google File System documents