CS435 Introduction to Big Data, Fall 2018, Colorado State University
Week 12-B (11/7/2018), Sangmi Lee Pallickara


FAQs
- The deadline for Programming Assignment 3 has been extended to 11/10, 11:59 PM. No late submissions are allowed.

PART 2. LARGE-SCALE DATA STORAGE SYSTEMS: DISTRIBUTED FILE SYSTEMS
- CAP Theorem: Consistency, Availability, Partition tolerance
Computer Science, Colorado State University, http://www.cs.colostate.edu/~cs435

Today's topics: Google File System (GFS)

How a write is actually performed
1. The client asks the master which chunkserver holds the current lease for the chunk and where the other replicas are located.
2. The master replies with the identity of the primary and the locations of the other (secondary) replicas.
3. The client pushes the data to all of the replicas.
4. The client sends a write request to the primary.
5. The primary forwards the write request to the secondary replicas (Secondary Replica A, Secondary Replica B).
6. The secondaries acknowledge completion to the primary.
7. The primary sends the final reply to the client.

Client pushes data to all the replicas [1/2]
- While the data is being pushed, each chunkserver stores it in an internal LRU buffer until the data is used or aged out.

Client pushes data to all the replicas [2/2]
- Once the chunkservers acknowledge receipt of the data, the client sends a write request to the primary.
- The primary assigns consecutive serial numbers to the mutations and forwards them to the other replicas.
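
The control flow above can be condensed into a small sketch. This is a minimal illustration only: the class names, method calls, and in-memory structures are assumptions made for the example, not GFS's actual implementation. It shows the primary assigning consecutive serial numbers and forwarding the mutation order to the secondaries after the data has already been pushed to every replica.

```python
# Minimal sketch of the GFS write control flow (illustrative; all names here
# are assumptions, not the real GFS code).

class Replica:
    """A chunkserver replica: buffers pushed data, applies ordered mutations."""
    def __init__(self, name):
        self.name = name
        self.lru_buffer = {}     # data_id -> bytes, held until used or aged out
        self.applied = []        # mutations applied, in serial-number order

    def push_data(self, data_id, data):
        self.lru_buffer[data_id] = data          # step 3: data pushed to every replica

    def apply(self, serial, data_id):
        self.applied.append((serial, self.lru_buffer.pop(data_id)))
        return True                              # acknowledgement (step 6)

class Primary(Replica):
    """The replica holding the lease; it defines the mutation order."""
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        serial = self.next_serial                # consecutive serial numbers
        self.next_serial += 1
        self.apply(serial, data_id)              # apply the mutation locally
        acks = [s.apply(serial, data_id) for s in self.secondaries]   # step 5
        return all(acks)                         # step 7: final reply to the client

# Usage: the client pushes the data to all replicas, then issues the write.
secondaries = [Replica("secondary-A"), Replica("secondary-B")]
primary = Primary("primary", secondaries)
for r in [primary, *secondaries]:
    r.push_data("d1", b"some record bytes")
assert primary.write("d1")    # success only if every replica acknowledged
```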

Data flow is decoupled from the control flow to use the network efficiently
- Utilize each machine's network bandwidth
- Avoid network bottlenecks and high-latency links
- Leverage the network topology: distances are estimated from IP addresses (within Google's data centers)
- Pipeline the data transfer: once a chunkserver receives some data, it starts forwarding it immediately
- For transferring B bytes to R other replicas, the ideal elapsed time is B/T + R*L, where T is the network throughput and L is the latency to transfer bytes between two machines (a worked example follows below)

Google File System (GFS): Record Append and Inconsistency

The control flow for record appends is similar to that of writes
- Google's record appends are atomic
- The client pushes the data to the replicas of the last chunk of the file
- The primary replica checks whether the record fits in this chunk (64 MB)

The primary replica checks whether the record append would breach the 64 MB size threshold
- If the chunk size would be breached: pad the chunk to its maximum size and tell the client to retry the operation on the next chunk
- If the record fits: the primary appends the data to its replica and notifies the secondaries to write at the exact same offset

Record sizes and fragmentation
- Record size is restricted to 1/4 of the chunk size (16 MB)
- This minimizes worst-case internal fragmentation within each chunk

What if the secondary replicas could not finish the write operation?
- The client request is considered failed
- The modified region is inconsistent; no attempt is made to delete it from the chunk
- The client must handle this inconsistency and retries the failed mutation
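
As a quick sanity check on the pipelined-transfer estimate B/T + R*L above, here is a small illustrative calculation; the throughput and latency figures are assumptions chosen for readability, not measurements.

```python
# Illustrative estimate of the pipelined transfer time B/T + R*L.
# The numbers below are assumptions made for the example.

B = 64 * 1024 * 1024    # one full 64 MB chunk, in bytes
T = 12.5e6              # ~100 Mbps of throughput, expressed in bytes per second
L = 0.001               # 1 ms to start forwarding between two machines
R = 2                   # two additional replicas in the pipeline

elapsed = B / T + R * L
print(f"ideal elapsed time ~= {elapsed:.2f} s")   # ~5.37 s, dominated by B/T
```

Because each chunkserver forwards data as soon as it arrives, the per-hop latency adds only R*L to the total rather than multiplying the transfer time by the number of replicas.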

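The primary-side decision for a record append, sketched minimally. The constants follow the slides (64 MB chunks, 16 MB record limit); the function and class names are hypothetical, and the real GFS logic differs in detail.

```python
# Minimal sketch of the primary replica's record-append decision.
# Names are hypothetical; GFS's real implementation differs.

CHUNK_SIZE = 64 * 1024 * 1024       # 64 MB chunk
MAX_RECORD = CHUNK_SIZE // 4        # records limited to 1/4 of a chunk (16 MB)

class Secondary:
    """Trivial stand-in for a secondary replica."""
    def __init__(self):
        self.data = bytearray()
    def write_at(self, offset, record):
        self.data[offset:offset + len(record)] = record
        return True

def record_append(chunk, record, secondaries):
    """chunk: dict holding a bytearray under 'data'. Returns (status, offset)."""
    if len(record) > MAX_RECORD:
        return ("error: record too large", None)

    used = len(chunk["data"])
    if used + len(record) > CHUNK_SIZE:
        # The record would breach the chunk size: pad the chunk to its maximum
        # size and ask the client to retry the append on the next chunk.
        chunk["data"].extend(b"\x00" * (CHUNK_SIZE - used))
        return ("retry on next chunk", None)

    # The record fits: append locally, then tell the secondaries to write the
    # same bytes at exactly the same offset.
    offset = used
    chunk["data"].extend(record)
    if not all(s.write_at(offset, record) for s in secondaries):
        # A secondary failed: the region is left in place (inconsistent) and
        # the client is expected to retry the mutation.
        return ("failed: client must retry", offset)
    return ("success", offset)

# Usage:
chunk = {"data": bytearray()}
print(record_append(chunk, b"record-1", [Secondary(), Secondary()]))  # ('success', 0)
```
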
Inconsistent Regions

What if a record append fails at one of the replicas?
- (Diagram: three replicas each hold Data 1 and Data 2; an append of Data 3 succeeds on two replicas but fails on the third, leaving that region empty. The client retries, and Data 3 is appended again on all three replicas, so the replicas that succeeded the first time now hold Data 3 twice.)
- The client must retry the operation.
- Replicas of the same chunk may therefore contain different data and duplicates of the same record, in whole or in part.
- Replicas of chunks are not bit-wise identical! In most systems, replicas are identical.

GFS only guarantees that the data will be written at least once as an atomic unit
- For an operation to return success, the data must have been written at the same offset on all the replicas
- After the write, all replicas are at least as long as the end of the record
- Any future record will be assigned a higher offset or a different chunk

GFS client code implements the file system API
- Communication with the master and the chunkservers is done transparently, on behalf of applications that read or write data
- Clients interact with the master for metadata; data-bearing communication goes directly to the chunkservers

Handling a failed write in Hadoop HDFS [1/3] (different from GFS)
- HDFS maintains a data queue and an ack queue
- Ack queue: an internal queue of packets that are waiting to be acknowledged by datanodes
- A packet is removed from the ack queue once it has been acknowledged by all the datanodes in the pipeline

Handling a failed write in Hadoop HDFS [2/3]
1. The pipeline is closed. Any packets in the ack queue are added to the front of the data queue, so datanodes downstream of the failed node will not miss any packets.
2. The current block on the good datanodes is given a new identity, which is reported to the namenode, so that the partial block on the failed datanode can be detected and deleted later.

Handling a failed write in Hadoop HDFS [3/3]
3. The remainder of the block's data is written to the two good datanodes in the pipeline.
4. The namenode notices that the block is under-replicated and arranges for a further replica to be created on another node.
- Write quorum: dfs.replication.min (defaults to 1). (A sketch of this recovery path appears after the snapshot discussion below.)

Google File System (GFS): Creating Snapshots

Snapshots
- Copy a file or directory tree almost instantaneously
- Minimize any interruption of ongoing mutations
- Provide a checkpoint mechanism: users can commit later or roll back

Snapshots allow you to make a copy of a file very fast
1. The master revokes outstanding leases for any chunks of the file (the source) to be snapshotted.
2. The master logs the operation to disk.
3. The master updates its in-memory state by duplicating the metadata of the source file.
4. The newly created file points to the same chunks as the source.

When a client wants to write to a chunk C after the snapshot operation
- The master sees that the reference count of C is greater than 1
- It picks a new chunk handle C' and asks each chunkserver with a current replica of C to create the new chunk C'
- The data is copied locally on the chunkservers, not over the network
- From this point on, C' is handled like any other chunk (see the sketch below)

GFS does not have a per-directory structure that lists the files in a directory
- The namespace is represented as a lookup table that maps full pathnames to metadata
- Each node in the namespace has an associated read/write lock
- File creation does not require a lock on a directory structure; there is no inode that needs to be protected from modification
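
A minimal sketch of the snapshot copy-on-write bookkeeping described above, using a toy in-memory master. Every structure and name here is an assumption made for illustration; the real master also tracks chunk locations, leases, and its operation log.

```python
# Toy sketch of snapshot + copy-on-write reference counting on the master.
# Structures and names are illustrative assumptions, not GFS's implementation.

class ToyMaster:
    def __init__(self):
        self.files = {}        # pathname -> list of chunk handles
        self.refcount = {}     # chunk handle -> number of files referencing it
        self.next_handle = 0

    def _new_handle(self):
        h = f"C{self.next_handle}"
        self.next_handle += 1
        return h

    def create(self, path, nchunks):
        handles = [self._new_handle() for _ in range(nchunks)]
        for h in handles:
            self.refcount[h] = 1
        self.files[path] = handles

    def snapshot(self, src, dst):
        # Duplicate only the metadata: the new file points at the same chunks.
        self.files[dst] = list(self.files[src])
        for h in self.files[src]:
            self.refcount[h] += 1

    def prepare_write(self, path, index):
        # Called before a client writes to the chunk at files[path][index].
        h = self.files[path][index]
        if self.refcount[h] > 1:
            # Copy-on-write: allocate C'; the chunkservers holding C would
            # copy the data locally, not over the network.
            new_h = self._new_handle()
            self.refcount[h] -= 1
            self.refcount[new_h] = 1
            self.files[path][index] = new_h
            return new_h
        return h

# Usage: snapshot /home/user/foo to /save/user/foo, then write to the original.
m = ToyMaster()
m.create("/home/user/foo", nchunks=1)
m.snapshot("/home/user/foo", "/save/user/foo")
print(m.prepare_write("/home/user/foo", 0))   # a fresh handle, e.g. "C1"
print(m.files["/save/user/foo"])              # still ["C0"]
```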

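Returning to the HDFS write-failure handling described above (steps 1-4): a minimal sketch of the data-queue/ack-queue bookkeeping when one datanode in the pipeline fails. This is an illustrative simplification; the names and structures are assumptions, not the actual internals of the HDFS client's output stream.

```python
# Illustrative sketch of HDFS-style write-failure recovery. Names and
# structures are assumptions; the real logic is considerably more involved.

from collections import deque

def recover_pipeline(data_queue, ack_queue, pipeline, failed_node):
    """Simulate the recovery steps after one datanode in the pipeline fails."""
    # 1. Close the pipeline; move unacknowledged packets back to the front of
    #    the data queue so downstream datanodes do not miss any packets.
    while ack_queue:
        data_queue.appendleft(ack_queue.pop())

    # 2. Give the current block a new identity on the good datanodes (reported
    #    to the namenode so the partial block on the failed node can be
    #    detected and deleted later).
    good_nodes = [n for n in pipeline if n != failed_node]
    new_block_id = "blk_new"      # placeholder identity

    # 3. The remainder of the block's data is then written to the good
    #    datanodes; 4. the namenode later notices the block is under-replicated
    #    and arranges for another replica.
    return good_nodes, new_block_id, data_queue

# Usage with toy values:
data_q = deque(["pkt4", "pkt5"])
ack_q = deque(["pkt2", "pkt3"])    # sent, but not yet acknowledged by all nodes
nodes, blk, data_q = recover_pipeline(data_q, ack_q, ["dn1", "dn2", "dn3"], "dn2")
print(nodes, list(data_q))         # ['dn1', 'dn3'] ['pkt2', 'pkt3', 'pkt4', 'pkt5']
```
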
Each master operation acquires a set of locks before it runs
- A read lock prevents a directory/file from being deleted, renamed, or snapshotted; others can still read it, but cannot mutate (write/append/delete) it
- A write lock on a directory/file name serializes attempts to create a file with the same name twice; others can neither mutate nor read it
- If an operation involves /d1/d2/.../dn/leaf, it acquires read locks on the directory names /d1, /d1/d2, ..., /d1/d2/.../dn, and a read or write lock on the full pathname /d1/d2/.../dn/leaf

Locks are used to prevent conflicting operations during snapshots
- How do we prevent /home/user/foo from being created while /home/user is being snapshotted to /save/user?
- Snapshotting /home/user to /save/user acquires read locks on /home and /save, and write locks on /home/user and /save/user
- Creating the file /home/user/foo acquires read locks on /home and /home/user, and a write lock on /home/user/foo
- The two operations are serialized because they try to obtain conflicting locks on /home/user
- File creation does not require a write lock on the parent directory, because there is no directory structure to protect; it only needs read locks on /home and /home/user plus a write lock on /home/user/foo
(A locking sketch follows below.)

Google File System (GFS): Deletion of Files and Garbage Collection

Garbage collection in GFS
- After a file is deleted, GFS does not reclaim the space immediately
- Reclamation is done lazily, during garbage collection, at both the file and chunk levels

Master logs a file's deletion immediately
- The file is renamed to a hidden name that includes the deletion timestamp
- The master regularly scans the file system namespace and deletes any hidden file that has existed for more than 3 days

Garbage collection: when the master scans its chunk namespace
- It identifies orphaned chunks (chunks not reachable from any file) and erases the metadata for those chunks
- When a file is removed from the namespace, its in-memory metadata is also removed, which severs the links to all of its chunks!
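
As referenced above, here is a minimal sketch of the per-pathname read/write locking. This is a toy model: the lock table and helper names are assumptions, it only computes which locks each operation would need, and it ignores the consistent acquisition order the real master uses to avoid deadlock.

```python
# Toy model of GFS namespace locking: an operation takes read locks on every
# ancestor pathname and a read or write lock on the final pathname.

def lock_set(path, leaf_mode):
    """Return {pathname: 'r' or 'w'} for an operation on `path`."""
    parts = path.strip("/").split("/")
    prefixes = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    locks = {p: "r" for p in prefixes}    # read locks on every ancestor name
    locks[path] = leaf_mode               # read or write lock on the leaf
    return locks

def conflicts(a, b):
    """Two operations conflict if a shared name is locked and not both as reads."""
    return any(a[p] == "w" or b[p] == "w" for p in set(a) & set(b))

# Snapshotting /home/user to /save/user: write locks on both leaf names.
snapshot_locks = {**lock_set("/home/user", "w"), **lock_set("/save/user", "w")}
# Creating /home/user/foo: read locks on /home and /home/user, write lock on foo.
create_locks = lock_set("/home/user/foo", "w")

print(snapshot_locks)  # {'/home': 'r', '/home/user': 'w', '/save': 'r', '/save/user': 'w'}
print(create_locks)    # {'/home': 'r', '/home/user': 'r', '/home/user/foo': 'w'}
print(conflicts(snapshot_locks, create_locks))   # True: they clash on /home/user
```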

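A small sketch of the lazy-deletion idea (rename to a hidden name, then reclaim during a periodic namespace scan). The 3-day window follows the slides; the hidden-name format and helper names are assumptions made for this example.

```python
# Illustrative sketch of lazy file deletion in a GFS-like master.
# The hidden-name format and helper names are assumptions for this example.

import time

GRACE_PERIOD = 3 * 24 * 3600    # hidden files older than 3 days are reclaimed
namespace = {}                  # pathname -> metadata (e.g. chunk handles)

def delete_file(path):
    """Log the deletion by renaming the file to a hidden, timestamped name."""
    meta = namespace.pop(path)
    namespace[f"{path}.__deleted__.{int(time.time())}"] = meta

def scan_namespace(now=None):
    """Periodic scan: physically remove hidden files past the grace period."""
    now = time.time() if now is None else now
    for name in list(namespace):
        if ".__deleted__." in name:
            deleted_at = int(name.rsplit(".", 1)[1])
            if now - deleted_at > GRACE_PERIOD:
                del namespace[name]   # the file's chunks become orphaned and are
                                      # reclaimed later by chunk-level GC

# Usage:
namespace["/home/user/foo"] = {"chunks": ["C0"]}
delete_file("/home/user/foo")
scan_namespace(now=time.time() + 4 * 24 * 3600)   # four days later: reclaimed
print(namespace)                                  # {}
```
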
The role of heartbeats in garbage collection
- In its heartbeat messages, each chunkserver reports a subset of the chunks it currently has
- The master replies with the identity of chunks that are no longer present in its metadata
- The chunkserver is then free to delete its replicas of those chunks

Stale chunks and issues
- If a chunkserver fails and misses mutations to a chunk, its chunk replica becomes stale
- Working with a stale replica causes problems with correctness and consistency

Aiding the detection of stale chunks
- The master maintains a chunk version number for each chunk to distinguish stale from up-to-date replicas
- Whenever the master grants a new lease on a chunk, it increases the version number, informs the replicas of the chunk, and records the new version persistently
- This occurs BEFORE any client can write to the chunk

If a replica is unavailable, its version number will not be advanced
- When a chunkserver restarts, it reports its set of chunks and their corresponding version numbers to the master
- The master uses this to detect stale replicas and removes them during regular garbage collection

Additional safeguards against stale replicas
- The chunk version number is included when a client requests chunk information, and during cloning operations the most up-to-date chunk is cloned
- Clients and chunkservers are expected to verify version information to make sure the data they touch is up to date

Google File System (GFS): Data Integrity
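
A minimal sketch of how chunk version numbers expose stale replicas. This is a toy model; all names and structures are assumptions made for illustration.

```python
# Toy model of stale-replica detection via chunk version numbers.

master_versions = {"C0": 3}      # the master's authoritative version per chunk

def grant_lease(chunk, reachable_replicas):
    """Before any client writes: bump the version and inform live replicas."""
    master_versions[chunk] += 1
    for r in reachable_replicas:
        r["versions"][chunk] = master_versions[chunk]

def stale_chunks(replica):
    """A restarting chunkserver reports (chunk, version) pairs; return stale ones."""
    return [c for c, v in replica["versions"].items()
            if v < master_versions.get(c, 0)]    # older than the master's record

# Usage: one replica is down during a lease grant, so it misses the new version.
up = {"versions": {"C0": 3}}
down = {"versions": {"C0": 3}}      # offline, therefore not informed
grant_lease("C0", [up])
print(stale_chunks(down))   # ['C0'] -> removed during regular garbage collection
print(stale_chunks(up))     # []
```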

Data Integrity
- It is impractical to detect chunk corruption by comparing across replicas; replicas are not bytewise identical in any case!
- Detection of corruption should therefore be self-contained at each chunkserver

Data Integrity
- Chunks are broken into 64 KB data blocks
- A 32-bit checksum is computed for each block
- Checksums are kept in chunkserver memory and stored persistently, separate from the data
- On a read, the chunkserver verifies the checksums of the data blocks that overlap the read range (see the sketch below)

Google File System (GFS): Inefficiencies

The master server is a single point of failure
- A master restart takes several seconds
- Shadow servers exist; they can handle reads of files in place of the master, but not writes
- The master requires a massive amount of main memory

The system is optimized for large files
- It is not optimized for a very large number of very small files
- The primary operations on files are long, sequential reads and writes
- A large number of random overwrites will clog things up quite a bit

Consistency issues: GFS expects clients to resolve inconsistencies
- File chunks may have gaps or duplicates of some records, and the client has to be able to deal with this
- Imagine doing this for a scientific application: if portions of a massive array are corrupted, the clients would have to detect this
- HDFS does NOT have this problem
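
A minimal sketch of the per-block checksum scheme from the Data Integrity slides above. Python's zlib.crc32 is used here as a stand-in 32-bit checksum; the block layout and function names are assumptions made for the example.

```python
# Sketch of per-block checksumming for a chunk, per the Data Integrity slides.
# zlib.crc32 stands in for the 32-bit checksum; details are illustrative.

import zlib

BLOCK_SIZE = 64 * 1024    # 64 KB blocks within a chunk

def compute_checksums(chunk_bytes):
    return [zlib.crc32(chunk_bytes[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_bytes), BLOCK_SIZE)]

def verify_read(chunk_bytes, checksums, offset, length):
    """Verify only the blocks that overlap the requested read range."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk_bytes[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            raise IOError(f"checksum mismatch in block {b}")
    return chunk_bytes[offset:offset + length]

# Usage:
chunk = bytes(200 * 1024)          # a 200 KB chunk of zeros
sums = compute_checksums(chunk)
data = verify_read(chunk, sums, offset=60_000, length=10_000)   # spans blocks 0-1
assert len(data) == 10_000
```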

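Because the at-least-once record append can leave padding and duplicate records in a chunk (as the consistency discussion above notes), readers typically filter them out themselves. A minimal sketch, assuming each record carries a writer-supplied unique identifier; that record format is an assumption of this example, not something GFS mandates.

```python
# Sketch of reader-side handling of at-least-once record appends: skip padded
# regions and drop duplicates using writer-supplied unique record IDs.

def read_unique_records(records):
    """records: iterable of (record_id, payload), or None for padded regions."""
    seen = set()
    for rec in records:
        if rec is None:          # padding / gap left by a failed or padded append
            continue
        rec_id, payload = rec
        if rec_id in seen:       # duplicate caused by a client retry
            continue
        seen.add(rec_id)
        yield payload

# Usage: the retried append of record "r3" shows up twice on this replica.
replica = [("r1", b"a"), ("r2", b"b"), None, ("r3", b"c"), ("r3", b"c")]
print(list(read_unique_records(replica)))   # [b'a', b'b', b'c']
```
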
Security model
- None: operation is expected to take place in a trusted environment