BigData and Map Reduce VITMAC03


Motivation
- Process lots of data: Google processed about 24 petabytes of data per day in 2009.
- A single machine cannot serve all the data: you need a distributed system to store and process it in parallel.
- Parallel programming? Threading is hard!
  - How do you facilitate communication between nodes?
  - How do you scale to more machines?
  - How do you handle machine failures?

MapReduce
- MapReduce [OSDI '04] provides:
  - Automatic parallelization and distribution
  - I/O scheduling
  - Load balancing
  - Network and data transfer optimization
  - Fault tolerance: handling of machine failures
- Need more power? Scale out, not up: a large number of commodity servers rather than a few high-end specialized servers.
- Apache Hadoop: open-source implementation of MapReduce.

Typical problem solved by MapReduce (see the sketch below)
1. Read a lot of data.
2. Map: extract something you care about from each record.
3. Shuffle and sort.
4. Reduce: aggregate, summarize, filter, or transform.
5. Write the results.
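
The following is a minimal single-process sketch of those five steps, with the shuffle simulated by grouping keys in memory; the "access log" records and field layout are made up for illustration, not part of the original slides.

```java
import java.util.*;
import java.util.stream.*;

// Single-process illustration of the pipeline: read -> map -> shuffle/sort -> reduce -> write.
public class MiniPipeline {
    public static void main(String[] args) {
        // 1. Read a lot of data (here: a tiny in-memory stand-in for input records).
        List<String> logLines = List.of(
            "GET /index.html 200", "GET /about.html 404", "GET /index.html 200");

        // 2. Map: extract something you care about from each record -> (key, value) pairs.
        List<Map.Entry<String, Integer>> mapped = logLines.stream()
            .map(line -> Map.entry(line.split(" ")[1], 1))
            .collect(Collectors.toList());

        // 3. Shuffle and sort: group all values by key (a TreeMap keeps keys sorted).
        Map<String, List<Integer>> shuffled = mapped.stream()
            .collect(Collectors.groupingBy(Map.Entry::getKey, TreeMap::new,
                     Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // 4. Reduce: aggregate the values for each key.  5. Write the results.
        shuffled.forEach((url, ones) ->
            System.out.println(url + "\t" + ones.stream().mapToInt(Integer::intValue).sum()));
    }
}
```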

MapReduce workflow (diagram)
- Input data is divided into splits (Split 0, Split 1, Split 2).
- Map tasks read the splits and extract something you care about from each record, writing intermediate results to local disk.
- Reduce tasks remote-read and sort the intermediate data, then aggregate, summarize, filter, or transform it.
- Each reduce task writes an output file (Output File 0, Output File 1).

Mappers and Reducers
- Need to handle more data? Just add more mappers/reducers!
- No need to write multithreaded code: mappers and reducers are typically single-threaded and deterministic.
  - Determinism allows failed jobs to be restarted.
- Mappers/reducers run entirely independently of each other; in Hadoop, they run in separate JVMs.

Example: Word Count
http://kickstarthadoop.blogspot.ca/2011/04/word-count-hadoop-map-reduce-example.html
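
For reference, a mapper/reducer pair along the lines of the classic Hadoop WordCount looks like the sketch below (the Job driver and input/output configuration are omitted; class names follow the Apache tutorial's conventions rather than the linked blog post).

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map: for every word in an input line, emit (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: the framework has grouped all the 1s for the same word; sum them.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
```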

MapReduce execution (diagram)
- The Hadoop program forks a master and worker processes; the master assigns map and reduce tasks.
- Map workers read the input splits: reading the data means transferring petascale input through the network.
- Map output is written to local disk; reduce workers remote-read and sort it, then write the output files (Output File 0, Output File 1).

Google File System (GFS) / Hadoop Distributed File System (HDFS)
- Split data into chunks and store 3 replicas on commodity servers.
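
As a rough illustration of the 3-replica default from the client's point of view, the sketch below uses the Hadoop HDFS client API to create a file with replication factor 3; it assumes a reachable HDFS cluster configured on the classpath, and the path, buffer size, and block size are illustrative choices.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);       // client for the configured NameNode

        // Hypothetical path; 128 MB blocks and 3 replicas are typical defaults.
        Path path = new Path("/user/demo/words.txt");
        try (FSDataOutputStream out =
                 fs.create(path, true /* overwrite */, 4096, (short) 3, 128L * 1024 * 1024)) {
            out.write("hello hdfs\n".getBytes("UTF-8"));
        }
        System.out.println("Replication factor: " + fs.getFileStatus(path).getReplication());
    }
}
```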

MapReduce over HDFS (diagram)
- The master asks the HDFS NameNode "Where are the chunks of input data?" and receives their locations.
- Map tasks are assigned so that each split is read from local disk rather than over the network.
- As before, map output is written locally; reduce workers remote-read and sort it, then write the output files.

Failure in MapReduce
- Failures are the norm on commodity hardware.
- Worker failure:
  - Detected via periodic heartbeats.
  - In-progress map/reduce tasks are re-executed.
- Master failure:
  - Single point of failure; resume from the execution log.
- Robust in practice. Google's experience: once lost 1600 of 1800 machines, but the job finished fine.
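
A toy sketch of heartbeat-based failure detection follows; it is not Hadoop's or MapReduce's actual implementation, and the timeout value, identifiers, and data structures are assumptions made for illustration.

```java
import java.util.*;
import java.util.concurrent.*;

// Toy master that declares a worker dead after a heartbeat timeout
// and re-queues that worker's in-progress tasks for re-execution.
public class HeartbeatMaster {
    private static final long TIMEOUT_MS = 10_000;   // illustrative timeout

    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();       // worker -> last-seen time
    private final Map<String, List<String>> inProgress = new ConcurrentHashMap<>();  // worker -> its running tasks
    private final Queue<String> pendingTasks = new ConcurrentLinkedQueue<>();        // tasks waiting for a worker

    // Called whenever a worker pings the master.
    public void onHeartbeat(String workerId) {
        lastHeartbeat.put(workerId, System.currentTimeMillis());
    }

    // Called periodically: any worker silent for too long is considered failed,
    // and its in-progress tasks go back into the pending queue.
    public void checkForFailures() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > TIMEOUT_MS) {
                String dead = e.getKey();
                lastHeartbeat.remove(dead);
                List<String> lost = inProgress.remove(dead);
                if (lost != null) {
                    pendingTasks.addAll(lost);   // re-execute on another worker
                }
            }
        }
    }
}
```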

Summary: MapReduce
- Programming paradigm for data-intensive computing.
- Distributed and parallel execution model.
- Simple to program: the framework automates many tedious tasks (machine selection, failure handling, etc.).

Zoom in: GFS in more detail

Motivation: Large-Scale Data Storage
- Manipulate large (petascale) data sets.
- Large number of machines with commodity hardware: component failure is the norm.
- Goal: a scalable, high-performance, fault-tolerant distributed file system.

Why a new file system?
- None were designed for Google's failure model.
- Few scale as highly, dynamically, or easily.
- Existing systems lack special primitives for large distributed computation.

What to expect from GFS
- Designed for Google's applications: Google controls both the file system and the applications.
- Applications use a few specific access patterns:
  - Appends to large files
  - Large streaming reads
- Not a good fit for: low-latency data access, lots of small files, multiple writers, arbitrary file modifications.
- Not POSIX, although the interface is mostly traditional; adds specific operations such as RecordAppend.

Contents
- Motivation
- Design overview
- Write example
- Record append
- Fault tolerance & replica management
- Conclusions

Components
- Master (NameNode in HDFS)
  - Manages metadata (namespace)
  - Not involved in data transfer
  - Controls chunk allocation, placement, and replication
- Chunkserver (DataNode in HDFS)
  - Stores chunks of data
  - No knowledge of the GFS file system structure
  - Built on the local Linux file system

www.cse.buffalo.edu/~okennedy/courses/cse704fa2012/2.2-hdfs.pptx
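
The split of responsibilities can be pictured with two maps on the master, as in the toy sketch below: the master holds only metadata, never the bytes. The class and field names are invented for illustration; only the 64 MB chunk size is taken from GFS.

```java
import java.util.*;

// Toy view of the master's metadata: namespace plus chunk placement.
public class ToyGfsMaster {
    static final long CHUNK_SIZE = 64L * 1024 * 1024;   // GFS chunk size

    // Namespace: file name -> ordered list of chunk handles.
    private final Map<String, List<Long>> fileToChunks = new HashMap<>();

    // Placement: chunk handle -> addresses of the chunkservers holding a replica.
    private final Map<Long, List<String>> chunkLocations = new HashMap<>();

    private long nextHandle = 0;

    // Allocate a new chunk for a file and record which chunkservers should store it.
    public long allocateChunk(String file, List<String> chosenServers) {
        long handle = nextHandle++;
        fileToChunks.computeIfAbsent(file, f -> new ArrayList<>()).add(handle);
        chunkLocations.put(handle, new ArrayList<>(chosenServers));
        return handle;
    }

    // A client asks: which chunk holds this byte offset, and where are its replicas?
    // The master answers from metadata only; it never touches the data itself.
    public List<String> locate(String file, long offset) {
        long handle = fileToChunks.get(file).get((int) (offset / CHUNK_SIZE));
        return chunkLocations.get(handle);
    }
}
```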

GFS Architecture (diagram)

Write(filename, offset, data) (diagram)
1. Client asks the master: who has the lease for this chunk?
2. Master replies with the lease info (primary replica and secondary replicas A and B).
3. Client pushes the data to all replicas (primary, secondary replica A, secondary replica B).
4. Client sends a commit to the primary replica.
5. The primary serializes the commit and forwards it to the secondaries.
6. The secondaries send commit ACKs back to the primary.
7. The primary reports success to the client.
(Control messages go through the master and primary; data flows directly between the client and the chunkservers.)
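
The client side of those steps can be sketched as below. This is a toy outline, not GFS's real API: Master, Replica, and Primary are hypothetical stand-ins for RPC stubs, and the method names are invented.

```java
import java.util.List;

// Toy client-side view of the write protocol in the diagram above.
public class ToyGfsWriteClient {

    interface Master  { LeaseInfo whoHasLease(String file, long offset); }   // steps 1-2
    interface Replica { void pushData(byte[] data); }                        // step 3
    interface Primary extends Replica { boolean commit(long offset); }       // steps 4-7

    record LeaseInfo(Primary primary, List<Replica> secondaries) {}

    public boolean write(Master master, String file, long offset, byte[] data) {
        // 1-2) Ask the master who holds the lease for this chunk.
        LeaseInfo lease = master.whoHasLease(file, offset);

        // 3) Push the data to every replica (data flow only, no ordering yet).
        lease.primary().pushData(data);
        for (Replica r : lease.secondaries()) {
            r.pushData(data);
        }

        // 4-7) Send the commit to the primary, which serializes it, forwards it to the
        // secondaries, collects their ACKs, and reports success back to the client.
        return lease.primary().commit(offset);
    }
}
```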

RecordAppend(filename, data)
- Significant use in distributed apps. For example, on a Google production cluster: 21% of bytes written, 28% of write operations.
- Guarantee: all data is appended at least once as a single consecutive byte range.
- Same basic structure as a write:
  - Client obtains information from the master
  - Client sends data to the data nodes (chunkservers)
  - Client sends an append-commit
  - The lease holder serializes the append
- Advantage: a large number of concurrent writers with minimal coordination.

RecordAppend (2)
- Record size is limited by the chunk size.
- When a record does not fit into the available space, the chunk is padded to its end and the client retries the request.
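
The padding rule can be expressed as a small decision on the lease holder's side, as in the toy sketch below; only the 64 MB chunk size comes from GFS, everything else (names, the retry signal) is illustrative.

```java
// Toy sketch of the primary's decision for a record append: if the record does not fit
// in the chunk's remaining space, pad the chunk to its end and make the client retry.
public class ToyRecordAppend {
    static final long CHUNK_SIZE = 64L * 1024 * 1024;

    // Result of an append attempt: either the offset where the record landed, or "retry".
    record AppendResult(boolean mustRetry, long offset) {}

    private long usedBytes = 0;   // how much of the current chunk is filled (data + padding)

    public AppendResult append(byte[] record) {
        if (record.length > CHUNK_SIZE) {
            throw new IllegalArgumentException("record larger than a chunk");
        }
        if (usedBytes + record.length > CHUNK_SIZE) {
            usedBytes = CHUNK_SIZE;                  // pad the chunk to its end...
            return new AppendResult(true, -1);       // ...and tell the client to retry
        }
        long offset = usedBytes;                     // record is appended as one consecutive range
        usedBytes += record.length;
        return new AppendResult(false, offset);
    }
}
```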

Contents (revisited)
- Motivation
- Design overview
- Write example
- Record append
- Fault tolerance & replica management
- Conclusions

Fault tolerance
- Replication
  - High availability for reads
  - User-controllable, default 3 (no RAID)
  - Provides read/seek bandwidth
  - The master directs re-replication if a data node dies
- Online checksumming in data nodes, verified on reads
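
The "verified on reads" idea is essentially the following: a checksum is stored when a block is written and recomputed on every read. The sketch below uses CRC32 over a whole block; the block granularity and storage layout in GFS differ in detail, so treat this as an illustration only.

```java
import java.util.zip.CRC32;

// Toy illustration of online checksumming: a checksum stored at write time
// is recomputed and compared on every read.
public class ChecksummedBlock {
    private final byte[] data;
    private final long storedChecksum;

    public ChecksummedBlock(byte[] data) {
        this.data = data.clone();
        this.storedChecksum = crc(this.data);   // computed when the block is written
    }

    public byte[] read() {
        if (crc(data) != storedChecksum) {      // verified on read; a mismatch means corruption
            throw new IllegalStateException("checksum mismatch: replica is corrupt, re-replicate");
        }
        return data.clone();
    }

    private static long crc(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes);
        return crc.getValue();
    }
}
```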

Replica Management
- Bias towards topological spreading (across racks and data centers).
- Rebalancing
  - Move chunks around to balance disk fullness
  - Gently fixes imbalances due to adding/removing data nodes

Replica Management (Cloning)
- When a chunk replica is lost or corrupt, the goal is to minimize application disruption and data loss.
- Re-replication happens approximately in priority order:
  - More replicas missing -> priority boost
  - Deleted file -> priority decrease
  - Client blocking on a write -> large priority boost
- The master directs the copying of data.
- Performance on a production cluster:
  - Single failure, full recovery (600 GB): 23.2 min
  - Double failure, restored to 2x replication: 2 min
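
One way to picture the priority order is a scoring rule feeding a queue, as in the toy sketch below. The weights are made up purely to mirror the bullets above; GFS does not publish an exact formula.

```java
import java.util.*;

// Toy priority rule for re-replication: more missing replicas and a blocked client
// raise priority, a deleted file lowers it. Weights are illustrative only.
public class CloningQueue {

    record ChunkState(String chunkId, int targetReplicas, int liveReplicas,
                      boolean fileDeleted, boolean clientBlockedOnWrite) {}

    static int priority(ChunkState c) {
        int p = (c.targetReplicas() - c.liveReplicas()) * 10;  // more replicas missing -> boost
        if (c.fileDeleted()) p -= 100;                         // deleted file -> decrease
        if (c.clientBlockedOnWrite()) p += 100;                // client blocking on a write -> large boost
        return p;
    }

    // Highest priority first; the master clones chunks in roughly this order.
    private final PriorityQueue<ChunkState> queue =
        new PriorityQueue<>((a, b) -> Integer.compare(priority(b), priority(a)));

    public void add(ChunkState c) { queue.add(c); }
    public ChunkState next()      { return queue.poll(); }
}
```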

Garbage Collection
- The master does not need strong knowledge of what is stored on each data node.
- The master regularly scans the namespace; after the GC interval, deleted files are removed from the namespace.
- Each data node periodically polls the master about each chunk it holds; if a chunk has been forgotten by the master, the master tells the data node to delete it.
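
The handshake reduces to a set difference on the master's side, as in the toy sketch below; the names and the use of plain sets are assumptions made for illustration.

```java
import java.util.*;

// Toy version of the GC handshake: the chunkserver reports every chunk it holds,
// the master answers with the handles it no longer knows, and those get deleted.
public class ToyGarbageCollection {

    // Master side: anything reported that is not referenced in the namespace is garbage.
    static Set<Long> orphans(Set<Long> reportedByChunkserver, Set<Long> knownToMaster) {
        Set<Long> toDelete = new HashSet<>(reportedByChunkserver);
        toDelete.removeAll(knownToMaster);
        return toDelete;
    }

    public static void main(String[] args) {
        Set<Long> onDisk = Set.of(1L, 2L, 3L);   // chunks the chunkserver holds
        Set<Long> inUse  = Set.of(1L, 3L);       // chunks still referenced in the namespace
        System.out.println("delete: " + orphans(onDisk, inUse));   // -> delete: [2]
    }
}
```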

Limitations
- The master is a central point of failure.
- The master can be a scalability bottleneck: latency when opening or stat-ing thousands of files.
- The security model is weak.

Conclusion
- Inexpensive commodity components can be the basis of a large-scale, reliable system.
- Adjusting the API, e.g. RecordAppend, can enable large distributed apps.
- Fault tolerant and useful for many similar apps.