L5-6: Runtime Platforms: Hadoop and HDFS


Indian Institute of Science, Bangalore, India
Department of Computational and Data Sciences
SE256: Jan16 (2:1)
L5-6: Runtime Platforms Hadoop and HDFS
Yogesh Simmhan, 03/10 Feb, 2016
Yogesh Simmhan & Partha Talukdar, 2016. This work is licensed under a Creative Commons Attribution 4.0 International License. Copyright for external content used with attribution is retained by their original authors.

Learning Objectives
1. How does HDFS work? Why is it effective?
2. How does Hadoop MapReduce work? Why is it effective?
3. Optimizations for performance and reliability in Hadoop MR

Data Centre Architecture Recap
- Commodity hardware: 1000s of machines of medium performance and reliability
- Failure is a given; design to withstand failure
- Network bottlenecks: hierarchical network design
- Push compute to data
Introduction to MapReduce and Hadoop, Matei Zaharia, UC Berkeley

Data Centre Architecture Recap
- I/O bottlenecks & failure
- Multiple disks for cumulative bandwidth
- Data redundancy: Hot/Hot
Example: how long to read 150 TB of CC data?
- Say you can store 150 GB in a SATA disk, and SATA disk bandwidth is 3 Gbps: 111 hrs to read
- Say you have 50 disks of 3 Gbps, each with 3 TB, but the I/O controller in the node handles 10 Gbps: 33 hrs to read
- Say a cluster with 12 nodes, 4 disks of 3 TB each: (150/12) TB * 1000 * 8 bits / 10 Gbps = 2.7 hrs to read
- Say a cluster with 12 nodes, dual Ethernet, reading over the network: (150/12) TB * 1000 * 8 / 2 Gbps = 13.9 hrs to read
Time to read across the network is not very different from time to read from a stressed disk.
GrayWulf: Scalable Clustered Architecture for Data Intensive Computing, Szalay, HICSS, 2008
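The read-time arithmetic on this slide can be reproduced with a short sketch. Decimal units (1 TB = 10^12 bytes) are assumed, matching the slide's "TB * 1000 * 8 bits" convention:

```python
def hours_to_read(total_tb, bandwidth_gbps, parallel_units=1):
    """Hours to read total_tb terabytes when each of parallel_units
    readers streams its share at bandwidth_gbps gigabits/second."""
    bits = total_tb * 1e12 * 8                  # decimal TB -> bits
    per_unit_bits = bits / parallel_units       # data per parallel reader
    seconds = per_unit_bits / (bandwidth_gbps * 1e9)
    return seconds / 3600

print(round(hours_to_read(150, 3), 1))       # one 3 Gbps SATA stream -> ~111.1
print(round(hours_to_read(150, 10), 1))      # one node, 10 Gbps controller -> ~33.3
print(round(hours_to_read(150, 10, 12), 1))  # 12 nodes, 10 Gbps each -> ~2.8
print(round(hours_to_read(150, 2, 12), 1))   # 12 nodes over 2 Gbps Ethernet -> ~13.9
```

The last two lines make the slide's closing point concrete: 2.8 hrs from local disks versus 13.9 hrs over the network, which is why Hadoop pushes compute to the data.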

E.g., Open Cloud Server
- High density: 24 blades/chassis, 96 blades/rack
- Compute blades: dual socket, 4 HDD, 4 SSD; 16-32 CPU cores; 4-16 TB HDD/SSD
- JBOD blade: 10 to 80 HDDs, 6G or 12G SAS; 40-160 TB HDD
http://www.opencompute.org/wiki/motherboard/specsanddesigns

Class Cluster Nodes
- 8-core AMD Opteron 3380, 2.6 GHz; 32 GB DDR3; 2 TB HDD; 1 Gbps LAN
- 12 nodes, 3U
- 1 Gigabit within a switch, 10 Gbps across switches
http://www.supermicro.com/aplus/system/3u/3012/as-3012ma-h12trf.cfm

Cisco s Data Center in Texas

Google s Data Center in Georgia

Microsoft s Data Center in Ireland

NSA s Data Center in Utah

Who is this? Doug Cutting and Hadoop the elephant. Hadoop was created by Doug Cutting (Yahoo!) and Mike Cafarella (UW) in 2006. Cutting's son, then 2, was just beginning to talk and called his beloved stuffed yellow elephant "Hadoop" (with the stress on the first syllable). http://www.cnbc.com/id/100769719 https://en.wikipedia.org/wiki/apache_hadoop#history

Hadoop: Big Picture Interactions Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, 2010

Hadoop: Big Picture Interactions (figure: DataNodes with co-located TaskTrackers) Hadoop: The Definitive Guide, 4th Edition, 2015

Hadoop Distributed File System

Hadoop Distributed File System (HDFS)
- Based on the Google File System (GFS)
- Optimized for huge files
- Write once, read many: create new data; never update-in-place, only append. No write locks needed; the initial cost of writes is amortized
- Optimized for sequential reads: typically start at a point and read to completion
- Throughput (cumulative bandwidth) favoured over low latency: low total time for all data matters more than time per small file
- Survives high disk/node failure rates: both persistence and availability
The Google File System, Sanjay Ghemawat, et al., SOSP, 2003

HDFS File Distribution
- Files are split into blocks of equal size: the unit of data that can be read or written
- Block sizes are large, e.g. 128 MB
- Blocks themselves are persisted on local disks, e.g. using a POSIX file system
- Blocks are replicated
Advantages:
- Larger reads/writes (throughput)
- Files can be larger than a single disk
- Eases distributed management: same size, opaque content, complexity pushed up
- Unit of recovery and replication
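The splitting rule above can be sketched in a few lines (a 128 MB block size is assumed; Hadoop's actual splitting is done in its Java client):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, a common HDFS block size

def split_into_blocks(file_size):
    """Return (offset, length) for each block of a file.
    The last block may be shorter: it occupies only the bytes it holds."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(BLOCK_SIZE, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

blocks = split_into_blocks(300 * 1024 * 1024)   # a 300 MB file
print(len(blocks))                              # 3 blocks
print(blocks[-1][1] // (1024 * 1024))           # last block is 44 MB
```

Note the last block is not padded to 128 MB: a 300 MB file costs 300 MB of storage (times the replication factor), not 384 MB.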

HDFS Design
- Master-slave architecture
- Master manages the namespace: directory/file names, tree structure, metadata, block IDs, permissions
- Slaves manage the blocks containing data
Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, 2010

Master: Name Node
- Persists names, trees, metadata, permissions on disk: namespace image (fsimage) plus an edit log of deltas (rename, permission, create)
- Mapping from files to lists of blocks
- Security is not a priority
- Block locations are not persisted; they are kept in-memory. The mapping from blocks to locations is dynamic. Why? The NameNode reconstructs the locations of blocks from the data nodes
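The last point can be sketched as follows: the block-to-location map is rebuilt purely from the block reports the data nodes send, so it never needs to be written to disk (node and block names here are made up for illustration):

```python
from collections import defaultdict

def build_block_map(block_reports):
    """Rebuild the in-memory block -> {datanodes} map from block reports.
    block_reports maps a datanode id to the list of block ids it holds."""
    locations = defaultdict(set)
    for node, blocks in block_reports.items():
        for block in blocks:
            locations[block].add(node)
    return locations

reports = {"dn1": ["b1", "b2"], "dn2": ["b2", "b3"]}
print(sorted(build_block_map(reports)["b2"]))   # b2 is held on both nodes
```

This is also why the NameNode can detect under-replication: once the map is built, any block with fewer locations than the replication factor needs re-replication.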

Master: Name Node
- Detects the health of the FS: Is a data node alive? Is a data block under-replicated?
- Rebalances block allocation across data nodes for improved disk utilization
- Coordinates file operations: directs application clients to datanodes for reads; allocates blocks on datanodes for writes

Master: Name Node
- Single point of failure! (Hadoop 1.x): upgrades mean downtime; failure means data loss (file names, file:block ID mapping)
- Secondary NameNode (a stale backup); multiple FS
- NameNode High Availability (Hadoop 2.x):
- A cold standby can take time: ~10 mins to load the namespace, ~1 hr for the block list
- The active NameNode acks only after writing to NFS; a hot standby is refreshed periodically and loads the current status into memory
- The shared NFS should be reliable!
- DataNodes send heartbeats to both, but ops are received only from the active
http://blog.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/

Slave/Worker: Data Node
- Stores & retrieves blocks
- Responds to client and master requests for block operations
- Sends a heartbeat every 3 secs for liveness
- Periodically sends the list of block IDs and their locations on that node, piggybacked on the heartbeat message, e.g. sends the block list every hour
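The liveness side of the heartbeat protocol can be sketched as a timestamp table on the NameNode: a node is presumed dead once no heartbeat has arrived for several intervals (the 10-heartbeat timeout below is illustrative, not Hadoop's exact default):

```python
import time

HEARTBEAT_INTERVAL = 3  # seconds, per the slide

class NameNodeMonitor:
    """Minimal liveness tracker: a DataNode is considered dead if no
    heartbeat arrived within `timeout` seconds (value illustrative)."""
    def __init__(self, timeout=10 * HEARTBEAT_INTERVAL):
        self.last_seen = {}
        self.timeout = timeout

    def heartbeat(self, node_id, now=None):
        # Record the arrival time of a heartbeat from node_id.
        self.last_seen[node_id] = time.time() if now is None else now

    def live_nodes(self, now=None):
        # Nodes whose most recent heartbeat is within the timeout window.
        now = time.time() if now is None else now
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout}

mon = NameNodeMonitor()
mon.heartbeat("dn1", now=0)
mon.heartbeat("dn2", now=25)
print(mon.live_nodes(now=31))   # dn1 has been silent for >30s -> {'dn2'}
```

When a node drops out of the live set, its blocks become under-replicated, which triggers the re-replication described on the previous NameNode slide.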

File Reads
- Clients get the list of data node locations for each block from the NameNode, sorted by distance
- Blocks are read in order; a connection is opened and closed to the nearest DataNode for each block
- Tries an alternate replica on a network failure or checksum failure
Hadoop: The Definitive Guide, 4th Edition, 2015

File Writes
- Clients get a list of data nodes to store each block's replicas
- First copy on the same data node as the client, or a random node. Second is off-rack. Third is on the same rack as the second.
- Blocks are written in order and forwarded in a pipeline. Acks from all replicas are expected before the next block is written.
Hadoop: The Definitive Guide, 4th Edition, 2015
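The three-replica placement rule above can be sketched as a small function over a rack map (rack and node names are made up; real HDFS adds constraints, such as preferring non-busy nodes, that are omitted here):

```python
import random

def place_replicas(racks, client_node=None):
    """Sketch of the 3-replica placement described above:
    1st replica on the client's node (or a random node),
    2nd on a different rack,
    3rd on a different node of the 2nd replica's rack.
    `racks` maps rack name -> list of node names."""
    all_nodes = [(r, n) for r, nodes in racks.items() for n in nodes]
    if client_node is not None:
        first = next((r, n) for r, n in all_nodes if n == client_node)
    else:
        first = random.choice(all_nodes)
    off_rack = [(r, n) for r, n in all_nodes if r != first[0]]
    second = random.choice(off_rack)
    same_rack = [(r, n) for r, n in all_nodes
                 if r == second[0] and n != second[1]]
    third = random.choice(same_rack)
    return [first, second, third]

racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(place_replicas(racks, client_node="n1"))
```

The policy trades reliability for bandwidth: two replicas share a rack (cheap pipeline forwarding), but at least one always survives a whole-rack failure.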

Hadoop YARN: Yet Another Resource Negotiator

MapReduce v1 MapReduce v2 (YARN) Apache Hadoop YARN, Arun C. Murthy, et al, HortonWorks, Addison Wesley, 2014

YARN
- Designed for scalability: 10k nodes, 400k tasks
- Designed for availability
- Separates application management from resource management
- Improves utilization: flexible slot allocation; slots are not bound to Map or Reduce types
- Goes beyond MapReduce

YARN
- ResourceManager, one per cluster: keeps track of nodes, capacities, and allocations; handles failure and recovery (heartbeats); coordinates scheduling of jobs on the cluster; decides which node to allocate to a job; ensures load balancing. Used by programming frameworks (MapReduce, Spark, etc.) to schedule distributed applications
- NodeManager, one per host: offers slots with a given capacity on the host to schedule tasks; a container maps to one or more slots

YARN Application Lifecycle Apache Hadoop YARN, Arun C. Murthy, et al, HortonWorks, Addison Wesley, 2014

v1 vs. v2 Application Lifecycle Apache Hadoop YARN, Arun C. Murthy, et al, HortonWorks, Addison Wesley, 2014

MapReduce Job Lifecycle: container requests are aware of block locality. Hadoop: The Definitive Guide, 4th Edition, 2015

Container heartbeat status to AM Apache Hadoop YARN, Arun C. Murthy, et al, HortonWorks, Addison Wesley, 2014

Scheduling in YARN
The scheduler has a narrow mandate:
- FIFO: as soon as a resource is available
- Capacity: uses different queues with a minimum capacity per queue; allocates excess resources to the more loaded queues
- Fair: gives each job all available resources, then redistributes as new jobs arrive
Hadoop: The Definitive Guide, 4th Edition, 2015
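The Fair scheduler's core idea can be sketched as max-min fair sharing: capacity is split evenly across jobs, and any share a job cannot use is redistributed to the others (this is a simplified model; YARN's Fair Scheduler adds queues, weights, and preemption):

```python
def fair_shares(capacity, demands):
    """Max-min fair allocation sketch: split `capacity` evenly among jobs,
    then redistribute whatever a job cannot consume to the rest."""
    shares = {j: 0.0 for j in demands}
    remaining = dict(demands)       # unmet demand per job
    free = float(capacity)
    while free > 1e-9 and remaining:
        per_job = free / len(remaining)
        free = 0.0
        for j in list(remaining):
            give = min(per_job, remaining[j])
            shares[j] += give
            remaining[j] -= give
            free += per_job - give  # collect the unused portion
            if remaining[j] <= 1e-9:
                del remaining[j]    # job fully satisfied
    return shares

# Job A needs only 20 slots; its leftover is split between B and C.
print({j: round(s) for j, s in fair_shares(100, {"A": 20, "B": 60, "C": 60}).items()})
```

A job arriving later simply adds a new entry to `demands` on the next scheduling round, shrinking everyone's even share, which is the "redistribute as jobs arrive" behaviour the slide names.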

Hadoop MapReduce

Hadoop: The Definitive Guide, 4th Edition, 2015

Mapping tasks to blocks
- Data blocks are converted to one or more splits; typically 1 split per block, balancing task-creation overhead against overwhelming a single task
- Each split is handled by a single Mapper task
- Records read from each split form the key-value pair input to the Map function
- Compute tasks are moved to the data block's location, if possible: enhances data locality
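The record-to-key-value flow above can be sketched in Python with word count as the running example (Hadoop's real Mapper API is Java; this only illustrates how a record reader feeds records to the Map function, which emits (key, value) pairs):

```python
def map_word_count(split_lines):
    """Word-count Mapper sketch: the record reader hands Map one line
    (record) at a time; Map emits a (word, 1) pair per word."""
    for line in split_lines:          # one record per line
        for word in line.split():
            yield (word.lower(), 1)   # intermediate key-value pair

split = ["Hadoop moves compute to data", "data locality matters"]
pairs = list(map_word_count(split))
print(pairs[:3])   # [('hadoop', 1), ('moves', 1), ('compute', 1)]
```

Each Mapper task runs exactly this loop over its own split, which is why splits can be processed independently on whichever node holds the block.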

Mapping tasks to blocks (figure) Hadoop: The Definitive Guide, 4th Edition, 2015

Hadoop: The Definitive Guide, 4th Edition, 2015

Shuffle: Map Side (figure: map outputs spilled to local disk)
- Spills when the memory buffer is full
- Divides the data into partitions, one for each reducer
- Performs an in-memory sort by key, per partition
- Runs the combiner, if present, on the sorted output and writes to disk
- If there are >3 spill files, runs the combiner again; outputs are merged, partitioned, and sorted into a single file on disk
Hadoop: The Definitive Guide, 4th Edition, 2015
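One map-side spill (partition, sort, combine) can be sketched as follows. The hash partitioner and summing combiner below are illustrative stand-ins for Hadoop's configurable Partitioner and Combiner classes:

```python
from collections import defaultdict

def spill(pairs, num_reducers):
    """One map-side spill sketch: pairs are partitioned by reducer
    (hash of key), sorted by key within each partition, and a combiner
    (here: a local sum) pre-aggregates before the write to disk."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[hash(key) % num_reducers].append((key, value))
    spilled = {}
    for p, kv in partitions.items():
        kv.sort(key=lambda x: x[0])      # in-memory sort by key
        combined = defaultdict(int)      # combiner: local aggregation
        for key, value in kv:
            combined[key] += value
        spilled[p] = sorted(combined.items())  # sorted run for this reducer
    return spilled

pairs = [("data", 1), ("hadoop", 1), ("data", 1)]
print(spill(pairs, num_reducers=2))   # ('data', 2) after combining
```

The combiner is the payoff: repeated keys leave the mapper as one pre-summed pair, shrinking the data shipped across the network to the reducers.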

Shuffle: Reduce Side (figure: copied map outputs spilled to local disk)
- The reducer copies map output as soon as it is available
- Copied to reducer memory if small, then merged/spilled to disk on overflow
- An incremental merge sort takes place in the background
- A full merge/sort takes place before the reduce method is called
Hadoop: The Definitive Guide, 4th Edition, 2015
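The final reduce-side step can be sketched with a k-way merge: the sorted runs copied from each mapper are merge-sorted, then the reduce function is called once per key with all of that key's values (a simplified model of the Java API, with word count continued as the example):

```python
import heapq
from itertools import groupby

def reduce_side(sorted_map_outputs, reduce_fn):
    """Reduce-side sketch: merge-sort the sorted runs copied from each
    mapper, then call reduce_fn once per key with all its values."""
    merged = heapq.merge(*sorted_map_outputs, key=lambda kv: kv[0])
    results = []
    for key, group in groupby(merged, key=lambda kv: kv[0]):
        results.append((key, reduce_fn(key, [v for _, v in group])))
    return results

run1 = [("data", 2), ("hadoop", 1)]   # sorted run from mapper 1
run2 = [("data", 1), ("spark", 1)]    # sorted run from mapper 2
print(reduce_side([run1, run2], lambda k, vs: sum(vs)))
# [('data', 3), ('hadoop', 1), ('spark', 1)]
```

Because every run is already sorted by the map side, the reducer only merges; it never needs to re-sort the full dataset, which keeps reduce-side memory bounded.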

Fault Tolerance
- Tasks are idempotent and side-effect free
- Data is saved to local disk before the reduce
- Task crash & recover
- Node crash & recover
- Skipping bad records

- Re-execution; redundant execution
- Improve utilization: speculative execution
- Locating stragglers
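Locating stragglers for speculative execution can be sketched as a simple progress comparison: a task lagging far behind the average gets a speculative duplicate launched elsewhere. The fixed 0.2 cutoff below is illustrative, not Hadoop's exact heuristic:

```python
def find_stragglers(progress, threshold=0.2):
    """Straggler-detection sketch: flag tasks whose progress (0.0-1.0)
    lags the average by more than `threshold` (cutoff illustrative)."""
    avg = sum(progress.values()) / len(progress)
    return sorted(t for t, p in progress.items() if avg - p > threshold)

# t3 is far behind its siblings, so it gets a speculative duplicate.
print(find_stragglers({"t1": 0.9, "t2": 0.85, "t3": 0.3}))   # ['t3']
```

Because tasks are idempotent and side-effect free (previous slide), whichever copy finishes first wins and the other is simply killed: redundant execution is safe.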

Reading
Hadoop: The Definitive Guide, 4th Edition, 2015: Chapters 3, 4, 7
Additional Resources
Apache Hadoop YARN: Moving Beyond MapReduce and Batch Processing with Apache Hadoop, 2015: Chapters 1, 3, 4, 7