Big Data for Engineers, Spring 2018: 7. Resource Management

Ghislain Fourny. Big Data for Engineers, Spring 2018. 7. Resource Management

Data technology stack: user interfaces, querying, data stores, indexing, processing, validation, data models, syntax, encoding, storage.

Where we are in the stack: user interfaces, querying, data stores, indexing, processing, validation, data models, syntax, encoding, storage.

Last week: MapReduce. Input data -> Map tasks -> intermediate data (shuffled) -> Reduce tasks -> output data.

Hadoop infrastructure (version 1), storage only: one Namenode (/dir/file) and six Datanodes.

Hadoop infrastructure (version 1), with processing: Namenode + JobTracker (/dir/file), and six nodes each running Datanode + TaskTracker.

Responsibilities of the MapReduce JobTracker: resource management, scheduling, monitoring, job lifecycle, fault tolerance.

Issue 1: scalability. Fewer than 4,000 nodes, fewer than 40,000 tasks.

Issue 2: bottleneck. The single JobTracker is a bottleneck for all TaskTrackers.

Issue 3: jack of all trades. The JobTracker does both scheduling and monitoring.

Issue 4: utilization (task slots). Slots are static (map or reduce is decided at configuration time) and fixed-size.

Issue 5: not fungible. Map slots and reduce slots are not interchangeable: one kind (for example map) can be working at maximum capacity while the other (reduce) is idle.
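Issues 4 and 5 can be illustrated with a little arithmetic. The sketch below is only an illustration, not Hadoop code: the function names and the slot counts (4 map slots, 2 reduce slots, 6 pending map tasks) are made up for the example. It compares how many tasks can run when slots are typed with how many could run if the same six slots were interchangeable.

```python
def running_tasks_static(map_slots, reduce_slots, map_demand, reduce_demand):
    """Typed slots: a map task can only use a map slot, a reduce task a reduce slot."""
    return min(map_slots, map_demand) + min(reduce_slots, reduce_demand)


def running_tasks_fungible(total_slots, map_demand, reduce_demand):
    """Interchangeable slots: any pending task can use any free slot."""
    return min(total_slots, map_demand + reduce_demand)


# Hypothetical node: 4 map slots and 2 reduce slots, with 6 map tasks pending
# and no reduce task ready yet.
static_running = running_tasks_static(4, 2, map_demand=6, reduce_demand=0)
fungible_running = running_tasks_fungible(6, map_demand=6, reduce_demand=0)

print(static_running, fungible_running)
```

With typed slots, two map tasks wait while both reduce slots sit idle; with a single pool, all six tasks run.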

YARN: Yet Another Resource Negotiator

YARN splits the responsibilities: scheduling and application management go to the Resource Manager; monitoring goes to the Application Masters.

Scales more: 10,000 nodes, 100,000 tasks.

YARN architecture: one ResourceManager, one NodeManager per node, and containers running on the nodes.

Remember... It does ring a bell, doesn't it?

Master-slave architecture: one master, several slaves.

HDFS server architecture: one Namenode (/dir/file1, /dir/file2, /file3) and several Datanodes.

YARN: a ResourceManager, NodeManagers, and containers on the nodes.

YARN: a client submits a job to the ResourceManager.

YARN: RM allocates an Application Master. The ResourceManager schedules a container, and the Application Master for the job runs in it.

Application Master communicates with containers: it executes and monitors the work in them.
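The submission flow on these slides can be sketched as a toy model. This is a minimal illustration and not the YARN API: the class and method names (`allocate`, `submit`, `request_containers`) and the first-fit placement rule are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_containers: int


@dataclass
class ResourceManager:
    nodes: list

    def allocate(self):
        """Grant one container on the first node with free capacity, or None."""
        for node in self.nodes:
            if node.free_containers > 0:
                node.free_containers -= 1
                return node.name
        return None

    def submit(self, job):
        """Allocate a container for the job's Application Master and start it there."""
        am_node = self.allocate()
        if am_node is None:
            raise RuntimeError("no free container for the Application Master")
        return ApplicationMaster(job, am_node, self)


@dataclass
class ApplicationMaster:
    job: str
    node: str
    rm: ResourceManager
    task_nodes: list = field(default_factory=list)

    def request_containers(self, count):
        """Ask the ResourceManager for up to `count` containers for the job's tasks."""
        for _ in range(count):
            node = self.rm.allocate()
            if node is None:
                break  # cluster is full; run with what was granted
            self.task_nodes.append(node)
        return self.task_nodes


# Hypothetical two-node cluster with two containers per node.
rm = ResourceManager([NodeManager("node1", 2), NodeManager("node2", 2)])
am = rm.submit("wordcount")          # the client submits a job
granted = am.request_containers(3)   # the Application Master asks for task containers
print(am.node, granted)
```

The ResourceManager only hands out containers here; running and watching the tasks is left to the Application Master, which mirrors the split described on the slides.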

YARN's Resource Manager

Resource Manager: capacity guarantees, cluster utilization, fairness, service level agreements.

Communication with clients. Client Service: application (start, end), queue information, statistics. Admin Service: refresh the node list, queue configuration.

Communication with the node managers. Resource Tracker: liveness. Nodes List Manager: valid and invalid nodes.
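A liveness check is typically built on periodic heartbeats. The sketch below shows one way this could work and is not the actual ResourceManager implementation: the class name reuses the slide's term, but the methods, the timeout value and the node names are hypothetical.

```python
class ResourceTracker:
    """Toy tracker: accepts heartbeats from valid nodes and reports which are live."""

    def __init__(self, valid_nodes, timeout_s):
        self.valid_nodes = set(valid_nodes)   # the list of valid nodes
        self.timeout_s = timeout_s
        self.last_seen = {}                   # node name -> time of last heartbeat

    def heartbeat(self, node, now):
        """Record a heartbeat; reject nodes that are not on the valid list."""
        if node not in self.valid_nodes:
            return False
        self.last_seen[node] = now
        return True

    def live_nodes(self, now):
        """Nodes whose last heartbeat is at most `timeout_s` seconds old."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout_s)


tracker = ResourceTracker({"n1", "n2"}, timeout_s=10)
accepted_n1 = tracker.heartbeat("n1", now=0)
accepted_n3 = tracker.heartbeat("n3", now=0)   # not on the valid list
tracker.heartbeat("n2", now=8)
live = tracker.live_nodes(now=12)              # n1 has been silent for 12 s
print(accepted_n1, accepted_n3, live)
```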

Communication with the application masters. Application Master Service: registration, liveness, container requests.

Authentication: application token, application ACL, container token.

Pure scheduler: does not monitor tasks; does not restart upon failure.

Scheduling strategies (pluggable scheduler): FIFO scheduler
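The idea of first-in, first-out scheduling can be sketched in a few lines. This is a simplified illustration, not the YARN FIFO scheduler: the function name, the job names and the container counts are invented for the example.

```python
def fifo_allocate(capacity, jobs):
    """Serve jobs strictly in arrival order.

    `jobs` is a list of (name, containers_requested) in arrival order.
    Each job gets as much as is left before the next job is considered.
    """
    allocation = []
    remaining = capacity
    for name, demand in jobs:
        granted = min(demand, remaining)
        allocation.append((name, granted))
        remaining -= granted
    return allocation


# Hypothetical cluster with 10 containers and three jobs arriving in order A, B, C.
fifo_result = fifo_allocate(10, [("A", 8), ("B", 4), ("C", 2)])
print(fifo_result)
```

The small job C gets nothing until the earlier jobs release containers, which is the usual weakness of FIFO on a shared cluster.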

Scheduling strategies (pluggable scheduler): Capacity scheduler, with Queue 1 and Queue 2
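A capacity scheduler can be pictured as several queues, each with a guaranteed share of the cluster. The sketch below assumes a 70/30 split between the two queues and FIFO order inside each queue; these numbers, the function name and the job names are assumptions for illustration. It leaves out lending idle capacity between queues.

```python
def capacity_allocate(capacity, queues):
    """Split the cluster between queues, then serve each queue in FIFO order.

    `queues` maps a queue name to (share_percent, jobs), where `jobs` is a list
    of (name, containers_requested) in arrival order.
    """
    result = {}
    for queue_name, (share_percent, jobs) in queues.items():
        remaining = capacity * share_percent // 100   # this queue's guaranteed share
        granted_jobs = []
        for name, demand in jobs:
            granted = min(demand, remaining)
            granted_jobs.append((name, granted))
            remaining -= granted
        result[queue_name] = granted_jobs
    return result


# Hypothetical 10-container cluster: Queue 1 is guaranteed 70%, Queue 2 30%.
capacity_result = capacity_allocate(10, {
    "queue1": (70, [("A", 5), ("B", 5)]),
    "queue2": (30, [("C", 2)]),
})
print(capacity_result)
```

Unlike plain FIFO, job C in Queue 2 starts right away because Queue 1's jobs cannot consume its share.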

Scheduling strategies (pluggable scheduler): Fair scheduler
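The slides do not define fairness precisely. One common notion is max-min fairness, where capacity is split equally and whatever a small job does not need is redistributed to the others. The sketch below implements that notion as an assumption. It is not the YARN Fair Scheduler, and the function name and numbers are hypothetical.

```python
def fair_shares(capacity, demands):
    """Max-min fair split of `capacity` among jobs with the given demands.

    `demands` maps a job name to the amount it asks for. Capacity is divided
    equally among unsatisfied jobs, round after round, until it runs out or
    every demand is met.
    """
    shares = {job: 0.0 for job in demands}
    active = set(demands)
    remaining = float(capacity)
    while active and remaining > 1e-9:
        equal = remaining / len(active)
        for job in sorted(active):
            grant = min(equal, demands[job] - shares[job])
            shares[job] += grant
            remaining -= grant
        # Jobs whose demand is fully met drop out of the next round.
        active = {job for job in active if demands[job] - shares[job] > 1e-9}
    return shares


# Hypothetical 12-container cluster shared by one small job and two large ones.
fair_result = fair_shares(12, {"A": 2, "B": 10, "C": 10})
print(fair_result)
```

Job A receives all it asked for, and the two large jobs split the rest evenly.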

Resource container: X GB of memory, W cores at U GHz, Y TB of disk, Z MBps of network bandwidth.
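A container can be thought of as a bundle of resource amounts carved out of a node. The sketch below models that bundle as a small record. The field names, the `fits_in` and `minus` helpers and the sample figures are assumptions for the example, not YARN's actual resource model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_gb: float
    cores: int
    disk_tb: float
    network_mbps: float

    def fits_in(self, available):
        """True if every dimension of this request fits in what is available."""
        return (self.memory_gb <= available.memory_gb
                and self.cores <= available.cores
                and self.disk_tb <= available.disk_tb
                and self.network_mbps <= available.network_mbps)

    def minus(self, other):
        """Resources left after carving `other` out of this bundle."""
        return Resources(self.memory_gb - other.memory_gb,
                         self.cores - other.cores,
                         self.disk_tb - other.disk_tb,
                         self.network_mbps - other.network_mbps)


# Hypothetical node capacity and container request.
node = Resources(memory_gb=64, cores=16, disk_tb=4, network_mbps=1000)
request = Resources(memory_gb=8, cores=2, disk_tb=0.5, network_mbps=100)

fits = request.fits_in(node)
left = node.minus(request)
print(fits, left)
```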

YARN's Node Manager

NodeManager: one per node.

Monitoring: memory, CPU, disk, network.

Reports to ResourceManager: memory, CPU, disk, network.

Container

YARN's Application Masters

Application Master is per application.

Application Master is application-specific.

Framework-specific application masters: MapReduce, DAG distributed processing, Message Passing Interface, graph processing.

Complexity is moved from the ResourceManager to the Application Master.

Application Master: negotiates resources with the ResourceManager; executes and monitors through the NodeManagers.

Fault tolerance is on the application master (relaunch).
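Relaunching can be sketched as a retry loop that the Application Master runs for each task. This is a minimal illustration under assumptions: the function names, the attempt limit of three and the simulated failure are all hypothetical, and a real Application Master would request a fresh container before each relaunch.

```python
def run_with_relaunch(tasks, run_task, max_attempts=3):
    """Run each task, relaunching it after a failure, up to `max_attempts` times.

    `run_task(task, attempt)` returns True on success and False on failure.
    Returns a dict mapping each task to the number of attempts it needed.
    """
    attempts_used = {}
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            attempts_used[task] = attempt
            if run_task(task, attempt):
                break
        else:
            # Every attempt failed: give up on the whole application.
            raise RuntimeError(f"task {task} failed {max_attempts} times")
    return attempts_used


def flaky(task, attempt):
    # Hypothetical task runner: "t2" fails on its first attempt only.
    return not (task == "t2" and attempt == 1)


relaunch_result = run_with_relaunch(["t1", "t2", "t3"], flaky)
print(relaunch_result)
```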

Application-specific monitoring: no longer a bottleneck.

Application Master is not trusted. Evil plan: book containers and not use them.

Summary: separation between scheduling and monitoring, scalability, availability, multi-tenancy.

Forward compatibility with DAGs of tasks