MOHA: Many-Task Computing Framework on Hadoop


Apache: Big Data North America 2017 @ Miami
MOHA: Many-Task Computing Framework on Hadoop
Soonwook Hwang, Korea Institute of Science and Technology Information
May 18, 2017

Table of Contents
- Introduction
- Design and Implementation of MOHA
- Evaluation
- Discussion
- Conclusion

Introduction
Distributed/parallel computing systems support various types of challenging applications.
HTC (High-Throughput Computing)
- A computing paradigm focusing on the efficient execution of a large number of loosely-coupled tasks (e.g., independent, sequential tasks) over long periods of time
- Exploits many opportunistic computing resources deployed across multiple distributed administrative domains
- Primary metric: number of operations or tasks per month
HPC (High-Performance Computing)
- Targets the efficient processing of tightly-coupled parallel tasks, usually executed on a single site (e.g., a supercomputer) with low-latency interconnects
- Primary metric: floating point operations per second (FLOPS)

Introduction
DIC (Data-Intensive Computing)
- A parallel processing approach employing data parallelism to efficiently process large volumes of data (e.g., terabytes or petabytes in size, typically referred to as Big Data)
- Devotes most of its processing time to I/O and the movement and manipulation of data
CIC (Compute-Intensive Computing)
- Used to describe compute-bound applications
- Devotes most of its execution time to computational requirements as opposed to I/O (typically requiring small volumes of data)

Introduction
Many-Task Computing (MTC) as a new computing paradigm [I. Raicu, I. Foster, Y. Zhao, MTAGS '08]
- A very large number of tasks (millions or even billions)
- Relatively short per-task execution times (seconds to minutes)
- A large variance of task execution times (ranging from hundreds of milliseconds to hours)
- Data-intensive tasks (lots of relatively small file operations, ranging from hundreds of KBs to tens of MBs of I/O)
- Communication-intensive, but through file operations rather than a message passing interface
- Application domains: astronomy, physics, pharmaceuticals, chemistry, etc.

Introduction
Many-Task Computing applications (astronomy, physics, pharmaceuticals, chemistry, etc.) share these characteristics: a very large number of tasks (millions or even billions), relatively short per-task execution times (seconds to minutes), a large variance of task execution times (hundreds of milliseconds to hours), data-intensive tasks (tens of MB of I/O per second), and communication through files.
These characteristics call for high-performance task dispatching, dynamic load balancing among compute nodes, and support for another type of data-intensive workload.

Introduction
Hadoop is the de facto standard Big Data storage and processing infrastructure.
With the advent of YARN, Hadoop is moving beyond MapReduce and batch processing and evolving into a multi-use data platform: multiple distributed frameworks/applications can be built and run on top of a Hadoop cluster.

Introduction
YARN facilitates the development of purpose-built data processing models beyond MapReduce.
MOHA (Many-task computing On HAdoop) is a proof-of-concept development for enabling Many-Task Computing practices on a Hadoop cluster, currently prototyped as a YARN application.
[Figure: MTC multi-level scheduling layered on top of Hadoop YARN resource management]

Related Work
GERBIL: MPI+YARN [L. Xu, M. Li, A. R. Butt, CCGrid '15]
- A framework for hosting MPI applications on a Hadoop cluster, transparently co-hosting unmodified MPI applications alongside MapReduce applications
- Allows the realization of rich data analytics workflows as well as efficient data sharing between the MPI and MapReduce models within a single cluster
- Scientific applications such as metagenomics, whose workflows consist of compute- and communication-intensive tasks (MPI) and data-intensive tasks (MapReduce), benefit from GERBIL

Related Work: HTCaaS
We have developed the HTCaaS system for MTC execution on top of distributed and heterogeneous computing infrastructures (e.g., clusters, supercomputers, grids).
HTCaaS: a multi-level scheduling system
- Flexible and powerful high-level job description in XML (e.g., parameter sweeps or N-body calculations)
- Automated job splitting into a set of tasks to be enqueued
- Agent-based multi-level scheduling: a 1st-level scheduler launches agents, and a 2nd-level scheduler handles task pulling and execution, fault tolerance, etc.
- Data staging-in and staging-out
HTCaaS is currently running as a pilot service on top of PLSI, supporting a number of scientific applications from the pharmaceutical and high-energy physics domains.

Related Work: HTCaaS
[Figure: HTCaaS architecture showing the HTCaaS Client, HTCaaS Server, and HTCaaS Agents]

YARN vs. HTCaaS
Platform Layer (1st-level scheduling)
- YARN: Resource Manager (RM), Node Manager (NM)
- HTCaaS: HTCaaS Server (e.g., the HTCaaS Agent Manager component), local resource managers running on local data/computing sites (e.g., PBS, Grid RM, Cloud RM)
Application Framework Layer (2nd-level scheduling)
- YARN: YARN Client, YARN ApplicationMaster, YARN Container
- HTCaaS: HTCaaS Client, HTCaaS Server, HTCaaS Agent

Design and Implementation of MOHA

Hadoop YARN Execution Model
YARN separates its functionality into two layers (a minimal submission sketch follows below):
- The platform layer is responsible for resource management (first-level scheduling): Resource Manager, Node Manager
- The framework layer coordinates application execution (second-level scheduling): Client, ApplicationMaster, Container
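To make the two-layer split concrete, the following minimal sketch (not MOHA's actual code) shows a framework-layer client asking the platform layer (the ResourceManager) to launch an ApplicationMaster; the class name org.example.MOHAManager and the resource sizes are illustrative assumptions.

```java
// Minimal sketch: submitting a YARN application whose AM would act like the MOHA Manager.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

import java.util.Collections;

public class MinimalYarnClient {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();

    // Ask the ResourceManager for a new application (platform layer, 1st-level scheduling).
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("moha-example");

    // Describe how to launch the ApplicationMaster container (framework layer entry point).
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList(
        "$JAVA_HOME/bin/java org.example.MOHAManager"
        + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
        + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
    appContext.setAMContainerSpec(amContainer);

    // Resources requested for the ApplicationMaster itself.
    appContext.setResource(Resource.newInstance(1024 /* MB */, 1 /* vcore */));

    // Submit; the ResourceManager allocates a container and starts the AM on a NodeManager.
    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted application " + appId);
  }
}
```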

MOHA System Architecture
[Figure: MOHA components mapped onto the YARN Client, YARN ApplicationMaster, and YARN Containers]

MOHA System Architecture
MOHA Client
- Creates and submits a MOHA MTC job and performs data staging; a MOHA job is a bag of tasks (i.e., a collection of multiple tasks)
- Provides a simple JDL (Job Description Language)
- Uploads the required data into HDFS: application input data, the application executable, the MOHA JAR, the JDL, etc.
- Prepares an execution environment for the MOHA Manager based on YARN's Resource Localization Mechanism, so that the required data are automatically downloaded and prepared for use in the local working directories (see the localization sketch below)
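As a rough illustration of the localization step, the sketch below registers a file already staged to HDFS as a YARN LocalResource so that the NodeManager downloads it into the container's working directory before launch. The HDFS path /user/moha/jobs/example.jdl and the link name example.jdl are hypothetical, not MOHA's actual layout.

```java
// Sketch: making a staged HDFS file appear automatically in a container's working directory.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

import java.util.HashMap;
import java.util.Map;

public class LocalizationSketch {
  static Map<String, LocalResource> buildLocalResources(Configuration conf) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    Path jdlOnHdfs = new Path("/user/moha/jobs/example.jdl");   // hypothetical staging path
    FileStatus status = fs.getFileStatus(jdlOnHdfs);

    LocalResource jdl = Records.newRecord(LocalResource.class);
    jdl.setResource(ConverterUtils.getYarnUrlFromPath(fs.makeQualified(jdlOnHdfs)));
    jdl.setSize(status.getLen());
    jdl.setTimestamp(status.getModificationTime());
    jdl.setType(LocalResourceType.FILE);
    jdl.setVisibility(LocalResourceVisibility.APPLICATION);

    Map<String, LocalResource> localResources = new HashMap<>();
    localResources.put("example.jdl", jdl);  // visible as ./example.jdl inside the container
    // The map is later passed via ContainerLaunchContext.setLocalResources(localResources);
    // the NodeManager downloads the file before the container starts.
    return localResources;
  }
}
```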

MOHA System Architecture
MOHA Manager
- Creates the MOHA job queue
- Splits a MOHA MTC job described in the JDL into multiple tasks and inserts them into the queue
- Gets containers allocated and launches MOHA TaskExecutors
MOHA TaskExecutor
- Pulls MOHA tasks from the MOHA job queue and processes them (see the sketch below)
- Monitors and reports on task execution
[Figure: multi-level scheduling mechanism — the client starts the AppMaster (MOHA Manager), which registers resource capabilities and requests containers; containers are assigned, and tasks are pulled from the job queue]
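The pulling behaviour of a TaskExecutor can be sketched as a simple consume-and-execute loop. The example below assumes a Kafka-backed job queue (one of the two queue backends used in the evaluation); the topic name moha-job-queue, the consumer group id, and the convention that a task is a shell command line are assumptions, not MOHA's actual wire format.

```java
// Illustrative pull-based TaskExecutor loop over a Kafka-backed job queue.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class TaskExecutorSketch {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
    props.put("group.id", "moha-task-executors");          // all executors share one group
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("moha-job-queue"));  // hypothetical topic
      while (true) {
        // Pull-based dispatching: each executor fetches work at its own pace,
        // which gives natural dynamic load balancing across containers.
        ConsumerRecords<String, String> tasks = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> task : tasks) {
          Process p = new ProcessBuilder("/bin/sh", "-c", task.value()).inheritIO().start();
          int exitCode = p.waitFor();
          // Reporting task status back to the MOHA Manager is omitted in this sketch.
        }
      }
    }
  }
}
```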

Evaluation

Experimental Setup
MOHA testbed: 3 rack-mount servers, each with
- 2 x Intel Xeon E5-2620v3 CPUs (12 CPU cores)
- 64 GB of main memory
- 2 x 1 TB SATA HDD (1 for Linux, 1 for HDFS)
Software stack
- Hortonworks Data Platform (HDP) 2.3.2, automated install with Apache Ambari
- Operating system: CentOS release 6.7 (Final)
- Environment identical to the Hortonworks Sandbox VM

Experimental Setup
Comparison models
- YARN Distributed-Shell: a simple YARN application that can execute shell commands (scripts) on distributed containers in a Hadoop cluster
- MOHA-ActiveMQ: ActiveMQ running on a single node with the New I/O (NIO) transport
- MOHA-Kafka: 3 Kafka brokers with the minimum fetch size (64 bytes)
Workload
- Microbenchmark varying the number of "sleep 0" tasks (illustrated in the sketch below)
Performance metrics
- Elapsed time (i.e., makespan or turnaround time)
- Task processing rate (number of tasks/sec)
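The sketch below illustrates the microbenchmark idea under stated assumptions: N trivial "sleep 0" tasks are enqueued, and the two metrics are derived from wall-clock time. The broker addresses, the topic name moha-job-queue, and the task count are hypothetical, and this is not the authors' benchmark harness; in the real experiment the clock stops only when the last TaskExecutor finishes.

```java
// Sketch of enqueuing N "sleep 0" microbenchmark tasks and computing rate metrics.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class SleepMicrobenchmark {
  public static void main(String[] args) {
    int numTasks = 10_000;
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    long start = System.currentTimeMillis();
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      for (int i = 0; i < numTasks; i++) {
        producer.send(new ProducerRecord<>("moha-job-queue", "task-" + i, "sleep 0"));
      }
      producer.flush();
    }
    long enqueueMs = System.currentTimeMillis() - start;
    // The reported makespan additionally includes execution time until the last TaskExecutor
    // finishes; task rate = number of tasks / elapsed seconds.
    System.out.printf("Enqueued %d tasks in %d ms (%.1f tasks/sec enqueue rate)%n",
        numTasks, enqueueMs, numTasks / (enqueueMs / 1000.0));
  }
}
```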

Experimental Results
Performance comparison (total elapsed time)
- YARN Distributed-Shell suffers from multiple resource (de)allocations
- The multi-level scheduling mechanisms enable the MOHA frameworks to substantially reduce the cost of executing many tasks (8.4x and 28.5x reductions shown in the chart)

Experimental Results
Task dispatching rate and initialization overhead
- MOHA-Kafka outperforms MOHA-ActiveMQ as the number of TaskExecutors increases (compare also Falkon's 15,000 tasks/sec), even though we have not yet fully utilized Kafka's task bundling functionality
- The initialization overhead is mostly queuing time

Discussion

Discussion
MTC applications typically involve much larger numbers of tasks, relatively short per-task execution times, and a substantial amount of data operations with potential interactions through files.
They therefore require high-performance task dispatching, effective dynamic load balancing, data-intensive workload support, and seamless integration.
Hadoop can be a viable choice for addressing these challenging MTC applications; technologies from the MTC community should be effectively converged into the Hadoop ecosystem.

Discussion
Potential research issues
- Scalable job/metadata management: removing potential performance bottlenecks
- Dynamic task load balancing: task bundling and job profiling techniques
[Figure: scalable job & metadata management and dynamic load balancing, with executors performing pulling-based, streamlined task dispatching]

Discussion
Potential research issues
- Data-aware resource allocation: leveraging Hadoop's data locality (computations close to data)
- Data grouping & declustering: aggregating groups of small files into a "data bundle"
[Figure: the MOHA Manager (job & metadata management) uses locality metadata when assigning tasks and data to TaskExecutors]
Task bundling and data grouping can be closely related (see the sketch below).
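A minimal sketch of the task-bundling idea, assuming the Kafka-backed queue from the evaluation: several small task descriptions are packed into one message so that a single fetch dispatches a whole bundle. The newline-separated bundle format and the topic name are hypothetical, not MOHA's actual scheme.

```java
// Sketch: bundling several small tasks into one queue message to amortize dispatch overhead.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.List;
import java.util.Properties;

public class TaskBundlingSketch {
  static void sendBundles(KafkaProducer<String, String> producer, List<String> tasks, int bundleSize) {
    for (int i = 0; i < tasks.size(); i += bundleSize) {
      // One message carries up to bundleSize tasks; a TaskExecutor splits and runs them in order.
      String bundle = String.join("\n", tasks.subList(i, Math.min(i + bundleSize, tasks.size())));
      producer.send(new ProducerRecord<>("moha-job-queue", "bundle-" + (i / bundleSize), bundle));
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      sendBundles(producer, List.of("sleep 0", "sleep 0", "sleep 0", "sleep 0"), 2);
    }
  }
}
```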

Conclusion

Conclusion
Design and implementation of the MOHA (Many-task computing On HAdoop) framework
- Effectively combines MTC technologies with Hadoop
- Developed as a Hadoop YARN application, so existing MTC applications can be transparently co-hosted with other Big Data processing frameworks in a single Hadoop cluster
The MOHA prototype, as a proof of concept,
- Can execute many shell-command-based tasks across distributed computing resources
- Substantially reduces the overall execution time of many-task processing with a minimal amount of resources compared to the existing YARN Distributed-Shell
- Efficiently dispatches a large number of tasks by exploiting multi-level scheduling and streamlined task dispatching

Another work in progress: a PoC development related to the MOHA work

Making the Hadoop ecosystem available on top of the Lustre file system as well
Motivation
- MTC often requires a large number (e.g., tens of millions) of relatively small-sized (e.g., tens or hundreds of KBs to MBs) file I/O operations over the course of an MTC job execution, which does not fit well with the underlying HDFS, where a typical data block size is 64 MB
- Most scientific data, the major target data that MTC has to deal with, is stored in a parallel file system (PFS) such as Lustre
- This requires performance-degrading manual data copying between the two file systems, HDFS and Lustre

Thank you! National Institute of Supercomputing and Networking 2017