Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse


Slide 1: Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse. Thanh-Chung Dao and Shigeru Chiba, The University of Tokyo.

Slide 2: Supercomputers. Expensive clusters with multi-core processors, large main memory, and high-speed networks, focused mainly on compute-intensive applications. Data-intensive workloads, such as graph processing and pre-processing of simulation data, are emerging as supercomputing problems.

Slide 3: MapReduce. A simple parallel paradigm for processing large datasets; parallelization and communication are hidden from the user. PageRank example: the input (PageA -> PageB, PageC; PageB -> PageC; PageC -> PageA, PageB) is split, and a mapper function emits a rank contribution <Page, 1/N> for each of a page's N outbound links. Shuffling groups the contributions by page (done automatically; users can ignore it), and a reducer function sums the contributions <PageA, x_1>, ..., <PageA, x_n> received for each page to produce its rank, giving PageA 0.5, PageB 1, PageC 1.5 in the example.
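As a concrete illustration of the mapper and reducer sketched on this slide, here is a minimal Hadoop version of the rank-contribution step; the class names, the tab-separated adjacency-list input, and the omission of a damping factor are simplifying assumptions, not the authors' code.

```java
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: for a line "PageA<TAB>PageB,PageC" emit <PageB, 1/N> and <PageC, 1/N>,
// where N is the number of outbound links of PageA.
class RankContributionMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] parts = line.toString().split("\t");
        if (parts.length < 2) {
            return;                               // page with no outbound links
        }
        String[] links = parts[1].split(",");
        double contribution = 1.0 / links.length;
        for (String link : links) {
            ctx.write(new Text(link.trim()), new DoubleWritable(contribution));
        }
    }
}

// Reducer: sum the contributions <Page, x_1> ... <Page, x_n> received for a page
// to obtain its new rank, as in the slide (PageA 0.5, PageB 1, PageC 1.5).
class RankSumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text page, Iterable<DoubleWritable> contributions, Context ctx)
            throws IOException, InterruptedException {
        double rank = 0.0;
        for (DoubleWritable c : contributions) {
            rank += c.get();
        }
        ctx.write(page, new DoubleWritable(rank));
    }
}
```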

Slide 4: Hadoop MapReduce. The standard MapReduce implementation: it provides easy-to-use MapReduce APIs, uses TCP/IP-based communication, and is designed to run on commodity clusters such as lab clusters or Amazon EC2, with proven scalability (32,000 nodes at Yahoo) and resilience. It is written in Java.

Slide 5: Improving Hadoop MapReduce performance on supercomputers. Hadoop MapReduce is a good choice on supercomputers because of its maturity, its productivity, and its runtime resource allocation (number of processes, memory, CPU). The two worlds differ: resource allocation at runtime is static on supercomputers but dynamic in Hadoop; communication is MPI versus TCP/IP; and the typical workload is compute-intensive versus data-intensive.

Slide 6: Our approach: JVM Reuse. Statically create JVM processes and dynamically allocate them to Hadoop tasks. This enables efficient MPI communication by Hadoop tasks (statically created processes can exploit MPI), lets us keep the original Hadoop implementation thanks to dynamic allocation, and shortens process start-up time. Technique: a process pool implements JVM Reuse, minimizing changes to the original Hadoop engine.

Slide 7: Why MPI is required for Hadoop. MPI is the de facto high-speed communication layer on supercomputers. It can improve slow MapReduce shuffling and enables Hadoop to co-host traditional MPI applications. Combining the MPI and MapReduce models gives a rich data-analysis workflow and efficient data sharing between the two models, e.g. MPI can access data located in the Hadoop file system (HDFS). [Chart: throughput (Mbps) vs. message size (bytes) for MPI and TCP on the FX10 supercomputer; MPI is about 10 times faster.]

Slide 8: Slow MapReduce shuffling on Hadoop. Shuffling uses TCP/IP-based communication: map tasks write their outputs to local disk, an HTTP servlet server serves them, and reduce tasks fetch them with multiple requests at once before sort & merge and reducing. JVM-Bypass (Wang et al., IPDPS 2013) targets this path. [Diagram: mapping phase, shuffling phase, and reducing phase across nodes.]

Slide 9: Dynamic process creation on MPI. It is discouraged on supercomputers for performance reasons: spawning is a collective mechanism (MPI spawn), and gang scheduling is error-prone if there are not enough resources. Gerbil (Xu et al., CCGrid 2015) co-hosts MPI applications on Hadoop and creates processes dynamically; its experiments showed significant overhead. Consequently, resources (memory and CPU cores) should be specified before running MPI applications, and the number of processes is known in advance (static).

Slides 10-16: Dynamic process creation on Hadoop. Here it is required: resources are allocated on demand to run MapReduce applications, and the number of processes is unknown in advance (dynamic). The sequence for a job: the user submits the job to the master node; the master sends task requests to the worker nodes (e.g. 6, 6, and 8 tasks); each node creates one JVM process per task; each task runs on its own process; when the tasks finish, their processes are terminated; and the job completes.

Slides 17-23: Idea of reusing idle JVM processes. The number of processes is statically fixed: each node keeps a pool of idle JVM processes. When the user submits a job, the master node sends task requests to the nodes (e.g. 6, 6, and 8 tasks); idle processes are allocated to the tasks and marked busy; the tasks run; the processes are cleaned up and returned to the idle state instead of being terminated; and the job completes with the pool ready for the next job.

Slide 24: JVM Reuse enables MPI communication. MPI communication is established once at the beginning; because JVM Reuse keeps the processes running, the MPI connection is always available.

Slide 25: JVM Reuse shortens start-up time. The normal JVM start-up flow of program A (cmd: java A) is: OS-level process creation; class loading and class linking (verification and initialization) by the class loader subsystem; invoking A's main() method; and executing A's instructions in the execution engine with the JIT compiler. After A finishes, program B can reuse A's JVM: process creation and class-loader start-up are skipped, and the JVM just invokes B's main() method and executes B's instructions.
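A minimal sketch of what reusing a running JVM can look like: load the next program's classes through a fresh class loader and invoke its main() by reflection. This is only an illustration under those assumptions, not the paper's implementation; the names and classpath handling are made up.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: run program B's main() inside an already-running JVM, so the OS-level
// process creation and JVM bootstrap of a fresh "java B" invocation are avoided.
public class JvmReuseSketch {
    static void runInSameJvm(String mainClass, String[] args, URL[] classpath) throws Exception {
        // A fresh class loader keeps B's classes separate from whatever ran before.
        try (URLClassLoader loader = new URLClassLoader(classpath,
                JvmReuseSketch.class.getClassLoader())) {
            Class<?> cls = Class.forName(mainClass, true, loader);
            Method main = cls.getMethod("main", String[].class);
            main.invoke(null, (Object) args);     // static main(String[]) of B
        }
    }
}
```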

Slide 26: Iterative jobs benefit from JVM Reuse. Iterative jobs spawn many short-running JVM processes; PageRank is an example. The iterative job flow: initial data is prepared by a pre-processing job, then job A (map and reduce) runs repeatedly, each iteration's maps using the results of the previous iteration, until the stop condition holds.
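To make the cost concrete, here is a hedged sketch of such an iterative driver: one full Hadoop job is submitted per iteration, so without reuse every iteration pays the task-JVM start-up again. Paths, job names, and the simplistic stop condition are illustrative; a real PageRank driver would also carry the link structure between iterations.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch of an iterative driver: one full MapReduce job per iteration, each
// reading the previous iteration's output, so every iteration launches (or,
// with JVM Reuse, reuses) a fleet of task JVMs.
public class IterativeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path("ranks/iter0");             // illustrative paths
        for (int i = 1; i <= 8 && !converged(); i++) {
            Job job = Job.getInstance(conf, "pagerank-iter-" + i);
            job.setJarByClass(IterativeDriver.class);
            job.setMapperClass(RankContributionMapper.class);  // from the earlier sketch
            job.setReducerClass(RankSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(DoubleWritable.class);
            FileInputFormat.addInputPath(job, input);
            Path output = new Path("ranks/iter" + i);
            FileOutputFormat.setOutputPath(job, output);
            if (!job.waitForCompletion(true)) {
                System.exit(1);
            }
            input = output;                               // next maps read this result
        }
    }

    static boolean converged() { return false; }          // placeholder stop condition
}
```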

Slide 27: Implementation: process pool. A process pool runs on each node alongside the Hadoop YARN node manager. When a process-creation request arrives, the pool manager allocates an idle process using round-robin scheduling. Allocation sets the process busy, exports the environment variables, creates a new class loader, and invokes the task's main() method. De-allocation cleans static fields and unsets the busy flag, returning the process to the pool.
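A minimal, single-node sketch of the round-robin pool-manager idea; the PoolManager and Slot names are hypothetical, and in the real system each slot would be a separate, already running JVM process reached over IPC rather than an in-process object.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a per-node pool manager: a fixed number of long-lived slots
// handed out round-robin and tracked with a busy flag.
class PoolManager {
    static final class Slot {
        final int id;
        volatile boolean busy;
        Slot(int id) { this.id = id; }
    }

    private final List<Slot> slots = new ArrayList<>();
    private int next = 0;                                 // round-robin cursor

    PoolManager(int poolSize) {
        for (int i = 0; i < poolSize; i++) {
            slots.add(new Slot(i));
        }
    }

    // Allocation: pick the next idle slot round-robin and mark it busy; the caller
    // would then export env. variables, create a class loader, and invoke main().
    synchronized Slot allocate() {
        for (int i = 0; i < slots.size(); i++) {
            Slot s = slots.get((next + i) % slots.size());
            if (!s.busy) {
                next = (next + i + 1) % slots.size();
                s.busy = true;
                return s;
            }
        }
        return null;                                      // all slots busy
    }

    // De-allocation: after static fields are cleaned, unset busy so the slot
    // can serve the next task.
    synchronized void release(Slot s) {
        s.busy = false;
    }
}
```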

Slide 28: Our MPI shuffling design. Map tasks write their outputs to local disk; a per-node shuffle manager transfers them to reduce tasks over MPI send/recv, serving one request at a time; reduce tasks then sort & merge and run the reduce. [Diagram: mapping phase, shuffling phase, and reducing phase.]
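A minimal sketch of the kind of point-to-point transfer a shuffle manager could issue with the OpenMPI Java bindings (the Vega-Gisbert et al. bindings mentioned later in the setup); the rank assignment, tag, and buffer sizing are assumptions, and method names may differ slightly across binding versions.

```java
import mpi.MPI;
import mpi.MPIException;
import mpi.Status;

// Sketch: ship one serialized map-output partition from a map-side rank to a
// reduce-side rank with blocking MPI send/recv on MPI.COMM_WORLD.
public class MpiShuffleSketch {
    static final int TAG_MAP_OUTPUT = 42;                 // illustrative tag

    public static void main(String[] args) throws MPIException {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.getRank();

        if (rank == 0) {                                  // "map side": send a partition
            byte[] partition = "PageB\t0.5\n".getBytes();
            MPI.COMM_WORLD.send(partition, partition.length, MPI.BYTE, 1, TAG_MAP_OUTPUT);
        } else if (rank == 1) {                           // "reduce side": receive it
            byte[] buf = new byte[1 << 20];               // assumed upper bound on size
            Status st = MPI.COMM_WORLD.recv(buf, buf.length, MPI.BYTE, 0, TAG_MAP_OUTPUT);
            System.out.println("received " + st.getCount(MPI.BYTE) + " bytes of map output");
        }

        MPI.Finalize();
    }
}
```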

Slide 29: Technical issues of reuse. Loading user classes: the original flow exports CLASSPATH before launching; with reuse, user classes are loaded at runtime through reflection, and a new class loader is created for each user to avoid class conflicts. Clean-up: static fields are a security problem (e.g. the UserGroup static field) and must be reset; the current design resets the user-information and job-configuration static fields.
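A hedged sketch of resetting a static field between tasks via reflection; the class and field names passed in are placeholders, not Hadoop's actual identifiers.

```java
import java.lang.reflect.Field;

// Sketch: between tasks, reset a (non-final) static field left over from the
// previous user so it cannot leak to the next task.
public class StaticFieldCleanup {
    static void resetStaticField(ClassLoader loader, String className, String fieldName)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className, true, loader);
        Field field = cls.getDeclaredField(fieldName);
        field.setAccessible(true);
        field.set(null, null);                            // null target => static field
    }
}
```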

Slide 30: Other technical issues (work in progress). Enabling Hadoop YARN, a resource manager, to host traditional MPI applications: an MPI application master monitors the MPI ranks, and each MPI container hosts one rank. Avoiding MPI gang scheduling is also work in progress.

Slide 31: Evaluation. Hadoop version: 2.2.0. Changes in our implementation: about 1,000 lines of code out of Hadoop's 1,851,473 total, and 9 classes out of 35,142.

Slide 32: Cluster setup. FX10 supercomputer: SPARC64 IXfx at 1.848 GHz (16 cores) and 32 GB RAM per node, MPI over the Tofu interconnect (5 GB/s), central storage. Hadoop setup: one master and many slaves, OpenJDK 7, HDFS on the central storage, OpenMPI 1.6 with the Java MPI binding (Vega-Gisbert et al.), MCA parameter plm_ple_cpu_affinity = 0.

Slide 33: Evaluation of JVM Reuse. MPI benefit: MPI vs. TCP/IP shuffling with a tera-sort job on 32 FX10 nodes, 4-slot pool and -Xmx4096. Start-up time: JVM Reuse vs. the original Hadoop with an iterative PageRank job over 400 GB of Wikipedia data on 8 FX10 nodes, 6-slot pool and -Xmx4096.

Slide 34: MPI vs. TCP/IP shuffling (tera-sort). [Chart: execution time (s) vs. input size (4, 8, 16 GB) for a nonblocking TCP shuffle, a blocking TCP shuffle, and a blocking MPI shuffle; the MPI shuffle is up to 10% faster.] Nonblocking: accepts multiple connections at once; blocking: accepts one connection at a time.

Slide 35: Shortened start-up time (PageRank). [Chart: per-task timelines over the first two iterations for the original Hadoop and the JVM Reuse Hadoop, broken down into start-up (JVM & user info), task initializing, shuffling (at reducers), data reading, task running, and task finishing (map-output writing).] With JVM Reuse, the second iteration is about 50 seconds faster than with the original Hadoop.

Slide 36: More iterations. [Charts: sum of start-up time (s) and execution time (s) vs. the number of iterations (1, 2, 4, 6, 8) for the original Hadoop and the JVM Reuse-based Hadoop, plus a breakdown of execution time into start-up and computation time and a table of the sum of start-up time, execution time, and their ratio for both approaches.]

Slide 37: Related work. M3R (VLDB 2012) also applies JVM reuse to enable in-memory MapReduce, but it provides no optimization or evaluation of JVM reuse itself, and its Hadoop MapReduce (HMR) engine is rewritten in X10; we keep the original HMR engine with minimal changes. JVM reuse in Hadoop v1 (2012) works only within a single job: JVM processes are terminated once their job completes. Gerbil (MPI + YARN, CCGrid 2015) lets Hadoop YARN co-host MPI applications, with long start-up times and significant overhead. DataMPI (IPDPS 2014) is a Hadoop-like MapReduce implementation built on MPI and C. JVM-Bypass (IPDPS 2013) is a C-based shuffling engine with RDMA support. In contrast, we focus on using MPI from Hadoop's own processes.

Slide 38: Summary. We improve Hadoop MapReduce performance on supercomputers through JVM Reuse: statically create JVM processes and dynamically allocate them to Hadoop tasks, enabling efficient MPI communication on Hadoop and shortening start-up time, with minimal changes to the original Hadoop. Future work: addressing the drawback of JVM Reuse (it can affect CPU-bound tasks), co-hosting MPI applications more efficiently, and a full cleanup.