Mixing and matching virtual and physical HPC clusters. Paolo Anedda

Mixing and matching virtual and physical HPC clusters. Paolo Anedda, paolo.anedda@crs4.it. HPC 2010, Cetraro, 22/06/2010.

Outline: Introduction. Scalability Issues. System architecture. Conclusions & Future Works.


Supporting High Data Producing Applications

HPC: The Traditional Way. Based on an (almost) fixed hardware/software platform. Good for standard production environments. Unsuitable for research and development environments: it lacks flexibility.

What if... we need to support multiple computational paradigms at the same time? We need to deploy transient experimental clusters? We need to deploy multiple development environments? We need to experiment with new solutions?

Why Virtual Clusters? Virtualization is a consistent technology. Support for multiple computational paradigms. Virtual clusters make the management of HPC environments flexible. The performance loss can be acceptable (~5%). Support for hardware accelerators. Virtual clusters can be saved for later use.

But... Virtual cluster operations can lead to scalability problems. Managing virtual clusters can be very difficult with traditional tools. Some users still want to run their code on traditional systems.

HPC: A New Way. Build smarter management tools: enable dynamic and flexible computational environments. Very different computational approaches can coexist on the same physical facility: map-reduce, standard parallel jobs, virtual HPC clusters.


Virtual Clusters. Virtual clusters are collections of virtual machines deployed and managed as a single entity (Foster et al. 2006). HPC virtual clusters are atomic objects, i.e., a macro-computing task is subdivided between the VC nodes. HPC virtual clusters are big objects, e.g., 128 nodes (3GB disk + 8GB memory) ~ 1.5TB.
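The size estimate on this slide can be checked with a few lines of arithmetic. The sketch below assumes that saving a virtual cluster stores both the disk image and the memory of every node, and that 1 TB = 1024 GB; the function name is hypothetical.

```python
# Footprint of one HPC virtual cluster, using the per-node figures from the slide.
NODES = 128
DISK_GB = 3      # disk image per node
MEMORY_GB = 8    # memory per node (assumed to be saved along with the disk)

def cluster_footprint_gb(nodes, disk_gb, memory_gb):
    """Total state, in GB, that moves when the whole cluster is saved or restored."""
    return nodes * (disk_gb + memory_gb)

total_gb = cluster_footprint_gb(NODES, DISK_GB, MEMORY_GB)
print(f"{total_gb} GB, about {total_gb / 1024:.1f} TB")
```

The product is 1408 GB, roughly 1.4 TB, which is the same order of magnitude as the ~1.5TB quoted on the slide.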

Three Basic Operations on VM Images (diagram: each operation runs against the repository). Get: one -> many. Save: many -> many. Restore: many -> many.

NFS based image transfer

VC Operations. VM disk images are, as far as the repository is concerned, WORM (Write Once Read Many) objects. Get, save and restore are all simple I/O operations: only one client writes, and it writes sequentially; once a file is closed it stays closed, no appends needed; suitable for applications that have large data sets. It appears that HDFS (KFS, GFS) should be OK.
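The write-once semantics and the fan-out of the three operations can be illustrated with a toy in-memory repository. This is only a sketch: the class `WormRepository`, its methods and the `cluster/node` naming scheme are hypothetical and are not part of VIDA or HDFS.

```python
class WormRepository:
    """Toy write-once-read-many image store (hypothetical, in-memory)."""

    def __init__(self):
        self._images = {}

    def put(self, name, data):
        # Write once: an image that has been stored can never be rewritten or appended to.
        if name in self._images:
            raise FileExistsError(f"{name} is already stored")
        self._images[name] = bytes(data)

    def read(self, name):
        # Read many: any number of clients may read the same image.
        return self._images[name]


def get(repo, reference, nodes):
    """get: one -> many. Every node receives a copy of the same reference image."""
    return {node: repo.read(reference) for node in nodes}


def save(repo, cluster, node_disks):
    """save: many -> many. Each node writes its own image under its own name."""
    for node, data in node_disks.items():
        repo.put(f"{cluster}/{node}", data)


def restore(repo, cluster, nodes):
    """restore: many -> many. Each node reads back the image it saved."""
    return {node: repo.read(f"{cluster}/{node}") for node in nodes}


repo = WormRepository()
repo.put("reference.img", b"base-image")
disks = get(repo, "reference.img", ["n1", "n2"])
save(repo, "vc-demo", {node: data + node.encode() for node, data in disks.items()})
restored = restore(repo, "vc-demo", ["n1", "n2"])
```

Each stored name has exactly one writer and is never reopened, which is the access pattern the slide argues HDFS-like file systems handle well.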

HDFS. Distributed file system designed to run on commodity hardware. Suitable for applications that have large data sets. Highly fault-tolerant.

Measurement Procedure
S := # of physical cluster nodes
N := # of virtual cluster nodes
R := block replication
Blocksize := 64MB
Procedure:
Allocate a cluster with S nodes and install HDFS.
Save the reference image in HDFS (from a node NOT in the cluster).
Randomly select groups of N = 2, 4, 8, 16, 32, ..., S nodes from the cluster.
Use dsh for concurrent get, save and restore requests.
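The group-selection step of the procedure could be scripted as follows. This is a minimal sketch: the host names are made up, and the assumption that the size series doubles until it reaches S is read off the sequence on the slide. The actual transfers (issued with dsh in the original measurements) are not reproduced here.

```python
import random

def group_sizes(s):
    """Group sizes N = 2, 4, 8, ... doubling up to, and ending with, the cluster size S."""
    sizes = []
    n = 2
    while n < s:
        sizes.append(n)
        n *= 2
    sizes.append(s)
    return sizes

def pick_groups(cluster_nodes, seed=None):
    """Randomly select one group of nodes for every group size."""
    rng = random.Random(seed)  # seeded so a measurement run can be repeated
    return {n: rng.sample(cluster_nodes, n) for n in group_sizes(len(cluster_nodes))}

# Hypothetical 48-node cluster
cluster_nodes = [f"node{i:02d}" for i in range(48)]
groups = pick_groups(cluster_nodes, seed=1)
for n, members in groups.items():
    print(n, members[:3], "...")
```

Each selected group would then issue its get, save or restore requests concurrently, and the transfer time of the slowest node would be recorded.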

Transfer Time

Effective Bandwidth
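The slides do not spell out how effective bandwidth is computed. One plausible reading, stated here as an assumption, is the aggregate volume moved by all N nodes divided by the wall-clock time of the concurrent operation. The numbers in the example are illustrative only and are not measurements from the talk.

```python
def effective_bandwidth_mb_s(nodes, image_mb, transfer_s):
    """Aggregate MB/s when `nodes` clients each move one image within `transfer_s` seconds.

    Assumed definition: total bytes moved divided by wall-clock time.
    """
    if transfer_s <= 0:
        raise ValueError("transfer time must be positive")
    return nodes * image_mb / transfer_s

# Illustrative values: 8 nodes, a 3072 MB image each, 120 s for the whole operation
bandwidth = effective_bandwidth_mb_s(8, 3072, 120)
print(f"{bandwidth:.1f} MB/s aggregate")
```

Under this definition, a system that scales well shows aggregate bandwidth growing with N, while a single shared server tends to flatten out.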


Requirements. Flexibility. Scalability. Support for multiple computational paradigms. Encapsulation. Reliability and security. High performance.

Architecture. VIDA: allocates the virtual clusters and manages all virtual cluster operations. Gridengine: allocates the physical resources and supports different computational environments. HDFS: a parallel filesystem. HaDeS: a physical image deployment tool.

Gridengine. An open source batch-queuing system. Supports advance reservation. Supports multiple computational paradigms. Integration with Hadoop.

VIDA: Why another tool? Traditional tools: virtual-machine oriented; management operations are carried out using a polling approach; not very reliable. VIDA: virtual-cluster oriented; management based on a heartbeat approach; very reliable.
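The difference between the two approaches is who initiates contact: with polling the manager asks every node in turn, while with heartbeats every node reports on its own and the manager only watches for silence. A minimal sketch of the heartbeat side follows; the class name, the timeout value and the explicit timestamps are assumptions made for illustration and do not describe VIDA's actual implementation.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags nodes that have gone silent."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}

    def beat(self, node, now):
        # Called whenever a heartbeat from `node` arrives at time `now` (seconds).
        self._last_seen[node] = now

    def dead_nodes(self, now):
        # A node is presumed dead once it has been silent for longer than the timeout.
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=10)
monitor.beat("node-a", now=0)
monitor.beat("node-b", now=0)
monitor.beat("node-a", now=8)      # node-a keeps reporting, node-b stops
silent = monitor.dead_nodes(now=12)
print(silent)
```

Timestamps are passed in explicitly so the logic can be exercised without waiting; a real service would read a clock and receive heartbeats over the network.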

VIDA Architecture. Virtual Cluster Tracker (VCT): manages all cluster operations, coordinates the creation of each single virtual machine, and collects and maintains all the status information coming from the VMs on each node.

VIDA Architecture. Virtual Machine Tracker (VMT): coordinates the operations on a specific node, reports the status of the physical resources available on the host to the VCT, and creates and manages the virtual machines according to the directives received from the VCT. Virtual Machine Handler (VMH): controls and administers a single virtual machine.
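The three roles described on the two architecture slides form a simple hierarchy: one VCT per system, one VMT per physical host, one VMH per virtual machine. The sketch below models that hierarchy in memory. All class fields, method names and the placement policy (pick the host with the most free cores) are assumptions made for illustration; the slides do not describe VIDA's interfaces or scheduling.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualMachineHandler:
    """VMH: controls a single virtual machine."""
    vm_id: str
    state: str = "created"

    def start(self):
        self.state = "running"

@dataclass
class VirtualMachineTracker:
    """VMT: coordinates the VMs on one physical host and reports host status."""
    host: str
    free_cores: int
    handlers: dict = field(default_factory=dict)

    def create_vm(self, vm_id, cores):
        if cores > self.free_cores:
            raise RuntimeError(f"{self.host}: not enough free cores")
        self.free_cores -= cores
        handler = VirtualMachineHandler(vm_id)
        handler.start()
        self.handlers[vm_id] = handler
        return handler

    def status(self):
        vms = {vm_id: h.state for vm_id, h in self.handlers.items()}
        return {"host": self.host, "free_cores": self.free_cores, "vms": vms}

@dataclass
class VirtualClusterTracker:
    """VCT: creates whole clusters and keeps the status reported by every VMT."""
    trackers: list
    status_by_host: dict = field(default_factory=dict)

    def create_cluster(self, name, size, cores_per_vm=1):
        placed = []
        for i in range(size):
            # Hypothetical policy: place each VM on the host with the most free cores.
            vmt = max(self.trackers, key=lambda t: t.free_cores)
            placed.append(vmt.create_vm(f"{name}-{i}", cores_per_vm))
        self.collect_status()
        return placed

    def collect_status(self):
        for vmt in self.trackers:
            self.status_by_host[vmt.host] = vmt.status()


vct = VirtualClusterTracker([VirtualMachineTracker("host-a", 2),
                             VirtualMachineTracker("host-b", 2)])
vms = vct.create_cluster("vc-demo", size=3)
print(vct.status_by_host)
```

The point of the structure is that users address the cluster as one object at the VCT, and per-VM details stay inside the VMT and VMH layers.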

VCT (diagram). Components: Service Interface, Heartbeat Service, FSManager, Resources Scheduler.

VIDA-GE Integration

Virtual Bubble Cluster

VIDA Scalability. Settings: core number 132; replication factor 3; image size 4.39 GB. [Plots: Deploy Time vs Virtual Nodes Number; Average Data Transfer vs Virtual Nodes Number (x-axis: Nodes Number).]
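The settings for this experiment determine how much data HDFS has to hold for a single image. The sketch below works that out under two assumptions that the slide does not state: the 64MB block size from the measurement procedure also applies here, and 1 GB = 1024 MB.

```python
import math

IMAGE_GB = 4.39      # image size, from the slide
REPLICATION = 3      # replication factor, from the slide
BLOCK_MB = 64        # assumption: same block size as in the measurement procedure

def hdfs_blocks(image_gb, block_mb):
    """Number of HDFS blocks needed for one image (the last block may be partial)."""
    return math.ceil(image_gb * 1024 / block_mb)

blocks = hdfs_blocks(IMAGE_GB, BLOCK_MB)
replicas = blocks * REPLICATION
raw_gb = IMAGE_GB * REPLICATION
print(f"{blocks} blocks, {replicas} block replicas, {raw_gb:.2f} GB of raw storage")
```

Spreading those block replicas over many datanodes is what allows concurrent readers to fetch different blocks from different hosts instead of queueing on one server.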


Conclusions. Virtual clusters simplify the management of HPC environments, making them more flexible. Gridengine+VIDA = a very flexible architecture for the deployment and management of virtual clusters. Gridengine+VIDA+HDFS = a scalable architecture for the deployment and management of virtual clusters.

Future Works. Support for encrypted filesystems. VM commissioning and decommissioning. Integration with the Haizea scheduler. Release VIDA on SourceForge.

THANK YOU!

WORM Data Need Specialized Filesystems. [Diagram: an HDFS master (namenode) and workers 1 to n, each running a datanode and a mapper over blocks 1 to N.]

Virtual map-reduce cluster. [Diagram: an HDFS master (namenode) and a MapReduce master (JobTracker); workers 1 to n each run a datanode and a task tracker with mappers and a reducer.]


Hadoop. [Diagram: an HDFS master (namenode) and a MapReduce master (JobTracker); workers 1 to n each run a datanode and a task tracker with mappers and a reducer.]

The Heartbeat Mechanism
