Labs of the World, Unite!!!


Labs of the World, Unite!!! Walfredo Cirne walfredo@dsc.ufcg.edu.br Universidade Federal de Campina Grande, Brasil Departamento de Sistemas e Computação Laboratório de Sistemas Distribuídos http://www.lsd.ufcg.edu.br/

e-science: Computers are changing scientific research, enabling collaboration and serving as investigation tools (simulations, data mining, etc.). As a result, many research labs around the world are now computation hungry. Buying more computers is just part of the answer; better using existing resources is the other.

Solution 1: Voluntary Computing. SETI@home, FightAIDS@home, Folding@home, YouNameIt@home. SETI@home reported harvesting more than 2.2 million years of CPU time from over 5.3 million users spread over 226 countries. However, to use this solution, you must have a very good support team to run the server, invest a good deal of effort in advertising, have a very high-visibility project, and be in a prestigious institution.

Solution 2: Globus. Grids promise "plug on the wall and solve your problem". Globus is the closest realization of such a vision, deployed at dozens of sites. But it requires highly specialized skills and complex off-line negotiation. It is a good solution for large labs that work in collaboration with other large labs: CERN's LCG, Grid'5000, TeraGrid, and some others.

And what about the many thousands of small and mid-sized research labs throughout the world, which also need lots of computing power?

Most labs in the world are small, focus their research on some narrow topic, do not belong to top universities, and cannot count on a cutting-edge computer support team. Yet they increasingly demand large amounts of computing power, just as large labs and high-visibility projects do.

Solution 3: peer-to-peer grid. Each lab corresponds to a peer in the system and contributes its idle resources. (Figure: CPU utilization over real time for lab 1, for lab 2, and for the combined p2p grid.)

OurGrid Design Principles: Labs can freely join the system without any human intervention (no need for negotiation, no paperwork). Clear incentive to join the system: one cannot be worse off by joining, and response time noticeably improves. Free-riding resistant. Basic dependability properties: some level of security, some resilience to failures. Scalability. Easy to install, configure, and program: no need for a specialized support team.

But there is no free lunch. To simplify the problem, we focus on Bag-of-Tasks (BoT) applications: no need for communication among tasks, which facilitates scheduling and security enforcement; simple fail-over/retry mechanisms tolerate faults; no need for QoS guarantees; script-based programming is natural.
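
Because BoT tasks are independent, the fail-over/retry mechanism mentioned above can be as simple as resubmitting a failed task to another machine. A minimal sketch, with all names hypothetical (`dispatch` stands in for whatever actually ships the task to a grid machine):

```python
def run_with_retry(dispatch, machines, max_attempts=5):
    """Try a task on successive machines until one attempt succeeds.
    dispatch(machine) is a hypothetical stand-in for sending the task
    to a grid machine; it returns the result, or None on failure."""
    for attempt in range(max_attempts):
        result = dispatch(machines[attempt % len(machines)])
        if result is not None:
            return result
    raise RuntimeError("task failed on all %d attempts" % max_attempts)
```

Since tasks do not communicate, no coordination or rollback is needed; a lost task is simply run again elsewhere.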

Fortunately, many important applications are BoT! Data mining Massive search Bio computing Parameter sweep Monte Carlo simulations Fractal calculations Image processing And many others

OurGrid Architecture. OurGrid: a peer-to-peer network that performs fair resource sharing among unknown peers. MyGrid: a broker that efficiently schedules BoT applications without requiring information on either system load or application and resource characteristics. SWAN: a VM-based sandbox that makes it safe for a peer to run a foreign computation.

OurGrid Architecture User Interface Application Scheduling Sandboxing Site Manager Grid-wide Resource Sharing

An Example: Factoring with MyGrid

task:
  init: store ./fat.class $PLAYPEN
  grid: java Fat 3 18655 34789789799 output-$JOB-$TASK
  final: get $PLAYPEN/output-$JOB-$TASK results

task:
  init: store ./fat.class $PLAYPEN
  grid: java Fat 18656 37307 34789789799 output-$JOB-$TASK
  final: get $PLAYPEN/output-$JOB-$TASK results

task:
  init: store ./fat.class $PLAYPEN
  grid: java Fat 37308 55968 34789789799 output-$JOB-$TASK
  final: get $PLAYPEN/output-$JOB-$TASK results

...
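
The tasks above just split the range of trial divisors among independent jobs. Generating such a job description mechanically is straightforward; the sketch below assumes the same hypothetical `Fat` class (taking a low divisor, a high divisor, the number, and an output name) and mirrors the slide's task format:

```python
import math

def make_tasks(n, n_tasks=10):
    """Split the trial-divisor range [3, sqrt(n)] into roughly equal
    chunks, emitting one MyGrid-style task description per chunk.
    The chunk boundaries differ slightly from the slide's hand-made
    example; this is only a sketch of the idea."""
    limit = math.isqrt(n)
    chunk = (limit - 3) // n_tasks + 1
    tasks = []
    lo = 3
    while lo <= limit:
        hi = min(lo + chunk - 1, limit)
        tasks.append(
            "task:\n"
            "  init: store ./fat.class $PLAYPEN\n"
            f"  grid: java Fat {lo} {hi} {n} output-$JOB-$TASK\n"
            "  final: get $PLAYPEN/output-$JOB-$TASK results\n"
        )
        lo = hi + 1
    return tasks
```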

MyGrid GUI

Simple abstractions to write an application: file transfer (put, store, get); hiding heterogeneity ($PLAYPEN, $STORAGE); defining constraints (job requirements and grid machine attributes).

Scheduling with no Information. Grid scheduling typically depends on information about the grid (e.g. machine speed and load) and the application (e.g. task size). However, getting accurate information about all applications and resources is hard. Can we efficiently schedule tasks without requiring access to such information? This would make the system much easier to deploy and simpler to use.

Workqueue with Replication Tasks are sent to idle processors When there are no more tasks, running tasks are replicated on idle processors The first replica to finish is the official execution Other replicas are cancelled
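The rule above can be sketched as a small event-driven simulation. This is a hypothetical simplification, not the OurGrid implementation: dispatch cost is zero, and a cancelled replica is charged its full cost rather than being stopped the moment the winner finishes.

```python
import heapq

def wqr_makespan(task_costs, machine_speeds, max_replicas=2):
    """Event-driven sketch of Workqueue with Replication (WQR).
    task_costs[i]: work units in task i; machine_speeds[m]: units per
    second delivered by machine m. Returns (makespan, wasted_work)."""
    pending = list(range(len(task_costs)))   # tasks not yet started
    replicas = {t: 0 for t in pending}       # running replicas per task
    done = set()
    events = []                              # (finish_time, machine, task)
    wasted = 0.0

    def assign(machine, now):
        # Idle machine: take an unstarted task if any; otherwise
        # replicate a running task (fewest replicas first).
        if pending:
            task = pending.pop(0)
        else:
            candidates = [t for t in replicas
                          if t not in done and replicas[t] < max_replicas]
            if not candidates:
                return
            task = min(candidates, key=lambda t: replicas[t])
        replicas[task] += 1
        finish = now + task_costs[task] / machine_speeds[machine]
        heapq.heappush(events, (finish, machine, task))

    for m in range(len(machine_speeds)):
        assign(m, 0.0)
    makespan = 0.0
    while events:
        now, machine, task = heapq.heappop(events)
        if task in done:
            wasted += task_costs[task]   # a replica that lost the race
        else:
            done.add(task)               # first replica to finish wins
            makespan = now
        assign(machine, now)
    return makespan, wasted
```

The appeal is that the scheduler never consults machine speeds or task sizes; replication hedges against slow or loaded machines, at the price of the wasted cycles quantified on the overhead slide.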

Evaluation: 8000 experiments, varying grid heterogeneity, application heterogeneity, and application granularity. Performance summary:

           Sufferage   DFPLTF     Workqueue   WQR 2x     WQR 3x     WQR 4x
Average    13530.26    12901.78   23066.99    12835.70   12123.66   11652.80
Std. Dev.   9556.55     9714.08   32655.85    10739.50    9434.70    8603.06

WQR Overhead. Obviously, the drawback of WQR is the cycles wasted by the cancelled replicas. Wasted cycles:

           WQR 2x   WQR 3x   WQR 4x
Average    23.55%   36.32%   48.87%
Std. Dev.  22.29%   34.79%   48.93%

Data Aware Scheduling. WQR achieves good performance for CPU-intensive BoT applications. However, many important BoT applications are data-intensive, and these applications frequently reuse data, both during the same execution and between two successive executions.

Storage Affinity Storage Affinity uses replication and just a bit of static information to achieve good scheduling for data intensive applications Storage Affinity uses information on which data servers have already stored a data item
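The essence of that bit of static information can be sketched as: send a task to the site that already stores the most bytes of its input. All names below are hypothetical, and the real Storage Affinity heuristic also combines this with replication, which the sketch omits:

```python
def best_site(task_inputs, site_storage):
    """Pick the site with the greatest storage affinity for a task,
    i.e. the most input bytes already held locally.
    task_inputs: {data_item: size_in_bytes}
    site_storage: {site: set of data_items already stored there}"""
    def affinity(site):
        # Total bytes of this task's input already present at the site.
        return sum(size for item, size in task_inputs.items()
                   if item in site_storage[site])
    return max(site_storage, key=affinity)
```

Note that the only information consulted is which data items each server already stores, which the system itself knows; no machine-speed or task-size estimates are needed.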

Storage Affinity Results: 3000 experiments, varying grid heterogeneity, application heterogeneity, and application granularity. Performance summary:

                     Storage Affinity   XSufferage   WQR
Average (seconds)    57.046             59.523       150.270
Standard Deviation   39.605             30.213       119.200

OurGrid Architecture User Interface Application Scheduling Sandboxing Site Manager Grid-wide Resource Sharing

Network of Favors. It is important to encourage collaboration within OurGrid (i.e. resource sharing); in file-sharing, most users free-ride. OurGrid uses the Network of Favors: all peers maintain a local balance for all known peers; peers with greater balances have priority; newcomers and peers with negative balance are treated equally. The emergent behavior of the system is that by donating more, one gets more resources back. No additional infrastructure is needed.
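
A sketch of the ledger each peer keeps (all names hypothetical): balances are purely local, and anything at or below zero ranks like a newcomer, so rejoining under a fresh identity gains a free-rider nothing.

```python
class FavorLedger:
    """One peer's local Network of Favors accounting."""

    def __init__(self):
        self.balance = {}   # peer id -> favors received minus donated

    def favor_received(self, peer, amount):
        self.balance[peer] = self.balance.get(peer, 0.0) + amount

    def favor_donated(self, peer, amount):
        self.balance[peer] = self.balance.get(peer, 0.0) - amount

    def priority_order(self, requesters):
        """Higher balances are served first; newcomers and peers with
        negative balance are treated equally (clamped to zero)."""
        return sorted(requesters,
                      key=lambda p: max(0.0, self.balance.get(p, 0.0)),
                      reverse=True)
```

The emergent behavior follows directly: donating more raises your balance at other peers, so your future requests are served first.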

Network of Favors (diagram: peers A-E exchange ConsumerFavor and ProviderFavorReport messages; * = no idle resources)

Network of Favors (diagram: peers exchange ConsumerQuery and ProviderWorkRequest messages; * = no idle resources)

Free-rider Consumption Epsilon is the fraction of resources consumed by free-riders

Equity Among Collaborators

Autonomous Accounting

OurGrid Architecture User Interface Application Scheduling Sandboxing Site Manager Grid-wide Resource Sharing

SWAN: OurGrid Security. Running an unknown application that comes from an unknown peer is a clear security threat. We leverage the fact that BoT applications only communicate to receive their input and return their output. The remote task runs inside a Xen virtual machine, with no network access and disk access only to a designated partition. Input/output is done by OurGrid itself, which runs in a Xen virtual machine. Sanity checks are executed before a new task is run.

SWAN Architecture (layer diagram: Grid Application / Grid OS / Grid Middleware / Guest OS)

On-going research: sabotage tolerance; checkpointing; adaptive scheduling; leveraging the power of meat-space communities; going beyond BoT; improved virtualization support; presenting OurGrid components in a more diverse SOA setting.

OurGrid Status. OurGrid is open source (GPL), and version 3.2 is available at www.ourgrid.org. The OurGrid free-to-join grid has been in production since December 2004. We currently have 500 machines spread over 30 sites. See status.ourgrid.org for the current status of the system.

Conclusions. We have a free-to-join grid solution for BoT applications that is working today: climate forecast and management of water resources; development of drugs for the Brazilian HIV variant; optimization of pipeline operation in the oil industry; support for efficient test-driven programming; simulations, including those that support our own research; an HP-funded Itanium-based grid empowering education. Real users provide invaluable feedback for systems research. Delivering results to real users has been really cool!

Contacts and URLs Walfredo Cirne (walfredo@dsc.ufcg.edu.br) Francisco Brasileiro (fubica@dsc.ufcg.edu.br) LSD site (http://www.lsd.ufcg.edu.br/) OurGrid site (http://www.ourgrid.org/) Related projects COPAD (http://portalcopad.dca.ufcg.edu.br/) SegHidro (http://seghidro.lsd.ufcg.edu.br/) Bio Pauá (https://www.biopaua.lncc.br/engl/index.php) Portal GIGA (http://portalgiga.unisantos.edu.br/) GridUnit (http://gridunit.sourceforge.net/) FailureSpotter (http://www.spotter.lsd.ufcg.edu.br/)

Thank you! Merci! Danke! Grazie! Gracias! Obrigado!

Grids & Itanium What are the opportunities?

What is your experience with grids? UFCG: We developed our grid technology to keep things simple. CERN? Others?

Development It is clearly very important for Itanium that grid solutions run well on Itanium CERN LCG middleware and applications What else?

Research: Can we use the grid's massive parallelism to automatically tune applications for Itanium? What else?

What about Itanium? We view our role in Gelato as assuring that grid software runs well (or better) on Itanium. We currently feel the need for: better Java performance; virtualization services.