Problems for Resource Brokering in Large and Dynamic Grid Environments

Cătălin L. Dumitrescu
Computer Science Department, The University of Chicago
cldumitr@cs.uchicago.edu (currently at TU Delft)
Kindly presented by Ana Lucia Varbanescu (TU Delft)

Introduction
- Grids provide a means of harnessing the computational and storage power of widely distributed collections of resources
- Resources are provided under agreements that appear and vanish with high frequency due to the environment's scale
- We distinguish between two types of participating entities:
  - Providers: want to express uSLAs about resource availability
  - Consumers: want to interpret uSLAs published by providers
- Usage Service Level Agreements (uSLAs): sharing rules about how a resource is used after access has been granted
C. Dumitrescu, M. Wilde, I. Foster, A Model for Usage Policy based Resource Sharing in VOs, Policy Workshop 2005, Stockholm, Sweden

What Is Grid Brokering?
Resources in Grids are characterized by:
- independence from global control
- support for attribute-based search
- availability governed by various local uSLAs
Grid Brokering:
(a) the automated discovery and negotiation of usage agreements based on the resource attributes and local uSLAs
(b) the identification of the extended execution environment based on a Grid scheduler
Our focus:
- identifying the requirements to support uSLA expression, publication, discovery, interpretation, enforcement, and verification
- provisioning the design ingredients for building a scalable distributed brokering service

Presentation Overview
- Introduction / What is Grid Brokering?
- An Overview of the Brokering Key Requirements
  - Environment Examples
  - The Three Requirements
- Illustration in a Concrete Case
  - Some Background
  - DI-GRUBER Framework
  - Enhanced DI-GRUBER Description
- Performance Results
  - Metrics
  - Accuracy with Brokering Network Mesh Connectivity
  - DP Scheduling
  - Comparison with a P2P solution
- Conclusions

Environment Examples
Open Science Grid (OSG, previously known as Grid3):
- multi-VO environment that sustains production-level services
- comprises more than 30 sites and 4500 CPUs, with over 1300 simultaneous jobs and more than 2 TB/day of aggregate data traffic
- participating sites are the main resource providers, under various uSLAs
LHC Computing Project (LCG):
- data storage and analysis infrastructure for the entire high-energy physics community that will use the LHC (Large Hadron Collider)
- >100,000 CPUs required for data processing
- >5000 scientists in approx. 500 research institutes worldwide
- focuses on developing and deploying computing services based on a Grid model
- requires the management of acquisition, installation, and capacity planning for a large number of commodity hardware components

Identified Brokering Requirements
- Support for brokering of dynamic and numerous resources
- Adequate accuracy independent of the infrastructure
- Fault tolerance

Dynamic and Numerous Resources
- Communities, providers, and VOs might join a Grid environment for short time intervals (days to weeks) to quickly solve problems
- When the environment is large (> 1k providers and 10k consumers), changes may occur every hour
- Thus, rapid propagation is required for information about:
  - available resources
  - new administrative policies about how resources are made available
- The brokering infrastructure must be:
  - distributed, without relying on a single decision point (DP)
  - scalable, due to the size of the environment

Adequate Accuracy
- For a distributed infrastructure, several operations have to be considered, such as propagation, reconciliation, and removal
- These operations may occur whenever new decisions are made and whenever resources join or leave the environment
- The entire brokering infrastructure must become aware of these changes in a timely manner

Fault Tolerance
- A client expects to perform scheduling operations over the Grid even when some failures occur
- When many clients perform queries, the brokering infrastructure must be able to cope adequately with the increased request load
- Employing an adequate fault tolerance strategy is important to the client

Illustration Settings
- What brokers? GRUBER and DI-GRUBER grid brokers
- Where? PlanetLab nodes
- What size? Brokering testing for a grid 10x bigger than today's OSG/Grid3

GRUBER: A Bit of History
- Started in the Grid3 context as a monitoring tool
- Evolved into a site recommendation engine
- Was later enhanced with:
  - Enforcement components (Queue Managers)
  - Complex uSLAs and specification interfaces
  - Distributed capabilities (the start of DI-GRUBER)

GRUBER: A Grid Broker
- Implements the brokering functionalities required for steering workloads in a distributed environment based on uSLAs
- Implemented as a Grid Web Service using Globus technology
- Does not perform job submission by itself, but can be used in conjunction with various grid job submission infrastructures
- Interfaced with Euryale and Pegasus (widely used on OSG/Grid3) for job execution
C. Dumitrescu, I. Foster, GRUBER: A Grid Resource uSLA-based Broker, Euro-Par 2005, Lisboa, Portugal

GRUBER in a Nutshell
C. Dumitrescu, I. Foster, GRUBER: A Grid Resource uSLA-based Broker, Euro-Par 2005, Lisboa, Portugal

DI-GRUBER Framework
A single uSLA management decision point providing brokering decisions over hundreds or thousands of jobs and sites becomes a bottleneck.
Distributed GRUBER (DI-GRUBER):
- extends GRUBER with support for distributed brokering
- provides a scalable management service with the same functionalities as GRUBER, but in a distributed approach
- allows multiple decision points to coexist and cooperate in real time
- is implemented as a two-layer resource brokering service
C. Dumitrescu, I. Raicu, I. Foster, DI-GRUBER: A Distributed Brokering Infrastructure, Super-Computing 2006, Seattle, USA

DI-GRUBER Operational Mode
Decision points (DPs / PEPs) are responsible for executing uSLAs:
- they gather monitoring metrics and other information relevant to their operations
- they use this information to steer resource allocations as specified by the uSLAs
Two types of PEPs:
- Site PEPs
- VO PEPs
C. Dumitrescu, I. Raicu, I. Foster, DI-GRUBER: A Distributed Brokering Infrastructure, Super-Computing 2006, Seattle, USA
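The operational mode above (gather monitoring data, then steer allocations according to uSLAs) can be illustrated with a minimal sketch. It is not the GRUBER implementation: the `Site` record, the per-VO share table, and the "most usable CPUs wins" rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Site:
    """Hypothetical monitoring snapshot of one site (not a GRUBER data type)."""
    name: str
    total_cpus: int
    busy_cpus: int
    # Assumed uSLA form: maximum fraction of the site's CPUs each VO may use.
    usla_share: Dict[str, float]
    # CPUs currently used by each VO at this site.
    used_by_vo: Dict[str, int]


def cpus_available_to(site: Site, vo: str) -> int:
    """CPUs the VO may still claim: bounded by free CPUs and by its uSLA share."""
    free = site.total_cpus - site.busy_cpus
    allowed = int(site.usla_share.get(vo, 0.0) * site.total_cpus)
    headroom = allowed - site.used_by_vo.get(vo, 0)
    return max(0, min(free, headroom))


def recommend_site(sites: List[Site], vo: str, cpus_needed: int) -> Optional[str]:
    """Return the site offering the VO the most usable CPUs, or None if none fits."""
    best_name, best_cpus = None, 0
    for site in sites:
        usable = cpus_available_to(site, vo)
        if usable > best_cpus:
            best_name, best_cpus = site.name, usable
    return best_name if best_cpus >= cpus_needed else None


sites = [
    Site("A", total_cpus=100, busy_cpus=60, usla_share={"cms": 0.5}, used_by_vo={"cms": 30}),
    Site("B", total_cpus=40, busy_cpus=10, usla_share={"cms": 0.25}, used_by_vo={}),
]
print(recommend_site(sites, "cms", 15))
```

In this toy setup site A has 40 free CPUs, but the VO's share leaves it only 20 of them, so a request for 25 CPUs is rejected even though the site is not full. That is the sense in which a uSLA governs use after access has been granted.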

DI-GRUBER Novelty
Able to handle:
- Sites with RMs and VOs
- Multiple submission hosts based on GTx technology
- Modeling of usage allocations (uSLAs) at several levels
- Large grids with many resources and users
- Dynamic VO bootstrap
Capacity to:
- Collect monitoring metrics from a grid
- Make various decisions based on this information
- Enforce complex uSLAs by various means

DI-GRUBER Enhancements
Transparent Decision Point (DP) Bootstrapping:
- DPs register with the WS Index Service at startup and unregister when they leave
Transparent Client Scheduling:
- DPs and clients use the registry to discover the infrastructure
- clients are scheduled to DPs based on a least-used (LU) strategy
- whenever a DP stops responding, its clients are re-scheduled
Failure Handling:
- based on a fault signaling mechanism: every time a client fails to communicate with a DP, it sends a request fault
- based on the faults and a specific policy, new DPs are started (WS GRAM invocation)
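The client scheduling and failure handling described on this slide can be sketched as follows. This is a toy in-memory model under stated assumptions: the class name `DecisionPointRegistry`, the tie-break by DP name, and the fault threshold are hypothetical, and no WS Index Service or WS GRAM call is made.

```python
from typing import Dict, Set


class DecisionPointRegistry:
    """Toy stand-in for the registry used by DPs and clients (hypothetical)."""

    def __init__(self, fault_threshold: int = 3) -> None:
        self.clients_of: Dict[str, Set[str]] = {}  # DP name -> assigned clients
        self.faults: Dict[str, int] = {}           # DP name -> reported faults
        self.fault_threshold = fault_threshold     # assumed policy parameter

    def register_dp(self, dp: str) -> None:
        """Called by a DP at startup."""
        self.clients_of.setdefault(dp, set())
        self.faults.setdefault(dp, 0)

    def assign(self, client: str) -> str:
        """Least-used (LU) policy: choose the DP with the fewest clients."""
        if not self.clients_of:
            raise RuntimeError("no decision point registered")
        dp = min(self.clients_of, key=lambda d: (len(self.clients_of[d]), d))
        self.clients_of[dp].add(client)
        return dp

    def unregister_dp(self, dp: str) -> Dict[str, str]:
        """Remove a DP and re-schedule its clients onto the remaining DPs."""
        orphans = self.clients_of.pop(dp, set())
        self.faults.pop(dp, None)
        return {client: self.assign(client) for client in sorted(orphans)}

    def report_fault(self, dp: str) -> bool:
        """Record a client fault; True means the policy asks for a new DP."""
        self.faults[dp] = self.faults.get(dp, 0) + 1
        return self.faults[dp] >= self.fault_threshold


registry = DecisionPointRegistry(fault_threshold=2)
registry.register_dp("dp1")
registry.register_dp("dp2")
placements = [registry.assign(c) for c in ("c1", "c2", "c3")]
moved = registry.unregister_dp("dp1")
print(placements, moved)
```

A real deployment would replace the in-memory dictionaries with registry lookups and would start a new DP when `report_fault` returns True. The sketch only shows the bookkeeping behind the LU choice and the re-scheduling step.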

Metrics for Performance Analysis
- Response ($RT_i$ = individual response time for job $i$; $N$ = number of jobs): $\mathrm{Response} = \frac{1}{N} \sum_{i=1}^{N} RT_i$
- Throughput: the number of requests completed successfully by the service per time unit (second)
- Accuracy ($SA_i$ = ratio of the free resources at the site selected for job $i$ to the total free resources in the grid): $\mathrm{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} SA_i$
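The three metrics can be computed directly from per-job measurements. The sketch below assumes the measurements are already available as plain Python lists; the function names and the sample numbers are illustrative and do not come from the experiments.

```python
from typing import Sequence


def mean_response(response_times: Sequence[float]) -> float:
    """Response: the sum of per-job response times RT_i divided by N."""
    return sum(response_times) / len(response_times)


def throughput(completed_requests: int, duration_s: float) -> float:
    """Throughput: successfully completed requests per second."""
    return completed_requests / duration_s


def accuracy(free_at_selected: Sequence[float], free_in_grid: Sequence[float]) -> float:
    """Accuracy: mean of SA_i, the free resources at the selected site
    divided by the total free resources in the grid when job i was brokered."""
    ratios = [sel / total for sel, total in zip(free_at_selected, free_in_grid)]
    return sum(ratios) / len(ratios)


# Made-up sample values, for illustration only.
resp = mean_response([2.0, 4.0, 6.0])
thr = throughput(completed_requests=1800, duration_s=3600)
acc = accuracy(free_at_selected=[50, 25], free_in_grid=[100, 100])
print(resp, thr, acc)
```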

Experimental Settings
Emulated environment:
- 300 sites, approx. 40,000 nodes (a grid 10x today's OSG/Grid3)
- based on the OSG/Grid3 configuration in terms of CPU counts and network connectivity
- 120 submission hosts
Synthetic workloads in which jobs are:
- arriving at a rate of 1 job/s at each submission host
- submitted by a submission host to a site, but queued or held
- running at a site and completed
Composite workloads that overlay work for 60 VOs, with 10 groups per VO
Each submission host maintains a connection with a DP, selected with a given policy (LU or random)
The experiment duration is 3600 s (1 hour)

Accuracy vs. Mesh Connectivity
DP connectivity settings: 100%, 50%, and 25%
- Accuracy drops linearly with the connectivity degree of each DP
- Utilization is low because the jobs do not all start at the beginning
Table. DI-GRUBER Accuracy Performance with Mesh Connectivity
Connectivity   Util   Accuracy
All            35%    75%
One half       27%    62%
One fourth     20%    55%
Requests Handled by GRUBER Total Request
All            41%    68%
One half       30%    60%
One fourth     21%    50%
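As a quick check of the "drops linearly" statement, the slope between consecutive rows can be computed from the first block of accuracy values in the table (75%, 62%, 55%). Mapping "All", "One half", and "One fourth" to connectivity fractions 1.0, 0.5, and 0.25 is an assumption based on the slide's 100% / 50% / 25% settings.

```python
from typing import List, Tuple

# (connectivity fraction, accuracy in percent), taken from the table's first block.
points: List[Tuple[float, float]] = [(1.00, 75.0), (0.50, 62.0), (0.25, 55.0)]


def segment_slopes(pts: List[Tuple[float, float]]) -> List[float]:
    """Slope (accuracy points per unit of connectivity) between consecutive rows."""
    return [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(pts, pts[1:])]


slopes = segment_slopes(points)
print(slopes)
```

The two segment slopes are close to each other, which is consistent with an approximately linear relationship over the three measured settings. With only three points this is a sanity check, not a fit.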

Throughput (random vs. LU)
[Figure: two panels plotting throughput (queries/sec) against load (number of concurrent clients). Series: Throughput (1DP), Throughput (3DP), Throughput (10DP) in one panel; Throughput (run3), Throughput (run10) in the other.]

Response (random vs. LU)
[Figure: two panels plotting response time (sec) against load (number of concurrent clients). Series: Response Time (1DP), Response Time (3DP), Response Time (10DP) in one panel; Response Time (run3), Response Time (run10) in the other.]

DP Scheduling and Gains
On average, we find:
- modest improvements for 3 decision points (19% higher throughput and 8% lower response time)
- significant improvements for 10 decision points (68% higher throughput and 70% lower response time)

Comparison with a P2P System
PAST: a free distributed lookup service (based on PASTRY)
Response:
- 90% smaller compared to the 3-DP DI-GRUBER
- 50% smaller compared to the 10-DP DI-GRUBER
- P2P has higher variance in the beginning (during the stabilization of the P2P network)
Throughput:
- 7 times higher compared to the 3-DP DI-GRUBER
- only 1.6 times higher compared to the 10-DP DI-GRUBER
The message loss rate of the P2P network is much higher than that of DI-GRUBER

Conclusions
What are the key requirements an already existing management infrastructure should meet in order to support large and dynamic Grid environments?
Our experimental results show, in a concrete case, the importance of this question:
- the brokering accuracy decreases almost linearly with the loss of connectivity for a single decision point instance
- the performance of the system almost doubles in the 10 decision points case, due to the better distribution of the clients

Summary
- We have identified three essential requirements a brokering infrastructure must meet when deployed in large and dynamic Grids
- We have analysed their feasibility and the resulting performance improvements through two novel metrics for DI-GRUBER:
  - brokering accuracy as a function of the connectivity of the infrastructure components
  - performance gains when using automated decision point scheduling for the clients
- We have compared the performance of this brokering solution with a P2P-based system for file management

Acknowledgements
- Ian Foster
- Michael Wilde
- Jens S. Vöckler
- Yong Zhao
- Ioan Raicu
- Luiz Meyer
- iVDGL / GriPhyN Teams

The end
Thank you for your attention!
For more in-depth details, I (the speaker) recommend:
- Read the paper (that's what I did)
- Ask me: I will try to answer, or I will forward your questions to Mr. Dumitrescu
- Skip the speaker and send an e-mail: c.l.dumitrescu@tudelft.nl