The Problem of Grid Scheduling

Grid Scheduling

The Problem of Grid Scheduling
- Decentralised ownership
  - No one controls the grid
- Heterogeneous composition
  - Difficult to guarantee execution environments
- Dynamic availability of resources
  - Ubiquitous monitoring infrastructure needed
- Complex policies
  - Issues of trust
  - Lack of accounting infrastructure
  - May change with time

A Real Life Example
- Merge two grids into a single multi-VO "Inter-Grid"
- How to ensure that
  - neither VO is harmed?
  - both VOs actually benefit?
  - there are answers to questions like: "With what probability will my job be scheduled and complete before my conference deadline?"
- Clear need for a scheduling middleware!
[Figure: participating sites LBL, Caltech, UW, UC, UI, ANL, UCSD, UTA, OU, SMU, Rice, IU, UM, FNAL, BU, MIT, BNL, UF]

Emerging Challenge: Intra-VO Management
- Today's Grid
  - Few Production Managers
- Tomorrow's Grid
  - Few Production Managers
  - Many Analysis Users
- How to ensure that
  - Handles exist for the VO to throttle different priorities
    - Production vs. Analysis
    - User-A vs. User-B
  - The VO is able to
    - Inventory all resources currently available to it over some time period
    - Strategically plan for its own use of those resources during that time period

Quality of Service
- For grid computing to become economically viable, a Quality of Service is needed
- Can the grid possibly handle my request within my required time window?
  - If not, why not? When might it be able to accommodate such a request?
  - If yes, with what probability?
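One simple way to attach a number to "with what probability?" is to look at how past jobs behaved. The sketch below is only an illustration, not something taken from the slides: it assumes the scheduler keeps a history of queue and run times (the `CompletedJob` record and `completion_probability` function are hypothetical names) and reports the fraction of past jobs whose total turnaround fit inside the requested window.

```python
from dataclasses import dataclass


@dataclass
class CompletedJob:
    """One historical record: hours spent queued and hours spent running."""
    queue_hours: float
    run_hours: float


def completion_probability(history, window_hours):
    """Fraction of past jobs whose total turnaround fit inside the window."""
    if not history:
        return None  # no evidence, so no estimate
    on_time = sum(1 for job in history
                  if job.queue_hours + job.run_hours <= window_hours)
    return on_time / len(history)


# Hypothetical history with turnarounds of 3.0, 7.5, 3.5 and 14.0 hours.
history = [
    CompletedJob(1.0, 2.0),
    CompletedJob(5.0, 2.5),
    CompletedJob(0.5, 3.0),
    CompletedJob(12.0, 2.0),
]
print(completion_probability(history, window_hours=8.0))  # 0.75
```

An empirical fraction like this is only as good as the history behind it, which is one reason the later slides stress request tracking and usage statistics.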

Quality of Service
- But grid computing today typically:
  - Relies on greedy job placement strategies
  - Works well in a resource-rich (user-poor) environment
  - Assumes no correlation between job placement choices
  - Provides no QoS
- As a grid becomes resource limited, QoS becomes more important!
  - Greedy strategies are not always a good choice
  - Strong correlation between job placement choices
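The point about correlated placement choices can be made concrete with a toy comparison. This is a minimal sketch under assumed inputs (a dictionary of free slots per site; the function names are hypothetical), not any scheduler's real algorithm. The greedy version treats each job independently against one snapshot, while the second version charges every placement against the site before choosing the next one.

```python
def greedy_place(jobs, free_slots):
    """Send every job to the site that looked best in one snapshot."""
    best = max(free_slots, key=free_slots.get)
    return {job: best for job in jobs}


def correlated_place(jobs, free_slots):
    """Place jobs one at a time, charging each placement against the site."""
    remaining = dict(free_slots)  # copy so the caller's snapshot is untouched
    placement = {}
    for job in jobs:
        best = max(remaining, key=remaining.get)
        placement[job] = best
        remaining[best] -= 1  # earlier choices change later ones
    return placement


jobs = ["j1", "j2", "j3", "j4"]
slots = {"site_a": 3, "site_b": 2, "site_c": 1}  # hypothetical sites
print(greedy_place(jobs, slots))      # all four jobs land on site_a
print(correlated_place(jobs, slots))  # the fourth choice sees site_a filling up
```

With four jobs and only three free slots at the best site, the greedy placement oversubscribes it, while the second placement never exceeds any site's free slots in this example.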

Some Requirements for Effective Grid Scheduling
- Information requirements
  - Past & future dependencies of the application
  - Persistent storage of workflows
  - Resource usage estimation
  - Policies
    - Expected to vary slowly over time
  - Global views of job descriptions
  - Request Tracking and Usage Statistics
    - State information important

Slides for requirements for scheduling and SPHINX prepared by Rick Cavanaugh for CHEP 2004

Requirements for Effective Grid Scheduling (cont.)
- Information requirements (cont.)
  - Resource Properties and Status
    - Expected to vary slowly with time
  - Grid weather
    - Accurate and pertinent parameter reporting important
  - Replica management
- System requirements
  - Distributed, fault-tolerant scheduling
  - Customisability
  - Interoperability with other scheduling systems
  - Quality of Service

Incorporate Requirements into a Framework
- Assume the GriPhyN Virtual Data Toolkit:
  - Client (request/job submission)
    - Globus clients
    - Condor-G/DAGMan
    - Chimera Virtual Data System
  - Server (resource gatekeeper)
    - MonALISA Monitoring Service
    - Globus services
    - RLS (Replica Location Service)
[Figure: a VDT Client, an unspecified component ("???"), and three VDT Servers]

Incorporate Requirements into a Framework
- Framework design principles:
  - Information driven
  - Flexible client-server model
  - General, but pragmatic and simple
    - Implement now; learn; extend over time
  - Avoid adding middleware requirements on grid resources
    - Take what is offered!
- Assume the GriPhyN Virtual Data Toolkit:
  - Client (request/job submission)
    - Globus clients
    - Condor-G/DAGMan
    - Chimera Virtual Data System
  - Server (resource gatekeeper)
    - MonALISA Monitoring Service
    - Globus services
    - RLS (Replica Location Service)
[Figure: a VDT Client, a Recommendation Engine, and three VDT Servers]

The Sphinx Framework
[Figure: architecture diagram. Labels: Sphinx Client, Chimera Virtual Data System, Condor-G/DAGMan, VDT Client; Clarens WS Backbone; Sphinx Server, Request Processing, Data Warehouse, Data Management, Information Gathering; VDT Server Site, Globus Resource, Replica Location Service, MonALISA Monitoring Service]

Sphinx Scheduling Server
- Functions as the Nerve Centre
- Data Warehouse
  - Policies, Account Information, Grid Weather, Resource Properties and Status, Request Tracking, Workflows, etc.
- Control Process
  - Finite State Machine
    - Different modules modify jobs, graphs, workflows, etc. and change their state
  - Flexible
  - Extensible
[Figure: Sphinx Server internals. Labels: Control Process, Message Interface, Graph Reducer, Job Predictor, Graph Predictor, Job Admission Control, Graph Admission Control, Graph Data Planner, Job Execution Planner, Graph Tracker, Data Management, Information Gatherer, Data Warehouse]
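The finite-state-machine idea can be illustrated with a small sketch. The module names below follow the labels on the slide, but the state names, the transition table, and the `control_process` function are assumptions made for illustration; the slides do not list the actual states or how Sphinx stores them.

```python
# Hypothetical state names; the slide names the modules but not the states.
TRANSITIONS = {
    "received": ("graph_reducer", "reduced"),
    "reduced": ("graph_predictor", "predicted"),
    "predicted": ("graph_admission_control", "admitted"),
    "admitted": ("graph_data_planner", "planned"),
}


def control_process(warehouse):
    """Advance every workflow one step at a time until none can move."""
    log = []
    progressed = True
    while progressed:
        progressed = False
        for name, state in list(warehouse.items()):
            if state in TRANSITIONS:
                module, next_state = TRANSITIONS[state]
                # A real module would do its work here before the state changes.
                warehouse[name] = next_state
                log.append((name, module, next_state))
                progressed = True
    return log


# The dictionary stands in for workflow state kept in the data warehouse.
warehouse = {"dag1": "received", "dag2": "predicted"}
for entry in control_process(warehouse):
    print(entry)
```

Keeping the transitions in a table is one way to get the flexibility and extensibility the slide mentions: adding a module means adding a row rather than rewriting the loop.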

Slides on Pegasus adapted from a presentation by Gaurang Mehta at CondorWeek 2004

Pegasus
- Pegasus: Planning for Execution in Grid
- Pegasus is a configurable system that can plan, schedule and execute complex workflows on the Grid.
- Algorithmic and AI-based techniques are used.
- Pegasus takes an abstract workflow as input. The abstract workflow describes the transformations and data in terms of their logical names.
- It then queries the Replica Location Service (RLS) for the existence of any materialized data. If any derived data exists, it is reused and a workflow reduction is done.
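Workflow reduction can be sketched as a backward walk from the files the user wants. The code below is an illustrative assumption, not Pegasus's actual implementation: the abstract workflow is a plain dictionary, a Python set stands in for the RLS query, and `reduce_workflow` is a hypothetical name.

```python
def reduce_workflow(jobs, replicas, targets):
    """Return the jobs that must still run to produce the target files.

    jobs: mapping of job name -> (input logical names, output logical names)
    replicas: logical names already materialized (a stand-in for an RLS query)
    targets: logical names the user ultimately wants
    """
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    needed, to_visit, seen = set(), list(targets), set()
    while to_visit:
        lfn = to_visit.pop()
        if lfn in seen or lfn in replicas:
            continue  # already handled, or reusable from an existing replica
        seen.add(lfn)
        job = producer.get(lfn)
        if job is None:
            raise ValueError(f"no replica and no producer for {lfn!r}")
        needed.add(job)
        to_visit.extend(jobs[job][0])  # the job's inputs are now needed too
    return needed


# Hypothetical three-step chain: raw.dat -> a.dat -> b.dat -> c.dat
jobs = {
    "extract": (["raw.dat"], ["a.dat"]),
    "transform": (["a.dat"], ["b.dat"]),
    "analyze": (["b.dat"], ["c.dat"]),
}
print(reduce_workflow(jobs, {"raw.dat", "b.dat"}, ["c.dat"]))  # {'analyze'}
```

Because `b.dat` already exists in this example, the `extract` and `transform` jobs drop out of the workflow and only `analyze` remains.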

Pegasus (cont)
- It then locates physical locations for both components (transformations and data)
  - Uses the Globus Replica Location Service (RLS) and the Transformation Catalog (TC)
- Finds appropriate resources to execute
  - Via the Globus Monitoring and Discovery Service (MDS)
- Adds the stage-in jobs to transfer raw and materialized input files to the computation sites.
- Adds the stage-out jobs to transfer derived data to the user-selected storage location.
  - Both input and output staging are done with Globus GridFTP
- Publishes newly derived data products for reuse
  - RLS, Chimera virtual data catalog (VDC)
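The staging step can be pictured as wrapping each compute job with the transfers it needs. This is a minimal sketch under assumed data structures (site names, the tuple layout, and `add_staging` are all hypothetical); it shows only the bookkeeping and does not call GridFTP or any Globus API.

```python
def add_staging(job, site, replica_sites, output_site):
    """Wrap one compute job with the transfer jobs it needs.

    job: (name, input logical names, output logical names)
    replica_sites: maps each input logical name to the site holding a copy
    Returns a list of (kind, item, source, destination) tuples in run order.
    """
    name, inputs, outputs = job
    plan = []
    for lfn in inputs:
        src = replica_sites[lfn]
        if src != site:  # data already at the compute site needs no transfer
            plan.append(("stage_in", lfn, src, site))
    plan.append(("compute", name, site, site))
    for lfn in outputs:
        if output_site != site:
            plan.append(("stage_out", lfn, site, output_site))
    return plan


job = ("analyze", ["b.dat", "cal.dat"], ["c.dat"])
locations = {"b.dat": "site_b", "cal.dat": "site_a"}  # hypothetical replicas
for step in add_staging(job, "site_a", locations, "archive"):
    print(step)
```

In this example only `b.dat` needs a stage-in, since `cal.dat` already sits at the compute site, and `c.dat` is staged out to the chosen storage location.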

Current Pegasus system
- The current Pegasus implementation plans the entire workflow before submitting it for execution (full-ahead planning).
- Deferred planning is in progress.
[Figure: Current Pegasus Workflow Execution. Labels: Original Abstract Workflow, Pegasus(Abstract Workflow), Concrete Workflow, DAGMan(CW)]

Pegasus (cont)
- Pegasus generates the concrete workflow in Condor DAGMan format and submits it to DAGMan/Condor-G for execution on the Grid.
- These concrete DAGs contain the concrete location of the data and the site where the computation is to be performed.
- Condor-G submits these jobs via Globus GRAM to remote schedulers running Condor, PBS, LSF and Sun Grid Engine.
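To give a feel for what a concrete workflow in DAGMan format looks like, the sketch below renders a small DAG using the two basic DAGMan statements, `JOB <name> <submit file>` and `PARENT <p> CHILD <c>`. The job names, submit file names, and the `to_dagman` helper are hypothetical; real Pegasus output carries considerably more detail.

```python
def to_dagman(jobs, edges):
    """Render a DAGMan input file as a string.

    jobs: mapping of job name -> Condor submit description file
    edges: iterable of (parent, child) job-name pairs
    """
    lines = [f"JOB {name} {submit}" for name, submit in jobs.items()]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


# Hypothetical concrete workflow: stage in, compute, stage out.
dag_jobs = {
    "stage_in_b": "stage_in_b.sub",
    "analyze": "analyze.sub",
    "stage_out_c": "stage_out_c.sub",
}
dag_edges = [("stage_in_b", "analyze"), ("analyze", "stage_out_c")]
print(to_dagman(dag_jobs, dag_edges), end="")
```

The resulting text could be saved to a file and handed to `condor_submit_dag`, assuming the referenced submit description files exist.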