Pegasus WMS Automated Data Management in Shared and Nonshared Environments

Pegasus WMS Automated Data Management in Shared and Nonshared Environments
Mats Rynge <rynge@isi.edu>, USC Information Sciences Institute

Pegasus Workflow Management System
- NSF-funded project, developed since 2001 as a collaboration between the USC Information Sciences Institute and the HTCondor team at UW-Madison
- Builds on top of HTCondor DAGMan
- Abstract workflows are the Pegasus input workflow description: a high-level workflow language that identifies only the computation, devoid of resource descriptions and devoid of data locations (a small sketch follows below)
- Pegasus is a workflow compiler (plan/map):
  - The target is DAGMan DAGs and HTCondor submit files
  - It transforms the workflow for performance and reliability
  - It automatically locates physical locations for both workflow components and data
  - It collects runtime provenance
[Slide figure: an abstract workflow DAG in which node A fans out to four B nodes, each feeding a C node, all joining at node D]
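To make the abstract-workflow idea concrete, here is a minimal sketch using the DAX3 Python API that ships with Pegasus 3.x/4.x. The job names, file names, and arguments are invented for illustration; note that nothing in the description says where the jobs run or where the files live.

    #!/usr/bin/env python
    # Minimal abstract workflow sketch (DAX3 Python API). Job names, file
    # names, and arguments are illustrative only -- the description contains
    # no resource or data-location information.
    from Pegasus.DAX3 import ADAG, Job, File, Link

    dax = ADAG("diamond")

    f_a = File("f.a")   # raw input; its physical location comes from the replica catalog
    f_b = File("f.b")
    f_c = File("f.c")

    preprocess = Job(name="preprocess")
    preprocess.addArguments("-i", f_a, "-o", f_b)
    preprocess.uses(f_a, link=Link.INPUT)
    preprocess.uses(f_b, link=Link.OUTPUT)
    dax.addJob(preprocess)

    analyze = Job(name="analyze")
    analyze.addArguments("-i", f_b, "-o", f_c)
    analyze.uses(f_b, link=Link.INPUT)
    analyze.uses(f_c, link=Link.OUTPUT)
    dax.addJob(analyze)

    # Dependencies express only data/control flow.
    dax.depends(parent=preprocess, child=analyze)

    with open("diamond.dax", "w") as fh:
        dax.writeXML(fh)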

Abstract Workflow

Abstract to Executable Workflow Mapping
- Abstraction provides:
  - Ease of use (no need to worry about low-level execution details)
  - Portability (the same workflow description can be used to run on a number of resources and/or across them)
- It also gives opportunities for optimization and fault tolerance:
  - Automatically restructure the workflow
  - Automatically provide fault recovery (retry, or choose a different resource)
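The mapping step is driven by the pegasus-plan command. The sketch below shows a typical invocation; the site handles, directories, and file names are placeholders, and exact option names can vary between Pegasus releases (check pegasus-plan --help).

    # Plan (map) the abstract workflow onto a chosen execution site and submit it.
    #   --conf         properties file (data staging mode, catalog locations, ...)
    #   --dax          the abstract workflow description
    #   --sites        execution site(s) as defined in the site catalog
    #   --output-site  site where final outputs are staged
    #   --dir          directory for the generated DAG and submit files
    #   --submit       hand the planned DAG straight to DAGMan
    pegasus-plan --conf pegasus.properties \
                 --dax diamond.dax \
                 --sites condorpool \
                 --output-site local \
                 --dir submit \
                 --submit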

Supported Data Staging Approaches
- Shared filesystem setup (typical of XSEDE and HPC sites)
  - Worker nodes and the head node have a shared filesystem, usually a parallel filesystem with great I/O characteristics
- Condor IO
  - Worker nodes don't share a filesystem
  - Data is pulled from / pushed to the submit host via Condor file transfers
- Non-shared filesystem setup using an existing storage element for staging (typical of OSG and campus Condor pools)
  - Worker nodes don't share a filesystem
  - Data is pulled from / pushed to the existing storage element
(The properties sketch below shows how a mode is selected.)
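In Pegasus 4.x the staging mode is normally selected with the pegasus.data.configuration property. The snippet below is a sketch of such a properties file; the catalog file names are placeholders, and the exact values should be checked against the documentation for the release in use.

    # pegasus.properties (sketch; file names are placeholders)
    # Pick one data staging mode:
    #   sharedfs    - shared filesystem on the execution site
    #   condorio    - stage in/out via HTCondor file transfers from the submit host
    #   nonsharedfs - stage via an existing storage element (e.g. on OSG)
    pegasus.data.configuration = condorio

    # Catalog locations (the catalogs are described later in the talk)
    pegasus.catalog.site.file = sites.xml
    pegasus.catalog.replica = File
    pegasus.catalog.replica.file = rc.dat
    pegasus.catalog.transformation = Text
    pegasus.catalog.transformation.file = tc.dat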

Workflow Reduction (Data Reuse)
[Slide figure: the abstract workflow (jobs A-F over files f.ip, f.a, f.b, f.c, f.d, f.e, f.out) shown three times, progressively reduced]
- File f.d exists somewhere. Reuse it.
- Mark jobs D and B to delete.
- Delete job D and job B.
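The reduction step can be pictured with the simplified sketch below. This is not Pegasus's actual implementation, just the idea from the slide: a job is dropped when none of its outputs are still needed, either because they are already registered in the replica catalog or because their only consumers have themselves been dropped.

    # Simplified illustration of workflow reduction (data reuse).
    # NOT Pegasus's implementation -- a sketch of the idea only.

    def reduce_workflow(jobs, available, final_outputs):
        """jobs: {name: {"inputs": set, "outputs": set}} describing the DAG.
        available: logical file names already registered in the replica catalog.
        final_outputs: files the user still wants staged out."""
        deleted = set()
        changed = True
        while changed:
            changed = False
            for name, job in jobs.items():
                if name in deleted:
                    continue

                def needed(f):
                    # A requested final output that is not yet available must be produced.
                    if f in final_outputs and f not in available:
                        return True
                    # Otherwise the file matters only if a surviving job consumes it
                    # and it cannot simply be fetched from the replica catalog.
                    consumers = [c for c, cj in jobs.items()
                                 if f in cj["inputs"] and c not in deleted]
                    return bool(consumers) and f not in available

                if job["outputs"] and not any(needed(f) for f in job["outputs"]):
                    deleted.add(name)
                    changed = True
        return {n: j for n, j in jobs.items() if n not in deleted}

    if __name__ == "__main__":
        # The workflow from the slide: A -> (B, C) -> (D, E) -> F.
        jobs = {
            "A": {"inputs": {"f.ip"}, "outputs": {"f.a"}},
            "B": {"inputs": {"f.a"}, "outputs": {"f.b"}},
            "C": {"inputs": {"f.a"}, "outputs": {"f.c"}},
            "D": {"inputs": {"f.b"}, "outputs": {"f.d"}},
            "E": {"inputs": {"f.c"}, "outputs": {"f.e"}},
            "F": {"inputs": {"f.d", "f.e"}, "outputs": {"f.out"}},
        }
        kept = reduce_workflow(jobs, available={"f.ip", "f.d"},
                               final_outputs={"f.out"})
        print(sorted(kept))  # ['A', 'C', 'E', 'F'] -- B and D are reduced away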

File cleanup (cont.)
[Slide figure: Montage 1-degree workflow run with cleanup]

pegasus-transfer subsystem
- Command-line tool used internally by Pegasus workflows
- Input is a list of source and destination URLs
- Transfers the data by calling out to tools provided by the system (cp, wget, ...), by Pegasus (pegasus-gridftp, pegasus-s3), or by third parties (gsutil)
- Transfers are parallelized
- Transfers between incompatible protocols are split into two transfers using the local filesystem as a staging point
  - For example: GridFTP -> GS becomes GridFTP -> File and File -> GS (see the sketch below)
- Supported URLs: GridFTP, SRM, iRODS, S3, GS, SCP, HTTP, File, Symlink
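The two-hop splitting can be illustrated with the simplified sketch below. It is not pegasus-transfer's code, and the table of directly supported protocol pairs is invented; the point is only that a transfer with no common tool on both ends is routed through a temporary file on the local filesystem.

    # Simplified illustration of splitting a transfer between incompatible
    # protocols into two hops via the local filesystem. The DIRECT table is
    # made up for this sketch; it is not pegasus-transfer's internal logic.
    import os
    import tempfile
    from urllib.parse import urlparse

    DIRECT = {("gsiftp", "gsiftp"), ("gsiftp", "file"), ("file", "gsiftp"),
              ("gs", "file"), ("file", "gs"), ("http", "file"), ("file", "file")}

    def plan_transfer(src_url, dst_url):
        """Return the list of (src, dst) hops needed for one logical transfer."""
        src, dst = urlparse(src_url).scheme, urlparse(dst_url).scheme
        if (src, dst) in DIRECT:
            return [(src_url, dst_url)]
        # No single tool speaks both protocols: stage through a local temp file.
        staging = "file://" + os.path.join(
            tempfile.gettempdir(), os.path.basename(urlparse(src_url).path))
        return [(src_url, staging), (staging, dst_url)]

    # The example from the slide: GridFTP -> GS becomes GridFTP -> File, File -> GS.
    for hop in plan_transfer("gsiftp://example.org/data/f.a", "gs://example-bucket/f.a"):
        print(hop)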

Relevant Links
- http://pegasus.isi.edu
- Tutorial and documentation: http://pegasus.isi.edu/wms/docs/latest/
- Mats Rynge <rynge@isi.edu>

Catalogs
Pegasus uses three catalogs to fill in the blanks of the abstract workflow (entry sketches follow below):
- Site catalog
  - Defines the execution environment and potential data staging resources
  - Simple in the case of a Condor pool, but can be more complex when running on grid resources
- Transformation catalog
  - Defines the executables used by the workflow
  - Executables can be installed in different locations at different sites
- Replica catalog
  - Locations of existing data products: input files and intermediate files from previous runs
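For concreteness, the snippets below sketch what entries in the text-based transformation catalog and the file-based replica catalog look like in Pegasus of roughly this vintage. The site handles, paths, and attribute names are illustrative assumptions; the exact syntax should be checked against the Pegasus documentation for the release in use.

    # Transformation catalog (text format) -- sketch; site handle and path are examples.
    tr preprocess {
        site condorpool {
            pfn "/usr/local/bin/preprocess"
            arch "x86_64"
            os "linux"
            type "INSTALLED"
        }
    }

    # Replica catalog (file format) -- sketch; maps logical file names to physical
    # locations, e.g. the input f.a and an already-produced f.d that can be reused.
    f.a  file:///data/inputs/f.a         site="local"
    f.d  gsiftp://example.org/runs/f.d   site="storage"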