Pegasus Workflow Management System. Gideon Juve. USC Information Sciences Institute

Scientific Workflows
- Orchestrate complex, multi-stage scientific computations
- Often expressed as directed acyclic graphs (DAGs)
- Can execute in parallel on distributed resources
- Capture analysis pipelines for sharing and reuse
[Figure: Epigenomics workflow DAG. Stages: Setup (create_dir), Split (fastqsplit), Filter & Convert (filtercontams, sol2sanger, fast2bfq), Map (map), Merge (mapmerge), and Analyze (pileup on chr21).]

Large, Data-intensive Workflows
John Good (Caltech), Montage Galactic Plane Workflow:
- 18 million input images (~2.5 TB per workflow)
- 1,000 output images (2.5 GB each, 2.4 TB per workflow)
- 10.5 million tasks (34,000 CPU hours)

Pegasus Workflow Management System (WMS)
- A collaboration between USC/ISI and the Condor team at UW-Madison
  - Pegasus plans the workflow
  - DAGMan is the workflow engine
  - Condor schedules jobs and manages resources
- Started in 2001
- Actively used by many applications in a variety of domains: earth science, physics, astronomy, bioinformatics, and others
- http://pegasus.isi.edu/applications

Pegasus WMS Architecture
[Figure: Pegasus WMS architecture. Users interact through API interfaces, portals, and other workflow composition tools (Grayson, Triana, Wings). Pegasus WMS itself comprises the mapper, monitoring, notifications, the workflow engine, the scheduler, logs, and a workflow DB. Jobs run on distributed resources (campus clusters, local clusters, Open Science Grid, XSEDE) through middleware such as PBS, LSF, SGE, Condor, and GRAM; on clouds (Amazon EC2, RackSpace, FutureGrid) with cloudware such as OpenStack, Eucalyptus, and Nimbus; storage is reached via GridFTP, HTTP, FTP, SRM, iRODS, SCP, and S3.]

DAGMan (Directed Acyclic Graph Manager)
- Enables dependencies between jobs
- Provides reliability
  - Retries for job-level failures
  - Rescue DAGs for workflow-level failures
- Other features: throttling, priorities, pre- and post-scripts, workflows of workflows

Example DAG file for a diamond-shaped workflow (A, then B and C in parallel, then D):

    JOB A a.submit
    JOB B b.submit
    JOB C c.submit
    JOB D d.submit
    PARENT A CHILD B C
    PARENT B C CHILD D
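
Each JOB line names an ordinary HTCondor submit file for that job. As a minimal sketch (the executable and file names below are hypothetical, not from the slides), a.submit could look like:

    # a.submit -- hypothetical HTCondor submit file for job A
    universe   = vanilla
    executable = a.sh        # hypothetical wrapper script for job A
    output     = a.out
    error      = a.err
    log        = diamond.log # jobs in one DAG typically share a log
    queue

The whole workflow would then be started with condor_submit_dag diamond.dag (assuming the DAG file above is saved as diamond.dag).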

Pegasus Planner
- Compiles abstract workflows (DAX) into executable workflows
- Adds data management jobs (transfer, cleanup, registration)
- Performs optimizations (task clustering, reduction, etc.)
- Generates executable artifacts (DAG, submit scripts, etc.)
- Enables portability and optimization
[Figure: The planner reads a DAX and information catalogs and emits a DAG plus Condor submit scripts, which DAGMan and Condor then execute.]
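
For illustration, here is a minimal sketch of how such an abstract workflow might be generated with the Pegasus DAX3 Python API; the job names, file names, and arguments are invented for the example:

    # Sketch: write a two-job abstract workflow (DAX) in Python
    from Pegasus.DAX3 import ADAG, Job, File, Link

    dax = ADAG("example")

    raw = File("data.txt")           # hypothetical input
    filtered = File("filtered.txt")  # hypothetical intermediate output

    filter_job = Job(name="filter")  # hypothetical transformation
    filter_job.addArguments("-i", raw, "-o", filtered)
    filter_job.uses(raw, link=Link.INPUT)
    filter_job.uses(filtered, link=Link.OUTPUT)
    dax.addJob(filter_job)

    analyze_job = Job(name="analyze")  # hypothetical transformation
    analyze_job.addArguments(filtered)
    analyze_job.uses(filtered, link=Link.INPUT)
    dax.addJob(analyze_job)

    # analyze runs only after filter completes
    dax.depends(parent=filter_job, child=analyze_job)

    with open("example.dax", "w") as f:
        dax.writeXML(f)

The planner (pegasus-plan) then turns this DAX into the concrete DAG and submit scripts described above.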

Other Pegasus WMS Features
- Monitoring and troubleshooting: web GUI and command-line tools for reporting and investigating workflow progress and failures
- Hierarchical workflows (workflows of workflows)
- Provenance: generates a database recording when and where jobs were executed
- Others: MPI task clustering (pegasus-mpi-cluster), script generation, notifications

Distributed Workflow Challenges and Requirements
- Security and identity management
  - Different credentials on different sites/clusters
  - Complexity of grid security (X.509)
  - Firewall and network issues, e.g. when using glideins
- Job submission
  - Local: dedicated login node / submit host
  - Remote: middleware, e.g. Globus GRAM
- Data transfer
  - Need for high-performance transfer tools, e.g. GridFTP
  - Many different protocols/tools (scp, GridFTP, HTTP, iRODS, SRM, S3)
  - Complex storage configurations

Job Submission in Pegasus
- Personal Condor
- Local Condor pool
- Local batch systems (PBS, SGE, etc.)
- Remote Condor pools (Condor-C, flocking)
- Remote batch systems via GRAM, UNICORE, or ARC (see the sketch below)
- Glideins with BOSCO or glideinWMS
- Remote batch systems over SSH with BOSCO (scalability?)
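
As one concrete illustration of the remote-batch case, the underlying HTCondor job uses the grid universe. A minimal sketch, with a made-up gatekeeper host and executable:

    # Sketch: grid-universe job routed to a remote PBS cluster via GRAM5
    universe      = grid
    grid_resource = gt5 gatekeeper.example.edu/jobmanager-pbs  # hypothetical endpoint
    executable    = analysis.sh                                # hypothetical
    output        = job.out
    error         = job.err
    log           = job.log
    queue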

Distributed Data Management in Pegasus
- Pegasus supports many different data configurations
  - Complex data flows
  - Many protocols and tools
  - Push, pull, and third-party transfers
  - Inter-site transfers
  - Caching with Squid (gets over HTTP; puts use other protocols)
- Site types (described to Pegasus in a site catalog; see the sketch below)
  - Local site (submit host): runs Pegasus WMS
  - Storage sites: hold inputs and outputs
  - Staging sites: hold intermediate data
  - Compute sites: run compute jobs
[Figure: Data flows among an input site, a staging site, an output site, and a compute site, coordinated from the local site (submit host).]
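
A rough sketch of one staging-site entry in the XML site catalog format (the handle, path, and URL are made up for illustration):

    <sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" version="4.0">
      <site handle="staging" arch="x86_64" os="LINUX">
        <!-- scratch space on the staging site, reachable via GridFTP -->
        <directory type="shared-scratch" path="/data/scratch">
          <file-server operation="all" url="gsiftp://staging.example.edu/data/scratch"/>
        </directory>
      </site>
    </sitecatalog>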

Data Management Configurations
[Figure: Three configurations. (1) Shared file system: jobs on the compute site's worker nodes read and write data through a shared file system, e.g. an HPC cluster. (2) Non-shared file system: jobs stage data between worker nodes and a separate staging site, e.g. Amazon EC2 with S3. (3) Condor IO: the submit host stages data directly to and from the worker nodes' local file systems, e.g. on the Open Science Grid.]
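
Which of these configurations is used is selected in the Pegasus properties file; a one-line sketch:

    # pegasus.properties -- choose the data staging configuration
    pegasus.data.configuration = condorio   # or: sharedfs, nonsharedfs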

Workflows and Science Gateways
- HUBzero: Pegasus as a service
[Credit: Frank McKenna, UC Berkeley, NEES, HUBzero]

More Information
- Website: http://pegasus.isi.edu
- Email: pegasus-users@isi.edu, pegasus-support@isi.edu
- Pegasus team @ ISI: Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Rajiv Mayani

CC-NIE: ADAMANT
[Figure: Timeline of a Pegasus workflow running on a dynamic ORCA slice. (1) The workflow starts with a few compute nodes for the beginning steps. (2) Compute nodes are dynamically created, adding capacity for a parallel, compute-intensive step. (3) During the compute-intensive step, the network between cloud sites is dynamically provisioned. (4) Compute nodes are dynamically destroyed, freeing unneeded nodes after the compute step. (5) The workflow ends.]
Jeff Chase (Duke); Ewa Deelman (USC/ISI); Ilya Baldin (UNC-CH/RENCI); Charles Schmitt (UNC-CH/RENCI)