Workflow as a Service: An Approach to Workflow Farming

Workflow as a Service: An Approach to Workflow Farming. Reginald Cushing, Adam Belloum, Vladimir Korkhov, Dmitry Vasyunin, Marian Bubak, Carole Leguy. Institute for Informatics, University of Amsterdam. 3rd International Workshop on Emerging Computational Methods for the Life Sciences, 18th June 2012

Outline Scientific Workflows Farming Concepts Workflow as a Service (WfaaS) System overview Task Harnessing Messaging Application Use Case Results Conclusions

Scientific Workflows: Composing experiments from reusable modules. Vertices represent computation; edges represent data dependencies and data communication. Modules/tasks communicate through channels represented by ports. Workflow engines distribute workload onto resources such as grids and clouds. Modules run in parallel, thus achieving better throughput.

Farming Concepts: Many scientific applications require a parameter space study, a.k.a. a parameter sweep. In workflows, parameter sweeps can be achieved by running multiple identical workflows with different parameter inputs. Cons: every instance of a workflow has to be submitted to distributed resources, where queue waiting times play a significant role in throughput.

Farming Concepts: Parameters are organized on message queues, and each task processes data sequentially. Adding more tasks increases the message consumption rate. Challenge: how many tasks to create? Too many, and tasks get stuck on queues; too few, and optimal performance is not achieved.
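The farming idea above can be sketched in a few lines: several identical task clones consume parameters from one shared queue, so adding clones raises the consumption rate. This is an illustrative sketch, not the paper's implementation; the squaring stands in for the real computation.

```python
import queue
import threading

params = queue.Queue()
for p in range(20):          # the parameter space to sweep
    params.put(p)

results = []
lock = threading.Lock()

def worker():
    # each task clone pulls parameters from the shared queue until it is empty
    while True:
        try:
            p = params.get_nowait()
        except queue.Empty:
            return
        with lock:
            results.append(p * p)   # stand-in for the real computation

clones = [threading.Thread(target=worker) for _ in range(4)]  # 4 task clones
for t in clones:
    t.start()
for t in clones:
    t.join()
```

With 4 clones the sweep drains roughly four times faster than a single task, while the shared queue guarantees each parameter is processed exactly once.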

Workflow as a Service: Workflow execution is persistent, i.e. it runs, processes data, and does NOT terminate, but waits for more data. An active workflow instance can process multiple parameters, making better use of computing resources. A parameter space can be partitioned amongst a pool of active workflow instances (a farm of workflows). A workflow acts as a service by accepting requests to process data with given parameters. Request 1: data A, parameters {p1,p2,...}. Request 2: data A, parameters {k1,k2,...}. Multiple WfaaS instances processing requests form a farm of workflows.

System Overview: Loosely coupled modules revolving around message queues.

Enactment Engine: A dataflow engine (top-level scheduler) based on the Freefluo engine. Models workflows as dataflow graphs: vertices are tasks while edges are data dependencies. Tasks have ports to simulate data channels. The dataflow model dictates that only tasks which have input are scheduled for execution. http://freefluo.sourceforge.net
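The dataflow rule above can be shown with a toy enactment loop (illustrative, not the Freefluo API): a task becomes runnable only once every one of its input ports holds data.

```python
class Task:
    def __init__(self, name, func, in_ports, out_port):
        self.name, self.func = name, func
        self.in_ports, self.out_port = in_ports, out_port

def enact(tasks, channels):
    """Dataflow rule: repeatedly run any task whose input ports all hold
    data, writing its result to its output port, until nothing is ready."""
    done = set()
    progress = True
    while progress:
        progress = False
        for t in tasks:
            if t.name not in done and all(p in channels for p in t.in_ports):
                channels[t.out_port] = t.func(*(channels[p] for p in t.in_ports))
                done.add(t.name)
                progress = True
    return channels

# tiny two-task workflow: "clean" sorts raw data, "stat" sums the clean data
tasks = [
    Task("stat", lambda xs: sum(xs), ["clean"], "result"),
    Task("clean", sorted, ["raw"], "clean"),
]
channels = enact(tasks, {"raw": [3, 1, 2]})
```

Note that "stat" is listed first but runs second: scheduling order falls out of data availability, not of the order in which tasks are declared.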

Message Broker: The message broker plays a pivotal role in the system. Message queues act as a data buffer, so communicating tasks are time-decoupled. Through queue sharing we can achieve scaling. Tasks communicate through messaging, where messages contain references to the actual data.
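The reference-passing point can be sketched as follows (the message fields and the in-memory "store" are illustrative stand-ins for a real broker and shared storage): queues carry small messages holding references to data, not the data itself.

```python
import queue

data_store = {"ref-001": [1.0, 2.0, 3.0]}     # stand-in for shared storage

channel = queue.Queue()                        # stand-in for a broker queue
channel.put({"task": "analyze", "data_ref": "ref-001"})  # a reference, not data

msg = channel.get()                            # the consumer may run much later
payload = data_store[msg["data_ref"]]          # dereference to fetch real data
mean = sum(payload) / len(payload)
```

Keeping queues light this way is what makes the time decoupling cheap: the broker never has to buffer large scientific datasets, only small references to them.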

Submission System: Pluggable schedulers (bottom-level) for task match-making. Submitters (drivers) abstract actual resources such as clusters, grids, and clouds. The scheduler matches a task to a submitter; the submitter does the actual task/job submission.
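One way to picture the scheduler/submitter split (class names and the match-making policy here are hypothetical, chosen only to illustrate the layering): submitters wrap concrete resources, and a pluggable scheduler match-makes tasks against them.

```python
class Submitter:
    """Driver that abstracts one concrete resource (cluster, grid, cloud)."""
    def __init__(self, name, accepts):
        self.name = name
        self.accepts = accepts          # task kinds this resource can run
    def submit(self, task):
        return f"{task['name']} -> {self.name}"

def match(task, submitters):
    # bottom-level scheduler: pick the first submitter that accepts the task
    for s in submitters:
        if task["kind"] in s.accepts:
            return s.submit(task)
    raise LookupError("no submitter accepts this task")

subs = [Submitter("cluster", {"mpi"}), Submitter("cloud", {"serial"})]
placement = match({"name": "sim01", "kind": "serial"}, subs)
```

Because the scheduler only talks to the Submitter interface, swapping in a new resource type means adding a driver, not changing the match-making code.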

Task Harnessing: The task harness is a late-binding, pilot-job mechanism. A pilot job (harness) is submitted which will pull the actual job. The harness separates data transport from scientific logic, giving better control of tasks.
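The late-binding idea can be sketched in a few lines (the module registry and job format are illustrative): what gets submitted to the resource is the harness, and only once it is running does it pull the actual work.

```python
import queue

MODULES = {"simulate": lambda n: n * 2}   # stand-in scientific modules

job_queue = queue.Queue()
job_queue.put({"module": "simulate", "args": (7,)})

def harness(jobs):
    """Pilot job: the harness is what gets submitted to the resource;
    only once it is running does it pull the actual job (late binding)."""
    job = jobs.get()
    fn = MODULES[job["module"]]           # harness handles dispatch/transport
    return fn(*job["args"])               # module holds the scientific logic

result = harness(job_queue)
```

Since the work is chosen after the harness has already cleared the batch queue, the queue wait is paid once per harness rather than once per job.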

Task Auto-Scaling: Messages between tasks are monitored. The size of queued data and the mean data processing time are used to calculate the task load. Auto-scaling replicates a particular task to reduce its load. Replicated tasks (clones) partition data by sharing the same input message queues.
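A simple version of this heuristic can be written down explicitly (the formula and the 60-second target are illustrative, not the paper's exact rule): estimate how long the queued backlog would take a single task to drain, and add clones until the expected drain time falls under a target.

```python
import math

def clones_needed(queued_msgs, mean_proc_time, target_drain_time=60.0):
    """Estimate how many task clones keep the expected time to drain the
    queue under the target. Formula and default constant are illustrative."""
    backlog_time = queued_msgs * mean_proc_time   # drain time for one task
    return max(1, math.ceil(backlog_time / target_drain_time))

n = clones_needed(queued_msgs=120, mean_proc_time=2.0)   # 240 s of backlog
```

With 120 queued messages at 2 s each, a single task needs 240 s, so four clones bring the expected drain time down to the 60 s target; a short queue keeps the count at one, avoiding clones that would only sit idle.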

Parameter Mapping: One-to-one mapping: each parameter is mapped to one workflow instance. This generates many workflow instances which end up stuck on queues awaiting execution. High scheduling overhead, high concurrency. Many-to-one mapping: all parameters are mapped to the same workflow instance. Only one workflow to schedule, but it takes long to process the whole parameter space. Low scheduling overhead, low concurrency. Many-to-many: the parameter space is partitioned amongst a farm of workflows. A number of workflows are scheduled, which accelerates processing. Low scheduling overhead, high concurrency.
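All three mappings are special cases of one partitioning function (the round-robin policy here is an illustrative choice; any balanced split works): the instance count interpolates between one-to-one and many-to-one, with many-to-many in between.

```python
def partition(params, n_instances):
    """Round-robin split of the parameter space across workflow instances."""
    chunks = [[] for _ in range(n_instances)]
    for i, p in enumerate(params):
        chunks[i % n_instances].append(p)
    return chunks

space = list(range(10))
one_to_one   = partition(space, len(space))   # 10 instances, 1 parameter each
many_to_one  = partition(space, 1)            # 1 instance gets everything
many_to_many = partition(space, 3)            # farm of 3 instances
```

Choosing the farm size is thus a single knob trading scheduling overhead (instances to submit) against concurrency (parameters processed in parallel).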

Task Harnessing: WfaaS is enabled through task harnessing. A harness is caretaker code that runs alongside the module on the resource/worker node. It implements a plugin architecture: modules are dynamically loaded at runtime. Data communication to and from the module is taken care of by the harness, and the harness invokes the module with new data-processing requests. The harness is akin to a container while the module is akin to a service. The harness enables asynchronous module execution, as communication is done through messaging.
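The plugin architecture can be sketched with standard dynamic importing (the Harness class is illustrative; only `importlib` usage is standard Python): the harness loads the module at runtime and invokes it for each new request, so the module code never deals with request delivery.

```python
import importlib

class Harness:
    """Caretaker that loads a module at runtime and invokes it per request."""
    def __init__(self, module_name, func_name):
        mod = importlib.import_module(module_name)  # dynamic plugin load
        self.entry = getattr(mod, func_name)

    def serve(self, requests):
        # the harness, not the module, handles request delivery
        return [self.entry(r) for r in requests]

# use a stdlib module as a stand-in plugin: math.sqrt as the "scientific" entry
h = Harness("math", "sqrt")
outputs = h.serve([4.0, 9.0])
```

This is the container/service analogy in miniature: the same Harness code hosts any module, and the module only exposes a callable entry point.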

Messaging: In WfaaS, modules communicate through messaging. Message queues allow multiple instances of a module to share the same input space; through message queues, data is partitioned amongst modules. Messaging circumvents the need to co-allocate resources. The pull model implies that each module can process data at its own pace: once a module has finished processing data, it asks for more (pull).

Application Use Case: A biomedical study for which 3000 runs were required to perform global sensitivity analysis. The patient-specific simulation includes many parameters based on data measured in vivo. The arterial tree model geometry and the representation of model parameters are constrained to uncertainties. Parameters: flow velocity; brachial, radial, and ulnar radii; lengths of the brachial, radial, and ulnar arteries; etc.

Results: Left: with WfaaS, 100 simulations take around 3h:15min. Right: without WfaaS, 100 simulations take 5h:15min. In the WfaaS approach, each workflow instance performs multiple simulations, which drastically reduces queue waiting times. The non-WfaaS approach generates 100 workflow instances, with most of them getting stuck on job queues. In both cases workflows were competing for 28 worker nodes.

Conclusions: WfaaS is an ideal approach to large parametric studies. WfaaS reduces the common scheduling overhead associated with queue waiting times. WfaaS is achieved through task harnessing, whereby caretaker routines can invoke the task multiple times. A farm of workflows can progress at its own pace through a parameter-pulling mechanism.

Further Information WSVLAM workflow management system http://staff.science.uva.nl/~gvlam/wsvlam/ Computational Sciences at University of Amsterdam http://uva.computationalscience.nl COMMIT http://www.commit-nl.nl/new