Data Throttling for Data-Intensive Workflows


Sang-Min Park and Marty Humphrey
Dept. of Computer Science, University of Virginia

This work was supported by the National Science Foundation under Grant No. SCI-0426972 and Grant No. SCI-0438263, the DOE Early Career program, and the Microsoft Research eScience Program.

Overview
- E-Science workflows: the good and the bad
- Data throttling: motivation
- Data throttling: design and implementation
- Evaluations
  - Simulation
  - Experiments with the Montage workflow
- Conclusions

Example workflow from the Taverna system
[Figure: example Taverna workflow]
Source: Oinn, T.; Addis, M.; Ferris, J.; Marvin, D.; Senger, M.; Greenwood, M.; Carver, T.; Glover, K.; Pocock, M. R.; Wipat, A. & Li, P. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 2004, 20, 3045-3054.

E-Science Workflows - Good
- A natural way to describe scientific logic
  - Expressed in terms of computation and data dependencies
  - GUI support
  - Easy to map onto the Grid
- Different aspects of e-Science workflows
  - For the domain scientist
    - Problem-solving environment
    - Data provenance for reproducible science
  - For the computer scientist
    - Separation of concerns: a single abstraction with multiple implementations
    - A unified application model to which research ideas can be applied

E-Science Workflows - Bad
- Easy for the domain scientist = hard for the computer scientist
- Hard problems (research issues)
  - Reliability: how to run tens of thousands of tasks without failure?
  - Abstraction: how to describe scientific logic? Is there a standard, unified way to do it?
  - Performance: how to run a workflow fast, or how to run it efficiently?

E-Science Workflows: Performance
- Big, complex applications
  - Tens of thousands of tasks (e.g., LIGO, Montage)
  - Complex dependencies between tasks
- Dynamic Grid resources
  - Availability of computing nodes changes
  - Data bandwidths fluctuate
- Large infrastructure does not guarantee high performance
  - Unbalanced computation
  - Slow communication
  - Inefficient workflow engine and middleware

E-Science Workflows: Performance
- Task scheduling: the existing solution
  - Map tasks (or data) to nodes so as to minimize the workflow's execution time (an old and hard problem)
- Scheduling is not sufficient
  - Resource state changes rapidly
  - Data increasingly dominates computation
  - Data movement delay is often not considered
- A simplistic approach to predicting bandwidth does not solve the problem
  - Prediction is often wrong, and too intrusive to be accurate
  - The workflow itself generates a huge amount of data flow, which invalidates the prediction

Problem in data movement
- Conventional practice of data movement
  - Use a high-performance tool (e.g., GridFTP)
  - Send (or receive) immediately, as fast as possible
- This is often unnecessary
  - A task does not consume data until the data from all of its predecessors has been received
- This is harmful!
  - A link's capacity is limited
  - Unnecessarily fast transfers slow down other (more urgent) transfers

Problem in data movement
- Outgoing edges
  - Why send data to T2 and T3 at the same speed?
  - Then, can we send it to T3 faster than to T2? (No, currently)
- Incoming edges
  - Does T6 need to receive data from T2 earlier than from T4 and T5?
  - Then, can we slow down the transfer from T2? (No, currently)
[Figure: workflow graph of tasks T1 to T6, with labels "1 min." and "2 min."]

Data Throttling
- Control the data movement delay of workflow edges
- Improving performance
  - Throttle up transfers to more urgent sections of the workflow
  - Throttle down transfers to less urgent sections
  - Compensate for imbalance in workflow execution
- Efficient bandwidth usage
  - Throttle down if early arrival is not necessary
  - Share limited bandwidth harmoniously with other Grid activities

Data Throttling
- Programmable interface: annotation on the workflow graph
- Enforcement mechanism: QoS-enabled GridFTP

Data Throttling: Interface
- Allow users to control data transfer delay
- Define a new workflow programming attribute: Relative Transfer Time Finished (RTTF)
  - RTTF defines the relative file transfer time within a group of incoming or outgoing edges
  - The user or the workflow engine annotates the workflow graph with RTTF values

Data Throttling: Interface
- RTTF attribute
  - Enforces the priority of file transfer delay among edges
  - Throttles down bandwidth if fast movement is not necessary
- Why relative delay as the interface?
  - Sufficient to express the priority relationship between edges
  - Simpler implementation (without end-to-end router QoS support)
[Figure: workflow graph of tasks T1 to T6, with edge groups annotated {2:1} and {1:1:1}]
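As an illustration of how an RTTF annotation could be turned into concrete rates, here is a minimal sketch. It is not the authors' implementation. It assumes that all edges in a group share one link of known capacity, that file sizes are known in advance, and that an RTTF of {2:1} means the first transfer should take twice as long as the second. The function name `rttf_to_bandwidth` and its parameters are hypothetical.

```python
def rttf_to_bandwidth(sizes, rttf, capacity):
    """Split `capacity` among edges so that transfer times follow the RTTF ratio.

    sizes    -- file size per edge (any unit, e.g. MB); assumed known
    rttf     -- relative transfer time per edge, e.g. [2, 1]
    capacity -- shared link capacity in size units per second (assumed fixed)
    """
    if len(sizes) != len(rttf):
        raise ValueError("sizes and rttf must have the same length")
    if capacity <= 0 or any(r <= 0 for r in rttf):
        raise ValueError("capacity and RTTF values must be positive")

    # Edge i should finish at time rttf[i] * t for some base time t.
    # Its bandwidth is then sizes[i] / (rttf[i] * t); requiring the
    # bandwidths to sum to the capacity fixes t.
    t = sum(s / r for s, r in zip(sizes, rttf)) / capacity
    return [s / (r * t) for s, r in zip(sizes, rttf)]


if __name__ == "__main__":
    # Two 100 MB files on a 30 MB/s link with RTTF {2:1} (made-up numbers).
    rates = rttf_to_bandwidth([100.0, 100.0], [2, 1], 30.0)
    for size, rate in zip([100.0, 100.0], rates):
        print(f"rate={rate:.1f} MB/s, transfer time={size / rate:.1f} s")
```

With these made-up numbers the first edge gets 10 MB/s and the second 20 MB/s, so the transfers take 10 s and 5 s, which matches the 2:1 ratio.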

Data Throttling: Implementation
- Application-level QoS via rate limiting in GridFTP
  - Congestion in today's networks often occurs at end hosts, not in the broad backbone (e.g., TeraGrid, 40 Gb/s)
  - So it is often enough to control the bandwidth consumed by popular services at the end hosts
- Simple implementation without special hardware support
- Sufficient to enforce a relative bandwidth guarantee

Data Throttling: Implementation
- Token bucket rate limiting
  - An algorithm used in routers
  - GridFTP transfers n bytes only if the bucket holds at least n tokens
  - Tokens are added to the bucket at rate R
- Implemented as XIO drivers in the Grid server (no modification to the GridFTP client)
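The token bucket idea can be sketched in a few lines of Python. This is a generic illustration only, not the XIO driver described on the slide. The class name `TokenBucket`, the `capacity` (burst size) parameter, and the injectable `clock` are assumptions made for the example.

```python
import time


class TokenBucket:
    """Generic token bucket: tokens accrue at `rate` per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens (bytes) added per second, R on the slide
        self.capacity = float(capacity)  # maximum burst size (an assumed parameter)
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock               # injectable so the logic can be tested
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, n):
        """Return True and remove n tokens if n bytes may be sent now."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def wait_time(self, n):
        """Seconds until n tokens are available (0.0 if they already are)."""
        self._refill()
        return max(0.0, (n - self.tokens) / self.rate)


if __name__ == "__main__":
    bucket = TokenBucket(rate=1_000_000, capacity=64_000)  # about 1 MB/s
    chunk = 64_000
    for _ in range(3):
        time.sleep(bucket.wait_time(chunk))  # block until the chunk is allowed
        if bucket.try_consume(chunk):
            print("sent", chunk, "bytes")
```

A sender that sleeps for `wait_time(n)` before each chunk is limited to roughly `rate` bytes per second over the long run, while `capacity` bounds how large a burst can be.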

QoS-enabled GridFTP
[Figure: architecture of the QoS-enabled GridFTP server. The RTB interface takes relative transfer bandwidths (G = {1.0, 1.5, 3.0, ...}) derived from the annotated workflow graph ({2:1}, {1:1:1}). The rate limiter, together with a compensator, assigns concrete bandwidths ({1.0, 2.0, 8.0, ...} mb/s) to the GridFTP processes.]

Evaluation: target applications
- Throttling improves performance because it compensates for imbalance in task execution
- But not all applications benefit
  - Computation >> communication: little room to improve
  - Computation << communication: the imbalance in computation is not significant
- Find the range of computation-communication ratio (CCR) in which throttling is worth applying
- Simulation using the GridSim package, which simulates a QoS network

Evaluation: target applications
- Run a simple but frequent workflow pattern with varying CCR
- Users can find the sections of a large workflow in which throttling is worth applying
[Figure: workflow pattern with nodes H, P1, P2, P3, ..., Pn, and T]
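To make the intuition behind this pattern concrete, the following sketch compares equal bandwidth sharing with throttled sharing. It is a simplified model, not the GridSim simulation used in the evaluation. It assumes that the P tasks receive their inputs over one shared link, that each task's bandwidth share stays fixed for the whole run, and that T starts once every P task has finished. The function names and all numbers are hypothetical.

```python
def makespan(sizes, compute, shares):
    """Time at which the slowest branch finishes: transfer time plus compute time."""
    return max(s / b + c for s, c, b in zip(sizes, compute, shares))


def balanced_shares(sizes, compute, capacity, iterations=100):
    """Fixed bandwidth shares that make every branch finish at the same time.

    Finds, by bisection, the finish time M with
    sum(size_i / (M - compute_i)) == capacity.
    """
    lo = max(compute)                 # M must exceed the longest compute time
    hi = lo + sum(sizes) / capacity   # at this M the required bandwidth fits
    for _ in range(iterations):
        mid = (lo + hi) / 2
        needed = sum(s / (mid - c) for s, c in zip(sizes, compute))
        if needed > capacity:
            lo = mid                  # too little time: needs more bandwidth than exists
        else:
            hi = mid
    return [s / (hi - c) for s, c in zip(sizes, compute)]


if __name__ == "__main__":
    sizes = [100.0, 100.0]    # MB per branch (made-up)
    compute = [10.0, 30.0]    # seconds of computation per branch (made-up)
    capacity = 20.0           # MB/s on the shared link (made-up)

    equal = [capacity / len(sizes)] * len(sizes)
    throttled = balanced_shares(sizes, compute, capacity)

    print("equal sharing:", makespan(sizes, compute, equal))
    print("throttled:    ", round(makespan(sizes, compute, throttled), 2))
```

In this toy case the branch with the longer computation receives more bandwidth, so both branches finish together and the join task T can start earlier than under equal sharing.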

Evaluation: target applications
[Figure: simulation results plotted against the degree of imbalance]
- Throttling is most effective when 0.5 <= CCR <= 2
- The improvement grows as task imbalance grows (especially when CCR < 1)

Evaluation: Montage workflow
- Montage
  - Image mosaic engine for astronomy research
  - Compute- and data-intensive application with inherently large parallelism
  - Integrated with Grid workflow systems (e.g., Pegasus)
  - Has a CCR in the range where throttling is worth applying
- Resources
  - UVA runs the data server
  - NCSA provides the compute nodes
  - Internet2 links UVA and NCSA

Evaluation: Montage workflow
[Figure: Montage workflow. Tasks: mimgtbl, mprojexec, moverlaps, mdiffexec, mbgmodel, mbgexec, madd. Data items: raw images, metadata about projected images, overlapping images, difference table, plane-fitting table, global corrections, corrected images, an image file.]

Evaluation: Montage workflow
- Throttling is used to control raw image delivery
- Some computations run slowly
[Figure: the Montage workflow as above, with the data server at UVA holding the raw images and the compute nodes at NCSA running the tasks]

Results: Montage workflow

2MASS object   Baseline (sec.)   Rate throttled (sec.)   Speedup (baseline / throttled)
M70            253.3             219.5                   1.15
M71            274.3             226.5                   1.21
M72            349.5             266.8                   1.31
M73            307.5             277.5                   1.11
M74            267.6             234.3                   1.14
M75            294.0             256.3                   1.15
M76            299.5             268.6                   1.12
M77            319.7             282.8                   1.13
M78            276.4             242.5                   1.14
M79            296.1             263.6                   1.12
Average        293.8             253.8                   1.16
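The speedup column is the baseline time divided by the throttled time. The short script below recomputes it from the timings in the table as a consistency check. The variable names are illustrative, and the average speedup is taken here as the ratio of the two average times.

```python
# Timings in seconds, copied from the results table: (baseline, rate throttled).
runs = {
    "M70": (253.3, 219.5), "M71": (274.3, 226.5), "M72": (349.5, 266.8),
    "M73": (307.5, 277.5), "M74": (267.6, 234.3), "M75": (294.0, 256.3),
    "M76": (299.5, 268.6), "M77": (319.7, 282.8), "M78": (276.4, 242.5),
    "M79": (296.1, 263.6),
}


def speedup(baseline, throttled):
    """Speedup as defined in the table header: baseline / throttled."""
    return baseline / throttled


per_object = {name: round(speedup(b, t), 2) for name, (b, t) in runs.items()}
avg_baseline = sum(b for b, _ in runs.values()) / len(runs)
avg_throttled = sum(t for _, t in runs.values()) / len(runs)
avg_speedup = round(speedup(avg_baseline, avg_throttled), 2)

if __name__ == "__main__":
    for name, value in per_object.items():
        print(name, value)
    print("Average", round(avg_baseline, 1), round(avg_throttled, 1), avg_speedup)
```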

Conclusions
- Current workflow systems lack control over data movement delay
- A relative bandwidth guarantee via QoS-enabled GridFTP can improve workflow performance and make bandwidth consumption more efficient

Questions? (sp2kn@virginia.edu)