San Diego Supercomputer Center, UCSD, U.S.A. The Consortium for Conservation Medicine, Wildlife Trust, U.S.A.

Similar documents
A High-Level Distributed Execution Framework for Scientific Workflows

Accelerating Parameter Sweep Workflows by Utilizing Ad-hoc Network Computing Resources: an Ecological Example

Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System

A High-Level Distributed Execution Framework for Scientific Workflows

Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego

Kepler and Grid Systems -- Early Efforts --

Sliding Window Calculations on Streaming Data using the Kepler Scientific Workflow System

Big Data Applications using Workflows for Data Parallel Computing

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System

Hands-on tutorial on usage the Kepler Scientific Workflow System

Integrated Machine Learning in the Kepler Scientific Workflow System

Using Web Services and Scientific Workflow for Species Distribution Prediction Modeling 1

Workflow Fault Tolerance for Kepler. Sven Köhler, Thimothy McPhillips, Sean Riddle, Daniel Zinn, Bertram Ludäscher

Heterogeneous Workflows in Scientific Workflow Systems

The Use of Cloud Computing Resources in an HPC Environment

Approaches to Distributed Execution of Scientific Workflows in Kepler

Context-Aware Actors. Outline

Parameter Space Exploration using Scientific Workflows

Scientific Workflows

biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data

Workflow Exchange and Archival: The KSW File and the Kepler Object Manager. Shawn Bowers (For Chad Berkley & Matt Jones)

Component-Based Design of Embedded Control Systems

Concurrent Component Patterns, Models of Computation, and Types

Grid Resources Search Engine based on Ontology

Challenges and Approaches for Distributed Workflow-Driven Analysis of Large-Scale Biological Data

Autonomic Activities in the Execution of Scientific Workflows: Evaluation of the AWARD Framework

Process-Based Software Components Final Mobies Presentation

Procedia Computer Science

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

Provenance for MapReduce-based Data-Intensive Workflows

Nimrod/K: Towards Massively Parallel Dynamic Grid Workflows

Autonomic Activities in the Execution of Scientific Workflows: Evaluation of the AWARD Framework

Large Scale Sky Computing Applications with Nimbus

ACET s e-research Activities

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

Simulation of Conditional Value-at-Risk via Parallelism on Graphic Processing Units

Kepler Scientific Workflow and Climate Modeling

Efficient Resource Management for Cloud Computing Environments

Advanced Supercomputing Hub for OMICS Knowledge in Agriculture. Help to Access Discovery Studio v- 4.1

7 th International Digital Curation Conference December 2011

CIS 1 Lecture Notes. Chapter 1 What is a computer?

Safe, distributed computation for reluctant data sharers: At home in the Microsoft Azure cloud

NVIDIA GRID A True PC Experience for Everyone Anywhere

escience in the Cloud Dan Fay Director Earth, Energy and Environment

Scheduling distributed applications can be challenging in a multi-cloud environment due to the lack of knowledge

Kepler/pPOD: Scientific Workflow and Provenance Support for Assembling the Tree of Life

KEPLER: Overview and Project Status

Graphics in the Cloud Will Wade, NVIDIA VGX Product Line Manager Ian Williams, Director of Applied Engineering

Robust Workflows for Science and Engineering

The Gigascale Silicon Research Center

Embedded Software from Concurrent Component Models

A Three Tier Architecture for LiDAR Interpolation and Analysis

System-Level Design Languages: Orthogonalizing the Issues

Virtual GPU 을활용한 VDI 구현엔비디아서완석.

A Dataflow-Oriented Atomicity and Provenance System for Pipelined Scientific Workflows

SSS: An Implementation of Key-value Store based MapReduce Framework. Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

HPC Architectures. Types of resource currently in use

Automatic Transformation from Geospatial Conceptual Workflow to Executable Workflow Using GRASS GIS Command Line Modules in Kepler *

Kepler: An Extensible System for Design and Execution of Scientific Workflows

2/12/11. Addendum (different syntax, similar ideas): XML, JSON, Motivation: Why Scientific Workflows? Scientific Workflows

The Distributed-SDF Domain

CUDA. Matthew Joyner, Jeremy Williams

Course Content. 07-Feb-17 Faculty of Computer Science & Engineering 1 BK TP.HCM

Sky Computing on FutureGrid and Grid 5000 with Nimbus. Pierre Riteau Université de Rennes 1, IRISA INRIA Rennes Bretagne Atlantique Rennes, France

Ivane Javakhishvili Tbilisi State University High Energy Physics Institute HEPI TSU

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

USB 2.0 DISPLAY ADAPTER INSTALLATION GUIDE ON MAC

FlashGrid Software Enables Converged and Hyper-Converged Appliances for Oracle* RAC

Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010

<Insert Picture Here> JavaFX 2.0

Data Management: the What, When and How

Provenance Collection Support in the Kepler Scientific Workflow System

POST-SIEVING ON GPUs

UB CCR's Industry Cluster

Hierarchical FSMs with Multiple CMs

SentinelOne Technical Brief

ISSN: Supporting Collaborative Tool of A New Scientific Workflow Composition

GV-600B, GV-650B, GV-800B

Forming an ad-hoc nearby storage, based on IKAROS and social networking services

Distributed Systems Principles and Paradigms

Giotto Domain. 5.1 Introduction. 5.2 Using Giotto. Edward Lee Christoph Kirsch

USB 3.0 DISPLAY ADAPTER INSTALLATION GUIDE ON MAC

Introduction to GPU hardware and to CUDA

Analysis and summary of stakeholder recommendations First Kepler/CORE Stakeholders Meeting, May 13-15, 2008

Introduction to High-Performance Computing

NVIDIA GRID APPLICATION SIZING FOR AUTODESK REVIT 2016

Geant4 on Azure using Docker containers

The 5th Pan-Galactic BOINC Workshop. Desktop Grid System

Operating Systems Fundamentals. What is an Operating System? Focus. Computer System Components. Chapter 1: Introduction

Development of new security infrastructure design principles for distributed computing systems based on open protocols

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems

MOHA: Many-Task Computing Framework on Hadoop

آنستیتیوت تکنالوجی معلوماتی و مخابراتی ICTI

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition.

Smart Flash Drive SecurePRO User Manual. Version_010

The Key Installation Guide

DECENTRALIZED ATTRIBUTE-BASED ENCRYPTION AND DATA SHARING SCHEME IN CLOUD STORAGE

Requirements and Design Overview

IDL DISCOVER WHAT S IN YOUR DATA

Transcription:

Accelerating Parameter Sweep Workflows by Utilizing i Ad-hoc Network Computing Resources: an Ecological Example Jianwu Wang 1, Ilkay Altintas 1, Parviez R. Hosseini 2, Derik Barseghian 2, Daniel Crawl 1, Chad Berkley 3, Matthew B. Jones 3 1 San Diego Supercomputer Center, UCSD, U.S.A. 2 The Consortium for Conservation Medicine, Wildlife Trust, U.S.A. 3 National Center for Ecological Analysis and Synthesis, UCSB, U.S.A.

Outline Introduction Theoretical Ecology Use Case Background Kepler Master-Slave Architecture Our Approach Distributed Composite Actor Provenance Collection Results Conclusion and Future Work A High-Level Distributed Execution Framework for Scientific Workflows 2

Introduction Many scientific computing problems have linear or greater time complexity based on parameter configuration ranges Domain scientists should be able to easily leverage distributed computing resources with little knowledge of the underlying techniques We will discuss a distributed execution framework, called Master-Slave Distribution, to distribute ds bue sub-workflows o s to ad-hoc network computing resources A High-Level Distributed Execution Framework for Scientific Workflows 3

Theoretical Ecology Use Case Parameter Sweep Simhofi Program R App. Result Display Parameter Preparation Computation Tasks Result Display Characteristics ti of the use case Parameter Sweep: independent multiple execution, i.e., embarrassingly parallel l problems Smooth Transition of Computation Environments Partial Workflow Distribution ib ti Provenance Collection A High-Level Distributed Execution Framework for Scientific Workflows 4

Background Kepler Actor-oriented Modeling All these actors inherit the same interfaces, such as prefire(), fire() and postfire() Model of Computation Synchronous Data Flow (SDF) director: actors execute sequentially Process Network (PN) director: each actor has its own execution thread and execute in parallel A High-Level Distributed Execution Framework for Scientific Workflows 5

Background Conceptual Architecture for Workflow Distributed Execution A High-Level Distributed Execution Framework for Scientific Workflows 6

Our Approach Distributed Composite Actor As the role of Master, each token received by this Actor is distributed to a Slave node, executed, and the results returned. Different behavior with different computation models SDF Domain Master PN Domain Master Composite Actor Distributed Composite Actor Composite Actor Composite Actor Distributed Composite Actor Composite Actor 1 2 1 3 Slave 1 Slave 1 3 4 Slave 2 2 4 Slave 2 A High-Level Distributed Execution Framework for Scientific Workflows 7

Our Approach Provenance Collection By collecting workflow structure and executions, our provenance framework make it easier for users to track data files for large parameter sweeps It can be configured to support centralized or decentralized provenance information recording A High-Level Distributed Execution Framework for Scientific Workflows 8

Results Workflow A High-Level Distributed Execution Framework for Scientific Workflows 9

Results Usability Users use the DistributedCompositeActor just like the common composite actor Interaction for execution environment transition A High-Level Distributed Execution Framework for Scientific Workflows 10

Results Experiment Parameters b=<0.1, 0.2, 0.3, 0.4>, d=<2.5, 3.5>, X=3, Y=3, E=4 b=<0.1, 0.2, 0.3, 0.4>, d=<2.5, 3.5>, X=8, Y=8, E=30 b=<0.1, 0.2, 0.3, 0.4>, d=<2.5, 3.5>, X=16, Y=16, E=30 Execution Time (minutes) SDF locally PN locally SDF Master-Slave PN Master-Slave 0.39 0.35 0.60 0.52 32.21 32.24 19.05 9.38 502.2 510 309 147 Testbed Constitution OS Memory CPU Notebook Window XP 2 GB 2.00 GHz Duo Core Desktop Mac OS X 2 GB 2.80 GHz Duo Core A High-Level Distributed Execution Framework for Scientific Workflows 11

Conclusion and Future Work A distributed execution framework in the Kepler Distribute sub-workflows to ad-hoc network computing resources Applicable to parameter sweep applications to realize parallel independent execution Future Work Generalize for Cluster, Grid, and Cloud platforms. Categorize different distributed approaches in Kepler to match different requirements A High-Level Distributed Execution Framework for Scientific Workflows 12

Thanks! For More Information: Distributed Execution Interest Group of Kepler: https://dev.kepler-project.org/developers/interestgroups/distributed ib t d Contact: jianwu@sdsc.edu A High-Level Distributed Execution Framework for Scientific Workflows 13