Meeting the Challenges of Managing Large-Scale Scientific Workflows in Distributed Environments
1 Meeting the Challenges of Managing Large-Scale Scientific Workflows in Distributed Environments
Ewa Deelman, Yolanda Gil, USC Information Sciences Institute
2 Scientific Workflows
- Current workflow approaches are exploring specific aspects of the problem: creation, reuse, provenance, performance, reliability
- New requirements are emerging: streaming data, from batch to interactive steering, event-driven analysis, collaborative design of workflows
- Need to develop a science of workflows
- A more comprehensive treatment of the workflow lifecycle
- Understand current and long-term requirements from science applications
- Reproducibility
- Workflows as first-class citizens in CyberInfrastructure
3 Workflow Lifecycle
[Figure: workflow lifecycle. Stages: Workflow Template (drawing on Component Libraries and User Experiences) -> populate with data (Data, Metadata Catalogs) -> Workflow Instance -> map to available resources (Planning; Resource, Application Component Descriptions) -> Executable Workflow -> execute (Scheduling/Execution; Compute, Storage and Network Resources). Outputs: Data Products; Data, Metadata, Provenance Information. Feedback step: Adapt, Modify.]
4 Outline
- Rendering the workflow lifecycle: Wings/Pegasus/DAGMan
- Challenges across the various aspects of workflow management: user experiences, planning/mapping, execution
- Workflows: what are they good for?
- Research issues
- Conclusions
5 Workflow Lifecycle
[Figure: the workflow lifecycle diagram annotated with tools. WINGS: adapting and modifying the Workflow Template and populating it with data to produce the Workflow Instance. Pegasus: mapping to available resources to produce the Executable Workflow. DAGMan: execution on compute, storage and network resources.]
6 Workflow Entities
[Figure: a Workflow Template (Decimate followed by FFT), a Workflow Instance, and the corresponding Executable Workflow. The executable workflow repeats the following steps for files 1, 2, ..., 1000:]
- GridFTP f1 S1 -> R1
- Decimate(f1) at R1
- GridFTP g1 R1 -> R2
- FFT(g1) at R2
- GridFTP h1 R2 -> S1
- Register h1 in RLS
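The expansion on this slide, from a two-step template to thousands of concrete steps, can be sketched in a few lines of Python. This is an illustration only, not how Wings or Pegasus are implemented: the function name `expand` and the tuple representation of a step are assumptions made for the example, while the file names (f, g, h), the sites (S1, R1, R2) and the step order follow the slide.

```python
def expand(num_files, source="S1", site_a="R1", site_b="R2"):
    """Expand the Decimate -> FFT template into concrete steps, one group per input file."""
    steps = []
    for i in range(1, num_files + 1):
        f, g, h = f"f{i}", f"g{i}", f"h{i}"
        steps += [
            ("GridFTP", f, source, site_a),   # stage the input file to the first site
            ("Decimate", f, site_a),          # Decimate(f) runs at site_a and yields g
            ("GridFTP", g, site_a, site_b),   # move the intermediate file
            ("FFT", g, site_b),               # FFT(g) runs at site_b and yields h
            ("GridFTP", h, site_b, source),   # stage the result back
            ("Register", h, "RLS"),           # record the new product in the catalog
        ]
    return steps


executable = expand(1000)
print(len(executable))  # 6000: six steps for each of the 1000 files
```

Two template nodes thus become six executable steps per file, which is why the executable workflow is so much larger than the template the user sees.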
7 WINGS/Pegasus: Workflow Instance Generation and Selection
- Using semantic technologies for workflow generation
- Workflow selection (WINGS), from workflow libraries: templates specify complex analysis sequences; instances specify data
- User request: "Show me workflows that generate hazard maps"
- Workflow creation. Expert scientist: "Validate this workflow based on the component specs"
- Data selection. Scientist: "Run that with the USGS data set"
- Ontologies (OWL): domain terms, component types, workflow products
- Data repositories: preexisting data collections, workflow execution results
- Component specification: specifies data requirements, specifies execution requirements
- Scientist researching new models: "Here is a new wave propagation model, takes in a series of fault ruptures, is compiled for MPI"
- Flow: Workflow Template -> Workflow Instance -> Pegasus -> Executable Workflow -> DAGMan/Globus, using Application Components
Reference: Wings for Pegasus: A Semantic Approach to Creating Very Large Scientific Workflows, Yolanda Gil, Varun Ratnakar, Ewa Deelman, Marc Spraragen, and Jihie Kim, OWL: Experiences and Directions 2006
8 Pegasus: Planning for Execution in Grids
- Maps from workflow instance to executable workflow
- Automatically locates physical locations for both workflow components and data
- Finds appropriate resources to execute the components
- Augments the workflow with data staging and registration
- Reuses existing data products where applicable
- Publishes newly derived data products
[Figure: the Workflow Instance node "FFT filea" is mapped to an Executable Workflow with three steps: Data Transfer (move filea from host1://home/filea to host2://home/file1), /usr/local/bin/fft /home/file1, and Data Registration.]
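The mapping steps listed on this slide (reuse, staging, execution, registration) can be sketched as a small planning function. This is a simplified illustration under stated assumptions, not the Pegasus planner: the function `plan`, the job tuples and the `replicas` dictionary (standing in for a replica catalog) are all hypothetical, jobs are assumed to be listed in dependency order, and every input is assumed to be either produced by an earlier job or present in `replicas`.

```python
def plan(jobs, replicas, site):
    """Map an abstract job list to executable steps at a single site.

    jobs:     list of (name, inputs, outputs), in dependency order (assumption).
    replicas: dict mapping a file name to a location where a copy already exists.
    """
    executable = []
    available = set()  # files already present at `site`
    for name, inputs, outputs in jobs:
        # Reuse: if every product of this job already exists somewhere, skip the job.
        if all(out in replicas for out in outputs):
            continue
        # Data staging: bring in inputs that are not yet at the execution site.
        for inp in inputs:
            if inp not in available:
                executable.append(("transfer", inp, replicas[inp], site))
                available.add(inp)
        executable.append(("run", name, site))
        # Registration: publish each newly derived product.
        for out in outputs:
            available.add(out)
            executable.append(("register", out, site))
    return executable


jobs = [("decimate", ["f1"], ["g1"]), ("fft", ["g1"], ["h1"])]

# Nothing but the raw input exists: both jobs run.
full = plan(jobs, {"f1": "host1"}, "host2")

# g1 already exists at host1: decimate is skipped and g1 is staged instead.
reduced = plan(jobs, {"f1": "host1", "g1": "host1"}, "host2")
```

In the second call the plan shrinks from five steps to three, which is the effect of reusing existing data products.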
9 Condor DAGMan (University of Wisconsin)
- Follows dependencies in the workflow
- Releases nodes to execution (to the Condor queue)
- Provides retry capabilities
[Figure: workflow nodes marked with the states waiting, executing, and done OK.]
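The behavior described on this slide (follow dependencies, release ready nodes, retry failures) can be illustrated with a toy executor. This is a sketch under simplifying assumptions and not DAGMan itself: it runs jobs sequentially in one process instead of submitting them to a Condor queue, and the names `run_dag`, `parents` and `max_retries` are hypothetical.

```python
def run_dag(jobs, parents, max_retries=2):
    """Run jobs in dependency order, retrying failures.

    jobs:    dict mapping a node name to a callable that returns True on success.
    parents: dict mapping a node name to the list of nodes it depends on.
    """
    state = {name: "waiting" for name in jobs}
    attempts = {name: 0 for name in jobs}
    progress = True
    while progress:
        progress = False
        for name, job in jobs.items():
            ready = all(state[p] == "done" for p in parents.get(name, []))
            if state[name] == "waiting" and ready:
                attempts[name] += 1
                if job():
                    state[name] = "done"
                elif attempts[name] > max_retries:
                    state[name] = "failed"  # retries exhausted; children stay waiting
                progress = True
    return state


calls = {"n": 0}

def flaky():
    """Fails on the first attempt and succeeds on the second."""
    calls["n"] += 1
    return calls["n"] >= 2

result = run_dag(
    {"a": lambda: True, "b": flaky, "c": lambda: True},
    {"b": ["a"], "c": ["b"]},
)
```

Node `c` is only released once `b` has succeeded, and a node that keeps failing is marked failed after its initial attempt plus `max_retries` retries, leaving its descendants in the waiting state.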
10 Challenges in user experiences
- Users' expectations vary greatly: from high-level descriptions to detailed plans that include specific resources
- Users' interactions can be exploratory: modifying portions of the workflow as the computation progresses
- Users need progress and failure information at the right level of detail
11 Portals, Providing high-level Interfaces
[Figure: architecture of the Montage portal. Components: User Portal (Region Name, Degrees); Abstract Workflow Service (mdagfiles); 2MASS Image List Service (m2masslist); Grid Scheduling and Execution Service (mgridexec) with Pegasus, Concrete Workflow and Condor DAGMan; User Notification Service (mnotify); Computational Grid (TeraGrid clusters, ISI Condor pool). Sites shown: JPL, ISI, IPAC, SDSC, NCSA. Image: Galactic Star Formation Region RCW 49.]
Reference: Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking, J. C. Jacob, D. S. Katz, G. B. Berriman, J. Good, A. C. Laity, E. Deelman, C. Kesselman, G. Singh, M.-H. Su, T. A. Prince, R. Williams, IJCSE, to appear 2006
12 Portals, Providing high-level Interfaces
- TG Science Gateway, Washington University
- EarthWorks Project (SCEC), led by J. Muench with P. Maechling, H. Francoeur, and others
Reference: SCEC Earthworks: Community Access to Wave Propagation Simulations, J. Muench, H. Francoeur, D. Okaya, Y. Cui, P. Maechling, E. Deelman, G. Mehta, T. Jordan, TG 2006
13 SCEC CyberShake, not a one-shot workflow
- Needs to run before the rest of the workflow is instantiated
14 Iterative workflow instantiation, mapping and execution
[Figure: CyberShake workflow processed in stages by Wings, Pegasus, DAGMan and Globus. Node labels include PreSGTFD, CVMFD, XYZGRD, CVM, PreSGT, pmvlchk, oxnamecheck and SGT. Wings-generated data files include 127_6.txt.variation. Later stages include seismogram generation and PeakValCalc jobs, with results Seismograms_PAS_127_6.grm, Seismograms_PAS_127_7.grm, Seismograms_PAS_151_11.grm and PeakVals_allPAS_127_6.bsa, PeakVals_allPAS_127_7.bsa, PeakVals_allPAS_151_11.bsa.]
Reference: Wings for Pegasus: A Semantic Approach to Creating Very Large Scientific Workflows, Yolanda Gil, Varun Ratnakar, Ewa Deelman, Marc Spraragen, and Jihie Kim, in submission
15 Some challenges in workflow mapping
- Automated management of data: through workflow modification
- Efficient mapping of the workflow instances to resources: performance, data space optimizations
- Fault tolerance (involves interfacing with the workflow execution system): recovery by replanning
- Providing feedback to the user: feasibility, time estimates
16 Execution Environment
[Figure: execution environment. Labels: Pegasus and DAGMan on the local submit host with a Condor queue; Condor-G and Condor-C; remote resources fronted by GRAM with PBS, LSF or Condor schedulers; storage accessed through GridFTP and HTTP.]
Ewa Deelman, deelman@isi.edu, pegasus.isi.edu
17 Node clustering
- Level-based clustering
- Vertical clustering
- Arbitrary clustering
- Useful for small granularity jobs
[Figure: an example workflow with nodes labeled A, C and D, shown under each clustering scheme, with groups cluster_1 and cluster_2.]
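Level-based clustering, the first scheme named on this slide, can be sketched as follows. This is a minimal illustration, not the Pegasus implementation: nodes are grouped by their depth in the workflow graph, and each level is split round-robin into a fixed number of clusters. The function names and the `clusters_per_level` parameter are assumptions made for the example.

```python
from collections import defaultdict


def levels(parents):
    """Return a dict node -> level, where nodes without parents are level 0.

    parents: dict mapping every node to the list of its parent nodes.
    """
    memo = {}

    def level(node):
        if node not in memo:
            memo[node] = 1 + max((level(p) for p in parents[node]), default=-1)
        return memo[node]

    for node in parents:
        level(node)
    return memo


def cluster_by_level(parents, clusters_per_level):
    """Group the nodes of each level into at most `clusters_per_level` clusters."""
    by_level = defaultdict(list)
    for node, lvl in levels(parents).items():
        by_level[lvl].append(node)
    clustered = {}
    for lvl, nodes in sorted(by_level.items()):
        k = min(clusters_per_level, len(nodes))
        clustered[lvl] = [nodes[i::k] for i in range(k)]  # round-robin split
    return clustered


workflow = {
    "A": [],
    "C1": ["A"], "C2": ["A"], "C3": ["A"], "C4": ["A"],
    "D": ["C1", "C2", "C3", "C4"],
}
clusters = cluster_by_level(workflow, clusters_per_level=2)
```

Each cluster is then submitted as a single job, so four short C jobs cost two scheduling overheads instead of four. Setting `clusters_per_level` to the number of processors matches the "same number of clusters as processors" configuration mentioned for Montage.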
18 Montage application
- ~7,000 compute jobs in the workflow instance
- ~10,000 nodes in the executable workflow
- Same number of clusters as processors
- Speedup of ~15 on 32 processors
[Figure caption: Small 1,200 Montage workflow]
19 Efficient data handling
- Input data is staged dynamically; new data products are generated during execution
- Large workflows involve 10,000+ files, with a similar order of intermediate and output files
- When the total space occupied is far greater than the available space, failures occur
- Solution: determine which data is no longer needed and when; add nodes to the workflow to clean up data along the way
- Issues:
- Minimize the number of nodes and dependencies added so as not to slow down workflow execution
- Deal with portions of workflows scheduled to multiple sites
- Deal with files on partition boundaries
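The cleanup idea on this slide (find the point after which a file is no longer needed and add a node that deletes it) can be sketched for the simplest case. This is an illustrative sketch only: it assumes a linear job order at a single site and adds one cleanup node per file, so it does not address the issues listed above (minimizing added nodes, multiple sites, partition boundaries). The function name `add_cleanup` and the `keep` parameter are hypothetical.

```python
def add_cleanup(jobs, keep=()):
    """Insert cleanup nodes after the last job that touches each file.

    jobs: list of (name, inputs, outputs) in execution order (assumption).
    keep: files that must survive, for example final data products.
    """
    # Record the index of the last job that reads or writes each file.
    last_use = {}
    for index, (_, inputs, outputs) in enumerate(jobs):
        for name in list(inputs) + list(outputs):
            last_use[name] = index

    result = []
    for index, job in enumerate(jobs):
        result.append(job)
        for name, last in last_use.items():
            if last == index and name not in keep:
                result.append(("cleanup", [name], []))
    return result


jobs = [("decimate", ["f1"], ["g1"]), ("fft", ["g1"], ["h1"])]
with_cleanup = add_cleanup(jobs, keep={"h1"})
```

Here `f1` is removed as soon as `decimate` finishes and `g1` as soon as `fft` finishes, while the final product `h1` is kept. In a real DAG the cleanup node would instead depend on every job that consumes the file.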
20 Challenges in Execution
- Provide fault tolerance: mask errors, interact with the workflow planner
- Support resource provisioning
- Provide monitoring information
- Provide execution-level provenance
- Support debugging
- Provide workflow traces for easy replay
21 Southern California Earthquake Center (SCEC) workflows on the TeraGrid
[Figure: SCEC workflow execution on the TeraGrid. Pegasus: map the abstract workflow onto the Grid resources, using Resource Descriptions. Condor Glide-ins: provision the resources (TeraGrid, "nice TeraGrid folks"). Condor DAGMan and Globus: run the executable workflow's tasks on the Grid resources. VDS Provenance Tracking Catalog: record information about the workflow (Task Info). Result: Hazard Map.]
Joint work with: R. Graves, T. Jordan, C. Kesselman, P. Maechling, D. Okaya & others
22 SCEC on the TeraGrid, Fall
- See Gurmeet Singh et al., Application-level Resource Provisioning, Wednesday, M15, 14:30-16:00 session
- Number of jobs per day (23 days): 261,823 jobs total
- Number of CPU hours per day: 15,706 hours total (1.8 years)
- 10 TB of data
[Figure: jobs and CPU hours per day, 10/19 through 11/10.]
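The totals on this slide can be turned into a few derived figures with simple arithmetic. The sketch below uses only the numbers quoted on the slide; the derived averages are not stated in the talk and assume a 365-day year and an even spread over the 23 days.

```python
total_jobs = 261_823       # jobs over the whole run (from the slide)
total_cpu_hours = 15_706   # CPU hours over the whole run (from the slide)
days = 23                  # length of the run (from the slide)

jobs_per_day = total_jobs / days
cpu_years = total_cpu_hours / (24 * 365)
minutes_per_job = total_cpu_hours * 60 / total_jobs

print(f"{jobs_per_day:.0f} jobs per day on average")
print(f"{cpu_years:.1f} CPU years in total")
print(f"{minutes_per_job:.1f} CPU minutes per job on average")
```

The CPU-year figure reproduces the "1.8 years" on the slide. The average job length of a few minutes is consistent with the earlier remark that clustering is useful for small granularity jobs.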
23 Benefits of Scientific Workflows (from the point of view of an application scientist)
- Conducts a series of computational tasks. Resources distributed across Internet.
- Chaining (outputs become inputs) replaces manual hand-offs. Accelerated creation of products.
- Ease of use: gives non-developers access to sophisticated codes. Avoids need to download-install-learn how to use someone else's code.
- Provides framework to host or assemble community set of applications. Honors original codes. Allows for heterogeneous coding styles.
- Framework to define common formats or standards when useful. Promotes exchange of data, products, codes. Community metadata.
- Multi-disciplinary workflows can promote even broader collaborations. E.g., ground motions fed into simulation of building shaking.
- Certain rules or guidelines make it easier to add a code into a workflow.
Slide courtesy of David Okaya, SCEC, USC
24 Workflows for education and sharing
- Application specialists design individual application components
- Domain experts compose workflows using application components: set correct parameters for components, pick appropriate data sets
- Students run sophisticated workflows on training data sets
- Young researchers run sophisticated workflows on data sets of interest to them
- Scientists share workflows across collaborations to validate a hypothesis
- Need to develop tools, workflow libraries, component libraries
25 Current and Future Research
- Resource selection
- Resource provisioning
- Workflow restructuring
- Adaptive computing: workflow refinement adapts to changing execution environment
- Workflow provenance (including provenance of the mapping process); new collaboration with Luc Moreau
- Management and optimization across multiple workflows
- Workflow debugging
- Streaming data workflows
- Automated guidance for workflow restructuring
- Support for long-lived and recurrent workflows
26 General Conclusions
- Workflows are recipes for CyberInfrastructure
- Need to support the dynamic nature of science
- Support for long-lived and recurrent workflows
- Many challenges and many workflow tools out there
- Interoperability is desired
- Need common representations that can be used by various workflow management systems. Maybe semantic technologies?
- Need common provenance tracking capabilities. See IPAW 06, and the Provenance Challenge
- To make forward progress, collaboration with application scientists is essential, and collaboration between workflow system designers is essential
27 Scientific Workflows: a very active area
- Many workshops
- Special issues of SIGMOD 2005, JOGC 2005, SciProg 2006 (to appear)
- Book on e-Science Workflows (Taylor, Deelman, Gannon, Shields eds.), to appear 2006
- Bill Gates' SC 2005 Keynote
- NSF Workshop on the Challenges of Scientific Workflows (co-chaired with Yolanda Gil), May 2006
28 Acknowledgments
- Pegasus is being developed at ISI by Gaurang Mehta, Mei-Hui Su, and Karan Vahi
- Wings is led by Yolanda Gil, Jihie Kim, Varun Ratnakar
- DAGMan is led by Miron Livny
- Many application scientists made the workflows happen (GriPhyN, NVO, LIGO, Telescience, SCEC)
Part IV. Workflow Mapping and Execution in Pegasus. (Thanks to Ewa Deelman)
AAAI-08 Tutorial on Computational Workflows for Large-Scale Artificial Intelligence Research Part IV Workflow Mapping and Execution in Pegasus (Thanks to Ewa Deelman) 1 Pegasus-Workflow Management System
More informationManaging Large-Scale Scientific Workflows in Distributed Environments: Experiences and Challenges
Managing Large-Scale Scientific s in Distributed Environments: Experiences and Challenges Ewa Deelman, Yolanda Gil USC Information Sciences Institute, Marina Del Rey, CA 90292, deelman@isi.edu, gil@isi.edu
More informationClouds: An Opportunity for Scientific Applications?
Clouds: An Opportunity for Scientific Applications? Ewa Deelman USC Information Sciences Institute Acknowledgements Yang-Suk Ki (former PostDoc, USC) Gurmeet Singh (former Ph.D. student, USC) Gideon Juve
More informationPlanning the SCEC Pathways: Pegasus at work on the Grid
Planning the SCEC Pathways: Pegasus at work on the Grid Philip Maechling, Vipin Gupta, Thomas H. Jordan Southern California Earthquake Center Ewa Deelman, Yolanda Gil, Sridhar Gullapalli, Carl Kesselman,
More informationProvenance Trails in the Wings/Pegasus System
To appear in the Journal of Concurrency And Computation: Practice And Experience, 2007 Provenance Trails in the Wings/Pegasus System Jihie Kim, Ewa Deelman, Yolanda Gil, Gaurang Mehta, Varun Ratnakar Information
More informationManaging large-scale workflows with Pegasus
Funded by the National Science Foundation under the OCI SDCI program, grant #0722019 Managing large-scale workflows with Pegasus Karan Vahi ( vahi@isi.edu) Collaborative Computing Group USC Information
More informationPegasus Workflow Management System. Gideon Juve. USC Informa3on Sciences Ins3tute
Pegasus Workflow Management System Gideon Juve USC Informa3on Sciences Ins3tute Scientific Workflows Orchestrate complex, multi-stage scientific computations Often expressed as directed acyclic graphs
More informationSemantic Metadata Generation for Large Scientific Workflows
Semantic Metadata Generation for Large Scientific Workflows Jihie Kim 1, Yolanda Gil 1, and Varun Ratnakar 1, 1 Information Sciences Institute, University of Southern California 4676 Admiralty Way, Marina
More informationMapping Semantic Workflows to Alternative Workflow Execution Engines
Proceedings of the Seventh IEEE International Conference on Semantic Computing (ICSC), Irvine, CA, 2013. Mapping Semantic Workflows to Alternative Workflow Execution Engines Yolanda Gil Information Sciences
More informationA Comparison of Two Methods for Building Astronomical Image Mosaics on a Grid
A Comparison of Two Methods for Building Astronomical Image Mosaics on a Grid Daniel S. Katz, Joseph C. Jacob Jet Propulsion Laboratory California Institute of Technology Pasadena, CA 91109 Daniel.S.Katz@jpl.nasa.gov
More informationA Cleanup Algorithm for Implementing Storage Constraints in Scientific Workflow Executions
2014 9th Workshop on Workflows in Support of Large-Scale Science A Cleanup Algorithm for Implementing Storage Constraints in Scientific Workflow Executions Sudarshan Srinivasan Department of Computer Science
More informationMapping Abstract Complex Workflows onto Grid Environments
Mapping Abstract Complex Workflows onto Grid Environments Ewa Deelman, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi Information Sciences Institute University of Southern California
More informationWings for Pegasus: Creating Large-Scale Scientific Applications Using Semantic Representations of Computational Workflows
Proceedings of the Nineteenth Conference on Innovative Applications of Artificial Intelligence (IAAI-07), July 22 26, 2007, Vancouver, British Columbia, Canada. Wings for Pegasus: Creating Large-Scale
More informationIntegrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example
Integrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example Nandita Mandal, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, Karan Vahi USC Information Sciences Institute Marina Del Rey, CA 90292
More informationPegasus WMS Automated Data Management in Shared and Nonshared Environments
Pegasus WMS Automated Data Management in Shared and Nonshared Environments Mats Rynge USC Information Sciences Institute Pegasus Workflow Management System NSF funded project and developed
More informationCorral: A Glide-in Based Service for Resource Provisioning
: A Glide-in Based Service for Resource Provisioning Gideon Juve USC Information Sciences Institute juve@usc.edu Outline Throughput Applications Grid Computing Multi-level scheduling and Glideins Example:
More informationLIGO Virtual Data. Realizing. UWM: Bruce Allen, Scott Koranda. Caltech: Kent Blackburn, Phil Ehrens, Albert. Lazzarini, Roy Williams
Realizing LIGO Virtual Data Caltech: Kent Blackburn, Phil Ehrens, Albert Lazzarini, Roy Williams ISI: Ewa Deelman, Carl Kesselman, Gaurang Mehta, Leila Meshkat, Laura Pearlman UWM: Bruce Allen, Scott Koranda
More informationKickstarting Remote Applications
Kickstarting Remote Applications Jens-S. Vöckler 1 Gaurang Mehta 1 Yong Zhao 2 Ewa Deelman 1 Mike Wilde 3 1 University of Southern California Information Sciences Institute 4676 Admiralty Way Ste 1001
More informationAutomating Real-time Seismic Analysis
Automating Real-time Seismic Analysis Through Streaming and High Throughput Workflows Rafael Ferreira da Silva, Ph.D. http://pegasus.isi.edu Do we need seismic analysis? Pegasus http://pegasus.isi.edu
More informationPegasus. Automate, recover, and debug scientific computations. Mats Rynge https://pegasus.isi.edu
Pegasus Automate, recover, and debug scientific computations. Mats Rynge rynge@isi.edu https://pegasus.isi.edu Why Pegasus? Automates complex, multi-stage processing pipelines Automate Enables parallel,
More informationOn the Use of Cloud Computing for Scientific Workflows
Fourth IEEE International Conference on escience On the Use of Cloud Computing for Scientific Workflows Christina Hoffa 1, Gaurang Mehta 2, Timothy Freeman 3, Ewa Deelman 2, Kate Keahey 3, Bruce Berriman
More informationThe Problem of Grid Scheduling
Grid Scheduling The Problem of Grid Scheduling Decentralised ownership No one controls the grid Heterogeneous composition Difficult to guarantee execution environments Dynamic availability of resources
More informationEnabling Large-scale Scientific Workflows on Petascale Resources Using MPI Master/Worker
Enabling Large-scale Scientific Workflows on Petascale Resources Using MPI Master/Worker Mats Rynge 1 rynge@isi.edu Gideon Juve 1 gideon@isi.edu Karan Vahi 1 vahi@isi.edu Scott Callaghan 2 scottcal@usc.edu
More informationA Cloud-based Dynamic Workflow for Mass Spectrometry Data Analysis
A Cloud-based Dynamic Workflow for Mass Spectrometry Data Analysis Ashish Nagavaram, Gagan Agrawal, Michael A. Freitas, Kelly H. Telu The Ohio State University Gaurang Mehta, Rajiv. G. Mayani, Ewa Deelman
More informationPegasus. Automate, recover, and debug scientific computations. Rafael Ferreira da Silva.
Pegasus Automate, recover, and debug scientific computations. Rafael Ferreira da Silva http://pegasus.isi.edu Experiment Timeline Scientific Problem Earth Science, Astronomy, Neuroinformatics, Bioinformatics,
More informationIntegration of Workflow Partitioning and Resource Provisioning
Integration of Workflow Partitioning and Resource Provisioning Weiwei Chen Information Sciences Institute University of Southern California Marina del Rey, CA, USA wchen@isi.edu Ewa Deelman Information
More informationOn the Use of Cloud Computing for Scientific Workflows
On the Use of Cloud Computing for Scientific Workflows Christina Hoffa 1, Gaurang Mehta 2, Timothy Freeman 3, Ewa Deelman 2, Kate Keahey 3, Bruce Berriman 4, John Good 4 1 Indiana University, 2 University
More informationWINGS: Intelligent Workflow- Based Design of Computational Experiments
IEEE Intelligent Systems, Vol. 26, No. 1, 2011. WINGS: Intelligent Workflow- Based Design of Computational Experiments Yolanda Gil, Varun Ratnakar, Jihie Kim, Pedro Antonio González-Calero, Paul Groth,
More informationPart III. Computational Workflows in Wings/Pegasus
AAAI-08 Tutorial on Computational Workflows for Large-Scale Artificial Intelligence Research Part III Computational Workflows in Wings/Pegasus 1 Our Approach Express analysis as distributed workflows Data
More informationIntroduction to Grid Computing
Milestone 2 Include the names of the papers You only have a page be selective about what you include Be specific; summarize the authors contributions, not just what the paper is about. You might be able
More informationScheduling Data-Intensive Workflows onto Storage-Constrained Distributed Resources
Scheduling Data-Intensive Workflows onto Storage-Constrained Distributed Resources Arun Ramakrishnan 1, Gurmeet Singh 2, Henan Zhao 3, Ewa Deelman 2, Rizos Sakellariou 3, Karan Vahi 2 Kent Blackburn 4,
More information10 years of CyberShake: Where are we now and where are we going
10 years of CyberShake: Where are we now and where are we going with physics based PSHA? Scott Callaghan, Robert W. Graves, Kim B. Olsen, Yifeng Cui, Kevin R. Milner, Christine A. Goulet, Philip J. Maechling,
More informationChapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.
Chapter 4:- Introduction to Grid and its Evolution Prepared By:- Assistant Professor SVBIT. Overview Background: What is the Grid? Related technologies Grid applications Communities Grid Tools Case Studies
More informationData Placement for Scientific Applications in Distributed Environments
Data Placement for Scientific Applications in Distributed Environments Ann Chervenak, Ewa Deelman, Miron Livny 2, Mei-Hui Su, Rob Schuler, Shishir Bharathi, Gaurang Mehta, Karan Vahi USC Information Sciences
More informationTransparent Grid Computing: a Knowledge-Based Approach
Transparent Grid Computing: a Knowledge-Based Approach Jim Blythe, Ewa Deelman, Yolanda Gil, Carl Kesselman USC Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina Del Rey, CA 90292 {blythe,
More informationIntroduction to GT3. Introduction to GT3. What is a Grid? A Story of Evolution. The Globus Project
Introduction to GT3 The Globus Project Argonne National Laboratory USC Information Sciences Institute Copyright (C) 2003 University of Chicago and The University of Southern California. All Rights Reserved.
More informationPegasus. Automate, recover, and debug scientific computations. Mats Rynge
Pegasus Automate, recover, and debug scientific computations. Mats Rynge rynge@isi.edu https://pegasus.isi.edu Why Pegasus? Automates complex, multi-stage processing pipelines Automate Enables parallel,
More informationAn Intelligent Assistant for Interactive Workflow Composition
In Proceedings of the International Conference on Intelligent User Interfaces (IUI-2004); Madeira, Portugal, 2004 An Intelligent Assistant for Interactive Workflow Composition Jihie Kim, Marc Spraragen
More informationOptimizing workflow data footprint
Scientific Programming 15 (2007) 249 268 249 IOS Press Optimizing workflow data footprint Gurmeet Singh a, Karan Vahi a, Arun Ramakrishnan a, Gaurang Mehta a, Ewa Deelman a, Henan Zhao b, Rizos Sakellariou
More informationScientific Workflow Scheduling with Provenance Support in Multisite Cloud
Scientific Workflow Scheduling with Provenance Support in Multisite Cloud Ji Liu 1, Esther Pacitti 1, Patrick Valduriez 1, and Marta Mattoso 2 1 Inria, Microsoft-Inria Joint Centre, LIRMM and University
More information7 th International Digital Curation Conference December 2011
Golden Trail 1 Golden-Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository Practice Paper Paolo Missier, Newcastle University, UK Bertram Ludäscher, Saumen Dey, Michael
More information{deelman,
USING CLOUDS FOR SCIENCE, IS IT JUST KICKING THE CAN DOWN THE ROAD? Ewa Deelman 1, Gideon Juve 1 and G. Bruce Berriman 2 1 Information Sciences Institute, University of Southern California, Marina del
More informationPlanning and Metadata on the Computational Grid
Planning and Metadata on the Computational Grid Jim Blythe, Ewa Deelman, Yolanda Gil USC Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina Del Rey, CA 90292 {blythe, deelman, gil}@isi.edu
More informationPegasus. Pegasus Workflow Management System. Mats Rynge
Pegasus Pegasus Workflow Management System Mats Rynge rynge@isi.edu https://pegasus.isi.edu Automate Why workflows? Recover Automates complex, multi-stage processing pipelines Enables parallel, distributed
More informationGrid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007
Grid Programming: Concepts and Challenges Michael Rokitka SUNY@Buffalo CSE510B 10/2007 Issues Due to Heterogeneous Hardware level Environment Different architectures, chipsets, execution speeds Software
More informationParameterized specification, configuration and execution of data-intensive scientific workflows
Cluster Comput (2010) 13: 315 333 DOI 10.1007/s10586-010-0133-8 Parameterized specification, configuration and execution of data-intensive scientific workflows Vijay S. Kumar Tahsin Kurc Varun Ratnakar
More informationGrid Scheduling Architectures with Globus
Grid Scheduling Architectures with Workshop on Scheduling WS 07 Cetraro, Italy July 28, 2007 Ignacio Martin Llorente Distributed Systems Architecture Group Universidad Complutense de Madrid 1/38 Contents
More informationData Management for Distributed Scientific Collaborations Using a Rule Engine
Data Management for Distributed Scientific Collaborations Using a Rule Engine Sara Alspaugh Department of Computer Science University of Virginia alspaugh@virginia.edu Ann Chervenak Information Sciences
More informationSTORK: Making Data Placement a First Class Citizen in the Grid
STORK: Making Data Placement a First Class Citizen in the Grid Tevfik Kosar University of Wisconsin-Madison May 25 th, 2004 CERN Need to move data around.. TB PB TB PB While doing this.. Locate the data
More informationISSN: Supporting Collaborative Tool of A New Scientific Workflow Composition
Abstract Supporting Collaborative Tool of A New Scientific Workflow Composition Md.Jameel Ur Rahman*1, Akheel Mohammed*2, Dr. Vasumathi*3 Large scale scientific data management and analysis usually relies
More informationData Throttling for Data-Intensive Workflows
Data Throttling for Data-Intensive Workflows Sang-Min Park and Marty Humphrey Department of Computer Science, University of Virginia, Charlottesville, VA 22904 { sp2kn humphrey }@cs.virginia.edu Abstract
More informationThe Role of Planning in Grid Computing
From: ICAPS-03 Proceedings. Copyright 2003, AAAI (www.aaai.org). All rights reserved. The Role of Planning in Grid Computing Jim Blythe, Ewa Deelman, Yolanda Gil, Carl Kesselman, Amit Agarwal, Gaurang
More informationWorkflow Management and Virtual Data
Workflow Management and Virtual Data Ewa Deelman USC Information Sciences Institute Tutorial Objectives Provide a detailed introduction to existing services for workflow and virtual data management Provide
More informationStorage-system Research Group Barcelona Supercomputing Center *Universitat Politècnica de Catalunya
Replica management in GRID superscalar Jesús Malo Poyatos, Jonathan Martí Fraiz, Toni Cortes* Storage-system Research Group Barcelona Supercomputing Center {jesus.malo,jonathan.marti,toni.cortes}@bsc.es
More informationOn the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows Rafael Ferreira da Silva, Scott Callaghan, Ewa Deelman 12 th Workflows in Support of Large-Scale Science (WORKS) SuperComputing
More informationParameterized Specification, Configuration and Execution of Data-Intensive Scientific Workflows
Noname manuscript No. (will be inserted by the editor) Parameterized Specification, Configuration and Execution of Data-Intensive Scientific Workflows Vijay S. Kumar Tahsin Kurc Varun Ratnakar Jihie Kim
More informationMultiple Broker Support by Grid Portals* Extended Abstract
1. Introduction Multiple Broker Support by Grid Portals* Extended Abstract Attila Kertesz 1,3, Zoltan Farkas 1,4, Peter Kacsuk 1,4, Tamas Kiss 2,4 1 MTA SZTAKI Computer and Automation Research Institute
More informationWhat makes workflows work in an opportunistic environment?
What makes workflows work in an opportunistic environment? Ewa Deelman 1 Tevfik Kosar 2 Carl Kesselman 1 Miron Livny 2 1 USC Information Science Institute, Marina Del Rey, CA deelman@isi.edu, carl@isi.edu
More informationEFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD
EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD S.THIRUNAVUKKARASU 1, DR.K.P.KALIYAMURTHIE 2 Assistant Professor, Dept of IT, Bharath University, Chennai-73 1 Professor& Head, Dept of IT, Bharath
More informationWorkflow, Planning and Performance Information, information, information Dr Andrew Stephen M c Gough
Workflow, Planning and Performance Information, information, information Dr Andrew Stephen M c Gough Technical Coordinator London e-science Centre Imperial College London 17 th March 2006 Outline Where
More informationUSC Viterbi School of Engineering
Introduction to Computational Thinking and Data Science USC Viterbi School of Engineering http://www.datascience4all.org Term: Fall 2016 Time: Tues- Thur 10am- 11:50am Location: Allan Hancock Foundation
More informationTHE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid
THE GLOBUS PROJECT White Paper GridFTP Universal Data Transfer for the Grid WHITE PAPER GridFTP Universal Data Transfer for the Grid September 5, 2000 Copyright 2000, The University of Chicago and The
More informationWorkflowSim: A Toolkit for Simulating Scientific Workflows in Distributed Environments
WorkflowSim: A Toolkit for Simulating Scientific Workflows in Distributed Environments Weiwei Chen Information Sciences Institute University of Southern California Marina del Rey, CA, USA E-mail: wchen@isi.edu
More informationMetadata Catalogs with Semantic Representations
International Provenance and Annotation Workshop (IPAW'06), Chicago, Illinois, May 3-5, 2006. s with Semantic Representations Yolanda Gil, Varun Ratnakar, and Ewa Deelman USC Information Sciences Institute
More informationTackling the Provenance Challenge One Layer at a Time
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE [Version: 2002/09/19 v2.02] Tackling the Provenance Challenge One Layer at a Time Carlos Scheidegger 1, David Koop 2, Emanuele Santos 1, Huy Vo 1, Steven
More informationProvenance-aware Faceted Search in Drupal
Provenance-aware Faceted Search in Drupal Zhenning Shangguan, Jinguang Zheng, and Deborah L. McGuinness Tetherless World Constellation, Computer Science Department, Rensselaer Polytechnic Institute, 110
More informationOpenSees on Teragrid
OpenSees on Teragrid Frank McKenna UC Berkeley OpenSees Parallel Workshop Berkeley, CA What isteragrid? An NSF sponsored computational science facility supported through a partnership of 13 institutions.
More informationCybersecurity Through Nimble Task Allocation: Semantic Workflow Reasoning for Mission-Centered Network Models
Cybersecurity Through Nimble Task Allocation: Semantic Workflow Reasoning for Mission-Centered Network Models Information Sciences Institute University of Southern California gil@isi.edu http://www.isi.edu/~gil
More informationGrid Computing Middleware. Definitions & functions Middleware components Globus glite
Seminar Review 1 Topics Grid Computing Middleware Grid Resource Management Grid Computing Security Applications of SOA and Web Services Semantic Grid Grid & E-Science Grid Economics Cloud Computing 2 Grid
More informationGrid Compute Resources and Job Management
Grid Compute Resources and Job Management How do we access the grid? Command line with tools that you'll use Specialised applications Ex: Write a program to process images that sends data to run on the
High Throughput Urgent Computing
Condor Week 2008 High Throughput Urgent Computing Jason Cope jason.cope@colorado.edu Project Collaborators Argonne National Laboratory / University of Chicago Pete Beckman Suman Nadella Nick Trebon University
NUSGRID a computational grid at NUS
NUSGRID a computational grid at NUS Grace Foo (SVU/Academic Computing, Computer Centre) SVU is leading an initiative to set up a campus wide computational grid prototype at NUS. The initiative arose out
GT-OGSA Grid Service Infrastructure
Introduction to GT3 Background The Grid Problem The Globus Approach OGSA & OGSI Globus Toolkit GT3 Architecture and Functionality: The Latest Refinement of the Globus Toolkit Core Base s User-Defined s
Day 1 : August (Thursday) An overview of Globus Toolkit 2.4
An Overview of Grid Computing Workshop Day 1 : August 05 2004 (Thursday) An overview of Globus Toolkit 2.4 By CDAC Experts Contact :vcvrao@cdacindia.com; betatest@cdacindia.com URL : http://www.cs.umn.edu/~vcvrao
Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego
Scientific Workflow Tools Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego 1 escience Today Increasing number of Cyberinfrastructure (CI) technologies Data Repositories: Network
Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System
Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System Jianwu Wang, Ilkay Altintas Scientific Workflow Automation Technologies Lab SDSC, UCSD project.org UCGrid Summit,
Provenance-Aware Faceted Search in Drupal
Provenance-Aware Faceted Search in Drupal Zhenning Shangguan, Jinguang Zheng, and Deborah L. McGuinness Tetherless World Constellation, Computer Science Department, Rensselaer Polytechnic Institute, 110
Scheduling of Scientific Workflows in the ASKALON Grid Environment
Scheduling of Scientific Workflows in the ASKALON Grid Environment Marek Wieczorek, Radu Prodan and Thomas Fahringer Institute of Computer Science, University of Innsbruck Technikerstraße 21a, A-6020 Innsbruck,
Efficient Dynamic Resource Allocation Using Nephele in a Cloud Environment
International Journal of Scientific & Engineering Research Volume 3, Issue 8, August-2012 1 Efficient Dynamic Resource Allocation Using Nephele in a Cloud Environment V. Praveenkumar, Dr. S. Sujatha, R. Chinnasamy
Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms
Grid Computing 1 Resource sharing Elements of Grid Computing - Computers, data, storage, sensors, networks, - Sharing always conditional: issues of trust, policy, negotiation, payment, Coordinated problem
zymake: a computational workflow system for machine learning and natural language processing
zymake: a computational workflow system for machine learning and natural language processing Eric Breck Department of Computer Science Cornell University Ithaca, NY 14853 USA ebreck@cs.cornell.edu Abstract
Science-as-a-Service
Science-as-a-Service The iplant Foundation Rion Dooley Edwin Skidmore Dan Stanzione Steve Terry Matthew Vaughn Outline Why, why, why! When duct tape isn't enough Building an API for the web Core services
Introduction. Software Trends. Topics for Discussion. Grid Technology. GridForce:
GridForce: A Multi-tier Approach to Prepare our Workforce for Grid Technology Bina Ramamurthy CSE Department University at Buffalo (SUNY) 201 Bell Hall, Buffalo, NY 14260 716-645-3180 (108) bina@cse.buffalo.edu
Scheduling distributed applications can be challenging in a multi-cloud environment due to the lack of knowledge
ISSN: 0975-766X CODEN: IJPTFI Available Online through Research Article www.ijptonline.com CHARACTERIZATION AND PROFILING OF SCIENTIFIC WORKFLOWS Sangeeth S, Srikireddy Sai Kiran Reddy, Viswanathan M*,
Overview. Scientific workflows and Grids. Kepler revisited Data Grids. Taxonomy Example systems. Chimera GridDB
Grids and Workflows Overview Scientific workflows and Grids Taxonomy Example systems Kepler revisited Data Grids Chimera GridDB 2 Workflows and Grids Given a set of workflow tasks and a set of resources,
Gergely Sipos MTA SZTAKI
Application development on EGEE with P-GRADE Portal Gergely Sipos MTA SZTAKI sipos@sztaki.hu EGEE Training and Induction EGEE Application Porting Support www.lpds.sztaki.hu/gasuc www.portal.p-grade.hu
eScience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows Azure Platform
eScience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows Azure Platform Jie Li1, Deb Agarwal2, Marty Humphrey1, Keith Jackson2, Catharine van Ingen3, Youngryel Ryu4
Decreasing End-to-End Job Execution Times by Increasing Resource Utilization using Predictive Scheduling in the Grid
Decreasing End-to-End Job Execution Times by Increasing Resource Utilization using Predictive Scheduling in the Grid Ioan Raicu Computer Science Department University of Chicago Grid Computing Seminar
Complex Workloads on HUBzero Pegasus Workflow Management System
Complex Workloads on HUBzero Pegasus Workflow Management System Karan Vahi Science Automation Technologies Group USC Information Sciences Institute HubZero A valuable platform for scientific researchers For
An Architectural Model for a Grid based Workflow Management Platform in Scientific Applications
An Architectural Model for a Grid based Workflow Management Platform in Scientific Applications Alexandru Costan*, Florin Pop*, Corina Stratan*, Ciprian Dobre*, Catalin Leordeanu*, Valentin Cristea* *Faculty
Virtual Data Grid Middleware Services for Data-Intensive Science
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 2000; 00:1 7 [Version: 2002/09/19 v2.02] Virtual Data Grid Middleware Services for Data-Intensive Science Yong
CSEP Progress Report. D. Schorlemmer (USC) T. Jordan, M. Liukis, P. Maechling (USC) M. Gerstenberger (GNS) F. Euchner, S. Wiemer, J.
CSEP Progress Report D. Schorlemmer (USC) T. Jordan, M. Liukis, P. Maechling (USC) M. Gerstenberger (GNS) F. Euchner, S. Wiemer, J. Woessner (ETH) Major Tasks Software CSEP Software V1.0 development Distribution
GriPhyN-LIGO Prototype Draft Please send comments to Leila Meshkat
Technical Report GriPhyN-2001-18 www.griphyn.org GriPhyN-LIGO Prototype Draft Please send comments to Leila Meshkat (meshkat@isi.edu) Kent Blackburn, Phil Ehrens, Albert Lazzarini, Roy Williams Caltech
Reduction of Workflow Resource Consumption Using a Density-based Clustering Model
Reduction of Workflow Resource Consumption Using a Density-based Clustering Model Qimin Zhang, Nathaniel Kremer-Herman, Benjamin Tovar, and Douglas Thain WORKS Workshop at Supercomputing 2018 The Cooperative
arXiv:1005.3358v1 [astro-ph.IM] 19 May 2010
The Role of Provenance Management in Accelerating the Rate of Astronomical Research arxiv:1005.3358v1 [astro-ph.im] 19 May 2010 NASA Exoplanet Science Institute, Infrared Processing and Analysis Center,
Dennis Gannon Data Center Futures Microsoft Research
Dennis Gannon Data Center Futures Microsoft Research The question What is role of the private sector data center in the national science agenda? The Quest to Broaden Access Science as a Web Service Science
Carelyn Campbell, Ben Blaiszik, Laura Bartolo. November 1, 2016
Carelyn Campbell, Ben Blaiszik, Laura Bartolo November 1, 2016 Data Landscape Collaboration Tools (e.g. Google Drive, DropBox, Sharepoint, Github, MatIN) Data Sharing Communities (e.g. Dryad, FigShare,
POMELo: A PML Online Editor
POMELo: A PML Online Editor Alvaro Graves Tetherless World Constellation Department of Cognitive Sciences Rensselaer Polytechnic Institute Troy, NY 12180 gravea3@rpi.edu Abstract. This paper introduces
Creating Personal Adaptive Clusters for Managing Scientific Jobs in a Distributed Computing Environment
1 Creating Personal Adaptive Clusters for Managing Scientific Jobs in a Distributed Computing Environment Edward Walker, Jeffrey P. Gardner, Vladimir Litvin, and Evan L. Turner Abstract We describe a system