The Virtual Data Grid: A New Model and Architecture for Data-Intensive Collaboration
- Warren Cross
1 The Virtual Data Grid: A New Model and Architecture for Data-Intensive Collaboration. Summer Grid 2004, UT Brownsville South Padre Island Center, 24 June 2004. Mike Wilde, Argonne National Laboratory, Mathematics and Computer Science Division
2 GriPhyN: Grid Physics Network. Mission: enhance scientific productivity through discovery and processing of datasets, using the grid as a scientific workstation. Virtual Data enables this approach by creating datasets from workflow recipes and recording their provenance. GriPhyN works to cross the chasm: application and computer scientists create and field-test paradigms and toolkits together.
3 Acknowledgements: Virtual Data is a Large Team Effort. The Chimera Virtual Data System is the work of Ian Foster, Jens Voeckler, Mike Wilde and Yong Zhao. The Pegasus Planner is the work of Ewa Deelman, Gaurang Mehta, and Karan Vahi. Applications described are the work of many people, including: James Annis, Rick Cavanaugh, Dan Engh, Rob Gardner, Albert Lazzarini, Natalia Maltsev, Marge Bardeen, and their wonderful teams.
4 Virtual Data Scenario
[Workflow diagram: simulate produces file1 and file2; reformat derives file3, file4, file5 from file2; conv derives file6 from file2; summarize derives file7 from file6; psearch produces file8.]
On-demand data generation. Update workflow following changes. Manage workflow. Explain provenance, e.g. for file8:
psearch t 10 i file3 file4 file5 o file8
summarize t 10 i file6 o file7
reformat f fz i file2 o file3 file4 file5
conv l esd o aod i file2 o file6
simulate t 10 o file1 file2
5 Virtual Data Describes Analysis Workflow (workflow diagram as on slide 4, with the requested dataset marked). The recorded virtual data recipe here is: Files: 8 < (1,3,4,5,7), 7 < 6, (3,4,5,6) < 2. Programs: 8 < psearch, 7 < summarize, (3,4,5) < reformat, 6 < conv, (1,2) < simulate.
6 Virtual Data Describes Analysis Workflow. To re-create file8, step 1: simulate > file1, file2.
7 Virtual Data Describes Analysis Workflow. To re-create file8, step 2: files 3, 4, 5, 6 are derived from file2. reformat > file3, file4, file5; conv > file6.
8 Virtual Data Describes Analysis Workflow. To re-create file8, step 3: file7 depends on file6. summarize > file7.
9 Virtual Data Describes Analysis Workflow. To re-create file8, final step: file8 depends on files 1, 3, 4, 5, 7. psearch < file1, file3, file4, file5, file7 > file8.
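The re-creation steps on slides 6 to 9 amount to walking the recorded recipe backwards from the requested file. The following is a minimal Python sketch of that idea, not the Chimera implementation: the `RECIPES` table simply transcribes the Files/Programs recipe from slide 5, and the `plan` function and its `existing` parameter are hypothetical names introduced for illustration.

```python
# Recipe transcribed from slide 5: which program consumes and produces which files.
RECIPES = {
    "simulate": {"inputs": [], "outputs": ["file1", "file2"]},
    "reformat": {"inputs": ["file2"], "outputs": ["file3", "file4", "file5"]},
    "conv": {"inputs": ["file2"], "outputs": ["file6"]},
    "summarize": {"inputs": ["file6"], "outputs": ["file7"]},
    "psearch": {
        "inputs": ["file1", "file3", "file4", "file5", "file7"],
        "outputs": ["file8"],
    },
}


def plan(target, recipes, existing=frozenset()):
    """Return the programs to run, in order, to (re)create `target`.

    Files listed in `existing` are assumed to be materialized already,
    so the programs that produce them are not scheduled.
    """
    # Map every file to the program that produces it.
    producer = {out: name for name, r in recipes.items() for out in r["outputs"]}
    order = []
    scheduled = set()

    def visit(filename):
        if filename in existing:
            return
        if filename not in producer:
            raise KeyError(f"no recipe produces {filename}")
        program = producer[filename]
        if program in scheduled:
            return
        scheduled.add(program)
        # Depth-first: every input must be available before the program runs.
        for needed in recipes[program]["inputs"]:
            visit(needed)
        order.append(program)

    visit(target)
    return order


if __name__ == "__main__":
    print(plan("file8", RECIPES))
    print(plan("file8", RECIPES, existing={"file7"}))
```

Passing `existing` models the on-demand aspect: when file7 is already present, the `conv` and `summarize` steps drop out of the plan.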
10 Grid3 The Laboratory Supported by the National Science Foundation and the Department of Energy. 10
11 VDL: Virtual Data Language Describes Data Transformations
Transformation: abstract template of program invocation; similar to a "function definition".
Derivation: "function call" to a Transformation. Stores past and future: a record of how data products were generated, and a recipe of how data products can be generated.
Invocation: record of a Derivation execution.
These XML documents reside in a virtual data catalog (VDC), a relational database.
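The function-definition / function-call analogy can be made concrete with a small data model. This is an illustrative Python sketch only: the class and field names (`formals`, `actuals`, `exit_code`, `wall_seconds`) are assumptions chosen for the example and do not reflect the actual VDC schema.

```python
from dataclasses import dataclass


@dataclass
class Transformation:
    """Abstract template of a program invocation (the 'function definition')."""
    name: str
    formals: dict  # formal argument name -> direction ("in" or "out")

    def derive(self, dv_name, **actuals):
        """Bind actual file names to the formal arguments (the 'function call')."""
        missing = set(self.formals) - set(actuals)
        extra = set(actuals) - set(self.formals)
        if missing or extra:
            raise TypeError(
                f"{self.name}: missing {sorted(missing)}, unexpected {sorted(extra)}"
            )
        return Derivation(dv_name, self, dict(actuals))


@dataclass
class Derivation:
    """A recipe that can be run in the future, or a record of how data was made."""
    name: str
    transformation: Transformation
    actuals: dict  # formal argument name -> logical file name

    def inputs(self):
        formals = self.transformation.formals
        return [f for arg, f in self.actuals.items() if formals[arg] == "in"]

    def outputs(self):
        formals = self.transformation.formals
        return [f for arg, f in self.actuals.items() if formals[arg] == "out"]


@dataclass
class Invocation:
    """Record of one execution of a Derivation (fields are hypothetical)."""
    derivation: Derivation
    exit_code: int
    wall_seconds: float


tr1 = Transformation("tr1", {"a1": "in", "a2": "out"})
x1 = tr1.derive("x1", a1="file1", a2="file2")
run = Invocation(x1, exit_code=0, wall_seconds=1.5)
```

Keeping the three records separate is what lets a catalog answer both "how could this file be produced?" (Derivation) and "how was it actually produced?" (Invocation).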
12 VDL Describes Workflow via Data Dependencies
TR tr1(in a1, out a2) {
  argument stdin = ${a1};
  argument stdout = ${a2};
}
TR tr2(in a1, out a2) {
  argument stdin = ${a1};
  argument stdout = ${a2};
}
DV x1->tr1(a1=@{in:file1}, a2=@{out:file2});
DV x2->tr2(a1=@{in:file2}, a2=@{out:file3});
Resulting chain: file1 -> x1 -> file2 -> x2 -> file3
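No ordering between x1 and x2 is stated explicitly in the VDL above; it follows from x2 reading the file that x1 writes. A minimal Python sketch of that inference, assuming the derivations have already been reduced to sets of input and output file names (the `DERIVATIONS` layout and both function names are hypothetical):

```python
# The two derivations from the slide, reduced to their file usage.
DERIVATIONS = {
    "x1": {"in": {"file1"}, "out": {"file2"}},
    "x2": {"in": {"file2"}, "out": {"file3"}},
}


def data_dependencies(derivations):
    """Return (producer, consumer) pairs implied by shared files."""
    producer = {f: dv for dv, io in derivations.items() for f in io["out"]}
    edges = set()
    for dv, io in derivations.items():
        for f in io["in"]:
            # An edge exists when another derivation produces this input.
            if f in producer and producer[f] != dv:
                edges.add((producer[f], dv))
    return sorted(edges)


def external_inputs(derivations):
    """Files that are consumed but produced by no derivation."""
    produced = {f for io in derivations.values() for f in io["out"]}
    consumed = {f for io in derivations.values() for f in io["in"]}
    return sorted(consumed - produced)


if __name__ == "__main__":
    print(data_dependencies(DERIVATIONS))
    print(external_inputs(DERIVATIONS))
```

Files reported by `external_inputs` are the ones that must already exist somewhere, which is where a replica catalog lookup would come in.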
13 Workflow example: preprocess, findrange (twice), analyze. Graph structure: fan-in and fan-out; "left" and "right" can run in parallel. Needs an external input file, located via the replica catalog. Data file dependencies form the graph structure.
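The claim that "left" and "right" can run in parallel falls out of the graph shape. As a small sketch, Python's standard `graphlib.TopologicalSorter` can group the diamond into waves of jobs whose predecessors have all finished; the `DIAMOND` mapping and `parallel_waves` are illustrative names, and the real system hands this scheduling to DAGMan instead.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it can start.
DIAMOND = {
    "preprocess": set(),
    "left": {"preprocess"},    # first findrange
    "right": {"preprocess"},   # second findrange
    "analyze": {"left", "right"},
}


def parallel_waves(predecessors):
    """Group jobs into successive waves; jobs within a wave are independent."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # pretend the whole wave finished
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(parallel_waves(DIAMOND), start=1):
        print(number, wave)
```

The middle wave holds both findrange jobs (fan-out), and `analyze` waits for both of them (fan-in).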
14 Complete VDL workflow Generate appropriate derivations DV out:"f.b2"} ], ); DV left->findrange( name="left", p="0.5" ); DV right->findrange( name="right" ); DV ); 14
15 Compound Transformations Enable Functional Abstractions
Compound TR encapsulates an entire sub-graph:
TR rangeanalysis( in fa, p1, p2, out fd, io fc1, io fc2, io fb1, io fb2 ) {
  call preprocess( a=${fa}, b=[ ${out:fb1}, ${out:fb2} ] );
  call findrange( a1=${in:fb1}, a2=${in:fb2}, name="left", p=${p1}, b=${out:fc1} );
  call findrange( a1=${in:fb1}, a2=${in:fb2}, name="right", p=${p2}, b=${out:fc2} );
  call analyze( a=[ ${in:fc1}, ${in:fc2} ], b=${fd} );
}
16 Derivation scripts Representation of virtual data provenance: DV d1->diamond( p2="100", p1="0" ); DV d2->diamond( p2=" ", p1="0" );... DV d70->diamond( p2="800", p1="18" ); 16
17 Invocation Provenance Completion status and resource usage Attributes of executable transformation Attributes of input and output files 17
18 Executing VDL Workflows
[Diagram: an abstract workflow is mapped to a concrete DAG by a planner (global planner: Pegasus; jit planner: research; local planner), drawing on Grid information. The concrete DAG is executed by DAGMan / Condor-G.]
19 GriPhyN-iVDGL Applications to date. ATLAS, BTeV, CMS: HEP event simulation. Argonne Computational Biology: sequence comparison and result capture. LIGO: pulsar search. Sloan Digital Sky Survey: cluster finding; near-earth object search planned. QuarkNet: science education (cosmic rays, HEP analysis).
20 Genome Analysis Database Update (GADU)
[Diagram labels: public and registered groups; collaborators; end users; Jetspeed interface to the GADU server; automatic workflows created as per user request or project; data flow and storage at various levels; execution sites Jazz/ANL, UofWisc, Grid3.]
Application work by Alex Rodriguez, Dina Sulakhe, Natalia Maltsev, Argonne MCS. Described in GGF10 workshop paper. Chimera, Condor, Globus.
21 Virtual Data Example: Galaxy Cluster Search. [Figures: workflow DAG over Sloan data; galaxy cluster size distribution, number of clusters vs. number of galaxies.] Jim Annis, Steve Kent, Vijay Sekhri, Fermilab; Michael Milligan, Yong Zhao, University of Chicago. Described in SC2002 paper.
22 Cluster Search Workflow Graph and Execution Trace Workflow jobs vs time 22
23 Virtual Data Application: High Energy Physics Data Analysis
[Diagram: tree of derived datasets, each labeled by its parameters:]
mass = 200
mass = 200, decay = bb
mass = 200, decay = WW
mass = 200, decay = ZZ
mass = 200, decay = WW, stability = 3
mass = 200, decay = WW, stability = 1
mass = 200, decay = WW, stability = 1, LowPt = 20, HighPt =
mass = 200, event = 8
mass = 200, plot = 1
mass = 200, decay = WW, plot = 1
mass = 200, decay = WW, event = 8
mass = 200, decay = WW, stability = 1, event = 8
mass = 200, decay = WW, stability = 1, plot = 1
Work and slide by Rick Cavanaugh and Dimitri Bourilkov, University of Florida. Ref: CHEP 2002 paper.
24 Using Virtual Data for Science Education. The QuarkNet-Trillium collaboration is using Grid virtual data tools and methods to enrich science education. It's an experiment to give students the means to: discover and apply datasets, algorithms, and data analysis methods; collaborate by developing new ones and sharing results and observations; learn data analysis methods that will ready and excite them for a scientific career. And in later steps, we may actually use the Grid!
25 QuarkNet Virtual Data Project
[Diagram labels: QuarkNet Virtual Data Portal (student data, algorithms, results, notes, and communications); Virtual Data Toolkit; Virtual Data Catalog; standard Web access.]
Participating sites, each with locally collected data, a cosmic ray detector, and student/teacher teams: Central High School, Reston, Virginia; Foothills High School, Great Falls, Montana; Yale / Middletown High Collaboration, Hartford, Connecticut.
Student/teacher teams sharing data, methods, programs, and knowledge. Enabling collaboration-intensive science discovery with virtual data tools and methods.
26 Detector Performance Study 26
27 Example: BTeV Event Simulation 27
28 Support for Search and Discovery. Goal: make it as easy to use as Google. More advanced capabilities lie below the surface (as with Google). Understand the structure and meaning of the datasets and their fields. Advanced search, using SQL-like queries. Find both DATA and TRANSFORMATIONS. Create datasets from queries. Perform calculations on datasets, filtering results to look for patterns.
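Since the virtual data catalog is a relational database, metadata search can be pictured as an ordinary SQL query over attribute/value annotations. The sketch below uses Python's built-in `sqlite3` with an invented single-table schema; the table name `annotation`, its columns, and the object names are all hypothetical, and only the `mass` and `decay` attributes echo the labels on slide 23.

```python
import sqlite3


def build_demo_catalog():
    """Create an in-memory catalog with a hypothetical annotation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE annotation (
               object_type TEXT NOT NULL,   -- 'DATA' or 'TRANSFORMATION'
               object_name TEXT NOT NULL,
               attr        TEXT NOT NULL,
               val         TEXT NOT NULL,
               PRIMARY KEY (object_type, object_name, attr))"""
    )
    rows = [
        ("DATA", "higgs_ww.ntuple", "mass", "200"),
        ("DATA", "higgs_ww.ntuple", "decay", "WW"),
        ("DATA", "higgs_zz.ntuple", "mass", "200"),
        ("DATA", "higgs_zz.ntuple", "decay", "ZZ"),
        ("TRANSFORMATION", "plot_mass", "mass", "200"),
    ]
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", rows)
    return conn


def find(conn, object_type, **criteria):
    """Names of objects of `object_type` matching every attr=value criterion."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    clauses = " OR ".join("(attr = ? AND val = ?)" for _ in criteria)
    params = [object_type]
    for attr, val in criteria.items():
        params += [attr, str(val)]
    params.append(len(criteria))
    sql = (
        "SELECT object_name FROM annotation "
        f"WHERE object_type = ? AND ({clauses}) "
        "GROUP BY object_name HAVING COUNT(*) = ? "
        "ORDER BY object_name"
    )
    return [row[0] for row in conn.execute(sql, params)]


conn = build_demo_catalog()

if __name__ == "__main__":
    print(find(conn, "DATA", mass=200, decay="WW"))
    print(find(conn, "TRANSFORMATION", mass=200))
```

The same `find` call works for both DATA and TRANSFORMATIONS because the object type is just another column, which is one simple way to support searching for either.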
29 Search by Metadata 29
30 Deriving a new dataset to find the mass of the Z particle
31 Workflow for missing energy calculations 31
32 Virtual Provenance: list of derivations and files
<job id="id000001" namespace="quarknet.hepsrch" name="ecalenergysum" level="5 dv-namespace="quarknet.hepsrch" dv-name="run1aesum">
  <argument><filename file="run1a.event"/> <filename file="run1a.esm"/></argument>
  <uses file="run1a.esm" link="output" dontregister="false" donttransfer="false"/>
  <uses file="run1a.event" link="input" dontregister="false" donttransfer="false"/>
</job>
<job id="id000002" namespace="quarknet.hepsrch" name="ecalenergysum" level="7 dv-namespace="quarknet.hepsrch"
  <argument><filename file="electron10gev.event"/> <filename file="electron10gev.sum
</job>
<job id="id000014" namespace="quarknet.hepsrch" name="recontotalenergy" level="3"
  <argument><filename file="run1a.mis"/> <filename file="run1a.ecal"/>
  <uses file="run1a.muon" link="input" dontregister="false" donttransfer="false"/>
  <uses file="run1a.total" link="output" dontregister="false" donttransfer="false"/
  <uses file="run1a.ecal" link="input" dontregister="false" donttransfer="false"/>
  <uses file="run1a.hcal" link="input" dontregister="false" donttransfer="false"/>
  <uses file="run1a.mis" link="input" dontregister="false" donttransfer="false"/>
</job>
<!--list of all files used -->
<filename file="ecal.pct" link="inout"/>
<filename file="electron10gev.avg" link="inout"/>
<filename file="electron10gev.sum" link="inout"/>
<filename file="hcal.pct" link="inout"/>
(excerpted for display)
33 Virtual Provenance in XML: control flow graph
<child ref="id000003"> <parent ref="id000002"/> </child>
<child ref="id000004"> <parent ref="id000003"/> </child>
<child ref="id000005"> <parent ref="id000004"/> <parent ref="id00000
<child ref="id000009"> <parent ref="id000008"/> </child>
<child ref="id000010"> <parent ref="id000009"/> <parent ref="id00000
<child ref="id000012"> <parent ref="id000011"/> </child>
<child ref="id000013"> <parent ref="id000011"/> </child>
<child ref="id000014"> <parent ref="id000010"/> <parent ref="id00001 <parent ref="id000013"/> </child>
(excerpted for display)
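The child/parent elements carry the same information as the `PARENT ... CHILD ...` lines in a DAGMan input file. The Python sketch below shows one way that translation might look. It is not the planner's actual code: the `SAMPLE` document reuses only the entries that survive intact in the excerpt above, the enclosing `<adag>` root element is an assumption added to make the sample well-formed, and `dagman_lines` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Only the complete child/parent entries from the slide; the <adag> wrapper
# is assumed here so that the fragment parses as a single document.
SAMPLE = """<adag>
  <child ref="id000003"> <parent ref="id000002"/> </child>
  <child ref="id000004"> <parent ref="id000003"/> </child>
  <child ref="id000009"> <parent ref="id000008"/> </child>
  <child ref="id000012"> <parent ref="id000011"/> </child>
  <child ref="id000013"> <parent ref="id000011"/> </child>
</adag>"""


def dagman_lines(xml_text):
    """Turn child/parent elements into DAGMan PARENT ... CHILD ... lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root.iter("child"):
        parents = [p.get("ref") for p in child.findall("parent")]
        if parents:
            lines.append(f"PARENT {' '.join(parents)} CHILD {child.get('ref')}")
    return lines


if __name__ == "__main__":
    print("\n".join(dagman_lines(SAMPLE)))
```

A complete DAGMan file would also need a `JOB` line per node naming its submit file; those are omitted because the excerpt does not show where the submit files live.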
34 And writing the results up in a poster
35 Poster describing analysis 35
36 Using active data from Web Services 36
40 Levels of Interaction. Skins: use it like a calculator, experiment with scenarios and settings, use virtual data like a log book to document, assess, and share parameter values. Blocks: re-assemble workflow pipelines using existing ones as patterns and predeveloped transforms as building blocks. Code: write new transforms in a variety of languages and data models.
41 Observations. A provenance approach based on interface definition and data flow declaration fits well with Grid requirements for code and data transportability and heterogeneity. Working in a provenance-managed system has many fringe benefits: uniformity, precision, structure, communication, documentation. The real world is messy: finding the right abstractions is hard, and handling legacy applications is even harder.
42 Vision for Provenance in the Large Universal knowledge management and production systems Vendors integrate the provenance tracking protocol into data processing products Ability to run anywhere in the Grid 42
43 Virtual Data Grid Vision
[Architecture diagram. Roles: Researcher, Production Manager, Science Review, Grid Operations. Activities: discovery, composition, planning, derivation, sharing, simulation, analysis. Components: virtual data catalogs, virtual data index, workflow planner, request planner, request predictor (Prophesy), workflow executor (DAGMan), request executor (Condor-G, GRAM), Grid Monitor, replica location service, storage elements, Data Transport, Storage Resource Mgmt. Data: raw data from the detector, simulation data. Layers: Data Grid, Computing Grid.]
44 Planned Dataset Model. Dataset kinds: file; set of files; object closure; XML element; relational query or spreadsheet range; new user-defined dataset type (set of files with relational index). Speculative model described in CIDR 2003 paper by Foster, Voeckler, Wilde and Zhao.
45 Planned Dataset Type Model FileDataset Representational File FileSet MultiFileSet (Nonleaf Types are Superclasses) TarFileSet Logical EventCollection RawEventSet SimulatedEventSet MonteCarlo Simulation DiscreteEvent Simulation 45
46 Provenance Server Plans OGSA-based Grid services Discovery, security, resource management Supports code and data discovery and workflow management Object names (TR, DS, TY, DV, IV) can be used as global cross-server links Derivations can reference remote transformations and datasets Structured object namespaces & object-level access control enable large VO collaboration Generalize transforms to describe service calls, database queries and language interpreters 46
47 Provenance Hyperlinks [Diagram: derivations (DV), datasets (DS), and transformations (TR) held in Personal VDS, Group VDS, and Collaboration VDS servers, with links between objects on different servers.]
48 Indexing Servers to Support Discovery [Diagram: Personal VDS, Group VDS, and Collaboration VDS servers holding TR, DV, and DS objects, covered by personal indexes, a group index, and a collaboration-level (collaboration-wide) index.]
49 For Information and Software Virtual Data System - Chimera Virtual Data System: Overview, papers, software Grids and Grid Software - Using Grid3 - Virtual Data Toolkit The Globus Toolkit - The Condor Project Particle Physics Data Grid 49
50 Acknowledgements. GriPhyN, iVDGL, and QuarkNet (in part) are supported by the National Science Foundation. The Globus Alliance, PPDG, and QuarkNet are supported in part by the US Department of Energy, Office of Science; by the NASA Information Power Grid program; and by IBM.
Introduction to GT3 The Globus Project Argonne National Laboratory USC Information Sciences Institute Copyright (C) 2003 University of Chicago and The University of Southern California. All Rights Reserved.
More informationData publication and discovery with Globus
Data publication and discovery with Globus Questions and comments to outreach@globus.org The Globus data publication and discovery services make it easy for institutions and projects to establish collections,
More informationOn the Use of Cloud Computing for Scientific Workflows
On the Use of Cloud Computing for Scientific Workflows Christina Hoffa 1, Gaurang Mehta 2, Timothy Freeman 3, Ewa Deelman 2, Kate Keahey 3, Bruce Berriman 4, John Good 4 1 Indiana University, 2 University
More informationDATA MINING - 1DL105, 1DL111
1 DATA MINING - 1DL105, 1DL111 Fall 2007 An introductory class in data mining http://user.it.uu.se/~udbl/dut-ht2007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database
More informationIntroduction to FREE National Resources for Scientific Computing. Dana Brunson. Jeff Pummill
Introduction to FREE National Resources for Scientific Computing Dana Brunson Oklahoma State University High Performance Computing Center Jeff Pummill University of Arkansas High Peformance Computing Center
More informationGlobus GTK and Grid Services
Globus GTK and Grid Services Michael Rokitka SUNY@Buffalo CSE510B 9/2007 OGSA The Open Grid Services Architecture What are some key requirements of Grid computing? Interoperability: Critical due to nature
More informationThe Grid Architecture
U.S. Department of Energy Office of Science The Grid Architecture William E. Johnston Distributed Systems Department Computational Research Division Lawrence Berkeley National Laboratory dsd.lbl.gov What
More informationA Comparison of Two Methods for Building Astronomical Image Mosaics on a Grid
A Comparison of Two Methods for Building Astronomical Image Mosaics on a Grid Daniel S. Katz, Joseph C. Jacob Jet Propulsion Laboratory California Institute of Technology Pasadena, CA 91109 Daniel.S.Katz@jpl.nasa.gov
More informationBased on: Grid Intro and Fundamentals Review Talk by Gabrielle Allen Talk by Laura Bright / Bill Howe
Introduction to Grid Computing 1 Based on: Grid Intro and Fundamentals Review Talk by Gabrielle Allen Talk by Laura Bright / Bill Howe 2 Overview Background: What is the Grid? Related technologies Grid
More informationAutomating Real-time Seismic Analysis
Automating Real-time Seismic Analysis Through Streaming and High Throughput Workflows Rafael Ferreira da Silva, Ph.D. http://pegasus.isi.edu Do we need seismic analysis? Pegasus http://pegasus.isi.edu
More informationData Placement for Scientific Applications in Distributed Environments
Data Placement for Scientific Applications in Distributed Environments Ann Chervenak, Ewa Deelman, Miron Livny 2, Mei-Hui Su, Rob Schuler, Shishir Bharathi, Gaurang Mehta, Karan Vahi USC Information Sciences
More informationICAT Job Portal. a generic job submission system built on a scientific data catalog. IWSG 2013 ETH, Zurich, Switzerland 3-5 June 2013
ICAT Job Portal a generic job submission system built on a scientific data catalog IWSG 2013 ETH, Zurich, Switzerland 3-5 June 2013 Steve Fisher, Kevin Phipps and Dan Rolfe Rutherford Appleton Laboratory
More informationEFFICIENT SCHEDULING TECHNIQUES AND SYSTEMS FOR GRID COMPUTING
EFFICIENT SCHEDULING TECHNIQUES AND SYSTEMS FOR GRID COMPUTING By JANG-UK IN A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
More informationSocial Informatics Data Grid
Social Informatics Data Grid Bennett Bertenthal 1,5, Robert Grossman 3, David Hanley 3, Mark Hereld 1,6, Sarah Kenny 1, Gina-Anne Levow 2, Michael E. Papka 1,2,6, Stephen W. Porges 4, Kavithaa Rajavenkateshwaran
More informationEducating a New Breed of Data Scientists for Scientific Data Management
Educating a New Breed of Data Scientists for Scientific Data Management Jian Qin School of Information Studies Syracuse University Microsoft escience Workshop, Chicago, October 9, 2012 Talk points Data
More informationThe Grid: Feng Shui for the Terminally Rectilinear
The Grid: Feng Shui for the Terminally Rectilinear Martha Stewart Introduction While the rapid evolution of The Internet continues to define a new medium for the sharing and management of information,
More informationPegasus. Pegasus Workflow Management System. Mats Rynge
Pegasus Pegasus Workflow Management System Mats Rynge rynge@isi.edu https://pegasus.isi.edu Automate Why workflows? Recover Automates complex, multi-stage processing pipelines Enables parallel, distributed
More informationThe Role of Planning in Grid Computing
From: ICAPS-03 Proceedings. Copyright 2003, AAAI (www.aaai.org). All rights reserved. The Role of Planning in Grid Computing Jim Blythe, Ewa Deelman, Yolanda Gil, Carl Kesselman, Amit Agarwal, Gaurang
More informationGrid Compute Resources and Job Management
Grid Compute Resources and Job Management How do we access the grid? Command line with tools that you'll use Specialised applications Ex: Write a program to process images that sends data to run on the
More informationCMS HLT production using Grid tools
CMS HLT production using Grid tools Flavia Donno (INFN Pisa) Claudio Grandi (INFN Bologna) Ivano Lippi (INFN Padova) Francesco Prelz (INFN Milano) Andrea Sciaba` (INFN Pisa) Massimo Sgaravatto (INFN Padova)
More informationThe Dartmouth Green Grid
The Dartmouth Green Grid James E. Dobson 1,, Jeffrey B. Woodward 1, Susan A. Schwarz 3, John C. Marchesini 2, Hany Farid 2, and Sean W. Smith 2 1 Department of Psychological and Brain Sciences, Dartmouth
More informationIoan Raicu. Everyone else. More information at: Background? What do you want to get out of this course?
Ioan Raicu More information at: http://www.cs.iit.edu/~iraicu/ Everyone else Background? What do you want to get out of this course? 2 Data Intensive Computing is critical to advancing modern science Applies
More informationGrid-Based Data Mining and the KNOWLEDGE GRID Framework
Grid-Based Data Mining and the KNOWLEDGE GRID Framework DOMENICO TALIA (joint work with M. Cannataro, A. Congiusta, P. Trunfio) DEIS University of Calabria ITALY talia@deis.unical.it Minneapolis, September
More informationHEP Grid Activities in China
HEP Grid Activities in China Sun Gongxing Institute of High Energy Physics, Chinese Academy of Sciences CANS Nov. 1-2, 2005, Shen Zhen, China History of IHEP Computing Center Found in 1974 Computing Platform
More informationGrid Scheduling Architectures with Globus
Grid Scheduling Architectures with Workshop on Scheduling WS 07 Cetraro, Italy July 28, 2007 Ignacio Martin Llorente Distributed Systems Architecture Group Universidad Complutense de Madrid 1/38 Contents
More informationTHE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid
THE GLOBUS PROJECT White Paper GridFTP Universal Data Transfer for the Grid WHITE PAPER GridFTP Universal Data Transfer for the Grid September 5, 2000 Copyright 2000, The University of Chicago and The
More informationPegasus. Automate, recover, and debug scientific computations. Mats Rynge
Pegasus Automate, recover, and debug scientific computations. Mats Rynge rynge@isi.edu https://pegasus.isi.edu Why Pegasus? Automates complex, multi-stage processing pipelines Automate Enables parallel,
More informationWings for Pegasus: Creating Large-Scale Scientific Applications Using Semantic Representations of Computational Workflows
Proceedings of the Nineteenth Conference on Innovative Applications of Artificial Intelligence (IAAI-07), July 22 26, 2007, Vancouver, British Columbia, Canada. Wings for Pegasus: Creating Large-Scale
More informationHarnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets
Page 1 of 5 1 Year 1 Proposal Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets Year 1 Progress Report & Year 2 Proposal In order to setup the context for this progress
More informationProvenance Management in Swift
Provenance Management in Swift Luiz M. R. Gadelha Jr.,a,b, Ben Clifford, Marta Mattoso a, Michael Wilde c,d, Ian Foster c,d a Computer and Systems Engineering Program, Federal University of Rio de Janeiro,
More informationExploiting Virtual Observatory and Information Technology: Techniques for Astronomy
Exploiting Virtual Observatory and Information Technology: Techniques for Astronomy Nicholas Walton AstroGrid Project Scientist Institute of Astronomy, The University of Cambridge Lecture #3 Goal: Applications
More information