Workflow Exchange and Archival: The KSW File and the Kepler Object Manager. Shawn Bowers (For Chad Berkley & Matt Jones)

University of California, Davis
May 2005

Outline
1. The Kepler Object Manager
2. Archival and Exchange via KSW Files
3. Object Cache
4. Work in Progress (literally)

Motivation
In SEEK (and Kepler generally) we:
- Envision a large number of science-specific actors
    - E.g., ecology, biodiversity, geoscience, bioinformatics, chemistry
- Envision a large number of available data sets
    - Some may be fairly large; various collections already exist (e.g., LTER, GEON)
- Want to enable search / discovery of actors, datasets, and workflows
- Want to enable wide-scale sharing and re-use of data and workflows (and their components) across disciplines
... and have wanted more than CVS to accomplish these goals

Managing distributed collections of workflow objects
[Figure: diagram with the labels Remote, Actors, Data, and Archived, and the operations Download, Publish, Search, and Archive.]

Main Issues and Goals
- Handle various kinds of objects
    - Workflows, actors, libraries, applications, ontologies, metadata, transport objects
- Publish to and download from remote repositories
    - Recognize local versus remote objects
- Search for objects
    - Both local and remote repositories, from within Kepler
- Version objects
    - E.g., managing conflicts in dependency chains
- Enable functional groups
    - Core group plus packages, e.g., biodiversity, geospatial, R, etc.

Overall Architecture
- Life Science Identifiers (LSIDs)
    - Similar to DOIs (URNs, etc.)
    - Include (some) versioning support
    - Well-defined APIs
- Kepler Scientific Workflow (KSW) Files
    - An archive/jar file of objects
- KSW Metadata (a la MoML)
    - For publish/download/archive/groups
- EcoGrid Remote Access/Query
    - A thin client / protocol
- Object Repository
    - Object retrieval, indexing, etc.
- Object Cache
    - Managing local vs. remote objects
    - Dynamic class loader
[Figure: architecture diagram with the labels KSW, Cache, and Repository.]
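The versioning goal above ("managing conflicts in dependency chains") can be illustrated with a minimal sketch. It assumes LSIDs have the form urn:lsid:authority:namespace:object[:version] and that each object carries a flat list of the LSIDs it depends on. The names parse_lsid and version_conflicts, the "workflow" namespace, and the sample dependency lists are hypothetical and not part of Kepler.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str] = None

    def unversioned(self) -> str:
        """The identifier without its version part."""
        return f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts; the version is optional."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    version = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], version)


def version_conflicts(dependencies: Dict[str, List[str]]) -> Dict[str, Set[Optional[str]]]:
    """Return the objects that are requested in more than one version."""
    requested = defaultdict(set)
    for deps in dependencies.values():
        for dep in deps:
            lsid = parse_lsid(dep)
            requested[lsid.unversioned()].add(lsid.version)
    return {obj: versions for obj, versions in requested.items() if len(versions) > 1}


# Hypothetical data: two workflows depend on different versions of actor 1000.
deps = {
    "urn:lsid:kepler.org:workflow:1:1": ["urn:lsid:kepler.org:actor:1000:1"],
    "urn:lsid:kepler.org:workflow:2:1": [
        "urn:lsid:kepler.org:actor:1000:2",
        "urn:lsid:kepler.org:actor:2000:1",
    ],
}
conflicts = version_conflicts(deps)
```

Here conflicts reports actor 1000 because it is requested in versions 1 and 2. How such a conflict should be resolved is a policy decision that this sketch leaves open.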

KSW Files
- An archive/jar file of objects
    - Actor code and metadata, libraries; dataset, metadata, etc.
    - A manifest (what's in the file)
- All objects in KSW files have associated LSID identifiers
    - An LSID is a URN, e.g., urn:lsid:kepler.org:actor:1000:2
      (Authority: kepler.org, Namespace: actor, OID: 1000, Version (optional): 2)
- Objects have associated metadata files (basically MoML)
    - Distinguish between object definitions and object references
        - Objects within a KSW file have corresponding definitions
        - Dependent objects not included are denoted via references
    - Semantic types, dependency information, ports and types, etc.

KSW Metadata example (notional)
[Figure: a KSW file (jar) with a manifest, new actor classes, new libraries, a new workflow, refs, and KSW metadata (MoML + ID); arrows labeled Unpacks, Publish, and Creates connect it to the cache, the repository, a repository id, and workflow metadata (MoML + ID).]
NOTE: Only *new* classes/libraries/etc. are included (known via the object manager).
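As a rough illustration of the definition/reference distinction, the sketch below packs object definitions into a zip archive (a jar is a zip file) with a manifest, and on unpacking reports which referenced objects are not yet known locally. The manifest name and its JSON layout, the function names pack_ksw and unpack_ksw, and the sample LSIDs and paths are assumptions for illustration only; the slides note that the actual KSW metadata format is MoML-based and still being finalized.

```python
import io
import json
import zipfile

MANIFEST_NAME = "META-INF/ksw-manifest.json"  # hypothetical name and format


def pack_ksw(definitions, references):
    """Pack object definitions into an in-memory zip/jar-style archive.

    definitions: dict mapping LSID -> (path inside the archive, bytes content)
    references:  LSIDs the packed objects depend on but which are not included
    """
    manifest = {
        "definitions": {lsid: path for lsid, (path, _) in definitions.items()},
        "references": sorted(references),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_NAME, json.dumps(manifest, indent=2))
        for path, content in definitions.values():
            archive.writestr(path, content)
    return buffer.getvalue()


def unpack_ksw(data, known_lsids):
    """Return (objects, missing): packed objects by LSID, and references not known locally."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        manifest = json.loads(archive.read(MANIFEST_NAME))
        objects = {
            lsid: archive.read(path)
            for lsid, path in manifest["definitions"].items()
        }
    missing = [ref for ref in manifest["references"] if ref not in known_lsids]
    return objects, missing


# Hypothetical content: one new actor that depends on an actor not in the archive.
ksw = pack_ksw(
    definitions={
        "urn:lsid:kepler.org:actor:1000:2": ("actors/1000.xml", b"<entity name='NewActor'/>"),
    },
    references=["urn:lsid:kepler.org:actor:7:1"],
)
objects, missing = unpack_ksw(ksw, known_lsids=set())
```

Any LSID left in missing would have to be resolved elsewhere, for example from a remote repository, before the unpacked objects can be used.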

Object Cache
- Helps manage LSIDs
    - Creates them locally (for local actors, datasets, etc.)
    - Resolves LSIDs (both local and remote)
    - Handles publication (local id -> repository id, etc.)
- Support for packing/unpacking KSW files
- Provides simple database-like capabilities
    - Persistent storage / file indexing
    - Result caching (e.g., for remote queries)
    - Temporary files
    - Object indexing (e.g., dependency graphs, etc.)
- Handles multiple formats (within Kepler)
    - Stream versus file access
    - Representation conversion (e.g., binary interleaved, ASCII grid)

Work in Progress
- Still much to do
    - Still finalizing the object interfaces / APIs
    - The object cache is partly implemented
        - Simple KSW files can be packed / unpacked
    - The KSW metadata format is still being finalized
        - Working on MoML parsing, handling ids, etc.
        - Defining properties, like semtypes, dependencies, refs, etc.
    - We have a simple implementation of LSIDs for the actor library
- Our goal is to leverage Ptolemy's strengths
    - Most of what we have added / plan to add is layered on top of Ptolemy
    - We want well-defined, generic interfaces
- We are soliciting volunteers!
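The LSID-related duties of the object cache (local creation, local and remote resolution, and the local id -> repository id mapping) might be sketched as follows. This is a toy in-memory model under stated assumptions, not the Kepler implementation: the class ObjectCache, its method names, the "localhost" authority, and the fetch_remote callback standing in for a repository lookup are all hypothetical.

```python
import itertools


class ObjectCache:
    """Toy cache: creates local LSIDs, resolves them, and records publication."""

    def __init__(self, local_authority="localhost", fetch_remote=None):
        self._objects = {}                 # LSID -> object content
        self._published = {}               # local LSID -> repository LSID
        self._counter = itertools.count(1)
        self._authority = local_authority
        self._fetch_remote = fetch_remote  # callable(lsid) -> content, or None

    def put_local(self, namespace, content):
        """Store a local object under a newly created local LSID (version 1)."""
        lsid = f"urn:lsid:{self._authority}:{namespace}:{next(self._counter)}:1"
        self._objects[lsid] = content
        return lsid

    def is_local(self, lsid):
        """An LSID counts as local if its authority is the local one."""
        return lsid.split(":")[2] == self._authority

    def resolve(self, lsid):
        """Return the object for an LSID, fetching and caching remote ones."""
        if lsid in self._objects:
            return self._objects[lsid]
        if self.is_local(lsid) or self._fetch_remote is None:
            raise KeyError(lsid)
        content = self._fetch_remote(lsid)
        self._objects[lsid] = content      # result caching for remote lookups
        return content

    def publish(self, local_lsid, repository_lsid):
        """Record that a local object is now known under a repository id."""
        self._objects[repository_lsid] = self._objects[local_lsid]
        self._published[local_lsid] = repository_lsid
        return repository_lsid

    def repository_id(self, local_lsid):
        return self._published.get(local_lsid)


calls = []


def fake_repository(lsid):
    """Stand-in for a remote repository lookup; records each request."""
    calls.append(lsid)
    return b"remote actor"


cache = ObjectCache(fetch_remote=fake_repository)
local_id = cache.put_local("actor", b"my actor")
remote_id = "urn:lsid:kepler.org:actor:1000:2"
first = cache.resolve(remote_id)
second = cache.resolve(remote_id)   # served from the cache, no second remote call
```

A real cache would also need persistent storage, file indexing, and dependency-graph indexing, as listed on the slide; those are omitted here to keep the sketch short.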