Semantic Web Technologies

1/33 Semantic Web Technologies
Lecture 11: SWT for the Life Sciences 4: BioRDF and Scientific Workflows
Maria Keet
email: keet -AT- inf.unibz.it
home: http://www.meteck.org
blog: http://keet.wordpress.com/category/computer-science/72010-semwebtech/
KRDB Research Centre, Free University of Bozen-Bolzano, Italy
11 January 2010

2/33 Outline
- BioRDF
  - Considerations
  - One RDF-ised base
  - Toward federation
- Scientific workflows
  - Introduction
  - Scientific workflows in the Semantic Web setting

5/33 The backdrop
- Data integration again (still...), but now with new technologies
- bio having a go at RDF, RDFS, SPARQL and the like
- Using technologies for newly developed RDFizers, ontology development tools, triplestore tools (e.g., Sesame, Virtuoso, AllegroGraph), and visualisation tools
- Mesh- vs. mash-ups

6/33 Recollecting the Ruttenberg et al (2007) paper (lecture 8)
- Are (should?) "Tools and strategies to extract or translate from non-RDF data sources to enable their interoperability with data organized as statements." (be) part of the set of SW Technologies?
- Or: where are (W3C) standardization efforts for RDBMS-to-RDF, Excel-to-RDF, OBO-to-OWL, and structured flat file-to-language-y mappings?
- "BioRDF has the goal of converting a number of publicly available life sciences data sources into RDF and OWL."
- Thus: not using SW Tech but preparing for use

7/33 Recollecting the Ruttenberg et al (2007) paper (lecture 8)
- D2RQ: The mappings allow RDF applications to access the contents of relational databases using Semantic Web query languages like SPARQL.
- Doing such a mapping requires us to choose how tables, columns, and values in the database map to URIs for classes, properties, instances, and data values.
- Name the pros and cons of RDF triplestores vs RDBMSs
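The mapping choices named on this slide can be illustrated with a minimal sketch in plain Python. It is not D2RQ's mapping language or API; the base URI, the `gene` table, its columns, and the URI patterns are hypothetical and chosen only to show one possible table-to-class, column-to-property, row-to-instance scheme.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base URI, not from the lecture
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def row_to_triples(table, key_column, row):
    """Map one relational row to (subject, predicate, object) triples.

    table      -> class URI
    key value  -> instance URI
    column     -> property URI
    cell value -> data value (kept as a plain Python value)
    """
    subject = f"{BASE}{table}/{quote(str(row[key_column]))}"
    triples = [(subject, RDF_TYPE, f"{BASE}vocab/{table}")]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the URI; NULLs yield no triple
        triples.append((subject, f"{BASE}vocab/{table}_{column}", value))
    return triples

# Hypothetical sample row from a hypothetical "gene" table.
gene_row = {"id": 672, "symbol": "BRCA1", "organism": "Homo sapiens", "alias": None}
triples = row_to_triples("gene", "id", gene_row)
for triple in triples:
    print(triple)
```

Every design decision here (whether the key goes into the URI, whether NULLs are dropped, how property URIs are named) is exactly the kind of choice the slide says a mapping author has to make.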

8/33 Data integration strategies (lecture 8)
I. Physical schema mappings
   - Global As View (GAV)
   - Local As View (LAV)
   - GLAV
II. Conceptual model-based data integration
III. Data federation
IV. Data warehouses
V. Data marts
VI. Services-mediated integration
VII. Peer-to-peer data integration
VIII. Ontology-based data integration
   - I or II (possibly in conjunction with the others) through an ontology
   - Linked data by means of an ontology

10/33 Overview (Bio2RDF as example)
- Data integration with SWT
- What can be achieved with publicly (freely, open source) available tools and data sources
- Main immediate goal of Bio2RDF: convert those data sources to RDF
- How:
  - Bottom-up ontology development of chosen subject domain
  - Decide on architecture: which integration strategy, which data sources, which triplestore, which presentation tools
  - Develop RDF-izer programs (JSP programs HTMLtoRDF, XMLtoRDF, SQLtoRDF, etc.)
  - Refinements, a.o.: URI normalization: when multiple URIs denote the same entity, generate a new one and link them with owl:sameAs
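The URI normalization refinement can be sketched in a few lines of Python. This is only an illustration of the idea, not Bio2RDF's code: the canonical URI pattern under `http://example.org/` and the two source URIs are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalize(namespace, identifier):
    """Build one canonical URI from a namespace and a local identifier.

    The pattern is an assumption made for this sketch.
    """
    return f"http://example.org/{namespace.strip().lower()}:{identifier.strip()}"

def same_as_links(namespace, identifier, source_uris):
    """Link the canonical URI to every distinct source URI with owl:sameAs."""
    canonical = normalize(namespace, identifier)
    return [(canonical, OWL_SAME_AS, uri)
            for uri in sorted(set(source_uris))
            if uri != canonical]

# Hypothetical URIs under which two sources expose the same gene record.
variants = [
    "http://db-a.example/gene?id=672",
    "http://db-b.example/entrez/672",
    "http://db-a.example/gene?id=672",  # duplicate, dropped by set()
]
links = same_as_links("GeneID", "672", variants)
for link in links:
    print(link)
```

Queries can then be written against the canonical URI only, while the owl:sameAs triples keep the original identifiers reachable.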

11/33 Bio2RDF architecture from Belleau et al, 2008

12/33 First step in the workflow: SeRQL query to fetch basic data from Entrez and GO from Belleau et al, 2008

13/33 A consideration
An ontology belongs to a community who adapts it, uses it and shares it. With the warehouse stored into a triplestore, it is possible to query the local knowledge base with SeRQL queries. However, the semantic web is meant to be distributed. With more RDF resources available on the web and by using the SPARQL [58] language and protocol, a standard defined by the W3C, the data warehousing concept could become obsolete in the future. This is one perspective of the semantic web.
(from Belleau et al, 2008)

15/33 One strategy for a virtual repository, various architectures and tools [1]
- Receptor Explorer: OWL mappings to provide an integrated list of receptors; executes individual queries against different SPARQL endpoints
- AIDA Toolkit: for cooperatively searching, annotating, interpreting, and enriching large collections of heterogeneous documents from diverse locations
- FeDeRate: enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces
- Vocabulary of Interlinked Datasets (VoID): to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints
[1] Kei-Hoi Cheung, H Robert Frost, M Scott Marshall, Eric Prud'hommeaux, Matthias Samwald, Jun Zhao, and Adrian Paschke. A journey to Semantic Web query federation in the life sciences. BMC Bioinformatics 2009, 10(Suppl 10):S10
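A VoID description is itself a small RDF document. The sketch below assembles one as a Turtle string in plain Python, assuming a hypothetical dataset URI, title, endpoint, and triple count; `void:Dataset`, `void:sparqlEndpoint`, and `void:triples` are terms from the VoID vocabulary, and `dcterms:title` comes from Dublin Core.

```python
def void_description(dataset_uri, title, endpoint, triple_count):
    """Return a minimal VoID dataset description in Turtle.

    Assumes `title` contains no double quotes or backslashes; a real
    serializer would escape them.
    """
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:sparqlEndpoint <{endpoint}> ;",
        f"    void:triples {triple_count} .",
    ]
    return "\n".join(lines)

# All four values are hypothetical placeholders.
turtle = void_description(
    "http://example.org/dataset/receptors",
    "Example receptor dataset",
    "http://example.org/sparql",
    1000,
)
print(turtle)
```

A federation layer could read such descriptions to decide which endpoint is worth querying for a given subquery, which is the role the slide assigns to VoID.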

16/33 Differences
- The layer at which the virtual repository abstraction is supported:
  - Receptor Explorer: Semantic Service Bus (ESB + Jena + Sesame); API exposed to client applications
  - AIDA toolkit: web services, using the SOA; API exposed to client applications by WSDL
  - FeDeRate: at the SPARQL query interface
- Generality of approach:
  - Receptor Explorer is a single scenario with step-wise exploration/navigation
  - AIDA Search interface is general-purpose and can be utilized to explore a wide range of RDF data retrieved from multiple locations (SKOS + Sesame, Virtuoso)
  - FeDeRate also supports a broad range of RDF-based applications that query a range of datasets and repositories

17/33 FeDeRate's approach to query management
Use the common variables in a SPARQL query to decompose it into separate queries for the different data sources; e.g., connecting them by a common human EntrezGene identifier.

18/33 FeDeRate's approach to query management
Determine which variables in each GRAPH constraint are present in the SELECT (or are referenced in subsequent graph patterns), then translate the constraint into a subordinate SPARQL query.

19/33 FeDeRate's approach to query management
Subsequent subordinate queries are issued with bindings constraining the variables bound by earlier queries, expressed as standard SPARQL constraints.
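The three steps on slides 17 to 19 (decompose by source, run subordinate queries in order, carry bindings forward) can be illustrated with a minimal Python sketch over in-memory triples. It is not FeDeRate's implementation and does not parse SPARQL; the source names, predicates, and identifiers are all hypothetical, and a shared `?id` variable stands in for the common EntrezGene identifier.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):              # variable: bind it or check the binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                   # constant: must match exactly
            return None
    return result

def run_subquery(patterns, triples, bindings):
    """Evaluate one source's patterns, constrained by the bindings found so far."""
    for pattern in patterns:
        bindings = [extended
                    for binding in bindings
                    for triple in triples
                    if (extended := match(pattern, triple, binding)) is not None]
    return bindings

def federate(subqueries, sources):
    """Run the per-source subqueries in order, passing bindings along."""
    bindings = [{}]
    for source_name, patterns in subqueries:
        bindings = run_subquery(patterns, sources[source_name], bindings)
    return bindings

# Two hypothetical data sources, each a list of (subject, predicate, object).
sources = {
    "genes": [("gene:1", "entrezId", "1001"), ("gene:2", "entrezId", "1002")],
    "pathways": [("pw:A", "involvesEntrezId", "1001"), ("pw:B", "involvesEntrezId", "1003")],
}
# The global query, already decomposed per source; ?id is the common variable.
subqueries = [
    ("genes", [("?gene", "entrezId", "?id")]),
    ("pathways", [("?pathway", "involvesEntrezId", "?id")]),
]
results = federate(subqueries, sources)
print(results)
```

In a real federation the second subquery would be sent to a remote endpoint with the earlier bindings rewritten as query constraints; here the same effect is obtained by filtering locally.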

20/33 Consideration
- The billion triple challenge
- Focus on data or on knowledge; how to combine it?
- Scalability
- Data integration strategy

21/33 Some recent observations from LODD
- A significant challenge... is the strong prevalence of terminology conflicts, synonyms, and homonyms. These problems are not addressed by simply making data sets available on the Web using RDF as common syntax but require deeper semantic integration
- For applications that focus on discovery and data navigation, having explicit links between data sources is often already a huge benefit even without semantic integration
- For other applications that rely on expressive querying or automated reasoning deeper integration is essential..., it would be beneficial if more community practices on publishing term and schema mappings would be established
Anja Jentzsch, Bo Andersson, Oktie Hassanzadeh, Susie Stephens, Christian Bizer. Enabling Tailored Therapeutics with Linked Data. LDOW2009, April 20, 2009, Madrid, Spain.
More at: http://esw.w3.org/topic/hclsig/lodd

24/33 Requirements [2]
- Seamless access to resources and services
- Service composition & reuse and workflow design
- Scalability
- Detached execution
- Reliability and fault-tolerance
- User-interaction
- Smart re-runs
- Smart (semantic) links
- Data provenance
[2] Ludäscher et al. Scientific Workflow Management and the Kepler System.

25/33 Some general characteristics of Scientific Workflows [3]
- The ability to handle many and varied analysis tools; not merely database systems that have to be linked up, but the many (custom-made) analysis tools relative to the number of databases
- Interfaces to a diverse range of computational environments (supercomputers, grid, Internet and Semantic Web)
- The ability to handle activity mixes that are different from typical business profiles; there are, at least initially, few canned and reusable workflows (i.e., design from scratch)
- Need for explicit representation of knowledge at different stages
- Auditability of the computations (when the results are used to make decisions that carry regulatory or legislative implications; e.g., data analysis of clinical trials, climate model predictions)
[3] http://people.engr.ncsu.edu/mpsingh/papers/databases/workflows/sciworkflows.html

26/33 Do biologists and bioinformaticians need scientific workflows?
- What if we want to run a (scientific) workflow with tools and databases freely available on the Internet?
- What are the problems of (scientific) workflows, if any, and can they be solved with SWT?
- Do scientific workflows need Semantic Web technologies?

28/33 Further additions w.r.t. bio and the Semantic Web
- Software version of the Materials & Methods, i.e., a pipeline of activities that can be, is, and has to be carried out more than once
- Repeatability of the in silico experiment
- Customization of the data sources and methods for each researcher
- Open system
- Provenance of the data; or: a need for addressing the trust layer in the Semantic Web layer cake

30/33 Where can we plug SWT languages into Scientific Workflows?
- RDF: common data format (for linking and integration)
- SPARQL: querying data
- OWL: representation of the knowledge across the workflow
- Rules: orchestrate the service execution
- Services (e.g., WSDL, OWL-S): to discover useful scripts that can perform a task in the workflow
- The trust (provenance) layer:... (currently with any of the previous ones)

31/33 Examples w.r.t. SWT
- Taverna, on top of myGrid
  - RDF for data interoperability, for provenance
  - RDF query language for querying the RDF data
  - OWL ontologies for the domain, for the services, for the workflows; for consistency checking, taxonomic classification
  - WSMO's tasks
  - Jena, Sesame, Protégé
- Kepler, on top of Ptolemy
  - services explorations with ontologies
- Wings, on top of Pegasus
  - ontologies (OWL), reasoning (with Jena)

32/33 Provenance and trust, system examples
- Taverna: experiment-, workflow-, and knowledge-provenance, using a mixture of RDF(S) and OWL to represent the overall model and the individual provenance graphs of a particular workflow [4]
- PASS experiments, with another provenance ontology for the workflow [5], and Pychinko, a Semantic Web rule engine to orchestrate the service execution [6]
[4] Carole Goble et al. Knowledge Discovery for biology with Taverna. In: Semantic Web: Revolutionizing knowledge discovery in the life sciences. 2007, pp355-395.
[5] http://provenance.mindswap.org/provenance.owl
[6] Jennifer Golbeck and James Hendler. A Semantic Web Approach to the Provenance Challenge. Concurrency Computat.: Pract. Exper., 2007
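To make the idea of a workflow provenance graph concrete, here is a minimal Python sketch that records triples while steps run and then traces a result back to its inputs. It is not Taverna's or any other system's provenance model; the `ex:` predicates, the step names, and the toy sequence data are hypothetical.

```python
import hashlib

def run_step(name, func, inputs, provenance):
    """Run one workflow step and append provenance triples describing the run."""
    output = func(*inputs)
    # Deterministic run identifier derived from the step name and its inputs.
    run_id = "run:" + hashlib.sha256(repr((name, inputs)).encode()).hexdigest()[:8]
    provenance.append((run_id, "ex:executedStep", name))
    for value in inputs:
        provenance.append((run_id, "ex:used", repr(value)))
    provenance.append((run_id, "ex:generated", repr(output)))
    return output

def lineage(value, provenance):
    """Return every recorded value that `value` was (transitively) derived from."""
    found, frontier = set(), {repr(value)}
    while frontier:
        current = frontier.pop()
        runs = {s for s, p, o in provenance if p == "ex:generated" and o == current}
        used = {o for s, p, o in provenance if p == "ex:used" and s in runs}
        frontier |= used - found   # only follow values not seen before
        found |= used
    return found

provenance = []
sequence = run_step("reverse", lambda s: s[::-1], ("ATGC",), provenance)
gc_count = run_step("count_gc", lambda s: s.count("G") + s.count("C"),
                    (sequence,), provenance)
print(lineage(gc_count, provenance))
```

Because the log is a plain list of triples, it could be serialized as RDF and queried alongside the workflow's data, which is the combination the slide describes.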

33/33 Summary
- BioRDF
  - Considerations
  - One RDF-ised base
  - Toward federation
- Scientific workflows
  - Introduction
  - Scientific workflows in the Semantic Web setting