The RMap Project: Linking the Products of Research and Scholarly Communication Tim DiLauro


The RMap Project: Linking the Products of Research and Scholarly Communication
2015-04-22
Tim DiLauro <timmo@jhu.edu>

Motivation
- Compound objects are fast becoming the norm for outputs of scholarly communication.
- In many circumstances, the traditional article is not the object of long-term interest for at least some segment of the community.
- Components may reside in different repositories, maintained by different institutions, employing different technologies.
- Some of these components and their repositories are not part of the traditional scholarly communication ecosystem.
- It is now acknowledged that these objects do not stand alone, and that there is a broad need to understand their context.

Research Partnership
- Data Conservancy: Expertise in management of large data archives from multiple disciplines
- IEEE: Expertise in management of data-intensive scholarly journal publications
- Portico: Expertise in digital preservation, publisher workflow requirements, and existing relationships with 275 publishers
- Funding from the Alfred P. Sloan Foundation

Some High-Level Goals
- RMap tool working prototype
- Collaborative partnerships with the community
- System that supports emerging forms of digital scholarship and publishing
- Plan for sustainability of the project

Work Plan
- Year One, Planning Phase: Gather requirements, create use cases, hold workshop with stakeholders, refine use scenarios based on community feedback [You are here]
- Year Two, Prototype Development: Create system to identify, store, update, and retrieve relationships among publications and other forms of scholarly output, including data and software

TECHNOLOGY

Key Objectives
- Support assertions from a broad set of contributors
- Integrate with Linked Data
- Leverage data from existing scholarly publishing stakeholders (publishers, identifier providers, data and software repositories)
- Provide support for agents and other resources without identifiers (authors, textual citations)

Data Model (simplified)

Data Model - Resource
- Things (abstract or concrete) that can have an identifier
- Basic building block of the WWW
- Key entity for description and retrieval within RMap
- Other core entities in the data model are also Resources

Data Model - Agent
- A person or thing (or group of these) responsible for some action
- Distinction between scholarly agents (e.g., author, funder, publisher, data processing program) and system agents (RMap component, user, etc.)

Data Model - Event
- Captures provenance within the RMap system
- An action or activity involving System Agents and other resources
- Provenance of Scholarly Resources can be captured separately by registering it in RMap via DiSCOs

Data Model - RDF Statement (triple)
- Building blocks of the semantic web
- Conceptually of the form: <subject> <predicate> <object>
- Like subject verb object in English
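To make the subject-predicate-object form concrete, here is a minimal Python sketch that models one statement as a named tuple. The `Statement` type, the `to_ntriples` helper, and the example.org URIs are illustrative assumptions, not RMap's actual classes or vocabulary.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement: <subject> <predicate> <object>."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, chosen only for illustration.
stmt = Statement(
    subject="http://example.org/dataset/D-2",
    predicate="http://example.org/terms/outputOf",
    obj="http://example.org/software/S-1",
)


def to_ntriples(s: Statement) -> str:
    """Render a statement whose three terms are all URIs in N-Triples style."""
    return f"<{s.subject}> <{s.predicate}> <{s.obj}> ."


print(to_ntriples(stmt))
```

Read aloud, the statement says "dataset D-2 is an output of software S-1", which mirrors the subject verb object pattern.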

Data Model - DiSCO
- Distributed Scholarly Compound Object
- Primary unit of registration within RMap
- Basically a set of resources and related RDF description
- Similar to OAI-ORE
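A DiSCO, described as a set of resources plus related RDF description, can be sketched as a small container type. This is only an illustration under stated assumptions: the `DiSCO` class, its `relate` method, and the placeholder identifiers are hypothetical, and for simplicity the sketch treats every statement object as a resource (real RDF objects may also be literals).

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class DiSCO:
    """A set of aggregated resources plus the statements that describe them."""
    uri: str
    aggregated: Set[str] = field(default_factory=set)
    statements: Set[Triple] = field(default_factory=set)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        """Add a statement and aggregate both of its end points."""
        self.statements.add((subject, predicate, obj))
        self.aggregated.update({subject, obj})


# All identifiers below are placeholders, not real RMap URIs.
disco = DiSCO(uri="disco:example-1")
disco.relate("article:A-2", "source", "dataset:D-2")
disco.relate("dataset:D-2", "outputof", "software:S-1")

print(sorted(disco.aggregated))
print(len(disco.statements))
```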

Data Model - DiSCO
[Figure: a DiSCO drawn as a compound object. Labels: A-2, S-1, D-2, source, outputof.]

[Figure sequence: incoming DiSCOs from Data Publishers (and later Identity Providers) building up the RMap Linked Data Graph. Legend: C = Creator, A = Article, D = Dataset, S = Software. Edge labels: source, outputof, inputof.]
1. Create dataset
2. Create software S-1
3. Generate dataset D-2
4. Article related to D-2
5. Creation of software S-2
6. Generation of dataset D-3
7. Article A-2 related to D-3
8. Correct article identifier (A-2 becomes A-3)
9. Dataset connections (perspective view of the graph)
10. Creator connections (perspective view of the graph)
11. Associate resources with identity
12. Associate resources with more identities
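The build-up shown in these diagrams, where each incoming DiSCO adds statements to one shared graph and a "connections" view follows links outward from a resource, can be approximated with a short Python sketch. The edge directions, the meaning assigned to `source`, and the `merge` and `connected` helpers are assumptions made for illustration; only the node and edge labels come from the slides.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# One set of statements per incoming DiSCO (directions are assumed).
incoming_discos = [
    {("D-2", "outputof", "S-1")},
    {("A-1", "source", "D-2")},
    {("D-2", "inputof", "S-2"), ("D-3", "outputof", "S-2")},
    {("A-3", "source", "D-3")},
]


def merge(discos: Iterable[Set[Triple]]) -> Set[Triple]:
    """Union the statements of all DiSCOs into a single graph."""
    graph: Set[Triple] = set()
    for statements in discos:
        graph |= statements
    return graph


def connected(graph: Set[Triple], start: str) -> Set[str]:
    """Return every resource reachable from start, ignoring edge direction."""
    neighbours: Dict[str, Set[str]] = {}
    for subject, _predicate, obj in graph:
        neighbours.setdefault(subject, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subject)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


graph = merge(incoming_discos)
print(sorted(connected(graph, "D-2")))
```

Starting from dataset D-2, the traversal reaches both articles, both software items, and dataset D-3, even though no single DiSCO mentions all of them.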

RESTful APIs
- Programming language independent
- Easy to test with web tools (curl, wget)
- Abstraction away from underlying implementations and models, which we expect to change more often

REST APIs (subset)
API relative paths use base=/api/{version}

Function                      HTTP verb  API relative path
Retrieve related triples      GET        /{resourceuri}/stmts
Retrieve related events       GET        /{resourceuri}/events
Retrieve related DiSCOs       GET        /{resourceuri}/discos
Create DiSCO                  POST       /disco
Retrieve DiSCO                GET        /disco/{discoid}
Update DiSCO                  POST       /disco/{discoid}/update
Delete a DiSCO                DELETE     /disco/{discoid}/delete
Retrieve an Event             GET        /event/{eventid}
Get DiSCOs related to event   GET        /event/{eventid}/discos
Perform SPARQL query          POST       /sparql
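Because the resource-oriented paths embed a full URI as a path segment, a client has to decide how to encode it. The sketch below assumes percent-encoding of the whole resource URI, which the slides do not specify; `related_path` and the version string `v1` are hypothetical names used only for illustration.

```python
from urllib.parse import quote

# The three resource-oriented lookups listed in the table above.
RELATED_KINDS = {"stmts", "events", "discos"}


def related_path(version: str, resource_uri: str, kind: str) -> str:
    """Build /api/{version}/{resourceuri}/{kind} with the URI percent-encoded.

    Percent-encoding the resource URI is an assumption of this sketch.
    """
    if kind not in RELATED_KINDS:
        raise ValueError(f"unknown lookup kind: {kind!r}")
    encoded = quote(resource_uri, safe="")  # encode ':' and '/' as well
    return f"/api/{version}/{encoded}/{kind}"


print(related_path("v1", "http://example.org/dataset/D-2", "stmts"))
```

A path built this way could then be requested with any HTTP tool, in keeping with the goal of being easy to test with curl or wget.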

API Specification and Documentation
- Behaviors
- API paths
- Data Models
- Serializations (media types, content negotiation)
- Implementations

API Description (simplified)
Function: Update DiSCO

Behavior within RMap
- Failed requests will be rolled back as a transaction, so that no manual cleanup is required.
- Insufficient authorization will result in a failed transaction and an offer to authenticate with other credentials.
- A new DiSCO will be instantiated; the previous (old) DiSCO will be marked inactive.
- The triple <new DiSCO URI> <hasversion> <old DiSCO URI> will be added.
- Resources will be instantiated for objects without identifiers (e.g., citation as string).
- Scholarly Agents will be instantiated for agents lacking URIs (e.g., as string).
- Event(s) will be created to capture the activity.

Request
- Verb/relative path: POST /disco/{id}/update
- Path parameters: {id} is the URI of the existing (old) DiSCO
- Model: Resources + relationships (like OAI-ORE)
- Serializations: RDF/XML, Turtle, or JSON-LD

Response
- Model: (custom)
- Serializations: JSON, HTML
- New DiSCO URI in header: Location: <new DiSCO URI>
- Old DiSCO URI in header: Link: <old DiSCO URI>; rel="predecessor-version"
- Event URI(s) in header: Link: <event URI>; rel="http://www.w3.org/ns/prov#wasgeneratedby"

[Enumerate response codes, labels, and their meanings]
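The update behavior described here (a new DiSCO is created, the old one is marked inactive, a version triple and an event are recorded, and a failure leaves no partial changes) can be illustrated with a toy in-memory model. Everything in this sketch is hypothetical: the `Registry` class, its snapshot-based rollback, and the `disco:N` identifiers are stand-ins chosen for illustration and say nothing about how RMap is actually implemented.

```python
import copy
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class Registry:
    """Toy in-memory stand-in for an RMap instance (hypothetical)."""

    def __init__(self) -> None:
        self.active: Dict[str, bool] = {}   # DiSCO URI -> is it active?
        self.graph: Set[Triple] = set()     # all registered statements
        self.events: List[Tuple[str, str]] = []
        self.next_id = 1

    def _new_uri(self) -> str:
        uri = f"disco:{self.next_id}"
        self.next_id += 1
        return uri

    def _add_statements(self, triples: Set[Triple]) -> None:
        for triple in triples:
            if len(triple) != 3:
                raise ValueError(f"malformed statement: {triple!r}")
            self.graph.add(triple)

    def create(self, triples: Set[Triple]) -> str:
        uri = self._new_uri()
        self._add_statements(triples)
        self.active[uri] = True
        self.events.append(("create", uri))
        return uri

    def update(self, old_uri: str, triples: Set[Triple]) -> str:
        """Register a new version; restore the prior state on any failure."""
        snapshot = copy.deepcopy(
            (self.active, self.graph, self.events, self.next_id)
        )
        try:
            if not self.active.get(old_uri, False):
                raise KeyError(f"no active DiSCO: {old_uri}")
            new_uri = self._new_uri()
            self.active[new_uri] = True
            self.active[old_uri] = False            # old version goes inactive
            self.graph.add((new_uri, "hasversion", old_uri))
            self._add_statements(triples)           # may fail part-way through
            self.events.append(("update", new_uri))
            return new_uri
        except Exception:
            # Roll back so that no manual cleanup is needed.
            self.active, self.graph, self.events, self.next_id = snapshot
            raise


registry = Registry()
old = registry.create({("A-2", "source", "D-3")})
new = registry.update(old, {("A-3", "source", "D-3")})
print(old, registry.active[old], new, registry.active[new])
```

Calling `update` with a malformed statement raises an error and leaves the registry exactly as it was before the call, which is the transactional property the slide describes.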

API Coverage
- Current focus on APIs to populate and access the graph
- Future focus
  - Authentication
  - Administration
  - Composition & normalization
  - Inferencing
  - System operability

Technical Team Activity
- Developed and captured initial set of use cases
- Developed and documented initial data model
- Specified API behaviors
- Developed and documented API methods, including REST paths, request and response formats, models, and serializations (media types)
  - Still a couple of issues to sort out
- Prototype platform implementation
- Participation in RDA Data Publishing groups
- Actively working on harvesting relationship data to push into RMap

Harvesting links and proxy registration
[Figure: Publishers and Data Repositories register directly with the RMap Instance; an RMap Harvester harvests from Publishers, Data Repositories, and an Identity Service and registers the results with the RMap Instance.]

Community Engagement

Workshop: Key Feedback
- RMap Project should be a clearinghouse or metaservice that captures information about various data linking services
- Important to add value to the publication & data linking work already underway in the community
- Having an established publisher as a research partner is a comparative advantage for the RMap Project

Workshop Feedback (continued)
- One approach would be to focus on the input side of the process (with special attention to software and research workflows) in order to create a generalizable approach to gathering content
- The challenge of secondary data, such as the inferred connections between publications and data or software, remains unaddressed and important

Some of the things you can do for RMap
- Feedback: Do the articulated use cases, approach, goals, and proposed offerings align with your interests? Where they don't, how could we better align?
- Share Your Data: As we populate our prototype, we need to gather a broad swath of test data, covering a variety of resource types (e.g., journals, repositories, funders, s, articles, data, software, instruments, samples) and the relationships that connect them.
- Use: Consider using RMap capabilities to register, discover connections to, and augment your own content, once those capabilities become available.

Some of the things RMap will do for you
- Aggregate and offer an inclusive and normalized view of distributed scholarly compound objects and associated resource relationships, including those from sources without membership in existing identity services (e.g., source code management platforms, institutional repositories).
- Reduce cost and complexity of transforming information from multiple systems.
- Provide a single mechanism to discover context (e.g., relationships and related resources) for scholarly objects in which you are interested.
- Reduce cost and complexity of developing and managing multiple interfaces for multiple systems.
- Expose records of a particular statement (e.g., who has asserted that Resource X was created by Agent Y?) or the history of assertions associated with a particular resource (i.e., what has been said about Resource X?).
- Capture sufficient provenance information to allow evaluation of assertions by their source and content.
- Streamline logic for automatic integration of citation and reference to objects of interest.
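The two provenance questions mentioned above ("who has asserted that Resource X was created by Agent Y?" and "what has been said about Resource X?") amount to simple lookups once each statement is stored together with its source. The Python sketch below shows the idea; the `Assertion` record, the predicate names, and all identifiers are assumptions made for illustration rather than RMap's data model.

```python
from typing import List, NamedTuple, Tuple

Triple = Tuple[str, str, str]


class Assertion(NamedTuple):
    """A statement together with the agent and DiSCO that asserted it."""
    statement: Triple
    asserted_by: str
    disco: str


# Placeholder data; none of these identifiers are real.
assertions = [
    Assertion(("resource:X", "createdBy", "agent:Y"), "publisher:P", "disco:1"),
    Assertion(("resource:X", "createdBy", "agent:Y"), "repository:R", "disco:2"),
    Assertion(("resource:X", "cites", "resource:Z"), "publisher:P", "disco:1"),
    Assertion(("resource:W", "cites", "resource:Z"), "repository:R", "disco:3"),
]


def who_asserted(statement: Triple) -> List[str]:
    """Agents that have asserted exactly this statement."""
    return sorted({a.asserted_by for a in assertions if a.statement == statement})


def said_about(resource: str) -> List[Assertion]:
    """Every assertion in which the resource is the subject or the object."""
    return [
        a for a in assertions
        if resource in (a.statement[0], a.statement[2])
    ]


print(who_asserted(("resource:X", "createdBy", "agent:Y")))
print(len(said_about("resource:X")))
```

Keeping the asserting agent next to each statement is also what would allow assertions to be evaluated by their source, as the following bullet suggests.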

Team Members and Acknowledgements
- Sayeed Choudhury, Tim DiLauro: Data Conservancy, Johns Hopkins
- Mark Donoghue, Gerry Grenier, Renny Guida, Ken Rawson: IEEE
- Vinay Cheruku, Karen Hanson, Amy Kirchhoff, John Meyer, Sheila Morrissey, Stephanie Orphan, Jabin White, Kate Wittenberg: Portico
- This research project is made possible through generous support from the Alfred P. Sloan Foundation
- We thank our workshop participants for their valuable feedback

Q&A For more information, please visit: http://rmap-project.info/rmap/