Combining Content, Semantic Relationships, and Web Services Fedora


Combining Content, Semantic Relationships, and Web Services Fedora CS 431 April 12, 2006 Carl Lagoze Cornell University Acknowledgements: Sandy Payette (Cornell) Sang Shin (Sun)

What is a Web Service?
- A software application
- Identified by a URI
- Interfaces and bindings described by XML
- Supports direct interactions with other software applications
- Messaging (request and response) based on XML
- Uses Internet protocols (e.g., HTTP)

Basic Web Service Architecture

Web Service Components - SOAP
Simple Object Access Protocol
- XML-based RPC
- Uses XML for data encoding
- Defines:
  - Message envelope
  - Encoding rules
  - RPC convention
  - Binding with underlying protocol
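The envelope-plus-RPC convention listed on the SOAP slide can be sketched in a few lines of Python. This is only an illustration: the operation name, the `urn:example:repository` namespace, and the `pid` parameter are hypothetical and do not reproduce Fedora's actual WSDL; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, service_ns, params):
    """Wrap an RPC-style call in a SOAP envelope and return it as a string."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # RPC convention: one element named after the operation,
    # with one child element per parameter.
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and namespace, for illustration only.
request_xml = build_soap_request(
    "getObjectProfile", "urn:example:repository", {"pid": "demo:100"}
)
print(request_xml)
```

In practice the resulting string would be sent as the body of an HTTP POST to the endpoint address named in the service's WSDL binding.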

Web Service Components - WSDL
Web Services Description Language
- Components:
  - Abstract definition of operations and messages
  - Concrete binding to network protocol and endpoint address

The Fedora Project
- Fedora: Flexible Extensible Digital Object Repository Architecture
- Open source software
- Not Red Hat!
- Mozilla Public License
- http://www.fedora.info

Heterogeneous Digital Content Conventional objects Complex, compound, dynamic objects

Fedora History
- Cornell Research (1997-present)
  - DARPA and NSF-funded research
  - First reference implementation developed
  - Distributed, Interoperable Repositories (experiments with CNRI)
  - Policy Enforcement
- First Application (1999-2001)
  - University of Virginia digital library prototype
  - Technical implementation: adapted to web; RDBMS storage
  - Scale/stress testing for 10,000,000 objects
- Open Source Software (2002-present)
  - Andrew W. Mellon Foundation grants
  - Technical implementation: XML and web services
  - Fedora 1.0 (May 2003)
  - Fedora 2.0 (Jan 2005)

Fedora Use Cases
- Digital Library Collections
- Institutional Repository
- Educational Software
- Information Network Overlay
- Digital archives and preservation
- Digital Asset Management
- Content Management System
- Scholarly publishing

Selected Fedora Users
- University of Virginia: digital library (image collector, EAD, e-texts)
- VTLS (software company): commercial product (VITAL)
- Tufts University: education (VUE/concept maps); digital library
- Northwestern: academic technologies (images, art, video, e-texts)
- National Science Digital Library (NSDL): Cornell Core Integration
- ARROW: National Library of Australia and Monash University
- Royal Library of Denmark and DTU
- Rutgers University: digital library (e-journals, numeric data)
- Indiana University: EVIA Digital Archive (video)
- American Geophysical Union: scholarly publications
- Max Planck Institute: Scholarly Communication
- Cornell University: Bear Access
- Yale University: electronic records
- New York University: humanities computing; digital library
- OhioLink
- DISA South Africa: History of Apartheid resistance

Why Fedora? (1)
- Digital Object Model
  - Abstraction for heterogeneous digital resources
  - Container for content and metadata
  - Aggregate local and remote content
  - Associate behaviors with objects (extensible service interfaces)
- Repository web service
  - Digital object storage
  - Web service APIs (SOAP and REST) to manage, access, search
- Relationships
  - Define and query object-to-object relationships
- Feature-worthy for archiving and preservation
  - XML object serialization for ingest, storage, and export
  - Content versioning
  - Event history

Why Fedora? (2)
- Content repurposing
  - Reuse digital content in different contexts
  - Re-purpose content via mechanisms for dynamically transforming content to fit new requirements
- Web Services
  - SOAP and REST bindings
  - WSDL to define interfaces
  - XML transmission
- Easy integration with other apps and systems
  - Does not assume any particular workflow or end-user application
  - Generic repository service as substrate

Digital Object Model

Graph View of Fedora Objects
[Figure: graph of three objects. info:fedora/demo:10 has hasMember arcs to info:fedora/demo:11 and info:fedora/demo:12. hasRep arcs lead to the representation nodes info:fedora/demo:10/bdef:1/members, info:fedora/demo:11/dc, info:fedora/demo:11/thumb, info:fedora/demo:11/high, info:fedora/demo:11/bdef:2/zpan, info:fedora/demo:12/dc, and info:fedora/demo:12/thumb.]

Fedora Digital Object Model: Component View
- Persistent ID (PID): digital object identifier
- Reserved datastreams: key object metadata
  - Relations (RELS-EXT)
  - Dublin Core (DC)
  - Audit Trail (AUDIT)
- Datastreams: set of content or metadata items
- Disseminators (including a default disseminator): pointers to service definitions to provide service-mediated views

The Datastream Component
4 classifications for datastreams:
- Inline XML: Fedora stores a namespaced block of XML content within the Fedora digital object XML file.
- Managed Content: Fedora stores and manages the content bytestream (non-XML content).
- External Referenced: Fedora stores a reference (URL) to the content.
- External Redirected: Fedora stores a reference (URL) to the content, but will not mediate access to the content (optimized for streaming).
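The four datastream classifications can be captured as a small enumeration. The FOXML examples later in this deck show the CONTROL_GROUP codes "X" (inline XML) and "E" (external referenced); the codes "M" and "R" for the other two classes are stated here from general knowledge of Fedora rather than from these slides, and the two helper functions are hypothetical names that simply restate the slide's descriptions.

```python
from enum import Enum


class ControlGroup(Enum):
    """Datastream classifications keyed by FOXML CONTROL_GROUP code."""
    INLINE_XML = "X"   # XML block stored inside the object's XML file
    MANAGED = "M"      # bytestream stored and managed by the repository (assumed code)
    EXTERNAL = "E"     # repository stores a URL and mediates access
    REDIRECT = "R"     # repository stores a URL but does not mediate access (assumed code)


def repository_stores_content(group):
    """True when the content itself lives in the repository, not just a URL."""
    return group in (ControlGroup.INLINE_XML, ControlGroup.MANAGED)


def repository_mediates_access(group):
    """True unless clients are redirected straight to the remote content."""
    return group is not ControlGroup.REDIRECT


for group in ControlGroup:
    print(group.name, repository_stores_content(group), repository_mediates_access(group))
```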

Simple Fedora model for aggregating static content
- Representations map to datastreams
- Datastreams may be local or surrogates (redirect) to remote data
- REST URLs give client access to representations

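Digital Object Aggregating Local Content
[Figure: digital object demo:100 in the Fedora repository, with a digital object header and the datastreams DC, HTML, PDF, and TEX. Through the Fedora API, http://localhost:8080/fedora/get/demo:100 returns a page for the object, and http://localhost:8080/fedora/get/demo:100/pdf returns the PDF datastream as application/pdf.]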

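Digital Object for Local and Remote Content
[Figure: digital object demo:200 in the Fedora repository, with a digital object header and the datastreams DC, IMAGE1, IMAGE2, IMAGE3, and MyText. Through the Fedora API, http://localhost:8080/fedora/get/demo:200 returns a page for the object, and http://localhost:8080/fedora/get/demo:100/image3 returns IMAGE3 as image/jpeg.]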
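The REST URLs on the two preceding slides follow the pattern base/get/PID for an object and base/get/PID/datastreamID for one of its representations. A minimal sketch of building them, assuming only that pattern; the function name `get_url` is hypothetical:

```python
from urllib.parse import quote


def get_url(base, pid, datastream_id=None):
    """Build a REST URL for an object, or for one of its datastreams.

    The colon in a PID such as "demo:100" is kept unescaped, as on the slides.
    """
    parts = [base.rstrip("/"), "get", quote(pid, safe=":")]
    if datastream_id is not None:
        parts.append(quote(datastream_id))
    return "/".join(parts)


BASE = "http://localhost:8080/fedora"
print(get_url(BASE, "demo:100"))
print(get_url(BASE, "demo:100", "pdf"))
```

Because the client only sees these URLs, it cannot tell whether the bytes behind a representation are stored locally or fetched from a remote source.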

Fedora for dynamic content
- Representations map to service-based transforms of data (in addition to static datastreams)
- Opaque to REST-based access (clients see only representations, not how they are produced)
- Motivating examples
  - Canonical XML metadata format, with XSLT to Dublin Core
  - Document source in TeX, with programmatic transform to PDF, PS, HTML, etc.

Understanding Dynamic Disseminations (1)

Understanding Dynamic Disseminations (2)
- Behavior Definition (bdef)
  - Special digital object defining client-side functionality (method template)
- Behavior Mechanism (bmech)
  - Special digital object that refines a bdef by defining:
    - Data profile: set of datastreams required for execution
    - Service binding: where the work is performed
  - There may be many bmechs for a bdef
- Disseminator
  - Association of a bmech/bdef with a digital object, endowing it with bdef-defined functionality (methods)
  - A digital object may have multiple disseminators (polymorphic typing)

Understanding Dynamic Disseminations (3)
[Figure: a client sends data dissemination requests to the Fedora API, which offers a uniform API with data and service mediation. Datastreams act as data surrogates, and a web service transforms data according to arguments. The bdef defines client-visible operations and augments the API. The bmech defines the data binding profile and registers the service. The disseminator's operations are defined by the bdef, and its service and data binding are defined by the bmech.]

Dynamic Dissemination Access
[Figure: flow through the Fedora repository and an external web service.
- Client request: http://localhost:8080/fedora/get/demo:2/demo:bdef1/m1?arg1=val1
- Data access from dependent datastreams (DC, Datastream1, Datastream2, Datastream3) through Disseminator1
- Web service invocation: http://otherhost.org/servlet/service1?arg1=ds1&arg2=ds2&arg3=val1
- Web service response, passed back through the Fedora API as the client response]
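The mediation step in the dynamic dissemination flow can be sketched as follows: the repository combines the datastream bindings recorded for a disseminator with the arguments supplied by the client and calls the backing service. The function and variable names are hypothetical, and the parameter mapping is inferred from the two URLs on the slide, not taken from Fedora's implementation.

```python
from urllib.parse import urlencode


def build_service_call(service_url, datastream_bindings, client_args):
    """Combine bound datastreams and client arguments into one service URL.

    datastream_bindings: service parameter name -> datastream reference
    client_args: service parameter name -> value supplied by the client
    """
    params = dict(datastream_bindings)  # data binding from the disseminator
    params.update(client_args)          # arguments from the client request
    return service_url + "?" + urlencode(params)


# Values mirror the slide: two bound datastreams plus one client argument.
invocation = build_service_call(
    "http://otherhost.org/servlet/service1",
    {"arg1": "ds1", "arg2": "ds2"},
    {"arg3": "val1"},
)
print(invocation)
```

The client never sees this second URL; it only receives the service's response through the repository.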

Dynamic Dissemination Example
[Figure: flow through the Fedora repository and a Saxon service.
- Client request: http://localhost:8080/fedora/get/demo:300/demo:ex3bdef/getcontent
- Data access from dependent datastreams (DC, XSL (XML to HTML), Source (poem data)) through Disseminator1
- Web service invocation: http://localhost:8080/service/saxon?xsl=xsl&source=source
- Saxon response, passed back through the Fedora API as HTML output]

Fedora XML for digital objects
- FOXML (Fedora Object XML)
  - Simple XML format that directly expresses the Fedora object model
  - Easily adapts to new and planned Fedora features
  - Easily translated to other well-known formats
  - Internal storage format for objects in the repository
- XML-based ingest/export of objects
  - FOXML, METS (Fedora extension)
  - Extensible to accommodate new XML formats
  - Planned: METS 1.4, MPEG21 DIDL

FOXML Object Properties
<foxml:objectProperties>
  <foxml:property NAME="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" VALUE="FedoraObject"/>
  <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="A"/>
  <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="Sandy's Test Object"/>
  <foxml:property NAME="info:fedora/fedora-system:def/model#contentModel" VALUE="TEST"/>
</foxml:objectProperties>
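Reading such a property block back into a dictionary is straightforward with a namespace-aware parser. The sketch below assumes the FOXML namespace URI `info:fedora/fedora-system:def/foxml#` and camelCase element names, neither of which is shown on the slide; the sample is a shortened copy of the slide's example with the namespace declaration added so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Assumed FOXML namespace URI (not shown on the slide).
FOXML_NS = "info:fedora/fedora-system:def/foxml#"

SAMPLE = """\
<foxml:objectProperties xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="A"/>
  <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="Sandy's Test Object"/>
  <foxml:property NAME="info:fedora/fedora-system:def/model#contentModel" VALUE="TEST"/>
</foxml:objectProperties>
"""


def read_properties(xml_text):
    """Return {short property name: value}, keyed by the part after '#'."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter(f"{{{FOXML_NS}}}property"):
        short_name = prop.get("NAME").rsplit("#", 1)[-1]
        props[short_name] = prop.get("VALUE")
    return props


properties = read_properties(SAMPLE)
print(properties)
```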

FOXML Datastream (type E)
<foxml:datastream CONTROL_GROUP="E" ID="DS5" STATE="A" VERSIONABLE="true">
  <foxml:datastreamVersion ID="DS5.0" MIMETYPE="image/x-mrsid-image" LABEL="Pavilion III">
    <foxml:contentLocation REF="http://iris.lib.virginia.edu/mrsid//archerp01.sid" TYPE="URL"/>
  </foxml:datastreamVersion>
</foxml:datastream>

FOXML Relationships Datastream
<foxml:datastream ID="RELS-EXT" CONTROL_GROUP="X">
  <foxml:datastreamVersion ID="RELS-EXT.0" MIMETYPE="text/xml" LABEL="Relationship Metadata">
    <foxml:xmlContent>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="info:fedora/image:100">
          <fedora:isMemberOfCollection rdf:resource="info:fedora/history:49"/>
          <fedora:isMemberOfCollection rdf:resource="info:fedora/architecture:48"/>
        </rdf:Description>
      </rdf:RDF>
    </foxml:xmlContent>
  </foxml:datastreamVersion>
</foxml:datastream>
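The RDF carried in a RELS-EXT datastream reduces to subject/predicate/object triples. A minimal sketch of extracting them, assuming the `fedora:` prefix is bound to `info:fedora/fedora-system:def/relations-external#` (the slide does not show that declaration, so the URI here is an assumption):

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The xmlns:fedora binding below is an assumption; the slide omits it.
RELS_EXT = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/image:100">
    <fedora:isMemberOfCollection rdf:resource="info:fedora/history:49"/>
    <fedora:isMemberOfCollection rdf:resource="info:fedora/architecture:48"/>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for relation in description:
            # ElementTree tags look like "{namespace}local"; join the two
            # parts to recover the full predicate URI.
            namespace, local = relation.tag[1:].split("}")
            obj = relation.get(f"{{{RDF_NS}}}resource")
            triples.append((subject, namespace + local, obj))
    return triples


triples = extract_triples(RELS_EXT)
for triple in triples:
    print(triple)
```

This handles only the flat resource-valued form used on the slide, not RDF/XML in general.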

FOXML Disseminator
<foxml:disseminator ID="DISS2" BDEF_CONTRACT_PID="demo:8" STATE="A" VERSIONABLE="true">
  <foxml:disseminatorVersion ID="DISS2.0" BMECH_SERVICE_PID="demo:9" LABEL="MrSID Service">
    <foxml:serviceInputMap>
      <foxml:datastreamBinding DATASTREAM_ID="DS5" KEY="MRSID" LABEL="Image binding"/>
    </foxml:serviceInputMap>
  </foxml:disseminatorVersion>
</foxml:disseminator>

Fedora Resource Index: Using RDF and ontologies

Fedora Digital Objects: Resource Index View
[Figure: the object graph extended with properties. info:fedora/demo:10 has hasMember arcs to info:fedora/demo:11 and info:fedora/demo:12. hasRep arcs lead to info:fedora/demo:10/bdef:1/members, info:fedora/demo:11/dc, info:fedora/demo:11/bdef:2/gethigh, info:fedora/demo:12/dc, and info:fedora/demo:12/bdef:2/gethigh. Each object also carries dc:creator and lastmoddate arcs to literal values; the literals shown are "Eddie Shin", "Chris Wilper", "Elly Cramer", "2005-01-10:11:02", "2005-02-01:12:05", and "2005-01-01:10:00".]

Fedora 2.0 and RDF
- Object-to-object relationships
  - Ontology of common relationships (RDF schema)
  - Relationships stored in a special datastream (RELS-EXT)
- Resource Index (RI)
  - RDF-based index of the repository (Kowari triple-store)
  - Graph-based index includes:
    - Object properties and Dublin Core
    - Object relationships
    - Object disseminations
- RI Search
  - Powerful querying of the graph of inter-related objects
  - REST-based query interface (using RDQL or ITQL)
  - Results in different formats (triples, tuples, SPARQL)

Uses of Object Relationships
- Define collections (e.g., collection objects)
- Assert critical relationships among objects for management purposes
- Enable network overlay
  - Surrogate objects referring to external entities
  - Assert relationships among them
  - Assert other relationships (e.g., annotations)
- Enable navigation of the repository (as tree or graph)

Fedora Relationship Ontology (RDFS)
- isPartOf / hasPart
- isMemberOf / hasMember
- isDescriptionOf / hasDescription
- hasEquivalent
- others

Demo: Collection Member Relationships
- Collection Object [smiley]
  - Datastream containing a query to the Resource Index for all members of the collection
- Image Objects [brush]
  - Use the RELS-EXT datastream to assert the relationship to the collection object
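The "all members of a collection" query in this demo amounts to pattern matching over triples. The sketch below uses a plain in-memory list as a stand-in for the Resource Index; it is not Kowari, RDQL, or ITQL, and the triples, predicate spelling, and function names are illustrative assumptions.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def members_of(triples, collection):
    """All subjects asserting isMemberOf for the given collection."""
    return sorted(t[0] for t in match(triples, predicate="isMemberOf", obj=collection))


# Illustrative triples: two image objects assert membership in one collection.
INDEX = [
    ("info:fedora/demo:11", "isMemberOf", "info:fedora/demo:10"),
    ("info:fedora/demo:12", "isMemberOf", "info:fedora/demo:10"),
    ("info:fedora/demo:11", "dc:creator", "Eddie Shin"),
]

print(members_of(INDEX, "info:fedora/demo:10"))
```

The design point carried over from the slide is that membership is asserted on the member objects, so the collection object needs only a stored query rather than a maintained list.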

Fedora Repository Service

Fedora Repository Service
[Figure: repository architecture. Clients (client application, web browser, batch program, server application) connect over HTTP and SOAP to a web service exposure layer offering Manage, Access, Search, and OAI Provider interfaces, with user authentication. Behind it are the Management Subsystem (object management, object validation, PID generation), the Security Subsystem (policy management, policy enforcement, users/groups, policies), the Access Subsystem (search, object reflection, dissemination, with remote services reached over SOAP and local services over HTTP), and the Storage Subsystem (digital objects, datastreams, RDBMS, content, XML, triples). An external content retriever fetches from external content sources over HTTP.]

Fedora Repository: 3 Layers
1. Interfaces
   - Access/Search Service
   - Management Service
   - OAI Provider Service
   - Resource Index Service
2. Modules
   - Configurable modules that implement all repository functionality in terms of the Fedora digital object model
3. Persistent Store
   - RDBMS
     - Digital object registry
     - Object cache for performance
   - File System
     - XML object serializations
     - Managed Content (Datastreams)

Fedora 2.0 Server Feature Set
- Management module
  - Ingest and Export (NEW! METS or FOXML)
  - Validation (XML and Schematron Rules)
  - PID assignment
  - Replication to object cache
  - Incremental indexing of metadata
  - Object create/modify/delete/purge
- XML Translation module
  - METS or FOXML ingest and export
  - Convert between formats
- Storage module
  - File system for XML object wrappers
  - Relational DB object registry and object cache
- Content Versioning
  - Automatic version control for datastreams and disseminators
  - Enables date-time stamped API requests (see object as it looked then)
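A date-time stamped request ("see object as it looked then") boils down to choosing the newest version created at or before the requested time. A minimal sketch of that selection, with a hypothetical version list; this illustrates the idea and is not Fedora's implementation.

```python
from bisect import bisect_right
from datetime import datetime


def version_as_of(versions, when):
    """Return the ID of the newest version created at or before `when`.

    versions: list of (created, version_id) tuples sorted by creation time.
    Returns None if no version existed yet at `when`.
    """
    created_times = [created for created, _ in versions]
    index = bisect_right(created_times, when)
    return versions[index - 1][1] if index else None


# Hypothetical version history for one datastream.
HISTORY = [
    (datetime(2005, 1, 10, 11, 2), "DS5.0"),
    (datetime(2005, 2, 1, 12, 5), "DS5.1"),
]

print(version_as_of(HISTORY, datetime(2005, 1, 20)))
print(version_as_of(HISTORY, datetime(2006, 4, 12)))
```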

Fedora 2.0 Server Feature Set
- Access and Dissemination modules
  - Mediation: auto-dispatching to distributed web services for content transformation
  - Built-in services: XSLT, image manipulation, XML-to-PDF
- Search module
  - Searching of object properties and the DC record of each object
- Security module
  - HTTP Basic Authentication and simple access control
  - NEW! LDAP tie-in for user attributes
  - NEW! XACML policies and policy enforcement
  - Future: Shibboleth
- OAI-PMH
- Resource Index
  - RDF-based index of the repository (Kowari triple-store)
  - Contains key object attributes, DC, relationships
  - REST-based query interface (using RDQL or ITQL)

Fedora Web Service APIs in a Nutshell
Management Service (API-M)
- Ingest Object
- Export Object
- Get Object XML
- Purge Object
- Modify Object
- Get Next PID
- Get Datastream(s)
- Get DatastreamHistory
- Get DisseminatorHistory
- Get Disseminator(s)
- Add/modify/purge Datastream
- Add/modify/purge Disseminator
- Set State

Fedora Web Service APIs in a Nutshell
Access Service (API-A and API-A-LITE)
- Describe Repository
- Get Object Profile
- Get Object History
- Get Datastream
- Get Dissemination
- Find Objects
- Resume Find Objects

Fedora Web Service APIs in a Nutshell
- API-A-Lite
  - Repository-level operations:
    - fedora/describe: describe the repository
    - fedora/search: methods to locate objects via the default repository index
  - Object-level operations:
    - fedora/get: method to get the object profile
    - fedora/get/..: method to disseminate a view of an object's content
    - fedora/getMethods: methods to get information about all disseminations available on an object
- OAI-PMH Provider Service
  - All OAI-PMH methods to harvest OAI-DC from each object
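Harvesting through the OAI-PMH provider means issuing HTTP requests whose `verb` parameter names the protocol operation. The sketch below builds such request URLs; the `verb` and `metadataPrefix` parameters and the `oai_dc` prefix come from the OAI-PMH protocol itself, while the base URL path `/fedora/oai` is an assumption for illustration and should be replaced by whatever a given installation exposes.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Assumed endpoint path; not taken from the slides.
OAI_BASE = "http://localhost:8080/fedora/oai"

print(oai_request(OAI_BASE, "Identify"))
print(oai_request(OAI_BASE, "ListRecords", metadataPrefix="oai_dc"))
```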

Fedora 2.0 - Clients
- Fedora Administrator (via Fedora SOAP interfaces)
  - Java Swing client
  - Ingest/Export objects
  - Batch creation and modification of objects
  - One-up creation and modification of objects
  - Search repository
  - Wizards for creating BDEF/BMECH objects
- Web Browser (via Fedora REST interfaces)
  - Access, Search, OAI
  - Resource Index
  - Selected management operations
- Command Line Utilities
  - Ingest, export, purge
  - Migration

Fedora Software Distribution
- Open Source (Mozilla Public License)
- 100% Java (Sun Java J2SDK1.4)
- Supporting Technologies
  - Apache Tomcat and Apache Axis (SOAP)
  - Xerces for XML parsing and validation
  - Saxon for XSLT transformation
  - Schematron for validation
  - MySQL and Mckoi relational database
  - Oracle 9i support
  - Kowari for triple-store
- Deployment Platforms
  - Windows 2000, NT, XP
  - Solaris
  - Linux
  - Mac OS X

Fedora 2.1.1 (April 2006)
- Authentication plug-ins
  - HTTP basic authentication and SSL
  - Plug-in #1: user/password file
  - Plug-in #2: LDAP tie-in
  - Plug-in #3: Radius Authentication
- Authorization module
  - XACML policy enforcement for API operations
- New OAI Provider (stand-alone service)
- Support for MPEG21-DIDL (ingest/export/OAI)
- Performance testing and improvements
- Integration with Storage Resource Broker