A Repository of Metadata Crosswalks. Jean Godby, Devon Smith, Eric Childress, Jeffrey A. Young. OCLC Online Computer Library Center, Office of Research. DLF-2004 Spring Forum, April 21, 2004.
Outline of this talk: Crosswalks and metadata translation. Solution 1: A collection of XSLT scripts. Solution 2: Pathfinders for crosswalks. Solution 3: A crosswalk repository. Open issues. Project status.
Our research project goals: A robust design for metadata translation. A clean separation of: document data model, schema translations, machinery. Support for current practice and foreseeable innovation. A metadata translation system/toolkit. An unplugged service for metadata translation. A place for human input (intellectual mappings) in an automated system.
Our system design. [Diagram: a crosswalk repository client supplies a metadata crosswalk and a record translation client supplies a record to the Metadata Translator, which produces a transformed record.]
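The flow in the diagram (a crosswalk and a record go into the Metadata Translator; a transformed record comes out) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not OCLC's implementation: the class names `Crosswalk`, `CrosswalkRepository`, and `MetadataTranslator` are hypothetical, and a crosswalk is reduced to a flat element-to-element table.

```python
from dataclasses import dataclass, field


@dataclass
class Crosswalk:
    """Hypothetical crosswalk: a table from source element names to target element names."""
    source_schema: str
    target_schema: str
    mapping: dict[str, str]


@dataclass
class CrosswalkRepository:
    """Hypothetical in-memory stand-in for the crosswalk repository."""
    crosswalks: list[Crosswalk] = field(default_factory=list)

    def add(self, crosswalk: Crosswalk) -> None:
        self.crosswalks.append(crosswalk)

    def find(self, source_schema: str, target_schema: str) -> Crosswalk:
        """What a repository client asks for: a crosswalk between two schemas."""
        for cw in self.crosswalks:
            if (cw.source_schema, cw.target_schema) == (source_schema, target_schema):
                return cw
        raise LookupError(f"no crosswalk from {source_schema} to {target_schema}")


class MetadataTranslator:
    """Generic machinery: applies whatever crosswalk it is given to a record."""

    def translate(self, record: dict[str, str], crosswalk: Crosswalk) -> dict[str, str]:
        # Source elements without a mapping are silently dropped.
        return {crosswalk.mapping[name]: value
                for name, value in record.items() if name in crosswalk.mapping}


repo = CrosswalkRepository()
repo.add(Crosswalk("MARC 21", "Dublin Core", {"245a": "dc:title", "260b": "dc:publisher"}))
record = {"245a": "Hamlet", "260a": "New York", "260b": "Penguin Books"}
print(MetadataTranslator().translate(record, repo.find("MARC 21", "Dublin Core")))
```

The translator holds no knowledge of any particular schema; everything schema-specific lives in the crosswalk fetched from the repository, which is one way to keep the machinery separate from the translations.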
The problem: Reusing metadata often requires translation. Available translation options typically support single use cases, produce low-fidelity translations, and are cumbersome to maintain. For example, the MARC record 100 ‡a Shakespeare, William, ‡d 1564-1616. 245 ‡a Hamlet. 260 ‡a New York : ‡b Penguin Books, ‡c c2003. may come out of a translation as <dc:creator>shakespeare, William,1564-1616</dc:creator> <dc:title>hamlet</dc:title> <dc:publisher>penguin Books</dc:publisher> <dc:date>2003</dc:date>
What is a crosswalk? Crosswalks are used to translate between different metadata element sets. The elements (or fields) in one metadata set are correlated with the elements of another metadata set that have the same or similar meanings. This is also sometimes called semantic mapping. Source: Canadian Heritage Information Network (http://www.chin.gc.ca/english/)
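To make the definition concrete, here is a sketch of a tiny MARC-to-Dublin Core crosswalk applied to the Hamlet record from the problem slide. The field/subfield-to-element table and the punctuation cleanup are illustrative assumptions, far simpler than a production MARC-to-DC crosswalk; the Dublin Core namespace URI is the published one.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk table: (MARC tag, subfield codes) -> Dublin Core element.
MARC_TO_DC = {
    ("100", "ad"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}

# The Hamlet record as {tag: {subfield code: value}}, ISBD punctuation included.
HAMLET = {
    "100": {"a": "Shakespeare, William,", "d": "1564-1616."},
    "245": {"a": "Hamlet."},
    "260": {"a": "New York :", "b": "Penguin Books,", "c": "c2003."},
}


def clean(value: str) -> str:
    """Strip trailing ISBD punctuation and spaces, e.g. 'Hamlet.' -> 'Hamlet'."""
    return value.rstrip(" ,.:;/")


def marc_to_dc(marc: dict[str, dict[str, str]]) -> ET.Element:
    root = ET.Element("record")
    for (tag, codes), element in MARC_TO_DC.items():
        subfields = marc.get(tag, {})
        parts = [clean(subfields[code]) for code in codes if code in subfields]
        if not parts:
            continue
        text = ", ".join(parts)
        if element == "date":
            text = text.lstrip("c")  # drop the copyright prefix: 'c2003' -> '2003'
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = text
    return root


print(ET.tostring(marc_to_dc(HAMLET), encoding="unicode"))
```

Because 260 ‡a (New York) has no entry in the table, the place of publication is lost, a small instance of the fidelity problem described above.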
An example: Dublin Core to Encoded Archival Description (EAD)
Why use XSLT for crosswalks? It capitalizes on current trends that model structured text in XML. It is a reasonable solution for lightweight processes and simple semantic mappings. So, an XSLT repository would: Reduce duplication of effort. Promote the use of standards.
A test client: Which crosswalks are equivalent? If they're not equivalent, how do they differ? Which crosswalks have XML schemas that match my data?
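A test client for the equivalence question might run two crosswalks over the same sample records and report where their outputs diverge. A minimal sketch, assuming crosswalks can be treated as plain functions from a record to a dict of target elements; `compare_crosswalks` and the two sample crosswalks are hypothetical.

```python
from typing import Callable

Record = dict[str, str]
CrosswalkFn = Callable[[Record], dict[str, str]]


def compare_crosswalks(a: CrosswalkFn, b: CrosswalkFn, samples: list[Record]) -> list[dict]:
    """Return one entry per (sample, element) where the two crosswalks disagree."""
    differences = []
    for i, record in enumerate(samples):
        out_a, out_b = a(record), b(record)
        for element in sorted(out_a.keys() | out_b.keys()):
            if out_a.get(element) != out_b.get(element):
                differences.append({"sample": i, "element": element,
                                    "a": out_a.get(element), "b": out_b.get(element)})
    return differences


# Two hypothetical MARC-to-DC crosswalks that differ only in how they treat 260 $c.
def crosswalk_a(r: Record) -> dict[str, str]:
    return {"dc:title": r["245a"], "dc:date": r["260c"]}


def crosswalk_b(r: Record) -> dict[str, str]:
    return {"dc:title": r["245a"], "dc:date": r["260c"].lstrip("c")}


samples = [{"245a": "Hamlet", "260c": "c2003"}]
for diff in compare_crosswalks(crosswalk_a, crosswalk_b, samples):
    print(diff)
```

An empty result only shows that the crosswalks agree on the records tested, not that they are equivalent in general, so the choice of sample records matters as much as the comparison itself.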
A crosswalk pathfinder
Some problems. In the XSLT collection: Information needed for executing the scripts is missing. Undocumented XSLT scripts aren't crosswalks, as we have defined them. Syntax and semantics have been dissociated. The collection can't be mined. In the pathfinder page: The documents vary in scope and granularity. Executable code is often difficult to locate. Pathfinders aren't designed for browsing and searching.
The crosswalk repository (3.0). Components of our solution: Model the crosswalk as a complex object using the Metadata Encoding and Transmission Standard (METS), maintained by the Library of Congress. Assemble the records into a searchable repository built on Open Archives Initiative (OAI) standards.
A crosswalk as a METS record: Describe the crosswalk object in the METS header. Assemble and identify six objects in the METS structural map: the source metadata schema, the target metadata schema, and the crosswalk, each in a human-readable and an executable version. Associate metadata with each file in the METS Descriptive Metadata Section.
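The following sketch builds a skeletal METS document for one crosswalk with Python's standard `xml.etree.ElementTree`. The METS elements and attributes used (`metsHdr`, `dmdSec`/`mdWrap`, `fileSec`, `file`/`DMDID`, `FLocat`, `structMap`, `div`, `fptr`) come from the METS schema, but the component labels, IDs, and `example.org` URLs are hypothetical placeholders, and each `dmdSec` holds only a Dublin Core title where a real record would carry fuller descriptive metadata.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("dc", DC)):
    ET.register_namespace(prefix, uri)

# Hypothetical components: (role, label, human-readable URL, executable URL).
COMPONENTS = [
    ("source", "MARC 21 schema", "http://example.org/marc.html", "http://example.org/marc.xsd"),
    ("target", "Dublin Core schema", "http://example.org/dc.html", "http://example.org/dc.xsd"),
    ("crosswalk", "MARC 21 to Dublin Core", "http://example.org/cw.html", "http://example.org/cw.xsl"),
]


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def crosswalk_mets(label: str) -> ET.Element:
    mets = ET.Element(m("mets"), {"LABEL": label})
    # METS header: describes the crosswalk object as a whole.
    ET.SubElement(mets, m("metsHdr"), {"CREATEDATE": "2004-04-21T00:00:00"})

    # One descriptive metadata section per file (here just a Dublin Core title).
    files = []  # (file ID, dmdSec ID, role, URL)
    for role, name, html_url, exec_url in COMPONENTS:
        for version, url in (("human-readable", html_url), ("executable", exec_url)):
            n = len(files) + 1
            files.append((f"FILE{n}", f"DMD{n}", role, url))
            dmd = ET.SubElement(mets, m("dmdSec"), {"ID": f"DMD{n}"})
            wrap = ET.SubElement(dmd, m("mdWrap"), {"MDTYPE": "DC"})
            xml_data = ET.SubElement(wrap, m("xmlData"))
            ET.SubElement(xml_data, f"{{{DC}}}title").text = f"{name} ({version})"

    # File section: the six files, each linked to its metadata and its location.
    group = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"))
    for file_id, dmd_id, _, url in files:
        f = ET.SubElement(group, m("file"), {"ID": file_id, "DMDID": dmd_id})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": url})

    # Structural map: one div per component, pointing at both of its versions.
    struct_map = ET.SubElement(mets, m("structMap"))
    top = ET.SubElement(struct_map, m("div"), {"TYPE": "crosswalk object", "LABEL": label})
    for role, name, _, _ in COMPONENTS:
        div = ET.SubElement(top, m("div"), {"TYPE": role, "LABEL": name})
        for file_id, _, file_role, _ in files:
            if file_role == role:
                ET.SubElement(div, m("fptr"), {"FILEID": file_id})
    return mets


print(ET.tostring(crosswalk_mets("MARC 21 to Dublin Core crosswalk"), encoding="unicode"))
```

The sections are emitted in the order the METS schema expects (header, descriptive metadata, files, structural map), so three components times two versions yields the six files and six pointers described on the slide.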
OCLC's OAI repository: "The Open Archives Initiative develops and promotes interoperability standards to facilitate the efficient dissemination of content. Primary among these is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)." (Source: oclc.org/research) The repository collects publicly accessible XML-encoded metadata on a range of research projects, including thesauri and Electronic Theses and Dissertations, into a searchable collection. It includes the XML tools required to manage the data, such as XSLT scripts, and serves as a testbed for research on information registries, URL maintenance, and searching.
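OAI-PMH requests are ordinary HTTP GET requests whose `verb` parameter names one of six operations. The sketch below only builds request URLs. The base URL is a hypothetical placeholder, not the address of OCLC's repository, and `mets` as a `metadataPrefix` and `crosswalks` as a set name are assumptions for illustration; a repository reports the prefixes it actually supports in its `ListMetadataFormats` response.

```python
from urllib.parse import urlencode

BASE_URL = "http://repository.example.org/oai"  # hypothetical endpoint

# The six verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL for one of the protocol verbs."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    return f"{BASE_URL}?{urlencode({'verb': verb, **params})}"


# Harvest every record in a (hypothetical) set of crosswalk records, encoded as METS.
print(oai_request("ListRecords", metadataPrefix="mets", set="crosswalks"))
```

Because `from` is a Python keyword, a date-limited harvest would pass it through a dict, e.g. `oai_request("ListRecords", **{"metadataPrefix": "mets", "from": "2004-01-01"})`.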
A crosswalk METS record in the OAI repository
What the METS encoding solves: The semantic and syntactic information required for interpreting and executing a crosswalk is collected into a single object. The repository is searchable by humans and automated processes. Services can be built on top of it. It encourages the development and standardization of crosswalks. These outcomes are possible because every component in the system is a standard.
Some possible services. Translations. Queries: Which encodings have been done for a given metadata schema or namespace? Interactions with data: Given the XML schema and namespace referenced in my data, does this repository have any XSLT scripts that process it? Documentation: The METS crosswalk object can be associated with a given set of records to document which standards, versions, and scripts were used to convert them.
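The "interactions with data" service can be approximated by reading the namespace of a document's root element and looking up crosswalks whose source namespace matches. A minimal sketch: `CrosswalkEntry`, `scripts_for`, and the repository entries are hypothetical, while the namespace URIs are the published ones for MARCXML, Dublin Core, and EAD 2002.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class CrosswalkEntry:
    """Hypothetical summary of one crosswalk record held in the repository."""
    label: str
    source_namespace: str
    target_namespace: str
    xslt_url: str


# Hypothetical repository contents with real namespace URIs.
REPOSITORY = [
    CrosswalkEntry("MARCXML to Dublin Core", "http://www.loc.gov/MARC21/slim",
                   "http://purl.org/dc/elements/1.1/", "http://example.org/marc2dc.xsl"),
    CrosswalkEntry("Dublin Core to EAD", "http://purl.org/dc/elements/1.1/",
                   "urn:isbn:1-931666-22-9", "http://example.org/dc2ead.xsl"),
]


def root_namespace(xml_text: str) -> str:
    """Namespace URI of the document element, or '' if it is not namespaced."""
    tag = ET.fromstring(xml_text).tag  # e.g. '{http://www.loc.gov/MARC21/slim}record'
    return tag[1:tag.index("}")] if tag.startswith("{") else ""


def scripts_for(xml_text: str, entries: list[CrosswalkEntry]) -> list[CrosswalkEntry]:
    """Crosswalks whose source namespace matches the namespace used in the data."""
    ns = root_namespace(xml_text)
    return [e for e in entries if e.source_namespace == ns]


my_data = '<record xmlns="http://www.loc.gov/MARC21/slim"><leader/></record>'
print([e.label for e in scripts_for(my_data, REPOSITORY)])
```

Matching on the root namespace alone is deliberately coarse; a fuller service would also compare the schema version referenced in the data, since one namespace can cover several schema versions.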
Open issue 1: Are crosswalks a potential standard, or just a local solution for the management of heterogeneous data? "The crosswalk is a preliminary one aimed at transforming relatively simple METS documents." (Yee and Beaubien, 2003) "Crosswalks that extend interoperability are essential so that the digital library collections can be accessible through a variety of portals and search interfaces. As more organizations share what they have learned the development of crosswalks will be better understood and more easily accomplished." (Lightle and Ridgway, 2003)
Open issue 2: Is XSLT the best tool for metadata translation? XSLT is cumbersome when high-fidelity translations are needed. More precise associations between syntax and semantics may be necessary. The supporting documentation required for verifiable translation is daunting, because the number of combinations to document grows as (versions × encodings of Metadata Schema X) × (versions × encodings of Metadata Schema Y).
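A small calculation shows how quickly that product grows. Reading the slide's expression as a count of pairs, each (version, encoding) variant of schema X may need its own documented crosswalk to each variant of schema Y. The counts below are made-up example values, not measurements of any real schema.

```python
def crosswalk_variants(versions_x: int, encodings_x: int,
                       versions_y: int, encodings_y: int) -> int:
    """Number of X-variant-to-Y-variant pairs that may each need a documented crosswalk."""
    return (versions_x * encodings_x) * (versions_y * encodings_y)


# e.g. 3 versions x 2 encodings of schema X, 2 versions x 2 encodings of schema Y
print(crosswalk_variants(3, 2, 2, 2))  # 24
```

This counts one direction only; translating in both directions doubles it.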
Project status: The OCLC OAI repository is accessible at http://errol.oclc.org/xmlregistry.oclc.org.html Advanced searching using the SRW (Search/Retrieve Web service) protocol is currently being implemented. The registry is being populated with crosswalk records. We welcome your comments and participation!
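SRW's companion protocol, SRU, carries the same searchRetrieve operation as URL parameters, which makes the shape of a query easy to show. A minimal sketch: the base URL, the `dc.title` index, and the `dc` record schema name are assumptions for illustration, not a description of the OCLC registry's configuration; the parameter names follow SRU version 1.1.

```python
from urllib.parse import urlencode

SRU_BASE = "http://registry.example.org/sru"  # hypothetical endpoint


def search_retrieve(cql_query: str, maximum_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
        "recordSchema": "dc",
    }
    return f"{SRU_BASE}?{urlencode(params)}"


# Find crosswalk records whose title mentions EAD (index name is an assumption).
print(search_retrieve('dc.title any "EAD"', 5))
```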
For further information: Metadata Schema Transformation Services (http://oclc.org/research/projects/mswitch/1_schematrans.htm). The Open Archives Initiative Project (http://oclc.org/research/projects/oai/default.htm). Two Paths to Interoperable Metadata (http://oclc.org/research/publications/archive/2003/godbydc2003.pdf).
References: Raymond Yee and Rick Beaubien. 2003. A Preliminary Crosswalk from METS to IMS Content Packaging. Library Hi Tech. Kimberly Lightle and Judith S. Ridgway. 2003. Generation of XML Records across Multiple Metadata Standards. D-Lib Magazine, September 2003. http://dlib.org/dlib/september03/lightle/09lightle.html