Harvesting Topic Maps with XSLT

Slide # 1 Harvesting Topic Maps with XSLT
by Nikita Ogievetsky, Cogitech, Inc. nogievet@cogx.com

Slide # 2 Food Chain
Crops are grown.
Crops are harvested and fowl is hunted.
Food is cooked.
Cooked food is consumed.
Consumed food is recycled, disseminated, turned into fertilizer...
Crops are grown.

Slide # 3 Knowledge Chain
Information is acquired.
Conceived information becomes knowledge.
Knowledge is cooked (prepared) for presentation.
Presentation is perceived.
Perceived presentation is recycled, disseminated, turned into common sense...
Information is acquired.

Slide # 4 Food Chain. Large Perspective
Food undergoes 2 stages before it is consumed:
Harvesting and storing.
Aggregating and cooking.

Slide # 5 Food Chain. Outcome

Slide # 6
Final result depends on both:

Slide # 7
1. How the produce was grown and gathered. How, who, when, where.

Slide # 8
2. How it was cooked.

Slide # 9 Harvesting Constraints
It is hard to cook delicious, nice-looking and healthy dishes given spoiled ingredients.
It is quite possible to cook tasteless dishes given excellent ingredients.

Slide # 10 Harvesting Stylesheets
A collection of constraints and rules constitutes a stylesheet.
Stylesheets that transform agricultural resources into edible groceries.
Stylesheets that transform groceries (bwyd) into food.

Slide # 11 Knowledge Chain. Large Perspective
Information has to undergo 2 stages before it is conceived:
Data acquisition (harvesting) and storing.
Aggregating and presenting.

Slide # 12 Knowledge Chain. Outcome
Final result depends on both:
How the information was collected. Who, when, where.
How it was presented.

Slide # 13 Harvesting Constraints
It is hard to make a good presentation given a corrupt or wrong underlying knowledge base.
It is quite possible to make a terrible presentation given a great underlying knowledge base.

Slide # 14 Knowledge Harvesting Stylesheets
A collection of constraints and rules constitutes a stylesheet.
Cognition Stylesheets: stylesheets that transform information resources into a knowledge base.
Presentation Stylesheets: stylesheets that transform a knowledge base into a presentation.

Slide # 15 Cognition Stylesheet
A stylesheet that transforms...
the situation that the researcher is looking at into the situation he sees
sounds that the researcher is listening to into the signals he distinguishes from the noise
a wine bouquet that the researcher is testing into the bouquet he appreciates
...

Slide # 16 Perspectives...
The further back we look in time, the more adornments people use in their cognition stylesheets: mythologies.
Or look back into your childhood...

    <xsl:choose>
      <xsl:when test="understand">
        <Have-Fun/>
      </xsl:when>
      <xsl:otherwise>
        <Disregard/>
      </xsl:otherwise>
    </xsl:choose>

Slide # 17 Food Web
"A complex of interrelated food chains in an ecological community." -- The American Heritage Dictionary
Semantic Web?

Slide # 18 Why intermediate Knowledge repository
Or?

Slide # 19 Why XML Topic Maps for Knowledge Repository on the Web
Allows metadata to be maintained in a very structured way, at a higher level than a single web site.
Different types of resources can be stored and maintained separately, and at the same time interconnected with each other and with the business rules of the web site.
Not only content and look and feel, but also the web site structure itself and navigational profiles can be customized for different types of users.

Slide # 20 XSLT pseudocode
Harvesting Topic Maps: How-to

    <xsl:choose>
      <xsl:when test="has-relevant-metadata">
        <topic>
          <xsl:for-each select="doesn't-have-relevant-metadata">
            <occurrence/>
          </xsl:for-each>
        </topic>
        <xsl:for-each select="has-relevant-metadata">
          <association/>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise/>
    </xsl:choose>

Slide # 21 Knowledge Extraction Stylesheets for Dublin Core Metadata Element Set Mapping

Slide # 22 dc.identifier
dc.identifier => topic/@id
If dc.identifier is missing: generate-id(.) => topic/@id

    <xsl:variable name="id">
      <xsl:choose>
        <xsl:when test="dc:identifier"><xsl:value-of select="dc:identifier"/></xsl:when>
        <xsl:otherwise><xsl:value-of select="generate-id()"/></xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <topic id="{$id}"/>

Slide # 23 dc.subject
dc.subject => <instanceOf> elements

    <instanceOf>
      <xsl:choose>
        <xsl:when test="@rdf:resource">
          <topicRef xlink:href="#{@rdf:resource}"/>
        </xsl:when>
        <xsl:otherwise>
          <topicRef xlink:href="#{psv:descriptor/@rdf:about}"/>
        </xsl:otherwise>
      </xsl:choose>
    </instanceOf>
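The two mappings above can be combined into one self-contained stylesheet. The following is a minimal sketch, not the author's stylesheet: it assumes RDF input in which each rdf:Description carries Dublin Core children, it assumes the standard XTM 1.0 element names and namespace URI, and it leaves out the psv:descriptor branch from Slide 23. The template layout is likewise an assumption.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.topicmaps.org/xtm/1.0/"
    exclude-result-prefixes="rdf dc">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: one topic is emitted per rdf:Description in the input. -->
  <xsl:template match="/">
    <topicMap>
      <xsl:apply-templates select="//rdf:Description"/>
    </topicMap>
  </xsl:template>

  <xsl:template match="rdf:Description">
    <!-- dc:identifier => topic/@id, falling back to generate-id() (Slide 22). -->
    <xsl:variable name="id">
      <xsl:choose>
        <xsl:when test="dc:identifier"><xsl:value-of select="dc:identifier"/></xsl:when>
        <xsl:otherwise><xsl:value-of select="generate-id()"/></xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <topic id="{$id}">
      <!-- Only subjects given by reference are handled in this sketch. -->
      <xsl:apply-templates select="dc:subject[@rdf:resource]"/>
    </topic>
  </xsl:template>

  <!-- dc:subject => instanceOf (Slide 23, rdf:resource branch only). -->
  <xsl:template match="dc:subject">
    <instanceOf>
      <topicRef xlink:href="#{@rdf:resource}"/>
    </instanceOf>
  </xsl:template>
</xsl:stylesheet>
```

Note that a dc:identifier value is copied into the id attribute unchanged, so an identifier such as a URL would need normalizing before the result is valid as an XML ID.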

Slide # 24 dc:subject Classes
Extract unique <dc.subject>s

    <xsl:for-each select="//dc:subject[not(following::dc:subject/@rdf:resource = @rdf:resource)]">
      <topic id="{@rdf:resource}">
        <subjectIdentity>
          <topicRef xlink:href="{substring-before(@rdf:resource,':')}.xtm#{@rdf:resource}"/>
        </subjectIdentity>
        <instanceOf><topicRef xlink:href="#{substring-before(@rdf:resource,':')}"/></instanceOf>
        <baseName>
          <scope><topicRef xlink:href="#{substring-before(@rdf:resource,':')}"/></scope>
          <baseNameString><xsl:value-of select="substring-after(@rdf:resource,':')"/></baseNameString>
        </baseName>
      </topic>
    </xsl:for-each>

Slide # 25 dc:subject Classes in PRISM

xslt:template mode="prism"

    <xsl:for-each select="//dc:subject[@rdf:resource][not(following::dc:subject/@rdf:resource = @rdf:resource) and not
      <topic id="{@rdf:resource}">
        <subjectIdentity>
          <topicRef xlink:href="{substring-before(@rdf:resource,':')}.xtm#{@rdf:resource}"/>
        </subjectIdentity>
        <instanceOf><topicRef xlink:href="#{substring-before(@rdf:resource,':')}"/></instanceOf>
        <baseName>
          <scope><topicRef xlink:href="#{substring-before(@rdf:resource,':')}"/></scope>
          <baseNameString><xsl:value-of select="substring-after(@rdf:resource,':')"/></baseNameString>
        </baseName>
      </topic>
    </xsl:for-each>

    <xsl:for-each select="//dc:subject/psv:descriptor[not(following::dc:subject/@rdf:resource = @rdf:about) and not(foll
      <topic id="{@rdf:about}">
        <instanceOf><topicRef xlink:href="#{substring-before(@rdf:about,':')}"/></instanceOf>
        <baseName>
          <scope><topicRef xlink:href="#{substring-before(@rdf:about,':')}"/></scope>
          <baseNameString><xsl:value-of select="psv:label"/></baseNameString>
        </baseName>
        <baseName>
          <scope><topicRef xlink:href="#{substring-before(@rdf:about,':')}"/></scope>
          <baseNameString><xsl:value-of select="psv:code"/></baseNameString>
        </baseName>
      </topic>
    </xsl:for-each>

Slide # 26

dc.format
dc.format => <instanceOf> MIME types

    <instanceOf>
      <topicRef xlink:href="#{translate(.,'/','')}"/>
    </instanceOf>

Extract unique <dc.format>s

    <xsl:for-each select="//rdf:Description[not(following::rdf:Description/dc:format=dc:format)][dc:format]">
      <topic id="{translate(dc:format,'.,/?-','')}">
        <instanceOf><topicRef xlink:href="#dc-format"/></instanceOf>
        <baseName>
          <baseNameString><xsl:value-of select="dc:format"/></baseNameString>
        </baseName>
      </topic>
    </xsl:for-each>

Slide # 27 #dc-format

    <topic id="dc-format">
      <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://purl.org/dc/elements/1.1#format"/>
      </subjectIdentity>
      <occurrence>
        <instanceOf><topicRef xlink:href="#definition"/></instanceOf>
        <scope><topicRef xlink:href="#dc"/></scope>
        <resourceData>The physical or digital manifestation of the resource.</resourceData>
      </occurrence>
      <occurrence>
        <instanceOf><topicRef xlink:href="#description"/></instanceOf>
        <scope><topicRef xlink:href="#prism"/></scope>
        <resourceData>
          Typically, Format may include the media-type or dimensions of the resource.
          Format may be used to determine the software, hardware or other equipment needed to display or operate the resource.
          Examples of dimensions include size and duration.
          Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats).
          [For PRISM, I think we are only interested in the media type. Physical format info is probably not something we need to do in an interoperable manner.]
        </resourceData>
      </occurrence>
    </topic>

Slide # 28 rdf:about

rdf:about => <resourceRef> / <subjectIndicatorRef>
PRISM metadata is about resource content => <subjectIndicatorRef>.

    <subjectIdentity>
      <subjectIndicatorRef xlink:href="{@rdf:about}"/>
    </subjectIdentity>

Slide # 29 dc:title
dc:title => <baseName>

    <baseName>
      <baseNameString><xsl:value-of select="."/></baseNameString>
    </baseName>

Slide # 30 dc:date

dc:date => <occurrence> of type "dc-date"

    <occurrence>
      <instanceOf><topicRef xlink:href="#dc-date"/></instanceOf>
      <resourceData><xsl:value-of select="."/></resourceData>
    </occurrence>

Slide # 31 #dc-date

    <topic id="dc-date">
      <instanceOf>
        <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/index.html#psi-occurrence"/>
      </instanceOf>
      <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://purl.org/dc/elements/1.1#date"/>
      </subjectIdentity>
      <baseName><baseNameString>date</baseNameString></baseName>
      <occurrence>
        <instanceOf><topicRef xlink:href="#definition"/></instanceOf>
        <scope><topicRef xlink:href="#dc"/></scope>
        <resourceData>
          A date associated with an event in the life cycle of the resource.
        </resourceData>
      </occurrence>

      <occurrence>
        <instanceOf><topicRef xlink:href="#description"/></instanceOf>
        <scope><topicRef xlink:href="#prism"/></scope>
        <resourceData>
          Typically, Date will be associated with the creation or availability of the resource.
          Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.
          Any number of dates may need to be associated with a resource.
          PRISM recommends that this element contain the date and time the resource was published.
          Preference should be given to the more specific PRISM date and time elements.
        </resourceData>
      </occurrence>
    </topic>

Slide # 32 Creators
Unique dc.creator => <topic> of type "creator"

    <xsl:for-each select="//dc:creator[not(following::dc:creator = .)]">
      <topic id="{translate(.,'.,/?-','')}">
        <instanceOf><topicRef xlink:href="#creator"/></instanceOf>
        <baseName>
          <baseNameString><xsl:value-of select="."/></baseNameString>
        </baseName>

      </topic>
    </xsl:for-each>

Slide # 33 #dc-creator

    <topic id="dc-creator">
      <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://purl.org/dc/elements/1.1#creator"/>
      </subjectIdentity>
      <baseName><baseNameString>creator</baseNameString></baseName>
      <occurrence>
        <instanceOf><topicRef xlink:href="#definition"/></instanceOf>
        <scope><topicRef xlink:href="#dc"/></scope>
        <resourceData>
          An entity primarily responsible for making the content of the resource.
        </resourceData>
      </occurrence>
      <occurrence>
        <instanceOf><topicRef xlink:href="#description"/></instanceOf>
        <scope><topicRef xlink:href="#prism"/></scope>
        <resourceData>
          Examples of a Creator include a person, an organization, or a service.
          Typically, the name of a Creator should be used to indicate the entity.
          In principle, any number of creators may be associated with a resource.

          PRISM recommends that this element contain the name of one person or organization primarily responsible for this resource.
          Synonyms or "aliases" for creator names should be handled with an Authority File.
          Use other PRISM elements to describe arbitrary contributory roles.
        </resourceData>
      </occurrence>
    </topic>

Slide # 34 Knowledge Extraction Stylesheets for Publishing Requirements for Industry Standard Metadata (PRISM) Specification

Slide # 35 prism:copyright

prism:copyright => <occurrence> of type "copyright"

    <occurrence>
      <instanceOf><topicRef xlink:href="#copyright"/></instanceOf>
      <resourceData><xsl:value-of select="."/></resourceData>
    </occurrence>

Slide # 36 prism:hasalternative, prism:isalternative
prism:hasalternative; prism:isalternative => <association> of type "alternatives"

    <association>
      <instanceOf><topicRef xlink:href="#alternative"/></instanceOf>
      <member>
        <roleSpec><topicRef xlink:href="#hasalternative"/></roleSpec>
        <topicRef xlink:href="#{../dc:identifier}"/>
      </member>
      <member>
        <roleSpec><topicRef xlink:href="#isalternative"/></roleSpec>
        <topicRef xlink:href="#{//rdf:Description[@rdf:about=current()/@rdf:resource]/dc:identifier}"/>
      </member>
    </association>

Slide # 37 #hasalternative, #isalternative

    <topic id="isalternative">
      <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://prismstandard.org/1.0#isalternative"/>
      </subjectIdentity>
      <baseName><baseNameString>is alternative for</baseNameString></baseName>
      <occurrence>
        <instanceOf><topicRef xlink:href="#description"/></instanceOf>
        <scope><topicRef xlink:href="#prism"/></scope>
        <resourceData>
          The described resource can be substituted for the referenced resource.
        </resourceData>
      </occurrence>
    </topic>

    <topic id="hasalternative">
      <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://prismstandard.org/1.0#hasalternative"/>
      </subjectIdentity>
      <baseName><baseNameString>has an alternative</baseNameString></baseName>
      <occurrence>
        <instanceOf><topicRef xlink:href="#description"/></instanceOf>
        <scope><topicRef xlink:href="#prism"/></scope>
        <resourceData>The described resource has an alternative version that can be substituted, namely the referenc
      </occurrence>
    </topic>

Slide # 38 XSLT Layers
Per the XWATL framework, harvesting stylesheets are split into layers.
Include only the required stylesheets.
Example:

    <xsl:stylesheet...>
      <!-- "http://purl.org/dc/elements/1.1/" vocabulary -->
      <xsl:include href="dc2xtm.xsl"/>
      <!-- "http://purl.org/rss/1.0/modules/syndication/" vocabulary -->
      <xsl:include href="sy2xtm.xsl"/>
      <!-- "http://purl.org/rss/1.0/modules/company/" vocabulary -->
      <xsl:include href="co2xtm.xsl"/>
      <!-- "http://purl.org/rss/1.0/modules/textinput/" vocabulary -->
      <xsl:include href="ti2xtm.xsl"/>
      <!-- "http://purl.org/rss/1.0/" vocabulary -->
      <xsl:include href="rss2xtm.xsl"/>
      <xsl:include href="prism2xtm.xsl"/>
      <xsl:include href="psv2xtm.xsl"/>
      {...}
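One way to read the layering is that each included module owns the templates for a single vocabulary, while the top-level stylesheet only drives the traversal. The sketch below shows what a module such as dc2xtm.xsl might contain. Its contents are an assumption, since the slides show only the include list; the two templates reuse the dc:title and dc:date mappings from Slides 29 and 30, with the standard XTM 1.0 element names and namespace URI assumed.

```xslt
<?xml version="1.0"?>
<!-- Hypothetical contents of dc2xtm.xsl: templates for the Dublin Core vocabulary only. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.topicmaps.org/xtm/1.0/"
    exclude-result-prefixes="dc">

  <!-- dc:title => baseName (Slide 29) -->
  <xsl:template match="dc:title">
    <baseName>
      <baseNameString><xsl:value-of select="."/></baseNameString>
    </baseName>
  </xsl:template>

  <!-- dc:date => occurrence of type dc-date (Slide 30) -->
  <xsl:template match="dc:date">
    <occurrence>
      <instanceOf><topicRef xlink:href="#dc-date"/></instanceOf>
      <resourceData><xsl:value-of select="."/></resourceData>
    </occurrence>
  </xsl:template>
</xsl:stylesheet>
```

Because xsl:include gives the included templates the same import precedence as the including stylesheet, dropping an include simply removes that vocabulary's rules. Elements from a vocabulary with no module then fall through to XSLT's built-in rules, which copy their text to the output, so a driver stylesheet may want an empty catch-all template to suppress them.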

Slide # 39 Knowledge Presentation XSLT Templates
Topic Maps give XSLT something to do!

Slide # 40 Indexing topics with XSLT keys

    <xsl:key name="topicbyid" match="topic" use="concat('#',@id)"/>

    <xsl:apply-templates select="key('topicbyid',@xlink:href)"/>

Slide # 41 Indexing instantiated topics with XSLT keys

    <xsl:key name="instance" match="topic" use="substring-after(instanceOf/topicRef/@xlink:href,'#')"/>

    <xsl:apply-templates select="key('instance',@id)"/>

Slide # 42 XTM Cooking stylesheets
Structural Components
Topic Map source code that controls web site content and site map.
XSLT stylesheets that control web page layout and look-and-feel style.
The whole WWW universe of resources referenced by XTM topic <occurrence> resource locators.
More on this in the AWL book "XML Topic Maps: Creating and Using Topic Maps for the Web", edited by Jack Park.

Slide # 43 Mapping Topic Map elements for HTML rendition
Topic Map => Web Site
Topic => Web Page
Topic Associations => Site map
Occurrences => Images, logo, text, HTML fragments, external links
Topic Names => Page headers, titles, UL lists, hyperlink titles
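The Slide 43 mapping (topic to web page, topic names to titles, occurrences to page content) can be combined with the key from Slide 40 in a small presentation stylesheet. This is a minimal sketch under stated assumptions, not the stylesheet used in the talk: it assumes the topic map is in the XTM 1.0 namespace with the standard element names (the slides match un-prefixed names), that occurrences carry inline resourceData, and that a hypothetical parameter named page selects the topic to render.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xtm="http://www.topicmaps.org/xtm/1.0/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xtm xlink">

  <xsl:output method="html" indent="yes"/>

  <!-- Index topics by "#id" so an xlink:href value can be used directly as the key (Slide 40). -->
  <xsl:key name="topicbyid" match="xtm:topic" use="concat('#', @id)"/>

  <!-- Hypothetical parameter: id of the topic to render as a page. -->
  <xsl:param name="page" select="'dc-date'"/>

  <xsl:template match="/">
    <xsl:apply-templates select="key('topicbyid', concat('#', $page))"/>
  </xsl:template>

  <!-- Topic => web page; topic name => page title and header. -->
  <xsl:template match="xtm:topic">
    <html>
      <head><title><xsl:value-of select="xtm:baseName/xtm:baseNameString"/></title></head>
      <body>
        <h1><xsl:value-of select="xtm:baseName/xtm:baseNameString"/></h1>
        <ul>
          <xsl:apply-templates select="xtm:occurrence"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- Occurrence => list item; the occurrence type's name is resolved through the key. -->
  <xsl:template match="xtm:occurrence">
    <li>
      <xsl:value-of select="key('topicbyid', xtm:instanceOf/xtm:topicRef/@xlink:href)/xtm:baseName/xtm:baseNameString"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="normalize-space(xtm:resourceData)"/>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

Keying on the "#id" form means a topicRef's xlink:href can be passed to key() without string manipulation, which is the design choice Slide 40 illustrates. If a typing topic has no base name, the label before the colon is simply empty.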

Slide # 44 Bon Appétit!
http://www.cogx.com