Slide # 1 Harvesting Topic Maps with XSLT
by Nikita Ogievetsky, Cogitech, Inc.
nogievet@cogx.com

Slide # 2 Food Chain
Crops are grown.
Crops are harvested and fowl is hunted.
Food is cooked.
Cooked food is consumed.
Consumed food is recycled, disseminated, turned into fertilizer...
Crops are grown.

Slide # 3 Knowledge Chain
Information is acquired.
Conceived information becomes knowledge.
Knowledge is cooked (prepared) for presentation.
Presentation is perceived.
Perceived presentation is recycled, disseminated, turned into common sense...
Information is acquired.

Slide # 4 Food Chain. Large Perspective
Food undergoes 2 stages before it is consumed:
Harvesting and storing.
Aggregating and cooking.

Slide # 5 Food Chain. Outcome

Slide # 6 Final result depends on both:

Slide # 7 1. How the produce was grown and gathered. How, who, when, where.

Slide # 8 2. How it was cooked.

Slide # 9 Harvesting Constraints
It is hard to cook delicious, nice-looking and healthy dishes given spoiled ingredients.
It is quite possible to cook tasteless dishes given excellent ingredients.

Slide # 10 Harvesting Stylesheets
A collection of constraints and rules constitutes a stylesheet.
Stylesheets that transform agricultural resources into edible groceries.
Stylesheets that transform groceries (bwyd) into food.

Slide # 11 Knowledge Chain. Large Perspective.
Information has to undergo 2 stages before it is conceived:
Data acquisition (harvesting) and storing.
Aggregating and presenting.

Slide # 12 Knowledge Chain. Outcome.
Final result depends on both:
How the information was collected: who, when, where.
How it was presented.

Slide # 13 Harvesting Constraints
It is hard to make a good presentation given a corrupt or wrong underlying knowledge base.
It is quite possible to make a terrible presentation given a great underlying knowledge base.

Slide # 14 Knowledge Harvesting Stylesheets
A collection of constraints and rules constitutes a stylesheet.
Stylesheets that transform information resources into a knowledge base: Cognition Stylesheets.
Stylesheets that transform a knowledge base into a presentation: Presentation Stylesheets.

Slide # 15 Cognition Stylesheet
Stylesheet that transforms...
the situation that the researcher is looking at into the situation he sees
sounds that the researcher is listening to into the signals he distinguishes from the noise
a wine bouquet that the researcher is testing into the bouquet he appreciates...

Slide # 16 Perspectives...
The further back we look in time, the more adornments people use in their cognition stylesheets: mythologies.
Or look back into your childhood...
    <xsl:choose>
      <xsl:when test="understand">
        <Have-Fun/>
      </xsl:when>
      <xsl:otherwise>
        <Disregard/>
      </xsl:otherwise>
    </xsl:choose>

Slide # 17 Food Web
"A complex of interrelated food chains in an ecological community."
-- The American Heritage Dictionary
Semantic Web?

Slide # 18 Why intermediate Knowledge repository
Or?

Slide # 19 Why XML Topic Maps for Knowledge Repository on the Web
Topic maps allow metadata to be maintained in a very structured way, at a higher level than a single web site.
Different types of resources can be stored and maintained separately, and at the same time interconnected with each other and with the business rules of the web site.
Not only content and look and feel, but also the web site structure itself and navigational profiles can be customized for different types of users.

Slide # 20 XSLT pseudocode
Harvesting Topic Maps: How-to
    <xsl:choose>
      <xsl:when test="has-relevant-metadata">
        <topic>
          <xsl:for-each select="doesn't-have-relevant-metadata">
            <occurrence/>
          </xsl:for-each>
        </topic>
        <xsl:for-each select="has-relevant-metadata">
          <association/>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise/>
    </xsl:choose>
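
The pseudocode on slide 20 could be made concrete along the following lines. This is only a sketch under stated assumptions, not the stylesheet used in the talk: it assumes RDF/XML input with `rdf:Description` elements, treats the presence of `dc:identifier` as "relevant metadata", turns literal-valued properties into occurrences, and turns properties carrying `rdf:resource` into associations. The `topicmap` wrapper and the typing by `local-name()` are illustrative choices. Element names follow the lowercase spelling used on these slides; the XTM 1.0 DTD spells them in camelCase (`instanceOf`, `topicRef`).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="rdf dc">

  <xsl:output method="xml" indent="yes"/>

  <!-- Wrapper element name is an assumption; the slides do not show the root -->
  <xsl:template match="/">
    <topicmap>
      <xsl:apply-templates select="//rdf:Description"/>
    </topicmap>
  </xsl:template>

  <xsl:template match="rdf:Description">
    <xsl:choose>
      <!-- Assumption: "relevant metadata" means a dc:identifier is present -->
      <xsl:when test="dc:identifier">
        <topic id="{dc:identifier}">
          <!-- Literal-valued properties become occurrences -->
          <xsl:for-each select="*[not(@rdf:resource)][not(self::dc:identifier)]">
            <occurrence>
              <instanceof><topicref xlink:href="#{local-name()}"/></instanceof>
              <resourcedata><xsl:value-of select="."/></resourcedata>
            </occurrence>
          </xsl:for-each>
        </topic>
        <!-- Properties that point at another resource become associations -->
        <xsl:for-each select="*[@rdf:resource]">
          <association>
            <instanceof><topicref xlink:href="#{local-name()}"/></instanceof>
            <member><topicref xlink:href="#{../dc:identifier}"/></member>
            <member><resourceref xlink:href="{@rdf:resource}"/></member>
          </association>
        </xsl:for-each>
      </xsl:when>
      <!-- Descriptions without relevant metadata are skipped -->
      <xsl:otherwise/>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```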

Slide # 21 Knowledge Extraction Stylesheets for Dublin Core Metadata Element Set Mapping

Slide # 22 dc.identifier
dc.identifier => topic/@id
If dc.identifier is missing: generate-id(.) => topic/@id
    <xsl:variable name="id">
      <xsl:choose>
        <xsl:when test="dc:identifier"><xsl:value-of select="dc:identifier"/></xsl:when>
        <xsl:otherwise><xsl:value-of select="generate-id()"/></xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <topic id="{$id}"/>

Slide # 23 dc.subject
dc.subject => <instanceof> elements
    <instanceof>
      <xsl:choose>
        <xsl:when test="@rdf:resource">
          <topicref xlink:href="#{@rdf:resource}"/>
        </xsl:when>
        <xsl:otherwise>
          <topicref xlink:href="#{psv:descriptor/@rdf:about}"/>
        </xsl:otherwise>
      </xsl:choose>
    </instanceof>
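
The two fragments above run in different contexts: the `id` variable is computed on a resource description, while the `<instanceof>` fragment runs on each `dc:subject` child. A minimal sketch of how they might sit together in one stylesheet follows. It assumes RDF/XML input with `rdf:Description` elements; the `psv` namespace URI is a placeholder (the slides do not give it), and the `topicmap` wrapper is illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:psv="http://example.org/psv#"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="rdf dc psv">
  <!-- The psv URI above is a hypothetical placeholder, not the real namespace -->

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <topicmap>
      <xsl:apply-templates select="//rdf:Description"/>
    </topicmap>
  </xsl:template>

  <!-- One topic per described resource -->
  <xsl:template match="rdf:Description">
    <!-- Use dc:identifier when present, otherwise a generated id -->
    <xsl:variable name="id">
      <xsl:choose>
        <xsl:when test="dc:identifier">
          <xsl:value-of select="dc:identifier"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="generate-id()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <topic id="{$id}">
      <xsl:apply-templates select="dc:subject"/>
    </topic>
  </xsl:template>

  <!-- Each dc:subject types the topic, by reference or by inline descriptor -->
  <xsl:template match="dc:subject">
    <instanceof>
      <xsl:choose>
        <xsl:when test="@rdf:resource">
          <topicref xlink:href="#{@rdf:resource}"/>
        </xsl:when>
        <xsl:otherwise>
          <topicref xlink:href="#{psv:descriptor/@rdf:about}"/>
        </xsl:otherwise>
      </xsl:choose>
    </instanceof>
  </xsl:template>
</xsl:stylesheet>
```

Note that `generate-id()` is only guaranteed to be stable within a single transformation run, so generated ids should not be relied on as persistent identifiers.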

Slide # 24 dc:subject Classes
Extract unique <dc.subject>s
    <xsl:for-each select="//dc:subject[not(following::dc:subject/@rdf:resource = @rdf:resource)]">
      <topic id="{@rdf:resource}">
        <subjectidentity>
          <topicref xlink:href="{substring-before(@rdf:resource,':')}.xtm#{@rdf:resource}"/>
        </subjectidentity>
        <instanceof><topicref xlink:href="#{substring-before(@rdf:resource,':')}"/></instanceof>
        <basename>
          <scope><topicref xlink:href="#{substring-before(@rdf:resource,':')}"/></scope>
          <basenamestring><xsl:value-of select="substring-after(@rdf:resource,':')"/></basenamestring>
        </basename>
      </topic>
    </xsl:for-each>

Slide # 25 dc:subject Classes in PRISM
xslt:template mode="prism"
    <xsl:for-each select="//dc:subject[@rdf:resource][not(following::dc:subject/@rdf:resource = @rdf:resource) and not [...]
      <topic id="{@rdf:resource}">
        <subjectidentity>
          <topicref xlink:href="{substring-before(@rdf:resource,':')}.xtm#{@rdf:resource}"/>
        </subjectidentity>
        <instanceof><topicref xlink:href="#{substring-before(@rdf:resource,':')}"/></instanceof>
        <basename>
          <scope><topicref xlink:href="#{substring-before(@rdf:resource,':')}"/></scope>
          <basenamestring><xsl:value-of select="substring-after(@rdf:resource,':')"/></basenamestring>
        </basename>
      </topic>
    </xsl:for-each>
    <xsl:for-each select="//dc:subject/psv:descriptor[not(following::dc:subject/@rdf:resource = @rdf:about) and not(foll [...]
      <topic id="{@rdf:about}">
        <instanceof><topicref xlink:href="#{substring-before(@rdf:about,':')}"/></instanceof>
        <basename>
          <scope><topicref xlink:href="#{substring-before(@rdf:about,':')}"/></scope>
          <basenamestring><xsl:value-of select="psv:label"/></basenamestring>
        </basename>
        <basename>
          <scope><topicref xlink:href="#{substring-before(@rdf:about,':')}"/></scope>
          <basenamestring><xsl:value-of select="psv:code"/></basenamestring>
        </basename>
      </topic>
    </xsl:for-each>

Slide # 26 dc.format
dc.format => <instanceof> MIME types
    <instanceof>
      <topicref xlink:href="#{translate(.,'.,/?-','')}"/>
    </instanceof>
Extract unique <dc.format>s
    <xsl:for-each select="//rdf:description[not(following::rdf:description/dc:format=dc:format)][dc:format]">
      <topic id="{translate(dc:format,'.,/?-','')}">
        <instanceof><topicref xlink:href="#dc-format"/></instanceof>
        <basename>
          <basenamestring><xsl:value-of select="dc:format"/></basenamestring>
        </basename>
      </topic>
    </xsl:for-each>

Slide # 27 #dc-format
    <topic id="dc-format">
      <subjectidentity>
        <subjectindicatorref xlink:href="http://purl.org/dc/elements/1.1#format"/>
      </subjectidentity>
      <occurrence>
        <instanceof><topicref xlink:href="#definition"/></instanceof>
        <scope><topicref xlink:href="#dc"/></scope>
        <resourcedata>The physical or digital manifestation of the resource.</resourcedata>
      </occurrence>
      <occurrence>
        <instanceof><topicref xlink:href="#description"/></instanceof>
        <scope><topicref xlink:href="#prism"/></scope>
        <resourcedata>
          Typically, Format may include the media-type or dimensions of the resource.
          Format may be used to determine the software, hardware or other equipment
          needed to display or operate the resource. Examples of dimensions include
          size and duration. Recommended best practice is to select a value from a
          controlled vocabulary (for example, the list of Internet Media Types [MIME]
          defining computer media formats).
          [For PRISM, I think we are only interested in the media type. Physical format
          info is probably not something we need to do in an interoperable manner.]
        </resourcedata>
      </occurrence>
    </topic>

Slide # 28 rdf:about
rdf:about => <resourceref> / <subjectindicatorref>
PRISM metadata is about resource content => <subjectindicatorref>.
    <subjectidentity>
      <subjectindicatorref xlink:href="{@rdf:about}"/>
    </subjectidentity>

Slide # 29 dc:title
dc:title => <basename>
    <basename>
      <basenamestring><xsl:value-of select="."/></basenamestring>
    </basename>

Slide # 30 dc:date
dc:date => <occurrence> of type "dc-date"
    <occurrence>
      <instanceof><topicref xlink:href="#dc-date"/></instanceof>
      <resourcedata><xsl:value-of select="."/></resourcedata>
    </occurrence>

Slide # 31 #dc-date
    <topic id="dc-date">
      <instanceof>
        <subjectindicatorref xlink:href="http://www.topicmaps.org/xtm/1.0/index.html#psi-occurrence"/>
      </instanceof>
      <subjectidentity>
        <subjectindicatorref xlink:href="http://purl.org/dc/elements/1.1#date"/>
      </subjectidentity>
      <basename><basenamestring>date</basenamestring></basename>
      <occurrence>
        <instanceof><topicref xlink:href="#definition"/></instanceof>
        <scope><topicref xlink:href="#dc"/></scope>
        <resourcedata>
          A date associated with an event in the life cycle of the resource.
        </resourcedata>
      </occurrence>
      <occurrence>
        <instanceof><topicref xlink:href="#description"/></instanceof>
        <scope><topicref xlink:href="#prism"/></scope>
        <resourcedata>
          Typically, Date will be associated with the creation or availability of the
          resource. Recommended best practice for encoding the date value is defined in
          a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.
          Any number of dates may need to be associated with a resource.
          PRISM recommends that this element contain the date and time the resource was
          published. Preference should be given to the more specific PRISM date and
          time elements.
        </resourcedata>
      </occurrence>
    </topic>

Slide # 32 Creators
Unique dc.creator => <topic> of type "creator"
    <xsl:for-each select="//dc:creator[not(following::dc:creator = .)]">
      <topic id="{translate(.,'.,/?-','')}">
        <instanceof><topicref xlink:href="#creator"/></instanceof>
        <basename>
          <basenamestring><xsl:value-of select="."/></basenamestring>
        </basename>
      </topic>
    </xsl:for-each>

Slide # 33 #dc-creator
    <topic id="dc-creator">
      <subjectidentity>
        <subjectindicatorref xlink:href="http://purl.org/dc/elements/1.1#creator"/>
      </subjectidentity>
      <basename><basenamestring>creator</basenamestring></basename>
      <occurrence>
        <instanceof><topicref xlink:href="#definition"/></instanceof>
        <scope><topicref xlink:href="#dc"/></scope>
        <resourcedata>
          An entity primarily responsible for making the content of the resource.
        </resourcedata>
      </occurrence>
      <occurrence>
        <instanceof><topicref xlink:href="#description"/></instanceof>
        <scope><topicref xlink:href="#prism"/></scope>
        <resourcedata>
          Examples of a Creator include a person, an organization, or a service.
          Typically, the name of a Creator should be used to indicate the entity.
          In principle, any number of creators may be associated with a resource.
          PRISM recommends that this element contain the name of one person or
          organization primarily responsible for this resource. Synonyms or "aliases"
          for creator names should be handled with an Authority File. Use other PRISM
          elements to describe arbitrary contributory roles.
        </resourcedata>
      </occurrence>
    </topic>

Slide # 34 Knowledge Extraction Stylesheets for Publishing Requirements for Industry Standard Metadata (PRISM) Specification

Slide # 35 prism:copyright
prism:copyright => <occurrence> of type "copyright"
    <occurrence>
      <instanceof><topicref xlink:href="#copyright"/></instanceof>
      <resourcedata><xsl:value-of select="."/></resourcedata>
    </occurrence>

Slide # 36 prism:hasalternative, prism:isalternative
prism:hasalternative; prism:isalternative => <association> of type "alternatives"
    <association>
      <instanceof><topicref xlink:href="#alternative"/></instanceof>
      <member>
        <rolespec><topicref xlink:href="#hasalternative"/></rolespec>
        <topicref xlink:href="#{../dc:identifier}"/>
      </member>
      <member>
        <rolespec><topicref xlink:href="#isalternative"/></rolespec>
        <topicref xlink:href="#{//rdf:description[@rdf:about=current()/@rdf:resource]/dc:identifier}"/>
      </member>
    </association>
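
The association above looks up the referenced description with a `//` scan for every alternative. As a variation (not the slide's method), the same lookup can be done with an `xsl:key`. The sketch below makes several assumptions: RDF/XML input with `rdf:Description` elements, the slides' lowercase `prism:hasalternative` spelling, a `prism` namespace URI borrowed from the subject indicators on the slides, and an illustrative `topicmap` wrapper. It also skips the association when either identifier is missing, which the slide fragment does not do.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:prism="http://prismstandard.org/1.0#"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="rdf dc prism">
  <!-- The prism namespace URI is an assumption taken from the PSIs on the slides -->

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every resource description by its rdf:about URI -->
  <xsl:key name="desc-by-about" match="rdf:Description" use="@rdf:about"/>

  <xsl:template match="/">
    <topicmap>
      <xsl:apply-templates select="//prism:hasalternative"/>
    </topicmap>
  </xsl:template>

  <xsl:template match="prism:hasalternative">
    <!-- The description this property points at, if it is in the input -->
    <xsl:variable name="target" select="key('desc-by-about', @rdf:resource)"/>
    <!-- Emit the association only when both role players can be named -->
    <xsl:if test="../dc:identifier and $target/dc:identifier">
      <association>
        <instanceof><topicref xlink:href="#alternative"/></instanceof>
        <member>
          <rolespec><topicref xlink:href="#hasalternative"/></rolespec>
          <topicref xlink:href="#{../dc:identifier}"/>
        </member>
        <member>
          <rolespec><topicref xlink:href="#isalternative"/></rolespec>
          <topicref xlink:href="#{$target/dc:identifier}"/>
        </member>
      </association>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```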

Slide # 37 #hasalternative, #isalternative
    <topic id="isalternative">
      <subjectidentity>
        <subjectindicatorref xlink:href="http://prismstandard.org/1.0#isalternative"/>
      </subjectidentity>
      <basename><basenamestring>is alternative for</basenamestring></basename>
      <occurrence>
        <instanceof><topicref xlink:href="#description"/></instanceof>
        <scope><topicref xlink:href="#prism"/></scope>
        <resourcedata>
          The described resource can be substituted for the referenced resource.
        </resourcedata>
      </occurrence>
    </topic>
    <topic id="hasalternative">
      <subjectidentity>
        <subjectindicatorref xlink:href="http://prismstandard.org/1.0#hasalternative"/>
      </subjectidentity>
      <basename><basenamestring>has an alternative</basenamestring></basename>
      <occurrence>
        <instanceof><topicref xlink:href="#description"/></instanceof>
        <scope><topicref xlink:href="#prism"/></scope>
        <resourcedata>The described resource has an alternative version that can be substituted, namely the referenc [...]</resourcedata>
      </occurrence>
    </topic>

Slide # 38 XSLT Layers
Per the XWATL framework, harvesting stylesheets are split into layers.
Include only the required stylesheets.
Example:
    <xsl:stylesheet...>
      <!-- "http://purl.org/dc/elements/1.1/" vocabulary -->
      <xsl:include href="dc2xtm.xsl"/>
      <!-- "http://purl.org/rss/1.0/modules/syndication/" vocabulary -->
      <xsl:include href="sy2xtm.xsl"/>
      <!-- "http://purl.org/rss/1.0/modules/company/" vocabulary -->
      <xsl:include href="co2xtm.xsl"/>
      <!-- "http://purl.org/rss/1.0/modules/textinput/" vocabulary -->
      <xsl:include href="ti2xtm.xsl"/>
      <!-- "http://purl.org/rss/1.0/" vocabulary -->
      <xsl:include href="rss2xtm.xsl"/>
      <xsl:include href="prism2xtm.xsl"/>
      <xsl:include href="psv2xtm.xsl"/>
      {...}
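
The slide does not show what an individual layer contains. As a rough illustration only, a layer could be a stylesheet holding just the templates for one vocabulary. The sketch below is a hypothetical `dc2xtm.xsl` that reuses the `dc:title` and `dc:date` mappings from the earlier slides; the real file's contents are not given in the source.

```xml
<?xml version="1.0"?>
<!-- Hypothetical contents of a vocabulary layer such as dc2xtm.xsl -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="dc">

  <!-- dc:title => <basename> -->
  <xsl:template match="dc:title">
    <basename>
      <basenamestring><xsl:value-of select="."/></basenamestring>
    </basename>
  </xsl:template>

  <!-- dc:date => <occurrence> of type "dc-date" -->
  <xsl:template match="dc:date">
    <occurrence>
      <instanceof><topicref xlink:href="#dc-date"/></instanceof>
      <resourcedata><xsl:value-of select="."/></resourcedata>
    </occurrence>
  </xsl:template>
</xsl:stylesheet>
```

Templates pulled in with `xsl:include` have the same import precedence as those of the including stylesheet, so two layers that match the same node with equal priority conflict. `xsl:import` is the alternative when one layer is meant to override another.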

Slide # 39 Knowledge Presentation XSLT Templates
Topic Maps give XSLT something to do!

Slide # 40 Indexing topics with XSLT keys
    <xsl:key name="topicbyid" match="topic" use="concat('#',@id)"/>
    <xsl:apply-templates select="key('topicbyid',@xlink:href)"/>

Slide # 41 Indexing instantiated topics with XSLT keys
    <xsl:key name="instance" match="topic" use="substring-after(instanceof/topicref/@xlink:href,'#')"/>
    <xsl:apply-templates select="key('instance',@id)"/>

Slide # 42 XTM Cooking stylesheets
Structural Components
Topic Map source code that controls web site content and site map.
XSLT stylesheets that control web page layout and look-and-feel style.
The whole WWW universe of resources referenced by XTM topic <occurrence> resource locators.
More on this in the AWL book "XML Topic Maps: Creating and Using Topic Maps for the Web" edited by Jack Park.

Slide # 43 Mapping Topic Map elements for HTML rendition
Topic Map => Web Site
Topic => Web Page
Topic Associations => Site map
Occurrences => Images, logo, text, HTML fragments, external links
Topic Names => Page headers, titles, UL lists, hyperlink titles
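
To show how the two keys and the element mapping might combine, here is a minimal presentation sketch. It is an illustration under assumptions rather than the stylesheet from the talk: it assumes a topic map whose elements are in no namespace and use the lowercase names from these slides, renders every topic as a `<div>` rather than a separate page, and lists a topic's instances as hyperlinks.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xlink">

  <xsl:output method="html"/>

  <!-- Keys as defined on slides 40 and 41 -->
  <xsl:key name="topicbyid" match="topic" use="concat('#',@id)"/>
  <xsl:key name="instance" match="topic"
           use="substring-after(instanceof/topicref/@xlink:href,'#')"/>

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="//topic"/>
      </body>
    </html>
  </xsl:template>

  <!-- Topic => page section; topic name => header -->
  <xsl:template match="topic">
    <div id="{@id}">
      <h1><xsl:value-of select="basename[1]/basenamestring"/></h1>

      <!-- Resolve each type reference through the topicbyid key -->
      <xsl:for-each select="instanceof/topicref">
        <xsl:variable name="type" select="key('topicbyid', @xlink:href)"/>
        <xsl:if test="$type">
          <p>Type:
            <a href="#{$type/@id}">
              <xsl:value-of select="$type/basename[1]/basenamestring"/>
            </a>
          </p>
        </xsl:if>
      </xsl:for-each>

      <!-- List the topics typed by this one, found through the instance key -->
      <xsl:if test="key('instance', @id)">
        <ul>
          <xsl:for-each select="key('instance', @id)">
            <li>
              <a href="#{@id}">
                <xsl:value-of select="basename[1]/basenamestring"/>
              </a>
            </li>
          </xsl:for-each>
        </ul>
      </xsl:if>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

Because `substring-after` takes the string value of the first node in its argument, the `instance` key indexes a topic only under its first `<instanceof>` type; a topic with several types would need a key that matches `instanceof/topicref` instead.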

Slide # 44 Bon Appétit!
http://www.cogx.com