Annotation Services to Support Collaborative Development of Scholarly Editions

Open Annotation Collaboration (OAC) Annotation Demonstration Experiment Report

Anna Gerber and Jane Hunter
ITEE, The University of Queensland

1. Executive Summary

The aim of this experiment has been to evaluate and demonstrate the applicability of OAC in the context of collaborative scholarly editions of literary works. Several specialised types of annotation supporting the scholarly editing process were identified and modelled using the OAC beta data model. A user interface, which displays multiple versions of a literary work with textual transcriptions and facsimiles presented side by side, was developed, and the Web-browser-based Aus-e-Lit LORE annotation tool was extended to use the OAC model to allow segments of facsimile images and transcriptions, and variations between versions of a work, to be annotated with those specialised annotation types. The LOREstore repository was also developed, to store annotations as RDF graphs and to support search, display, retrieval and SPARQL querying of OAC annotations.

This report describes the use cases addressed, provides examples of the specialised annotation types that were identified to support the collaborative development of scholarly editions, and discusses the lessons learned from building annotation tools and services using the OAC beta model during the course of this demonstration experiment.

2. Use Case Context

Scholarly editions are the outcome of detailed study of a specific literary work or collection of shorter works such as poems or short stories. When preparing a scholarly edition, scholarly editors aim to provide a comprehensive description of the history of the literary work(s), including information about significant versions and physical forms. The IFLA FRBR model can be used as a bibliographical foundation for describing these versions.
In addition to a textual essay, editorial decisions are argued in textual notes, and a textual apparatus is compiled to record the alterations made between different versions of the work. Annotations can be used to document these textual notes and variations, and can provide an additional layer of information about the documents being studied, and the people or organisations who were involved in the production of the work over time. Annotations in the form of explanatory notes may also address the content of the text, identifying such things as allusions to other works, historical contexts and stylistic significance.

Modern scholarly editions are increasingly collaborative ventures with editors, advisers and editorial board members dispersed globally, so there is a critical need for Web-based tools and services to support the collaborative development of scholarly editions by distributed scholars. Although tools such as MEDITE (poleia.lip6.fr/~ganascia/medite_project), Juxta, CollateX and MVD GUI facilitate comparing and displaying variants across multiple digitised versions of a work, there is no established, common model or tool for documenting, sharing or replying to annotations on specific variants within a collaborative edition.

Hence the key use cases for this experiment focus on enabling editors to create, share and reply to scholarly commentary attached to variations between versions (FRBR manifestations) of a particular work and to parts of the text (i.e. a segment of a transcription or facsimile image). Editors also need to be able to retrieve annotations (including their own and those created by collaborators) across versions, through search and querying by target resource. For example, users want to retrieve all annotations associated with a particular transcription or image that is part of an edition, and display them in parallel within the Web browser.

3. Annotation Types

We characterised the types of annotations that may be attached to selections of transcriptions, facsimiles and variants to support the production of apparatus and scholarly commentary during development of a collaborative scholarly edition as follows:

Variation Annotation
  Purpose: Describe textual variation between versions of a work.
  Description: The scholarly commentary attached to textual variation describes metadata properties such as the date when the original variation occurred and the agent responsible for the change, and also allows links to additional resources such as part of a manuscript image, or supporting documentary evidence. Variation Annotations are eventually published as part of the content of a scholarly edition.
  Implementation: We define VariationAnnotation as a subtype of oac:Annotation, with a structured data body (the target URI identifies an ORE Resource Map expressed as RDF/XML).

Textual Note
  Purpose: Document or provide support for editorial decisions.
  Description: These will be published as part of an edition.
  Implementation: TextualNote is a subtype of oac:Annotation, which may target a variant, or any segment of a transcription or facsimile image.

Explanatory Note
  Purpose: Provide explanatory commentary on selected characters, words, paragraphs, sections etc.
  Description: These will be published as part of an edition. These may, for example, define an obscure word, provide historical context, or identify a person, place, event or some other allusion in the text, and so we can characterise these further into the following types:
    o Literary Allusion
    o Classical Allusion
    o Biblical Allusion
    o Glossary
    o Historical Note
    o Bibliographical Note
  Implementation: ExplanatoryNote is a subtype of oac:Annotation. Explanatory Notes typically target a segment of a transcription or facsimile image.

Comment
  Purpose: To facilitate communication between collaborating editors during the editorial process.
  Description: These are intended for communication during the editing process, and so they will not be published as part of an edition.
  Implementation: We use the oac:Annotation class.

Reply
  Purpose: To facilitate communication between collaborating editors during the editorial process.
  Description: Replies can be attached to any of the above annotation types, and are not published as part of an edition.
  Implementation: We use the oac:Reply class.

4. Example Annotations

This section provides examples of typical annotations produced during the experiment. Except for variation annotations, which use an ORE Resource Map body, our annotation tool creates inline text or HTML bodies for all annotation types. We also allow semantic or free-text tags to be attached to any of these annotation types in addition to the commentary, e.g. for tagging places, people, events, subjects etc. RDF serialisations for these examples are provided in Appendix A.

Figure 1: Explanatory Note (Biblical Allusion)

Figure 2: Variation Annotation between two versions of The Buln Buln and the Brolga


Figure 3: Textual Note and Reply

5. Summary of Progress

5.1 Accomplishments

The LOREstore repository was developed to store annotations as RDF named graphs. We also use it to store data bodies for variation annotations (stored as ORE Resource Maps). It provides a Web interface supporting annotation search and display, repository content and user account management, and a REST API supporting Create, Read, Update and Delete of annotations as well as retrieval by annotation identifier, target, keyword search or SPARQL query.

Several example works were established and displayed via a Web interface developed using the nmerge (Desmond Schmidt) / MVD-GUI tool, which compares versions of a literary work side by side, with the option to switch between displaying the facsimile or transcription for each version.

The LORE annotation sidebar was extended so that OAC Annotations representing comments, explanatory notes and textual notes on a specific version of a work (e.g. a segment of a transcription or facsimile image), and OAC Annotations describing textual variation between versions of a work, can be created and displayed, and to support annotation of annotations (e.g. as replies).

Appendix B provides screenshots of the user interface for viewing transcriptions, facsimiles and textual variation and the annotation sidebar, while Appendix C provides screenshots of the web interface developed for lorestore. Appendix D describes conference presentations relating to the experiment.

5.2 Limitations, delays and failures

Although the AustLit collection contains many full text resources, for most works the collection only includes a single version. We needed to spend additional time to source, and sometimes to encode (using TEI), additional versions to use as examples during the experiment. Because of the amount of time required to digitise and encode such documents, we decided to work with small examples (e.g. a couple of chapters extracted from a larger work, or a single story from an anthology) rather than full examples as might be found in a complete edition.

We wanted to support annotation of any resource regardless of location, e.g. to allow annotation of transcriptions and facsimiles made available online through libraries and archives. Hence, our annotation tool is implemented as a Web browser extension. We worked on porting the LORE annotation sidebar (originally developed for Firefox) so that it could be installed as an extension for Google Chrome, to make the tool available to users who did not wish to use Firefox. This development effort relied upon an experimental sidebar API that was made available through Chrome's developer channel. Unfortunately that sidebar API was discontinued and removed from Chrome, so we have not been able to finish porting the extension, and LORE is at present only available for Firefox, which limits the potential user base. We hope to complete this work when a replacement for the sidebar API for Chrome becomes available in the future.
5.3 Supplemental Ontologies Required and Recommendations

We used the following ontologies in combination with the OAC model:

- Dublin Core: creator, created, modified, title, description for annotations and annotation bodies
- FOAF: for creator metadata
- FRBR in RDF: for bibliographic metadata for Works, Expressions, Manifestations
- Annotation Ontology (AO): PrefixPostfixSelector for identifying segments of text

We also developed our own ontology describing the subclasses of oac:Annotation listed in Section 3, as well as custom properties to record variation metadata, and to relate digital surrogates (transcriptions and facsimiles) to FRBR entities.

Identifying FRBR (and other non-information) entities consistently remains an unresolved issue for interoperable annotations across distributed scholarly editions, annotation systems and content servers, for example, to enable all annotations for the 1603 edition of Hamlet to be retrieved and displayed. We assigned our own local identifiers for FRBR entities for this experiment, but community agreement on conventions for generating or mapping between such identifiers, or use of a name authority, will be necessary to achieve seamless sharing of these annotations across systems.

We used XPointers, W3C media fragments and AO PrefixPostfixSelectors for describing segments of transcriptions and facsimiles. Describing segments of digital transcriptions and facsimiles independently of their media type or format remains a key challenge to interoperability. This may be achieved through use of a schema that allows segments to be described by line, paragraph and page references; prefix-postfix notation; content offsets, etc., but this requires community agreement. Existing standards for addressing sections of texts apply to specific formats (such as TEI) only.

6. Discussion of Results and Conclusions

6.1 Technical Lessons Learned

Initially, we defined many annotation subclasses, for example, to represent different types of Allusion, Historical, Bibliographical notes etc. As the number of subclasses increased, it became impractical to enumerate all possible types in each query, so we tried switching the Storage and Inferencing Layer (SAIL) within the RDF repository to one that supports type inferencing. However, this choice affects the performance and scalability of the annotation repository, and many highly scalable SAILs do not support inferencing. Type inferencing is also not supported in the JavaScript library that we are using within our annotation client. To avoid the need for type inferencing altogether, we modified the annotation client to explicitly assert that the type of each annotation is oac:Annotation by adding an additional rdf:type property. The downside of this approach is that we are storing redundant type information for most annotations.
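The explicit typing workaround described above can be illustrated with a minimal Python sketch. It is not the LORE client code (which the report says is JavaScript): triples are modelled as plain tuples of prefixed names, and the `lit:` class names and `urn:example:` identifiers are hypothetical stand-ins for the experiment's own ontology and annotation URIs.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
BASE_CLASS = "oac:Annotation"

# Hypothetical prefixed names for the custom subclasses (assumption).
ANNOTATION_SUBCLASSES = {
    "lit:ExplanatoryNote",
    "lit:TextualNote",
    "lit:VariationAnnotation",
}


def assert_base_type(triples: Iterable[Triple]) -> Set[Triple]:
    """Return the triples plus an explicit oac:Annotation type for every
    resource that is typed with one of the known subclasses."""
    result = set(triples)
    for subject, predicate, obj in list(result):
        if predicate == RDF_TYPE and obj in ANNOTATION_SUBCLASSES:
            result.add((subject, RDF_TYPE, BASE_CLASS))
    return result


def find_annotations(triples: Iterable[Triple]) -> Set[str]:
    """Find annotations by matching the base class only, with no inferencing."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == BASE_CLASS}


graph = {
    ("urn:example:anno1", RDF_TYPE, "lit:ExplanatoryNote"),
    ("urn:example:anno1", "dc:title", "Go thou and do likewise"),
    ("urn:example:anno2", RDF_TYPE, "lit:TextualNote"),
}
typed_graph = assert_base_type(graph)
```

Without the extra triples, `find_annotations(graph)` finds nothing, because no store-side inferencing links the subclasses to the base class; after `assert_base_type`, both annotations are found by a single-type match. The cost is the redundancy noted above: one extra triple per subclassed annotation.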
Eventually, we decided to reduce the number of annotation subclasses, and we discuss guidelines for when to subclass in Section 6.3.

The main technical weaknesses with the OAC model that we identified are the complexity of the model for common, basic use cases, and that there is more than one way to represent certain information, which will increase the development effort required to produce tools that fully implement the model. For example, for annotations on part of an image, the image segment can be specified using an SVG constraint (with a constrains relationship to the image URI) or using a media fragment identifier in the target URI (with an isPartOf relationship to the image URI). In both cases there is no direct link from the annotation object to the image URI. Queries to retrieve all annotations on a given image must examine the target URI (for whole-of-image annotations), as well as URIs related via these two properties.

However, it is the flexibility afforded by this complexity that enables the OAC model to represent complex scholarly annotation use cases, such as those we have presented in this report. It is in the application to such use cases that simpler models like Annotea have proven to be limited in their capabilities.
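To make the three-way lookup for image annotations concrete, the following Python sketch builds a SPARQL query with one UNION branch per representation (whole-image target, constrained target, fragment target). It is an illustration under stated assumptions rather than the query generated by LORE: the property names `oac:hasTarget` and `oac:constrains` follow the wording of this report, the OAC namespace is passed in as a parameter, and the function name and example URIs are hypothetical.

```python
def annotations_on_image_query(image_uri: str, oac_ns: str) -> str:
    """Build a SPARQL query that finds annotations on an image whether the
    image is targeted directly, via a constraint, or via a fragment URI."""
    for value in (image_uri, oac_ns):
        # Reject characters that would break out of the <...> IRI syntax.
        if any(ch in value for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in value):
            raise ValueError(f"not a safe IRI: {value!r}")
    return f"""PREFIX oac: <{oac_ns}>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?annotation WHERE {{
  {{ ?annotation oac:hasTarget <{image_uri}> }}
  UNION
  {{ ?annotation oac:hasTarget ?target .
     ?target oac:constrains <{image_uri}> }}
  UNION
  {{ ?annotation oac:hasTarget ?target .
     ?target dcterms:isPartOf <{image_uri}> }}
}}"""


# Hypothetical URIs, for illustration only.
query = annotations_on_image_query(
    "http://example.org/facsimiles/page1.jpg",
    "http://example.org/oac-ns#",
)
```

Each branch repeats the `oac:hasTarget` pattern because a SPARQL group is evaluated on its own; a client that supported only one of the two segment representations could drop the other branch.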

6.2 Additional use cases identified

Create and attribute bodies from existing (offline) content

The scholarly editors collaborating with us during this experiment provided scholarly commentary in the form of books and Word documents that we wished to reuse for annotation bodies. While we were creating annotations to represent this content, we realised that supporting import of annotation bodies from existing content, while maintaining the original authorship so that the body content is properly attributed, is an additional use case. It is important to preserve the authorship of the annotation as well, because the annotator has contributed by selecting a segment of a text as the annotation target, and selecting what they consider to be a relevant part of some existing commentary to attach as the annotation body. The OAC model does allow separate authorship for the annotation, target and body, even if the body is provided as inline content with the annotation. We have added this metadata to annotations as required, through the LOREstore content management web interface, and we will extend our annotation client to support display and management of this additional metadata in the future.

Attach geographical co-ordinates

Attaching geographical co-ordinates to build a map or list of places, as included in some print editions, was identified as an additional use case from user feedback during the experiment. We will investigate supporting this use case as future work.

Relate multiple resources that are not associated with textual variation

During the experiment, users of the annotation tool began to create variation annotations to relate multiple resources, even when the target was not textual variation. It became apparent that the distinction between Textual Note and Variation Annotation is artificial (the only difference being whether the body is RDF).
This suggests that we should modify our annotation tool and rethink our annotation class hierarchy, for example, to select a more generic type instead of VariationAnnotation (e.g. oac:DataAnnotation) so that both explanatory notes and textual notes can support data bodies to link multiple targets.

Export annotations to publication formats

Users requested that they be able to export annotations to a Word document, PDF or epub, for archiving and sharing offline. We developed an export to Word feature for our annotation tool in response to these requests. This new use case suggests that it would be useful to investigate how to embed OAC annotation metadata within non-Web-based publication formats, e.g. using RDFa, to provide an alternative method of interchange, rather than just a one-way export.

6.3 Modelling insights

The main strengths of the OAC model for representing annotations for collaborative scholarly editions that we have identified during this experiment are as follows:

- The OAC model supports multiple targets, and each can be associated with a constraint for specifying the segment of interest, which means that we can create interoperable annotations describing textual variation across multiple versions without extending the model or creating aggregate targets. By comparison, within the Annotea model there is no mechanism for associating an Annotea context (used for selecting the segment of a text) with a target, so when mapping variation annotations to Annotea, we used non-standard extensions, which rely on naming conventions rather than explicit semantic relationships.
- Targets and bodies can be any media type and can be located on any server. This flexibility means that we can directly annotate digitised materials (e.g. facsimile images) that have been made available through external archives and libraries. It also allows us to create RDF data annotation bodies, so that metadata properties associated with the body can be stored separately rather than included in the annotation graph, making the provenance of those statements clear.
- Because the OAC model is RDF-based, it is a trivial exercise to extend the model and include properties from existing ontologies within the annotation graph. We use custom properties to link target documents to FRBR entities, allowing us to query and retrieve annotations across multiple versions of the same FRBR expression or work.

Subclassing of oac:Annotation

This experiment has clarified our thoughts about when to subclass oac:Annotation as opposed to using a tag or some other property to categorise annotations. We recommend subclassing oac:Annotation under the following circumstances:

- For a small, fixed set of types;
- When the annotation types represent distinct concepts in the domain, and there is a requirement to provide search, filtering, or different presentation or handling of annotations according to those types.
  For example, printed scholarly editions almost always present explanatory notes in a standalone chapter or appendix, while descriptions of textual variation and textual notes are often published as apparatus in footnotes or on the opposite page to the text. Some editions also include additional sections for glossaries, and for place names with a map, so these would be candidates for subclasses.
- When the type relates to the semantics of the annotates relationship between the body and the target (e.g. for an explanatory note, the body provides an explanation for the target selection, and for a glossary entry, the body is a definition of the target selection).

Conversely, we recommend using tags rather than a subclass to distinguish annotations when:

- The list of types is large or requires frequent reorganisation or extension;
- Interpretation is required in order to determine which type applies, or multiple types may apply to individual annotations; or
- The type relates to the content of either the body or the target and not the nature of the relationship between body and target.

Based on these insights, we should support tagging explanatory notes as classical, biblical, historical, mythological allusion etc., because allusion is about the target only and is subjective, so it makes sense to create the tag as part of the annotation body rather than using an annotation subclass. The annotation types that could be represented as subclasses for our experiment include TextualNote and ExplanatoryNote, with subclasses Glossary and GeographicalNote. Modifying our annotation tool and repository to support using tags to distinguish annotation types, e.g. for types of allusion, remains work to be completed in the future.

Aggregate Targets vs. Bodies

Another modelling issue that we grappled with was whether to use aggregate targets when annotating differences between multiple versions of a work. Our initial approach was to create aggregate targets for Variation Annotations. However, we soon realised that the ordering and relationships that were being asserted in the aggregation by the annotator (information that was not recorded prior to the creation of the annotation and which is subject to interpretation) actually constituted part of the scholarly commentary content of the annotation, so we decided to use an aggregate body instead. Using multiple targets for the versions rather than an aggregation also greatly simplified the SPARQL queries for retrieving annotations by target (the most common query generated via our annotation tool).

Referencing vs. targeting

We have found it useful to remember that an oac:Annotation reifies an annotates relationship between body and target, and that the body should be about the target(s), as a guideline for when to target vs. when to reference a resource within an annotation. For example, when creating an annotation that asserts that a publisher's editor made a change that occurred between two versions of a work, we may wish to link to some correspondence between the author and publisher so that a future reviewer of the edition will be able to view this evidence.
The annotation would target the two versions of the work; however, the commentary is not really about the correspondence, and hence we would reference the letter using a dc:relation property in the variation annotation body, rather than including it as another target of the annotation.

Conclusion

Our experiment has demonstrated that the OAC model can be applied to collaborative authoring of electronic scholarly editions to:

- support annotations to enable discussion;
- document textual variation between multiple versions of a work;
- attach scholarly commentary in the form of explanatory and textual notes.

Acknowledgements

The authors particularly wish to acknowledge the valuable contributions to this experiment made by Dr Roger Osborne (UQ), Professor Paul Eggert (UNSW) and Professor Tim Dolin (Curtin University).

Appendix A Annotation RDF Serialisations

The following prefixes apply for all rdf: < rdf-syntax- dc: cnt: dcterms: foaf: oac: <

Explanatory Note from Figure 1

< dc:language "en" ;
    dc:title "Go thou and do likewise" ;
    dcterms:created " T13:32: :00"^^dcterms:W3CDTF ;
    oac:hasbody <urn:uuid:f A0AA-732F20FCAFEA> ;
    a < ;
    a < annotation-ns#explanatorynote> ;
    oac:hastarget rphy&version1=2#xpointer(string-range(id("id "), "", 431, 26))>
    dcterms:creator <

<urn:uuid:f A0AA-732F20FCAFEA> a < ;
    cnt:characterencoding "UTF-8" ;
    cnt:chars "Cf. 'Go and do thou likewise' (Luke 10:37), but in the context of procreation, perhaps a faint echo of 'Be fruitful, and multiply, and replenish the earth' (Genesis 1:28)".

rphy&version1=2#xpointer(string-range(id("id "), "", 431, 26))>
    dcterms:ispartof rphy&version1=2>.

< a < ;
    foaf:name "Roger Osborne".

Variation Annotation from Figure 2

< dc:title "trying to turn] TSb A running in TSa" ;
    dcterms:created " T17:35: :00"^^dcterms:W3CDTF ;
    oac:hasbody < ;
    a < ;
    a < annotation-ns#variationannotation> ;
    oac:hastarget rphy&version1=1#xpointer(string-range(id("id "), "", 145, 10))> ;
    oac:hastarget rphy&version1=2#xpointer(string-range(id("id "), "", 245, 14))>
    dcterms:creator <

< dc:format "application/rdf+xml".
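The targets in these serialisations select text with an XPointer string-range fragment. As a rough illustration of how a client might take such a target apart, the sketch below splits a target URI into the document URI, the element id and the two numeric arguments. It assumes the exact fragment shape seen in this appendix (an `id(...)` location, an empty match string, then two integers); it is not a general XPointer parser, and the function, type and example URIs are hypothetical.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import unquote


class StringRange(NamedTuple):
    document: str    # target URI without the fragment
    element_id: str  # id of the element containing the selection
    position: int    # third string-range argument
    length: int      # fourth string-range argument


# Matches only the shape used in this appendix:
# xpointer(string-range(id("..."), "", <int>, <int>))
_FRAGMENT = re.compile(
    r'^xpointer\(string-range\(id\("(?P<id>[^"]*)"\),\s*"",\s*'
    r'(?P<pos>\d+),\s*(?P<len>\d+)\)\)$'
)


def parse_string_range(target: str) -> Optional[StringRange]:
    """Return the parts of a string-range target, or None if the fragment
    does not have the expected shape."""
    document, _, fragment = target.partition("#")
    # Some serialisations percent-encode the fragment (%22 for quotes, %20 for spaces).
    match = _FRAGMENT.match(unquote(fragment))
    if match is None:
        return None
    return StringRange(document, match["id"], int(match["pos"]), int(match["len"]))


plain = parse_string_range(
    'http://example.org/text?version1=2#xpointer(string-range(id("p1"), "", 431, 26))'
)
encoded = parse_string_range(
    "http://example.org/text?version1=1"
    "#xpointer(string-range(id(%22p1%22),%20%22%22,%20145,%2010))"
)
```

Only the fragment is percent-decoded, so any encoding in the document URI itself is left untouched. Grouping parsed targets by `document` is one way a client could collect all annotations on a single transcription.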

Body for Variation Annotation:

< a < ;
    ore:describes < ;
    dc:creator "Anna Gerber" ;
    dc:description "Furphy appears to have tested a revision on TSa, by striking through 'running in a single steer' and beginning a replacement that started with 'shouldering' before leaving the revision alone. By the time Furphy completed the BBB TS, 'trying to turn' had been inserted. This suggests that another document lies between TSb and TSa. " ;
    lit:variation-agent "Joseph Furphy".

< a < ;
    ore:aggregates rphy&version1=1#xpointer(string-range(id(%22id %22),%20%22%22,%20145,%2010))> ;
    ore:aggregates rphy&version1=2#xpointer(string-range(id(%22id %22),%20%22%22,%20245,%2014))>.

rphy&version1=1#xpointer(string-range(id(%22id %22),%20%22%22,%20145,%2010))>
    dcterms:ispartof rphy&version1=1> ;
    dc:title "typescript".

rphy&version1=2#xpointer(string-range(id(%22id %22),%20%22%22,%20245,%2014))>
    dcterms:ispartof rphy&version1=2>
    dc:title "1948".

rphy&version1=2> lit:surrogatefor < ;
    lit:isvariantof rphy&version1=1>.

rphy&version1=1> lit:surrogatefor <

Textual Note from Figure 3

< dc:language "en" ;
    dc:title "Amen" ;
    dcterms:created " T15:54: :00"^^dcterms:W3CDTF ;
    oac:hasbody <urn:uuid:b C46A-4DD9-BA3E-4FB8B4D36BC4> ;
    a < ;
    a < annotation-ns#textualnote> ;
    oac:hastarget rphy&version1=2#xpointer(string-range(id("id "), "", 887, 4))> ;
    dcterms:creator <

<urn:uuid:b C46A-4DD9-BA3E-4FB8B4D36BC4> a < ;
    cnt:characterencoding "UTF-8" ;
    cnt:chars "TS continues with a long paragraph in which Jeff Rigby, who was dropped from BB and SIL, advises Mrs Falkland-Pritchard on her career as an authoress. Rigby identifies Dickens, Rousseau's Social Contract, Paine's Rights of Man, Uncle Tom's Cabin and Don Quixote as works that marked an epoch and brought about moral revolution. Some of the exchanges originally given to Rigby in TS fall to Tom Collins, but never any serious moralising about life or art.".

rphy&version1=2#xpointer(string-range(id("id "), "", 887, 4))>
    dcterms:ispartof rphy&version1=2>.

Reply from Figure 3

< dc:language "en" ;
    dc:title "Re: Amen" ;
    dcterms:created " T16:34: :00"^^dcterms:W3CDTF ;
    oac:hasbody <urn:uuid:e20d5767-4c0b D6B-20C72560E418> ;
    a < ;
    oac:hastarget < ;
    dcterms:creator <

<urn:uuid:e20d5767-4c0b D6B-20C72560E418> a < ;
    cnt:characterencoding "UTF-8" ;
    cnt:chars "While not deemed suitable for The Buln Buln and the Brolga, this passage is significant to the argument of Such is Life (1898). Furphy is much more concerned with exploring the 'fiction of facts' and the 'facts of fiction' in the typescript version. Returned to their previous context, the unrevised sections of the Buln Buln and the Brolga perform a different function in a significantly different narrative.".

Appendix B Annotation Tool Screenshots

The following screenshots illustrate some typical uses of the LORE Annotation sidebar: discussion between collaborators in the form of comments and replies, viewing annotations attached to a single version or variation between two versions of a work, and creating an annotation.

Figure B.1: Scholarly discussion through Comments and Replies

Figure B.2: UI for creating a variation annotation

Figure B.3: Annotations on a single version of a Work

Figure B.4: In-browser variation annotation display

Figure B.5: Creating an Explanatory Note using the experimental annotation sidebar for Chrome

Appendix C Repository Screenshots

The lorestore repository was developed for this experiment, to support storage, search and display of OAC annotations and ORE Resource Maps used for variation annotation bodies. The source code for the lorestore repository has been released under a GPL 3.0 open source license, and is available on GitHub at eresearch/lorestore/. A sandbox instance has also been deployed.

Annotation Search and Display

The lorestore web interface supports keyword search and search by target. Annotations can be displayed (Fig C.1), and can be retrieved in a variety of formats including TriG, TriX, RDF/XML and JSON, or visualised as a graph (Fig C.2). A SPARQL endpoint and editor (Fig C.3) is also provided to enable custom queries.

Figure C.1: Annotation summary display

Figure C.2: Graphical annotation display

Figure C.3: SPARQL endpoint and query editor

REST API

A REST API was developed to support creating, reading, updating, deleting and querying of annotations. Documentation and code examples of usage (Fig C.4) are available through the lorestore web interface.


Figure C.4: API documentation and usage examples

Administration

The lorestore repository supports role-based access control and authentication. Annotations and data bodies can be published publicly or privately. Administrators can manage content (Fig C.5) and user accounts (Fig C.6) through the web interface.

Figure C.5: Content management functions

Figure C.6: User account management functions

Appendix D Conference Presentations

The following conference presentations have included discussion of or content relating to this annotation demonstration experiment:

- A. Gerber, R. Osborne, "Transforming Communication in Textual Scholarship: Open Annotation for Electronic Editions", Digital Humanities Australasia (DHA) 2012, Canberra, March. Slides available: DHA2012-slides-web.pdf. The abstract for this presentation is provided as Appendix D.1.
- R. Osborne, A. Gerber, K. Kilner, "Using LORE", THATCamp Canberra, 7-9 October, 2011. Slides available: THATCamp2011.pdf
- A. Gerber, "LORE: An open source research tool for Australian literary scholars", linux.conf.au, January, 2011, Brisbane, Australia. Slides available: LCA2011Slides.pdf

The following paper has also been accepted and will be presented in July:

- R. Osborne, A. Gerber, J. Hunter, "Ontology-based Annotation for Electronic Editions using the Open Annotation Collaboration (OAC) Data Model", Ontology-based Annotation Workshop, Digital Humanities. The abstract is provided as Appendix D.2 (figures have been elided for brevity).

Appendix D.1 - Abstract for Paper presented at Digital Humanities Australasia, March 2012

Transforming Communication in Textual Scholarship: Open Annotation for Electronic Editions

Anna Gerber & Roger Osborne, The University of Queensland

Abstract

The Open Annotation Collaboration (OAC) provides a framework for sharing scholarly annotations across clients, servers, collections, applications and architectures. The OAC data model is based on linked data and semantic web principles, and can be tailored to meet the complex scholarly annotation requirements of specific research communities while maintaining interoperability. In this paper, we describe how we have applied the OAC model to support annotation within an electronic edition of Joseph Furphy's Such is Life.

When preparing a scholarly edition, the editors aim to provide a comprehensive description of the history of a work, specifically information about significant versions and physical forms. In addition to a substantial textual essay, editorial decisions are argued in textual notes, and a textual apparatus is compiled to record the alterations made between different versions. Modern scholarly editions are frequently collaborative ventures with multiple editors, advisers and an editorial board dispersed globally.

The open-source annotation toolkit that we have developed enables editors to relate transcripts with facsimiles; attach textual and explanatory notes to text and image selections; reference secondary sources; record information about textual variations; and engage in collaborative discussion through comments, questions and replies. The flexibility of the OAC model allows us to use the same toolkit for annotations at all stages of the scholarly editing process, leaving a record of editorial decisions and allowing export for publication in print or electronic form.
In 2003, editorial theorist Jerome McGann wrote, "In the next fifty years the entirety of our inherited archive of cultural works will have to be reedited within a network of digital storage, access, and dissemination. This system, which is already under development, is transnational and transcultural." Tools such as those being developed for OAC will make a significant contribution to the thought and practical applications that flow from McGann's prediction.

Appendix D.2 - Abstract submitted to DH2012 Ontology-Based Annotation Workshop

Ontology-based Annotation for Electronic Editions using the Open Annotation Collaboration (OAC) Data Model

Roger Osborne, Anna Gerber, Jane Hunter
The University of Queensland

1. Introduction

Scholarly editions of literary works include significant amounts of information in explanatory notes, textual notes and glossaries. Print-based editions are limited by the amount of page space allocated, but electronic editions can support more comprehensive collections of notes and additional information to supplement the longer historical and textual essays that provide the main scholarly argument about the need for the edition and the validity of the editorial rationale. In an electronic edition, these notes may take the form of annotations. An ontology-based annotation system can extend the usefulness of notes beyond the limits of static, print-based models, and enable their discovery, sharing and re-use via the Web.

In an electronic edition that includes facsimiles, transcriptions and collations, annotations provide an extra layer of information about the nature of the documents, the textual content of each document, the textual transmission between documents, and the various people and organisations that played a part in the production of the literary work over time. Annotations can also provide glosses about the text itself, identifying such things as allusions to other works, historical contexts and stylistic significance.

Digital images and transcriptions provide a surrogate for the material artefacts held in libraries and archives, enabling the relationships between documents, people and organisations to be efficiently modelled within an ontology-based annotation system. Modelling the relationships between documents, people and organisations makes explicit the many implicit assumptions that exist in the mind of the editor and the intended audience for the edition.
It also provides an alternative to text-based communication of standard explanatory notes by supporting graphical and tabular representations of information and by allowing powerful semantic querying, filtering, and faceted browsing within and across electronic editions.

The complex range of internal and external relationships that emerge from a scholarly edition not only tests the limits of print-based editions, but also tests the limits of hierarchical data models. The graph-based, flexible, extensible nature of an ontology-based system is better suited to representing the complete history of literary, philosophical and historical works.

In this paper we describe and discuss some of the challenges involved in applying the Open Annotation Collaboration (OAC) data model [1] within the Australian Electronic Scholarly Editing (AustESE) [i] project to represent these data. Our OAC-based annotation system is enhanced by integration with the IFLA FRBR [2] taxonomy, which provides a solid bibliographical foundation for annotations to traverse all conceptual levels of a work. This benefits the editor by providing a well-structured environment in which to collect, describe, and analyse a work, and it also benefits readers by providing a wider variety of reading strategies to help them pursue their study of a particular work and its multiple derivative forms.

2. Modelling Approach

The OAC provides a common data model for representing annotations across tools, architectures and collections. The model, which is expressed as an OWL ontology, is intended to be extensible, so that it can be refined to meet the annotation requirements of specific communities.

We have extended the OAC data model with specialised annotation types to support the production of apparatus and commentary within electronic editions by subclassing the oac:Annotation class. We categorise annotations as ExplanatoryNotes (which provide commentary), TextualNotes (which provide support for editorial decisions), or VariationAnnotations (which describe textual variation between versions of a work). These annotation types can be used in search queries and for filtering and sorting annotations for display and for inclusion in print or electronic publication. We have defined additional properties that may be used within the body of a VariationAnnotation to record metadata about the agent, date or cause of the variation, as well as documentary evidence including links to manuscript facsimiles.

Within our RDF-based annotation tool and annotation repository [ii], we have adopted a Linked Data approach of using HTTP URIs to identify entities that may be referenced within annotations, including documents, agents (people or organisations) and conceptual FRBR entities (Works, Expressions, Manifestations and Items). We use FOAF and Dublin Core to record annotation provenance, and we apply and extend the FRBR ontology [iii] with properties that relate the transcriptions and corresponding facsimile images that are being annotated. The base oac:Annotation and oac:Reply types are used in our system to support comments and discussion between collaborating editors, content which is not usually considered to be part of the scholarly content of the edition.
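The subclassing and type-based filtering described above can be illustrated with a minimal sketch. It is written in plain Python rather than against an RDF library or the LORE code, so it should be read as an illustration of the idea and not as the implementation used in this experiment. The `ex:` prefix, the body keys (`agent`, `cause`, `text`), all example URIs, and the helper names `is_a` and `filter_by_type` are hypothetical; treating oac:Reply as a subclass of oac:Annotation is also an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical prefixed names: "ex:" stands in for the project's own
# vocabulary, which the text does not spell out.
OAC_ANNOTATION = "oac:Annotation"
OAC_REPLY = "oac:Reply"
EXPLANATORY_NOTE = "ex:ExplanatoryNote"
TEXTUAL_NOTE = "ex:TextualNote"
VARIATION_ANNOTATION = "ex:VariationAnnotation"

# Child type -> parent type, mirroring rdfs:subClassOf statements.
# Placing oac:Reply under oac:Annotation is an assumption of this sketch.
SUBCLASS_OF: Dict[str, str] = {
    EXPLANATORY_NOTE: OAC_ANNOTATION,
    TEXTUAL_NOTE: OAC_ANNOTATION,
    VARIATION_ANNOTATION: OAC_ANNOTATION,
    OAC_REPLY: OAC_ANNOTATION,
}


@dataclass
class Annotation:
    uri: str
    rdf_type: str
    targets: List[str]  # one or more target URIs
    body: Dict[str, str] = field(default_factory=dict)


def is_a(rdf_type: str, ancestor: str) -> bool:
    """Return True if rdf_type is ancestor or a transitive subclass of it."""
    current: Optional[str] = rdf_type
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def filter_by_type(annotations: List[Annotation], wanted: str) -> List[Annotation]:
    """Keep annotations whose type is wanted or one of its subclasses."""
    return [a for a in annotations if is_a(a.rdf_type, wanted)]


# Illustrative data only; every URI and body key below is made up.
annotations = [
    Annotation("http://example.org/anno/1", EXPLANATORY_NOTE,
               ["http://example.org/transcript/v1#p3"],
               {"text": "Gloss on an allusion."}),
    Annotation("http://example.org/anno/2", VARIATION_ANNOTATION,
               ["http://example.org/transcript/v1#p3",
                "http://example.org/transcript/v2#p3"],
               {"agent": "http://example.org/agent/editor",
                "cause": "revision"}),
    Annotation("http://example.org/anno/3", OAC_REPLY,
               ["http://example.org/anno/2"],
               {"text": "Is there facsimile evidence for this?"}),
]

# Select only variation records, e.g. when compiling a textual apparatus.
variations = filter_by_type(annotations, VARIATION_ANNOTATION)
# Selecting on the base type returns every annotation, replies included.
everything = filter_by_type(annotations, OAC_ANNOTATION)
print([a.uri for a in variations])
print(len(everything))
```

Filtering on a specialised type yields only that category, while filtering on the base type also returns discussion replies. This is the distinction that lets scholarly content be separated from editorial discussion when preparing output for publication.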
The flexibility of the OAC model, and particularly its extensibility and support for multiple targets and RDF annotation bodies, allows us to use the same annotation tools and repository at all stages of the scholarly editing process. Semantic tagging can be used in addition to the customised annotation types to identify annotations that serve different purposes within the editing workflow, ultimately supporting efficient filtering and customised views that can be adapted for different modes of publication or intended audiences.

3. Discussion and Challenges

Modern scholarly editions are frequently collaborative ventures with multiple editors, advisers and an editorial board dispersed globally. But to extend electronic editions beyond the closed, finished or abandoned, look-but-don't-touch products described by Peter Shillingsburg [3], scholarly editing needs to be conducted in collaborative, open-ended electronic environments. Such an environment will support the scholarly editing model advocated by Hans Walter Gabler: one that is predicated on the functional correlation of bodies of material content in a systemics of discourses and argument [4]. Peter Robinson [5] suggests that the future of scholarly editing lies with a network of many servers, all holding different parts of an edition, with many other servers providing a range of services to the readers and scholars interested in this edition.

Although the OAC ontology allows us to address these plans by representing the structure of the annotations consistently across a range of tools and servers, we have identified several challenges to interoperability that should be resolved before seamless sharing of annotations within such collaborative, relational and distributed editions can be achieved, including:

- Identifying entities (e.g. people, documents, works etc.) consistently across annotation systems and content servers, so that queries can retrieve and display all annotations on a given entity. Use of name authorities or community agreement on naming conventions may help to address this issue; however, any solution must also be applicable to conceptual entities such as semantic tags and non-extant resources (e.g. missing manuscripts) that may be referenced within annotations.

- Describing segments of digital transcriptions and facsimiles independently of their media type or format, for example, through use of a common schema that allows segments to be described by line, paragraph and page references; prefix-postfix notation; content offsets, etc. TEI and HTML have addressing schemes but these are low-level and format-specific.

- Developing strategies to manage the subjectiveness of interpretation that may be involved in deciding how to describe versions of a work in terms of FRBR, and how to relate particular documents to those FRBR entities, particularly when dealing with manuscripts and digital surrogates. One strategy would be to apply semantic inferencing rules to align bibliographic structures between systems.

4. Conclusion

By using an ontology-based annotation system to represent knowledge that would normally be assumed of experienced readers of an edition, electronic editions can be made accessible to a wider audience.
The ability to search, browse and represent information in graphical and tabular form will greatly assist new readers and novice researchers to navigate the large amounts of information and the complex networks of relationships that are captured in an electronic edition. These features will also benefit scholarly editors by recording the processes of editing in a way that better supports comprehensive checking, verification and review by external bodies. Ultimately, ontology-based annotation systems will enable collaborative, distributed editions to more easily share information across platforms, taking full advantage of the potential of semantic web technology, and accelerating the creation and communication of knowledge.

Acknowledgements

This work was undertaken as an annotation demonstration experiment through the Open Annotation Collaboration (OAC) and will be further developed for the AustESE project. The OAC is funded by the Andrew W. Mellon Foundation and the partners of the collaboration. The AustESE project is funded through the Australian National eResearch Collaboration Tools and Resources (NeCTAR) eResearch Tools program.

References

[1] R. Sanderson and H. Van De Sompel, Open Annotation: Beta Data Model Guide, [Online]. Available: [Accessed: 12-Apr-2012].

[2] IFLA Study Group on the Functional Requirements for Bibliographic Records, Functional requirements for bibliographic records: final report, UBCIM publications, new series, vol. 19.
[3] P. Shillingsburg, How Literary Works Exist: Implied, Represented, and Interpreted, in Text and Genre in Reconstruction: Effects of Digitalization on Ideas, Behaviours, Products and Institutions, W. McCarty, Ed. Cambridge: OpenBook Publishers, 2010, pp.
[4] H. W. Gabler, Theorizing the Digital Scholarly Edition, Literature Compass, vol. 7, no. 2, pp.
[5] P. M. W. Robinson, Towards a Scholarly Editing System for the Next Decades, Lecture Notes in Computer Science, vol. 5402, pp.

[i] AustESE Project:
[ii] lorestore eresearch/lorestore/
[iii] Expression of Core FRBR Concepts in RDF
