Embedding Metadata and Other Semantics In Word-Processing Documents

Similar documents
RDF and Digital Libraries

XML Metadata Standards and Topic Maps

Dot Porter. Editing Options for TEI Users. Dot Porter

Web Standards Mastering HTML5, CSS3, and XML

Semantic Web. Lecture XIII Tools Dieter Fensel and Katharina Siorpaes. Copyright 2008 STI INNSBRUCK

ODF Perspectives Panel discussion. OASIS ODF Adoption TC

Understanding Page Template Components. Brandon Scheirman Instructional Designer, OmniUpdate

LORE: A Compound Object Authoring and Publishing Tool for Literary Scholars based on the FRBR. Anna Gerber, Jane Hunter

Abstract. The problem: paper-based thinking. doi: /j.serrev

MadCap Flare Training

DCMI Abstract Model - DRAFT Update

Metadata and Encoding Standards for Digital Initiatives: An Introduction

OpenOffice.org Writer

Methodology and Technology Services

Labelling & Classification using emerging protocols

> Semantic Web Use Cases and Case Studies

Chapter 12 Creating Web Pages

IBM Rational Rhapsody Gateway Add On. Tagger Manual

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

Create the Left Navigation SSI Quick Guide

Chapter 5 Making Life Easier with Templates and Styles

From Open Data to Data- Intensive Science through CERIF

Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS. Jenn Riley IU Metadata Librarian DLP Brown Bag Series February 25, 2005

WYSIWON T The XML Authoring Myths

Dictionary Driven Exchange Content Assembly Blueprints

Report from the W3C Semantic Web Best Practices Working Group

RIT Wiki 5.1 Upgrade - May 21, 2013

Automate to Innovate L EA RN WHAT SCRIPTING CAN DO FOR YOU P U N E E T S I N G H

An integrated approach to preparing, publishing, presenting and preserving theses

Semantic Web: vision and reality

Intro to XML. Borrowed, with author s permission, from:

For those of you who may not have heard of the BHL let me give you some background. The Biodiversity Heritage Library (BHL) is a consortium of

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

Creating Your Paper or Thesis With LYX

QUARK AUTHOR THE SMART CONTENT TOOL. INFO SHEET Quark Author

WWW2004: Semantic Web Implementation at Adobe. Chuck Myers Adobe Systems Incorporated May 20, 2004

FROM 4D WRITE TO 4D WRITE PRO INTRODUCTION. Presented by: Achim W. Peschke

Internet Standards for the Web: Part II

Give Your DITA wings with taxonomy & modern web design. Joe Pairman

HTML5 & CSS 8 th Edition. Chapter 2 Building a Webpage Template with HTML5

Overview 14 Table Definitions and Style Definitions 16 Output Objects and Output Destinations 18 ODS References and Resources 20

Data is the new Oil (Ann Winblad)

A Formal Definition of RESTful Semantic Web Services. Antonio Garrote Hernández María N. Moreno García

Macromedia RoboHelp Course Outline

XML Motivations. Semi-structured data. Principles of Information and Database Management 198:336 Week 8 Mar 28 Matthew Stone.

Oracle Workflow. 1 Introduction. 2 Web Services Overview. 1.1 Intended Audience. 1.2 Related Documents. Web Services Guide

ebxml Technical Architecture San Jose, CA USA Wednesday, 9 August 2000

Single Source Publishing with Apache Forrest

is an electronic document that is both user friendly and library friendly

WHITE PAPER. Comparison Guide: Choosing Between Help Authoring Tools and CCMSs

Making Information Findable

User Manual. ACM MAC Word Template. (MAC 2016 version)

Cloned page. A Technical Introduction to PDF/UA. DEFWhitepaper. The PDF/UA Standard for Universal Accessibility

Contents. 1. Using Cherry 1.1 Getting started 1.2 Logging in

Who should use this manual. Signing into WordPress

Administrative Training Mura CMS Version 5.6

Chapter 12 Creating Web Pages

How to Export Your Book as epub and Mobi file formats with Microsoft Word and Calibre

Metadata Workshop 3 March 2006 Part 1

Metadata harmonization for fun and profit

FCKEditor v1.0 Basic Formatting Create Links Insert Tables

A tool for Entering Structural Metadata in Digital Libraries

COLUMN. Choosing the right CMS authoring tools. Three key criteria will determine the most suitable authoring environment NOVEMBER 2003

Structured documents

Percussion Documentation Table of Contents

CONTENTdm Basic Skills 1: Getting Started with CONTENTdm

Part A: Getting started 1. Open the <oxygen/> editor (with a blue icon, not the author mode with a red icon).

A Breakthrough In the Science of Proposal Development: P-XML. APMP TM Southern California Fall Seminar. October 22, 2004.

Proposals for a New Workflow for Level-4 Content

Realisation of SOA using Web Services. Adomas Svirskas Vilnius University December 2005

CONTENTdm 4.3. Russ Hunt Product Specialist Barcelona October 30th 2007

Formalizing Dublin Core Application Profiles Description Set Profiles and Graph Constraints

Making Your Word Documents Accessible

The XML Metalanguage

Building Consensus: An Overview of Metadata Standards Development

One of the fundamental kinds of websites that SharePoint 2010 allows

Single Source Publishing with Apache Forrest

Wink and a Wiki. September 18 & 19, Implementing e-learning 2.0 Technologies. Debi Crabtree, Hamilton County Virtual School.

Lecture Telecooperation. D. Fensel Leopold-Franzens- Universität Innsbruck

Flash Album Generator 2 Manual Version 1.0. About Flash Album Generator 2. Flash Album Generator 2 Manual version 1.0 DMXzone.com

Chapter 2. Architecture of a Search Engine

Europeana update: aspects of the data

Lou Burnard Consulting

OpenOffice.org Programmability at a glance. Jürgen Schmidt OpenOffice.org Sun Microsystems, Inc.

October Where is it now? Working Papers and CaseView. For the Engagement team

Monarch Services Website Quick Guide

Publishing Technology 101 A Journal Publishing Primer. Mike Hepp Director, Technology Strategy Dartmouth Journal Services

XML-based production of Eurostat publications

Website Designing Training

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial.

odt2daisy Instruction Manual Vincent Spiewak

How to Build a Digital Library

Created with CMB AutoDoc - Server licence for ID Data

Introduction to XML. Asst. Prof. Dr. Kanda Runapongsa Saikaew Dept. of Computer Engineering Khon Kaen University

Chapter 3. Architecture and Design

The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003

MOODLE MANUAL TABLE OF CONTENTS

SciX Open, self organising repository for scientific information exchange. D15: Value Added Publications IST

or: How to be your own Good Fairy

Reading books on an ipad or other electronic device is a

Transcription:

Embedding Metadata and Other Semantics In Word-Processing Documents Peter Sefton (University Southern Queensland) Ian Barnes (Australian National University) Ron Ward (University Southern Queensland) Jim Downing (University of Cambridge) (presenting) [breath] The paper supporting this presentation provides important detail and can be obtained from http://www.dspace.cam.ac.uk/handle/1810/206423

Agenda Motivations Axioms of choice Interoperability is Hard The approach Examples (+ chemistry) http://www.flickr.com/photos/forezt/524108228

Why is this interesting? We want to move towards semantically-rich documents for e-research. In some disciplines 100% of documents start life in a word processor. Introduction of real world constraints yields interesting result

Semantically Rich Documents Enable automation Prevent information loss Better discovery Improved presentation Automation - zero click upload, not filling in redundant forms etc Information loss - rich data reduced to tables, images. Semantic information leads to richer alternatives for discovery and communication of research. Fully Supported Research - all the supporting data delivered with the text

Constraints Work in the real world, today http://www.flickr.com/photos/amirjina/2281612876 Solution had to work in ICE - the Integrated Content Environment, a distributed authoring system in production at USQ. Therefore the approach is PRAGMATIC!

Real World Metadata, semantics and data not easily distinguished Document creation == Metadata creation Not separable activities Metadata is in the document Documents have multiple, distributed authors

Tools and Formats Microsoft Word [Adoption] OpenOffice.org writer [Access] ICE - Integrated Content Environment.doc,.docx, OOXML, ODF HTML, PDF

The Difference Between Standards and Interoperability This is the test that semantic solutions must work inside to be useful in production - once semantics are created, they must survive when the document is edited in the wild.

There are a lot of standards already involved in this space, but none of them on their own deliver semantic data interoperability. This is the simple subset of document interop we re talking about, including only word and OO Writer. In the wild you can t control what formats people use to save, or the software they use. If any of these routes destroys semantics, then we ve lost interoperability.

Interoperability in Publishing PDF - scholarly publishing now HTML - the medium term future of scholarly publishing. Converter needed since HTML and PDF creation in OOo Writer and MS Word produce pretty poor results.

http://www.flickr.com/photos/druclimb/289636172 When you apply these interoperability constraints, the solution space gets very small. <metaphor>like walking along a ridge, keep it simple and take small steps. The paths off to the side lead quickly to peril.</metaphor>

Approaches Ruled Out MS Word Smart Tags No interop with OOo, but not necessarily a bad idea MS Word foreign namespace XML encoding Expensive, no interop with OOo, lock-in issues ODF 1.2 embedded semantic No Word equivalent in sight Things that would destroy WYSIWYG such as using wiki markup in the word processor.

Define A New Encoding Standard? Codifying a standard wouldn t work unless vanilla wp software can be shown not to destroy the information. For delivering interoperability in this area, standards are not sufficient.

Microformats! http://www.flickr.com/photos/onion/2046003604

Encoding Microformats Tables: for, like, tabulating things Styles: The original extensible inline semantic mechanism for word processing and still working! Links Frames: fragile Bookmarks and fields: require lots of field testing, not all that reliable in an interop situation The paper contains much more detail about the mechanism.

Styles The style approach is: - * Simple * Metadata schema agnostic * User extensible It doesn t /need/ any plugin / customized software to work.

Style: p-meta-author Style: p-meta-affiliation Style: p-meta-issued Style: p-meta-abstract Styles can be nested by placing inline styled text within styled paragraphs.

{ 'title':['metadata in ICE documents'], 'author':[{'name':'ian Barnes', 'affiliation':'anu'}, {'name':'peter Sefton', 'affiliation':'usq'} ] } Tables are also useful since the layout implies semantics

Toolbars The toolbars are implemented for Word and Writer. They provide easy access to the common microformat encoding styles and structures. They also contain macros for communicating with the ICE system, and uploading the document to the Institutional Repo / publisher system etc.

http://www.flickr.com/photos/jima/460348206 To make it even easier, templates can be used that include sample text in the relevant places - all the user has to do is replace the sample text.

Dublin Core metadata can be extracted directly from the document.

As can RDF metadata using the ORE vocabulary.

ICE-TheOREM Semantics in chemistry thesis documents Structural elements, Chapters, Appendices etc Data (molecules, spectral data etc) Chemical entities in text http://wwmm.ch.cam.ac.uk/trac/theorem/

Chemistry Style: p-exptl-compound Link to data. Style: p-exptl-compnum Style: p-compound-name This text from a synthetic chemistry thesis. Highlights the grey area between data and metadata - the compound name is data, but also the subject of the document.

These screenshots taken from CML in ICE demo at http://ice.usq.edu.au/presentations/ demos/index.htm

These screenshots taken from CML in ICE demo at http://ice.usq.edu.au/presentations/ demos/index.htm

These screenshots taken from CML in ICE demo at http://ice.usq.edu.au/presentations/ demos/index.htm

These screenshots taken from CML in ICE demo at http://ice.usq.edu.au/presentations/ demos/index.htm

These screenshots taken from CML in ICE demo at http://ice.usq.edu.au/presentations/ demos/index.htm

FIN Thank you. http://www.flickr.com/photos/jaysun/367670007

ICE - Integrated Content Environment http://ice.usq.edu.au/ Demos at http://ice.usq.edu.au/presentations/demos/ ICE-TheOREM. Tag: jisctheorem https://wwmm.ch.cam.ac.uk/trac/theorem http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2007/ theoremice.aspx Peter Sefton sefton@usq.edu.au http://ptsefton.com/ Jim Downing ojd20@cam.ac.uk http://wwmm.ch.cam.ac.uk/blogs/downing/