PLANETS, Document Conversion Tools and the OpenXML/ODF Translator

Similar documents
The PLANETS Testbed. Max Kaiser, Austrian National Library

Protection of the National Cultural Heritage in Austria

Testbed a walk-through

Keeping Emulation Environments Portable FP7-ICT System User Guide for the Emulation Framework version 1.0 (May 2011)

Current Digital Preservation Trends and SDB4 Features. Pauline Sinclair, Tessella, PASIG, Madrid, 5 July 2010

verapdf after PREFORMA

IST A blueprint for the development of new preservation action tools

A large-scale International IPv6 Network. A large-scale International IPv6 Network.

National Body Priority MI Topic MI Theme NB Comment. NB_Lookup

Emulation Framework. System User Guide. Release (February 2012) Keeping Emulation Environments Portable FP7-ICT

This document is a preview generated by EVS

The CHAIN-REDS Project

IMI WHITE PAPER INFORMATION MAPPING AND DITA: TWO WORLDS, ONE SOLUTION

Excel-erator. Enhancement Summary V1R9M0 Product Number 2A55XL1

An Approach to Software Preservation

Sustaining and Enhancing the Value of Digital Assets

The ESA CASPAR Scientific Testbed and the combined approach with GENESI-DR

Post Digitization: Challenges in Managing a Dynamic Dataset. Jasper Faase, 12 April 2012

Promoting semantic interoperability between public administrations in Europe

Apple Inc. US 6,587,904 US 6,618,785 US 6,636,914 US 6,639,918 US 6,718,497 US 6,831,928 US 6,842,805 US 6,865,632 US 6,944,705 US 6,985,981

THE CO-CITIES PACKAGE INTRODUCTION TO VERSION 1 STATUS NOVEMBER 2014

Bringing Europeana and CLARIN together: Dissemination and exploitation of cultural heritage data in a research infrastructure

1 st Int. Workshop on Standards and Technologies in Multimedia Archives and Records (STAR), Lausanne, /27

An Introduction to Digital Preservation

Interoperability and transparency The European context

Different approaches to digital preservation

IBM Ultrium TB Data Cartridge models 550, 650, 570, and 670 nearly double the capacity of previous generation IBM LTO Ultrium cartridges

XML STANDARDS FOR ARCHIVING LEGISLATIVE RECORDS

Introduction to InTouch Edge HMI

Digital Preservation DMFUG 2017

What s New. Essential Studio Reporting Edition 2012 Volume 2

Digital Vellum Vint Cerf Google

The use of emulation tools as part of a strategy for long-term preservation of digital records

SCUtils Knowledge Base Installation Guide Solution for Microsoft System Center 2012 Service Manager

Sustainable File Formats for Electronic Records A Guide for Government Agencies

ArcaOS Where ArcaOS is going in the next 18 months

Developing a social science data platform. Ron Dekker Director CESSDA

What s New Essential Studio Reporting Edition

Distributed Multitiered Application

JPEG 2000 Archive Profiles

Striving for efficiency

High Fidelity Programmatic Access to Document Content

ODF and OOXML in Denmark

Emulation for digital preservation in practice: the results

The Eclipse Parallel Tools Platform Project

An overview of the OAIS and Representation Information

OpenOffice.org as a platform for developers

TEI, METS and ALTO, why we need all of them. Günter Mühlberger University of Innsbruck Digitisation and Digital Preservation

Characterisation. Digital Preservation Planning: Principles, Examples and the Future with Planets. July 29 th, 2008

I so want to know about ISO 20022

DigCCurr 2009 Digital Curation Practice, Promise and Prospects. IBM Demo Session

ECP-2008-DILI EuropeanaConnect. D5.7.1 EOD Connector Documentation and Final Prototype. PP (Restricted to other programme participants)

Intellicus Getting Started

Building for the Future

EMC CUSTOMER UPDATE. 12 juni 2012 Fort Voordorp. WHAT S NEW IN EMC AVAMAR 6.1 Arjo de Bruin. Copyright 2012 EMC Corporation. All rights reserved.

This paper focuses on the issue of increased biometric content. We have also published a paper on inspection systems.

ISO 20022, T2S and coexistence Advisory Group, Vienna, 1-2 June 2010

Document Interoperability

INTERLINK A EUROPEAN ROAD OTL. LDAC 5 th Linked Data in Architecture and Construction Workshop November 2017 Dijon, France

DuraArK Durable Architectural Knowledge

October 1, 2017 MPEG-2 Systems Attachment 1 Page 1 of 7. GE Technology Development, Inc. MY A MY MY A.

ABBYY FineReader 10. Professional Edition Corporate Edition Site License Edition. Small and medium-sized businesses or individual departments

BlackBerry Attachment Service

Preservation Planning for a Personal Digital Archive Paul Wilson

Cooperative ITS Corridor Joint Deployment

ELFms industrialisation plans

ICT & Digital Cinema A new start? Roberto Cencioni. DG Information Society and Media. Unit INFSO.E2 - Content & Knowledge

CLOUD QUALITY AND CLOUD CERTIFICATION

Real Player 11 Manual Xp Full Version Latest Version For Windows

GEOSS Data Management Principles: Importance and Implementation

AIT Austrian Institute of Technology Accessing remote I/Os with Ethernet POWERLINK and 4DIAC

Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data

Lionbridge ondemand for Adobe Experience Manager

Luckily, our enterprise had most of the back-end (services, middleware, business logic) already.

Fedora Commons: Taking on the Challenge of the Next Generation of Scholarly Communication

From The European Library to The European Digital Library. Jill Cousins Inforum, Prague, May 2007

South African Science Gateways

Take Your Oracle Forms on the Road Using ADF Mobile. Mia Urman, OraPlayer & Denis Tyrell, Oracle Corporation

Co-operative Development of a Long-term Digital Information Archive

Toward Horizon 2020: INSPIRE, PSI and other EU policies on data sharing and standardization

CitiDirect for Securities New Features February 2019

B. Assets are shared-by-copy by default; convert the library into *.jar and configure it as a shared library on the server runtime.

HP Storage Summit 2015 Transform Now.

ODF API - ODFDOM. Svante Schubert Software Engineer Sun Microsystems, Hamburg

DCH-RP and PREFORMA Two case studies on the digital preservation of cultural heritage

Project acronym: TraSer. Duration: 36 months. System Architecture

CCH Trust Accounts. Version Release Notes

<Insert Picture Here> The Oracle Fusion Development Platform: Oracle JDeveloper and Oracle ADF Overview

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

ODF Perspectives Panel discussion. OASIS ODF Adoption TC

Oracle Fusion Middleware

Adobe Acrobat 8 Professional - Available November 8, 2006 Communicate and Collaborate with the Essential PDF Solution

NOMAD Metadata for all

Storage Made Easy. SoftLayer

European Open Science Cloud Implementation roadmap: translating the vision into practice. September 2018

Micro Focus Net Express

Nuno Freire National Library of Portugal Lisbon, Portugal

Kopano clients WEBAPP VS DESKAPP VS OUTLOOK VIA ACTIVESYNC

Virtualization Introduction

IPv6 Addressing Status and Policy Report. Paul Wilson Director General, APNIC

Transcription:

PLANETS, Document Conversion Tools and the OpenXML/ODF Translator Document Interoperability Initiative Brussels, 12 November 2009 Wolfgang Keber (wk@dialogika.de)

Overview PLANETS Project Document conversion tools Objectives Technical approach Demo OpenXML/ODF Translator Overview Next steps (ISO 29500 compatibility)

PLANETS Preservation and Long-term Access through Networked Services PLANETS is a four-year project co-funded by the European Union under the Sixth Framework Programme to address core digital preservation challenges. The primary goal for Planets is to build practical services and tools to help ensure long-term access to our digital cultural and scientific assets. (excerpt from http://www.planets-project.eu/)

Partners The British Library National Library, Netherlands Austrian National Library State and University Library, Denmark Royal Library, Denmark National Archives, UK Swiss Federal Archives National Archives, Netherlands Hatii at University of Glasgow University of Freiburg Technical University of Vienna University at Cologne Tessella Plc IBM Netherlands Microsoft Research, Cambridge (with DIaLOGIKa as a Microsoft partner) ARC Seibersdorf research

Sub Projects

Microsoft & PLANETS: Office Documents Microsoft Research s role within PLANETS: Conversion of binary Microsoft Office Documents into Office Open XML File Format (OpenXML) Binary MS Office OpenXML We extended the effort to include other formats More legacy formats, e.g. WordPerfect Other open standards, e.g. Open Document Format. Binary MS Office WordPerfect DOS Word OpenXML ODF UOF

Document Conversion Tools Our Approach Three-step approach, resulting in a modular and extendible infrastructure Identify existing conversion tools and libraries Wrap these tools and libraries into re-usable components Integrate these components into PLANETS and other systems. If possible, do not use the office applications (e.g., Microsoft Office or OpenOffice.org) They are designed as interactive applications Message boxes might pop up ( Do you want ) Unclear license question when running on a server.

Reusable Components ToooXML (GUI) Web Service Watch Folder Tool TB Interface Binary OpenXML Transformer Box (Wrapper)

Extensible Architecture ToooXML (GUI) Web Service Watch Folder Tool TB Interface ODF OpenXML Transformer Box (Wrapper) Binary OpenXML Transformer Box (Wrapper) WP OpenXML Transformer Box (Wrapper)

More Technical Details Currently two types of wrappers for Command-line tools (stand-alone executables) OpenXML/ODF Translator (OpenXML ODF) OpenXML Document Viewer (OpenXML HTML) Microsoft conversion libraries (CNV libraries) WordPerfect RTF RTF OpenXML Wrappers can be chained WordPerfect RTF OpenXML ODF.

Supported Formats Source formats WordPerfect 5 WordPerfect 6 DOS Word Word 2, 6, 95 Office 97-2003 RTF ODF OpenXML Target formats OpenXML ODF UOF HTML XCDL (format defined in PLANETS/PC)

ToOOXML++ Demo ToOOXML++ UI for demonstrating the conversion tools Allows documents to be selected Automatically determines the document format Offers a list of available target formats. Virtual PC Pre-installed ToOOXML++ Legacy applications to view documents in their native applications WinWord 2 WordPerfect 5

OpenXML/ODF Translator (1) Open Source project hosted on SourceForge (http://sourceforge.net/projects/odf-converter) Developed under a liberate BSD-like license Several companies involved Sonata and DIaLOGIKa (development & testing) Novell (OpenOffice.org/Linux integration) Microsoft (funding and coordination)

OpenXML/ODF Translator (2) Available in three variants Add-in for Office 2000, XP and 2003 Add-in for Office 2007 Command-line tool (Office apps not required) All use the same translation kernel based on XSLTtechnology Pre- and postprocessing for special purposes.net Framework 2.0/C# for Office integration Compatibility with other platforms via Mono (e.g. Linux) Test suites based on Documents containing specific features Real documents found in the Internet and from other sources (public administration)

Next Major Step ISO 29500 Compatibility (1) Office Open XML standards Starting point ECMA-376 1 st Edition (December 2006) Office 2007 & File Format Compatibility Pack Current OpenXML/ODF Translator Evolved to ISO/IEC 29500:2008 Identical with ECMA-376 2 nd Edition (December 2008) Office 2010 (more in Doug s presentation) Next major release of OpenXML/ODF Translator File Format Compatibility Pack?

Next Major Step ISO 29500 Compatibility (2) It s a rugged way to ISO 29500 transitional vs. strict producer vs. consumer How can we test the translator? How can we validate the created ISO 29500 documents? How can we create ISO 29500 test documents? What has actually been changed? There is no nice comprehensive Excel sheet with all changes Schema comparison Responses from NBs Resolutions from BRM Final standard

Example: BRM Resolution 7 The BRM resolves to accept the editing instructions contained in http:/.../response_de- 0028_dates_v9.doc in replacement of R 18 and R 43, but with the following corrections: the words ISO value shall be replaced by the words ISO 8601 value ; page 9, line 24 shall be restored and line 23 shall be marked transitional ; and all changes as well as choices among alternatives rendered necessary by the Decision above shall be made results of the vote: 19 in favour; 3 against (EC, US, ZA); 9 abstentions (JP, KR, IE, AU, BR, CN, NL, MX and GR): so resolved R 18 and R 43 Response_DE-0028_dates_v9.doc

Q & A Document Interoperability Initiative Brussels, 12 November 2009 Wolfgang Keber (wk@dialogika.de)