PLATFORM OF TRANSCRIPTION THE OLD ARABIC MANUSCRIPTS


Noureddine EL MAKHFI
Laboratoire de Transmission et Traitement de l'Information (LTTI), UFR: Signaux Systèmes et Télécommunications, Faculty of Sciences and Technology, University of Sidi Mohamed Ben Abdellah, Fez, Morocco
n.elmakhfi@gmail.com

Rachid BENSLIMANE
Laboratoire de Transmission et Traitement de l'Information (LTTI), University of Sidi Mohamed Ben Abdellah, Fez, Morocco
r.benslimane1@gmail.com

Abstract: Old manuscripts kept in libraries are among the richest parts of the cultural heritage and legacy of civilizations. Digitization is one way to preserve this cultural and historical heritage, which is otherwise very difficult for users to consult. Access to national heritage manuscripts is restricted because physical handling accelerates their degradation. To widen access while preserving the originals, the widely adopted solution relies partly on digitizing the manuscripts and partly on developing platforms for managing and disseminating the digitized material. In this paper we propose a platform for the transcription and establishment of manuscripts through annotation of their page images; the annotations conform to an XML model. Search within the images of a handwritten document, rich functionality, an intuitive user interface, portability, extensibility and the power of XML technology together make the platform a well-suited explorer for specialists and readers of ancient Arabic manuscripts.

Keywords: Establishment; Transcription; Manuscripts; Digitalization; Arabic search engine; Metadata; Annotation.

1. Introduction

Searching for information in Arabic manuscripts is not a simple process, owing to several constraints: the cursive nature of Arabic script, the heavy use of diacritical marks, and the overlap between words and lines in Arabic manuscripts.

Many research works and projects have been devoted to the processing of ancient and modern manuscripts [1]:

The BAMBI (Better Access to Manuscripts and Browsing of Images) project [2] produced a hypermedia system that allows historians, and more particularly codicologists and philologists, to read manuscripts, transcribe them, write annotations, and navigate between the words of the transcription and the matching region of the digitized image of the manuscript.

The DEBORA (Digital access to Books Of RenAissance) project [3] set out to define the needs related to the use of virtual libraries and electronic books. Its aim was to develop tools for digitizing and accessing a selection of 16th-century books.

The EAMMS (Electronic Access to Medieval Manuscripts) project [4] developed guidelines for encoding and storing catalog descriptions of medieval and Renaissance manuscripts in electronic form.

The MASTER (Manuscript Access Through Standards for Electronic Records) project [5] worked towards a standard for manuscript description that is generic, robust and flexible enough to be applied in different areas of manuscript description. The chosen technology is based on the international standards SGML (Standard Generalized Markup Language) and XML (Extensible Markup Language).

Access to heritage documents, including old Arabic manuscripts, involves creating a database of images of these manuscripts. Building such a database requires a scanning operation followed by the creation of metadata specific to the manuscripts. Our work in this field aims to facilitate information retrieval by identifying Arabic manuscripts through metadata and through annotations on the page images.

ISSN : Vol. 3 No. 6 June

In a recent work we developed a method [6], entitled "Searching in Arab manuscripts using metadata and annotations", for handling old manuscripts; it is based on the SDX platform [7]. Its drawback is that the XML files associated with the manuscripts are created manually. Generating these files requires the collaborative effort of several experts to annotate, transcribe and establish the content of these old documents before they are published on the web. To overcome this drawback, we propose in this paper a new platform that facilitates the transcription and establishment of old Arabic handwritten text in the form of annotations that conform to an XML model, so that rare manuscripts become accessible.

2. Structure of handwritten documents

2.1. Metadata

Librarians have long compiled indexes to describe existing documents. Such records are data used to qualify other data, such as the contents of a book, which is why they are called metadata. They encode essential information about a document in a clear fashion: title, author, date of publication, keywords, etc. Table 1 summarizes the categories of metadata used in our work, as reported in [8].

Table 1. Metadata.

N   Adopted metadata          Values
1   Author                    Copyist; Name of possessor
2   Title                     Title of manuscripts; Title of chapters; Title of sub-chapters
3   Studied period            Islamic Jerusalem; Pre-Islamic; IV-X centuries; VI-VII centuries; Medieval Islamic (VII-XV); All periods
4   Category of manuscripts   Arabo-Christian; Arabo-Islamic
5   Material study            Types of; Dating of paper of document
6   Type of manuscript        Koran; Other religious texts; Scientific; Medical; Literary; Muslim juridical; Philosophical; Historical; etc.
7   Study of the writing      Style
8   Decoration of the texts   Illuminations; Illustrations; Frontispiece

Schema Diagram

Root Element

The root element of the XML file is called <msdescription>. It may appear either within the body or within the header of a TEI (Text Encoding Initiative) [9] conformant document. In the former case, where the document being encoded is essentially a collection of manuscript descriptions, the <msdescription> element may be placed anywhere a paragraph might appear. In the latter case, the description forms part of the metadata associated with a digital representation of a manuscript original, whether a transcription, a collection of digital images, or some combination of the two. The <msdescription> element has seven component elements, each with its own complex type, of which only the first is mandatory.
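To make the shape of the root element concrete, the following is a minimal Python sketch that builds an empty <msdescription> skeleton. The seven child names are those of the elements described in this section; treating <msidentifier> (the first element described) as the mandatory one, storing the identifier as plain text, and the helper names `build_msdescription` and `is_valid` are assumptions made for illustration, not the platform's actual implementation.

```python
import xml.etree.ElementTree as ET

# Child element names follow the schema description in this paper.
# Their order and the text-only content used below are assumptions.
CHILD_ELEMENTS = [
    "msidentifier",  # identification of the manuscript (assumed mandatory)
    "physdesc",      # physical description
    "history",       # history of the manuscript
    "mscontents",    # intellectual content
    "logicstruct",   # logical structure (parts, chapters, ...)
    "admininfo",     # management information
    "additional",    # other related information
]


def build_msdescription(identifier: str) -> ET.Element:
    """Return an <msdescription> skeleton with its seven child elements."""
    root = ET.Element("msdescription")
    for name in CHILD_ELEMENTS:
        ET.SubElement(root, name)
    # Only <msidentifier> is filled in; plain text is a placeholder layout.
    root.find("msidentifier").text = identifier
    return root


def is_valid(root: ET.Element) -> bool:
    """Minimal check: correct root name and a non-empty <msidentifier>."""
    ident = root.find("msidentifier")
    return (
        root.tag == "msdescription"
        and ident is not None
        and bool((ident.text or "").strip())
    )


if __name__ == "__main__":
    # "G582" is the shelf mark given in reference [11], used here as sample data.
    doc = build_msdescription("G582")
    print(ET.tostring(doc, encoding="unicode"))
    print(is_valid(doc))
```

A real description would replace the empty children with the structured content shown in the schema diagrams; the check above only covers the single mandatory component.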

Fig. 1. Schema diagram for msdescription Element

Msidentifier Element

This element includes all the elements that allow the manuscript or manuscript fragment to be identified.

Fig. 2. Schema diagram for msidentifier Element

Physdesc Element

Under the general heading of "physical description" we subsume a large number of different aspects generally regarded as useful in the description of a given manuscript. These include aspects of the form, support, extent, and quire structure of the manuscript object; aspects of the writing...

Fig. 3. Schema diagram for physdesc Element.

History Element

This element groups the elements describing the full history of a manuscript or manuscript part.

Fig. 4. Schema diagram for history Element

Mscontents Element

The <mscontents> element is used to describe the intellectual content of a manuscript or manuscript part. It comprises either a series of informal prose paragraphs or a series of more structured <msitem> elements, each of which provides a more detailed description of a single item contained within the manuscript.

Fig. 5. Schema diagram for mscontents Element

Logicstruct Element

This element describes the logical structure of the document: contents, parts, chapters, etc.

Fig. 6. Schema diagram for logicstruct Element.
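The structured form of <mscontents> described above, a series of <msitem> elements, can be sketched in Python as follows. Only the element names <mscontents> and <msitem> come from the paper; the `n` attribute, the <title> and <note> children and both function names are hypothetical and serve only to show one <msitem> per work.

```python
import xml.etree.ElementTree as ET


def build_mscontents(items):
    """Build <mscontents> from (title, note) pairs, one <msitem> per work.

    The <title> and <note> children and the "n" attribute are hypothetical;
    the paper only states that each <msitem> describes a single item.
    """
    contents = ET.Element("mscontents")
    for number, (title, note) in enumerate(items, start=1):
        item = ET.SubElement(contents, "msitem", {"n": str(number)})
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "note").text = note
    return contents


def list_titles(contents):
    """Return the titles of all <msitem> children in document order."""
    return [
        item.findtext("title", default="")
        for item in contents.findall("msitem")
    ]


if __name__ == "__main__":
    # Sample data invented for the example.
    contents = build_mscontents([
        ("First treatise", "folios 1-20"),
        ("Second treatise", "folios 21-40"),
    ])
    print(ET.tostring(contents, encoding="unicode"))
    print(list_titles(contents))
```

Keeping each work in its own <msitem> is what lets a search or browse interface list the items of a manuscript without parsing free prose.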

Admininfo Element

A variety of information relating to the curation and management of a manuscript may be recorded as simple prose narrative tagged using the standard <p> element.

Fig. 7. Schema diagram for admininfo Element

Additional Element

This element groups other related information about a manuscript, in particular administrative information relating to its current location, additional materials associated with it, etc.

Fig. 8. Schema diagram for additional Element.

3. Proposed method for indexing and annotating manuscripts

3.1. Principle of the proposed method

The platform offers customizable interfaces for viewing manuscript images and provides searching and browsing functionality for users of Arabic manuscripts. It relies on an XML representation of the metadata, the annotations and the index of the manuscript pages.

Fig. 9. Digitization, processing, indexing and searching in handwritten documents.

3.2. Adding new documents

The page images of the manuscript to be added are stored in folders; the browse button is used to locate the folder containing the pages. The form lets the user fill in the metadata fields (library, book title, copyist, support...). The topic selection panel selects a topic, or creates it if it does not yet exist, in accordance with the XML schema [6].

Fig. 10. Adding books according to an XML schema

Technique for Image Document Annotations

To allow information retrieval by content, readers can contribute text annotations by selecting areas of the manuscript page image [10]. They can also associate links and attach files (pdf, doc, XML, wav, mp3...) to an annotation.

Fig. 11. Schema diagram for annotation.

Annotated pages with different icons

Fig. 12. Annotation of a page from a handwritten document [11].
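The annotation technique described above can be illustrated with a short Python sketch: a rectangular area of a page image is stored together with its transcribed text and any attached files, and the stored text is then matched against a keyword. The element and attribute names (<annotations>, <annotation>, <region>, <text>, <attachment>, `page`, `href`), the file names and the two functions are assumptions made for this example; the paper's actual annotation schema is the one shown in Fig. 11.

```python
import xml.etree.ElementTree as ET


def make_annotation(page, x, y, width, height, text, attachments=()):
    """Build one <annotation> for a rectangular region of a page image.

    All element and attribute names here are hypothetical.
    """
    ann = ET.Element("annotation", {"page": page})
    ET.SubElement(ann, "region", {
        "x": str(x), "y": str(y),
        "width": str(width), "height": str(height),
    })
    ET.SubElement(ann, "text").text = text
    for href in attachments:
        ET.SubElement(ann, "attachment", {"href": href})
    return ann


def find_annotations(root, keyword):
    """Return annotations whose text contains the keyword.

    This is a plain substring match; it does not normalize Arabic
    diacritics or letter variants, which a real search engine would need.
    """
    return [
        ann for ann in root.iter("annotation")
        if keyword in (ann.findtext("text") or "")
    ]


if __name__ == "__main__":
    root = ET.Element("annotations")
    root.append(make_annotation("page_012.jpg", 120, 340, 410, 60,
                                "كتاب النخلة", ["notes.pdf"]))
    # Save to XML text and reload, as an annotation file would be.
    xml_text = ET.tostring(root, encoding="unicode")
    reloaded = ET.fromstring(xml_text)
    for hit in find_annotations(reloaded, "النخلة"):
        print(hit.get("page"), hit.find("region").attrib)
```

Storing the region coordinates next to the text is what allows a search hit to be shown on the page image rather than only in a result list.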

Textual annotation of pages

An annotation, once created, can be saved in XML format; it can then be used as supporting data for searching the manuscript by content.

Fig. 13. Results of a search by textual annotation

Graphical annotation of pages

We propose to support document search based on geometric annotations on the same footing as textual annotations. In addition, we have created relationships between geometric annotations and the text information specific to each type of annotation. This information is saved in XML format to serve as search keywords.

Fig. 14. Graphical interface for annotating a page of a manuscript document [11].

4. Conclusion

In this paper we presented a new platform for the transcription and establishment of Arabic manuscripts through annotation of their page images; the annotations conform to an XML model. The platform supports search in the images of Arabic handwritten documents using both metadata and annotations. The results obtained could be improved by using word spotting, a semi-automatic annotation method.

References

[1] Stéphane Nicolas, Thierry Paquet, Laurent Heutte. Digitizing Cultural Heritage Manuscripts: the Bovary Project. In ACM Symposium on Document Engineering, ACM Doc Eng 2003, Grenoble, France.
[2] Calabretto, Sylvie; Bozzi, Andrea; Pinon, Jean-Marie. Numérisation des manuscrits médiévaux : le projet européen BAMBI. In: Actes du colloque Vers une nouvelle érudition : numérisation et recherche en histoire du livre, Rencontres Jacques Cartier, Lyon.
[3] DEBORA: projet européen n. LB 5608 A. Coordinateur R. Bouché.
[4]
[5] Burnard, Lou; Robinson, Peter. Vers un standard européen de description des manuscrits : le projet Master. Document numérique, 1999, vol. 3, no. 1-2.
[6] O. El Bannay, R. Benslimane, N. El Makhfi and N. Rais. Searching in Arab Manuscripts Using Metadata and Annotation. European Journal of Scientific Research, Vol. 28 No. 1 (2009). EuroJournals.
[7]
[8] Hala Kaileh. L'accès à distance aux manuscrits arabes numérisés en mode image. Thèse présentée devant l'université Lumière Lyon II.
[9]
[10] Bertrand Coüasnon, Ivan Leplumey. A Generic Recognition System for Making Archives Documents Accessible to Public. Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003), IEEE.
[11] Kitab nakhla, G582, Edition av 1862, National Library, Rabat, Morocco.

Automatic Metadata Retrieval from Ancient Manuscripts

Automatic Metadata Retrieval from Ancient Manuscripts Automatic Metadata Retrieval from Ancient Manuscripts Frank Le Bourgeois 1 and Hala Kaileh 2 1 LIRIS, INSA de Lyon, France Frank.lebourgeois@liris.cnrs.fr http://liris.cnrs.fr/ 2 ENSSIB-Université Lumière

More information

Enriching Historical Manuscripts: The Bovary Project

Enriching Historical Manuscripts: The Bovary Project Enriching Historical Manuscripts: The Bovary Project Stéphane Nicolas, Thierry Paquet, and Laurent Heutte Laboratoire PSI CNRS FRE 2645 - Université de Rouen Place E. Blondel UFR des Sciences et Techniques

More information

INTRODUCING THE UNIFIED E-BOOK FORMAT AND A HYBRID LIBRARY 2.0 APPLICATION MODEL BASED ON IT. 1. Introduction

INTRODUCING THE UNIFIED E-BOOK FORMAT AND A HYBRID LIBRARY 2.0 APPLICATION MODEL BASED ON IT. 1. Introduction Преглед НЦД 14 (2009), 43 52 Teo Eterović, Nedim Šrndić INTRODUCING THE UNIFIED E-BOOK FORMAT AND A HYBRID LIBRARY 2.0 APPLICATION MODEL BASED ON IT Abstract: We introduce Unified e-book Format (UeBF)

More information

BUDDHIST STONE SCRIPTURES FROM SHANDONG, CHINA

BUDDHIST STONE SCRIPTURES FROM SHANDONG, CHINA BUDDHIST STONE SCRIPTURES FROM SHANDONG, CHINA Heidelberg Academy of Sciences and Humanities Research Group Buddhist Stone Scriptures in China Hauptstraße 113 69117 Heidelberg Germany marnold@zo.uni-heidelberg.de

More information

A tool for Entering Structural Metadata in Digital Libraries

A tool for Entering Structural Metadata in Digital Libraries A tool for Entering Structural Metadata in Digital Libraries Lavanya Prahallad, Indira Thammishetty, E.Veera Raghavendra, Vamshi Ambati MSIT Division, International Institute of Information Technology,

More information

Segmentation of Arabic handwritten text to lines

Segmentation of Arabic handwritten text to lines Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 73 (2015 ) 115 121 The International Conference on Advanced Wireless, Information, and Communication Technologies (AWICT

More information

Using Linked Data to Reduce Learning Latency for e-book Readers

Using Linked Data to Reduce Learning Latency for e-book Readers Using Linked Data to Reduce Learning Latency for e-book Readers Julien Robinson, Johann Stan, and Myriam Ribière Alcatel-Lucent Bell Labs France, 91620 Nozay, France, Julien.Robinson@alcatel-lucent.com

More information

Comp 336/436 - Markup Languages. Fall Semester Week 2. Dr Nick Hayward

Comp 336/436 - Markup Languages. Fall Semester Week 2. Dr Nick Hayward Comp 336/436 - Markup Languages Fall Semester 2017 - Week 2 Dr Nick Hayward Digitisation - textual considerations comparable concerns with music in textual digitisation density of data is still a concern

More information

The Text Encoding Initiative and manuscript studies

The Text Encoding Initiative and manuscript studies The Text Encoding Initiative and manuscript studies M. J. Driscoll Arnamagnæan Institute Copenhagen COMSt Conference Hamburg, 1-3 December 2009 The Text Encoding Initiative The TEI is an international

More information

PROCESSING AND CATALOGUING DATA AND DOCUMENTATION: QUALITATIVE

PROCESSING AND CATALOGUING DATA AND DOCUMENTATION: QUALITATIVE PROCESSING AND CATALOGUING DATA AND DOCUMENTATION: QUALITATIVE.... LIBBY BISHOP... INGEST SERVICES UNIVERSITY OF ESSEX... HOW TO SET UP A DATA SERVICE, 3 4 JULY 2013 PRE - PROCESSING Liaising with depositor:

More information

User Manual Al Manhal. All rights reserved v 3.0

User Manual Al Manhal. All rights reserved v 3.0 User Manual 1 2010-2016 Al Manhal. All rights reserved v 3.0 Table of Contents Conduct a Search... 3 1. USING SIMPLE SEARCH... 3 2. USING ADVANCED SEARCH... 4 Search Results List... 5 Browse... 7 1. BROWSE

More information

Interrogation System Architecture of Heterogeneous Data for Decision Making

Interrogation System Architecture of Heterogeneous Data for Decision Making Interrogation System Architecture of Heterogeneous Data for Decision Making Cécile Nicolle, Youssef Amghar, Jean-Marie Pinon Laboratoire d'ingénierie des Systèmes d'information INSA de Lyon Abstract Decision

More information

Syrtis: New Perspectives for Semantic Web Adoption

Syrtis: New Perspectives for Semantic Web Adoption Syrtis: New Perspectives for Semantic Web Adoption Joffrey Decourselle, Fabien Duchateau, Ronald Ganier To cite this version: Joffrey Decourselle, Fabien Duchateau, Ronald Ganier. Syrtis: New Perspectives

More information

Image and Text Coupling for Creating Electronic Books from Manuscripts

Image and Text Coupling for Creating Electronic Books from Manuscripts Image and Text Coupling for Creating Electronic Books from Manuscripts Laurent ROBERT Laurence LIKFORMAN-SULEM Eric LECOLINET Ecole Nationale Supérieure des Télécommunications Signal Processing Department

More information

Text Line Segmentation in Handwritten Document Using a Production System

Text Line Segmentation in Handwritten Document Using a Production System Text Line Segmentation in Handwritten Document Using a Production System Stéphane Nicolas, Thierry Paquet, Laurent Heutte Laboratoire PSI FRE CNRS 2645 - Université de Rouen Place E. Blondel, UFR des Sciences

More information

A Framework for Processing Complex Document-centric XML with Overlapping Structures Ionut E. Iacob and Alex Dekhtyar

A Framework for Processing Complex Document-centric XML with Overlapping Structures Ionut E. Iacob and Alex Dekhtyar A Framework for Processing Complex Document-centric XML with Overlapping Structures Ionut E. Iacob and Alex Dekhtyar ABSTRACT Management of multihierarchical XML encodings has attracted attention of a

More information

PROCESSING AND CATALOGUING DATA AND DOCUMENTATION - QUALITATIVE

PROCESSING AND CATALOGUING DATA AND DOCUMENTATION - QUALITATIVE PROCESSING AND CATALOGUING DATA AND DOCUMENTATION - QUALITATIVE....... INGEST SERVICES UNIVERSITY OF ESSEX... HOW TO SET UP A DATA SERVICE, 8-9 NOVEMBER 2012 PRE - PROCESSING Liaising with depositor: consent

More information

Guidelines for Developing Digital Cultural Collections

Guidelines for Developing Digital Cultural Collections Guidelines for Developing Digital Cultural Collections Eirini Lourdi Mara Nikolaidou Libraries Computer Centre, University of Athens Harokopio University of Athens Panepistimiopolis, Ilisia, 15784 70 El.

More information

Advances in Databases and Information Systems 1997

Advances in Databases and Information Systems 1997 ELECTRONIC WORKSHOPS IN COMPUTING Series edited by Professor C.J. van Rijsbergen Rainer Manthey and Viacheslav Wolfengagen (Eds) Advances in Databases and Information Systems 1997 Proceedings of the First

More information

Easy Ed: An Integration of Technologies for Multimedia Education 1

Easy Ed: An Integration of Technologies for Multimedia Education 1 Easy Ed: An Integration of Technologies for Multimedia Education 1 G. Ahanger and T.D.C. Little Multimedia Communications Laboratory Department of Electrical and Computer Engineering Boston University,

More information

The NYPL Digital Gallery. Jenny Singer, Lara Hanneman, Alana Verminski

The NYPL Digital Gallery. Jenny Singer, Lara Hanneman, Alana Verminski The NYPL Digital Gallery Jenny Singer, Lara Hanneman, Alana Verminski The New York Public Library The Digital Gallery. NYPL Digital Gallery is The New York Public Library's image database, developed to

More information

ISO/IEC INTERNATIONAL STANDARD. Information technology Multimedia content description interface Part 5: Multimedia description schemes

ISO/IEC INTERNATIONAL STANDARD. Information technology Multimedia content description interface Part 5: Multimedia description schemes INTERNATIONAL STANDARD ISO/IEC 15938-5 First edition 2003-05-15 Information technology Multimedia content description interface Part 5: Multimedia description schemes Technologies de l'information Interface

More information

MUSEUM MEETS ACADEMIA: THE GOSLAR TO GRASMERE PROJECT

MUSEUM MEETS ACADEMIA: THE GOSLAR TO GRASMERE PROJECT MUSEUM MEETS ACADEMIA: THE GOSLAR TO GRASMERE PROJECT 3 Midfields Walk Burgess Hill United Kingdom richard@light.demon.co.uk "Goslar to Grasmere" is a collaboration between the Wordsworth Trust and the

More information

Metadata Workshop 3 March 2006 Part 1

Metadata Workshop 3 March 2006 Part 1 Metadata Workshop 3 March 2006 Part 1 Metadata overview and guidelines Amelia Breytenbach Ria Groenewald What metadata is Overview Types of metadata and their importance How metadata is stored, what metadata

More information

A Web Service-Based System for Sharing Distributed XML Data Using Customizable Schema

A Web Service-Based System for Sharing Distributed XML Data Using Customizable Schema Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 A Web Service-Based System for Sharing Distributed XML Data Using Customizable

More information

Elba Project. Procedures and general norms used in the edition of the electronic book and in its storage in the digital library

Elba Project. Procedures and general norms used in the edition of the electronic book and in its storage in the digital library Procedures and general norms used in the edition of the electronic book and in its storage in the digital library Dabne - Tecnologías de la información http://dabne.net Index 1 Some procedures and norms

More information

The Extensible Markup Language (XML) and Java technology are natural partners in helping developers exchange data and programs across the Internet.

The Extensible Markup Language (XML) and Java technology are natural partners in helping developers exchange data and programs across the Internet. 1 2 3 The Extensible Markup Language (XML) and Java technology are natural partners in helping developers exchange data and programs across the Internet. That's because XML has emerged as the standard

More information

Enhanced retrieval using semantic technologies:

Enhanced retrieval using semantic technologies: Enhanced retrieval using semantic technologies: Ontology based retrieval as a new search paradigm? - Considerations based on new projects at the Bavarian State Library Dr. Berthold Gillitzer 28. Mai 2008

More information

T H E D I G I TA L L I B R A R Y

T H E D I G I TA L L I B R A R Y THE DIGITAL LIBRARY About MediaINFO MediaINFO is a complete software solution for intuitive viewing, browsing, searching, cataloging and sharing digitized content. It is powering some of the world s most

More information

Table of contents for The organization of information / Arlene G. Taylor and Daniel N. Joudrey.

Table of contents for The organization of information / Arlene G. Taylor and Daniel N. Joudrey. Table of contents for The organization of information / Arlene G. Taylor and Daniel N. Joudrey. Chapter 1: Organization of Recorded Information The Need to Organize The Nature of Information Organization

More information

TEI, METS and ALTO, why we need all of them. Günter Mühlberger University of Innsbruck Digitisation and Digital Preservation

TEI, METS and ALTO, why we need all of them. Günter Mühlberger University of Innsbruck Digitisation and Digital Preservation TEI, METS and ALTO, why we need all of them Günter Mühlberger University of Innsbruck Digitisation and Digital Preservation Agenda Introduction Problem statement Proposed solution Starting point Mass digitisation

More information

Lou Burnard Consulting

Lou Burnard Consulting Getting started with oxygen Lou Burnard Consulting 2014-06-21 1 Introducing oxygen In this first exercise we will use oxygen to : create a new XML document gradually add markup to the document carry out

More information

A Digital Image Processing and Database System for Watermarks in Medieval Manuscripts

A Digital Image Processing and Database System for Watermarks in Medieval Manuscripts ichim 01 --. C U L T U R A L H E R I T A G E ~ ~ ~ T c H N O L O G Ii n E S t h e T H I R D M I L L E N N I U M A Digital Image Processing and Database System for Watermarks in Medieval Manuscripts Emanuel

More information

Interactive Handwritten Text Recognition and Indexing of Historical Documents: the transcriptorum Project

Interactive Handwritten Text Recognition and Indexing of Historical Documents: the transcriptorum Project Interactive Handwritten Text Recognition and ing of Historical Documents: the transcriptorum Project Alejandro H. Toselli ahector@prhlt.upv.es Pattern Recognition and Human Language Technology Reseach

More information

Integration Strategy for the Realization of an Adaptive Hypermedia System of Natural Dyes

Integration Strategy for the Realization of an Adaptive Hypermedia System of Natural Dyes International Journal of Systems Engineering 2017; 1(1): 36-42 http://www.sciencepublishinggroup.com/j/ijse doi: 10.11648/j.ijse.20170101.15 Integration Strategy for the Realization of an Adaptive Hypermedia

More information

Lisa Biagini & Eugenio Picchi, Istituto di Linguistica CNR, Pisa

Lisa Biagini & Eugenio Picchi, Istituto di Linguistica CNR, Pisa Lisa Biagini & Eugenio Picchi, Istituto di Linguistica CNR, Pisa Computazionale, INTERNET and DBT Abstract The advent of Internet has had enormous impact on working patterns and development in many scientific

More information

Final Report. Phase 2. Virtual Regional Dissertation & Thesis Archive. August 31, Texas Center Research Fellows Grant Program

Final Report. Phase 2. Virtual Regional Dissertation & Thesis Archive. August 31, Texas Center Research Fellows Grant Program Final Report Phase 2 Virtual Regional Dissertation & Thesis Archive August 31, 2006 Submitted to: Texas Center Research Fellows Grant Program 2005-2006 Submitted by: Fen Lu, MLS, MS Automated Services,

More information

How to use TRANSKRIBUS a very first manual

How to use TRANSKRIBUS a very first manual How to use TRANSKRIBUS a very first manual A simple standard workflow for humanities scholars and volunteers (screenshots below) 0.1.6, 2015-04-24 0. Introduction a. Transkribus is an expert tool. As with

More information

Open Digital Forms. Hiep Le, Thomas Rebele, Fabian Suchanek. HAL Id: hal

Open Digital Forms. Hiep Le, Thomas Rebele, Fabian Suchanek. HAL Id: hal Open Digital Forms Hiep Le, Thomas Rebele, Fabian Suchanek To cite this version: Hiep Le, Thomas Rebele, Fabian Suchanek. Open Digital Forms. Research and Advanced Technology for Digital Libraries - 20th

More information

A DIGITAL APPROACH TO HANDWRITTEN DOCUMENTS. B.I.T. - Bureau Ingénieur Tomasi

A DIGITAL APPROACH TO HANDWRITTEN DOCUMENTS. B.I.T. - Bureau Ingénieur Tomasi A DIGITAL APPROACH TO HANDWRITTEN DOCUMENTS B.I.T. - Bureau Ingénieur Tomasi Introduction Handwritten documents can for the most part not be read by computers today. Our technology such as it has been

More information

Summary of Bird and Simons Best Practices

Summary of Bird and Simons Best Practices Summary of Bird and Simons Best Practices 6.1. CONTENT (1) COVERAGE Coverage addresses the comprehensiveness of the language documentation and the comprehensiveness of one s documentation of one s methodology.

More information

Page Delivery Service User Guide

Page Delivery Service User Guide Harvard University Library Office for Information Systems Page Delivery Service User Guide The Page Delivery Service (PDS) delivers to a web browser scanned page images of books, diaries, reports, journals

More information

ALOE - A Socially Aware Learning Resource and Metadata Hub

ALOE - A Socially Aware Learning Resource and Metadata Hub ALOE - A Socially Aware Learning Resource and Metadata Hub Martin Memmel & Rafael Schirru Knowledge Management Department German Research Center for Artificial Intelligence DFKI GmbH, Trippstadter Straße

More information

Database of historical places, persons, and lemmas

Database of historical places, persons, and lemmas Database of historical places, persons, and lemmas Natalia Korchagina Outline 1. Introduction 1.1 Swiss Law Sources Foundation as a Digital Humanities project 1.2 Data to be stored 1.3 Final goal: how

More information

Reformulation of Contexts: A Design Concept for the Database of DSR Archive

Reformulation of Contexts: A Design Concept for the Database of DSR Archive Reformulation of Contexts: A Design Concept for the Database of DSR Archive Asanobu Kitamoto, Takeo Yamamoto, Sonoko Sato, and Kinji Ono National Institute of Informatics kitamoto@nii.ac.jp, ty@nii.ac.jp,

More information

IF ONLY WE D KNOWN: COLLECTING RESEARCH DATA

IF ONLY WE D KNOWN: COLLECTING RESEARCH DATA FACULTY LIBRARY OF ARTS & PHILOSOPHY IF ONLY WE D KNOWN: COLLECTING RESEARCH DATA Katrien Deroo - LW Research Day WHAT IS DATA MANAGEMENT ABOUT? Spreadsheet bursting at the seams Database reconstruction

More information

The Case of the 35 Gigabyte Digital Record: OCR and Digital Workflows

The Case of the 35 Gigabyte Digital Record: OCR and Digital Workflows Florida International University FIU Digital Commons Works of the FIU Libraries FIU Libraries 8-14-2015 The Case of the 35 Gigabyte Digital Record: OCR and Digital Workflows Kelley F. Rowan Florida International

More information

INFORMATIQUE ET MÉDECINE/COMPUTER AND MEDICINE ELECTRONIC SUBMISSION OF AN ARTICLE

INFORMATIQUE ET MÉDECINE/COMPUTER AND MEDICINE ELECTRONIC SUBMISSION OF AN ARTICLE INFORMATIQUE ET MÉDECINE/COMPUTER AND MEDICINE ELECTRONIC SUBMISSION OF AN ARTICLE http://www.lebanesemedicaljournal.org/articles/56-3/it1.pdf Adib A. MOUKARZEL 1, Stéphane B. BAZAN 2, Armen MAYALIAN 3

More information

TBX in ODD: Schema-agnostic specification and documentation for TermBase exchange

TBX in ODD: Schema-agnostic specification and documentation for TermBase exchange TBX in ODD: Schema-agnostic specification and documentation for TermBase exchange Stefan Pernes INRIA stefan.pernes@inria.fr Kara Warburton Termologic kara@termologic.com Laurent Romary INRIA laurent.romary@inria.fr

More information

Taming the TEI Tiger 6. Lou Burnard June 2004

Taming the TEI Tiger 6. Lou Burnard June 2004 Taming the TEI Tiger Lou Burnard June 2004 Today s topics The TEI and its architecture Working with the schema generator How does the TEI scheme work? In today s exercise, you ll learn how to build your

More information

METAINFORMATION INCORPORATION IN LIBRARY DIGITISATION PROJECTS

METAINFORMATION INCORPORATION IN LIBRARY DIGITISATION PROJECTS METAINFORMATION INCORPORATION IN LIBRARY DIGITISATION PROJECTS Michael Middleton QUT School of Information Systems, Brisbane, Australia. m.middleton@qut.edu.au This paper was accepted in Poster form and

More information

Collection Policy. Policy Number: PP1 April 2015

Collection Policy. Policy Number: PP1 April 2015 Policy Number: PP1 April 2015 Collection Policy The Digital Repository of Ireland is an interactive trusted digital repository for Ireland s contemporary and historical social and cultural data. The repository

More information

Creating an Accessible PDF

Creating an Accessible PDF Creating an Accessible PDF Montclair State University is committed to making our digital content accessible to people with disabilities (required by Section 508). This document will discuss best practices

More information
