Open Digital Forms. Hiep Le, Thomas Rebele, Fabian Suchanek. HAL Id: hal

Similar documents
The New Territory of Lightweight Security in a Cloud Computing Environment

Fault-Tolerant Storage Servers for the Databases of Redundant Web Servers in a Computing Grid

Multimedia CTI Services for Telecommunication Systems

Tacked Link List - An Improved Linked List for Advance Resource Reservation

Mokka, main guidelines and future

How to simulate a volume-controlled flooding with mathematical morphology operators?

BoxPlot++ Zeina Azmeh, Fady Hamoui, Marianne Huchard. To cite this version: HAL Id: lirmm

Linked data from your pocket: The Android RDFContentProvider

YANG-Based Configuration Modeling - The SecSIP IPS Case Study

Blind Browsing on Hand-Held Devices: Touching the Web... to Understand it Better

Service Reconfiguration in the DANAH Assistive System

Natural Language Based User Interface for On-Demand Service Composition

YAM++ : A multi-strategy based approach for Ontology matching task

Assisted Policy Management for SPARQL Endpoints Access Control

SIM-Mee - Mobilizing your social network

KeyGlasses : Semi-transparent keys to optimize text input on virtual keyboard

Setup of epiphytic assistance systems with SEPIA

An FCA Framework for Knowledge Discovery in SPARQL Query Answers

Mapping classifications and linking related classes through SciGator, a DDC-based browsing library interface

IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs

Comparison of spatial indexes

NP versus PSPACE. Frank Vega. To cite this version: HAL Id: hal

The Connectivity Order of Links

MUTE: A Peer-to-Peer Web-based Real-time Collaborative Editor

Privacy-preserving carpooling

Relabeling nodes according to the structure of the graph

Robust IP and UDP-lite header recovery for packetized multimedia transmission

Modularity for Java and How OSGi Can Help

Teaching Encapsulation and Modularity in Object-Oriented Languages with Access Graphs

Study on Feebly Open Set with Respect to an Ideal Topological Spaces

Quality of Service Enhancement by Using an Integer Bloom Filter Based Data Deduplication Mechanism in the Cloud Storage Environment

Simulations of VANET Scenarios with OPNET and SUMO

ASAP.V2 and ASAP.V3: Sequential optimization of an Algorithm Selector and a Scheduler

Catalogue of architectural patterns characterized by constraint components, Version 1.0

BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs

The optimal routing of augmented cubes.

UsiXML Extension for Awareness Support

Malware models for network and service management

Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints

QAKiS: an Open Domain QA System based on Relational Patterns

Moveability and Collision Analysis for Fully-Parallel Manipulators

Very Tight Coupling between LTE and WiFi: a Practical Analysis

Change Detection System for the Maintenance of Automated Testing

Application of RMAN Backup Technology in the Agricultural Products Wholesale Market System

Syrtis: New Perspectives for Semantic Web Adoption

HySCaS: Hybrid Stereoscopic Calibration Software

A Practical Evaluation Method of Network Traffic Load for Capacity Planning

XBenchMatch: a Benchmark for XML Schema Matching Tools

QuickRanking: Fast Algorithm For Sorting And Ranking Data

Every 3-connected, essentially 11-connected line graph is hamiltonian

Structuring the First Steps of Requirements Elicitation

A Short SPAN+AVISPA Tutorial

The Sissy Electro-thermal Simulation System - Based on Modern Software Technologies

Regularization parameter estimation for non-negative hyperspectral image deconvolution:supplementary material

Representation of Finite Games as Network Congestion Games

Sewelis: Exploring and Editing an RDF Base in an Expressive and Interactive Way

Stream Ciphers: A Practical Solution for Efficient Homomorphic-Ciphertext Compression

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

LaHC at CLEF 2015 SBS Lab

A Voronoi-Based Hybrid Meshing Method

DANCer: Dynamic Attributed Network with Community Structure Generator

Comparison of radiosity and ray-tracing methods for coupled rooms

Reverse-engineering of UML 2.0 Sequence Diagrams from Execution Traces

Scan chain encryption in Test Standards

Formal modelling of ontologies within Event-B

FStream: a decentralized and social music streamer

Decentralised and Privacy-Aware Learning of Traversal Time Models

An Implementation of a Paper Based Authentication Using HC2D Barcode and Digital Signature

Operation of Site Running StratusLab toolkit v1.0

X-Kaapi C programming interface

XML Document Classification using SVM

lambda-min Decoding Algorithm of Regular and Irregular LDPC Codes

Real-Time Collision Detection for Dynamic Virtual Environments

Sliding HyperLogLog: Estimating cardinality in a data stream

FIT IoT-LAB: The Largest IoT Open Experimental Testbed

A N-dimensional Stochastic Control Algorithm for Electricity Asset Management on PC cluster and Blue Gene Supercomputer

Paris-Lille-3D: A Point Cloud Dataset for Urban Scene Segmentation and Classification

Aligning Legivoc Legal Vocabularies by Crowdsourcing

Comparator: A Tool for Quantifying Behavioural Compatibility

Linux: Understanding Process-Level Power Consumption

Hardware Acceleration for Measurements in 100 Gb/s Networks

Real-Time and Resilient Intrusion Detection: A Flow-Based Approach

Zigbee Wireless Sensor Network Nodes Deployment Strategy for Digital Agricultural Data Acquisition

Rule-Based Application Development using Webdamlog

Multi-atlas labeling with population-specific template and non-local patch-based label fusion

Emerging and scripted roles in computer-supported collaborative learning

Branch-and-price algorithms for the Bi-Objective Vehicle Routing Problem with Time Windows

Fuzzy sensor for the perception of colour

The Athena data dictionary and description language

Fueling Time Machine: Information Extraction from Retro-Digitised Address Directories

Taking Benefit from the User Density in Large Cities for Delivering SMS

RecordMe: A Smartphone Application for Experimental Collections of Large Amount of Data Respecting Volunteer s Privacy

The Proportional Colouring Problem: Optimizing Buffers in Radio Mesh Networks

[Demo] A webtool for analyzing land-use planning documents

Generative Programming from a Domain-Specific Language Viewpoint

GDS Resource Record: Generalization of the Delegation Signer Model

From medical imaging to numerical simulations

DSM GENERATION FROM STEREOSCOPIC IMAGERY FOR DAMAGE MAPPING, APPLICATION ON THE TOHOKU TSUNAMI

From Microsoft Word 2003 to Microsoft Word 2007: Design Heuristics, Design Flaws and Lessons Learnt

Technical Overview of F-Interop

Transcription:

Open Digital Forms Hiep Le, Thomas Rebele, Fabian Suchanek To cite this version: Hiep Le, Thomas Rebele, Fabian Suchanek. Open Digital Forms. Research and Advanced Technology for Digital Libraries - 20th International Conference on Theory and Practice of Digital Libraries, TPDL 2016, Hannover, Germany, September 5-9, 2016, Proceedings, pp.454-458, 2016, <10.1007/978-3-319-43997-6_43>. <hal-01369077> HAL Id: hal-01369077 https://hal.archives-ouvertes.fr/hal-01369077 Submitted on 21 Sep 2016 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. Distributed under a Creative Commons Attribution - NonCommercial 4.0 International License

Open Digital Forms Hiep Le, Thomas Rebele, Fabian Suchanek Télécom ParisTech, Paris, France Abstract. The maintenance of digital libraries often passes through physical paper forms. Such forms are tedious to handle for both senders and receivers. Several commercial solutions exist for the digitization of forms. However, most of them are proprietary, expensive, centralized, or require software installation. With this demo, we propose a free, secure, and lightweight framework for digital forms. It is based on HTML documents with embedded JavaScript, it uses exclusively open standards, and it does not require a centralized architecture. Our forms can be digitally signed with the OpenPGP standard, and they contain machine-readable RDFa. Thus, they allow for the semantic analysis, sharing, re-use, or merger of documents across users or institutions. 1 Introduction The world s digital libraries contain millions of documents, visual artwork, and audio material. The maintenance of these libraries often passes through physical paper forms. Out of the 50 first English digital library projects on Wikipedia 1, more than half offer a service that requires sending a paper form by letter. Such paper forms are used for membership requests, for licensing, for revocation requests, etc. In one case, 3 different forms have to be filled out and sent physically to the library 2. The dependence on paper forms is particularly surprising for projects that aim to digitize paper documents, but the problem is of course in no way restricted to digital libraries. Paper forms require substantial manual work for the sender and the receiver. Therefore, several commercial solutions exist for the digitization of forms. However, most of these are proprietary, pricy, centralized, or require software installation (see Section 2). Thus, the challenge is to develop a solution so that no additional software is required to display and fill forms. only open standards are used, so everyone can use the format and software. the user remains in control of their data. With this demo, we propose a free, secure, and lightweight framework for digital forms that fulfills these desiderata: Our solution is based on HTML documents, and requires only a browser. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-43997-6_43 1 https://en.wikipedia.org/wiki/list_of_digital_library_projects 2 http://www.ahds.ac.uk/depositing/how-to-deposit.htm

2 Le et al. Our solution uses exclusively open standards, which makes the forms easy to parse, verify, produce, or modify. Our solution does not require a centralized architecture. The user remains in complete control of their data. Our forms can be digitally signed with the OpenPGP standard, which can be verified by open software. Our forms are machine-readable in RDFa, thus allowing the semantic analysis, sharing, re-use, or merger of forms. 2 Related Work Commercial Solutions. Adobe Acrobat, LibreOffice and others support the creation of Portable Document Format (PDF) forms. The full range of capabilities (such as electronic signature, digital signature, or importing from other forms) requires the installation of one or several proprietary software packages. Microsoft Word documents can also be used as forms, and can be digitally signed. The certificates are mostly not free 3. In addition, such forms require the installation of large software packages. The same goes for IBM Lotus. Online Services. Several cloud-based services (such as Google Forms) offer online polls or online collaboration on documents. However, they typically do not allow digital signing. Other services provide form creation and filling capabilities 4. The user creates forms and sends a link to the person, who fills out the form online. However, the form creator needs to pay per form. Furthermore, the user data has to pass through a central architecture. Open Standards. Various form description formats exist, such as XForms, XUL, Vexi, ZF, XAML, and others 5. However, all of them either require programming experience or special software to create and fill the forms. XML-based Solutions. Kuo et al. generate HTML forms from a XML schema, but do not allow signing the documents [3]. Boyer proposed a combination of ODF and XForms, which allows saving and sending the forms per email, filling them offline, and signing them with XML Signatures [1, 2]. However, users would need additional software to use or edit the combination of ODF and XForms. 3 Open Digital Forms Infrastructure. We decided for a decentralized solution, where forms are not stored in a central repository, but kept and sent by the users themselves. This has two advantages: First, the users remain in total control of their data, and they remain independent of a service provider. Second, our format can be used in a grass-root fashion. It can easily complement other types of forms that are mandated by the host institution. Format. Our forms are in HTML, because this format allows basic formatting while at the same time being human-readable. HTML can also be displayed on almost any digital device with no additional software. A form consists of a 3 https://help.libreoffice.org/common/applying_digital_signatures 4 http://www.jotform.com/, http://www.moreapp.com/ 5 https://en.wikipedia.org/wiki/list_of_user_interface_markup_languages

Open Digital Forms 3 title and several sections (DIVs). Each section describes one particular entity of interest, such as a client, a digital library, or a licensed content. Each section contains a list of attributes of that entity, such as the birth date and address of a client, the name of a library, or the type and the price of the licensed content. An HTML INPUT field for each attribute stores its value. Semantic Annotation. Our forms are semantically annotated, so that machines can read and analyze them. We use RDFa for this purpose, the W3C standard for semantical annotation of Web documents. This makes our forms compatible with the Semantic Web and Linked Data standards. By default, we use the vocabulary from schema.org, which is designed, supported, and maintained by Google, Bing, Yandex, and Yahoo 6. Each section of our document defines a new entity (by help of typeof), and each attribute in that section creates a new property of that entity (by help of property). Signing. Our documents can be electronically signed by an image of a scanned signature. For this purpose, the form contains a button which allows the user to select the image file from their hard drive. The image is then embedded as a base-64 encoded string in a data:image URL directly into the HTML document. This way, the form remains self-contained, and the user has to send only a single file if they want to share the signed document. Digital signing. The forms can also be digitally signed. We opted for PGP for this purpose, because unlike S/MIME it does not require a central certification authority. PGP allows signing documents with ASCII armor, i.e. a plain text header and footer containing the signature. GnuPGP accepts a signed document enclosed by additional text. We sign only the content of the HTML tag, so that we retain a valid HTML document. CSS styling hides the ASCII armor. 4 Demo Creating a form. Our tool for creating forms is an open source JavaScript program 7. If Alice wants to create a form, she can just visit our Web page 8. She can drag-and-drop section elements from the toolbar into the form (Figure 1 left). For each section, Alice can choose the section type (book, institution, etc.) from a drop-down menu from the schema.org vocabulary. She can also name each section (which creates an instance of the type in the underlying RDFa code). Alice can then drag controls such as text fields and check boxes into the sections. She can assign a caption and an attribute name to each control. We offer predefined attributes such as birthdate, email, and other properties from schema.org. Alice can also create her own property, which will then live in the local namespace of the document. In the end, Alice can download the form as a self-contained HTML document, which she can store for later modification, send by email or, if need be, print. Filling a form. If Bob receives the form, he can display it on any HTML-enabled device (Figure 1 right). He can fill out the form by typing into the INPUT elements 6 http://schema.org 7 https://github.com/lenguyenhaohiep/formless3 8 https://rawgit.com/lenguyenhaohiep/formless3/master/

4 Le et al. Fig. 1. Our tool (left) and the resulting form (right) of the form. The input elements use standard HTML 5 validation to check the format of dates, emails, etc. Filling and saving the form is done by embedded JavaScript, and does not need an Internet connection. Importing. Since our forms are semantically annotated, Bob can also import data from other forms he already filled. This functionality is provided by a JavaScript program that is loaded from our Web page on demand if Bob clicks on the Import button. The program proposes a match between the existing and the incoming sections, and Bob can accept, reject, or modify this proposed match. Since the semantic annotations are taken from a standard vocabulary, such imports work also across forms from different institutions. Signing the form. Finally, Bob can sign the document by attaching the image of a scanned physical signature to the document. This functionality is embedded in the document, and thus available offline. Bob can also add a digital signature with his private PGP key. This functionality is provided by JavaScript code that is loaded on demand. The private key never leaves the computer, and is used just to generate the signature. After this step, no modifications of the form are possible, and Bob can send around the signed document. Alice can check the validity of the signature in our tool, by providing the form and the public key of Bob. She can also verify the signature with any standard tool such as OpenPGP, Gpg4win, or GPGsuite. Since the document is self-contained and nearly human-readable, users can also always inspect the workings of the form. 5 Conclusion Our demo proposes a free and secure format for digital forms, which is based completely on open standards. It provides a mechanism of information integration even across institutions due to the use of Semantic Web standards, and therefore supports the infrastructure of digital libraries. Acknowledgments This research was partially supported by Labex DigiCosme (project ANR-11-LABEX-0045-DIGICOSME) operated by ANR as part of the program Investissement d Avenir Idex Paris-Saclay (ANR-11-IDEX-0003-02). References 1. J. M. Boyer. Interactive office documents. In Document Engineering, 2008. 2. J. M. Boyer, C. F. Wiecha, and R. P. Akolkar. Interactive Web Documents. Computer Science - R&D, 27(2), May 2012. 3. Y.-S. Kuo, N. C. Shih, L. Tseng, and H.-C. Hu. Generating form-based user interfaces for XML vocabularies. In Document Engineering, 2005.