The New Document: Digital, Polymorphic, Ubiquitous, Actionable
Patrick P. Bergmans, University of Ghent



The Traditional Document
- Documents have been around for thousands of years
  - The Bible is a document
  - The Dead Sea Scrolls are documents
  - The hieroglyphs of Ancient Egypt are documents
- Documents have been, and continue to be, the support of a large fraction of human knowledge
- Documents are stored on a specific medium
  - For centuries, the traditional medium for documents has been paper
  - Recently, the storage medium for documents has become digital

The Traditional Document
- Paper documents were a fairly simple concept
- Digital documents are much more complex because of their numerous additional attributes
- The digital document is polymorphic: it has many different embodiments and representations
- Computer scientists have introduced formal document models
- These models are used to
  - Analyze document transformations and evolutions
  - Identify the resources needed for those transformations
  - Define the document processes that govern these transformations

Dimensions of Document Space
- In these models, documents are contained in a multidimensional document space (content, structure, format, time, space, and others); a document's position along each axis identifies its specific properties
- Document transformations are trajectories in document space, describing the life of a document and its evolutions
- The multidimensional document space can be simplified by projecting it onto sub-spaces
- The initial (content, structure, format) model considers the sub-space of documents independently of time and space

Expected Model Benefits
- Precise definitions (giving common terminology)
- Definition of (generic) operators for document transformation
  - Copy, Move, Erase, Print, ...
- Explicitly show where conceptual difficulties lie, giving some idea of their fundamental nature
- Enable reasoning on document transitions (e.g. versioning and property inheritance, document rights)

Content-Structure-Format Model (three-dimensional projection)

Content-Structure-Format Model
- This sub-space of the full document model can be used to illustrate how knowledge, meaning and content are derived and transformed during the structuring, formatting and physical output of a document
- The vertical axis is some sort of overall evolutionary axis, but not exactly a time axis
- Local transformations are
  - Content transformations at any time
  - Structure transformations in the structured document plane
  - Format transformations in the styled document plane

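Layer & Property Examples
Each layer is listed from top to bottom, followed by the transition property shown below it in the diagram (examples in parentheses):
- Knowledge, Intent, Meaning (logical premises)
  - Transition property: Form (language, pictorial, musical, gesture)
- Basic Content (text, artwork)
  - Transition property: Logical Structure (DTD)
- Structured Content (SGML, XML, (HTML))
  - Transition property: Presentation Format (style sheet, XSL)
- Styled Content (DOC, WPF, RTF, (HTML))
  - Transition property: Resources (fonts)
- Output Representation (PDF, PS, PCL, MIDI)
  - Transition property: Media Properties (page size, screen resolution)
- Raw Digital Image (TIFF, GIF, BMP, WAV)
  - Transition property: Device Properties (screens, CD, audio cassette, VHS, Minidisk, DVD)
- Physical Representation (paper, sound, video, voice)
The diagram also carries the labels "Structure/Format planes" and "Digital Documents".
Patrick P. Bergmans / The New Document / Analogous Spaces / May 17, 2008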

Digital Documents
- There are many forms of digital documents
- It is extremely important to distinguish them
  - According to expected usage
  - In terms of storage, editability, etc.
- Issue: coexistence at several levels of representation, logical and physical
  - Logical concepts: chapter, paragraph, sentence, word space
  - Physical concepts: page, column, line of text

The Four Types of Digital Documents
- The Paper Document
- The Digital Document
  - Structured
  - Styled
  - PDL
  - Bitmap

Digital - Bitmap
- Document stored as an array of pixels
- Is really a digital picture of the document
- Simple 1-to-1 representation of the physical document
- Examples: TIFF, GIF, BMP, PNG, JPG
- Large storage volume
- Little processing for imaging
- Essentially not editable (except with image processing tools); no text reflow

Digital - Page Description
- Contains objects, such as characters (glyphs), graphics and images, and a description of where (and sometimes how) they appear on the page
- Examples: PostScript, PDF, PCL; but
  - PostScript is a programming language
  - PDF is a non-procedural data representation system
- Reasonably compact storage
- Processing required for imaging ("RIP")
- Device independent
- Marginally editable (moving objects), but no text reflow

Digital - Styled Document
- Document contains styled and sequenced graphic elements, and a limited amount of structure
- Examples: RTF, DOC (MS Word document), WPF
- Reasonably compact storage
- Requires processing for output (driver)
- Completely editable, and text may be reflowed
- But no structure-driven editing

Digital - Structured Document
- Document is highly structured, and structure-controlled
- Examples: SGML, XML
  - HTML is hybrid (many properties of ML, some of RTF)
- Powerful concept of document type definition (DTD)
- High structure-controlled editability
- Text is contained in unprocessed elements; text reflow is possible because of this representation
- Often requires complex editing tools
- Often used in technical documents
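The four document types above can be summarized as a small lookup table. The following Python sketch is only an illustration: the class and function names (`DocumentType`, `Editability`, `classify`) are hypothetical, and the property values are transcribed from the four slides rather than measured. HTML is deliberately left out because the slides describe it as a hybrid.

```python
from dataclasses import dataclass
from enum import Enum


class Editability(Enum):
    IMAGE_TOOLS_ONLY = "image processing tools only"
    MARGINAL = "marginal (objects can be moved)"
    FULL = "complete, but not structure-driven"
    STRUCTURE_CONTROLLED = "structure-controlled"


@dataclass(frozen=True)
class DocumentType:
    name: str
    examples: tuple
    editability: Editability
    text_reflow: bool


# Values transcribed from the slides; the names are illustrative assumptions.
DOCUMENT_TYPES = (
    DocumentType("Bitmap", ("TIFF", "GIF", "BMP", "PNG", "JPG"),
                 Editability.IMAGE_TOOLS_ONLY, False),
    DocumentType("Page description", ("PostScript", "PDF", "PCL"),
                 Editability.MARGINAL, False),
    DocumentType("Styled", ("RTF", "DOC", "WPF"),
                 Editability.FULL, True),
    DocumentType("Structured", ("SGML", "XML"),
                 Editability.STRUCTURE_CONTROLLED, True),
)


def classify(format_name):
    """Return the document type that lists format_name as an example, or None."""
    wanted = format_name.upper()
    for doc_type in DOCUMENT_TYPES:
        if wanted in (example.upper() for example in doc_type.examples):
            return doc_type
    return None
```

For instance, `classify("pdf")` returns the page description entry, while `classify("html")` returns `None` because the hybrid case is not modeled here.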

Transform Examples
Each downward transform is listed with the property that governs it, followed by the layer it produces (example formats and tools as shown in the diagram):
- Select
  - Produces: Knowledge, Meaning, Intent (logical premises)
- Express, governed by Form (language, pictorial, musical, gesture)
  - Produces: Basic Content (text, artwork)
- Organize, governed by Logical Structure (DTD)
  - Produces: Structured Content (SGML, XML, (HTML)); tool: FrameMaker
- Style, governed by Presentation Format (style sheet, XSL); tool: XML parser
  - Produces: Styled Content (DOC, WPF, RTF); tools: Microsoft Word, QuarkXPress
- Compose, governed by Resources (fonts); tool: PostScript driver
  - Produces: Output Representation (PDF, PS, PCL, MIDI); tool: Adobe Illustrator
- Render, governed by Media Properties (page size, screen resolution); tools: RIP, speech and sound synthesizer
  - Produces: Raw Digital Image (TIFF, GIF, BMP, WAV); tool: Adobe Photoshop
- Playback, governed by Device Properties (screens, CD, audio cassette, VHS, DVD); tools: marking engine, CRT, LCD, AV system
  - Produces: Physical Representation (paper, sound, video)
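The layer stack and its downward transforms can be written as an ordered enumeration plus a lookup table. This is a minimal Python sketch, not part of the original talk: the names `Layer`, `DOWNWARD` and `downward_path` are hypothetical, and the pairing of each transform with its governing property is one reading of the flattened diagram.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Layers of the model, ordered from most abstract to most physical."""
    KNOWLEDGE = 0
    BASIC_CONTENT = 1
    STRUCTURED_CONTENT = 2
    STYLED_CONTENT = 3
    OUTPUT_REPRESENTATION = 4
    RAW_DIGITAL_IMAGE = 5
    PHYSICAL_REPRESENTATION = 6


# (transform, governing property), keyed by the layer the step starts from.
DOWNWARD = {
    Layer.KNOWLEDGE: ("Express", "Form"),
    Layer.BASIC_CONTENT: ("Organize", "Logical Structure"),
    Layer.STRUCTURED_CONTENT: ("Style", "Presentation Format"),
    Layer.STYLED_CONTENT: ("Compose", "Resources"),
    Layer.OUTPUT_REPRESENTATION: ("Render", "Media Properties"),
    Layer.RAW_DIGITAL_IMAGE: ("Playback", "Device Properties"),
}


def downward_path(start, end):
    """List the transforms needed to go from layer start down to layer end."""
    if end < start:
        raise ValueError("end must not be above start; use upward transforms")
    return [DOWNWARD[Layer(level)] for level in range(start, end)]


# Example: printing an XML document means going from structured content
# down to the physical representation.
steps = downward_path(Layer.STRUCTURED_CONTENT, Layer.PHYSICAL_REPRESENTATION)
print([name for name, _ in steps])
```

Keeping the layers in an `IntEnum` makes "higher" and "lower" comparable, which matches the vertical axis of the diagram.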

Starting from Paper
- What if the original document is paper?
  - Scan to a digital document
  - What level do we scan to?
- Digital-to-paper is many-to-one
  - "Green button" operation
- Paper-to-digital is one-to-many
  - The level depends on the purpose
  - For storage, the bitmap level might be sufficient
  - For edits, at least the styled content level is needed

Upward Transforms
From the physical representation upward, as read from the diagram:
- Capture: Physical Representation to Raw Digital Image
- Segment: Raw Digital Image to Output Representation
- Recognize: Output Representation to Styled Content
- Re-Structure: Styled Content to Structured Content
- Fragment: Structured Content to Basic Content
- Understand: Basic Content to Knowledge, Intent, Meaning
- Learn: at the Knowledge, Intent, Meaning level
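The rule that the scan level depends on the purpose can be combined with the upward transforms into a small planning function. This Python sketch is an illustration under assumptions: `UPWARD_STEPS`, `TARGET_LEVEL` and `scan_plan` are hypothetical names, only the two purposes named on the "Starting from Paper" slide are modeled, and the step-to-layer pairing follows one reading of the diagram.

```python
# Each upward step, paired with the layer it reaches (bottom-up order).
UPWARD_STEPS = [
    ("Capture", "Raw Digital Image"),
    ("Segment", "Output Representation"),
    ("Recognize", "Styled Content"),
    ("Re-Structure", "Structured Content"),
    ("Fragment", "Basic Content"),
    ("Understand", "Knowledge"),
]

# Minimum level to reach for each purpose, as stated on the slide.
TARGET_LEVEL = {
    "storage": "Raw Digital Image",  # bitmap level might be sufficient
    "editing": "Styled Content",     # at least styled content level
}


def scan_plan(purpose):
    """Return the upward transforms needed to scan paper for a given purpose."""
    target = TARGET_LEVEL[purpose]
    plan = []
    for step, reached in UPWARD_STEPS:
        plan.append(step)
        if reached == target:
            return plan
    raise ValueError(f"unknown target level: {target}")


print(scan_plan("storage"))
print(scan_plan("editing"))
```

Scanning for storage stops after a single step, while scanning for edits needs three, which is why paper-to-digital is described as one-to-many.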

Local transformations within a layer, as shown in the diagram:
- Knowledge, Intent, Meaning: product specification, customer documentation (re-targeting)
- Basic Content: English, German, French (translation)
- Structured Content: SGML, HTML, XML (structure edits, conversions)
- Styled Content: WPF, DOC, RTF (content edits, conversions)
- Raw Digital Image: TIFF, GIF, BMP (processing, format conversion)

Examples of Applications of the Model

Analog Copier
[Diagram: two side-by-side copies of the layer stack, from Knowledge, Intent, Meaning down to Physical Representation.]

Digital Copier
[Diagram: two side-by-side copies of the layer stack. Labels at the Raw Digital Image level: Bitmap, Bitmap, Image Processing.]

Multi-Function Devices
[Diagram: two side-by-side copies of the layer stack. Labels at the Styled Content level: DOC, TextBridge. Labels at the Output Representation level: PDL, OCR, Ripping. Labels at the Raw Digital Image level: Bitmap, Bitmap, Image Processing.]

Translating Copier
[Diagram: the layer stack with the full upward path (Capture, Segment, Recognize, Re-Structure, Fragment, Understand, Learn) and the full downward path (Select, Express, Organize, Style, Compose, Render, Playback).]

The Function of the RIP
[Diagram: the layer stack, with the content layers labeled "The Digital World" and the Physical Representation labeled "The Paper World". The RIP is shown between the PDL (PS, PDF, etc.) and the Raw Digital Image. One label reads "One-to-one".]

Steps to Improve OCR
[Diagram: the upward path (Capture, Segment, Recognize, Re-Structure, Fragment, Understand, Learn), with the upper steps expanded into a chain of linguistic analyses.
Processing labels: recognize language, recognise basic form, morpho-analysis, syntactic parsing, semantic analysis, pragmatic analysis.
Intermediate results: English text, text content, language, content type, tagged content, parts of speech, content dependencies, syntactic structure, semantic content, semantic structure.
External information: trigram model, morphosyntactics, language syntax, language semantics, pragmatic knowledge.]

The Networked Document (1)
- The Paper Document
- The Digital Document
  - Structured, Styled, PDL, Bitmap
- The Networked Document
  - Distributed and hyperlinked documents
  - Documents with network intelligence
  - Documents with workflow intelligence
  - Mobile documents

The Networked Document (2)
- Parts of documents are stored in different locations on the network
  - For example, images on an image server
  - Or on a large number of logically linked servers
- Documents are dynamically assembled
  - When viewing
  - When printing
- Requires networks with
  - High performance
  - High availability
- The technology for dynamic document assembly is hyperlinking

The Networked Document (3)
- Hypertext was a major fundamental advance in document storage architecture
  - Documents are linked to integrate external objects
  - Powerfully implemented in HTML and XML
- HTML is vulnerable, unfit for a robust corporate Document Management System
- XML is much better for linking purposes (through XLL)
- Network-based storage of corporate documents requires a document storage architecture with robust links and strong link management
  - Distinguishing between intranet and Internet

The Networked Document (4)
- Bi-directional linking, link registry, link ownership and link lifetime management are key
[Diagram: an intranet with full object control, with objects A, B1, B2, C1, C2 and C3 and external objects X1, X2, X3 and Z1.]
- C2 knows it is used by B1 and C3
- X2 and Z1 don't know who uses them
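A link registry with reverse links can be sketched in a few lines. The Python below is a minimal illustration, not the system described in the talk: `LinkRegistry` and its methods are hypothetical names, and the link from C1 to X2 is invented for the example because the diagram's actual edges are not recoverable from the transcript. Only the two facts stated on the slide (C2 is used by B1 and C3; X2 does not know its users) are taken from the source.

```python
from collections import defaultdict


class LinkRegistry:
    """Tracks forward links for all objects and reverse links for managed ones."""

    def __init__(self, managed):
        self.managed = set(managed)      # objects under full intranet control
        self._uses = defaultdict(set)    # source -> targets
        self._used_by = defaultdict(set) # managed target -> sources

    def add_link(self, source, target):
        self._uses[source].add(target)
        # Reverse links can only be recorded for objects we control.
        if target in self.managed:
            self._used_by[target].add(source)

    def used_by(self, obj):
        """Return the users of obj, or None if obj is outside our control."""
        if obj not in self.managed:
            return None
        return set(self._used_by[obj])

    def can_delete(self, obj):
        """A managed object can be retired safely only if nothing uses it."""
        return obj in self.managed and not self._used_by[obj]


registry = LinkRegistry(managed={"A", "B1", "B2", "C1", "C2", "C3"})
registry.add_link("B1", "C2")
registry.add_link("C3", "C2")
registry.add_link("C1", "X2")  # hypothetical link to an external object

print(registry.used_by("C2"))
print(registry.used_by("X2"))
print(registry.can_delete("C2"))
```

The reverse index is what makes link lifetime management possible: an object can be retired safely only once its set of users is empty.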

The Network Intelligent Document (1)
- Documents which adapt themselves to the (limited) bandwidth of the network
- Documents with hierarchical information representation
  - On a printer
  - On a display
  - On a portable document reader
- Requires several levels of representation
  - Explicitly stored internally
  - Or automatic summarization

The Network Intelligent Document (2)
- Automatically generated at authoring time
  - To be available when needed (like thumbnails)
- Reproduction adapted to available bandwidth or storage
[Diagram: representations ordered from small bandwidth and small storage to large bandwidth and large storage: URL, Summary, Full Text, Full Images.]
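The idea of adapting the reproduction to the available bandwidth can be sketched as picking the richest representation that fits a transfer budget. This Python example is an assumption-laden illustration: the function name `choose_representation`, the byte sizes and the two-second waiting budget are all hypothetical; only the four representation levels come from the slide.

```python
# (representation, hypothetical size in bytes), smallest first.
REPRESENTATIONS = [
    ("URL", 100),
    ("Summary", 2_000),
    ("Full Text", 50_000),
    ("Full Images", 5_000_000),
]


def choose_representation(bandwidth_bytes_per_s, max_wait_s=2.0):
    """Pick the richest representation transferable within max_wait_s seconds."""
    budget = bandwidth_bytes_per_s * max_wait_s
    chosen = REPRESENTATIONS[0][0]  # the URL is always the fallback
    for name, size in REPRESENTATIONS:
        if size <= budget:
            chosen = name
    return chosen


print(choose_representation(5_000))       # slow link
print(choose_representation(10_000_000))  # fast link
```

Because every level is generated at authoring time, the choice at delivery time reduces to a lookup like this one rather than an on-the-fly conversion.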

The New Document is
- Live, dynamic, updatable
  - Freezes when printed or converted to a static (conventional digital) document; lives on the net
- Linked, hyperlinked
  - With links resolved at rendering time (printing or viewing)
  - Implementing the ultimate late binding capability
  - Inherently supporting variable/personalized publishing
  - With reverse link control for document integrity
- Intelligent, adaptable
  - Integrates some of its own workflow procedures
  - Understands the limitations of communication channels, and of viewing or printing equipment, and adapts itself
  - Auto-translating, auto-summarizing

The New Document is
- Generator of a whole new range of activities, such as
  - Document-based collaboration activities
    - Collaborative authoring
  - Document-based business and administrative processes
    - Supports complex product design and approval cycles
    - Integrates document rights
    - Integrates digital signatures and biometric data
  - Document-based search methods and engines
    - Search engines for the WWW
    - Search engines for DMS
    - Meta-search engines and restricted-domain search engines

The New Document is
- Digital, of course, and
- Polymorphic: exists in many different variations, media, formats, etc.
- Ubiquitous: linked and hyperlinked, distributed, dynamic and mobile
- Actionable: supporting business processes and generating activities unthinkable a decade ago

Thank you
Patrick P. Bergmans / The New Document / Analogous Spaces / May 17, 2008
