Document-Centric Computing


White Paper

Abstract

A document is a basic instrument for business and personal interaction and for capturing and communicating information and knowledge. Until the invention of the World Wide Web, the data-centric computing model, which is based on relational database theory, dominated the computing industry. The Web, with hundreds of millions of hyperlinked electronic documents, has renewed interest in document-centric computing. Recent developments with the W3C Document Object Model, which is used for both HTML and XML documents, underscore this interest.

Document-centric computing is more than just web publishing or electronic record keeping. It should support the capture of documents as part of the business process. With documents organised in an information repository, various views of this large information set can be presented in intelligent compound documents that incorporate and combine documents, lists, grouped lists, trees, branches, embedded document fragments, live links and so on.

With XML technology, the document is not only self-describing but can also be self-managing. Posting and distribution instructions, workflow and document update rules can be associated with the document type, and this provides the intelligence necessary for the documents to be self-managed and routed through the organization.

A key to reducing information overload is to produce and capture information that can be processed effectively by the computer. This white paper describes some of the innovations in document-centric computing implemented in the Multicentric Enterprise Information Framework (MEIF).

Copyright 2003 Multicentric Technology Sdn Bhd

The information contained in this document represents the current view of Multicentric Technology on the issues discussed as of the date of publication. This white paper is for informational purposes only. Multicentric Technology MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT. Other product and company names mentioned herein may be the trademarks of their respective owners.

Multicentric Technology Sdn Bhd
4, Jalan SS20/26
Damansara Utama
47400 Petaling Jaya
Selangor, Malaysia
http://www.multicentric.com

First Published: September 2003
Revised: December 2003, January 2004

1. Introduction

The computing landscape today remains dominated by the data-centric computing model. This model focuses on efficient processing of, and access to, high-volume data with low complexity or diversity. The top left-hand corner of Figure 1 represents the region where it dominates.

Data-centric computing, based on the relational model, requires information to be fitted into the rows and columns of tables, with more complex information decomposed into several related tables. This framework is optimized for data processing but is rigid: changes in information structure require corresponding changes in the underlying table structures and program code. It is anticipated that developments of the data-centric model will be limited to incremental improvements.

A more flexible and adaptable approach is required to handle problems in the lower-volume but higher-complexity and higher-diversity region in the bottom right of Figure 1, which represents the computing challenges of the 21st century. The focus here is on information modeling. With the maturing of computing systems, implementation of such an approach is now feasible.

Figure 1 - Computing Landscape and Directions

Documents provide an abstraction that allows information to be captured natively in a more intuitive and familiar medium. Diverse document types can be stored in a single repository, which facilitates their usage, management, distribution, manipulation and processing.

Some of the challenges in document-centric computing include providing facilities that allow users to browse the available documents and to manipulate, process and present the information captured in the documents in various contexts. The documents must also contain the intelligence necessary to support workflow processes.

Multicentric Technology White Paper 2

This in turn implies a compute-intensive system that uses the computing power now available on the desktop.

2. Key Technologies and Concepts

Document-centric computing requires the integration of several key technologies and concepts to realize its potential. Some of these technologies and concepts are:

HTML [1]: HTML is the lingua franca for publishing hypertext documents. It is a markup language that focuses on presentation. The content of an HTML document is machine-readable, but not machine-understandable. The Multicentric Enterprise Information Framework (MEIF) [footnote 1] uses HTML for the display of documents. The HTML format allows scripts and comments to be embedded in the document, and compound documents can be generated based on this information.

XML [2]: XML is the Extensible Markup Language, which lets users design their own customized markup languages for an unlimited number of document types. XML documents are self-describing, which makes their content available to practical applications. The MEIF uses XML to model complex information by giving it a structure. XML is used at the document content level. It is also used to support document-centric workflow facilities.

RDF [3]: The Resource Description Framework (RDF) is the W3C framework that uses metadata to describe the data contained on the Web. This framework operates at the document level rather than at the document content level of XML. The MEIF does not use RDF; it uses its own network information model for storing and managing documents.

Relational Database: Relational database technology is a mature technology for storing data and handling relationships. All information in the MEIF is stored in a fixed set of relational database tables.

The Multicentric Information Network Model: Documents stored in the system are organised in accordance with the Multicentric Information Network Model, as shown in Figures 2 and 3.

Footnote 1: See Appendix A.

Figure 2 - Relationships

Figure 3 - Interfaces

The primary component in the framework is the object, which can represent anything that can be given a name. An object can contain multiple documents representing multiple perspectives of the object.

List processing: The Multicentric Information Network Model is based on lists. Related objects can be represented as a single list, as multiple lists based on types of relationship, or as filtered lists based on other relationships. Using lists to handle information is intuitive, and people are already familiar with various types of lists such as checklists, shopping lists and material lists. List processing is also one of the foundations of artificial intelligence: a great deal of intelligent information can be stored in lists with common attributes.

Granular or Elemental Information: One of the objectives of document-centric computing is information reuse. For information to be reused, it must be in granular or elemental form. Ideally, each document should focus on a single subject or issue, as in an encyclopedia.
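The list-based view of related objects described above can be illustrated with a small sketch. It is not the MEIF implementation: the class names, the in-memory dictionaries and the sample objects are all assumptions made for illustration, whereas the paper states that the MEIF itself stores everything in relational tables.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class InfoObject:
    """A named object that can hold several documents (hypothetical structure)."""
    object_id: int
    name: str
    documents: list = field(default_factory=list)


class InfoNetwork:
    """Objects linked by typed relationships and queried as lists (illustrative only)."""

    def __init__(self):
        self.objects = {}
        # (source object id, relationship type) -> list of target object ids
        self.links = defaultdict(list)

    def add(self, obj):
        self.objects[obj.object_id] = obj
        return obj

    def relate(self, source, rel_type, target):
        self.links[(source.object_id, rel_type)].append(target.object_id)

    def related(self, source, rel_type):
        """Single list: every object related to `source` by `rel_type`."""
        ids = self.links.get((source.object_id, rel_type), [])
        return [self.objects[i] for i in ids]

    def grouped(self, source):
        """Multiple lists: related objects grouped by relationship type."""
        groups = {}
        for (src, rel_type), ids in self.links.items():
            if src == source.object_id:
                groups[rel_type] = [self.objects[i] for i in ids]
        return groups

    def filtered(self, source, rel_type, other, other_rel):
        """Filtered list: keep objects that `other` also reaches via `other_rel`."""
        allowed = set(self.links.get((other.object_id, other_rel), []))
        return [o for o in self.related(source, rel_type) if o.object_id in allowed]


net = InfoNetwork()
project = net.add(InfoObject(1, "Project Alpha"))
alice = net.add(InfoObject(2, "Alice"))
bob = net.add(InfoObject(3, "Bob"))
spec = net.add(InfoObject(4, "Specification"))
team = net.add(InfoObject(5, "Review Team"))
net.relate(project, "members", alice)
net.relate(project, "members", bob)
net.relate(project, "documents", spec)
net.relate(team, "members", bob)

print([o.name for o in net.related(project, "members")])
print({k: [o.name for o in v] for k, v in net.grouped(project).items()})
print([o.name for o in net.filtered(project, "members", team, "members")])
```

The three calls at the end correspond to the three list forms named in the text: a single list, lists grouped by relationship type, and a list filtered by another relationship.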

The structure of the XML documents must be sufficiently granular that specific nodes in the documents can be extracted using XPath expressions for information reuse and processing.

3. Dynamic Documents

Dynamic documents are compound documents that are generated from other documents in the repository. The objective is to present complex information from multiple documents in a single document, in various formats, while ensuring that the presented document is always current and promoting information reuse. The MEIF provides several features in support of this objective.

The MEIF uses a set of Multicentric Processing Instructions (MctPI) to declare the required output to the engine. The MctPI, which is based on XML syntax, is enclosed in an HTML comment statement so that it is not displayed in a web browser. This is very similar in concept to the PIA's Document Processing System (DPS) [4]. Such declarations are feasible because the documents are organised using the Multicentric Information Network Model. Users are not required to remember the syntax of the processing instructions, as these can be defined using options and dialogs provided in the internal editor menu.

Related Object Lists: A list of related objects, complete with hypertext links, can be embedded in a document using MctPI. The list can be inserted as a plain list, a grouped list or a filtered list. An example of the MctPI declaration for this purpose is as follows:

    <!-- LiveInsert
      <Relationship type="members"/>
      <Sort type="ordered" Renumber="True"/>
      <InclChildren ChildLevels="2"/>
      <Grouped GroupParent="9" GroupLevels="4" ShowTopGroup="True" InclOriginal="False"/>
    -->

These instructions define the live insertion of related member objects. The list is sorted according to an assigned sort order and the items are renumbered sequentially. Children of the related objects are included to 2 levels. The list is grouped into 4 levels, and the top-level group object is the one whose object ID is 9. The top-level object is shown and the original list is omitted.

By default, the reference object is the object of the current document. If a different reference object is required, it can be specified as follows:

    <RefObject value="##"/>

The list generated by these instructions is shown in Figure 4.

Figure 4 - Live Insert of Related Objects

Tree Insert: A declaration for inserting a tree structure is given below:

    <!--LiveTreeInsert
      <Relationship type="members"/>
      <ListCaption Caption="List of all functional categories"/>
      <Sort type="alpha"/>
      <Count type="associate"/>
      <NewTarget/>
    -->

The root of the tree is the current object, and the displayed tree is shown in Figure 5.

Figure 5 - Inserted Tree with Counts
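Because an MctPI declaration is XML syntax wrapped in an HTML comment, an engine can locate the comments first and then parse their contents. The following is a minimal sketch of that idea, not the MEIF engine: the function name and the returned structure are assumptions, and only the `LiveInsert` declaration itself is taken from the text above.

```python
import re
import xml.etree.ElementTree as ET

# Matches any HTML comment, including comments that span several lines.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def parse_mctpi(html):
    """Return (instruction name, {element tag: attributes}) for each declaration."""
    declarations = []
    for match in COMMENT.finditer(html):
        body = match.group(1).strip()
        # The first token names the instruction; the rest is a run of XML elements.
        name, _, rest = body.partition("<")
        name = name.strip()
        if not name or not rest:
            continue  # an ordinary comment, or a declaration without option elements
        try:
            # Wrap the element run in a synthetic root so it parses as one document.
            root = ET.fromstring("<root><" + rest + "</root>")
        except ET.ParseError:
            continue  # not well-formed XML, so not treated as a declaration
        options = {child.tag: dict(child.attrib) for child in root}
        declarations.append((name, options))
    return declarations


page = """<h1>Members</h1>
<!-- an ordinary comment -->
<!-- LiveInsert <Relationship type="members"/> <Sort type="ordered" Renumber="True"/>
<InclChildren ChildLevels="2"/>
<Grouped GroupParent="9" GroupLevels="4" ShowTopGroup="True" InclOriginal="False"/> -->
"""

declarations = parse_mctpi(page)
for name, options in declarations:
    print(name, options)
```

This sketch deliberately skips comments that carry no option elements, so a real engine would need a separate rule for declarations that consist of a name only. It also keeps only the last element when the same tag appears twice.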

Filtered Lists: The list can also be filtered based on relationships and on dates. Filtering gives fine control over the objects that are displayed. Relationship filters can be inverted. The date filter can, for instance, be used to display only objects that are current. With XML documents, the date filter can refer to a specific node in the document.

Including Synopsis: The synopses of the related objects can also be included in the document with the following declaration:

    <IncludeSynopsis size="500"/>

An example of the result of this declaration is shown in Figure 6.

Figure 6 - List with Synopsis

Navigation Branch: It is now common to see the branch of the current document, based on the site map, displayed on a page. Figure 7 shows such a branch from the Microsoft Web site.

Figure 7 - Navigation Branch

In the MEIF, the branch may have multiple leaves, as shown in Figure 8, because of the network structure of the repository.

Figure 8 - Navigation Branch in MultiCentrix

Displaying this is achieved with the simple declaration:

    <!-- *ShowBranch-->
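The date filter described under Filtered Lists, where the date is read from a specific node of an XML document, can be pictured with a short sketch. Everything here is assumed for illustration: the function names, the `validity/expires` node path, the ISO date format and the sample notices are hypothetical and do not come from the MEIF.

```python
import xml.etree.ElementTree as ET
from datetime import date


def is_current(xml_text, date_path, today):
    """True when the date stored at `date_path` is on or after `today`."""
    node = ET.fromstring(xml_text).find(date_path)
    if node is None or not (node.text or "").strip():
        return False  # no date recorded: treat the object as not current
    return date.fromisoformat(node.text.strip()) >= today


def filter_current(objects, date_path, today):
    """Keep the names of the (name, xml) pairs whose date node is still current."""
    return [name for name, xml_text in objects if is_current(xml_text, date_path, today)]


# Hypothetical objects, each carrying one XML document.
notices = [
    ("Office closure", "<notice><validity><expires>2004-01-31</expires></validity></notice>"),
    ("Old promotion", "<notice><validity><expires>2003-06-30</expires></validity></notice>"),
    ("Undated memo", "<notice><title>Memo</title></notice>"),
]

current = filter_current(notices, "validity/expires", date(2003, 12, 1))
print(current)
```

The path is resolved with the limited XPath subset supported by Python's `xml.etree.ElementTree`; a full XPath engine would be needed for more complex node selections.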

Document Threads: If there are several documents in the same object, links to these documents can be displayed in a document thread using the declaration:

    <!-- MultiCentrix Document Thread -->

The thread is displayed as shown in Figure 9.

Figure 9 - Document Thread

Instead of the default numeric labels, user-defined labels can be used.

Live Links: Information from ODBC-compliant databases can be embedded in the document. The database connection, the SQL statements and the XSLT transformation file defined by the user are stored in the system tables for reuse.

    <!--Livelink <QueryName="Country Information"/> <fname="my-"/> -->

Document Templates: To promote a common look and feel across different documents of a similar type, the MEIF supports an unlimited number of document templates. The document template acts as a wrapper around the actual document, which can be in HTML or XML format. Document templates are specified with the HTML document meta tag:

    <meta name="mct:template" content="templatename">

Document templates can also incorporate MctPI declarations and so use the facilities that the MctPI provides. As the MctPI uses the current object of the document as the default reference object, generic declarations can be written once and applied to any document.

New MctPI Declarations: Based on the same concept, new MctPI declarations can be developed for the visualization of the rich information network structure. These are added to the framework from time to time, as new techniques are learned or as the need arises.

4. The Intelligent Document

XML technology provides an important framework for making intelligent documents possible. The MEIF uses XML technology together with relational database tables to provide a framework for managing enterprise information intelligently.

Starting from a sample XML document, an XML Schema (XSD) can be inferred using Schema-Forms, one of the components of the MEIF. The XML Schema can then be refined to meet the actual requirements [footnote 2].

Based on this XSD document, posting instructions for the document can be defined. The posting instructions define the target information repositories and the relationships of this object (document) with other objects in the repositories. The posting instructions reference nodes in the documents to derive the object names and to resolve the conditions specified.

Distribution instructions for the document can be defined in a similar way to the posting instructions. The posting and distribution instructions, which are in XML format, are stored in the table together with the XSD document. They can be exported with the XSD document by including them in a SOAP [5] envelope.

Based on the XSD document, an HTML form can be generated using the HTML form generator in Schema-Forms. The HTML form generator allows the user to define various display properties and constructs not included in the XSD document. The generated HTML forms include sufficient intelligence for the entered information to be converted to an XML document conforming to the XSD document. XML documents can be loaded into the HTML form as required for editing. The HTML forms also support the interactive addition of repeated items, rows or branches, an important flexibility requirement.

Help documents can be created in the framework and linked contextually to the fields in the HTML forms. The HTML forms can be deployed on the desktop or on the Web for distributed data capture.

Enumerated lists can be embedded in the XSD document, placed in a reference table, or retrieved as a list of objects from the information network model.

An XSLT writer is provided to generate basic XSLT documents based on the XSD document.

Footnote 2: The new E-Descriptors framework allows the XSD, XSLT and HTML forms to be generated from selected descriptors in the descriptors repository.
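To make the idea of posting instructions more concrete, the sketch below resolves a set of instructions against a document by reading the nodes they reference. The paper does not show the MEIF posting-instruction format, so the XML vocabulary used here (`Posting`, `ObjectName`, `Relate`, `when-node`, `equals`), the function names and the sample report are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical posting-instruction format; element and attribute names are invented.
POSTING = """
<Posting repository="Projects">
  <ObjectName node="header/title"/>
  <Relate type="belongs-to" node="header/project"/>
  <Relate type="authored-by" node="header/author" when-node="header/status" equals="final"/>
</Posting>
"""

# Hypothetical document to be posted.
DOCUMENT = """
<report>
  <header>
    <title>Site Inspection 12</title>
    <project>Project Alpha</project>
    <author>Alice</author>
    <status>draft</status>
  </header>
</report>
"""


def text_at(doc, path):
    """Text of the node at `path`, or None when the node is missing or empty."""
    node = doc.find(path)
    return node.text.strip() if node is not None and node.text else None


def resolve_posting(posting_xml, document_xml):
    """Combine posting instructions and a document into a concrete posting plan."""
    posting = ET.fromstring(posting_xml)
    doc = ET.fromstring(document_xml)
    plan = {
        "repository": posting.get("repository"),
        # The object name is derived from a node in the document.
        "object_name": text_at(doc, posting.find("ObjectName").get("node")),
        "relationships": [],
    }
    for rel in posting.findall("Relate"):
        condition = rel.get("when-node")
        if condition is not None and text_at(doc, condition) != rel.get("equals"):
            continue  # condition not met, so this relationship is not posted
        target = text_at(doc, rel.get("node"))
        if target is not None:
            plan["relationships"].append((rel.get("type"), target))
    return plan


plan = resolve_posting(POSTING, DOCUMENT)
print(plan)
```

In this example the second relationship is skipped because the document's status node does not match the stated condition, which mirrors the text's point that posting instructions reference document nodes both to derive names and to resolve conditions.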

These features enable the user to make the document intelligent and self-managing. Together with cross-referencing between XSD documents, they support document-centric workflow, as described in the next section.

5. Document-Centric Workflow

A key requirement in the workplace is document workflow. When a document is created, it must be possible to route it. Each recipient must be able to act on the document by creating new documents or by updating the current document. Selected information from the current document must be transferable to the new document to reduce work and transposition errors. Transferred information may or may not be editable. When the new document is posted, the status of the original document may also need to be updated. User rights for adding, updating or modifying documents must also be definable.

The Workflow Reference Model defined by the Workflow Management Consortium (WFMC) [6] is process based. It requires the complete workflow process to be defined in a workflow process description language, complete with participants and workflow-relevant data.

The document-centric workflow model implemented in the MEIF takes a different tack. Instead of focusing on the process, it focuses on the individual documents. When a document is received, the recipient is typically concerned only with the actions they can take on the document rather than with the workflow process. Examples of these actions are: ask for more information, acknowledge receipt of the document, process the document and route it, and close the file. Some of these actions require a new document, while others require the existing document to be updated.

Figure 10

Figure 10 shows the document-centric workflow model. For each XSD document, users can define the:

- Posting instructions,
- Distribution instructions, and
- Web Actions (update options)

In Figure 10, Document B and Document C reference Document A. Node mappings can be defined for transferring information from the master document and Document A to these two documents. With these two dependent documents defined, a user viewing Document A is offered the option to initialize an XML instance document based on Document B or Document C, provided the specified conditions are met. When the new document is posted, the status of the referenced document (i.e. Document A) can be updated; depending on the conditions specified for the referencing, the option may or may not be available again.

Information may also need to be transferred from the master workflow document to the new documents, and likewise the master workflow document may be updated when the new documents are posted.
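The interaction between a referenced document and a dependent document can be sketched as follows. This is an illustrative model only: the mapping table, the function names, the `editable` attribute, the status values and the sample enquiry are assumptions, not the MEIF's node-mapping format.

```python
import xml.etree.ElementTree as ET

# Hypothetical node mapping from a referenced document (A) to a dependent one (B).
MAPPING = [
    # (path in Document A, element name in Document B, editable in B?)
    ("customer/name", "customerName", False),
    ("request/summary", "subject", True),
]


def can_initialize(doc_a, status_path, allowed_status):
    """The option to start Document B is offered only while A has this status."""
    node = doc_a.find(status_path)
    return node is not None and node.text == allowed_status


def initialize_dependent(doc_a, root_tag, mapping):
    """Create a new instance of the dependent document, pre-filled from A."""
    doc_b = ET.Element(root_tag)
    for source_path, target_tag, editable in mapping:
        source = doc_a.find(source_path)
        if source is None:
            continue  # nothing to transfer for this mapping
        target = ET.SubElement(doc_b, target_tag, editable=str(editable).lower())
        target.text = source.text
    return doc_b


def post_dependent(doc_a, status_path, new_status):
    """Posting B updates A's status, which may withdraw the option next time."""
    doc_a.find(status_path).text = new_status


# Hypothetical Document A.
doc_a = ET.fromstring(
    "<enquiry><status>open</status>"
    "<customer><name>Example Customer</name></customer>"
    "<request><summary>Quotation for 20 licences</summary></request></enquiry>"
)

if can_initialize(doc_a, "status", "open"):
    doc_b = initialize_dependent(doc_a, "quotation", MAPPING)
    post_dependent(doc_a, "status", "quoted")

print(ET.tostring(doc_b, encoding="unicode"))
print(can_initialize(doc_a, "status", "open"))
```

After the dependent document is posted, the status of Document A changes, so the same option is no longer offered. This reflects the point in the text that availability depends on the conditions specified for the referencing.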

Figure 11-1 - Multicentric Posting Instructions

Figure 11-2 - Relationships Posting

Figure 11-3 - Nodes Mapping

Figure 11-4 - Pick and Fill Nodes Mapping

Figures 11-1 to 11-3 show the user interface for defining the workflow-related instructions, and Figure 11-4 shows the pick-and-fill nodes mapping.

Figure 12 - Defining Web Actions

Figure 13 - Defining Web Action Variables

Figure 12 shows the user interface for defining Web Actions, and Figure 13 the definition of Web Action variables. Web Actions allow the entry of selected variables that are then used to update the document. After the update, the document may be reposted. A workflow action can also be associated with a Web Action.

6. Document Publishing

Publishing documents should be relatively simple compared with publishing a relational database. Selected documents in the repository can be structured and published in Adobe PDF or Microsoft CHM format. The document repository and HTML forms can also be served on the Web.

7. Document-Centric Applications

If you are not really managing data as such, then a document-centric application may be more suitable. A data-centric application is required when the data need to be processed and posted to perpetual files, as in accounting, inventory control and similar situations. Document-centric applications are more informational in nature. We need to draw upon this vast information asset for our actions and our decision-making processes.

Some examples of document-centric applications are correspondence, minutes of meetings, project monitoring, contact management, standards and specifications, procedures, customer information, staff resumes and logging reports of various types.

8. Conclusion

To meet the computing challenges of the 21st century, the computing model has to be flexible and adaptable and at the same time able to handle higher complexity. The document-centric model described in this paper offers some interesting potential. The document, offering a higher level of abstraction, is more intuitive for humans and thus enables us to model the real world more accurately.

Document-centric computing requires the integration of several key technologies and concepts: HTML for the dynamic generation and presentation of information; XML technology for providing intelligence at the document content level; relational database technology for storing and managing the documents; and an information network model to provide a framework for modeling the complex relationships among objects, which act as containers for documents.

9. References

[1] HTML 4.01 Specification, W3C Recommendation, 24 December 1999 (http://www.w3.org/tr/1999/rec-html401-19991224/)
[2] Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 6 October 2000 (http://www.w3.org/tr/recxml)
[3] Resource Description Framework (RDF) Model and Syntax Specification, W3C Recommendation, 22 February 1999 (http://www.w3.org/tr/1999/rec-rdf-syntax-19990222/)
[4] Document Processing in the PIA (http://www.risource.org/papers/wp-dps.html)
[5] SOAP Version 1.2 Part 0: Primer, W3C Recommendation, 24 June 2003 (http://www.w3.org/tr/2003/rec-soap12-part0-20030624/)
[6] Workflow Reference Model (http://www.wfmc.org/standards/docs/tc003v11.pdf)