Edits in Xylia Validity Preserving Editing of XML Documents

Similar documents
Laboratory Exercise 6

Operational Semantics Class notes for a lecture given by Mooly Sagiv Tel Aviv University 24/5/2007 By Roy Ganor and Uri Juhasz

Advanced Encryption Standard and Modes of Operation

An Intro to LP and the Simplex Algorithm. Primal Simplex

xy-monotone path existence queries in a rectilinear environment

Lecture 14: Minimum Spanning Tree I

AUTOMATIC TEST CASE GENERATION USING UML MODELS

Universität Augsburg. Institut für Informatik. Approximating Optimal Visual Sensor Placement. E. Hörster, R. Lienhart.

An Approach to a Test Oracle for XML Query Testing

Chapter S:II (continued)

The Association of System Performance Professionals

Testing Structural Properties in Textual Data: Beyond Document Grammars

SIMIT 7. Component Type Editor (CTE) User manual. Siemens Industrial

Topics. Lecture 37: Global Optimization. Issues. A Simple Example: Copy Propagation X := 3 B > 0 Y := 0 X := 4 Y := Z + W A := 2 * 3X

Cutting Stock by Iterated Matching. Andreas Fritsch, Oliver Vornberger. University of Osnabruck. D Osnabruck.

Routing Definition 4.1

1 The secretary problem

Compiler Construction

Aspects of Formal and Graphical Design of a Bus System

On successive packing approach to multidimensional (M-D) interleaving

Laboratory Exercise 6

Minimum congestion spanning trees in bipartite and random graphs

DAROS: Distributed User-Server Assignment And Replication For Online Social Networking Applications

Delaunay Triangulation: Incremental Construction

MAT 155: Describing, Exploring, and Comparing Data Page 1 of NotesCh2-3.doc

Proving Temporal Properties of Z Specifications Using Abstraction

A SIMPLE IMPERATIVE LANGUAGE THE STORE FUNCTION NON-TERMINATING COMMANDS

CS201: Data Structures and Algorithms. Assignment 2. Version 1d

Bottom Up parsing. Bottom-up parsing. Steps in a shift-reduce parse. 1. s. 2. np. john. john. john. walks. walks.

Parallel MATLAB at FSU: Task Computing

Laboratory Exercise 2

Refining SIRAP with a Dedicated Resource Ceiling for Self-Blocking

3D SMAP Algorithm. April 11, 2012

np vp cost = 0 cost = c np vp cost = c I replacing term cost = c+c n cost = c * Error detection Error correction pron det pron det n gi

Shortest Paths Problem. CS 362, Lecture 20. Today s Outline. Negative Weights

Domain-Specific Modeling for Rapid System-Wide Energy Estimation of Reconfigurable Architectures

Today s Outline. CS 561, Lecture 23. Negative Weights. Shortest Paths Problem. The presence of a negative cycle might mean that there is

Hassan Ghaziri AUB, OSB Beirut, Lebanon Key words Competitive self-organizing maps, Meta-heuristics, Vehicle routing problem,

SIMIT 7. Profinet IO Gateway. User Manual

Select Operation (σ) It selects tuples that satisfy the given predicate from a relation (choose rows). Review : RELATIONAL ALGEBRA

Service and Network Management Interworking in Future Wireless Systems

A Basic Prototype for Enterprise Level Project Management

Practical Analog and Digital Filter Design

Chapter 13 Non Sampling Errors

Keywords Cloud Computing, Service Level Agreements (SLA), CloudSim, Monitoring & Controlling SLA Agent, JADE

Laboratory Exercise 6

Distributed Packet Processing Architecture with Reconfigurable Hardware Accelerators for 100Gbps Forwarding Performance on Virtualized Edge Router

Course Project: Adders, Subtractors, and Multipliers a

Shortest Path Routing in Arbitrary Networks

Motion Control (wheeled robots)

SIMIT 7. What's New In SIMIT V7.1? Manual

A Multi-objective Genetic Algorithm for Reliability Optimization Problem

Laboratory Exercise 6

Modeling of underwater vehicle s dynamics

Integration of Digital Test Tools to the Internet-Based Environment MOSCITO

else end while End References

LinkGuide: Towards a Better Collection of Hyperlinks in a Website Homepage

A note on degenerate and spectrally degenerate graphs

Analysis of the results of analytical and simulation With the network model and dynamic priority Unchecked Buffer

Performance Evaluation of an Advanced Local Search Evolutionary Algorithm

Dynamically Reconfigurable Neuron Architecture for the Implementation of Self- Organizing Learning Array

New Structural Decomposition Techniques for Constraint Satisfaction Problems

Kinematics Programming for Cooperating Robotic Systems

Quadrilaterals. Learning Objectives. Pre-Activity

A Fast Association Rule Algorithm Based On Bitmap and Granular Computing

Analyzing Hydra Historical Statistics Part 2

CSE 250B Assignment 4 Report

INVERSE DYNAMIC SIMULATION OF A HYDRAULIC DRIVE WITH MODELICA. α Cylinder chamber areas ratio... σ Viscous friction coefficient

Policy-based Injection of Private Traffic into a Public SDN Testbed

The Data Locality of Work Stealing

Parameters, UVM, Coverage & Emulation Take Two and Call Me in the Morning

ES205 Analysis and Design of Engineering Systems: Lab 1: An Introductory Tutorial: Getting Started with SIMULINK

Representations and Transformations. Objectives

Performance of a Robust Filter-based Approach for Contour Detection in Wireless Sensor Networks

Contents. shortest paths. Notation. Shortest path problem. Applications. Algorithms and Networks 2010/2011. In the entire course:

Karen L. Collins. Wesleyan University. Middletown, CT and. Mark Hovey MIT. Cambridge, MA Abstract

(12) Patent Application Publication (10) Pub. No.: US 2011/ A1

Using Mouse Feedback in Computer Assisted Transcription of Handwritten Text Images

Stochastic Search and Graph Techniques for MCM Path Planning Christine D. Piatko, Christopher P. Diehl, Paul McNamee, Cheryl Resch and I-Jeng Wang

arxiv: v1 [cs.ds] 27 Feb 2018

Highly Heterogeneous XML Collections: How to retrieve precise results?

Aalborg Universitet. Published in: Proceedings of the Working Conference on Advanced Visual Interfaces

Algorithmic Discrete Mathematics 4. Exercise Sheet

Planning of scooping position and approach path for loading operation by wheel loader

A QoS-aware Service Composition Approach based on Semantic Annotations and Integer Programming

(12) Patent Application Publication (10) Pub. No.: US 2013/ A1. Dhar et al. (43) Pub. Date: Jun. 6, 2013 NY (US) (57) ABSTRACT

Computer Arithmetic Homework Solutions. 1 An adder for graphics. 2 Partitioned adder. 3 HDL implementation of a partitioned adder

The norm Package. November 15, Title Analysis of multivariate normal datasets with missing values

Problem Set 2 (Due: Friday, October 19, 2018)

Multicast with Network Coding in Application-Layer Overlay Networks

ADAM - A PROBLEM-ORIENTED SYMBOL PROCESSOR

A New Approach to Pipeline FFT Processor

Size Balanced Tree. Chen Qifeng (Farmer John) Zhongshan Memorial Middle School, Guangdong, China. December 29, 2006.

IMPLEMENTATION OF CHORD LENGTH SAMPLING FOR TRANSPORT THROUGH A BINARY STOCHASTIC MIXTURE

Optimizing Synchronous Systems for Multi-Dimensional. Notre Dame, IN Ames, Iowa computation is an optimization problem (b) circuit

Generic Traverse. CS 362, Lecture 19. DFS and BFS. Today s Outline

Midterm 2 March 10, 2014 Name: NetID: # Total Score

Shortest Paths with Single-Point Visibility Constraint

Lecture Outline. Global flow analysis. Global Optimization. Global constant propagation. Liveness analysis. Local Optimization. Global Optimization

AVL Tree. The height of the BST be as small as possible

Transcription:

dit in Xylia Validity Preerving diting of XML Document Pouria Shaker, Theodore S. Norvell, and Denni K. Peter Faculty of ngineering and Applied Science, Memorial Univerity of Newfoundland, St. John, NFLD, Canada Abtract Thi paper preent an overview of the Xylia Toolkit' approach to preerving the validity of XML document ubjected to edit action uch a inertion, replacement, and deletion. The Xylia Toolkit i a library of clae that provide the Java programmer with the mean neceary to contruct a WYSIWYG (What You See I What You Get) XML editor. In addition, a default XML editor (with eential feature) created by the toolkit i included in the package, which can be ued a framework for creating an XML editor tailored to the uer need. An XML document i valid if it complie with it aociated Document Type Definition (DTD). A DTD impoe tructural contraint on a document by defining the et of element type for the document and the allowed parent child relation between them. Validating XML parer can check a document againt it DTD, but the main challenge i to maintain the validity of a document while edit uch a inertion, replacement, and delete are taking place. Xylia' trategy for preerving validity i to diallow operation that would invalidate the document rather than to attempt repairing the damage made. 1. About Xylia The Xylia Toolkit provide the Java programmer with the mean neceary to contruct a WYSIWYG (an acronym for what you ee i what you get ) XML (xtenible Markup Language) editor. In addition, a default XML editor (with eential feature) created by the toolkit i included in the package; it can be ued a framework for creating an XML editor tailored to the uer need. A WYSIWYG (pronounced wiz-eewig ) editor i one that allow the developer to ee the end reult of a document while creating it through an eay to ue GUI (Graphical Uer Interface). xample of currently available HTML WYSIWYG editor are Microoft FrontPage, Adobe PageMill, and Adobe GoLive. 2. About XML XML [1] i a tandard developed by the World Wide Web Conortium (W3C) [2] ued to deign cutomized markup language. A markup language decribe the way the content of a document hould be interpreted. Unlike HTML, which i a pure markup language, XML doe not offer a predefined et of element and rule of tructural validity; rather it define an infinite et of related language, and a metalanguage for expreing the et of element and the validity rule of a particular language. The imilarity between XML language allow a ingle oftware tool, uch a a parer of an editor, to be deigned that can handle any XML language, including thoe not yet conceived of. 3. Modelling XML Document Xylia i baed on the Swing library of the Java language [3]. The repreentation of XML document in Xylia i partly impoed by need to interact with exiting part of the Swing library, and partly by the nature of XML and the need of the Xylia editor kit. In Xylia, XML document are modelled by two data tructure: a equence of character holding the textual content of the document, and an element tree, coniting of a finite et of node atifying the uual tree propertie, extended upon thi equence to give the document tructure (ee Figure 1) 1

XML_DOC A Xylia Snaphot <A> <>ome</> <!-- comment --> <>text</> </A> XML Document content c o m e t e x t Document Model content 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Figure 1 Node in the element tree are either branche or run; run being the node with no children (tree leave). ach branch node, except the root, repreent an XML element. Run node are either content node or dummy node. Content node cover the portion of the character equence containing textual data. Dummy node are of three type: tart entinel, end entinel (both hown a in Figure 1), and comment dummie (hown a c in Figure 1). ach branch node ha a tart entinel a it firt child and an end entinel a it lat child. Comment dummie, a the name ugget, repreent comment in the XML document. Dummy node cover a ingle pace ( ) character in the character equence. ranch node cover the union of the character covered by their children. There i one branch element, XML_DOC, which cover the entire equence. mpty element uch a <D></D> (or <D/>) are conidered branch element and have exactly 2 children: a tart entinel and an end entinel. Thi tree model of the document i manipulated by editing action and i rendered onto the diplay by a ytem of view object. The mapping of tree node to view object i controlled by a Cacading Style Sheet [2]. While intereting, the iue of diplaying the Xylia tree will not be further conidered in thi paper. 4. Well-formedne v. Validity The Well-formedne contraint, a pecified by W3C, i mainly concerned with the yntactical correctne of an XML document, uch a the requirement that tart and end tag are conitently neted. A complete decription of the contraint can be found under the XML 1.0 recommendation available for viewing at the W3C webite. XML parer have been programmed to pare document adhering completely to thi contraint and will fail to complete their tak when confronted with an ill-formed document. The document model dicued in the previou ection i a reult of the paring of a well-formed document. The characteritic of the data tructure decribed, therefore, erve a well-formedne invariant, ome of which are impoed by Swing, while other are pecific to Xylia (e.g. the exitence of XML_DOC a a default root, the preence of dummy element, etc.). An XML document i valid if it complie with the document type definition (DTD) aociated with the document. A DTD impoe tructural contraint on a document by defining the et of element type for the document and the allowed parent-child relation between them. The following i a ample DTD element declaration: <!LMNT A ((, C) C D*)> 2

Thi imply tate that element can have type A and that an element of type A could have the following et of equence of type for it children: {C, C, ε, D, DD, DDD, } The expreion ((, C) C D*) i termed a regular expreion. A regular expreion i a way of expreing a certain pattern, in our cae, a child equence pattern. In the example given, mean or,, mean juxtapoition, and * mean zero or more intance of. A detailed treatment of regular expreion i outide the cope of thi paper. The et of type equence aociated with a regular expreion r i called r language and i denoted L(r). The occurrence of #PCDATA in a regular expreion indicate that pared content data i allowed a a child; pared content data imply mean character, and correpond to the content node in our tree tructure. Xylia ue a validating XML parer that i capable of checking the document againt it aociated DTD and of reporting any dicrepancie. However the more challenging tak i to maintain the validity of a document while edit uch a inertion, replacement, and delete are taking place. In other word, all operation modifying the document model hould enure that the equence of children of each branch element (excluding all dummie) remain compliant to it DTD declaration. 5. Inertion Sequence Aume that the regular expreion aociated with element type A,, C, and D are a follow: A :: C C D* L(A) = {C, C, ε, D, DD, DDD, } :: C (A C)* D L() = {D, C, CAC, CACAC, } C :: #PCDATA L(C) = {ε, #PCDATA, #PCDATA#PCDATA, } D :: #PCDATA L(D) = {ε, #PCDATA, #PCDATA#PCDATA, } Suppoe that in the coure of editing, the uer ha made the following election in the document: <> <C></C> <A></A> <C></C> </> Deleting the election would violate the DTD ince CC i not a member of L(r). Inerting character data would alo violate the DTD ince doe not accept #PCDATA. However, the following et of replacement would be valid: {A, ACA, ACACA, } Now conider the following example: <A> <C>Hello World</C> </A> Deletion would be valid ince ε i a member of A L(r). Other replacement option would include: {C, C, ε, D, DD, DDD, } The ame et of type equence would be applicable to the following inertion requet: <A> </A> It can be een that whether the deired edit i inertion, replacement, or deletion, a et of valid inertion equence mut be derived baed on the election made and the affected element content model. Thi et would then determine whether the requeted operation hould be allowed. So Xylia trategy for preerving validity i to diallow operation that would invalidate the document rather than to attempt repairing the damage made. 3

When the uer wihe to inert at a given poition (or replace a given election of children) under a given parent node, a finite et of inertion equence i computed and offered to the uer on a menu. Firt the regular expreion decription of the content model i tranlated into a determinitic finite tate automaton (ee Figure 2 (a)). y adjuting the tart tate according to the equence of children of prior to the inertion point (or election) and the et of final tate according to the equence of children after the inertion point (or election), we can obtain a econd automaton that repreent the et of valid inertion equence (ee Figure 2 (b) and (c)). In the figure, thi et i {ε, D, C, CD, CC, CCD,...}. A thi i an infinite et, it i not uitable to be offered to the uer on a menu. A imple path in a graph contain no repeated node; a imple cycle i a path that contain no repeated node except that the firt and lat node are the ame. We can identify the et of accepting path in thi automaton that are either imple path or imple cycle (ee Figure 2 (d)). The et of equence of label along thee path contitute the et inertion equence offered to the uer. In the figure, thi i {ε, D, C}. It i important to note that any valid document can be tranformed to any other valid document by a uitable combination of the following uer action: pick an inertion point, elect a range, inert an offered inertion equence, and inert a character. Such a et of uer action i termed a complete et. For example, in Figure 2, if the uer really want to inert CCD, they can do o by firt inerting C, moving the curor to after the econd and inerting CD. In general any accepting path through the automaton can be decompoed into imple path and imple cycle which will be offered given appropriate curor placement. (A, (,C)*, D?, ) A D C (a) A regular expreion and it automaton. A (b) Inertion point i preceded by A and node and ucceded by an node A C C D D (c) Automaton repreenting the et of valid inertion. (d) Accepting imple path and imple cycle. 6. From Sequence to Tree Figure 2 Suppoe that the uer chooe to inert the C inertion equence after being offered a choice from the et {C, C, D} (the empty inertion equence requet ε i really a requet for deletion and not inertion) for the <A> </A> inertion requet example. Poible inertion would include: <A> <> <D></D> </> <C></C> </A> cae 1 <A> <> <C></C><A><D>text</D></A><C></C> </> <C></C> </A> cae 2 and infinitely many more. For each element in the choen inertion equence (in our cae C) a default inertion tree mut be elected baed on the element type aociated L(r). Recall A,, C, and D aociated L(r): 4

A :: C C D* L(A) = { C, C, ε, D, DD, DDD, } :: C (A C)* D L() = {D, C, CAC, CACAC, } C :: #PCDATA L(C) = { ε, #PCDATA, #PCDATA#PCDATA, } D :: #PCDATA L(D) = { ε, #PCDATA, #PCDATA#PCDATA, } In cae 1, at the firt level, D wa choen for and the empty tring wa choen for C. At the econd level, the empty tring wa choen for D. In cae 2, at the firt level, CAC wa choen for and the empty tring wa choen for C. At the econd level, D wa choen for A and the empty tring wa choen for C. Finally at the third level the tring text wa choen for D. It i not hard to ee that an infinite number of uch cae could be lited given the element type L(r). In Xylia, we identify a tree of minimal height for each element type. Thi can be done by tarting with element that allow empty content equence and labelling them a level 0. Next we identify element type that can have content equence coniting only of level 0 type, and labelling them a level 1. In general, a level i+1 type can have a content equence that contain only type of level i and below and doe not have a content equence that contain only type of level le than i. Only the finite et of content equence dicued in Section 5 need be conidered. For each level i type we can contruct a tree of height i+1. There i a retriction on regular expreion allowed in a DTD implying that #PCDATA can only be a child of an element with level 0 type. Thu the minimal height tree never contain character data. Thi algorithm enure that the tree inerted for the equence C are a hown in cae 1. XML element are aociated with one or more attribute. Some attribute are marked a required in the DTD. When a tree i inerted that require an attribute, the uer i prompted to upply the value. 7. A Few Note on Implementation Xylia editor do not allow the election of a ingle entinel element; if one i elected, the election automatically extend to cover it matching entinel. Thi i important becaue it prevent the deletion of a ingle entinel element in the coure of variou edit where election are involved. Such a deletion would violate the Xylia pecific well-formedne invariant of every branch element having a entinel element a it firt and lat child. Thi however doe not completely rule out the poibility of ingle entinel being deleted, a it i till poible to do o uing the delete and backpace key. Xylia handle election delete and backpace and delete-key delete differently: In the cae of election delete the following algorithm i ued: If deleting the election doe not violate the DTD, perform delete. Otherwie, do nothing. ackpace and delete-key delete of entinel are lightly more involved. The following rule are tried in turn until one i found that will not make the document invalid: Try deleting the element parenting the entinel and promoting it content to it parent. If the dummy being deleted i a tart entinel, try deleting the element parenting the entinel and moving it content to the end of that of it left ibling. If the entinel being deleted i an end dummy, try deleting the element parenting the entinel and moving it content to the beginning of that of it right ibling. Try deleting the entiniel parent and all it content. Do nothing. The Xylia toolkit provide the mean neceary to create an inert menu in the editor GUI. The inert menu automatically update itelf in repone to change in the curor poition or the current election (ee 5

Figure 3). The inert menu item repreent the et of inertion equence decribed in Section 5. Clicking on a menu-item replace the election with (or inert at the curor poition) the equence of default tree correponding to the inertion equence in quetion, a dicued in Section 6. Cut, copy, and pate operation have been recently implemented uch that validity i preerved. <html> <body> <h1>level 1 heading.</h1> <h2>level 2 heading.</h2> <p><b>bold text</b> and normal text</p> </body> </html> lement declaration for body: <!LMNT body (h1 h2 p table)*> Figure 3 8. Future Conideration Validity-preerving algorithm for inertion, replacement, and deletion have been implemented. diting action that need to be implemented in the future include ditributed engulf action. An XHTML (XML implementation of HTML) example will make the nature of the ditributed engulf action clear. Conider the following election: <html> <body> <p>a paragraph</p> <table><tr><td>a table entry</td></tr></table> </body> </html> A ditributed engulf of the <em></em> tag would reult in the following: <html> <body> <p><em>a paragraph</em></p> <table><tr><td><em>a table entry</em></td></tr></table> </body> </html> Refrence: [1] World Wide Web Conortium (W3C), xtenible Markup Language (XML), [Online document], Available HTTP: http://www.w3.org/xml/ [2] World Wide Web Conortium (W3C), World Wide Web Conortium, [Online document], Available HTTP: http://www.w3.org/ [3] Sun Microytem, Inc. Java TM Foundation Clae (JFC), [Online document], Available HTTP: http://java.un.com/product/jfc/ 6