Path Query Reduction and Diffusion for Distributed Semi-structured Data Retrieval+

Jaehyung Lee, Yon Dohn Chung, Myoung Ho Kim
Division of Computer Science, Department of EECS
Korea Advanced Institute of Science and Technology (KAIST)
373-1, Kusong-dong, Yusong-gu, Taejon, , Korea
{jlee, ydchung,

Abstract

In this paper, we address the problem of query processing on distributed semi-structured data. Distributed semi-structured data can be modeled as a rooted, edge-labeled graph whose nodes are located at one or more sites. For efficient retrieval of distributed semi-structured data, we propose a query processing model based on query reduction and diffusion. In this method, a user query is reduced at a site and distributed to other sites for data retrieval. We also propose a set of algorithms for the proposed model.

1. Introduction

Semi-structured data is generally described as data whose structure or format is not separated from its contents. Examples of semi-structured data are HTML documents, BibTeX documents, and genome data. Such data have an irregular and changeable structure.

Most previous studies on semi-structured data try to extend existing database technologies; that is, they complement relational and object-oriented database technologies for semi-structured data retrieval. Specific research areas include data integration, web site management, general-purpose semi-structured data management [6], data models [5], query language design [1, 2], query processing [8], and indexing techniques [7].

Various data models proposed for semi-structured data share a common property: they model semi-structured data as a rooted, edge-labeled graph. In the graph, nodes are objects, and labels are strings, integers, images, sounds, etc. Several query languages have been proposed for semi-structured data, and they vary in style and expressive power.
They are generally based on regular path expressions and describe, in a declarative way, the nodes reachable via a given path. Below is an example of a regular path query in UnQL [2]:

    Q = select t where _* ⇒ CS-Dept ⇒ _* ⇒ Paper ⇒ t in DB

The query retrieves all papers accessible from a CS-Dept link in DB. Here, _* ⇒ CS-Dept ⇒ _* ⇒ Paper is a regular path expression. It denotes a path that contains an edge labeled CS-Dept and ends with an edge labeled Paper, in this order; any number of other edges may come before and between them.

In this paper, we propose a query processing model for distributed semi-structured data retrieval that can be applied to the current web environment of HTML and XML documents. The model uses path query reduction and diffusion for local and distributed query processing. In local query processing, regular path queries are processed by query reduction: a query is reduced as it moves toward child nodes. In distributed query processing, queries are forwarded to other sites along edges that connect two sites; we call this query diffusion.

In distributed query processing, the query originating site must know when query processing has terminated. We propose a set of algorithms with which the query originating site detects the termination of distributed query processing both in the normal case (i.e., all query processing ends normally) and in the user abort case (i.e., the user wants to stop the query processing).

+ This work was supported by grant No from the interdisciplinary research program of the KOSEF.
/00 $ IEEE

2. Background

A general model for semi-structured data is a rooted, edge-labeled graph. Figure 1 illustrates an example graph representation of a fragment of a university web site. In the graph, nodes are web pages and edges are hyperlinks between them. The numbers in the nodes are node identifiers.
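The graph model just described maps directly onto an adjacency list. The Python sketch below evaluates a plain path expression, that is, a fixed sequence of edge labels, from the root, with `_` matching any single label. The graph, its node numbers and the name `eval_path` are illustrative assumptions, not Figure 1. A fixed-length label sequence cannot express "any number of edges", which is what the closure operator of regular path expressions adds.

```python
# Hypothetical toy graph (not Figure 1): node -> list of (edge label, child node).
Graph = dict[int, list[tuple[str, int]]]


def eval_path(graph: Graph, root: int, path: list[str]) -> set[int]:
    """Nodes reachable from `root` along edges whose labels match `path`
    in order; the step "_" matches any single edge label."""
    frontier = {root}
    for step in path:
        frontier = {
            child
            for node in frontier
            for label, child in graph.get(node, [])
            if step in ("_", label)
        }
    return frontier


toy: Graph = {
    1: [("CS-Dept", 2), ("EE-Dept", 3)],
    2: [("Faculty", 4), ("Paper", 5)],
    3: [("Paper", 6)],
    4: [("Paper", 7)],
}
print(sorted(eval_path(toy, 1, ["CS-Dept", "Paper"])))       # [5]
print(sorted(eval_path(toy, 1, ["CS-Dept", "_", "Paper"])))  # [7]
```

Each step replaces the current frontier with the children reached over matching edges, so the cost is proportional to the number of edges inspected per step.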

Figure 1. A graph representation of semi-structured data

Semi-structured query languages share the feature that they use path expressions to traverse graphs. A path expression on semi-structured data denotes a sequence of edge labels, and the query result is the set of nodes reachable from the root node of the data along a path that satisfies the expression. Regular expressions are an effective query language for semi-structured data retrieval because they do not need to spell out every sequence of labels. Regular expressions have the following grammar:

    R ::= P | a | _ | R|R | R ⇒ R | R*

Here, P is a user-defined condition statement or a boolean combination of such statements, a is a label constant, _ matches any label, R1|R2 is an alternation, R1 ⇒ R2 is the concatenation of R1 and R2, and R* is the closure of R. The following regular expression Query1 finds all the papers of the computer science department in Figure 1:

    Query1: _* ⇒ CS-Dept ⇒ _* ⇒ Paper

The result is the set of nodes {57, 69, 70, 86}. In the rest of the paper, we omit '⇒' in regular expressions when there is no ambiguity. We consider a regular expression R in the following query:

    Q(DB) = select t where R ⇒ t in DB

We call this query a regular path query or regular query; its result is the set of nodes in the graph reachable from the root via a path matching R.

Nowadays, many web sites are spread over several locations and connected to each other by hyperlinks. If we consider those web sites as semi-structured data, we can apply the semi-structured data model to them. Figure 2 depicts an example of university web sites distributed over three different sites. If Query1 is applied to node 1 in Figure 2, the result is the nodes marked with slanted lines.

Suciu [8] proposed a query decomposition method for query processing on distributed semi-structured data.
In this method, queries are sent to all other sites and evaluated at each site, and the result of each site is then returned to the query originating site. This method assumes that semi-structured data is distributed over fixed, known sites and that every site knows its input and output nodes¹. However, this assumption is not realistic in the current web environment: in HTML and XML documents, identifying output nodes is very easy, but identifying input nodes is almost impossible.

¹ For every cross link u → v from site α to site β, we call u an output node in α and v an input node in β.

Figure 2. An example of distributed semi-structured data

3. Proposed Query Processing Model

In this section, we propose a query processing model for distributed semi-structured data retrieval.

- Local Query Processing

A query at a local site is processed through query reduction, which is performed by the following Reduce function. Reduce takes two inputs: (i) a path query given as a regular expression and (ii) a label used for a state transition in the automaton constructed from the regular expression. If a transition on that label from the start state of the automaton is possible, Reduce returns a regular path query whose start state is the state reached by the transition. Otherwise, Reduce returns ∅. Figure 3 shows an example automaton and the Reduce results for each label of the regular expression R = a*bc*d | a*e:

    R = a*bc*d | a*e
    Reduce(R, a) = a*bc*d | a*e
    Reduce(R, b) = c*d
    Reduce(R, c) = ∅
    Reduce(R, d) = ∅
    Reduce(R, e) = ε

Figure 3. An example of path query reduction

We assume that each local site uses Algorithm 1 for path query processing. The LQP (Local Query Processing) function in Algorithm 1 takes two inputs: (i) a path query given as a regular expression and (ii) a node identifier. It applies Reduce to the label of each edge leaving the given node. If the result of Reduce is not ∅, LQP calls itself recursively on the corresponding child node with the reduced query. Figure 4 shows an example in which the path query a*bc*d is given to node 1 and the query result is {3, 4}.

Algorithm 1 Local query processing

    Visited ← ∅   {<query, node> pairs already visited}
    Result ← ∅    {query result}
    LQP(R, Root(DB))

    function LQP(R, u)
    begin
        if R's start state ∈ R's final states then Result ← Result ∪ {u}
        if <R, u> ∈ Visited then return
        Visited ← Visited ∪ {<R, u>}
        for all edges u –a→ v do
            R2 = Reduce(R, a)
            if R2 ≠ ∅ then LQP(R2, v)
        end for
    end

Figure 4. An example of local query processing

- Distributed Query Processing

For distributed query processing, we assume that each site knows only its output nodes. Unlike local query processing, distributed semi-structured data retrieval requires queries to be transferred to several sites; this is done by query diffusion. The proposed query processing model for distributed semi-structured data is based on Algorithm 2, which takes two inputs: (i) distributed semi-structured data and (ii) a regular path query.

Algorithm 2 Distributed query processing

    s: query originating site
    O_p: identifier for a node in site p
    R: regular expression

    {query message (s, O_p, R) arrives at site p}
    receive (s, O_p, R)
    evaluate (s, O_p, R)
    {query message (s, O_r, R') is computed}
    send (s, O_r, R') to site r
    {query processing result result_p is computed}
    send (result_p) to site s

Figure 5. An example of distributed query processing
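The Reduce function and the LQP function of Algorithm 1 can be sketched in Python as follows. This is not the authors' implementation: the paper defines Reduce through state transitions of an automaton, whereas this sketch computes the residual query directly on the expression with Brzozowski derivatives (`reduce_query`), and `nullable` plays the role of the test "R's start state ∈ R's final states". Alternation is stored as a frozenset, so equal residual queries reached around a cycle compare equal and the Visited check can stop the recursion. All class and function names and the graph `g` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:  # ∅: matches no path
    pass


@dataclass(frozen=True)
class Eps:  # ε: matches the empty path
    pass


@dataclass(frozen=True)
class Lab:  # label constant; the name "_" matches any label
    name: str


@dataclass(frozen=True)
class Cat:  # concatenation R1 ⇒ R2
    left: object
    right: object


@dataclass(frozen=True)
class Alt:  # alternation, stored as a set of alternatives
    alts: frozenset


@dataclass(frozen=True)
class Star:  # closure R*
    body: object


EMPTY, EPS = Empty(), Eps()


def cat(r, s):
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    return r if s == EPS else Cat(r, s)


def alt(*rs):
    parts = set()
    for r in rs:
        if isinstance(r, Alt):
            parts |= r.alts
        elif r != EMPTY:
            parts.add(r)
    if not parts:
        return EMPTY
    return next(iter(parts)) if len(parts) == 1 else Alt(frozenset(parts))


def seq(*rs):
    out = EPS
    for r in reversed(rs):
        out = cat(r, out)
    return out


def nullable(r):
    """True iff the query matches the empty path (start state is final)."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return any(nullable(x) for x in r.alts)
    return False


def reduce_query(r, label):
    """Reduce(R, label): residual query after following an edge `label`."""
    if isinstance(r, Lab):
        return EPS if r.name in ("_", label) else EMPTY
    if isinstance(r, Cat):
        head = cat(reduce_query(r.left, label), r.right)
        return alt(head, reduce_query(r.right, label)) if nullable(r.left) else head
    if isinstance(r, Alt):
        return alt(*(reduce_query(x, label) for x in r.alts))
    if isinstance(r, Star):
        return cat(reduce_query(r.body, label), r)
    return EMPTY  # Empty and Eps consume no label


def lqp(graph, root, query):
    """Algorithm 1: local query processing by repeated query reduction."""
    visited, result = set(), set()

    def visit(r, u):
        if nullable(r):
            result.add(u)
        if (r, u) in visited:
            return
        visited.add((r, u))
        for label, v in graph.get(u, []):
            r2 = reduce_query(r, label)
            if r2 != EMPTY:
                visit(r2, v)

    visit(query, root)
    return result


a, b, c, d, e = (Lab(x) for x in "abcde")
R = alt(seq(Star(a), b, Star(c), d), seq(Star(a), e))  # R = a*bc*d | a*e
g = {1: [("a", 2), ("e", 5)], 2: [("a", 1), ("b", 3)], 3: [("c", 3), ("d", 4)]}
print(sorted(lqp(g, 1, R)))  # [4, 5]
```

For R = a*bc*d | a*e, `reduce_query` reproduces the Figure 3 table: label a returns R itself, b returns c*d, c and d return ∅, and e returns ε. The cycle 1 → 2 → 1 in `g` is cut off by the Visited set because the residual query on that cycle is always R.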
Query diffusion means that, at each local site, if an edge to another site is reachable and the result of Reduce on that edge's label is not ∅, the reduced query is transferred along that edge. When a site receives a query, it processes the query with the LQP algorithm. Whenever it finds an edge to another site for which Reduce does not return ∅, it sends a reduced query message to that site. After local query processing, the query result is sent to the query originating site. Thus, the distributed query processing model is based on both query reduction and query diffusion. An example of query processing using this model is illustrated in Figure 5.

4. Termination Detection

The query processing model we have proposed is applicable to the current web environment, which consists of HTML and XML documents. However, since the query originating site receives query results incrementally from several sites, it cannot tell when query processing has terminated. In this section, we propose an algorithm with which the query originating site detects when query processing in our distributed model has finished. In addition, we propose an extended algorithm with which the query originating site detects the termination of query processing when the user wants to abort it. Table 1 describes the notation used for termination detection in the normal and user abort cases.

Table 1. Notations

    Name            Description
    1, ..., N       sites (s is the query originating site)
    state_p         each site's state (active, passive)
    parent_p        the site which sent the query to site p (parent_s = s, for p ≠ s)
    log_p           the log for queries given to site p
    result_p        the result for a given query in site p
    n(query_p)      the number of (query-ack) messages to be received in site p
    n(result_p)     the number of (result-ack) messages to be received in site p
    n(query_p^r)    the number of (query-ack) messages to be received in site p from site r

- Normal Case

For the query originating site to detect the termination of distributed query processing, each participating site p acts as follows.

    when query message (s, q, O_p, R)² arrives at site p
        receive (s, q, O_p, R)
        if (O_p, R) ∈ log_p then
            send (query-ack) to q
            return
        log_p ← log_p ∪ {(O_p, R)}
        if parent_p is undefined then parent_p ← q
        else send (query-ack) to q
        state_p ← active
        evaluate (s, O_p, R)

    when (query-ack) arrives at site p
        n(query_p)--
        if n(query_p) = 0 and state_p = passive then send (query-ack) to parent_p

    when query processing result result_p is computed
        send (result_p) to site s
        n(result_p)++

    when (result-ack) arrives at site p
        receive (result-ack)
        n(result_p)--
        if n(result_p) = 0 then state_p ← passive
        if n(query_p) = 0 and state_p = passive then send (query-ack) to parent_p

    when query message (s, p, O_r, R') is computed
        send (s, p, O_r, R') to site r
        n(query_p)++

² (s, q, O_p, R): s is the query originating site, q is the parent site that sent the query, O_p is the identifier of the node in site p, and R is the path query.

Figure 6 shows an example of distributed query processing and termination detection. Queries are transferred from Site 1 to Site 2, from Site 2 to Sites 3 and 4, and from Site 3 to Site 4. Each transferred query is processed, and its result is sent to Site 1.
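The normal-case handlers can be exercised with a single-process Python simulation on a topology shaped like Figure 6 (Site 1 sends to Site 2, Site 2 to Sites 3 and 4, Site 3 to Site 4). The sketch rests on assumptions that are not in the paper: queries are plain label sequences rather than full regular expressions (`reduce_path` is a simplified Reduce), every evaluation reports a possibly empty result to s, a site clears parent_p after acknowledging its parent (as in Dijkstra and Scholten's scheme [3]), and messages are delivered in random order. The names `simulate`, `reduce_path` and `maybe_ack_parent` and the sample graph are hypothetical.

```python
import random


def reduce_path(path, label):
    """Simplified Reduce for plain label sequences ("_" matches any label).
    Returns None in place of the empty query (∅)."""
    if path and path[0] in ("_", label):
        return path[1:]
    return None


def simulate(graph, site_of, root, path, seed=0):
    """Return (result nodes, messages still in transit at detection)."""
    rng = random.Random(seed)
    s = site_of[root]  # query originating site
    st = {p: {"state": "passive", "parent": None, "log": set(),
              "n_query": 0, "n_result": 0} for p in set(site_of.values())}
    pending = [(s, ("query", s, root, path))]  # (destination site, message)
    result = set()

    def send(dest, msg):
        pending.append((dest, msg))

    def maybe_ack_parent(p):
        x = st[p]
        if x["parent"] is None or x["n_query"] or x["state"] != "passive":
            return False
        if p == s:
            return True  # parent_s = s: the originating site detects termination
        send(x["parent"], ("query-ack",))
        x["parent"] = None  # assumption: disengage until a new query arrives
        return False

    def evaluate(p, node, query):
        """LQP inside site p; reduced queries on cross-site edges diffuse."""
        found, seen, stack = set(), set(), [(node, query)]
        while stack:
            u, r = stack.pop()
            if (u, r) in seen:
                continue
            seen.add((u, r))
            if r == ():
                found.add(u)
            for label, v in graph.get(u, []):
                r2 = reduce_path(r, label)
                if r2 is None:
                    continue
                if site_of[v] == p:
                    stack.append((v, r2))
                else:
                    send(site_of[v], ("query", p, v, r2))
                    st[p]["n_query"] += 1
        send(s, ("result", p, found))
        st[p]["n_result"] += 1

    while pending:
        dest, msg = pending.pop(rng.randrange(len(pending)))
        x, kind, done = st[dest], msg[0], False
        if kind == "query":
            _, q, node, query = msg
            if (node, query) in x["log"]:  # cycle prevention
                send(q, ("query-ack",))
                continue
            x["log"].add((node, query))
            if x["parent"] is None:
                x["parent"] = q
            else:
                send(q, ("query-ack",))  # already engaged: acknowledge at once
            x["state"] = "active"
            evaluate(dest, node, query)
        elif kind == "query-ack":
            x["n_query"] -= 1
            done = maybe_ack_parent(dest)
        elif kind == "result":
            result |= msg[2]
            send(msg[1], ("result-ack",))
        else:  # "result-ack"
            x["n_result"] -= 1
            if x["n_result"] == 0:
                x["state"] = "passive"
            done = maybe_ack_parent(dest)
        if done:
            return result, len(pending)
    return result, None  # not reached if detection works


graph = {10: [("Dept", 20)], 20: [("Lab", 30), ("Lab", 40)],
         30: [("Paper", 31), ("Paper", 41)], 40: [("Paper", 42)]}
site_of = {10: 1, 20: 2, 30: 3, 31: 3, 40: 4, 41: 4, 42: 4}
res, left = simulate(graph, site_of, 10, ("Dept", "Lab", "Paper"), seed=1)
print(sorted(res), left)  # [31, 41, 42] 0
```

Depending on delivery order, Site 4 is engaged by either Site 2 or Site 3, and the other query is acknowledged immediately. In every run, the second returned value, the number of messages still in transit when Site 1 detects termination, is 0.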
Sites 3 and 4 notify Site 2 of the end of query processing, and Site 2 notifies Site 1. Finally, Site 1 detects the termination of distributed query processing.

Figure 6. An example of normal termination

For the query that Site 3 sent to Site 4, Site 4 immediately sends an acknowledgement to Site 3 because its parent variable has already been set to Site 2. These immediate acknowledgements ensure that the paths along which query acknowledgement messages are delivered form a tree.

- Cycle Prevention

Let query R be delivered to node a of site A. While R is being processed, the same query R can be delivered to node b of site B and then to node a of site A again. To prevent this kind of query cycling, a site must not process a query message that it has already processed or is currently processing. For this purpose, each site uses a log variable that stores received query messages. Log variables are managed as follows: after detecting the termination of query processing, the query originating site notifies all the sites that sent query results of the termination, and each site then clears its log variable.

- User Abort Case

While the given query is being processed, the user may want to abort it. To support termination detection for a user-requested abort, we use the variable n(query_p^r) (in Table 1) and add the following to the previous algorithm: n(query_p^r) is increased when a query message to site r is computed at site p and decreased when a (query-ack) arrives from site r. In addition, we add the following abort message handling algorithm.

    when (abort) arrives at site p
        if … then return
        stop its evaluation
        state_p ← passive
        if n(query_p) = 0 then …
        if n(result_p) = 0 then …
        send (abort) to all sites r where n(query_p^r) ≠ 0
        n(query_p^r) ← 0 for all r

- Correctness

We now prove the correctness of the termination detection algorithms for the normal case and for the user abort case. In each case, we prove the following: (i) after distributed query processing terminates, the query originating site returns to a passive state, and (ii) after the query originating site returns to a passive state, all distributed query processing has actually terminated. First, we define the termination of distributed query processing.

Definition 1 Distributed query processing is terminated when all the relevant sites are in passive states and there is no undelivered (query), (query-ack), (result), or (result-ack) message.
There may be (abort) messages in the network in the case of a user-requested abort. However, such a message does not make any site active.

Theorem 1 Distributed query processing for a regular path query is terminated if and only if the query originating site returns to a passive state.

[Proof] The theorem can be proved by adapting Dijkstra and Scholten's termination detection method for diffusing computations [3]. For details, see [4].

5. Conclusion

In this paper, we have proposed a query processing model for distributed semi-structured data retrieval. The proposed model is based on query reduction and query diffusion: query reduction is used for local query processing, and query diffusion is used for distributed query processing. The model assumes that each site knows only its output nodes, which makes it practically applicable to the current web environment. In addition, we have proposed termination detection algorithms for the proposed model.

References

[1] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel query language for semistructured data. International Journal on Digital Libraries, 1(1), April.
[2] Peter Buneman, Susan Davidson, Gerd Hillebrand, and Dan Suciu. A query language and optimization techniques for unstructured data. In Proceedings of ACM SIGMOD Conference.
[3] Edsger W. Dijkstra and C. S. Scholten. Termination detection for diffusing computations. Information Processing Letters, 11.
[4] J. Lee, Y. D. Chung, and M. H. Kim. A path query processing scheme for distributed semi-structured data retrieval. Journal of KISS - Databases, (revision).
[5] R. Goldman, S. Chawathe, A. Crespo, and J. McHugh. A standard textual interchange format for the Object Exchange Model (OEM). Technical report, Stanford University.
[6] J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom. Lore: A database management system for semistructured data. SIGMOD Record, 26(3), September.
[7] Tova Milo and Dan Suciu. Index structures for path expressions. In International Conference on Database Theory.
[8] Dan Suciu. Query decomposition and view maintenance for query languages for unstructured data. In Proceedings of Very Large Databases Conference.


More information

Using Webspaces to Model Document Collections on the Web

Using Webspaces to Model Document Collections on the Web Using Webspaces to Model Document Collections on the Web Roelof van Zwol and Peter M.G. Apers University of Twente, Department of Computer Science P.O.box 217, 7500 AE, Enschede, the Netherlands {zwol,

More information

Jennifer Widom. Stanford University

Jennifer Widom. Stanford University Principled Research in Database Systems Stanford University What Academics Give Talks About Other people s papers Thesis and new results Significant research projects The research field BIG VISION Other

More information

Extending E-R for Modelling XML Keys

Extending E-R for Modelling XML Keys Extending E-R for Modelling XML Keys Martin Necasky Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic martin.necasky@mff.cuni.cz Jaroslav Pokorny Faculty of Mathematics and

More information

10/24/12. What We Have Learned So Far. XML Outline. Where We are Going Next. XML vs Relational. What is XML? Introduction to Data Management CSE 344

10/24/12. What We Have Learned So Far. XML Outline. Where We are Going Next. XML vs Relational. What is XML? Introduction to Data Management CSE 344 What We Have Learned So Far Introduction to Data Management CSE 344 Lecture 12: XML and XPath A LOT about the relational model Hand s on experience using a relational DBMS From basic to pretty advanced

More information

Context-Free Grammars and Languages (2015/11)

Context-Free Grammars and Languages (2015/11) Chapter 5 Context-Free Grammars and Languages (2015/11) Adriatic Sea shore at Opatija, Croatia Outline 5.0 Introduction 5.1 Context-Free Grammars (CFG s) 5.2 Parse Trees 5.3 Applications of CFG s 5.4 Ambiguity

More information

Chapter 13 XML: Extensible Markup Language

Chapter 13 XML: Extensible Markup Language Chapter 13 XML: Extensible Markup Language - Internet applications provide Web interfaces to databases (data sources) - Three-tier architecture Client V Application Programs Webserver V Database Server

More information

Fragmentation of XML Documents

Fragmentation of XML Documents Fragmentation of XML Documents Hui Ma 1, Klaus-Dieter Schewe 2 1 Victoria University of Wellington, School of Engineering and Computer Science, Wellington, New Zealand hui.ma@ecs.vuw.ac.nz 2 Software Competence

More information

Overview of the Integration Wizard Project for Querying and Managing Semistructured Data in Heterogeneous Sources

Overview of the Integration Wizard Project for Querying and Managing Semistructured Data in Heterogeneous Sources In Proceedings of the Fifth National Computer Science and Engineering Conference (NSEC 2001), Chiang Mai University, Chiang Mai, Thailand, November 2001. Overview of the Integration Wizard Project for

More information

modern database systems lecture 4 : semi-structured data

modern database systems lecture 4 : semi-structured data modern database systems lecture 4 : semi-structured data Aristides Gionis spring 2018 in the previous lectures relational model and sql storage and indexing access cost analysis hash index b+ trees indexing

More information

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer: Theoretical Part Chapter one:- - What are the Phases of compiler? Six phases Scanner Parser Semantic Analyzer Source code optimizer Code generator Target Code Optimizer Three auxiliary components Literal

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grammars 1 Informal Comments A context-free grammar is a notation for describing languages. It is more powerful than finite automata or RE s, but still cannot define all possible languages.

More information

Ontology Structure of Elements for Web-based Natural Disaster Preparedness Systems

Ontology Structure of Elements for Web-based Natural Disaster Preparedness Systems Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2007 Proceedings Americas Conference on Information Systems (AMCIS) December 2007 Ontology Structure of Elements for Web-based Natural

More information

Antisymmetric Relations. Definition A relation R on A is said to be antisymmetric

Antisymmetric Relations. Definition A relation R on A is said to be antisymmetric Antisymmetric Relations Definition A relation R on A is said to be antisymmetric if ( a, b A)(a R b b R a a = b). The picture for this is: Except For Example The relation on R: if a b and b a then a =

More information

Querying XML Data. Mary Fernandez. AT&T Labs Research David Maier. Oregon Graduate Institute

Querying XML Data. Mary Fernandez. AT&T Labs Research David Maier. Oregon Graduate Institute Querying XML Data Alin Deutsch Univ. of Pennsylvania adeutsch@gradient.cis.upenn.edu Alon Levy University of Washington, Seattle alon@cs.washington.edu Mary Fernandez AT&T Labs Research mff@research.att.com

More information

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules. Outline Type Checking General properties of type systems Types in programming languages Notation for type rules Logical rules of inference Common type rules 2 Static Checking Refers to the compile-time

More information

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference Type Checking Outline General properties of type systems Types in programming languages Notation for type rules Logical rules of inference Common type rules 2 Static Checking Refers to the compile-time

More information

XSLT and Structural Recursion. Gestão e Tratamento de Informação DEI IST 2011/2012

XSLT and Structural Recursion. Gestão e Tratamento de Informação DEI IST 2011/2012 XSLT and Structural Recursion Gestão e Tratamento de Informação DEI IST 2011/2012 Outline Structural Recursion The XSLT Language Structural Recursion : a different paradigm for processing data Data is

More information

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation Outline Introduction to Parsing (adapted from CS 164 at Berkeley) Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed ranslation he Functionality of the Parser Input: sequence of

More information

Graph Semantic Based Conceptual Model of Semistructured Data: An Object Oriented Approach

Graph Semantic Based Conceptual Model of Semistructured Data: An Object Oriented Approach Graph Semantic Based Conceptual Model of Semistructured Data: An Object Oriented Approach Anirban Sarkar 1, Sesa Singha Roy 2 1 Department of Computer Applications, National Institute of Technology, Durgapur,

More information

Introduction to Database Systems CSE 414

Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 13: XML and XPath 1 Announcements Current assignments: Web quiz 4 due tonight, 11 pm Homework 4 due Wednesday night, 11 pm Midterm: next Monday, May 4,

More information

Message-Optimal and Latency-Optimal Termination Detection Algorithms for Arbitrary Topologies

Message-Optimal and Latency-Optimal Termination Detection Algorithms for Arbitrary Topologies Message-Optimal and Latency-Optimal Termination Detection Algorithms for Arbitrary Topologies Neeraj Mittal, S. Venkatesan, and Sathya Peri Department of Computer Science The University of Texas at Dallas,

More information

Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, MAINTENANCE OF RECURSIVE VIEWS. Suzanne W.

Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, MAINTENANCE OF RECURSIVE VIEWS. Suzanne W. Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, 2009. MAINTENANCE OF RECURSIVE VIEWS Suzanne W. Dietrich Arizona State University http://www.public.asu.edu/~dietrich

More information

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM. What is the corresponding regex? [2-9]: ([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM. CS 230 - Spring 2018 4-1 More CFG Notation

More information

Foundations of Computer Science Spring Mathematical Preliminaries

Foundations of Computer Science Spring Mathematical Preliminaries Foundations of Computer Science Spring 2017 Equivalence Relation, Recursive Definition, and Mathematical Induction Mathematical Preliminaries Mohammad Ashiqur Rahman Department of Computer Science College

More information

Slides for Faculty Oxford University Press All rights reserved.

Slides for Faculty Oxford University Press All rights reserved. Oxford University Press 2013 Slides for Faculty Assistance Preliminaries Author: Vivek Kulkarni vivek_kulkarni@yahoo.com Outline Following topics are covered in the slides: Basic concepts, namely, symbols,

More information

Inferring Structure in Semistructured Data

Inferring Structure in Semistructured Data Inferring Structure in Semistructured Data SVETLOZAR NESTOROV æ SERGE ABITEBOUL y RAJEEV MOTWANI z Department of Computer Science Stanford University Stanford, CA 94305-9040 fevtimov,abiteboug@db.stanford.edu,

More information

in [8] was soon recognized by the authors themselves [7]: After a few experiments they realized that RMM was not adequate for many typical application

in [8] was soon recognized by the authors themselves [7]: After a few experiments they realized that RMM was not adequate for many typical application Dynamic web sites M. Gabbrielli Universit a di Pisa Λ M. Marchiori W3C/MIT x F.S. de Boer Universiteit Utrecht ΛΛ Abstract We propose a formalism for dynamic web sites, based on the RMM model, an existing

More information

Homework 6: FDs, NFs and XML (due April 13 th, 2016, 4:00pm, hard-copy in-class please)

Homework 6: FDs, NFs and XML (due April 13 th, 2016, 4:00pm, hard-copy in-class please) Virginia Tech. Computer Science CS 4604 Introduction to DBMS Spring 2016, Prakash Homework 6: FDs, NFs and XML (due April 13 th, 2016, 4:00pm, hard-copy in-class please) Reminders: a. Out of 100 points.

More information

Inferring Structure in Semistructured Data

Inferring Structure in Semistructured Data Inferring Structure in Semistructured Data SVETLOZAR NESTOROV SERGE ABITEBOUL RAJEEV MOTWANI æ Department of Computer Science Stanford University Stanford, CA 94305-9040 fevtimov,abiteboug@db.stanford.edu,

More information

New Rewritings and Optimizations for Regular Path Queries

New Rewritings and Optimizations for Regular Path Queries New Rewritings and Optimizations for Regular Path Queries Gösta Grahne and Alex Thomo Concordia University, Montreal, Canada {grahne, thomo}@cs.concordia.ca Abstract. All the languages for querying semistructured

More information

o12 references o24 references o29 first last

o12 references o24 references o29 first last Semistructured Data and XML Dan Suciu AT&T Labs suciu@research.att.com 1 Introduction Today much of the existing electronic data lies outside of database management system: it lies in structured documents

More information

An Efficient Algorithm for Computing Non-overlapping Inversion and Transposition Distance

An Efficient Algorithm for Computing Non-overlapping Inversion and Transposition Distance An Efficient Algorithm for Computing Non-overlapping Inversion and Transposition Distance Toan Thang Ta, Cheng-Yao Lin and Chin Lung Lu Department of Computer Science National Tsing Hua University, Hsinchu

More information

X-tree Diff+: Efficient Change Detection Algorithm in XML Documents *

X-tree Diff+: Efficient Change Detection Algorithm in XML Documents * X-tree Diff+: Efficient Change Detection Algorithm in XML Documents * Suk Kyoon Lee, Dong Ah Kim Division of Information and Computer Science, Dankook University San 8, Hannam-dong, Youngsan-gu, Seoul,

More information

Programming Languages Third Edition

Programming Languages Third Edition Programming Languages Third Edition Chapter 12 Formal Semantics Objectives Become familiar with a sample small language for the purpose of semantic specification Understand operational semantics Understand

More information

CS521 \ Notes for the Final Exam

CS521 \ Notes for the Final Exam CS521 \ Notes for final exam 1 Ariel Stolerman Asymptotic Notations: CS521 \ Notes for the Final Exam Notation Definition Limit Big-O ( ) Small-o ( ) Big- ( ) Small- ( ) Big- ( ) Notes: ( ) ( ) ( ) ( )

More information

DSD: A Schema Language for XML

DSD: A Schema Language for XML DSD: A Schema Language for XML Nils Klarlund, AT&T Labs Research Anders Møller, BRICS, Aarhus University Michael I. Schwartzbach, BRICS, Aarhus University Connections between XML and Formal Methods XML:

More information

An Extended Byte Carry Labeling Scheme for Dynamic XML Data

An Extended Byte Carry Labeling Scheme for Dynamic XML Data Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 5488 5492 An Extended Byte Carry Labeling Scheme for Dynamic XML Data YU Sheng a,b WU Minghui a,b, * LIU Lin a,b a School of Computer

More information

Monitoring Stable Properties in Dynamic Peer-to-Peer Distributed Systems

Monitoring Stable Properties in Dynamic Peer-to-Peer Distributed Systems Monitoring Stable Properties in Dynamic Peer-to-Peer Distributed Systems Sathya Peri and Neeraj Mittal Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083, USA sathya.p@student.utdallas.edu

More information

CS525 Winter 2012 \ Class Assignment #2 Preparation

CS525 Winter 2012 \ Class Assignment #2 Preparation 1 CS525 Winter 2012 \ Class Assignment #2 Preparation Ariel Stolerman 2.26) Let be a CFG in Chomsky Normal Form. Following is a proof that for any ( ) of length exactly steps are required for any derivation

More information

Introduction to Database Systems CSE 414

Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 14-15: XML CSE 414 - Spring 2013 1 Announcements Homework 4 solution will be posted tomorrow Midterm: Monday in class Open books, no notes beyond one hand-written

More information

A Note on Scheduling Parallel Unit Jobs on Hypercubes

A Note on Scheduling Parallel Unit Jobs on Hypercubes A Note on Scheduling Parallel Unit Jobs on Hypercubes Ondřej Zajíček Abstract We study the problem of scheduling independent unit-time parallel jobs on hypercubes. A parallel job has to be scheduled between

More information

XML Query Processing and Optimization

XML Query Processing and Optimization XML Query Processing and Optimization Bartley D. Richardson Department of Electrical & Computer Engineering and Computer Science University of Cincinnati December 16, 2005 Outline Background XML As A Data

More information

Composability Test of BOM based models using Petri Nets

Composability Test of BOM based models using Petri Nets I. Mahmood, R. Ayani, V. Vlassov and F. Moradi 7 Composability Test of BOM based models using Petri Nets Imran Mahmood 1, Rassul Ayani 1, Vladimir Vlassov 1, and Farshad Moradi 2 1 Royal Institute of Technology

More information

Choosing a Data Model and Query Language for Provenance

Choosing a Data Model and Query Language for Provenance Choosing a Data Model and Query Language for Provenance The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Accessed Citable

More information

Computer Science 236 Fall Nov. 11, 2010

Computer Science 236 Fall Nov. 11, 2010 Computer Science 26 Fall Nov 11, 2010 St George Campus University of Toronto Assignment Due Date: 2nd December, 2010 1 (10 marks) Assume that you are given a file of arbitrary length that contains student

More information

CONVENTIONAL EXECUTABLE SEMANTICS. Grigore Rosu CS522 Programming Language Semantics

CONVENTIONAL EXECUTABLE SEMANTICS. Grigore Rosu CS522 Programming Language Semantics CONVENTIONAL EXECUTABLE SEMANTICS Grigore Rosu CS522 Programming Language Semantics Conventional Semantic Approaches A language designer should understand the existing design approaches, techniques and

More information

Functional Dependency: Design and Implementation of a Minimal Cover Algorithm

Functional Dependency: Design and Implementation of a Minimal Cover Algorithm IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 19, Issue 5, Ver. I (Sep.- Oct. 2017), PP 77-81 www.iosrjournals.org Functional Dependency: Design and Implementation

More information

Generalized Document Data Model for Integrating Autonomous Applications

Generalized Document Data Model for Integrating Autonomous Applications 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Generalized Document Data Model for Integrating Autonomous Applications Zsolt Hernáth, Zoltán Vincellér Abstract

More information

Storing and Querying XML Documents Without Using Schema Information

Storing and Querying XML Documents Without Using Schema Information Storing and Querying XML Documents Without Using Schema Information Kanda Runapongsa Department of Computer Engineering Khon Kaen University, Thailand krunapon@kku.ac.th Jignesh M. Patel Department of

More information

Querying Tree-Structured Data Using Dimension Graphs

Querying Tree-Structured Data Using Dimension Graphs Querying Tree-Structured Data Using Dimension Graphs Dimitri Theodoratos 1 and Theodore Dalamagas 2 1 Dept. of Computer Science New Jersey Institute of Technology Newark, NJ 07102 dth@cs.njit.edu 2 School

More information

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011 MA53: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 8 Date: September 2, 20 xercise: Define a context-free grammar that represents (a simplification of) expressions

More information

The PCAT Programming Language Reference Manual

The PCAT Programming Language Reference Manual The PCAT Programming Language Reference Manual Andrew Tolmach and Jingke Li Dept. of Computer Science Portland State University September 27, 1995 (revised October 15, 2002) 1 Introduction The PCAT language

More information