TwigINLAB: A Decomposition-Matching-Merging Approach To Improving XML Query Processing

Similar documents
TwigX-Guide: An Efficient Twig Pattern Matching System Extending DataGuide Indexing and Region Encoding Labeling

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data

Evaluating XPath Queries

Aggregate Query Processing of Streaming XML Data

TwigList: Make Twig Pattern Matching Fast

An Implementation of Tree Pattern Matching Algorithms for Enhancement of Query Processing Operations in Large XML Trees

Compression of the Stream Array Data Structure

An Efficient XML Index Structure with Bottom-Up Query Processing

Prefix Path Streaming: a New Clustering Method for XML Twig Pattern Matching

On Label Stream Partition for Efficient Holistic Twig Join

Accelerating XML Structural Matching Using Suffix Bitmaps

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today

Full-Text and Structural XML Indexing on B + -Tree

A Survey Of Algorithms Related To Xml Based Pattern Matching

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( ) 1

International Journal of Advanced Research in Computer Science and Software Engineering

Efficient Processing of Complex Twig Pattern Matching

ISSN: [Lakshmikandan* et al., 6(3): March, 2017] Impact Factor: 4.116

Answering XML Twig Queries with Automata

A New Way of Generating Reusable Index Labels for Dynamic XML

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS

Structural Joins, Twig Joins and Path Stack

An Extended Byte Carry Labeling Scheme for Dynamic XML Data

Outline. Depth-first Binary Tree Traversal. GerĂȘnciade Dados daweb -DCC922 - XML Query Processing. Motivation 24/03/2014

Navigation- vs. Index-Based XML Multi-Query Processing

Announcements (March 31) XML Query Processing. Overview. Navigational processing in Lore. Navigational plans in Lore

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce

Labeling and Querying Dynamic XML Trees

Estimating the Selectivity of XML Path Expression with predicates by Histograms

QuickStack: A Fast Algorithm for XML Query Matching

Fast Matching of Twig Patterns

XML Query Processing and Optimization

Efficient Query Optimization Of XML Tree Pattern Matching By Using Holistic Approach

A New Method of Generating Index Label for Dynamic XML Data

Querying Spatiotemporal Data Based on XML Twig Pattern

A Prime Number Approach to Matching an XML Twig Pattern including Parent-Child Edges

<=chapter>... XML. book. allauthors (1,5:60,2) title (1,2:4,2) XML author author author. <=author> jane. Origins (1,1:150,1) (1,61:63,2) (1,64:93,2)

Compact Access Control Labeling for Efficient Secure XML Query Evaluation

Efficient Evaluation of Generalized Path Pattern Queries on XML Data

MQEB: Metadata-based Query Evaluation of Bi-labeled XML data

TwigStack + : Holistic Twig Join Pruning Using Extended Solution Extension

Rank-aware XML Data Model and Algebra: Towards Unifying Exact Match and Similar Match in XML

QuickXDB: A Prototype of a Native XML QuickXDB: Prototype of Native XML DBMS DBMS

A Dynamic Labeling Scheme using Vectors

A FRACTIONAL NUMBER BASED LABELING SCHEME FOR DYNAMIC XML UPDATING

XQuery Optimization in Relational Database Systems

This is a repository copy of A Prime Number Approach to Matching an XML Twig Pattern including Parent-Child Edges.

SFilter: A Simple and Scalable Filter for XML Streams

XML Data Stream Processing: Extensions to YFilter

CSE 530A. B+ Trees. Washington University Fall 2013

A Sampling Approach for XML Query Selectivity Estimation

CHAPTER 3 LITERATURE REVIEW

Design of Index Schema based on Bit-Streams for XML Documents

XML Filtering Technologies

Security-Conscious XML Indexing

Tree-Pattern Queries on a Lightweight XML Processor

A Two-Step Approach for Tree-structured XPath Query Reduction

(2,4) Trees. 2/22/2006 (2,4) Trees 1

Parallelizing String Similarity Join Algorithms

Schema-Based XML-to-SQL Query Translation Using Interval Encoding

Integrating Path Index with Value Index for XML data

Experimental Evaluation of Query Processing Techniques over Multiversion XML Documents

Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases

Efficient Path Query Processing on Encoded XML

A Cloud Computing Implementation of XML Indexing Method Using Hadoop

TRIE BASED METHODS FOR STRING SIMILARTIY JOINS

CBSL A Compressed Binary String Labeling Scheme for Dynamic Update of XML Documents

EE 368. Weeks 5 (Notes)

Monotone Constraints in Frequent Tree Mining

The Research on Coding Scheme of Binary-Tree for XML

IJSER 1 INTRODUCTION. Sathiya G. In this paper, proposed a novel, energy and latency. efficient wireless XML streaming scheme supporting twig

Child Prime Label Approaches to Evaluate XML Structured Queries

Data Centric Integrated Framework on Hotel Industry. Bridging XML to Relational Database

ISSN: (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies

Optimize Twig Query Pattern Based on XML Schema

Answering Aggregate Queries Over Large RDF Graphs

Native Temporal Slicing Support for XML Databases

Efficient Indexing and Querying in XML Databases

A Novel Replication Strategy for Efficient XML Data Broadcast in Wireless Mobile Networks

Graph Matching Based Authorization Model for Efficient Secure XML Querying

Outline. Approximation: Theory and Algorithms. Ordered Labeled Trees in a Relational Database (II/II) Nikolaus Augsten. Unit 5 March 30, 2009

ADT 2009 Other Approaches to XQuery Processing

An Effective and Efficient Approach for Keyword-Based XML Retrieval. Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova

Top-k Keyword Search Over Graphs Based On Backward Search

AFilter: Adaptable XML Filtering with Prefix-Caching and Suffix-Clustering

High-Performance Holistic XML Twig Filtering Using GPUs. Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras

Symmetrically Exploiting XML

(2,4) Trees Goodrich, Tamassia (2,4) Trees 1

Efficient Querying of Root to Leaf Xpath Queries for XML Documents Atul D. Raut, Dr. M. Atique

Twig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents

Evaluating find a path reachability queries

Improving Data Access Performance by Reverse Indexing

Storing and Querying Multiversion XML Documents using Durable Node Numbers

BLAS : An Efficient XPath Processing System

Compact Access Control Labeling for Efficient Secure XML Query Evaluation

DDE: From Dewey to a Fully Dynamic XML Labeling Scheme

Binary Search Tree (3A) Young Won Lim 6/2/18

Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving Multi-documents

Storing and Querying XML Documents Without Using Schema Information

The B-Tree. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Transcription:

American Journal of Applied Sciences 5 (9): 99-25, 28 ISSN 546-9239 28 Science Publications TwigINLAB: A Decomposition-Matching-Merging Approach To Improving XML Query Processing Su-Cheng Haw and Chien-Sing Lee Multimedia University Faculty of Information Technology 63 Cyberjaya, Malaysia Abstract: The emergence of the Web has increased significant interests in querying XML data. Current methods for XML query processing still suffers from producing large intermediate results and are not efficient in supporting query with mixed types of relationships. We propose the TwigINLAB algorithm to process and optimize the query evaluation. Our TwigINLAB adopts the decompositionmatching-merging approach and focuses on optimizing all three sub-processes; introducing a novel compact labeling scheme, optimizing the matching phase and reducing the number of inspection required in the merging phase. Experimental results indicate that TwigINLAB can process both path queries and twig queries better than the TwigStack algorithm on an average of 2.7% and 8.7% respectively in terms of execution time using the SwissProt dataset. Keywords: XML, query optimization, structural query, twig query, decomposition-matching-merging INTRODUCTION extensible Mark-up Language (XML) is emerging as the de facto standard for data exchange over the Web. Since XML is a semi-structured data, two types of user queries namely full-text queries (keyword based search) and structural queries (complex queries specified in tree-like structure) are usually used []. This paper is concerned with structural queries. Structural queries can be viewed as sequences of location steps, where each node in the sequence is an element tag or string value. Query nodes are related by either parent-child (P-C) steps or ancestor-descendant (A-D) steps. These relationships are depicted with a single line and double lines respectively. Besides, query nodes can be related adjacently with one another by sibling or ordered query relationship. Sibling (ordered query) relationship is usually denoted by []. To process such queries, it may undergo a decomposition-matching-merging process. TWIG- XSKETCH [2], tree signature [3], MPMGJN [4], Stack- Tree [5] and PathStack and TwigStack [6] are examples of query processing using the decomposition-matchingmerging approaches. Nevertheless, most of these approaches focus on the second sub-process: the matching phase only. In this paper, we propose. a novel hybrid query optimization architecture, INLAB (combination of INdexing and LABeling techniques), which comprises an XML Parser, XML Encoder, XML Indexer and Query Engine and 2. query optimization algorithms (TwigINLAB) to process twig queries efficiently without traversing the whole XML tree. INLAB labeling scheme size is only 2 bytes; much shorter compared to previous labeling schemes. This enables quick determination of P-C relationship between elements in the XML database. However, to check for A-D relationship, the index table need to be accessed for confirmation. Besides, INLAB labeling is integer based. Integer processing is very efficient compared to that of string or bit-vector. The index structures of INLAB allow us to efficiently find all elements that belong to the same parent or ancestor. Our TwigINLAB approach decomposes relationships into a set of path queries. In addition, we focus on optimizing all three decomposition-matchingmerging sub-processes. First, we introduce a novel robust and compact labeling scheme consisting of <self level: parent> to allow quick determination and Corresponding Author: Su-Cheng Haw, Faculty of Information Technology, Multimedia University, 63 Cyberjaya, Malaysia, Tel: +63-8325233, Fax: +63-8325264 99

decomposition of the types of relationships among each path edge. Subsequently, we optimize the matching phase based on each relationship and finally reduce the number of inspection required in the merging phase. Twig Query Processing: With the increasing popularity of XML data representation, XML query processing and optimization has attracted a lot of research interest [7, 8, 9,,, 2]. In this section, we summarize the related work. There are typically two types of decomposition-matching-merging process. First, a complex query pattern can be decomposed into a set of basic binary relationships between each pair of nodes or second, it can be decomposed into a set of path queries, followed by subsequent matching and merging processes. Our INLAB adopts the latter approach and focuses on optimizing all three sub-processes; introducing novel compact labeling scheme, optimizing the matching phase and reducing the number of inspection required in the merging phase. In the first sub-process, most researchers use the labeling of (docno, begin : end, level) for an element and (docno, wordno, level) for a text word as the positional representation of XML elements and texts. However, we use <self level : parent> as the positional representation instead. The details on this will be explained in the next section. MPMGJN [4], Stack-Tree [5] and TwigStack [6] algorithms are based on (docno, begin : end, level) labeling of XML elements. These algorithms accept two lists of sorted individual matching nodes and structurally join pairs of nodes from both lists to produce the matching of the binary relationships. Another similar approach is to decompose the twig query into a set of path queries instead. Polyzotis et al. propose methods to reduce the number of intermediate results by introducing a filtration step based on some notion of synopses to facilitate query-approximate answers [2]. They propose both TREESKETCH and TWIG-XSKETCH. Another work, done by Amer-Yahia et al is to preprocess the query patterns before the matching phase is executed [3]. Since the efficiency of tree pattern matching depends on the size of the pattern, it is essential to identify and eliminate redundant nodes in the pattern before the matching phase takes place. On the other hand, Zezula et al. propose a novel technique, tree signature, to represent tree structures as ordered sequences of pre-order and post-order ranks of the nodes [3]. They use tree signatures as index structure and find qualifying patterns through integration of structurally consistent path query. Merging together the structural matches in the final process poses the problem of selecting a good join ordering. Wu et al. propose a cost-based join order selection of structural join [7]. Kim et al. suggest partitioning all nodes in an extent into several clusters [4]. Given two extents to be joined, they propose filtering out unnecessary clusters in both extents prior to the joining process. Our TwigINLAB algorithm is a generalization of the stack-based algorithm first mentioned by Bruno et al. [6] to match twig query. However, we enhance the query processing by utilizing indexes (built only once) to speed up the matching and merging phases. Further elaboration can be found in next section. MATERIALS AND METHODS Fig. shows the INLAB architecture, which consists of the XML Parser to check the wellformedness of the XML document, the XML Encoder to generate the labeling based on a <self-level:parent> scheme, the XML Indexer to create index storing each node parent and child information and the XML Query Engine for pattern query matching. This paper concentrates only on the XML Query Engine (the optimizer). Other components such as XML Parser, XML Encoder and XML Indexer have been reported in [5, 6]. The criterion for assessing TwigINLAB is execution time. Fig. 2 depicts an example of XML document labeled based on <self-level: parent>. Structural relationships between element nodes can be efficiently determined from the label as follows:. P-C relationship node is the parent of node 2 if and only if node.self = node 2.parent. 2. Sibling relationship node is the sibling of node 2 if and only if node.parent = node 2.parent. 3. Ordered query relationship (predecessor and successor) a. node is the predecessor node of node 2 if and only if node.self < node 2.self. b. node is the successor node of node 2 if and only if node.self > node 2.self. 4. A-D relationship node is possible as an ancestor of node 2 if and only if leveldiff = node 2.level - node.level >=. A multiple look-up via PCTable is necessary as long as the leveldiff > is true to confirm the A-D relationship. 2

< r o o t > < a > < b > < / b > < b > < / b > < / a > < / r o o t > X M L d o c u m e n t X M L P a r s e r X M L E n c o d e r Intermediate results Final results root - : - <-:> <2-2:> a-b a b -: 2-2 : 3-2:2 a b merge a-b-c <-: 2-2: 3-2:2> c 3-2 : 2 4-2: 5-2: b-c r e s u l t <2-2:> <3-2:2> b c Q u e r y E n g i n e X M L I n d e x e r U s e r - s p e c i f i e d q u e r y TwigINLAB algorithm Fig. : Query processing component architecture publications (-:-) book (-:) journal (2-:) title author author author publicationinfo price title author price (2-2:) (3-2:) (4-2:) (5-2:) (6-2:) (-2:) (3-2:2) (4-2:2) (5-2:2) self parent - 2 3 2 4 2 5 2 Fig. 3: Fragment of PCTable index table. place publisher (7-3:6) (8-3:6) (9-3:6) dateissued daterevised (-3:6) Fig. 2: A sample XML document with <self-level: parent> label. For example, let publications (-:-) be node and author (4-2:2) be node 2. The leveldiff between the two nodes is two. This means that we need to trace up the PCTable twice starting from the self attribute of author to check whether publications is ancestor of author as illustrated in Fig. 3. The parent attribute of the retrieved node is equal to the self attribute of publications. Thus, publications and author is of A-D relationship. Fig. 4 illustrates the overall processes involved in TwigINLAB processing. Initially, the query pattern is analyzed using the analysisquerypattern() function. For each query edge, if the twig is of P-C relationship, the parent and child details will be updated in the twigpc (a hashtable to store parent and child) repository. During this process, each node in the twig query is associated with a stream. Each stream contains the positional representations of the node appearance in the XML tree (as shown in Fig. 5). The nodes in the stream are sorted by their self attribute, and thus, this will determine the order of the node to be processed. Associated with each stream is a stack. Stack is used to store the possible intermediate results. 2

analysisquerypattern() partitiontwig() twigjoin() mergetwig() outputsolution() S a u th o r author book author book-author book publisher book publisher <5-2:> <4-2:> <-:> <3-2:> <8-3:6> Sbook Sauthor Spublisher book=publisher 3 8 4-3 -4-5 =8 8 5 8 Fig. 4: Overall flow of TwigINLAB a u th o r T a u th o r < 3-2 : 4-2 : 5-2 : > b o o k T b o o k < - : > p u b lish e r T p u b lish e r < 8-3 :6 > S p u b lish e r Fig. 5: Stack and stream assigned to each query node during the analysisquerypattern() function. Next, the partitiontwig() function takes place. If the query is path query (only one leaf node), this function is skipped and it will proceed to the twigjoin() function. However, if it is twig query, during this function, the twig pattern is decomposed into two or more path queries. Starting from the root of twig query pattern, for each start tag event, it pushes the tag into twigstack (a stack to keep track of twig query sequence). When it reaches an end tag event, it checks whether the current entry at the top of twigstack is a 22 leaf node. If it is a leaf, the query node will be added one by one to the vpathlist (a vector to store query nodes in leaf-to-root order) until it reaches the root. Finally, it will be output in reverse order by the function reverse(). The final output of this function is a set of path queries in root-to-leaf order in pq (a hashtable to keep each distinct path query). For each path query, it recursively calls the twigjoin() function to find the possible path matches. Each possible match is pushed into the stack in the twigjoin() function. For instance, using the twig query in Fig. 5 as an example, after the partitiontwig() function, there are two path queries: book-author and book=publisher. Initially, the path query book-author is to be processed first. Based on the self attribute in each first occurrence in T book, and T author, query node book is being processed first. Element <-:> is then pushed into S book. The next returned query node is the immediate child of book, which is author. Element <3-2:> is pushed into S author because parent attribute of book is equal to self attribute of author. Since author is the leaf query node, a partial solution is formed between book-author. Based on the next occurrences, the next returned node is element <4-2:> as it has the next smallest self attribute. This element is then pushed into S author because the parent attribute of book is equal to the self attribute of author. Since author is the leaf query node, another partial solution is formed between book-author. This process repeats until it reaches the leaf node of the all paths as illustrated in Fig. 6. Next, these matches are merged back through the mergetwig() function. In the mergetwig() function, all partial solutions from the twigjoin() function are merged together to generate the final solutions. This function begins by comparing each entry in the partial solutions of two path queries at a time. All the occurrences in the partial solutions are in sorted order of their self-attributes. If each entry first node is equal, or if the query edge is of P-C relationship and the second query node is of sibling and predecessor relationship, the partial solution will be added to the final solutions. For query edge with A-D relationship, if the second query node is a predecessor, it will be added as a final solution. In both cases, the inner loop begins the iteration from the current j position. Hence, this function skips the unnecessary iteration of non-feasible partial solutions. However, if the first node in the second path query is greater than node, the next inner loop will begin from position j- (for cases where j > ). Fig. 7 illustrates the merging process. Finally, the final solutions are output through the outputsolution() function.

P q h a s h t a b le S tr e a m s p q T b o o k < - : > b o o k - a u t h o r T a u t h o r < 3-2 : > < 4-2 : > < 5-2 : > b o o k = p u b lis h e r T p u b lis h e r < 8-3 : 6 > In T w ig J o i n ( ) l o o p ( i= ) < - : > S a u t h o r ( i= ) < - : > < 3-2 : > S a u t h o r - 3 ( i= 2 ) < - : > < 4-2 : > S a u t h o r - 3-4 ( i= 3 ) < - : > < 5-2 : > - 3-4 - 5 S a u th o r In T w ig J o i n ( ) l o o p 2 ( i= ) ( i= ) < - : > < - : > S p u b lis h e r < 8-3 : 6 > S p u b lis h e r = 8 Fig. 6: Matching process in twigjoin() function. Fig. 7: The merging process scenario. RESULTS AND DISCUSSION We have implemented TwigINLAB using Java API for XML Processing (JAXP). Experiments have been carried out on the SwissProt dataset (2MB) obtained 23 from the University of Washington XML repository [7]. We modified the SwissProt dataset into various file sizes ranging from MB until MB for the purpose of measuring the scalability of both approaches in supporting large-scale dataset. We evaluated the performance of TwigINLAB as compared to TwigStack on two main types of queries namely, path query and twig query. For each type of query, we measure the performance of both algorithms on (a) Q:-Query with P-C relationship (b) Q2:-Query with A-D relationship and (c) Q3:-Mixed query. All our experiments are performed on.7ghz Pentium IV processor with 52 MB SDRAM running on Windows XP systems. All numbers presented here are produced by running the experiments multiple times and averaging the execution times of several consecutive runs. Figures 8, 9 and show the execution time of TwigINLAB and TwigStack for both path and twig

query. Fig. 8 shows the execution time of: QPQ= Entry/Organelle for path query and QTQ= Entry[/Organelle]/Prints for twig query over Standard dataset by varying the file sizes. From the result, TwigINLAB outperforms TwigStack in all the test cases by about 26.8% for path query and 24.5% for twig query. Fig. shows the execution time of: Q3PQ = Features//MUTAGEN//Descr for path query and Q3TQ = Features[//MUTAGEN//Descr]/Site for twig query respectively. TwigINLAB performs about 7.% better than TwigStack for path query and 3.7% for twig query. 5 3 PQ-TwigINLAB PQ-TwigStack PQ-TwigINLAB PQ-TwigStack 4 TQ-TwigINLAB TQ-TwigStack 25 TQ-TwigINLAB TQ-TwigStack 2 Execution Time (s) 3 2 Execution Time (s) 5 5 2 3 4 5 6 7 8 9 XML Doc. (MB) Fig. 8:Test results for Q. Fig. 9 shows the execution time of: Q2PQ = Entry//MedlineID for path query and Q2TQ= Entry[//MedlineID]//Comment for twig query respectively. TwigINLAB performs by about 2.2% better than TwigStack for path query and 7.8% for twig query. Execution Time (s) 4 2 8 6 4 2 PQ-TwigINLAB TQ-TwigINLAB 2 3 4 5 6 7 8 9 XML Doc. (MB) PQ-TwigStack TQ-TwigStack Fig. 9: Test results for Q2. 2 3 4 5 6 7 8 9 XML Doc. (MB) Fig. : Test results for Q3. From these figures, we draw several observations and conclusions:- When the twig query contains only P-C edges, TwigINLAB performs around 24.5% better as compared to TwigStack (shown in Fig. 9). This may be due to the INLAB labeling scheme, which is optimal to support P-C relationships. Although TwigINLAB still outperforms TwigStack for query with edges of A-D relationship by around 7.8%, the difference is less significant as compared to query with edges of P-C relationships. This may be due to the extra time needed to determine whether the two nodes is in A-D relationship by multiple lookups on the index table until the ancestor level is reached. For each test case, TwigINLAB increases less drastically as compared to TwigStack. This shows that TwigINLAB is more scalable in processing large-scale datasets efficiently. 24

CONCLUSION In this paper, we have presented the TwigINLAB algorithm to optimize all the sub-processes involved in the decomposition-matching-merging approaches. Experimental results show that, in terms of execution time, on average, TwigINLAB performs about 2.7% better for path query and about 8.7% better for twig query compared to the TwigStack. Also, TwigINLAB is more scalable compared to TwigStack. As such, TwigINLAB supports large-scale query of datasets efficiently. ACKNOWLEDGEMENTS This work was partially supported by funding from esciencefund, Ministry of Science, Technology and Innovation, Malaysia. REFERENCES. Haw, S.C. and Rao, G.S.V.R.K., 25. Query Optimization Techniques for XML Databases. International Journal of Information Technology, 2(): 97-4. 2. Polyzotis, N., Garofalakis, M., Ioannidis, Y., 24. Approximate XML Query Answers, Proceedings of ACM SIGMOD, pp: 263 274. 3. Zezula, P., Mandreoli, F., Martoglia, R., 24. Tree Signatures and Unordered XML Pattern Matching, Proceedings of SOFSEM, pp:22 39. 4. Zhang, C., Naughton, J., DeWitty, D., Luo, Q., Lohman, G., 2. On Supporting Containment Queries in Relational Database Management Systems, Proceedings of ACM SIGMOD, pp: 425-436. 5. Al-Khalifa, S., Jagadish, H.V., Koudas, N., Patel, J.M., Srivastava D., Wu, Y., 22. Structural Joins: A Primitive for Efficient XML Query Pattern Matching, Proceedings of ICDE, pp: 4-52. 6. Bruno, N., Srivastava, D., Koudas, N., 22. Holistic Twig Joins: Optimal XML Pattern Matching, Proceedings of ACM SIGMOD, pp: 3-32. 7. Kim, J., Lee, S.H., Kim, H-J., 24. Efficient structural joins with clusters extents. Information Processing Letters, 9: 69-75. 8. Yao, J.T. and Zhang, M., 24. A Fast Tree Pattern Matching Algorithm for XML Query, Proceedings of IEEE/WIC/ACM, pp: 235-24. 9. Chen, Q., Lim, A., Ong, K., Tang, J., 23. D(k)- index: An adaptive structural summary for graphstructured data, Proceedings of SIGMOD, pp: 34 44.. Li, Q. and Moon, B., 2. Indexing and Querying XML Data for Regular Path Expressions, Proceedings. of VLDB, pp: 36-37.. Bayardo, R.J., Gruhl, D., Josifovski, V., Myllymaki, J. 24. An Evaluation of Binary XML Encoding Optimizations for Fast Stream Based XML Processing, Proceedings of WWW, pp: 345-354. 2. Diao, Y., Fischer, P., Franklin, M., To, R. (22). YFilter: Efficient and Scalable Filtering of XML Documents, Proceedings of ICDE. Demo paper. 3. Amer-Yahia, S., Cho, S., Lakshmanan, L.V.S., Srivastava, D., 22. Tree pattern query minimization. VLDB Journal,(4): 35-33. 4. Wu, Y., Patel, J. M., Jagadish, H.V., 23. Structural join order selection for XML query optimization, Proceedings of ICDE, pp : 443-454. 5. Haw, S.C. and Rao, G.S.V.R.K. 27. A Comparative Study and Benchmarking on XML Parsers, Proceedings of IEEE-ICACT, pp: 32-325. 6. Haw, S.C. and Rao, G.S.V.R.K., 27. An efficient Path Query Processing support for Parent-Child Relationship in Native XML Databases. Journal of Digital Information Management, 2(2): 82-87. 7. University of Washington XML Repository. Available http://www.cs.washington.edu/research/xmldatase ts/ 25