Symmetrically Exploiting XML

Shuohao Zhang and Curtis Dyreson
School of E.E. and Computer Science, Washington State University, Pullman, Washington, USA
The 15th International World Wide Web Conference, May 2006, Edinburgh, Scotland

1970s Database Controversy
- Hierarchical model vs. relational model
- Codd: symmetric exploitation of data
- (Diagram labels: Part, Project, Commit; the same data arranged as Project above Part and as Part above Project)
- A part/project hierarchy works for some queries, but not all
- Path expressions are asymmetric
- Currently, all XML query languages use path expressions

Querying Data with Path Expressions
- (Diagram: an author element with name "E. F. Codd" and two entries: "DB", publisher Addison Wesley, price 46.95; and "Automata", publisher Academic Press, price 9.99)
- Task: Find s by E. F. Codd
- XQuery: return doc("author.xml")//author[name= 'E. F. Codd']/

Same Data, Different Structure
- (Diagram: the author-rooted tree from the previous slide beside a tree in which "DB" (price 46.95, publisher Addison Wesley) and "Automata" (price 9.99, publisher Academic Press) each carry their own author subtree with name "E. F. Codd")
- Same task: Find s by E. F. Codd
- Needs a different XQuery: return doc(".xml")//[author/name='E. F. Codd']

Goal
- Make the same query work on different structures
- Useful when there is a lack of schema knowledge, heterogeneous data, irregular data, or schema evolution
- Factor out the problem of different label sets; others are working on it

Existing Axes are Directional
- (Diagram: the ancestor, preceding, following, and descendant axes, each pointing in a fixed direction from self)

Proposal: A Non-directional Axis
- (Diagram: the same ancestor, preceding, following, and descendant regions around self, searched without a fixed direction)

The Closest Axis
- Syntax: closest::
  - ->name is an abbreviation for closest::name
- Semantics: a function that takes a context node and returns a sequence of closest nodes
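
A minimal sketch of the abbreviation as a purely textual rewrite. It assumes, based on the later query over doc("any.xml"), that a `->` right after `[` starts a predicate path and that every other `->` is a location step, that is, `/closest::`. The function name `expand_closest` and the sample query's final `price` step are hypothetical. The rewrite is also naive: it would rewrite a `->` that appears inside a string literal.

```python
import re


def expand_closest(query: str) -> str:
    """Hypothetical desugaring of the '->' abbreviation into closest:: steps.

    Assumes '->' directly after '[' begins a relative predicate path and
    every other '->' is a location step ('/closest::'). Naive: it does not
    skip string literals.
    """
    # '[->name' at the start of a predicate becomes '[closest::name'.
    query = re.sub(r"(?<=\[)->", "closest::", query)
    # Any remaining '->name' is a step: '/closest::name'.
    return query.replace("->", "/closest::")


print(expand_closest("doc('any.xml')->author[->name='E. F. Codd']->price"))
# doc('any.xml')/closest::author[closest::name='E. F. Codd']/closest::price
```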

Closest Axis of the First Title
- (Diagram: author with name and two entries, each with a publisher and a price)
- closest::* returns a list of five nodes
- closest::price returns the first price node

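When the First Book Lacks a Price
- (Diagram: author with name and two entries; the first has only a publisher, the second has a publisher and a price)
- Node selection is restricted by minimal type distance
- The minimal distance between a and a price is 2
- The only remaining price node is farther away than that, so closest::price returns an empty list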

Type Distance is Crucial
- closest::name for each?
- (Diagram: author with name; each entry has a publisher that also has a name, plus a price)
- Type is the root-to-node path, e.g. author/name versus author//publisher/name
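
The rules on these slides can be combined into a naive in-memory sketch. It rests on assumptions the slides do not spell out. A node's type is its root-to-node tag path. The type distance between two types is the smallest tree distance between any two distinct nodes of those types. closest::name returns, for each type ending in name, the nodes at exactly that distance from the context node, in document order. Nodes of the context node's own type are skipped, which reproduces the five-node count for closest::* on the sample below. The function name `closest` and the element names `book` and `title` (taken from the slide titles) are illustrative. Computing every type distance by a breadth-first search from every node is quadratic, in the spirit of the naïve approach.

```python
import xml.etree.ElementTree as ET
from collections import deque


def closest(root, ctx, name="*"):
    """Naive closest-axis sketch: path types plus minimal type distance."""
    parent = {child: p for p in root.iter() for child in p}
    nodes = list(root.iter())  # document order

    def path_type(e):
        # Root-to-node tag path, e.g. "author/book/price".
        tags = []
        while e is not None:
            tags.append(e.tag)
            e = parent.get(e)
        return "/".join(reversed(tags))

    def distances(src):
        # Breadth-first search that may move to the parent or to any child.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            neighbours = list(u) + ([parent[u]] if u in parent else [])
            for v in neighbours:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    types = {e: path_type(e) for e in nodes}

    # Minimal type distance for every ordered pair of types (quadratic).
    type_dist = {}
    for u in nodes:
        for v, d in distances(u).items():
            if v is not u:
                key = (types[u], types[v])
                type_dist[key] = min(d, type_dist.get(key, d))

    own = types[ctx]
    dist = distances(ctx)
    return [
        v for v in nodes
        if types[v] != own
        and (name == "*" or v.tag == name)
        and dist[v] == type_dist[(own, types[v])]
    ]


doc = ET.fromstring(
    "<author><name>E. F. Codd</name>"
    "<book><title>DB</title><publisher>Addison Wesley</publisher>"
    "<price>46.95</price></book>"
    "<book><title>Automata</title><publisher>Academic Press</publisher>"
    "<price>9.99</price></book></author>"
)
first_title = doc.find("book/title")
print([e.tag for e in closest(doc, first_title)])
# ['author', 'name', 'book', 'publisher', 'price']
print([e.text for e in closest(doc, first_title, "price")])
# ['46.95']
```

If the first book's price element is removed, closest(doc, first_title, "price") returns an empty list. The remaining price is four steps away, while the minimal title-to-price type distance is still 2.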

Querying with the Closest Axes
- Same query: return doc("any.xml")->author[->name='E. F. Codd']->
- (Diagram: a closest axis-enabled XQuery evaluation engine runs this one query against each structure, producing Result#1, Result#2, and Result#3)

Querying with Directional Axes
- Query#1, Query#2, and Query#3 each go through the XQuery evaluation engine to produce Result#1, Result#2, and Result#3, one query per structure
- Query#1: return doc("author.xml")//author[name= 'E. F. Codd']/
- Query#3: return doc(".xml")//[author/name='E. F. Codd']

In-memory Implementation
- Naïve approach: compute Closest for every node
  - Time complexity is O(sn^2), where s is the number of labels in the signature and n is the number of nodes
- Converting to a path expression (diagram: author, name, publisher, price)
  - Find the closest price for
  - Non-directional expression: closest::price
  - Directional (path) expression: parent::*/child::price
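
The conversion can be sketched for one pair of root-to-node types, under the assumption that closest nodes are those at the minimal type distance d. With type lengths counted in tags, every closest pair meets at a shared ancestor with k = (len(ctx) + len(target) - d) / 2 tags on its path. No pair can meet any deeper, because that pair would be closer than d. So the directional path climbs len(ctx) - k parents and then descends along the target type's remaining tags. The helper `closest_as_path` is hypothetical, and the type strings use `book` and `title`, names assumed from the slide titles.

```python
from typing import Optional


def closest_as_path(ctx_type: str, target_type: str,
                    type_distance: int) -> Optional[str]:
    """Hypothetical rewrite of one closest step into parent/child steps.

    ctx_type and target_type are root-to-node tag paths such as
    "author/book/title"; type_distance is their minimal type distance.
    """
    src = ctx_type.split("/")
    dst = target_type.split("/")
    # Length of the longest common prefix of the two types.
    common = 0
    while common < min(len(src), len(dst)) and src[common] == dst[common]:
        common += 1
    # Closest pairs meet at a shared ancestor with k tags on its path.
    twice_k = len(src) + len(dst) - type_distance
    if twice_k % 2 or not 1 <= twice_k // 2 <= common:
        return None  # distance inconsistent with these two types
    k = twice_k // 2
    steps = ["parent::*"] * (len(src) - k) + ["child::" + t for t in dst[k:]]
    return "/".join(steps)


print(closest_as_path("author/book/title", "author/book/price", 2))
# parent::*/child::price
print(closest_as_path("author/book/title", "author/name", 3))
# parent::*/parent::*/child::name
```

The first call reproduces the slide's directional expression. If no book held both a title and a price, the distance would grow to 4 and the same helper would climb to the author instead.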

Experiment
- Compare directional vs. non-directional:
  - for $b in doc("bib.xml")///closest::publisher return $b
  - for $b in doc("bib.xml")///..//publisher return $b
- Implemented closest in eXist (an XML DBMS)
- (Chart: time in milliseconds vs. number of nodes, 25,000 to 150,000, for the descendant and closest queries)

Persistent Implementation
- Take advantage of type indexes
- LCA-join: every Closest pair is related via an LCA
- Idea is to merge the lists of types
  - (Diagram: current lca, current parent, and current child cursors, with the direction of merge)
- O(sn)
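
One way to realize an LCA-join as a single merge, sketched under assumptions the slides do not state. Each node is assumed to carry a Dewey-style position label, a tuple of child positions from the root. Each type index is assumed to list its nodes in document order. If closest pairs of two types meet at a shared ancestor at a known depth, both lists can be grouped by their label prefix of that length and merged in one forward pass. This uses Dewey prefixes in place of the slide's current lca / current parent / current child cursors. The function name `lca_merge_join` is hypothetical.

```python
def lca_merge_join(ctx_labels, target_labels, depth):
    """Pair context nodes with target nodes sharing their ancestor at `depth`.

    Labels are hypothetical Dewey tuples; both lists are sorted in document
    order, so their length-`depth` prefixes are sorted too. Context nodes
    with no partner are left out.
    """
    pairs = []
    i = j = 0
    while i < len(ctx_labels) and j < len(target_labels):
        ci = ctx_labels[i][:depth]
        tj = target_labels[j][:depth]
        if ci < tj:
            i += 1  # no target under this context node's ancestor
        elif tj < ci:
            j += 1  # no context node under this target's ancestor
        else:
            # Find where each list's group for ancestor `ci` ends.
            i_end = i
            while i_end < len(ctx_labels) and ctx_labels[i_end][:depth] == ci:
                i_end += 1
            j_end = j
            while j_end < len(target_labels) and target_labels[j_end][:depth] == ci:
                j_end += 1
            group = target_labels[j:j_end]
            pairs.extend((c, group) for c in ctx_labels[i:i_end])
            i, j = i_end, j_end
    return pairs


# author=(1,), books=(1,2) and (1,3); titles at position 1, prices at 3.
titles = [(1, 2, 1), (1, 3, 1)]
prices = [(1, 2, 3), (1, 3, 3)]
print(lca_merge_join(titles, prices, 2))
# [((1, 2, 1), [(1, 2, 3)]), ((1, 3, 1), [(1, 3, 3)])]
```

Each list is scanned once, which fits the linear-per-type flavor of the slide's O(sn) bound. The size of the output can still exceed the input when many targets share one ancestor.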

Related Work
- Data integration
  - TSIMMIS: Garcia-Molina et al. (Journal of Intelligent Information Systems, 1997)
  - YAT: Christophides, Cluet, Siméon (SIGMOD Record, June 2000)
  - SilkRoute: Fernandez, Tan, Suciu (WWW 2000)
- LCA-related techniques
  - Schmidt, Kersten, Windhouwer (ICDE 2001)
  - Cohen, Mamou, Kanza, Sagiv (VLDB 2003)
  - Li, Yu, Jagadish (VLDB 2004)

Related Research Projects
- XML Restructuring: Zhang, Dyreson (IIWeb 2006)
- XML Compaction: Zhang, Dyreson, Dang (DASFAA 2006)
- Common theme: symmetric exploitation!

Conclusion
- Current XQuery depends on path expressions
  - A path expression is directional (asymmetric)
  - It may break down if the structure changes
- The closest axis is non-directional (symmetric)
  - Simple in syntax
  - Can be easily integrated in XQuery
  - Can be implemented efficiently, both in memory and persistently

Thank You!