XML-Relational Mapping. Introduction to Databases CompSci 316 Fall 2014

Similar documents
XML-XQuery and Relational Mapping. Introduction to Databases CompSci 316 Spring 2017

XML, DTD, and XPath. Announcements. From HTML to XML (extensible Markup Language) CPS 116 Introduction to Database Systems. Midterm has been graded

XPath and XQuery. Introduction to Databases CompSci 316 Fall 2018

XSLT. Announcements (October 24) XSLT. CPS 116 Introduction to Database Systems. Homework #3 due next Tuesday Project milestone #2 due November 9

JSON & MongoDB. Introduction to Databases CompSci 316 Fall 2018

XSLT program. XSLT elements. XSLT example. An XSLT program is an XML document containing

Lecture 20 (Additional/Optional Slides) NoSQL/MongoDB

XML and Databases. Outline. Outline - Lectures. Outline - Assignments. from Lecture 3 : XPath. Sebastian Maneth NICTA and UNSW

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW

EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML

Semi-structured Data: Programming. Introduction to Databases CompSci 316 Fall 2018

10/24/12. What We Have Learned So Far. XML Outline. Where We are Going Next. XML vs Relational. What is XML? Introduction to Data Management CSE 344

Introduction to Database Systems CSE 444

Introduction to Data Management CSE 344

Introduction to Database Systems CSE 414

XML in Databases. Albrecht Schmidt. al. Albrecht Schmidt, Aalborg University 1

Introduction to Database Systems CSE 414

SAX & DOM. Announcements (Thu. Oct. 31) SAX & DOM. CompSci 316 Introduction to Database Systems

Approaches. XML Storage. Storing arbitrary XML. Mapping XML to relational. Mapping the link structure. Mapping leaf values

Introduction to XML. Yanlei Diao UMass Amherst April 17, Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.

Introduction to Semistructured Data and XML

Lab Assignment 3 on XML

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

Data Formats and APIs

XML, DTD, and XML Schema. Introduction to Databases CompSci 316 Fall 2014

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington

Evaluating XPath Queries

Querying XML Data. Querying XML has two components. Selecting data. Construct output, or transform data

Module 4. Implementation of XQuery. Part 2: Data Storage

Additional Readings on XPath/XQuery Main source on XML, but hard to read:

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

Indexing XML Data Stored in a Relational Database

Querying XML. COSC 304 Introduction to Database Systems. XML Querying. Example DTD. Example XML Document. Path Descriptions in XPath

ADT 2009 Other Approaches to XQuery Processing

Chapter 13 XML: Extensible Markup Language

CSE 544 Data Models. Lecture #3. CSE544 - Spring,

Lecture2: Database Environment

XML Systems & Benchmarks

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today

CSE 544 Principles of Database Management Systems. Lecture 4: Data Models a Never-Ending Story

CLASS DISCUSSION AND NOTES

Symmetrically Exploiting XML

M359 Block5 - Lecture12 Eng/ Waleed Omar

PASSWORDS TREES AND HIERARCHIES. CS121: Relational Databases Fall 2017 Lecture 24

Hierarchical Data in RDBMS

A DTD-Syntax-Tree Based XML file Modularization Browsing Technique

XML Background. The reason that so many people are excited about XML is that so many people are excited about XML. ANON

XPath an XML query language

Index-Driven XQuery Processing in the exist XML Database

Designing Information-Preserving Mapping Schemes for XML

XML: Extensible Markup Language

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

XML Query Languages. Yanlei Diao UMass Amherst April 22, Slide content courtesy of Ramakrishnan & Gehrke, Donald Kossmann, and Gerome Miklau

Announcements (March 31) XML Query Processing. Overview. Navigational processing in Lore. Navigational plans in Lore

CSE 214 Computer Science II Introduction to Tree

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while

Distributed Relationship Schemes for Trees

Traditional Query Processing. Query Processing. Meta-Data for Optimization. Query Optimization Steps. Algebraic Transformation Predicate Pushdown

XML publishing. Querying and storing XML. From relations to XML Views. From relations to XML Views

Introduction. Web Pages. Example Graph

XML. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

XML and Relational Databases

Pre-Discussion. XQuery: An XML Query Language. Outline. 1. The story, in brief is. Other query languages. XML vs. Relational Data

Annoucements. Where are we now? Today. A motivating example. Recursion! Lecture 21 Recursive Query Evaluation and Datalog Instructor: Sudeepa Roy

Working with XML and DB2

EXtensible Markup Language XML

Advances in Programming Languages

Outline. Depth-first Binary Tree Traversal. Gerênciade Dados daweb -DCC922 - XML Query Processing. Motivation 24/03/2014

Database Systems (INFR10070) Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 14 Database Connectivity and Web Technologies

An Effective and Efficient Approach for Keyword-Based XML Retrieval. Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova

11. EXTENSIBLE MARKUP LANGUAGE (XML)

XML and Semi-structured data

SQL: Recursion. Announcements (October 3) A motivating example. CPS 116 Introduction to Database Systems. Homework #2 graded

Lecture 2 SQL. Announcements. Recap: Lecture 1. Today s topic. Semi-structured Data and XML. XML: an overview 8/30/17. Instructor: Sudeepa Roy

CHAPTER 3 LITERATURE REVIEW

Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases

CSE 344 Final Examination

Prolog and XML Programming as a Tool for Web Search

Lecture 11: Xpath/XQuery. Wednesday, April 17, 2007

Navigation- vs. Index-Based XML Multi-Query Processing

Informatics 1: Data & Analysis

Storage and Indexing

CSE 530A. B+ Trees. Washington University Fall 2013

Informatics 1: Data & Analysis

XML Technical Overview. Bill Arledge, Consulting Product Manager BMC Software Inc.

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Updating XML documents

Lecture 22 Acyclic Joins and Worst Case Join Results Instructor: Sudeepa Roy

Introduction to Data Management CSE 344

What is database? Types and Examples

XML and Web Services

The Benefits of Utilizing Closeness in XML. Curtis Dyreson Utah State University Shuohao Zhang Microsoft, Marvell Hao Jin Expedia.

Efficient schema-based XML-to-Relational data mapping

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 4 Data models A Never-Ending Story

Selectively Storing XML Data in Relations

IBM DB2 11 DBA for z/os Certification Review Guide Exam 312

CS313D: ADVANCED PROGRAMMING LANGUAGE

5. Storage and Manipulation of SSD

Transcription:

XML-Relational Mapping Introduction to Databases CompSci 316 Fall 2014

2 Approaches to XML processing Text files/messages Specialized XML DBMS Tamino(Software AG), BaseX, exist, Sedna, Not as mature as relational DBMS Relational (and object-relational) DBMS Middleware and/or extensions IBM DB2 s purexml, PostgreSQL s XML type/functions

3 Mapping XML to relational Store XML in a CLOB(Character Large OBject) column Simple, compact Full-text indexing can help (often provided by DBMS vendors as object-relational extensions ) Poor integration with relational query processing Updates are expensive Alternatives? Schema-oblivious mapping: well-formed XML generic relational schema Node/edge-based mapping for graphs Interval-based mapping for trees Path-based mapping for trees Schema-aware mapping: valid XML special relational schema based on DTD Focus of this lecture

4 Node/edge-based: schema Element(eid, tag) Attribute(eid, attrname, attrvalue) Attribute order does not matter ElementChild(eid, pos, child) pos specifies the ordering of children child references either Element(eid) or Text(tid) Text(tid, value) tidcannot be the same as any eid Need to invent lots of id s Key: Keys: Need indexes for efficiency, e.g., Element(tag), Text(value)

5 Node/edge-based: example <bibliography> <book ISBN="ISBN-10" price="80.00"> <title>foundations of Databases</title> <author>abiteboul</author> <author>hull</author> <author>vianu</author> <publisher>addison Wesley</publisher> <year>1995</year> </book> </bibliography> Attribute Text tid t0 t1 t2 t3 t4 eid attrname attrvalue e1 ISBN ISBN-10 e1 price 80 value Foundations of Databases Abiteboul Hull Vianu t5 1995 Addison Wesley Element eid tag e0 bibliography e1 book e2 title e3 author e4 author e5 author e6 publisher e7 year ElementChild eid tag child e0 1 e1 e1 1 e2 e1 2 e3 e1 3 e4 e1 4 e5 e1 5 e6 e1 6 e7 e2 1 t0 e3 1 t1 e4 1 t2 e5 1 t3 e6 1 t4 e7 1 t5

6 Node/edge-based: simple paths //title SELECT eid FROM Element WHERE tag = 'title'; //section/title SELECT e2.eid FROM Element e1, ElementChild c, Element e2 WHERE e1.tag = 'section' AND e2.tag = 'title' AND e1.eid = c.eid AND c.child = e2.eid; Path expression becomes joins! Number of joins is proportional to the length of the path expression

7 Node/edge-based: complex paths //bibliography/book[author="abiteboul"]/@price SELECT a.attrvalue FROM Element e1, ElementChild c1, Element e2, Attribute a WHERE e1.tag = 'bibliography' AND e1.eid = c1.eid AND c1.child = e2.eid AND e2.tag = 'book' AND EXISTS (SELECT * FROM ElementChild c2, Element e3, ElementChild c3, Text t WHERE e2.eid = c2.eid AND c2.child = e3.eid AND e3.tag = 'author' AND e3.eid = c3.eid AND c3.child = t.tid AND t.value = 'Abiteboul') AND e2.eid = a.eid AND a.attrname = 'price';

Node/edge-based: descendent-or-self 8 //book//title

9 Interval-based: schema Element(left, right, level, tag) leftis the start position of the element rightis the end position of the element levelis the nesting depth of the element (strictly speaking, unnecessary) Key is Text(left, right, level, value) Key is Attribute(left, attrname, attrvalue) Key is

10 Interval-based: example 1<bibliography> 2<book ISBN="ISBN-10" price="80.00"> 3<title>4Foundations of Databases</title>5 6<author>7Abiteboul</author>8 9<author>10Hull</author>11 12<author>13Vianu</author>14 15<publisher>16Addison Wesley</publisher>17 18<year>191995</year>20 </book>21 </bibliography>999 bibliography book 2,21,2 1,999,1 Where did ElementChild go? is the parent of iff: [.left,.right] [.left,.right], and.level =.level 1 title authorauthor author publisher year 3,5,3 6,8,3 9,11,3 12,14,3 15,17,3 18,20,3

11 Interval-based: queries //section/title SELECT e2.left FROM Element e1, Element e2 WHERE e1.tag = 'section' AND e2.tag = 'title' AND e1.left < e2.left AND e2.right < e1.right AND e1.level = e2.level-1; Path expression becomes containment joins! Number of joins is proportional to path expression length //book//title

12 Summary so far Node/edge-based vs. interval-based mapping Path expression steps Equality vs. containment join Descendent-or-self Recursion required vs. not required

13 Path-based mapping: approach 1 Label-path encoding: paths as strings of labels Element(pathid, left, right, ), Path(pathid, path), pathis a string containing the sequence of labels on a path starting from the root Why are leftand rightstill needed? Element pathid left right pathid path Path 1 1 999 2 2 21 3 3 5 4 6 8 1 /bibliography 2 /bibliography/book 3 /bibliography/book/title 4 /bibliography/book/author 4 9 11 4 12 14

14 Label-path encoding: queries Simple path expressions with no conditions //book//title Perform string matching on Path Join qualified pathid s with Element //book[publisher='prentice Hall']/title Evaluate //book/title Evaluate //book/publisher[text()='prentice Hall'] Must then ensure title and publisher belong to the same book (how?) Path expression with attached conditions needs to be broken down, processed separately, and joined back

15 Path-based mapping: approach 2 Dewey-order encoding Each component of the id represents the order of the child within its parent Unlike label-path, this encoding is lossless bibliography 1 book 1.1 1.2 1.3 1.4 1.4.1 1.4.2 Element(dewey_pid, tag) Text(dewey_pid, value) Attribute(dewey_pid, attrname, attrvalue) title authorauthor author publisher year 1.1.1 1.1.2 1.1.2 1.1.4 1.1.5 1.1.6

16 Dewey-order encoding: queries Examples: //title //section/title //book//title //book[publisher='prentice Hall']/title Works similarly as interval-based mapping Except parent/child and ancestor/descendant relationship are checked by prefix matching Serves a different purpose from label-path encoding Any advantage over interval-based mapping?

17 Summary XML data can be shredded into rows in a relational database XQueries can be translated into SQL queries Queries can then benefit from smart relational indexing, optimization, and execution With schema-oblivious approaches, comprehensive XQuery-SQL translation can be easily automated Different data mapping techniques lead to different styles of queries Schema-aware translation is also possible and potentially more efficient, but automation is more complex