Introduction to Data Management CSE 344

Similar documents
10/24/12. What We Have Learned So Far. XML Outline. Where We are Going Next. XML vs Relational. What is XML? Introduction to Data Management CSE 344

Introduction to Database Systems CSE 414

Introduction to Database Systems CSE 444

Introduction to Database Systems CSE 414

Additional Readings on XPath/XQuery Main source on XML, but hard to read:

Introduction to XML. Yanlei Diao UMass Amherst April 17, Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.

CSE 544 Data Models. Lecture #3. CSE544 - Spring,

Introduction to Data Management CSE 344

Traditional Query Processing. Query Processing. Meta-Data for Optimization. Query Optimization Steps. Algebraic Transformation Predicate Pushdown

Querying XML Data. Querying XML has two components. Selecting data. Construct output, or transform data

Lecture 11: Xpath/XQuery. Wednesday, April 17, 2007

Data Formats and APIs

EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington

XML and Web Services

XPath an XML query language

Introduction to Semistructured Data and XML

XML and Semi-structured data

CSE 544 Principles of Database Management Systems. Lecture 4: Data Models a Never-Ending Story

XML Query Languages. Yanlei Diao UMass Amherst April 22, Slide content courtesy of Ramakrishnan & Gehrke, Donald Kossmann, and Gerome Miklau

Introduction to Database Systems CSE 414

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 4 Data models A Never-Ending Story

CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA

Lab Assignment 3 on XML

XML Overview COMP9319

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

Database Systems CSE 414

XML, DTD, and XPath. Announcements. From HTML to XML (extensible Markup Language) CPS 116 Introduction to Database Systems. Midterm has been graded

Announcements. Two Classes of Database Applications. Class Overview. NoSQL Motivation. RDBMS Review: Serverless

Overview. Structured Data. The Structure of Data. Semi-Structured Data Introduction to XML Querying XML Documents. CMPUT 391: XML and Querying XML

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

JSON - Overview JSon Terminology

XML Parsers XPath, XQuery

Announcements. JSon Data Structures. JSon Syntax. JSon Semantics: a Tree! JSon Primitive Datatypes. Introduction to Database Systems CSE 414

Introduction to Semistructured Data and XML. Management of XML and Semistructured Data. Path Expressions

Class Overview. Two Classes of Database Applications. NoSQL Motivation. RDBMS Review: Client-Server. RDBMS Review: Serverless

Introduction to Semistructured Data and XML. Contents

CSCI3030U Database Models

XML in Databases. Albrecht Schmidt. al. Albrecht Schmidt, Aalborg University 1

CSE 344 FEBRUARY 2 ND DATA SEMI-STRUCTURED

Data Presentation and Markup Languages

(One) Layer Model of the Semantic Web. Semantic Web - XML XML. Extensible Markup Language. Prof. Dr. Steffen Staab Dipl.-Inf. Med.

CS145 Introduction. About CS145 Relational Model, Schemas, SQL Semistructured Model, XML

Introduction Syntax and Usage XML Databases Java Tutorial XML. November 5, 2008 XML

Introduction to XML. Leonidas Fegaras. CSE 6331 Leonidas Fegaras XML 1

XML Systems & Benchmarks

XSLT and Structural Recursion. Gestão e Tratamento de Informação DEI IST 2011/2012

Introduction to XML. National University of Computer and Emerging Sciences, Lahore. Shafiq Ur Rahman. Center for Research in Urdu Language Processing

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

XML Stream Processing

Chapter 13 XML: Extensible Markup Language

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

Introduction. Web Pages. Example Graph

XML: Extensible Markup Language

XML Background. The reason that so many people are excited about XML is that so many people are excited about XML. ANON

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

Semistructured Data and XML

Overview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML

M359 Block5 - Lecture12 Eng/ Waleed Omar

Chapter 1: Semistructured Data Management XML

XML. Document Type Definitions. Database Systems and Concepts, CSCI 3030U, UOIT, Course Instructor: Jarek Szlichta

Chapter 1: Semistructured Data Management XML

Part II: Semistructured Data

XML. extensible Markup Language. ... and its usefulness for linguists

Semistructured data, XML, DTDs

Part 2: XML and Data Management Chapter 6: Overview of XML

Querying XML. COSC 304 Introduction to Database Systems. XML Querying. Example DTD. Example XML Document. Path Descriptions in XPath

XML Technologies. Doc. RNDr. Irena Holubova, Ph.D. Web pages:

Relational Data Model is quite rigid. powerful, but rigid.

Author: Irena Holubová Lecturer: Martin Svoboda

Introduction to XML Zdeněk Žabokrtský, Rudolf Rosa

EMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents

Working with XML and DB2

Introduction to XML (Extensible Markup Language)

XML and Databases. Outline. Outline - Lectures. Outline - Assignments. from Lecture 3 : XPath. Sebastian Maneth NICTA and UNSW

Part V. Relational XQuery-Processing. Marc H. Scholl (DBIS, Uni KN) XML and Databases Winter 2007/08 297

CPSC 504 Background (aka, all you need to know about databases for this course in two lectures) Rachel Pottinger January 8 and 12, 2015

XML and information exchange. XML extensible Markup Language XML

Introduction to XML. Chapter 133

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while

Introduction to XML. XML: basic elements

Introduction to Information Retrieval

Introduction to Database Systems. Fundamental Concepts

XSLT. Announcements (October 24) XSLT. CPS 116 Introduction to Database Systems. Homework #3 due next Tuesday Project milestone #2 due November 9

XML. Semi-structured data (SSD) SSD Graphs. SSD Examples. Schemas for SSD. More flexible data model than the relational model.

CMPT 354: Database System I. Lecture 2. Relational Model

Query Languages for XML

Introduction to XML. An Example XML Document. The following is a very simple XML document.

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW

XML-Relational Mapping. Introduction to Databases CompSci 316 Fall 2014

CSE 344 Midterm. November 9, 2011, 9:30am - 10:20am. Question Points Score Total: 100

CSE 880. Advanced Database Systems. Semistuctured Data and XML

Querying purexml Part 1 The Basics

CSI 3140 WWW Structures, Techniques and Standards. Representing Web Data: XML

XML. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior

CSE 344 Final Examination

COMP9321 Web Application Engineering

Transcription:

Introduction to Data Management CSE 344 Lecture 11: XML and XPath 1

XML Outline What is XML? Syntax Semistructured data DTDs XPath 2

What is XML? Stands for extensible Markup Language 1. Advanced, self-describing file format 2. Based on a flexbile, semi-structured data model Applications: Data exchange Storing data without a rigid schema: advertisements Configuration files: e.g. Web.Config Document markup: e.g. XHTML We will study only XML as data 3

XML vs Relational Relational data model Rigid flat structure (tables) Schema must be fixed in advanced Binary representation: good for performance, bad for exchange Query language based on Relational Calculus Semistructured data model / XML Flexible, nested structure (trees) Does not require predefined schema ("self describing ) Text representation: good for exchange, bad for performance Query language borrows from automata theory 4

From HTML to XML HTML describes the presentation 5

HTML <h1> Bibliography </h1> <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995 <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999 HTML describes the presentation 6

XML Syntax <bibliography> <book> <title> Foundations </title> </book> </bibliography> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> XML describes the content 7

XML Terminology Tags: book, title, author, Start tag: <book>, end tag: </book> Elements: <book> </book>,<author> </author> Elements are nested Empty element: <red></red> abbrv. <red/> An XML document: single root element Well formed XML document Has matching tags A short header And a root element 8

Well-Formed XML <? xml version= 1.0 encoding= utf-8 standalone= yes?> <SomeTag> </SomeTag> 9

More XML: Attributes <book price = 55 currency = USD > <title> Foundations of Databases </title> <author> Abiteboul </author> <year> 1995 </year> </book> 10

Attributes v.s. Elements <book price = 55 currency = USD > <title> Foundations of DBs </title> <author> Abiteboul </author> <year> 1995 </year> </book> <book> <title> Foundations of DBs </title> <author> Abiteboul </author> <year> 1995 </year> <price> 55 </price> <currency> USD </currency> </book> Attributes are alternative ways to represent data 11

Comparison Elements Ordered May be repeated May be nested Attributes Unordered Must be unique Must be atomic 12

XML Semantics: a Tree! <data> <person id= o555 > <name> Mary </name> <address> <street>maple</street> <no> 345 </no> <city> Seattle </city> </address> </person> <person> <name> John </name> <address>thailand </address> <phone>23456</phone> </person> </data> id o555 Attribute node name Mary person address street no city Maple 345 data Seattle name John person address Thai Text node Element node phone 23456 Order matters!!! 13

XML Data XML is self-describing Schema elements become part of the data Relational schema: person(name,phone) In XML <person>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data 14

Mapping Relational Data to XML Data The canonical mapping: XML: person row row row Person Name Phone John 3634 Sue 6343 Dick 6363 name phone name phone name phone John 3634 Sue 6343 Dick 6363 <person> <row> <name>john</name> <phone> 3634</phone></row> <row> <name>sue</name> <phone> 6343</phone></row> <row> <name>dick</name> <phone> 6363</phone></row> </person> 15

Mapping Relational Data to XML Data Application specific mapping Person Name Phone John 3634 Sue 6343 Orders PersonName Date Product John 2002 Gizmo John 2004 Gadget Sue 2002 Gadget XML <people> <person> <name> John </name> <phone> 3634 </phone> <order> <date> 2002 </date> <product> Gizmo </product> </order> <order> <date> 2004 </date> <product> Gadget </product> </order> </person> <person> <name> Sue </name> <phone> 6343 </phone> <order> <date> 2004 </date> <product> Gadget </product> </order> </person> </people> 16

XML=Semi-structured Data (1/3) Missing attributes: <person> <name> John</name> <phone>1234</phone> </person> <person> <name>joe</name> </person> no phone! Could represent in a table with nulls name phone John 1234 Joe - 17

XML=Semi-structured Data (2/3) Repeated attributes <person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone> </person> Two phones! Impossible in tables: name phone Mary 2345 3456??? 18

XML=Semi-structured Data (3/3) Attributes with different types in different objects <person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone> </person> Structured name! Nested collections Heterogeneous collections: <db> contains both <book>s and <publisher>s 19

Schema 20

Document Type Definitions (DTD) An XML document may have a DTD XML document: Well-formed = if tags are correctly closed Valid = if it has a DTD and conforms to it Validation is useful in data exchange Use http://validator.w3.org/check to validate Superseded by XML Schema (Book Sec. 11.4) Very complex: DTDs still used widely

Example DTD <!DOCTYPE company [ <!ELEMENT company ((person product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]> 22

Example DTD Example of valid XML document: <company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> </person> <product>... </product>... </company> 23

DTD: The Content Model <!ELEMENT tag (CONTENT)> Content model: Complex = a regular expression over other elements Text-only = #PCDATA Empty = EMPTY Any = ANY content model Mixed content = (#PCDATA A B C)* 24

DTD: Complex Content Sequence DTD <!ELEMENT name (firstname, lastname))> XML <name> <firstname>..... </firstname> <lastname>..... </lastname> </name> Optional <!ELEMENT name (firstname?, lastname))> Kleene star <!ELEMENT person (name, phone*))> Alternation <!ELEMENT person (name, (phone email)))> <person> <name>..... </name> <phone>..... </phone> <phone>..... </phone> <phone>..... </phone>...... </person> 25

DTD: Attributes From sample-xml-with-dtd.xml <!DOCTYPE bib [ ]> <!ELEMENT bib (book* )> <!ELEMENT book (title, (author+ editor+ ), publisher?, price )> <!ATTLIST book year CDATA #REQUIRED > <bib> <book year="1994"> 26

DTD: Text Two options: #PCDATA ("Parsed Character Data") = the text inside elements CDATA ("Character Data") = the text inside attributes There is no #CDATA and no PCDATA 27

Querying 28

Querying XML Data XPath = simple navigation à today XQuery = the SQL of XML à Friday XSLT = recursive traversal will not discuss in class 29

Sample Data for Queries <bib> <book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year> </book> <book price= 55 > <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year> </book> </bib> 30

Data Model for XPath XPath returns a sequence of items. An item is either: A value of primitive type, or A node (doc, element, or attribute) The root bib The root element book book publisher author.... Addison-Wesley Serge Abiteboul 31

XPath: Simple Expressions /bib/book/year Result: <year> 1995 </year> <year> 1998 </year> /bib/paper/year Result: empty (there were no papers) /bib What s the difference? / 32

XPath: Restricted Kleene Closure //author Result:<author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <author> Jeffrey D. Ullman </author> /bib//first-name Result: <first-name> Rick </first-name> 33

XPath: Attribute Nodes /bib/book/@price Result: 55 @price means that price has to be an attribute 34

XPath: Wildcard //author/* Result: <first-name> Rick </first-name> <last-name> Hull </last-name> * Matches any element @* Matches any attribute 35

XPath: Text Nodes /bib/book/author/text() Result: Serge Abiteboul Victor Vianu Jeffrey D. Ullman Rick Hull doesn t appear because he has first-name, last-name Functions in XPath: text() = matches the text value node() = matches any node (= * or @* or text()) name() = returns the name of the current tag 36

XPath: Predicates /bib/book/author[first-name] Result: <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> 37

XPath: More Predicates /bib/book/author[first-name][address[.//zip][city]]/last-name Result: <last-name> </last-name> <last-name> </last-name> How do we read this? First remove all qualifiers (predicates): /bib/book/author/last-name Then add them one by one: /bib/book/author[first-name][address]/last-name 38

XPath: More Predicates /bib/book[@price < 60] /bib/book[author/@age < 25] /bib/book[author/text()] 39

XPath: Position Predicates /bib/book[2] /bib/book[last()] /bib/book[@year = 1998] [2] /bib/book[2][@year = 1998] The 2nd book The last book The 2nd of all books in 1998 2nd book IF it is in 1998 40

XPath: More Axes. means current node /bib/book[.//review] /bib/book[./review] Same as /bib/book[review] /bib/author/. /first-name Same as /bib/author/first-name 41

XPath: More Axes.. means parent node /bib/author/.. /author/zip Same as /bib/author/zip /bib/book[.//review/../comments] Same as /bib/book[.//*[comments][review]] Hint: don t use.. 42

XPath: Summary bib matches a bib element * matches any element / matches the root element /bib matches a bib element under root bib/paper matches a paper in bib bib//paper matches a paper in bib, at any depth //paper matches a paper at any depth paper book matches a paper or a book @price matches a price attribute bib/book/@price matches price attribute in book, in bib bib/book[@price< 55 ]/author/last-name matches bib/book[@price< 55 or @price> 99 ]/author/last-name matches 43