One of the main selling points of a database engine is the ability to make declarative queries (SQL being the best-known example) that specify what should be done while leaving the engine to choose the best way of fulfilling the request. Today we will look at XPath, a declarative syntax for navigating XML documents. The key observation is that XML, with its tree layout, has exactly one path from the root to each other element in the document. Intuitively, the structure is very similar to a file system path or web URL, and XPath adopts a similar notation. Conceptually, a path identifies a projection of the document (returning only certain types of elements). On top of that, XPath adds support for selection (filtering out unwanted elements) and basic types of aggregation (count, sum, etc.). The result is a language that, while nowhere near as expressive as relational algebra or SQL, is still quite handy for distilling useful bits of information from large XML files.

XPath is a fairly intuitive language, requiring only a few basic concepts to understand well. The base unit of an XPath expression is a path step (a "location step" in the standard). Each step identifies an axis to move along (see slides that follow), an element name to move to, and an optional predicate to filter out unwanted paths. XPath is nested: predicates can themselves contain XPath expressions (which can in turn contain predicates). We will go over each of these concepts in detail.

At each step of an XPath query, we can imagine a pointer in the tree that specifies the current location from which the next step will be taken. This location is known as the context node. There are a number of different directions we might move in, called axes. The default axis is to move from the context node to a child named in the path step. Two other well-known axes come from the file system world: the parent axis allows moving toward the root element (usually specified using the short-hand notation "../") and the self axis makes the next step stay in the same place (short-hand is "./").
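As a rough illustration of the child, self, and parent steps, here is a minimal sketch using Python's standard xml.etree.ElementTree module, which understands only a small subset of XPath. The document and the book titles are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, loosely modeled on the book-list tree in these notes.
DOC = """
<book-list>
  <genre name="databases">
    <book><title>Transactions</title></book>
    <book><title>Query Processing</title></book>
  </genre>
</book-list>
"""

root = ET.fromstring(DOC)          # context node: <book-list>
genre = root.find("genre")         # child axis (the default): step to a child named genre
books = genre.findall("./book")    # "./" is the self axis, so the search still starts at genre

# ElementTree elements do not store parent pointers, so ".." only works after
# an earlier step in the same path: go down to each book, then back up.
parents = root.findall("genre/book/..")

child_titles = [b.findtext("title") for b in books]
parent_tags = [p.tag for p in parents]
print(child_titles, parent_tags)
```

Both books share one parent, and the result is a node set, so the genre element appears only once in the last query.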

Consider the XML tree specified here. Individual elements are labeled with their names, while triangles denote sub-trees of unknown size. The context node (a "book") is shown in the center. It can be referred to in a step using the self axis. The only node on the parent axis is the genre node above it. The ancestor axis contains both genre and book-list (every element along the path from the root to the context node's own parent). The ancestor-or-self axis is exactly what it sounds like, and contains both the context node and its proper ancestors. Moving downward, the child, descendant, and descendant-or-self axes capture children, nodes below the context, and nodes at or below self. Sideways movement is also possible: the following axis includes every node whose opening and closing tags are both after the context node's closing tag (descendants of the context node are not included). The following-sibling axis refers to siblings of the context node that follow it (in other words, elements having the same parent that are found along the following axis). The preceding axis is similar to following, but includes all elements whose opening and closing tags both appear in the document before the context node's opening tag (ancestors are not included).
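The axis definitions above can be sketched directly from document order. The following is an illustrative emulation in plain Python, not how an XPath engine is actually implemented; the helper name axes and every element name other than book-list, genre, and book are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: element names beyond book-list, genre, and book are made up.
DOC = """
<book-list>
  <genre>
    <intro/>
    <book><title/><price/></book>
    <note/>
  </genre>
  <appendix><index/></appendix>
</book-list>
"""

def axes(root, context):
    """Return the tag names on several XPath axes of `context`."""
    order = list(root.iter())                       # all elements, in document order
    parent_of = {c: p for p in order for c in p}    # child -> parent lookup

    ancestors = []                                  # nearest ancestor first
    node = context
    while node in parent_of:
        node = parent_of[node]
        ancestors.append(node)

    descendants = list(context.iter())[1:]          # iter() yields the context itself first
    pos = order.index(context)
    # following: opens after the context node closes, so its descendants are skipped
    following = [e for e in order[pos + 1:] if e not in descendants]
    # preceding: closes before the context node opens, so its ancestors are skipped
    preceding = [e for e in order[:pos] if e not in ancestors]
    siblings = list(parent_of[context]) if context in parent_of else []
    following_sibling = [e for e in following if e in siblings]

    def tags(elems):
        return [e.tag for e in elems]

    return {
        "ancestor": tags(ancestors),
        "descendant": tags(descendants),
        "following": tags(following),
        "following-sibling": tags(following_sibling),
        "preceding": tags(preceding),
    }

root = ET.fromstring(DOC)
result = axes(root, root.find("genre/book"))
for axis, names in result.items():
    print(axis, names)
```

With book as the context node, title and price land on the descendant axis only, while genre and book-list land on the ancestor axis only, matching the exclusions described above.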

There are also a few special axes that are used to access non-element information: attribute gives access to attributes of the context node, and namespace gives access to a node's namespace (namespaces are a feature of XML that allows for logical grouping of related elements; such tag names consist of a namespace:element pair). Note that these special axes should be the last step in any path expression (other than self steps), because attribute and namespace nodes have no children for a later step to move into. XPath also defines several node tests that are written like function calls: given the example XML snippet on the slide, ::* selects only elements, ::text() selects only the textual contents of a node, and ::node() selects everything (text and elements). XPath accesses elements in strict document order (that's why the preceding and following axes are meaningful), and so each node has a position relative to its siblings, available through the position() function. Positioning is one-based.
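A small sketch of attribute access and one-based positions, again using Python's standard xml.etree.ElementTree. Note the assumptions: ElementTree does not support a trailing attribute step or the text() node test, so attribute values and text are read through its own API instead; the ids and titles are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; ids and titles are invented for the example.
DOC = """
<book-list>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
  <book><title>Third</title></book>
</book-list>
"""

root = ET.fromstring(DOC)

first = root.find("book[1]")           # positions are one-based: [1] is the first book
last = root.find("book[last()]")       # last() picks the final book
with_id = root.findall("book[@id]")    # "@" is short-hand for the attribute axis

first_id = first.get("id")             # attribute value, read via the ElementTree API
last_title = last.findtext("title")    # text content, in place of a text() step
ids = [b.get("id") for b in with_id]
print(first_id, last_title, ids)
```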

Several short-hand notations are available in XPath; some of the more commonly used ones are shown here.

One big difference from a file system: path names are not necessarily unique, and an XPath expression always returns a set of nodes (possibly empty or containing only one element). As far as the language is concerned, there could be several book elements, each with multiple title elements as children, and all would be returned by the example XPath queries shown here. A DTD might reasonably forbid books from having multiple titles, but that's an orthogonal matter.
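To make the "set of nodes" point concrete, here is a minimal sketch with Python's standard xml.etree.ElementTree (which returns a list in document order rather than a true set). The document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which one book has two titles.
DOC = """
<book-list>
  <book><title>Main Title</title><title>Alternate Title</title></book>
  <book><title>Another Book</title></book>
</book-list>
"""

root = ET.fromstring(DOC)
titles = [t.text for t in root.findall("book/title")]   # every title of every book
editors = root.findall("book/editor")                    # no match: empty result, not an error
print(titles, editors)
```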

A given path expression returns a set of nodes (all nodes along any path that matches the one given); we can filter that set using predicates, which are given as boolean expressions inside square brackets. Here you can see several examples of the kinds of predicate expressions that can be used. It's VERY IMPORTANT to note that XPath applies existential quantification when a predicate treats a node set as a scalar value: empty sets are false and non-empty sets are true. The query /book-list/book[editor] thus returns all books having at least one editor. When comparing a node set to a scalar value, the engine uses the comparison to filter the set, then returns false if the resulting set is empty. For example, consider the query /book-list/book[price < 50]. As long as at least one price smaller than 50 exists, the resulting node set is non-empty and the predicate is satisfied; if no price exists, or if all prices are at least 50, then the predicate's node set is empty and the predicate does not pass. Put another way, /book-list/book[editor] is shorthand for /book-list/book[count(editor) > 0] and /book-list/book[price < 50] is shorthand for /book-list/book[count(price[. < 50]) > 0]. To ask for a book whose prices are *all* under 50 (universal quantification) you have to invert the question and ask for books having no price of 50 or more: /book-list/book[not(price >= 50)]
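The quantification rules can be mimicked with Python's any(). This is only an emulation under stated assumptions: the standard xml.etree.ElementTree module cannot evaluate numeric comparisons in predicates, so the filtering is done in Python, and the document and the helper prices are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: books with two, one, or zero prices.
DOC = """
<book-list>
  <book><title>A</title><price>30</price><price>70</price></book>
  <book><title>B</title><price>20</price><price>40</price></book>
  <book><title>C</title><price>80</price></book>
  <book><title>D</title></book>
</book-list>
"""

root = ET.fromstring(DOC)
books = root.findall("book")

def prices(book):
    """Numeric values of all price children of one book."""
    return [float(p.text) for p in book.findall("price")]

# /book-list/book[price]: the node set of prices is non-empty
has_price = [b.findtext("title") for b in books if prices(b)]
# /book-list/book[price < 50]: SOME price is below 50
some_under = [b.findtext("title") for b in books
              if any(p < 50 for p in prices(b))]
# /book-list/book[not(price >= 50)]: NO price is 50 or more
none_over = [b.findtext("title") for b in books
             if not any(p >= 50 for p in prices(b))]
print(has_price, some_under, none_over)
```

Book A passes the existential test but fails the inverted one because of its second price. Book D, which has no price at all, fails the existential test yet passes the inverted one, since an empty node set contains no price of 50 or more.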

When chaining predicates, realize that position() refers to the position of the node relative to the siblings that are still in the node set. The query /book[3][price < 50]/title says "out of all books, return the third one if its price is less than 50", while the query /book[price < 50][3] says "out of the books with price less than 50, return the third one".
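The effect of predicate order can be sketched by treating each predicate as a function from one node list to another and applying them left to right. The helpers nth and cheap and the sample data are hypothetical, and the filtering is emulated in Python because the standard xml.etree.ElementTree module cannot evaluate price < 50 itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: five books, each with one price.
DOC = """
<book-list>
  <book id="b1"><price>60</price></book>
  <book id="b2"><price>40</price></book>
  <book id="b3"><price>30</price></book>
  <book id="b4"><price>45</price></book>
  <book id="b5"><price>20</price></book>
</book-list>
"""

books = ET.fromstring(DOC).findall("book")

def nth(nodes, n):
    """Positional predicate [n]: one-based, empty result if out of range."""
    return nodes[n - 1:n]

def cheap(nodes):
    """Predicate [price < 50]: keep nodes having some price below 50."""
    return [b for b in nodes
            if any(float(p.text) < 50 for p in b.findall("price"))]

# book[3][price < 50]: take the third book, then keep it only if it is cheap
third_if_cheap = [b.get("id") for b in cheap(nth(books, 3))]
# book[price < 50][3]: keep the cheap books, then take the third of those
third_of_cheap = [b.get("id") for b in nth(cheap(books), 3)]
print(third_if_cheap, third_of_cheap)
```

The two orders select different books here because the first book is filtered out before positions are counted in the second query.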

//book/author[1] says "return the first author of each book", while (//book/author)[1] says "return the first author from the list of all book authors".
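A minimal sketch of the difference, using Python's standard xml.etree.ElementTree. ElementTree has no parenthesized path expressions, so the second query is emulated by indexing the full result list in Python; the author names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one book with two authors, one with a single author.
DOC = """
<book-list>
  <book><author>Ann</author><author>Bob</author></book>
  <book><author>Cy</author></book>
</book-list>
"""

root = ET.fromstring(DOC)

# //book/author[1]: the position is counted among each book's own authors
first_of_each = [a.text for a in root.findall(".//book/author[1]")]

# (//book/author)[1]: collect every author first, then take the first overall
all_authors = root.findall(".//book/author")
first_overall = [a.text for a in all_authors[:1]]
print(first_of_each, first_overall)
```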

The unnecessary restriction on unions makes very little logical sense, and is arguably a symptom of XPath being designed by a committee (top-down) rather than derived from a sound underlying theory (bottom-up). SQL has plenty of its own flaws, but the features it does have are usually complete and consistent.

See http://www.w3.org/tr/xpath/#corelib for the full list of functions (that page contains all other official details about XPath as well).
