Informatics 1: Data & Analysis

Similar documents
Informatics 1: Data & Analysis

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

Semi-structured Data. 8 - XPath

XPath. Asst. Prof. Dr. Kanda Runapongsa Saikaew Dept. of Computer Engineering Khon Kaen University

XPath. Lecture 36. Robb T. Koether. Wed, Apr 16, Hampden-Sydney College. Robb T. Koether (Hampden-Sydney College) XPath Wed, Apr 16, / 28

XML: Extensible Markup Language

Part II: Semistructured Data

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while

XML Data Management. 5. Extracting Data from XML: XPath

XML & Databases. Tutorial. 3. XPath Queries. Universität Konstanz. Database & Information Systems Group Prof. Marc H. Scholl

XML Technologies. Doc. RNDr. Irena Holubova, Ph.D. Web pages:

Informatics 1: Data & Analysis

TDDD43. Theme 1.2: XML query languages. Fang Wei- Kleiner h?p:// TDDD43

H2 Spring B. We can abstract out the interactions and policy points from DoDAF operational views

Course: The XPath Language

CSI 3140 WWW Structures, Techniques and Standards. Representing Web Data: XML

EMERGING TECHNOLOGIES

Web Services Week 3. Fall Emrullah SONUÇ. Department of Computer Engineering Karabuk University

XPath Lecture 34. Robb T. Koether. Hampden-Sydney College. Wed, Apr 11, 2012

Course: The XPath Language

XPath. by Klaus Lüthje Lauri Pitkänen

Navigating Input Documents Using Paths4

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW

XML Data Management. 6. XPath 1.0 Principles. Werner Nutt

XML and Databases. Outline. Outline - Lectures. Outline - Assignments. from Lecture 3 : XPath. Sebastian Maneth NICTA and UNSW

Querying XML. COSC 304 Introduction to Database Systems. XML Querying. Example DTD. Example XML Document. Path Descriptions in XPath

Advances in Programming Languages

StreamServe Persuasion SP5 XMLIN

2006 Martin v. Löwis. Data-centric XML. XPath

XPath and XQuery. Introduction to Databases CompSci 316 Fall 2018

Presentation. Separating Content and Presentation Cascading Style Sheets (CSS) XML and XSLT

Chapter 2 XML, XML Schema, XSLT, and XPath

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

XPath Expression Syntax

Navigation. 3.1 Introduction. 3.2 Paths

Informatics 1: Data & Analysis

Announcements (March 31) XML Query Processing. Overview. Navigational processing in Lore. Navigational plans in Lore

Outline of this part (I) Part IV. Querying XML Documents. Querying XML Documents. Outline of this part (II)

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today

XPath node predicates. Martin Holmes

Comp 336/436 - Markup Languages. Fall Semester Week 9. Dr Nick Hayward

Informatics 1: Data & Analysis

XPath. Mario Alviano A.Y. 2017/2018. University of Calabria, Italy 1 / 21

Binary Trees

METAXPath. Utah State University. From the SelectedWorks of Curtis Dyreson. Curtis Dyreson, Utah State University Michael H. Böhen Christian S.

Navigating an XML Document

Chapter 13 XML: Extensible Markup Language

Seleniet XPATH Locator QuickRef

Query Languages for XML

ADT 2009 Other Approaches to XQuery Processing

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis

XML, DTD, and XPath. Announcements. From HTML to XML (extensible Markup Language) CPS 116 Introduction to Database Systems. Midterm has been graded

Informatics 1: Data & Analysis

COMP9321 Web Application Engineering. Extensible Markup Language (XML)

Querying XML Data. Querying XML has two components. Selecting data. Construct output, or transform data

Lecture 05 I/O statements Printf, Scanf Simple statements, Compound statements

516. XSLT. Prerequisites. Version 1.2

Evaluating XPath Queries

Semantic Characterizations of XPath

Example using multiple predicates

Informatics 1: Data & Analysis

Arbori Starter Manual Eugene Perkov

Computer Science E-259

Overview. Structured Data. The Structure of Data. Semi-Structured Data Introduction to XML Querying XML Documents. CMPUT 391: XML and Querying XML

COPYRIGHTED MATERIAL. Contents. Part I: Introduction 1. Chapter 1: What Is XML? 3. Chapter 2: Well-Formed XML 23. Acknowledgments

DBS2: Exkursus XQuery and XML-Databases. Jan Sievers Jens Hündling Lars Trieloff

An introduction to searching in oxygen using XPath

xmlns:gu=" xmlns:uky="

Tutorial 8: Practice Exam Questions

XML & Databases. Tutorial. 9. Staircase Join, Symmetries. Universität Konstanz. Database & Information Systems Group Prof. Marc H.

XML and information exchange. XML extensible Markup Language XML

Databases and Information Systems 1. Prof. Dr. Stefan Böttcher

Delivery Options: Attend face-to-face in the classroom or remote-live attendance.

A Structural Numbering Scheme for XML Data

11. EXTENSIBLE MARKUP LANGUAGE (XML)

4D2b Navigating an XML Document

An Algorithm for Streaming XPath Processing with Forward and Backward Axes

COMP9321 Web Application Engineering

XML. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior

Integrating Path Index with Value Index for XML data

Semistructured Data and XML

CSE 530A. B+ Trees. Washington University Fall 2013

Delivery Options: Attend face-to-face in the classroom or via remote-live attendance.

Trees. Carlos Moreno uwaterloo.ca EIT

Manipulating XML Trees XPath and XSLT. CS 431 February 18, 2008 Carl Lagoze Cornell University

B.V.Patel Institute of Business Management, Computer & Information Technology, UTU

Request for Comments: 4825 Category: Standards Track May 2007

Notes on XML and XQuery in Relational Databases

Informatics 1: Data & Analysis

Tutorial 5: XML. Informatics 1 Data & Analysis. Week 7, Semester 2,

~ Ian Hunneybell: DIA Revision Notes ~

- if you look too hard it isn't there

Introduction to XPath

2. Write style rules for how you d like certain elements to look.

Big Data 12. Querying

Big Data 10. Querying

The main Topics in this lecture are:

Transcription:

T O Y H Informatics 1: Data & Analysis Lecture 11: Navigating XML using XPath Ian Stark School of Informatics The University of Edinburgh Tuesday 26 February 2013 Semester 2 Week 6 E H U N I V E R S I T http://www.inf.ed.ac.uk/teaching/courses/inf1/da F E D I N B U R G

Lecture Timing This is Inf1-DA Lecture 11, in Week 6. There is no Inf1-DA lecture on Friday, 1 March. Inf1-DA Lecture 12 is on Tuesday 5 March, in Week 7. Normal service then resumes.

Lecture Plan XML We start with technologies for modelling and querying semistructured data. Semistructured Data: Trees and XML Schemas for structuring XML Navigating and querying XML with XPath Corpora One particular kind of semistructured data is large bodies of written or spoken text: each one a corpus, plural corpora. Corpora: What they are and how to build them Applications: corpus analysis and data extraction

Sample Semistructured Data

Sample Semistructured Data in XML <Gazetteer> <Country> <Name>Slovenia</Name> <Population>2,020,000</Population> <Capital>Ljubljana</Capital> <Region> <Name>Gorenjska</Name> <Feature type="lake">bohinj</feature> <Feature type="mountain">triglav</feature> <Feature type="mountain">spik</feature> </Region> </Country> <! data for other countries here > </Gazetteer>

How to Extract Information from an XML Document? Since an XML document is a text document, we could simply use conventional text search to look for data. However, this ignores all the document structure. A more powerful approach is to use a dedicated language for forming queries based on the tree structure of an XML document. This is (yet another) domain-specific language. With such a language we can, for example: Perform database-style queries on data published as XML; Extract annotated content from marked-up text documents; Identify information captured in the tree structure itself.

XQuery and XPath XQuery is a powerful declarative query language for extracting information from XML documents. As well as using XML documents for its source data, XQuery can also produce XML documents as output, so we can view it as an XML transformation language. However, the XQuery language is complex, and we shall not investigate it further here. XPath is a sublanguage of XQuery, used for navigating XML documents using path expressions. XPath can be viewed as a rudimentary query language in its own right. It is also an important component of other XML application languages (XML Schema, XSLT, XForms,... ).

XPath Location Paths An XPath location path (or path expression) identifies a set of nodes within an XML document tree. The location path describes a set of possible paths from the root of the tree. The set of nodes identified is all those reached as final destinations of these paths. When using a location path as a query on a document, this set of nodes is returned as a list (without duplicates) sorted in document order the order the nodes appeared in the original XML document.

Family Tree Navigation Document order Siblings of A Ancestors of A Descendants of A

Examples of Location Paths The next few slides illustrate a selection of location paths applied to the gazetteer example. Each expression appears twice: once using full XPath syntax, and once using a standard abbreviated syntax. In each case, the nodes identified by the path are highlighted in red, and for a query would be retrieved in document order. Paths are built up step-by-step as the location path is read from left to right, with a context node that travels over the tree according to the components of the location path. The slash / at the start of a location path indicates that the starting position for the context node is the document root.

/child::gazetteer /Gazetteer

/child::gazetteer/child::country /Gazetteer/Country

/child::gazetteer/child::country/child::region /Gazetteer/Country/Region

/descendant::region/child:: //Region/

/descendant::region/descendant:: //Region//

/descendant::region/descendant::node() //Region//node()

/descendant::region/descendant::text() //Region//text()

/descendant::feature/attribute::type //Feature/@type

Syntax for Location Paths A location path is a sequence of location steps separated by a / character. Each location step has the form axis :: node-test predicate The axis indicates which way the context node moves. The node test selects nodes of an appropriate type. The optional predicates supply further conditions that need to be satisfied to continue with the path. The examples so far used the child and descendant axes; node-tests node(), text(),, and individual names; and no predicates.

Some Axes Different axes point in different directions from the current context node. child: immediate children (attribute nodes don t count) descendant: any descendants (again, not attribute nodes) parent: the unique parent (root has no parent) attribute: all attribute nodes (context node must be an element node) self: the context node itself descendant or self: the context node together with its descendants.

Some Node Tests Node tests select among all nodes along the current axis. text(): nodes with character data. node(): all nodes. : all nodes of the principal node type for this axis: for the attribute axis, this is attribute nodes; for any other axis, element nodes. name: element nodes with the given name. The names used for node tests in the earlier examples were: Gazetteer, Country, Region, Feature and type.

XPath Abbreviations Complete path expressions can become cumbersome, and XPath provides a number of abbreviations for the basic operations. The child:: axis is default and can be omitted Syntax @ is an abbreviation for attribute:: Syntax // is an abbreviation for /descendant or self::node()/ Syntax.. is an abbreviation for parent::node() Syntax. is an abbreviation for self::node()

UTF-8 Encoding

Syntax for Location Paths A location path is a sequence of location steps separated by a / character. Each location step has the form axis :: node-test predicate The axis indicates which way the context node moves. The node test selects nodes of an appropriate type. The optional predicates supply further conditions that need to be satisfied to continue with the path. The examples so far used the child and descendant axes; node-tests node(), text(),, and individual names; and no predicates.

Some Predicates The node test in a location step may be followed by zero, one or several predicates each given by an expression enclosed in square brackets. [locationpath] Selects only those nodes for which there exists a continuation path matching locationpath. [locationpath=value] Selects nodes for which there is a continuation path matching locationpath where the final node of the path is equal to value. The full syntax of XPath predicate expressions includes arithmetic operations and further path queries, and is beyond the scope of this course.

/descendant::feature[attribute::type= Mountain ] //Feature[@type= Mountain ]

/descendant::feature[attribute::type= Mountain ]/child::text() //Feature[@type= Mountain ]/text()

/descendant::feature[attribute::type= Mountain ]/parent:: /child::name/child::text() //Feature[@type= Mountain ]/../Name/text()

/descendant:: [Feature/attribute::type= Mountain ]/child::name/child::text() // [Feature/@type= Mountain ]/Name/text()

XPath as Query Language These last examples begin to show XPath as a query language, in this case identifying in turn: All features which are mountains; The names of all mountains; The names of all regions containing mountains. When using XPath in practice, it s often necessary to prefix a location path with a pointer to the relevant XML document: doc("gazetteer.xml")//feature[@type= Mountain ]/text()

Subtleties in Complex Queries Name all countries containing a feature called Salmon River We can select this from a gazetteer with the following XPath expression: //Country[.//Feature/text()= Salmon River ]/Name/text() Note the use of. to start a predicate path at the current context node. However, this other apparently very similar expression won t do: //Country[//Feature/text()= Salmon River ]/Name/text() Without. the predicate //Feature/text() goes back to the root node.

More on XPath Full XPath has a host of other features, including: navigation based on document order, position and size of context; name spaces; and a rich expression language. Further Reading The official W3C specification: Wikipedia on XPath: The (wildly optimistic) 10-minute XPath Tutorial: Homework http://www.w3.org/tr/xpath https://en.wikipedia.org/wiki/xpath http://is.gd/xpath10 Tutorial sheet 5 is now online. This involves writing an XML DTD and XPath queries, and running command-line tools which use them. There s quite a lot to do in this one, so start soon.