Computer Science E-259

Similar documents
Big Data 9. Data Models

Faster XML data validation in a programming language with XML datatypes

This Documentation, which includes embedded help systems and electronically distributed materials, (hereinafter referred to

The Prague Markup Language (Version 1.1)

DBS2: Exkursus XQuery and XML-Databases. Jan Sievers Jens Hündling Lars Trieloff

Data Presentation and Markup Languages

Describing Document Types: The Schema Languages of XML Part 2

Outline. XML vs. HTML and Well Formed vs. Valid. XML Overview. CSC309 Tutorial --XML 4. Edward Xia

Author: Irena Holubová Lecturer: Martin Svoboda

Big Data for Engineers Spring Data Models

The concept of DTD. DTD(Document Type Definition) Why we need DTD

XML Format Plug-in User s Guide. Version 10g Release 3 (10.3)

XML, DTD, and XPath. Announcements. From HTML to XML (extensible Markup Language) CPS 116 Introduction to Database Systems. Midterm has been graded

Advances in Programming Languages

SDPL : XML Basics 2. SDPL : XML Basics 1. SDPL : XML Basics 4. SDPL : XML Basics 3. SDPL : XML Basics 5

Katz_C01.qxd 7/23/03 2:45 PM Page 1. Part I BASICS

Chapter 5: XPath/XQuery Data Model

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington

Extensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013

Outline. XML DOCTYPE External - SYSTEM. XML DOCTYPE Internal DTD &6&7XWRULDO ;0/ (GZDUG;LD

Querying XML 4/2/2008. USC - CSCI585 - Spring Farnoush Banaei-Kashani

Big Data Fall Data Models

W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes

Semistructured data, XML, DTDs

Chapter 1: Getting Started. You will learn:

MetaBase Modeler User s Guide MetaMatrix Products, Release 4.2 SP2 (Second Service Pack for Release 4.2) Document Edition 1, June 10, 2005

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11

Chapter 1: Semistructured Data Management XML

Chapter 13 XML: Extensible Markup Language

XPath and XQuery. Introduction to Databases CompSci 316 Fall 2018

Session [2] Information Modeling with XSD and DTD

Chapter 1: Semistructured Data Management XML

Overview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54

WEB SERVICES FOR MANAGEMENT

Computer Science E-1. Understanding Computers and the Internet. Lecture 10: Website Development Wednesday, 29 November 2006

Marker s feedback version

Big Data 11. Data Models

2006 Martin v. Löwis. Data-centric XML. Document Types

Introduction to XML Zdeněk Žabokrtský, Rudolf Rosa

UNIT I. A protocol is a precise set of rules defining how components communicate, the format of addresses, how data is split into packets

Tutorial 2: Validating Documents with DTDs

Introduction to XML. Leonidas Fegaras. CSE 6331 Leonidas Fegaras XML 1

Computer Science E-75 Building Dynamic Websites

Introduction to XML. An Example XML Document. The following is a very simple XML document.

Measuring the Capacity of an XML Schema

W3C XML Schemas For Publishing

EMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents

Big Data 10. Querying

IBM. XML and Related Technologies Dumps Braindumps Real Questions Practice Test dumps free

Computer Science E-259

Big Data 12. Querying

Cyrus Shahabi Computer Science Department University of Southern California

XML module 2. Creating XML. Hans C. Arents. senior IT market analyst. I.T. Works. Guiding the IT Professional

Delivery Options: Attend face-to-face in the classroom or via remote-live attendance.

What is XHTML? XHTML is the language used to create and organize a web page:

Semistructured Data and XML

extensible Markup Language

HTML Overview. With an emphasis on XHTML

Markup Languages SGML, HTML, XML, XHTML. CS 431 February 13, 2006 Carl Lagoze Cornell University

Part 2: XML and Data Management Chapter 6: Overview of XML

Information Systems Modelling Information Systems II: XML

The XQuery Data Model

Modelling XML Applications (part 2)

Delivery Options: Attend face-to-face in the classroom or remote-live attendance.

Introduction to HTML5

COMP9321 Web Application Engineering. Extensible Markup Language (XML)

Constructing a Document Type Definition (DTD) for XML

XML (Extensible Markup Language)

Part VII. Querying XML The XQuery Data Model. Marc H. Scholl (DBIS, Uni KN) XML and Databases Winter 2005/06 153

Fundamentals of Web Programming a

Introduction to XML. When talking about XML, here are some terms that would be helpful:

CSCI3030U Database Models

XML. Semi-structured data (SSD) SSD Graphs. SSD Examples. Schemas for SSD. More flexible data model than the relational model.

Why do we need an XML query language? XQuery: An XML Query Language CS433. Acknowledgment: Many of the slides borrowed from Don Chamberlin.

Java EE 7: Back-end Server Application Development 4-2

Advanced Java Programming

XML Applications. Prof. Andrea Omicini DEIS, Ingegneria Due Alma Mater Studiorum, Università di Bologna a Cesena

Structured documents

1Z Java SE 5 and 6, Certified Associate Exam Summary Syllabus Questions

XML. extensible Markup Language. Overview. Overview. Overview XML Components Document Type Definition (DTD) Attributes and Tags An XML schema

COMP9321 Web Application Engineering

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

Oracle Berkeley DB XML. API Reference for C++ 12c Release 1

CSI 3140 WWW Structures, Techniques and Standards. Representing Web Data: XML

XML Schema Profile Definition

XML. Objectives. Duration. Audience. Pre-Requisites

Introduction to XML. XML: basic elements

XML. extensible Markup Language. ... and its usefulness for linguists

extensible Markup Language (XML) Basic Concepts

XML Information Set. Working Draft of May 17, 1999

XQuery. Web Data Management and Distribution. Serge Abiteboul Philippe Rigaux Marie-Christine Rousset Pierre Senellart

XML and DTD. Mario Alviano A.Y. 2017/2018. University of Calabria, Italy 1 / 28

XML. Document Type Definitions XML Schema. Database Systems and Concepts, CSCI 3030U, UOIT, Course Instructor: Jarek Szlichta

About the Data-Flow Complexity of Web Processes

Introduction to XML. M2 MIA, Grenoble Université. François Faure

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML

Introduction p. 1 An XML Primer p. 5 History of XML p. 6 Benefits of XML p. 11 Components of XML p. 12 BNF Grammar p. 14 Prolog p. 15 Elements p.

XQuery. Leonidas Fegaras University of Texas at Arlington. Web Databases and XML L7: XQuery 1

(One) Layer Model of the Semantic Web. Semantic Web - XML XML. Extensible Markup Language. Prof. Dr. Steffen Staab Dipl.-Inf. Med.

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

Transcription:

Computer Science E-259 XML with Java, Java Servlet, and JSP Lecture 8: XQuery 1.0 and DTD 19 November 2007 David J. Malan malan@post.harvard.edu 1

Last Time HTTP 1.1, JavaServer Pages 2.1, and Java Servlet 2.5 HTTP 1.1 n-tier Enterprise Applications JavaServer Pages 2.1 Java Servlet 2.5 Project 3 2

Last Time Typical J2EE Architecture Laptop Computer PDA Client XML? XML? XML? HTTP Presentation JSP/servlet JSP/servlet Web Server XML? XML? RMI Business Logic EJB EJB XML? EJB Server XML? JDBC 3 Data DB

Last Time Wahoo! Computer You Write! Client Tier Login servlet Prefs servlet View servlet webserver Middle Tier UserManager NewsProvider moreover.com User DB Back-End Tier 4

Computer Science E-259 This Time XQuery 1.0 DTD Project 3 5

XQuery 1.0 History Recommendation as of 1/07. XML is a versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. 6

XQuery 1.0 XPath 2.0 Sequences Data types Enhanced function set Multiple sources 7

XQuery 1.0 Path Expressions doc("books.xml") doc("books.xml")/bib/book/title doc("books.xml")//title doc("books.xml")/bib/book[price<50] 8 Adapted from http://www.w3schools.com/xquery/xquery_example.asp.

XQuery 1.0 FLWOR Expressions FLWORExpr ::= (ForClause LetClause)+ WhereClause? OrderByClause? "return" ExprSingle 9 Excerpted from http://www.w3.org/tr/xquery/.

XQuery 1.0 FLWOR Expressions for $x in doc("books.xml")/bib/book where $x/price>50 order by $x/title return $x/title 10 Adapted from http://www.w3schools.com/xquery/xquery_example.asp.

XQuery 1.0 FLWOR Expressions <bib> <book> <title>tcp/ip Illustrated</title> <author>stevens</author> <publisher>addison-wesley</publisher> </book> <book> <title>advanced Unix Programming</title> <author>stevens</author> <publisher>addison-wesley</publisher> </book> <book> <title>data on the Web</title> <author>abiteboul</author> <author>buneman</author> <author>suciu</author> </book> </bib> 11 Excerpted from http://www.w3.org/tr/xquery/.

XQuery 1.0 FLWOR Expressions 12 <authlist> { for $a in fn:distinct-values($books//author) order by $a return <author> <name> { $a/text() } </name> <books> { for $b in $books//book[author = $a] order by $b/title return $b/title } </books> </author> } </authlist> Adapted from http://www.w3.org/tr/xquery/.

XQuery 1.0 FLWOR Expressions 13 <authlist> <author> <name>abiteboul</name> <books> <title>data on the Web</title> </books> </author> <author> <name>buneman</name> <books> <title>data on the Web</title> </books> </author> <author> <name>stevens</name> <books> <title>tcp/ip Illustrated</title> <title>advanced Unix Programming</title> </books> </author> <author> <name>suciu</name> <books> <title>data on the Web</title> </books> </author> </authlist> Excerpted from http://www.w3.org/tr/xquery/.

XQuery 1.0 Sequence Expressions for $d in doc("depts.xml")//deptno let $e := doc("emps.xml")//emp[deptno = $d] where count($e) >= 10 order by avg($e/salary) descending return <big-dept> { $d, <headcount>{count($e)}</headcount>, <avgsal>{avg($e/salary)}</avgsal> } </big-dept> 14 Example excerpted from http://www.w3.org/tr/xquery/.

XQuery 1.0 Conditional Expressions FOR $h IN doc("library.xml")//holding RETURN <holding> { $h/title, IF ($h/@type = "Journal") THEN $h/editor ELSE $h/author } </holding> 15 Example adapted from http://www.brics.dk/~amoeller/xml/querying/condexp.html.

XQuery 1.0 Quantified Expressions FOR $b IN doc("bib.xml")//book WHERE SOME $p IN $b//paragraph SATISFIES (contains($p,"sailing") AND contains($p,"windsurfing")) RETURN $b/title FOR $b IN doc("bib.xml")//book WHERE EVERY $p IN $b//paragraph SATISFIES contains($p,"sailing") RETURN $b/title 16 Examples adapted from http://www.brics.dk/~amoeller/xml/querying/quantexp.html.

XQuery 1.0 Data Types String-related ENTITIES, ENTITY, ID, IDREF, IDREFS, language, Name, NCName, NMTOKEN, NMTOKENS, normalizedstring, QName, string, token Date-related date, datetime, duration, gday, gmonth, gmonthday, gyear, gyearmonth, time Number-related base64binary, byte, decimal, double, float, hexbinary, int, integer, long, negativeinteger, nonpositiveinteger, positiveinteger, short, unsignedlong, unsignedint, unsignedshort, unsignedbyte Err, unrelated anyuri, boolean, NOTATION,... User-Defined 17

XQuery 1.0 Expressions on Sequence Types Instance of <a>{5}</a> instance of xs:integer Typeswitch typeswitch($customer/billing-address) case $a as element(*, USAddress) return $a/state case $a as element(*, CanadaAddress) return $a/province case $a as element(*, JapanAddress) return $a/prefecture default return "unknown" Cast and Castable if ($x castable as hatsize) then $x cast as hatsize else if ($x castable as IQ) then $x cast as IQ else $x cast as xs:string 18 Examples excerpted from http://www.w3.org/tr/xquery/.

Well-Formedness <moreovernews> [...] <article id="_840925179"> <url>http://c.moreover.com/click/here.pl?x840925179</url> <headline_text>whose Genome Is It, Anyway?</headline_text> <source>discover</source> <media_type>text</media_type> <cluster>moreover...</cluster> <tagline></tagline> <document_url>http://discovermagazine.com</document_url> <harvest_time>mar 11 2007 8:46AM</harvest_time> <access_registration></access_registration> <access_status></access_status> </article> [...] </moreovernews> 19 Excerpted from http://www.fas.harvard.edu/~cscie259/distribution/projects/project3-7.0/root/xml/cache/biotech%2520news.xml.

Validity <!ELEMENT moreovernews (article*)> <!ELEMENT article (url, headline_text, source, media_type, cluster, tagline, document_url, harvest_time, access_registration, access_status)> <!ATTLIST article id ID #IMPLIED> <!ELEMENT url (#PCDATA)> <!ELEMENT headline_text (#PCDATA)> <!ELEMENT source (#PCDATA)> <!ELEMENT media_type (#PCDATA)> <!ELEMENT cluster (#PCDATA)> <!ELEMENT tagline (#PCDATA)> <!ELEMENT document_url (#PCDATA)> <!ELEMENT harvest_time (#PCDATA)> <!ELEMENT access_registration (#PCDATA)> <!ELEMENT access_status (#PCDATA)> 20 Available at http://www.fas.harvard.edu/~cscie259/distribution/projects/project3-7.0/root/dtd/moreovernews.dtd.

XHTML 1.0 Transitional <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> <title/> </head> <body/> </html> 21 Available at http://www.fas.harvard.edu/~cscie259/distribution/lectures/8/examples8/xhtml.html.

Overview A DTD is a definition of an XML document's schema Codifies what the structure of a document must be The relationships between the components of the document What data is allowed where The DTD language was released as part of the official XML specification XML Schema is a more modern, powerful way to accomplish the same goals However, DTDs are still widely in use, and are supported as the primary method of validating XML 22

Motivation DTDs, or schemas in general, are a contracts for what make a certain type of XML document DTDs allow you to check whether a document "instance" is "valid" with respect to its schema (in contrast with its simply being well-formed) DTDs provide a place to specify what belongs in elements, attributes, and what individual elements represent, etc. Particularly useful in B2B transactions where agreeing on a data format is important DTDs encapsulate good document design so you can benefit from it Why reinvent a document standard when there is DocBook? http://www.oasis-open.org/specs/index.php#dbv4.1 Why reinvent a financial exchange standard when there is OFX? http://www.ofx.net/ofx/specview/specview.html Why reinvent a voice standard when there is VoiceXML? http://www.w3.org/tr/voicexml20/vxml.dtd 23

To DTD or not to DTD It depends on the application DTDs (or schemas in general) are crucial when a common understanding of data is important XML makes data interchange easier from a technical standpoint, but it still doesn't eliminate human misunderstandings I say <price>, you say <cost> Writing a DTD can help you design a good data model All the principles of proper data modeling apply to XML as well However, DTDs constrain XML flexibility As soon as you have a DTD, your data model is less extensible At least, changes require distribution of a new DTD 24

A SONG Element <SONG> <TITLE>Everyday</TITLE> <COMPOSER>Dave</COMPOSER> <COMPOSER>Boyd Tinsley</COMPOSER> <PRODUCER>Dave Matthews</PRODUCER> <PUBLISHER>BMG</PUBLISHER> <LENGTH>12:20</LENGTH> <YEAR>2001</YEAR> <ARTIST>Dave Matthews Band</ARTIST> </SONG> 25 Excerpted from http://www.fas.harvard.edu/~cscie259/distribution/lectures/8/examples8/song{1,2}.xml.

A DTD for SONG Elements <!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT COMPOSER (#PCDATA)> <!ELEMENT PRODUCER (#PCDATA)> <!ELEMENT PUBLISHER (#PCDATA)> <!ELEMENT LENGTH (#PCDATA)> <!ELEMENT YEAR (#PCDATA)> <!ELEMENT ARTIST (#PCDATA)> 26 Available at http://www.fas.harvard.edu/~cscie259/distribution/lectures/8/examples8/song.dtd.

The <!ELEMENT> Declaration <!ELEMENT element_name (content_model)> 27

The <!ELEMENT> Declaration Gives the name and content model of an element The name must be unique The content model specifies what the valid child content can be #PCDATA <!ELEMENT TITLE (#PCDATA)> EMPTY <!ELEMENT course EMPTY> Elements Mixed ANY <!ELEMENT comment ANY> 28

Element Content The most sophisticated of content types Allows you to specify a regular expression for the allowed child elements <!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> <!ELEMENT spec (front, body, back?)> <!ELEMENT div1 (head, (p list note)*, div2*)> 29

Building Blocks of Regular Expressions foo? The foo element must occur 0 times or exactly 1 time. foo* The foo element may occur 0 or more times. foo+ The foo element must occur 1 or more times. (foo bar baz) Either foo or bar or baz must appear exactly 1 time. (foo,bar,baz) 1 instance of foo must occur, followed by 1 instance of bar, followed by 1 instance of baz. 30

Mixed Content When both character and element content can be interspersed, the names of the elements can be constrained, but not their order or number; and #PCDATA must be declared first! <!ELEMENT p (#PCDATA a u b i em)*> <p>i am <b>bold</b> and <i>italic</i>.</p> <!ELEMENT PO (#PCDATA item shipdate qty)*> <PO><qty>1</qty> <item>flowbee</item> was shipped to you on <shipdate>29 March 2003</shipdate>.</PO> 31 The Flowbee Precision Home Haircut System is available for purchase at http://www.flowbee.com/.

The <!ATTLIST> Declaration <!ATTLIST element_name attribute_name attribute_type default_declaration attribute_name attribute_type default_declaration... > 32

<!ATTLIST> Examples <!ATTLIST termdef id ID #REQUIRED type CDATA #REQUIRED name CDATA #IMPLIED> <!ATTLIST list type (bullets ordered glossary) "ordered"> <!ATTLIST form method CDATA #FIXED post"> <!ATTLIST paper language CDATA "English"> 33

Attribute Types CDATA Character data, including entities. ID Must be unique within document (and must start with a letter ). IDREF Must refer to an ID in document. IDREFS References one or more IDs, separated by spaces. ENTITY Must refer to an entity. ENTITIES References one or more entities, separated by spaces. NMTOKEN Name token devoid of whitespace. NMTOKENS Series of one or more NMTOKENs, separated by spaces. 34

Default Declarations #FIXED Attribute's value is fixed and must be that specified in DTD. #REQUIRED The element is required to have the attribute, and the the attribute is required to have a value. #IMPLIED Attribute is optional. 35

Where do DTDs go? DTDs can be placed in a standalone file known as an "external subset" part of the <!DOCTYPE> declaration in the XML document as an "internal subset" (which overrides any declarations in an external subset) Examples <!DOCTYPE SONG [ <!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)> ]> <!DOCTYPE SONG SYSTEM "song.dtd"> <!DOCTYPE SONG SYSTEM "song.dtd" [ <!ELEMENT ARTIST (FIRST,LAST)> <!ELEMENT FIRST (#PCDATA)> <!ELEMENT LAST (#PCDATA)> ]> 36

Validation javax.xml.parsers.saxparserfactory org.xml.sax.errorhandler 37

Whitespace <foo> <bar/> <baz/> </foo> 38

Similar XML Constructs Entities <!ENTITY nbsp " "> <!ENTITY copyright "Copyright (c) David Malan. All rights reserved."> Notations (http://msxml.com/intro_xml/notation_decl.html) <!NOTATION GIF SYSTEM "GIF Notation"> <!ENTITY watage_logo SYSTEM "watage.gif" NDATA GIF> 39

Shortcomings Not well-formed XML (though still derived from SGML) No built-in data types (e.g., bool, int, float, string, etc.) No support for custom data types (e.g., phone numbers) No pattern-matching No inheritance No support for ranges (e.g., "year must be an integer between 0 and 99", "review can appear as a child of book no more than 10 times", etc.) Not namespace-aware Content models must be deterministic; cannot allow arbitrary ordering of children, as with: <!ELEMENT foo ((bar,baz,qux) (bar,qux,baz) (baz,bar,qux) (baz,qux,bar) (qux,bar,baz) (qux,baz,bar))>... 40

Next Time XML Schema (Second Edition) XML Schema (Second Edition) 41

Computer Science E-259 XML with Java, Java Servlet, and JSP Lecture 8: XQuery 1.0 and DTD 19 November 2007 David J. Malan malan@post.harvard.edu 42