SDPL : XML Basics 2. SDPL : XML Basics 1. SDPL : XML Basics 4. SDPL : XML Basics 3. SDPL : XML Basics 5

Similar documents
Introduction & Motivation. Course Outline. Arrangements. Presentation vs Structure. Markup and Markup Language. Main Topic: Two XML Query Models

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11

XML. Objectives. Duration. Audience. Pre-Requisites

Chapter 1: Getting Started. You will learn:

Author: Irena Holubová Lecturer: Martin Svoboda

Introduction to XML. XML: basic elements

Data Presentation and Markup Languages

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML

CSI 3140 WWW Structures, Techniques and Standards. Representing Web Data: XML

Document Parser Interfaces. Tasks of a Parser. 3. XML Processor APIs. Document Parser Interfaces. ESIS Example: Input document

The concept of DTD. DTD(Document Type Definition) Why we need DTD

The XML Metalanguage

.. Cal Poly CPE/CSC 366: Database Modeling, Design and Implementation Alexander Dekhtyar..

Introduction to XML. An Example XML Document. The following is a very simple XML document.

2006 Martin v. Löwis. Data-centric XML. Document Types

Delivery Options: Attend face-to-face in the classroom or via remote-live attendance.

Introduction to XML Zdeněk Žabokrtský, Rudolf Rosa

extensible Markup Language

Markup Languages SGML, HTML, XML, XHTML. CS 431 February 13, 2006 Carl Lagoze Cornell University

XML 2 APPLICATION. Chapter SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Overview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54

What is XML? XML is designed to transport and store data.

Extensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013

XML for Java Developers G Session 2 - Sub-Topic 1 Beginning XML. Dr. Jean-Claude Franchitti

Delivery Options: Attend face-to-face in the classroom or remote-live attendance.

Java EE 7: Back-end Server Application Development 4-2

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington

M359 Block5 - Lecture12 Eng/ Waleed Omar

XML. COSC Dr. Ramon Lawrence. An attribute is a name-value pair declared inside an element. Comments. Page 3. COSC Dr.

Using UML To Define XML Document Types

CSS, Cascading Style Sheets

- XML. - DTDs - XML Schema - XSLT. Web Services. - Well-formedness is a REQUIRED check on XML documents

extensible Markup Language (XML) Basic Concepts

2009 Martin v. Löwis. Data-centric XML. XML Syntax

Chapter 7: XML Namespaces

XML Overview, part 1

Structured documents

COMP9321 Web Application Engineering

XML Information Set. Working Draft of May 17, 1999

Introduction to XML (Extensible Markup Language)

Introduction to XML. Chapter 133

Oracle Application Server 10g Oracle XML Developer s Kit Frequently Asked Questions September, 2005

XML Extensible Markup Language

Session [2] Information Modeling with XSD and DTD

COMP9321 Web Application Engineering. Extensible Markup Language (XML)

W3C XML XML Overview

Tutorial 2: Validating Documents with DTDs

Session 23 XML. XML Reading and Reference. Reading. Reference: Session 23 XML. Robert Kelly, 2018

XML SCHEMA INFERENCE WITH XSLT

XML. extensible Markup Language. Overview. Overview. Overview XML Components Document Type Definition (DTD) Attributes and Tags An XML schema

CSI 3140 WWW Structures, Techniques and Standards. Markup Languages: XHTML 1.0

XML and DTD. Mario Alviano A.Y. 2017/2018. University of Calabria, Italy 1 / 28

CSC Web Technologies, Spring Web Data Exchange Formats

Part 2: XML and Data Management Chapter 6: Overview of XML

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial.

Introduction Syntax and Usage XML Databases Java Tutorial XML. November 5, 2008 XML

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

XML: Managing with the Java Platform

XML, DTD: Exercises. A7B36XML, AD7B36XML: XML Technologies. Practical Classes 1 and 2: 3. and

XML. Presented by : Guerreiro João Thanh Truong Cong

Chapter 13 XML: Extensible Markup Language

Outline. XML vs. HTML and Well Formed vs. Valid. XML Overview. CSC309 Tutorial --XML 4. Edward Xia

PART. Oracle and the XML Standards

웹기술및응용. XML Basics 2018 년 2 학기. Instructor: Prof. Young-guk Ha Dept. of Computer Science & Engineering

Question Bank XML (Solved/Unsolved) Q.1 Fill in the Blanks: (1 Mark each)

XML. Jonathan Geisler. April 18, 2008

XML for Java Developers G Session 3 - Main Theme XML Information Modeling (Part I) Dr. Jean-Claude Franchitti

XML for Java Developers G Session 3 - Main Theme XML Information Modeling (Part I) Dr. Jean-Claude Franchitti

Chapter 10: Understanding the Standards

[MS-XML]: Microsoft Extensible Markup Language (XML) 1.0 Fourth Edition Standards Support Document

XML and Databases. Lecture 11 XSLT Stylesheets and Transforms. Sebastian Maneth NICTA and UNSW

Well-formed XML Documents

Chapter 1: Semistructured Data Management XML

XML 2 APPLICATION. Chapter SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

XML. XML Syntax. An example of XML:

XML Applications. Prof. Andrea Omicini DEIS, Ingegneria Due Alma Mater Studiorum, Università di Bologna a Cesena

Semistructured data, XML, DTDs

XML. Part I XML Document and DTD

Data Exchange. Hyper-Text Markup Language. Contents: HTML Sample. HTML Motivation. Cascading Style Sheets (CSS) Problems w/html

EMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents

Chapter 1: XML Syntax

Chapter 1: XML Syntax

COPYRIGHTED MATERIAL. Contents. Part I: Introduction 1. Chapter 1: What Is XML? 3. Chapter 2: Well-Formed XML 23. Acknowledgments

COMP9321 Web Application Engineering

XML. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior

CS6501 IP Unit IV Page 1

Intro to XML. Borrowed, with author s permission, from:


TagSoup: A SAX parser in Java for nasty, ugly HTML. John Cowan

XML and Databases. Outline XML. XML, typical usage scenario XML. 1. extensible Stylesheet Language X M L. Sebastian Maneth NICTA and UNSW

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

XML. extensible Markup Language. ... and its usefulness for linguists

Informatique de Gestion 3 èmes Bachelier Groupes 230x

Internet and Web Technologies. Sample Solutions 2013

Appendix H XML Quick Reference

Introduction to XML the Language of Web Services

XDS An Extensible Structure for Trustworthy Document Content Verification Simon Wiseman CTO Deep- Secure 3 rd June 2013

Introduction to XML. Large Scale Programming, 1DL410, autumn 2009 Cons T Åhs

CHAPTER 2 MARKUP LANGUAGES: XHTML 1.0

Transcription:

2 Basics of XML and XML documents 2.1 XML and XML documents Survivor's Guide to XML, or XML for Computer Scientists / Dummies 2.1 XML and XML documents 2.2 Basics of XML DTDs 2.3 XML Namespaces XML 1.0 - Extensible Markup Language, 3C Recommendation, February 1998 not an official standard, but a stable industry standard 2 nd Ed 2000, 3 rd Ed 2004, 4 th Ed 2006, 5 th Ed Nov 2008» editorial revisions, not new versions of XML 1.0 a simplified subset of SGML, Standard Generalized Markup Language, ISO 8879:1987 what is said later about valid XML documents applies to SGML documents, too SDPL 2011 2: XML Basics 1 SDPL 2011 2: XML Basics 2 hat is XML? Semantics of XML Markup Extensible Markup Language is not a markup language! does not fix a tag set nor its semantics (unlike markup languages, e.g. HTML) XML documents have no inherent (processing or presentation) semantics even though many think that XML is semantic or self- describing; See next SDPL 2011 2: XML Basics 3 Meaning of this XML fragment? The computer cannot know it either! Implementing the semantics is the topic of this course SDPL 2011 2: XML Basics 4 hat is XML (2)? How does it look? XML is a (A) way to use markup to represent information (B) metalanguage» supports definition of specific markup languages through XML DTDs or Schemas» E.g. XHTML a reformulation of HTML using XML (C) Often XML XML + XML technology that is, processing models and languages we re studying (and many others...) SDPL 2011 2: XML Basics 5 <?xml version= 1.0 encoding= iso-8859-1?> <invoice num= 1234 > <client clnum= 00-01 > 01 > <name>pekka Kilpeläinen</name> <email>kilpelai@cs.uku.fi</email> </client> <item price= 60 unit= EUR > XML Handbook</item> <item price= 350 unit= FIM > XSLT Programmer s Ref</item> </invoice> SDPL 2011 2: XML Basics 6 1

Essential Features of XML Overview of XML essentials many details skipped» some to be discussed in exercises or with other topics when the need arises Learn to consult original sources (specifications, documentation etc) for details!» The XML specification is easy to browse SDPL 2011 2: XML Basics 7 XML Document Characters XML documents are made of ISO-10646 (32-bit) characters; ; in practice of 16-bit Unicode chars (cf. Java) Three aspects or characters (see next): representation by bytes/octets numeric code points 97 98 99... visual presentation SDPL 2011 2: XML Basics 8 External Aspects of Characters XML Encoding of Structure 1 Documents are stored/transmitted as a sequence of bytes (or octets). An encoding determines how characters are represented by bytes. UTF-8 ( 7-bit ASCII) is the XML default encoding encoding="iso-8859-1" ~> 256 estern European chars as single bytes A font (collection of character images called glyphs) ) determines the visual presentation of characters XML is, essentially, just a textual encoding scheme of labelled, ordered and attributed trees,, in which internal nodes are elements labelled by type names leaves are text nodes labelled by string values, or empty element nodes the left-to-right order of children of a node matters element nodes may carry attributes (= name-string-value pairs) SDPL 2011 2: XML Basics 9 SDPL 2011 2: XML Basics 10 XML Encoding of Structure 2 XML Encoding of Structure: Example XML encoding of a tree corresponds to a pre-order walk start of an element node with type name A denoted by a start tag <A>, and its end denoted by end tag </A> possible attributes written within the start tag: <A attr 1 = value 1 attr n = value n >» names must be unique: attr k attr h when k h text nodes written as their string value SDPL 2011 2: XML Basics 11 Hello S E A=1 <S> <> Hello </> <E A= 1 /> <>world! </> </S> SDPL 2011 2: XML Basics 12 world! 2

XML: Logical Document Structure Elements indicated by matching (case-sensitive!) sensitive!) tags <ElementTypeName> </ElementTypeName> can contain text and/or subelements can be empty: <elem-type></elem-type> or <elem-type/> (e.g. <br/> in XHTML) unique root element document a single tree Logical document structure (2) Attributes name-value pairs attached to elements metadata, usually not treated as content in start-tag tag after the element type name Also: <div class="preface" date='990126'> <!-- comments outside other markup --> <?PI value of processing instruction PI?> SDPL 2011 2: XML Basics 13 SDPL 2011 2: XML Basics 14 CDATA Sections Two levels of correctness (1) CDATA Sections to include XML markup characters as textual content <![CDATA[ Here we can include for example code fragments: <example>if (Count < 5 && Count > 0) </example> ]]> SDPL 2011 2: XML Basics 15 ell-formed documents roughly: follows the syntax of XML, markup correct (elements properly nested, tag names match, attributes of an element have unique names,...) violation is a fatal error Valid documents (in addition to being well-formed) obey an associated grammar (DTD/Schema) SDPL 2011 2: XML Basics 16 XML docs and valid XML docs An XML Processor (Parser) DTD-valid documents Schema-valid documents XML documents = well-formed XML documents Reads XML documents and reports their contents to an application relieves the application from details of markup XML Recommendation specifies, partly, the behaviour of XML processors: recognition of characters as markup or data; what information to pass to applications; how to check the correctness of documents; validation based on comparing document against its grammar SDPL 2011 2: XML Basics 17 SDPL 2011 2: XML Basics 18 3

2. 2 Basics of XML DTDs Markup Declarations A Document Type Declaration provides a grammar (document type definition, DTD) ) for a class of documents [Defined in XML Rec ec] Syntax (in the prolog of a document instance): <!DOCTYPE rootelemtype SYSTEM "ex.dtd" <!-- "external subset" in file ex.dtd --> [ <!-- "internal subset" may come here --> ]> DTD = union of the external and internal subset (could be empty); internal has higher precedence can override entity and attribute declarations (see next) SDPL 2011 2: XML Basics 19 DTD consists of markup declarations element type declarations» similar to productions of ECFGs attribute-list declarations» for declared element types entity declarations» short-hand hand notations and physical storage units notation declarations» information about external (binary) objects SDPL 2011 2: XML Basics 20 How do Declarations Look Like? Element type declarations <!ELEMENT invoice (client, item+)> <!ATTLIST invoice num NMTOKEN #REQUIRED> <!ELEMENT client (name, email?)> <!ATTLIST client num NMTOKEN #IMPLIED> <!ELEMENT name (#PCDATA)> <!ELEMENT email (#PCDATA)> <!ELEMENT item (#PCDATA)> <!ATTLIST item price NMTOKEN #REQUIRED unit (FIM EUR) EUR > string of name characters; (Letter [0-9]. - _ : ) + SDPL 2011 2: XML Basics 21 The general form is <!ELEMENT elemtype ( E )> where E is a content model (regular expr.); ECFG production elemtype E ) Content model operators: E F : alternation E, F: concatenation E? : optional E* : zero or more E+ : one or more (E) : grouping No priorities: : either (A,B) C or A,(B C), but no A,B C SDPL 2011 2: XML Basics 22 Attribute-List Declarations Mixed, Empty and Arbitrary Content Can declare attributes for elements: Name, data type and possible default value: <!ATTLIST FIG id ID #IMPLIED descr CDATA #REQUIRED class (a b c) "a"> Semantics mainly up to the application parser checks that ID attributes are unique and that targets of IDREF and IDREFS attributes exist SDPL 2011 2: XML Basics 23 Mixed content: <!ELEMENT P (#PCDATA I IMG)*> may contain text (#PCDATA)) and elements Empty content: <!ELEMENT IMG EMPTY> Arbitrary content: <!ELEMENT HTML ANY> (= <!ELEMENT HTML (#PCDATA choice-of-all-declared-element-types)*> )*>) SDPL 2011 2: XML Basics 24 4

Entities (1) Named storage units or XML fragments (~ macros in some languages) character entities:» << and< all expand to < (treated as data, not as start-of-markup)» other predefined entities: & > &apos apos; &quote; for & > ' " general entities are shorthand notations: <!ENTITY CS School of Computing"> SDPL 2011 2: XML Basics 25 Entities (2) physical storage units comprising a document parsed entities <!ENTITY chap1 SYSTEM "http://myweb/ch1"> document entity is the starting point of processing entities and elements must nest properly: <!DOCTYPE doc [ <!ENTITY chap1 ( as above ) > ] <doc> &chap1; </doc> <sec num="1"> </sec> <sec num="2"> </sec> SDPL 2011 2: XML Basics 26 Unparsed Entities An Alternative ay For connecting external binary objects to XML documents; (XML processor handles only text) <!NOTATION TIFF "bin/gimp"> <!ENTITY fig123 SYSTEM "figs/f123.tif" NDATA TIFF> <!ATTLIST IMG file ENTITY #REQUIRED> Usage: <IMG file="fig123"/> parser provides notation and address to the application an SGML legacy technique(?) SDPL 2011 2: XML Basics 27 I have rarely used unparsed entities or notations Easier "HTML-style" linking: <IMG type="tiff" file="figs/fig123"/> the link indicates the format and the address directly maintenance of numerous entities might be easier with (indirect references through) explicit declarations SDPL 2011 2: XML Basics 28 Parameter Entities to parameterize and modularize DTDs: <!ENTITY % tabledtd SYSTEM "dtds/tab.dtd"> %tabledtd; <!-- include sub-dtd --> <!ENTITY % stattr "ready (yes no) 'no'"> <!ATTLIST CHAP %stattr; > <!ATTLIST SECT %stattr; > (Parameter entities inside a markup declaration allowed only in external DTD subset) Speculations about XML Parsing Parsing involves two things: 1. Pulling the entities together, and checking the well- formedness 2. Building a parse tree for the input (a'la DOM), or otherwise informing the application about document content and structure (e.g. a'la SAX) Task 2 is simple ( simplicity of XML markup; see next) Checking well-formedness is straightforward; Implementing validation is a bit more challenging SDPL 2011 2: XML Basics 29 SDPL 2011 2: XML Basics 30 5

Building an XML Parse Tree Simplified XML Tree Building Algorithm Hello S E <S> <> Hello </><E> <E> </E> <> world! </></S> </S> world! C new Document(); // current node Uses DOM like Scan document text until at end: tree operations case of scanned text: "<Type>": N new ElementNode(Type); C.addChildNode(N); C N; ": C.addChildNode(new TextNode(Txt)); Type>": C C.parentNode(); "Txt": "</ </Type return C; SDPL 2011 2: XML Basics 31 SDPL 2011 2: XML Basics 32 Sketching XML validation Treat the document as a tree d Document is valid w.r.t.. a grammar (DTD/Schema) G iff d is a syntax tree over G Check that the root is labelled by the start symbol of G For each element node n of the tree, check that its» attributes match with those of the element type» content matches the content model of its type: If n is of type A and its children of type B 1,, B n, check that the grammar has a production A E for which Sketching the validation (2) How to check condition (1; matching of children with a content model)? by an automaton built from content model E Example: <!ELEMENT A ((B C)+, D)> B 1 B n L(E) (1) SDPL 2011 2: XML Basics 33 SDPL 2011 2: XML Basics 34 2.3 XML Namespaces Documents often comprise parts processed by different applications (and/or defined in different schemas) for example, in XSLT scripts: <xsl:template match="doc/title"> <H1> HTML <xsl:apply-templates /> elements </H1> </xsl:template> XSLT elements/ How to manage multiple sets of names? instructions XML Namespaces (2/5) Solution: XML Namespaces, 3C Rec. Jan 99, for separating possibly overlapping vocabularies (sets of element type and attribute names) within a single document by introducing (arbitrary) local name prefixes,, and binding them to (fixed) globally unique URIs For example, the local prefix xsl: conventionally used in XSLT scripts SDPL 2011 2: XML Basics 35 SDPL 2011 2: XML Basics 36 6

XML Namespaces briefly (3/5) XML Namespaces (4/5) Namespace identified by a URI (through the associated local prexif) e.g.http://www.w3.org/ http://www.w3.org/1999/xsl/transform for XSLT conventional but not required to use URLs the identifying URI has to be unique, but it does not have to be an existing address Association inherited to sub-elements see the next example (of an XSLT script) <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/xsl/transform" xmlns="http://www.w3.org/tr/xhtml1/strict"> <!-- XHTML is the default namespace --> <xsl:template match="doc/title"> <H1> <xsl:apply-templates /> </H1> </xsl:template> </xsl:stylesheet> SDPL 2011 2: XML Basics 37 SDPL 2011 2: XML Basics 38 XML Namespaces briefly (5/5) Built on top of basic XML By overloading attribute syntax (xmlns:foo=... =...) does not affect validation» namespace attributes must be declared for DTD-validity» all element type names must be declared (with their prefixes!) > Other schema languages (XML Schema, Relax NG) better for validating documents with Namespaces Processing languages allow to declare NS prefixes; see next SDPL 2011 2: XML Basics 39 Example: Namespaces in XSLT XML: <test xmlns:foo="http://www.uef.fi/test"> <foo:a>aaa</ </foo:a> <foo:b>bbb</ </foo:b> </test> <xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/xsl/transform" xmlns:bar="http://www.uef.fi/test"> <xsl:template match="/"> <xsl:apply-templates select="//bar:*" /> </xsl:template> </xsl:transform> SDPL 2011 2: XML Basics 40 Example: Namespaces in XQuery ns-test.xml : <test xmlns:foo="http://www.uef.fi/test"> <foo:a>aaa</ </foo:a> <foo:b>bbb</ </foo:b> </test> xquery version "1.0"; declare namespace zap = "http://www.uef.fi/test"; doc("ns-test.xml")//zap:* For NS support in XML APIs, see next SDPL 2011 2: XML Basics 41 7