Author: Irena Holubová Lecturer: Martin Svoboda

Similar documents
XML Technologies. Doc. RNDr. Irena Holubova, Ph.D. Web pages:

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11

SDPL : XML Basics 2. SDPL : XML Basics 1. SDPL : XML Basics 4. SDPL : XML Basics 3. SDPL : XML Basics 5

Introduction to XML. An Example XML Document. The following is a very simple XML document.

Extensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013

Chapter 1: Getting Started. You will learn:

Introduction to XML Zdeněk Žabokrtský, Rudolf Rosa

Data Presentation and Markup Languages

The concept of DTD. DTD(Document Type Definition) Why we need DTD

웹기술및응용. XML Basics 2018 년 2 학기. Instructor: Prof. Young-guk Ha Dept. of Computer Science & Engineering

Delivery Options: Attend face-to-face in the classroom or remote-live attendance.

2009 Martin v. Löwis. Data-centric XML. XML Syntax

2006 Martin v. Löwis. Data-centric XML. Document Types

Structured documents

Outline. XML vs. HTML and Well Formed vs. Valid. XML Overview. CSC309 Tutorial --XML 4. Edward Xia

XML. extensible Markup Language. Overview. Overview. Overview XML Components Document Type Definition (DTD) Attributes and Tags An XML schema

XML and DTD. Mario Alviano A.Y. 2017/2018. University of Calabria, Italy 1 / 28

Introduction to XML. XML: basic elements

Overview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54

Delivery Options: Attend face-to-face in the classroom or via remote-live attendance.

extensible Markup Language (XML) Basic Concepts

COMP9321 Web Application Engineering. Extensible Markup Language (XML)

W3C XML XML Overview

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

Chapter 1: XML Syntax

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML

Markup Languages SGML, HTML, XML, XHTML. CS 431 February 13, 2006 Carl Lagoze Cornell University

.. Cal Poly CPE/CSC 366: Database Modeling, Design and Implementation Alexander Dekhtyar..

Introduction to XML (Extensible Markup Language)

COMP9321 Web Application Engineering

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

XML Technologies. Doc. RNDr. Irena Holubova, Ph.D. Web page:

XML. Objectives. Duration. Audience. Pre-Requisites

XML 2 APPLICATION. Chapter SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Chapter 1: XML Syntax

Introduction Syntax and Usage XML Databases Java Tutorial XML. November 5, 2008 XML

Introduction to XML. Chapter 133

Tutorial 2: Validating Documents with DTDs

CSI 3140 WWW Structures, Techniques and Standards. Markup Languages: XHTML 1.0

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

CHAPTER 2 MARKUP LANGUAGES: XHTML 1.0

CSI 3140 WWW Structures, Techniques and Standards. Representing Web Data: XML

Part 2: XML and Data Management Chapter 6: Overview of XML

M359 Block5 - Lecture12 Eng/ Waleed Omar

Session [2] Information Modeling with XSD and DTD

EMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents

Chapter 13 XML: Extensible Markup Language

XML & Related Languages

XML Structures. Web Programming. Uta Priss ZELL, Ostfalia University. XML Introduction Syntax: well-formed Semantics: validity Issues

XML, DTD: Exercises. A7B36XML, AD7B36XML: XML Technologies. Practical Classes 1 and 2: 3. and

XML. XML Syntax. An example of XML:

XML. extensible Markup Language. ... and its usefulness for linguists

Introduction to XML. National University of Computer and Emerging Sciences, Lahore. Shafiq Ur Rahman. Center for Research in Urdu Language Processing

Chapter 1: Semistructured Data Management XML

CSS, Cascading Style Sheets

XML stands for Extensible Markup Language and is a text-based markup language derived from Standard Generalized Markup Language (SGML).

CSC Web Technologies, Spring Web Data Exchange Formats

XML 2 APPLICATION. Chapter SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington

COMP9321 Web Application Engineering

[MS-XML]: Microsoft Extensible Markup Language (XML) 1.0 Fourth Edition Standards Support Document

XML Information Set. Working Draft of May 17, 1999

Chapter 1: Semistructured Data Management XML

What is XML? XML is designed to transport and store data.

PART. Oracle and the XML Standards


Fundamentals of Web Programming a

XML: Managing with the Java Platform

Session 23 XML. XML Reading and Reference. Reading. Reference: Session 23 XML. Robert Kelly, 2018

XML. Semi-structured data (SSD) SSD Graphs. SSD Examples. Schemas for SSD. More flexible data model than the relational model.

Editor s Concrete Syntax (ECS): a Profile of SGML for Editors

but XML goes far beyond HTML: it describes data

Java EE 7: Back-end Server Application Development 4-2

XML module 2. Creating XML. Hans C. Arents. senior IT market analyst. I.T. Works. Guiding the IT Professional

Databases and Internet Applications

Solutions. a. Yes b. No c. Cannot be determined without the DTD. d. Schema. 9. Explain the term extensible. 10. What is an attribute?

B4M36DS2, BE4M36DS2: Database Systems 2

XML. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior

Constructing a Document Type Definition (DTD) for XML

XML: Extensible Markup Language

The main Topics in this lecture are:

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

Introduction to XML. Asst. Prof. Dr. Kanda Runapongsa Saikaew Dept. of Computer Engineering Khon Kaen University

Chapter 10: Understanding the Standards

The XML Metalanguage

XML: and related technologies

Outline. XML DOCTYPE External - SYSTEM. XML DOCTYPE Internal DTD &6&7XWRULDO ;0/ (GZDUG;LD

Introduction to XML 3/14/12. Introduction to XML

Informatique de Gestion 3 èmes Bachelier Groupes 230x

Overview. Structured Data. The Structure of Data. Semi-Structured Data Introduction to XML Querying XML Documents. CMPUT 391: XML and Querying XML

Enhanced XML Retrieval with Flexible Constraints Evaluation

XML Extensible Markup Language

Semistructured data, XML, DTDs

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial.

XML, DTD, and XPath. Announcements. From HTML to XML (extensible Markup Language) CPS 116 Introduction to Database Systems. Midterm has been graded

Introduction to Database Systems CSE 414

Module 2 (III): XHTML

Appendix H XML Quick Reference

Fundamentals of Web Programming a

Information Systems. XML Essentials. Nikolaj Popov

Transcription:

NPRG036 XML Technologies Lecture 1 Introduction, XML, DTD 19. 2. 2018 Author: Irena Holubová Lecturer: Martin Svoboda http://www.ksi.mff.cuni.cz/~svoboda/courses/172-nprg036/

Lecture Outline Introduction XML XML technologies DTD

Introduction to XML format

Motivation Node A We want to exchange information Node B

E.g. we want to send a message Tim Berners-Lee, Robert Cailliau Hi! My Internet does not work! Steve J. P.S. Help me!

as a unstructured text? Tim Berners-Lee, Robert Cailliau Hi! My Internet does not work! Steve J. P.S. Help me!

as a unstructured text? Tim Berners-Lee, Robert Cailliau Hi! My Internet does not work! Steve J. P.S. Help me! But how to find out (automatically) who sends the message?

Let us introduce tags start tag <tag>body</tag> end tag

We can tag parts of the message <address>tim Berners-Lee</address> <address>robert Cailliau</address> <intro>hi!</intro> <text>my Internet does not work!</text> <signature>steve J.</signature> <PS>Help me!</ps> data metadata

And the whole message <message> <address>tim Berners-Lee</address> <address>robert Cailliau</address> <intro>hi!</intro> <text>my Internet does not work!</text> <signature>steve J.</signature> <PS>Help me!</ps> </message>

In general to process the data automatically To show the correct content in a browser it is not sufficient <message> <address>tim Berners-Lee</address> <address>robert Cailliau</address> <intro>hi!</intro> <text>my Internet does not work!</text> <signature>steve J.</signature> <PS>Help me!</ps> </message>

We need more information Format: XML + version Encoding: By default the document is in ISO 10646 (Unicode) To communicate with the whole world we can use UTF- 8 Compatible with ASCII Contains all characters of all languages For the Czech language we have ISO-8859-2 or Windows-1250

Better, but Not much user friendly The browser shows also the meta data

We can, e.g., transform the document into HTML <html encoding="windows-1250"> <head> <title>message from: Steve J.</title> </head> <body> <p>tim Berners-Lee</p> <p>robert Cailliau</p> <p>hi!</p> <p>my Internet does not work!</p> <p>steve J.</p> <p>help me!</p> </body> </html>

Now the browser knows what to do with the data

What is the general aim? Pure data are hard to process automatically We need to:?? Ensure that a particular software understands the data Add meaning (semantics) of particular data fragments E.g. HTML describes visualization of data for an HTML browser Problem 1: What if we are not interested just in visualization? Problem 2: HTML has lax rules for structure Complex processing Solution: XML

XML XML (extensible Markup Language) is a format for transfer and exchange of general data Extensible Markup Language (XML) 1.0 (Fifth Edition) http://www.w3.org/tr/xml/ Extensible Markup Language (XML) 1.1 (Second Edition) http://www.w3.org/tr/xml11/ XML is a subset (application) of SGML (Standard Generalized Markup Language - ISO 8879) from 1986 XML does not deal with data presentation It enables to tag parts of the data The meaning of the tags depends on the author Presentation is one possible example

SGML vs. XML vs. HTML vs. XHTML strict rules for structure XML SGML specific set of tags, lax rules for structure HTML specific set of tags XHTML which tags to use

XML Document XML document is well-formed, if: It has introductory prolog Start and end tags nest properly Each element has a start and an end tag Corresponding tags have the same name (case sensitivity) <a></a> Pairs of tags do not cross <a><b></a></b> The whole document is enclosed in a single root element

Prolog An information for the SW that it works with an XML document It must contain declaration of XML version We have 1.0 and 1.1 It can contain information about encoding and if the document is standalone Version: <?xml version="1.1"?> Encoding other than UTF-8: <?xml version="1.1" encoding="iso-8859-2"?> Standalone document: <?xml version="1.1" standalone="yes"?> always lowercase

Elements Element with element content <?xml version="1.1" encoding="iso-8859-2"?> <message> <address> <name>tim Berners-Lee</name> <street>northern 12</street> </address> <intro>hi!</intro> <text>my <it>internet</it> does not work!</text> <signature>steve J.</signature> <attachment/> </message> Empty element Element with text content Element with mixed content Root element <attachment></attachment>

Attributes Element with an attribute <?xml version="1.1" encoding="iso-8859-2"?> <message> <address> <name>tim Berners-Lee</name> <street>northern 12</street> </address> <intro>hi!</intro> <text>my <it>internet</it> does not work!</text> <signature>steve J.</signature> <attachment fig="image01.jpg"/> </message> Attribute name Attribute value

Other Items of XML Document <?xml version="1.1" encoding="iso-8859-2"?> <message> <! - to whom the message should be sent? --> <address>jan Amos</address> <text> <![CDATA[ for (i=0; i < 10; i++) { document.writeln("<p>hi!</p>"); } ]]> </text> <signature>steve J.</signature> <date><?php echo Date("d.m.Y")?></date> </message> Comment CDATA section Processing instruction

XML Technologies XML is not only about the tags XML = basic format for data description XML documents XML technologies = a family of technologies to process XML data Description of allowed content, data interface, parsing of data, information extraction (querying), transformation into other formats, W3C (WWW Consortium) standards Efficient implementation of the standards Parsers, validators, query evaluators, XSL transformers, data persistence, Standard XML formats Where XML is used http://www.w3.org/

XML Technologies native XML DB relational XML DB XHTML MathML SVG DocBook OpenDocument UBL XForms UDDI SOAP OpenTravel exploitation XSLT transformation persistence SQL/XML SAX DOM StAX LINQ parsing XML data validation querying DTD XML Schema RELAX NG Schematron XPath XQuery XQuery Update

DTD

DTD Problem: Well-formedness is insufficient We need to restrict the set of tags and their content Document Type Definition (DTD) describes the structure (grammar) of an XML document Using regular expressions Valid XML document = well-formed XML document corresponding to a given grammar There are also other languages XML Schema, Schematron, RELAX NG,

Structure of a Valid Document <?xml version="1.0"?> <!DOCTYPE root-element [... ]> <root-element>... </root-element> Declaration of a document type Can be internal (grammar specified within DOCTYPE) or external (a reference to a separate file with the grammar) There is no significant use for internal rules Usually only for testing Both can be used at the same time Internal declarations have higher priority

Example: external and internal DTD <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]> <greeting>hello, world!</greeting> <?xml version="1.0"?> <!DOCTYPE greeting SYSTEM "greeting.dtd"> <greeting>hello, world!</greeting> <?xml version="1.0"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-strict.dtd"> <html> </html> PUBLIC public identifier" "URI"

Basic DTD Tags Document type declaration <!DOCTYPE > upper case!! Element type declaration <!ELEMENT > Declaration of a list of attributes <!ATTLIST > Declaration of an entity <!ENTITY > Declaration of a notation <!NOTATION >

Declaration of an Element Type <!ELEMENT parent (child*)> <parent> <child>... </child> <child>... </child>... </parent> Element name + declaration of allowed content Empty, any, text, mixed, element

Declaration of an Element Type, sequence selection? iteration (0 or 1) + iteration (1 or more) * iteration (0 or more) Empty content <!ELEMENT attachment EMPTY> Any content <!ELEMENT container ANY> Text content <!ELEMENT surname (#PCDATA)> Mixed content <!ELEMENT text (#PCDATA it bold)*> Element content <!ELEMENT message (address, text)> (name, (author editor)?, p*, (title, p+)*)

Declaration of an Attribute The order is in the document arbitrary <!ATTLIST person number ID #REQUIRED employed CDATA #FIXED "yes" holiday (yes no) "no"> Attributes of element person Attribute number is a unique ID and compulsory (#REQUIRED) Attribute employed contains text (CDATA), it has a constant value (#FIXED) yes Attribute holiday can have one of the given values ( yes or no ), implicit value is no???

Data Types of Attributes CDATA arbitrary text string Enumeration of values ID unique identifier (within the content of the document), it must be a string of letters, numbers and characters -, _, :,., preferably in ASCII, it must start with a letter or character _ IDREF reference to an ID of any element in the document IDREFS list of references (delimited with white spaces) to IDs NMTOKEN string similar to ID, not unique, can start with a number NMTOKENS list of NMTOKENs later ENTITY link to an external entity ENTITIES list of links to external entities

Presence of Attributes #REQUIRED the attribute is compulsory #IMPLIED the attribute is optional #FIXED the attribute has a fixed value

Entity Declaration In practice only the trivial cases are usually used Entity = association of a name and a value which is later (repeatedly) used Classification 1: Parsed = the text which replaces the link to an entity becomes a part of the document We refer using references Unparsed = a resource which can contain anything (e.g. binary data = an image, a video) We refer using attributes of type ENTITY/ENTITIES It must be associated with a notation Classification 2: General in XML documents Parameter in DTDs Classification 3: Internal vs. external later

Character Entities A possibility to insert a character with any code Hexadecimal or decimal Solve inequality 3x < 5 Pre-defined entities for special characters Solve inequality 3x < 5 & amp < lt > gt apos quot

General Entity Internal (hence of course parsed) entity Usage: Repeating parts of XML documents <!ENTITY status "working draft"> <note>the current status of the document is &status;</note> External parsed entity Usage: Modularity of XML documents <!ENTITY xml-serial SYSTEM "xml-serial.txt">

General Entities External unparsed entity Usage: Reference to non-xml data or PUBLIC <?xml version="1.0" encoding="windows-1250"?> <!DOCTYPE message [ <!NOTATION avi SYSTEM "C:/Program Files/Video Player/Player.exe"> <!ENTITY video SYSTEM "video.avi" NDATA avi> <!ELEMENT video-holiday (#PCDATA)> <!ATTLIST video-holiday src ENTITY> ]> <message>i enclose the <video-holiday src="video">video</video-holiday> from holiday.</message> Declaration of a notation

Parameter Entity Internal entity Usage: repeating parts of DTDs!!! <!ELEMENT rental (car*)> <!ENTITY % attributes "color (blue white black) #REQUIRED speed (high low) #IMPLIED" > <!ELEMENT car (#PCDATA)> <!ATTLIST car %attributes; > <!ELEMENT motorcycle (#PCDATA)> <!ATTLIST motorcycle %attributes; > <!ELEMENT bike (#PCDATA)> <!ATTLIST bike %attributes; >

Parameter Entity External entity Usage: Modularity of DTDs <!ENTITY % ISOLat2 SYSTEM "iso-pub.ent">... %ISOLat2;...

Conditional Sections <!ENTITY % draft 'INCLUDE' > <!ENTITY % final 'IGNORE' > <![%draft;[ <!ELEMENT book (comments*, title, body, supplements?)> ]]> <![%final;[ <!ELEMENT book (title, body, supplements?)> ]]>

DTD Bigger Example <!ELEMENT employees (person)+> <!ELEMENT person (name, email*, relations?)> <!ATTLIST person id ID #REQUIRED> <!ATTLIST person note CDATA #IMPLIED> <!ATTLIST person holiday (yes no) "no"> <!ELEMENT name ((first, surname) (surname, first))> <!ELEMENT first (#PCDATA)> <!ELEMENT surname (#PCDATA)> <!ELEMENT email (#PCDATA)> <!ELEMENT relations EMPTY> <!ATTLIST relations superior IDREF #IMPLIED> <!ATTLIST relations subordinates IDREFS #IMPLIED>