Introduction to XML. An Example XML Document. The following is a very simple XML document.

Similar documents
XML 2 APPLICATION. Chapter SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Overview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54

XML 2 APPLICATION. Chapter SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11

Author: Irena Holubová Lecturer: Martin Svoboda

W3C XML XML Overview

Web Services Part I. XML Web Services. Instructor: Dr. Wei Ding Fall 2009

XML. extensible Markup Language. ... and its usefulness for linguists


Configuring Tomcat for a Web Application

Introduction to XML. National University of Computer and Emerging Sciences, Lahore. Shafiq Ur Rahman. Center for Research in Urdu Language Processing

Well-formed XML Documents

Outline. XML vs. HTML and Well Formed vs. Valid. XML Overview. CSC309 Tutorial --XML 4. Edward Xia

XML. Objectives. Duration. Audience. Pre-Requisites

Markup Languages SGML, HTML, XML, XHTML. CS 431 February 13, 2006 Carl Lagoze Cornell University

Chapter 10: Understanding the Standards

CSC Web Technologies, Spring Web Data Exchange Formats

Chapter 1: Getting Started. You will learn:

COMP9321 Web Application Engineering. Extensible Markup Language (XML)

2009 Martin v. Löwis. Data-centric XML. XML Syntax

Introduction to XML. Chapter 133

markup language carry data define your own tags self-descriptive W3C Recommendation

SDPL : XML Basics 2. SDPL : XML Basics 1. SDPL : XML Basics 4. SDPL : XML Basics 3. SDPL : XML Basics 5

COMP9321 Web Application Engineering

extensible Markup Language (XML) Basic Concepts

XML & Related Languages

XML Structures. Web Programming. Uta Priss ZELL, Ostfalia University. XML Introduction Syntax: well-formed Semantics: validity Issues

XML and DTD. Mario Alviano A.Y. 2017/2018. University of Calabria, Italy 1 / 28

What is XML? XML is designed to transport and store data.

Java EE 7: Back-end Server Application Development 4-2

Information Systems. XML Essentials. Nikolaj Popov

XML. XML Syntax. An example of XML:

Extensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013

2006 Martin v. Löwis. Data-centric XML. Document Types

Data Presentation and Markup Languages

Chapter 1: XML Syntax

Introduction to XML Zdeněk Žabokrtský, Rudolf Rosa

Chapter 1: XML Syntax

Computer Science E-75 Building Dynamic Websites

XML: Managing with the Java Platform

Outline. XML DOCTYPE External - SYSTEM. XML DOCTYPE Internal DTD &6&7XWRULDO ;0/ (GZDUG;LD

M359 Block5 - Lecture12 Eng/ Waleed Omar

PART. Oracle and the XML Standards

EXtensible Markup Language XML

Introduction to XML. XML: basic elements

Introduction to XML. M2 MIA, Grenoble Université. François Faure

Solutions. a. Yes b. No c. Cannot be determined without the DTD. d. Schema. 9. Explain the term extensible. 10. What is an attribute?

Computer Science E-75 Building Dynamic Websites

XDS An Extensible Structure for Trustworthy Document Content Verification Simon Wiseman CTO Deep- Secure 3 rd June 2013

XML. extensible Markup Language. Overview. Overview. Overview XML Components Document Type Definition (DTD) Attributes and Tags An XML schema

The concept of DTD. DTD(Document Type Definition) Why we need DTD

.. Cal Poly CPE/CSC 366: Database Modeling, Design and Implementation Alexander Dekhtyar..

Computer Science S-75 Building Dynamic Websites

Fundamentals of Web Programming a

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

Android Pre- requisites

TagSoup: A SAX parser in Java for nasty, ugly HTML. John Cowan

XML. Part I XML Document and DTD

Semistructured data, XML, DTDs

The main Topics in this lecture are:

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

XML for Android Developers. partially adapted from XML Tutorial by W3Schools

Chapter 1. Creating XML Documents

Structured documents

XML stands for Extensible Markup Language and is a text-based markup language derived from Standard Generalized Markup Language (SGML).

웹기술및응용. XML Basics 2018 년 2 학기. Instructor: Prof. Young-guk Ha Dept. of Computer Science & Engineering

Semantic Web. XML and XML Schema. Morteza Amini. Sharif University of Technology Fall 94-95

XML for Java Developers G Session 2 - Sub-Topic 1 Beginning XML. Dr. Jean-Claude Franchitti

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML

CSS, Cascading Style Sheets

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

Chapter 13 XML: Extensible Markup Language

Introduction Syntax and Usage XML Databases Java Tutorial XML. November 5, 2008 XML

CSI 3140 WWW Structures, Techniques and Standards. Representing Web Data: XML

CHAPTER 2 MARKUP LANGUAGES: XHTML 1.0

Introduction p. 1 An XML Primer p. 5 History of XML p. 6 Benefits of XML p. 11 Components of XML p. 12 BNF Grammar p. 14 Prolog p. 15 Elements p.

Session 23 XML. XML Reading and Reference. Reading. Reference: Session 23 XML. Robert Kelly, 2018

11. EXTENSIBLE MARKUP LANGUAGE (XML)

Chapter 7: XML Namespaces

Overview. Structured Data. The Structure of Data. Semi-Structured Data Introduction to XML Querying XML Documents. CMPUT 391: XML and Querying XML

Part 2: XML and Data Management Chapter 6: Overview of XML

Tutorial 1 Getting Started with HTML5. HTML, CSS, and Dynamic HTML 5 TH EDITION

Creating Markup with XML

XML is a popular multi-language system, and XHTML depends on it. XML details languages

Appendix H XML Quick Reference

Introduction to XML. When talking about XML, here are some terms that would be helpful:

COMP9321 Web Application Engineering

Web scraping and crawling, open data, markup languages and data shaping. Paolo Boldi Dipartimento di Informatica Università degli Studi di Milano

XML module 2. Creating XML. Hans C. Arents. senior IT market analyst. I.T. Works. Guiding the IT Professional

XML, DTD, and XPath. Announcements. From HTML to XML (extensible Markup Language) CPS 116 Introduction to Database Systems. Midterm has been graded

TASC Consulting Technical Writing Courseware Training

CountryData Technologies for Data Exchange. Introduction to XML

Introduction to XML (Extensible Markup Language)

The XML Metalanguage

Tutorial 2: Validating Documents with DTDs

Document Parser Interfaces. Tasks of a Parser. 3. XML Processor APIs. Document Parser Interfaces. ESIS Example: Input document

XML: Extensible Markup Language

ONIX for Books Product Information Message. Application Note: Embedding HTML markup in ONIX 3.0 data elements

LABORATORY 117. Intorduction to VoiceXML

Transcription:

Introduction to XML Extensible Markup Language (XML) was standardized in 1998 after 2 years of work. However, it developed out of SGML (Standard Generalized Markup Language), a product of the 1970s and 1980s. SGML was standardized in 1986. It is very complicated, so a more useful subset was needed, and XML now handles many of the problems that motivated SGML. XML is a markup language. That means that it annotates or marks up documents. Its main purpose is to provide information (semantics) about the contents of the document. It uses tags similar to those in html, but these tags are not specified in advance, rather they are created by the user. An Example XML Document The following is a very simple XML document. <?xml version = 1.0?> <name>alice Lee</name> <birthday>1983-07-15</birthday> </address> Like in xhtml, all start tags must have matching end tags. However, while XML is case sensitive, it is not restricted to lower case. Most applications mix cases, such as in fullname or lastname. The naming requirements are similar to those in Java, but along with letters, digits, and underscores, names may include hyphens and periods. An XML document must follow these rules. If it does so, it is said to be well-formed. If it does not, it is not a true XML document. Tree Structure An XML document exhibits a tree structure. It has a single root node, in the example above. The tree is a general ordered tree. There is a first child, a next sibling, etc. Nodes have parents and children. There are leaf nodes at the bottom of the tree. The declaration at the top is not part of the tree, but the rest of the document is. We could expand the document above so that name and birthday have child nodes. <?xml version = 1.0?> <name> <first>alice</first> <last>lee</last> </name> <birthday> <year>1983</year> <month>07</month> <day>15</day> </birthday>

</address> Now <name> has two children and <birthday> has three. Most processing on the tree is done with a preorder traversal. address name email phone birthday Entities As in html, certain characters are not allowed in the data. The most obvious ones are the less than signs and quotation marks. Also, ampersands are used to start the escape string, so they too have a substitution. These are escaped with the following substitutions with the greater than sign thrown in for symmetry. < < > > & & " &apos; CDATA Sections CDATA stands for character data. XML can have sections that contain characters of any kind that are not parsed. This means that they will be ignored by the XML parser that is used to put the document into a tree. These sections are similar to the pre sections in html. The browser simply displays them unchanged. CDATA sections begin with <![CDATA[ and end with ]]>. An example might be an equation like the following: <![CDATA[ x + 2*y = 3 ]]> Attributes first last year month day As in html, tags can have attributes. These are name-value pairs such as width = "300". We have seen these in applet and image tags. They can be used in XML and are required in some places. An example from the preceding might be <name first = "Alice" last = "Lee" /> However this is not very useful for data. It makes it harder to see the structure of the document.

There are places where attributes are necessary. One that we will be using in the future is for giving a reference to the location of a stylesheet. <link rel="stylesheet" type="text/css" href="address.css" /> Unicode Like Java, XML uses Unicode to code for character data. There are a number of different versions of Unicode, but all have ASCII as the first 128 characters. After that they may differ. The most common version used in the west, and the XML default, is UTF-8. It is a variable length code that encodes some characters in a byte, some in two bytes and even some in four bytes. Since many applications just use ASCII, this is the most efficient way to work with them and wastes the least space. The remaining 128 characters from code 128 to code 255 are used for some of the more common non-ascii characters used in western nations. The two-byte codes are used for some other language systems including some Asian ideographs. And finally the four-byte codes are used for more complicated ideographs. There are a number of other flavors of Unicode. If you expect to be coding languages other than the common western ones, you should investigate all the possibilities. We will use UTF-8 for our documents. Declarations XML documents do not require a declaration at the top, but it is always a good idea to put one there. The simplest declaration is the one used above: <?xml version = 1.0?> To this we can add two attributes. These are encoding and standalone. The result might be <?xml version="1.0" encoding="utf-8" standalone ="no"?> The meaning of encoding was given above, and standalone is explained below. There are many predefined declarations in use. The one that follows has been provided by the Apache Tomcat project to be used in configuring the web application deployment descriptor, web.xml. The encoding here is ISO-8859-1, known as Latin-1. ISO stands for International Standards Organization. And the 8859 standard contains a number of encodings. Latin-1 is the only one that is the same as UTF-8 in the first 256 characters. <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd"> The DOCTYPE declaration is similar to the one that we used in xhtml. But this one refers to a DTD (Document Type Definition) that has been created for Tomcat. It can be found as the only contents of the web.xml file, stored in the WEB-INF folder, in Tomcat. Instructions for configuring it for a web application are in the next document. Document Type Definitions A DTD is used to describe a particular XML application. It is used to specify the names, order, and data of elements in the application. It also shows the tree structure, identifying which elements are children,

siblings, or parents of other elements. A document that adheres to the specifications is said to be valid. A document may be well-formed but not valid. This can occur for example if the elements are out of order. A DTD for the address example might be: <!ELEMENT address (name, email, phone, birthday)> <!ELEMENT name (first, last)> <!ELEMENT first (#PCDATA)> <!ELEMENT last (#PCDATA)> <!ELEMENT email (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT birthday (year, month, day)> <!ELEMENT year (#PCDATA)> <!ELEMENT month (#PCDATA)> <!ELEMENT day (#PCDATA)> This says that address is the root of the document. It has four children: name, email, phone, and birthday. The element name also has two children, first and last. And the element birthday has three children, year, month, and day. All the rest of the elements consist of PCDATA, parsed character data. A DTD can only make a distinction between PCDATA and CDATA, unparsed character data. For finer distinctions, you have to use schema. A DTD can be either in-line or external. The standalone attribute in the declaration above has the value "no", meaning that the DTD is external. The value "yes" means that it is in-line. No is the default. If the above definitions are contained in a file called address.dtd, the following declaration should be added to the top of the xml file. <!DOCTYPE address SYSTEM "address.dtd"> This assumes that the file, address.dtd, is in the same folder as the xml file. This is the best way to handle finished DTDs, however when developing a DTD, it is more convenient to have it in-line. In that case, the entire DTD is placed at the top of the xml file enclosed by <!DOCTYPE address ]>. The entire inline example for the preceding xml file follows. It has been validated by the DTD validator, found at http://www.w3schools.com. <?xml version = "1.0"?> <!DOCTYPE address [<!ELEMENT address (name, email, phone, birthday)> <!ELEMENT name (first, last)> <!ELEMENT first (#PCDATA)> <!ELEMENT last (#PCDATA)> <!ELEMENT email (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT birthday (year, month, day)> <!ELEMENT year (#PCDATA)> <!ELEMENT month (#PCDATA)> <!ELEMENT day (#PCDATA)> ]> <name> <first>alice</first> <last>lee</last>

</name> <birthday> <year>1983</year> <month>07</month> <day>15</day> </birthday> </address> Reference Elliotte Rusty Harold and Scott Means, XML Programming, chapters 1, 2, 3, and 5, O Reilly & Associates, Inc., 2004.