Introduction to XML

CEMIS, University of Nizwa, Oman

XML stands for the eXtensible Markup Language. It is a markup language developed by the W3C (World Wide Web Consortium), mainly to overcome limitations in HTML. Although HTML is an immensely popular and successful markup language, it has some major shortcomings. XML was developed to address these shortcomings, not to replace HTML.

When talking about XML, here are some terms that would be helpful:

XML: eXtensible Markup Language, a standard created by the W3C for marking up data.

DTD: Document Type Definition, a set of rules defining relationships within a document. DTDs can be "internal" (within a document) or "external" (links to another document).

XML Parser: Software that reads XML documents and interprets, or "parses", the code according to the XML standard. A parser is needed to perform actions on XML, such as comparing an XML document to a DTD.

XML Anatomy

If you have ever done HTML coding, creating an XML document will seem very familiar. Like HTML, XML is based on SGML, the Standard Generalized Markup Language, and was designed for use with the Web. If you haven't coded in HTML before, you should find creating HTML documents easy after creating an XML document.

XML documents, at a minimum, are made of two parts: the prolog and the content.

1. The prolog, or head of the document, usually contains the administrative metadata about the rest of the document. It holds information such as which version of XML is used, which character set standard is used, and the DTD, either through a link to an external file or internally.

2. The content is usually divided into two parts: the structural markup and the content contained in the markup, which is usually plain text.

Let's take a look at a simple prolog for an XML document:

    <?xml version="1.0" encoding="iso-8859-1"?>

<?xml declares to a processor that this is where the XML document begins.
version="1.0" declares which recommended version of XML the document should be evaluated in.
encoding="iso-8859-1" identifies the standardized character set that is being used to write the markup and content of the XML.

The structural markup consists of elements, attributes, and entities; here we focus primarily on elements and attributes. Elements have a few particular rules:

1. Element names can be any mixture of characters, with a few exceptions: they must begin with a letter or underscore and cannot contain spaces. Element names are also case sensitive, unlike HTML. For instance, <elementname> is different from <ELEMENTNAME>, which is different from <ElementName>.

Note: The characters that are excluded from element names in XML are &, <, ", and >, which are used by XML to indicate markup. The character : should be avoided as it has been used for special extensions in XML. If you want to use these restricted characters as part of the content within elements but do not want to create new elements, then you need to use the following entities to have them displayed in XML:

XML Entity Names for Restricted Characters

    Use        For
    &amp;      &
    &lt;       <
    &gt;       >
    &quot;     "
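The entity names for restricted characters can be tried out with a short Python sketch. It assumes only Python's standard library; the helper name to_xml_text and the sample string are made up for illustration and are not part of these notes.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def to_xml_text(raw: str) -> str:
    """Replace &, <, > and " with their XML entities (hypothetical helper)."""
    # escape() handles &, < and > by default; the extra mapping adds ".
    return escape(raw, {'"': "&quot;"})


raw = 'Fish & Chips < "Pie" > Soup'
escaped = to_xml_text(raw)
print(escaped)

# A parser turns the entities back into the original characters.
note = ET.fromstring(f"<note>{escaped}</note>")
print(note.text == raw)
```

The round trip through the parser is the point: the entities exist only in the file, and the program reading the document sees the plain characters again.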

2. Elements containing content must have opening and closing tags:

    <elementname> (opening)
    </elementname> (closing)

Note that the closing tag is exactly the same as the opening tag, but with a forward slash (/) in front of the name.

The content within elements can be either elements or character data. If an element has additional elements within it, then it is considered a parent element; those contained within it are called child elements. For example:

    <elementname>This is a sample of <anotherelement>simple XML</anotherelement> coding</elementname>

In this example, <elementname> is the parent element. <anotherelement> is the child of elementname, because it is nested within elementname.

Elements can have attributes attached to them in the following format:

    <elementname attributename="attributevalue">

While attributes can be added to elements in XML, there are a few reasons to use attributes sparingly:

- XML parsers have a harder time checking attributes against DTDs.
- If the information in the attribute is valuable, why not contain that information in an element?
- Since some attributes can only have predefined categories, you can't go back and easily add new categories.

We recommend using attributes for information that isn't absolutely necessary for interpreting the document or that has a predefined number of options that will not change in the future.

When using attributes in XML, the value of the attribute must always be contained in quotes. The quotes can be either single or double quotes. For example, the attribute version="1.0" in the opening XML declaration could be written version='1.0' and would be interpreted the same way by the XML parser. However, if the attribute value contains quotes, it is necessary to use the other style of quotation marks to enclose the value. For example, if there was an attribute name with a value of John "Q" Public, then it would need to be marked up in XML as name='John "Q" Public', enclosing the value in the quote symbol that is not used in the value itself.
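The quoting rules for attribute values can be checked with a few lines of Python. This is only a sketch using the standard library; the person element is a made-up example, and quoteattr is one convenient way to pick a safe quote style, not the only one.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

# Single and double quotes around an attribute value parse to the same thing.
double = ET.fromstring('<person name="John Public"/>')
single = ET.fromstring("<person name='John Public'/>")
print(double.get("name") == single.get("name"))

# quoteattr() picks a quote style that does not clash with the value.
value = 'John "Q" Public'
quoted = quoteattr(value)
print(quoted)

# Round trip: the parser recovers the value, inner quotes included.
person = ET.fromstring(f"<person name={quoted}/>")
print(person.get("name") == value)
```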

Creating a Simple XML Document

Now that you know the basic rules for creating an XML document, let's try them out. As with most, if not all, standards developed by the W3C, you can create XML documents using a plain text editor like Notepad (PC), TextEdit (Mac), or Pico (UNIX). You can also use programs like Dreamweaver and Cooktop, but all that is necessary to create the document is a text editor.

Let's say we have two types of documents we would like to wrap in XML: emails and letters. We want to encode the emails and letters because we are creating an online repository of archival messages within an organization or by an individual. By encoding them in XML, we hope to encode their content once and be able to translate it to a variety of outputs, like HTML, PDF, or types not yet created.

To begin, we need to declare an XML version:

    <?xml version="1.0" encoding="iso-8859-1"?>

Now, after declaring the XML version, we need to determine the root element for the documents. Let's use message as the root element, since both emails and letters can be classified as messages.

    <?xml version="1.0" encoding="iso-8859-1"?>
    <message>
    </message>

Note: You might have noticed that I created both the opening and closing tags for the message element. When creating XML documents, it is useful to create both the opening and closing tags at the same time and then fill in the content. Forgetting to close an element is a fatal error in XML, so if you write the opening and closing tags together each time you create an element, you won't accidentally forget to do so.
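To see what a "fatal error" looks like in practice, here is a minimal sketch that asks Python's built-in parser whether a document is acceptable. The helper name is_well_formed is an assumption for illustration; the documents are the message skeleton from above plus two deliberately broken variants.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: bytes) -> bool:
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        # The parser stops at the first syntax error it meets.
        return False


good = b'<?xml version="1.0" encoding="iso-8859-1"?><message></message>'
unclosed = b'<?xml version="1.0" encoding="iso-8859-1"?><message>'
wrong_case = b"<message></Message>"

for doc in (good, unclosed, wrong_case):
    print(doc, is_well_formed(doc))
```

The third document fails because element names are case sensitive, so </Message> does not close <message>.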

Parent and child relationships

A way of describing relationships in XML is the terminology of parent and child. In our examples, the parent or "root" element is <message>, which can have two kinds of child elements, <email> and <letter>. An easy way of showing how elements are related in XML is to indent the code to show that an element is a child of another. For example:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <message>
      <email>
      </email>
    </message>

Now that we have the XML declaration, the root element, and the child element (email), let's determine the information we want to break out in an email. Say we want to keep information about the sender, recipients, subject, and the body of the text. Since the information about the sender and recipients is generally in the head of the document, let's consider them child elements of a parent element that we will call <header>. In addition to <header>, the other child elements of <email> will be <subject> and <text>. So our XML will look something like this:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <message>
      <email>
        <header>
          <sender>me@ischool.utexas.edu</sender>
          <recipient>you@ischool.utexas.edu</recipient>
        </header>
        <subject>Re: XML</subject>
        <text>I'm working on my XML project right now.</text>
      </email>
    </message>
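The parent and child relationships in the email document can be walked programmatically. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name show is made up for illustration.

```python
import xml.etree.ElementTree as ET

EMAIL = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<message>
  <email>
    <header>
      <sender>me@ischool.utexas.edu</sender>
      <recipient>you@ischool.utexas.edu</recipient>
    </header>
    <subject>Re: XML</subject>
    <text>I'm working on my XML project right now.</text>
  </email>
</message>"""

root = ET.fromstring(EMAIL)


def show(element, depth=0):
    """Print each element's tag, indented by how deeply it is nested."""
    print("  " * depth + element.tag)
    for child in element:
        show(child, depth + 1)


show(root)

# Paths follow the parent/child chain from the root downwards.
sender = root.findtext("email/header/sender")
children_of_email = [child.tag for child in root.find("email")]
print(sender, children_of_email)
```

The indentation printed by show mirrors the indentation we used when writing the document by hand.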

Now, let's create an XML document for a letter. Some of the information in a letter we want to know includes the sender, the recipient, and the text of the letter. Additionally, we want to know the date that it was sent and what salutation was used to start off the message. Let's see what this would look like in XML:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <message>
      <letter>
        <letterhead>
          <sender>Margaret</sender>
          <recipient>God</recipient>
          <date>1970</date>
        </letterhead>
        <text>
          <salutation>Are you there God?</salutation>
          It's me Margaret...
        </text>
      </letter>
    </message>

Now say we wanted to keep track of whether or not these messages were replies. Instead of creating an additional element called <reply>, let's assign an attribute to the elements <email> and <letter> indicating whether that document was a reply to a previous message. In XML, it would look something like this:

    <email reply="yes"> or <letter reply="no">

When creating XML documents, it's always useful to spend a little time thinking about what information you want to store, as well as what relationships the elements will have.

Now that we've made some XML documents, let's talk about "well formed" XML and valid XML. A document is well formed if it follows the syntax rules of XML itself (for example, every element is closed and properly nested); a parser can check this without a DTD. A document is valid if it is well formed and also conforms to the rules laid down in a DTD.
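Documents like the letter can also be generated by a program instead of typed by hand. The following is a sketch under the assumption that Python's xml.etree.ElementTree is available; build_letter is a hypothetical helper and leaves out the <text> part to stay short.

```python
import xml.etree.ElementTree as ET


def build_letter(sender: str, recipient: str, date: str, reply: bool) -> ET.Element:
    """Build a <message> holding one <letter> (hypothetical helper)."""
    message = ET.Element("message")
    # The reply flag is stored as an attribute, not as a child element.
    letter = ET.SubElement(message, "letter", reply="yes" if reply else "no")
    letterhead = ET.SubElement(letter, "letterhead")
    ET.SubElement(letterhead, "sender").text = sender
    ET.SubElement(letterhead, "recipient").text = recipient
    ET.SubElement(letterhead, "date").text = date
    return message


message = build_letter("Margaret", "God", "1970", reply=False)
xml_text = ET.tostring(message, encoding="unicode")
print(xml_text)

# Reading the attribute back after a round trip through the parser.
letter = ET.fromstring(xml_text).find("letter")
print(letter.get("reply"))
```

Because the library writes every opening and closing tag itself, documents built this way cannot suffer from a forgotten closing tag.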

DTD (Document Type Definition)

A Document Type Definition is a mechanism to describe the structure of documents.

Sometimes XML is too flexible:

- Most programs can only process a subset of all possible XML applications.
- For exchanging data, the format (i.e., elements, attributes and their semantics) must be fixed.

Document Type Definitions (DTDs) establish the vocabulary for one XML application (in some sense comparable to schemas in databases). A document is valid with respect to a DTD if it conforms to the rules specified in that DTD. Most XML parsers can be configured to validate.

The syntax for DTD

The syntax for DTDs is different from the syntax for XML documents. Example 1 is an address book in XML that includes a <!DOCTYPE> statement. The statement is explained later; for now, it suffices to say that it links the document file to the DTD file. Example 2 is its DTD.

Example 1: An Address Book in XML

    <?xml version="1.0"?>
    <!DOCTYPE address-book SYSTEM "address-book.dtd">
    <!-- loosely inspired by vCard 3.0 -->
    <address-book>
      <entry>
        <name>John Doe</name>
        <address>
          <street>34 Fountain Square Plaza</street>
          <region>OH</region>
          <postal-code>45202</postal-code>
          <locality>Cincinnati</locality>
          <country>US</country>
        </address>
        <tel preferred="true">513-555-8889</tel>
        <tel>513-555-7098</tel>
        <email href="mailto:jdoe@emailaholic.com"/>
      </entry>
      <entry>
        <name><fname>Jack</fname><lname>Smith</lname></name>
        <tel>513-555-3465</tel>
        <email href="mailto:jsmith@emailaholic.com"/>
      </entry>
    </address-book>

Example 2: DTD for the Address Book

    <!-- top-level element, the address book is a list of entries -->
    <!ELEMENT address-book (entry+)>

    <!-- an entry is a name followed by addresses, phone numbers, etc. -->
    <!ELEMENT entry (name,address*,tel*,fax*,email*)>

    <!-- name is made of string, first name and last name.
         This is a very flexible model to accommodate exotic names -->
    <!ELEMENT name (#PCDATA | fname | lname)*>
    <!ELEMENT fname (#PCDATA)>
    <!ELEMENT lname (#PCDATA)>

    <!-- definition of the address structure;
         if several addresses, the preferred attribute signals the default one -->
    <!ELEMENT address (street,region?,postal-code,locality,country)>
    <!ATTLIST address preferred (true | false) "false">
    <!ELEMENT street (#PCDATA)>
    <!ELEMENT region (#PCDATA)>
    <!ELEMENT postal-code (#PCDATA)>
    <!ELEMENT locality (#PCDATA)>
    <!ELEMENT country (#PCDATA)>

    <!-- phone, fax and email, same preferred attribute as address -->
    <!ELEMENT tel (#PCDATA)>
    <!ATTLIST tel preferred (true | false) "false">
    <!ELEMENT fax (#PCDATA)>
    <!ATTLIST fax preferred (true | false) "false">
    <!ELEMENT email EMPTY>
    <!ATTLIST email href CDATA #REQUIRED
                    preferred (true | false) "false">

DTD Example: Elements Declaration in DTD

There is one element declaration for each element type:

    <!ELEMENT element_name content_specification>

where the content specification can be:

    (#PCDATA)      parsed character data
    (child)        one child element
    (c1,...,cn)    a sequence of child elements c1 ... cn
    (c1|...|cn)    one of the elements c1 ... cn

Special Characters

For each component c, possible counts can be specified:

    c     exactly one such element
    c+    one or more
    c*    zero or more
    c?    zero or one

Plus arbitrary combinations using parentheses:

    <!ELEMENT f ((a|b)*,c+,(d|e))*>
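A content specification behaves much like a regular expression over the names of the child elements. The sketch below makes that idea concrete in Python. It is not a DTD validator: both function names are hypothetical, and it only understands element names, commas, |, parentheses and the counts *, + and ?, not #PCDATA, EMPTY or ANY.

```python
import re


def model_to_regex(model: str) -> str:
    """Translate a DTD content model of element names into a regex.

    Hypothetical helper: handles names, ',', '|', '()', '*', '+', '?' only.
    """
    model = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching "name " (name plus a space).
    pattern = re.sub(r"[\w.-]+", lambda m: f"(?:{re.escape(m.group())} )", model)
    # In a regex, sequence is plain concatenation, so the commas just go away.
    return pattern.replace(",", "")


def children_match(model: str, children: list) -> bool:
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model_to_regex(model), sequence) is not None


F_MODEL = "((a|b)*,c+,(d|e))*"
print(children_match(F_MODEL, ["a", "b", "c", "d"]))
print(children_match(F_MODEL, ["a", "d"]))
print(children_match("(name,address*,tel*,fax*,email*)", ["name", "tel", "tel", "email"]))
```

The second call fails because the model demands at least one c between the (a|b) group and the final d or e.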

Elements with mixed content:

    <!ELEMENT text (#PCDATA|index|cite|glossary)*>

Elements with empty content:

    <!ELEMENT image EMPTY>

Elements with arbitrary content (not suitable for production-level DTDs):

    <!ELEMENT thesis ANY>

Attribute Declaration

Attributes are declared per element:

    <!ATTLIST section
        number CDATA #REQUIRED
        title  CDATA #REQUIRED>

declares two required attributes for the element section.

Possible attribute defaults:

    #REQUIRED         the attribute is required in each element instance
    #IMPLIED          the attribute is optional
    #FIXED default    the attribute always has this default value
    default           the attribute has this default value if it is omitted from the element instance

Possible attribute types:

    CDATA          string data
    (A1|...|An)    enumeration of all possible values of the attribute (each is an XML name)
    ID             unique XML name to identify the element
    IDREF          refers to the ID attribute of some other element ("intra-document link")
    IDREFS         list of IDREF values, separated by white space

Linking DTD and XML Docs

DTDs are of two types: (a) internal and (b) external (a separate file).

Internal DTD:

    <?xml version="1.0"?>
    <!DOCTYPE article [
      <!ELEMENT article (title,author+,text)>
      ...
      <!ELEMENT index (#PCDATA)>
    ]>
    <article>
      ...
    </article>
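The ID and IDREF attribute types promise two things: every ID value is unique, and every IDREF names an ID that exists somewhere in the document. A validating parser checks this for you; the Python sketch below does the same check by hand. The document, the attribute names id and ref, and the function check_links are all assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "id" is assumed to be declared as ID and
# "ref" as IDREF in the DTD.
DOC = """
<article>
  <section id="intro"/>
  <section id="syntax"/>
  <cite ref="intro"/>
  <cite ref="missing"/>
</article>
"""


def check_links(root, id_attr="id", ref_attr="ref"):
    """Return (duplicate IDs, references that point at no ID)."""
    seen, duplicates = set(), []
    for element in root.iter():
        value = element.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    dangling = [
        element.get(ref_attr)
        for element in root.iter()
        if element.get(ref_attr) is not None and element.get(ref_attr) not in seen
    ]
    return duplicates, dangling


print(check_links(ET.fromstring(DOC)))
```

Note that the check only confirms that a target exists. It cannot say which kind of element a reference should point to, since an IDREF may name any ID in the document.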

Both ways can be mixed; entity declarations in the internal DTD override those in the external DTD:

    <!DOCTYPE article SYSTEM "article.dtd" [
      <!ENTITY % pub_content "(title+,author*,text)">
    ]>

Flaws of DTDs

- No support for basic data types like integers, doubles, dates, and times
- No structured, self-definable data types
- No type derivation
- id/idref links are quite loose (the target is not specified)
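Since a DTD cannot say "this element holds an integer" or "this element holds a date", such checks end up in the program that reads the document. Here is a minimal sketch of that in Python; the order record, its element names and the function read_order are hypothetical and only serve to illustrate the first flaw in the list.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical record; a DTD could only declare both children as (#PCDATA).
RECORD = "<order><quantity>12</quantity><shipped>2010-09-06</shipped></order>"


def read_order(xml_text: str) -> dict:
    """Convert text content to typed values; raises ValueError on bad data."""
    root = ET.fromstring(xml_text)
    return {
        "quantity": int(root.findtext("quantity")),
        "shipped": date.fromisoformat(root.findtext("shipped")),
    }


print(read_order(RECORD))
```

A document with <quantity>twelve</quantity> would still be valid against such a DTD, and the mistake would only surface here as a ValueError.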

RSS (Really Simple Syndication)

About RSS

If you frequent weblogs, you've seen the little XML icons inviting you to "syndicate this site", but what does that really mean? A long time ago, newspaper managers realized that if they could use articles and stories from other newspapers in their paper, they could garner more readers because they could cover a wider area than they could with just their own reporters. This is an example of how syndication can work in print.

Online, there are potentially millions of authors writing about millions of topics each day. It can be very difficult to keep track of them without some type of automated system. And that's where RSS comes in. Really Simple Syndication (RSS) is an easy way for Web sites to share their headlines and stories with other sites. Web surfers can browse these headlines using news readers, also called RSS aggregators.

A Brief History of RSS

RSS was first invented by Netscape, when they were trying to get into the portal business. They wanted an XML format (RSS 0.90) that would make it easy for them to get news stories and information from other sites and have them automatically added to their site. They then came out with RSS 0.91 and dropped it when they decided to get out of the portal business.

What is RSS?

RSS is an XML-based format that lets users subscribe to online content using an RSS reader or aggregator, which checks subscribed Web pages and automatically downloads new content. The aggregators display a list of subscriptions, with highlighting or another indicator of RSS feeds that have added content since the user last logged in. Without having to go to all of the individual Web sites, users can quickly and easily access new material from sites that interest them.

Why is it significant?

For many, RSS has become the pipe through which content flows from providers to consumers. What makes RSS important is that users decide exactly what content is allowed through that pipe. Since its introduction in the late 1990s, RSS has become almost ubiquitous. An excellent mechanism for distributing regularly updated content, RSS is a natural complement to blogs, news sites, photo-sharing applications, and podcasts. The popularity of podcasting results on some level from RSS technology. When new podcasts are available, the aggregator (or, in this case, podcatcher) automatically downloads the new file to your computer or portable music player.

RSS Example

RSS files are essentially XML-formatted plain text. The RSS file itself is relatively easy to read, both by automated processes and by humans. An example file could have contents such as the following. It could be served over any appropriate protocol for file retrieval, such as HTTP or FTP, and reading software would use the information to present a neat display to end users.
    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel>
        <title>RSS Title</title>
        <description>This is an example of an RSS feed</description>
        <link>http://www.someexamplerssdomain.com/main.html</link>
        <lastBuildDate>Mon, 06 Sep 2010 00:01:00 +0000</lastBuildDate>
        <pubDate>Mon, 06 Sep 2009 16:20:00 +0000</pubDate>
        <ttl>1800</ttl>
        <item>
          <title>Example entry</title>
          <description>Here is some text.</description>
          <link>http://www.wikipedia.org/</link>
          <guid>unique string per item</guid>
          <pubDate>Mon, 06 Sep 2009 16:20:00 +0000</pubDate>
        </item>
      </channel>
    </rss>
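What an aggregator does with such a file can be sketched in a few lines of Python. The example assumes the standard xml.etree.ElementTree module and works on a shortened copy of the feed; read_feed is a hypothetical helper, and a real reader would first download the file over HTTP.

```python
import xml.etree.ElementTree as ET

FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>RSS Title</title>
    <description>This is an example of an RSS feed</description>
    <link>http://www.someexamplerssdomain.com/main.html</link>
    <ttl>1800</ttl>
    <item>
      <title>Example entry</title>
      <description>Here is some text.</description>
      <link>http://www.wikipedia.org/</link>
      <guid>unique string per item</guid>
    </item>
  </channel>
</rss>"""


def read_feed(data: bytes) -> dict:
    """Collect the channel title, ttl and item headlines from an RSS 2.0 feed."""
    channel = ET.fromstring(data).find("channel")
    return {
        "title": channel.findtext("title"),
        # In RSS 2.0, ttl is the number of minutes a feed may be cached.
        "ttl_minutes": int(channel.findtext("ttl")),
        "items": [
            (item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")
        ],
    }


feed = read_feed(FEED)
print(feed["title"])
for title, link in feed["items"]:
    print("-", title, link)
```

Each <item> becomes one headline in the reader's list, which is all the "neat display" needs to get started.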