XML: some structural principles

Similar documents
XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11

.. Cal Poly CPE/CSC 366: Database Modeling, Design and Implementation Alexander Dekhtyar..

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

XML. Jonathan Geisler. April 18, 2008

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML

Announcements. 1. Class webpage: Have you been reading the announcements? Lecture slides and coding examples will be posted

COMP9321 Web Application Engineering

Announcements. 1. Class webpage: Have you been reading the announcements? Lecture slides and coding examples will be posted

Delivery Options: Attend face-to-face in the classroom or remote-live attendance.

Chapter 1 Getting Started with HTML 5 1. Chapter 2 Introduction to New Elements in HTML 5 21

CSI 3140 WWW Structures, Techniques and Standards. Representing Web Data: XML

AIM. 10 September


Chapter 1: Getting Started. You will learn:

Delivery Options: Attend face-to-face in the classroom or via remote-live attendance.

Sections and Articles

Chapter 13 XML: Extensible Markup Language

HTML. Mohammed Alhessi M.Sc. Geomatics Engineering. Internet GIS Technologies كلية اآلداب - قسم الجغرافيا نظم المعلومات الجغرافية

The XML Metalanguage

This course is designed for web developers that want to learn HTML5, CSS3, JavaScript and jquery.

CS6501 IP Unit IV Page 1

XHTML & CSS CASCADING STYLE SHEETS

Assignments (4) Assessment as per Schedule (2)

CHAPTER 2 MARKUP LANGUAGES: XHTML 1.0

What is XML? XML is designed to transport and store data.

extensible Markup Language

COMP9321 Web Application Engineering

CSI 3140 WWW Structures, Techniques and Standards. Markup Languages: XHTML 1.0

Markup Languages SGML, HTML, XML, XHTML. CS 431 February 13, 2006 Carl Lagoze Cornell University

Introduction to XML. XML: basic elements

COMP9321 Web Application Engineering. Extensible Markup Language (XML)

Data Presentation and Markup Languages

Web Development IB PRECISION EXAMS

Alpha College of Engineering and Technology. Question Bank

Overview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54

INTRODUCTION TO HTML5! HTML5 Page Structure!

Programming the World Wide Web by Robert W. Sebesta

The Benefits of CSS. Less work: Change look of the whole site with one edit

Developing Web Applications

Extensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013

UI Course HTML: (Html, CSS, JavaScript, JQuery, Bootstrap, AngularJS) Introduction. The World Wide Web (WWW) and history of HTML

Introduction to WEB PROGRAMMING

Hypertext Markup Language, or HTML, is a markup

CSS 1: Introduction. Chapter 3

HTML CS 4640 Programming Languages for Web Applications

XML. Objectives. Duration. Audience. Pre-Requisites

INTERNET PROGRAMMING XML

CS WEB TECHNOLOGY

Styles, Style Sheets, the Box Model and Liquid Layout

Java Applets, etc. Instructor: Dmitri A. Gusev. Fall Lecture 25, December 5, CS 502: Computers and Communications Technology

Deccansoft Software Services

XML Processing & Web Services. Husni Husni.trunojoyo.ac.id

Introduction to XML. M2 MIA, Grenoble Université. François Faure

Time: 3 hours. Full Marks: 70. The figures in the margin indicate full marks. Answer from all the Groups as directed. Group A.

The Xlint Project * 1 Motivation. 2 XML Parsing Techniques

IT6503 WEB PROGRAMMING. Unit-I

Setting Up a Development Server What Is a WAMP, MAMP, or LAMP? Installing a WAMP on Windows Testing the InstallationAlternative WAMPs Installing a

ADDING CSS TO YOUR HTML DOCUMENT. A FEW CSS VALUES (colour, size and the box model)

2. Write style rules for how you d like certain elements to look.

Understanding the Web Design Environment. Principles of Web Design, Third Edition

XML: Extensible Markup Language

Review of HTML. Chapter Pearson. Fundamentals of Web Development. Randy Connolly and Ricardo Hoar

SDPL : XML Basics 2. SDPL : XML Basics 1. SDPL : XML Basics 4. SDPL : XML Basics 3. SDPL : XML Basics 5

XML Structures. Web Programming. Uta Priss ZELL, Ostfalia University. XML Introduction Syntax: well-formed Semantics: validity Issues

Programmazione Web a.a. 2017/2018 HTML5

Lecture : 3. Practical : 2. Course Credit. Tutorial : 0. Total : 5. Course Learning Outcomes

TagSoup: A SAX parser in Java for nasty, ugly HTML. John Cowan

x ide xml Integrated Development Environment Specifications Document 1 Project Description 2 Specifi fications

CSC Web Technologies, Spring Web Data Exchange Formats

Tutorial 1 Getting Started with HTML5. HTML, CSS, and Dynamic HTML 5 TH EDITION

1/6/ :28 AM Approved New Course (First Version) CS 50A Course Outline as of Fall 2014

DEVELOPING A MESSAGE PARSER TO BUILD THE TEST CASE GENERATOR

COPYRIGHTED MATERIAL. Contents. Chapter 1: Creating Structured Documents 1

Introduction Syntax and Usage XML Databases Java Tutorial XML. November 5, 2008 XML

Certified CSS Designer VS-1028

JAVASCRIPT FOR PROGRAMMERS

Shankersinh Vaghela Bapu Institue of Technology

Lesson 5 Introduction to Cascading Style Sheets

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

COMS 359: Interactive Media

Ministry of Higher Education and Scientific Research

W3C XML XML Overview

Adobe Dreamweaver CS6 Digital Classroom

DDC Learning Web Design with Adobe CS5 Georgia Edition 2011

Cascading Style Sheets for layout II CS7026

M359 Block5 - Lecture12 Eng/ Waleed Omar

Govt. of Karnataka, Department of Technical Education Diploma in Computer Science & Engineering. Fifth Semester. Subject: Web Programming

Basics of Web Design, 3 rd Edition Instructor Materials Chapter 2 Test Bank

Author: Irena Holubová Lecturer: Martin Svoboda

Structured documents

Chapter 3 Style Sheets: CSS

FUNDAMENTALS OF WEB DESIGN (46)

What is a web site? Web editors Introduction to HTML (Hyper Text Markup Language)

Interview Question & Answers

XML Extensible Markup Language

CompuScholar, Inc. Alignment to Utah's Web Development I Standards

Chapter 1 Introduction to HTML, XHTML, and CSS

UNIT -II. Language-History and Versions Introduction JavaScript in Perspective-

Index. alt, 38, 57 class, 86, 88, 101, 107 href, 24, 51, 57 id, 86 88, 98 overview, 37. src, 37, 57. backend, WordPress, 146, 148

Data Exchange. Hyper-Text Markup Language. Contents: HTML Sample. HTML Motivation. Cascading Style Sheets (CSS) Problems w/html

Transcription:

XML: some structural principles Hayo Thielecke University of Birmingham www.cs.bham.ac.uk/~hxt October 18, 2011 1 / 25

XML in SSC1 versus First year info+web Information and the Web is optional in Year 1 Info+Web is a how-to for web technologies (HTML, XML, Javascript,...) Iain Styles s lecture notes: XML: A Toolkit for Structured Information http://www.cs.bham.ac.uk/internal/courses/ info-web/2010/xml.html In this module, we have only a few lectures on XML But you know more CS now: reg exps, grammar, parse trees Will focus on those aspects of XML: structure Learning Outcome for SSC1: structural principles of XML 2 / 25

Some XML constructs and terminology Tags: opening <a> and closing </a> Elements: complete <a>... </a> pair attributes inside tags, e.g. <div id="navigation"> empty tag <br /> abbreviates <br></br> W3C produces official standards for XML and its allies Why have one standard when you can have many? HTML, XHTML 1.x, HTML5, SGML,... ML : markup language: tags 3 / 25

Example: XHTML modern HTML standard based on XML Separation of structure (XHTML) from layout (CSS) Same webpage can be displayed differently (e.g. for printing, partially sighted) CSS = Cascading Style Sheets CSS is an unparser for XHTML CSS renders tree as nested boxes Nesting of elements nesting of boxes 4 / 25

XML = Dyck language + text Example: module web page structure in XHTML. <html> <head> <title> Software Systems Componenents 1, 2011-2012 </title> </head> <body> <h1> Software Systems Components 1, 2011/2012 </h1> </body> </html> 5 / 25

Example: Module web page with CSS The XHTML has been parsed into a DOM tree. CSS has traversed the tree and rendered nesting boxes. 6 / 25

Continued: Module web page without CSS Same XHTML. All the info is and tree structure there. Rendered with default old-timey layout; nesting not as obvious. 7 / 25

CSS: from tree to text again In the browser, the renderer walks over the XML tree. At each node, it applies a CSS style rule. Example: header in navigation bars: #navigation h4 { width : auto; padding : 5pt; background-color : #2F4F4F; margin : 0em; color : white; margin-bottom : 0pt; font-size : 110%; } 8 / 25

XML parsers XML needs to be parsed Match brackets and build tree somewhat like a Dyck language parser but more complex XML parsing makes a distinction between well-formed and valid In grammars, no such distinction (valid is implicit) 9 / 25

Well-formed and valid Well-formed XML An XML file is legal or well-formed if it meets certain syntactic conditions. The most important of these conditions is that all the tags match. Compare Dyck: [ [ ] [ ] ] Valid XML If an XML file is well-formed, it may still not have the structure we expect it to have. For instance, do we want an element A to have a B child? If so, how many? Compare A B C 10 / 25

An example for validity <A> <B> </B> <C> <D> Fnord </D> </C> </A> There is additional structure: A element contains a B and a C C contains D D contains plain text 11 / 25

DTD - Document Type Definition The previous example is valid for the following DTD <!ELEMENT A (B,C)> <!ELEMENT B EMPTY> <!ELEMENT C (D)> <!ELEMENT D (#PCDATA)> Note: #PCDATA means text. 12 / 25

Regular expression constructs in DTDs DTDs can contain reg exp constructs like + and * for repetitions and for alternation. However, the reg exps have to be deterministic. No FIRST/FIRST conflicts are allowed. Need to rewrite (a b) (a c) as a (b c) 13 / 25

Class hierarchy analogous to the above DTD Recall how we represent typed parse trees using the COMPOSITE pattern. class A { B b; C c; } class B { } class C { D d; } class D { String fnord = "Fnord"; } 14 / 25

S-expressions and XML Lisp has symbolic expressions since then 1960s Lots of Irritating Silly Parentheses Idea: forget about syntax, just write down the parse tree Put in enough brackets to make it unambiguous Lisp syntax is analogous to XML but more concise (A...) is like <A>... </A> Our example becomes: (A (B) (C (D "Fnord"))) 15 / 25

Tree parsers Tree is constructed while parsing DOM tree: Document Object Model Like a parse tree (roughly) Compare: ANTLR builds parse tree for grammar 16 / 25

Streaming parsers, StAX DOM trees get large May consume too much memory (but it depends) Streaming parser traverses tree without constructing tree as data structure Compare: Parse tree traversal derivation StAX: Streaming API for XML Stream of events: beginning of element, end of element, plain text We could build the tree from that in principle Nice example: echo and indent an XML file 17 / 25

StaX example From http://download.oracle.com/javaee/1.4/tutorial/ doc/jaxpsax3.html ELEMENT: CHARS: ELEMENT: CHARS: END_ELM: CHARS: END_ELM: <item> Why <em> WonderWidgets </em> are great </item> 18 / 25

Alternatives to DTDs DTD have limitations. More powerful alternatives include XML Schema and Relax NG. The latter has a reasonable syntax based on grammars and regular expressions. 19 / 25

XPATH Language for navigation of XML trees. Like navigating a Unix file tree with paths /users/hxt/public html/2011/19343 Navigate to child element Navigate to all descendants If you want to go deeper: XPATH is a modal logic for trees 20 / 25

Software security related to this module Relevant to our flagship MSc in Computer Security and my Secure Programming module http://www.cs.bham.ac.uk/~hxt/2011/20010/ Software security is not an add-on feature not security software Security aspects are best learned along with the main material We should cover some in undergrad teaching, not just the MSc 21 / 25

Streams and security Serializing can be dangerous. Loading an object of unkown class Dynamic class loading Malicious mobile code Calendar Bug in Java: http://slightlyrandombrokenthoughts.blogspot.com/ 2008/12/calendar-bug.html 22 / 25

Regular expression matching and security Inefficient backtracking for reg exp matching takes forever on some inputs. Attackers can force a Denial of Service (DoS) attack. Regular Expression Denial of Service Attacks and Defenses See http: //msdn.microsoft.com/en-us/magazine/ff646973.aspx 23 / 25

XML and security Attackers can force a Denial of Service (DoS) attack. See http: //msdn.microsoft.com/en-us/magazine/ee335713.asp XML Entity expansion XML parsing: Loads of opening tags overflow the parsing stack. XML injection; XPATH injection compare SQL injection (SSC2) 24 / 25

Streams, XML, parsing: reading structured input The stream hierarchy cooks raw input, e.g., 12345 integer Regular expression matching can be done on streams Regular expressions Perl regex (which deserve sneer quotes) with exponential runtime Parsers are more powerful than regular expressions Regular expression constructs can be used in grammars Regular expressions are also part of other languages, e.g, DTDs for XML XML needs to be parsed; it is a language of brackets like Dyck XML parsers may build a tree or not, just like general-purpose parsers 25 / 25