COMP60411 Semi-structured Data and the Web Datatypes Relax NG, XML Schema, and Tree Grammars XSLT


1 COMP60411 Semi-structured Data and the Web
  Datatypes; Relax NG, XML Schema, and Tree Grammars; XSLT
  Bijan Parsia and Uli Sattler, University of Manchester

2 Datatypes and representations
  Or, are you my type?

3 M3
  Did you need "for" to write get-all-axioms.xquery?
    You did need disjunction in some form!
  Consider predicate tests vs. functions returning nodes
    ssd:axiom(.) = true
      How would we use this?
    ssd:axiom(.) as element()
      How would we use this?
  Types affect the PSVI!
    Abstract types as well!
    Inheritance simulates disjunction! (In this case!)
  Nodes contain their children
    Nodes/elements are not tags!

4 Some SE3 Questions
  Which query is most robust to changes in the schema?
    1. /*/(equivalent | subsumes | ...)
    2. /*/*[ssd:axiom(.)]
    3. /*/element(*,el:axiom)
    4. They are equi-robust (and fragile)
    5. They are equi-robust (and robust)
  Which query is most widely usable?
    1. /*/(equivalent | subsumes | ...)
    2. /*/*[ssd:axiom(.)]
    3. /*/element(*,el:axiom)
    4. They are equi-usable (and not widely usable)
    5. They are equi-usable (and widely usable)

5 Robustness as a value
  Robustness in the face of change
    A measure of evolvability
  If something changes, does our system break?
    If our system breaks, do we know that it broke?
      Did it fail silently?
    If it broke, can we fix it?
      If we fixed it, can we tell?
      Will anything else break?
    Given a prospective change, can we predict breakage?
  Robustness is an organization-wide phenomenon
    Fragility in one area can be compensated for by another
      E.g., by someone who never sleeps and knows the system
  Different sorts of fragility
    With different probabilities and costs

6 Type Systems
  What, in the most general sense, is a datatype?
    1. A set of (data) values
    2. A description of the arguments of a function
    3. Anything derived from xs:anyType
    4. An annotation of a variable
  Anything naming or describing a set...has an associated type!
    Types are just sets (of values)
      The extensional view
    But we may not be able to express this type in certain ways
  A Type System is a language for
    describing types (the intensional view)
    associating types with other linguistic entities
      E.g., literals, variables, expressions, programs

7 A Typical Type System
  Has a set of primitive or built-in or basic types
    Integers, strings, etc.
    Typically, lots of built-in support
  Has a set of composite types
    Arrays, records, dictionaries, etc.
    Typically, there are constructors for composite types
      So we can define an Array of Integers
  Has a set of additional constructors
    To, for example, create other derived types
      E.g., Positive Integers
  Has a syntax for associating types with variables
    And functions, etc.
    Type Declarations
  A set of conditions for success or failure (Type Errors)

8 W3C XML Schema
  Has/Is a type system
    Types are central, in fact
    Both
      a structuring mechanism
      a way of modifying the PSVI
        integers rather than strings
        elements have types as well as names
  Large set of simple types
    Strings, integers, etc. in many flavors
  Key composite type: Complex
    Essentially, element content models
    But named
    And composed
      Derivation by extension
  Other! (List, union,...)
  XML Schema (plus a little) is XQuery's type system

9 A Brief Tour of Type Systems
  Strong vs. Weak
    Type errors cause failure
  Static vs. Dynamic
    Check at compile time or at run time
  Explicit vs. Implicit Declarations
    Also known as Manifest vs. Latent
    Type inference vs. type checking
  Nominal vs. Structural
    Nominal: type compatibility relies on features of the declaration
      I declare two types, miles and feet, whose values are integers
      1 as miles != 1 as feet
    Structural: type compatibility relies entirely on the structure of the values
      1 as miles == 1 as feet (1 is the same integer!)
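The miles/feet distinction can be sketched in a few lines. This is a minimal illustration in Python rather than XQuery; the class names `Miles` and `Feet` and the two comparison helpers are hypothetical names chosen for this sketch, not part of the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Miles:
    value: int


@dataclass(frozen=True)
class Feet:
    value: int


def nominal_equal(a, b):
    # Nominal view: the declared type is part of the comparison.
    return type(a) is type(b) and a.value == b.value


def structural_equal(a, b):
    # Structural view: only the structure and content of the value matter.
    return a.value == b.value


one_mile, one_foot = Miles(1), Feet(1)
print(nominal_equal(one_mile, one_foot))     # compares declarations
print(structural_equal(one_mile, one_foot))  # compares the integers only
```

Under the nominal comparison the two values differ because they were declared with different types; under the structural one they are the same integer.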

10 Some questions
  Java's type system is primarily
    1. strong, manifest, and nominal
    2. strong, manifest, and structural
    3. strong, latent, and nominal
    4. weak, latent, and structural
    5. weak, manifest, and nominal
  XQuery's type system is primarily
    1. strong, manifest, and nominal
    2. strong, manifest, and structural
    3. strong, latent, and nominal
    4. weak, latent, and nominal
    5. weak, manifest, and structural

11 Some Expression Examples
  Consider a simple expression
    if (true()) then 1+1 else "2"
  What is its type?
    (if (true()) then 1+1 else "2") instance of xs:integer
  Consider another
    if (false()) then 1+1 else "2"
  How about this?
    (if (false()) then 1+1 else "2") instance of xs:string
  Finally
    if ($abool) then 1+1 else "2"
    (Assume that $abool is restricted to xs:boolean)
    (if ($abool) then 1+1 else "2") instance of (xs:integer | xs:string)
      Not legal XQuery
  What's the most restrictive type of each of these?

12 Mistyped
  Obvious conflict
    "2" + 2
    Arithmetic operator is not defined for arguments of types (xs:integer, xs:string)
  Slightly less obvious conflict
    (if (false()) then 1+1 else "2") + 2
      Same as above
    (if (true()) then 1+1 else "2") + 2
      This is fine!
  Conflicts
    declare function ssd:test($x as xs:boolean) as xs:integer {
      if ($x) then 1+1 else "2" + 2
    };
    declare function ssd:test($x as xs:boolean) as xs:integer {
      if ($x) then 1+1 else "2"
    };
    My checker doesn't flag this error
    It does flag this one!

13 Simple Promotion
  Explicit
    (1.0 + ("1" cast as xs:integer)) instance of xs:decimal
    True!
  Implicit
    ((1.0 treat as xs:decimal) + 125E2) instance of xs:double
    Also true
    Same as: ((1.0 cast as xs:double) + 125E2) instance of xs:double
    Implicit cast here!
  Note that "treat as" and "cast as" are not the same
    ("1.0" treat as xs:decimal)
      Required item type of value in 'treat as' expression is xs:decimal or subtypes; supplied value has item type xs:string
      (1 treat as xs:integer) vs. (1 treat as xs:decimal)
        Fixes the static type
    ("1.0" cast as xs:decimal)
      This results in 1

14 Complex Casting

15 Getting to PSVI
  Consider a very simple XQuery
    import schema default element namespace " at "el-typed.xsd";
    <instance-of>
      <constant name="sally"/>
      <atomic name="person"/>
    </instance-of>/element(*, ClassExpression)
  No results! Must validate!
    import schema default element namespace "..." at "el-typed.xsd";
    validate {<instance-of>
      <constant name="sally"/>
      <atomic name="person"/>
    </instance-of>}/element(*, ClassExpression)
  Returns:
    <atomic xmlns="..." name="person"/>
  validate generates a PSVI
    Constructors don't validate!
    Casting only works with atomics

16 Complex Typed Transform
  Input and output all typed
    import schema namespace el=" at "el-typed.xsd";
    import schema namespace owl=" at "owl2-xml.xsd";
    declare namespace ex="
    declare function ex:convertaxiom($ax as element(*, el:axiom)) as element(*, owl:axiom) {
      typeswitch ($ax)
        case schema-element(el:equivalent) return
          validate {<owl:equivalentclasses>{
            for $expr in $ax/* return ex:convertexpression($expr)
          }</owl:equivalentclasses>}
        default return
          validate {<owl:equivalentclasses><owl:class IRI=" IRI=" owl:equivalentclasses>}
    };
    declare function ex:convertexpression($expr as element(*, el:classexpression)) as element(*, owl:classexpression) {
      if ($expr instance of element(el:atomic))
      then validate {<owl:class
      else validate {<owl:class IRI="
      (: These would be easier if the elements were nillable :)
    };
    declare function ex:convert($ont as element(*, el:ontology)) as element(owl:ontology, owl:ontology) {
      validate {
        <owl:ontology>
          {for $e in $ont/element(*,el:axiom) return ex:convertaxiom($e)}
        </owl:ontology>
      }
    };
    ex:convert(validate {doc("el1.xml")/*})

17 Complex Typed Transform
  Input and output all typed
    <?xml version="1.0" encoding="utf-8"?>
    <owl:ontology xmlns:owl="
      <owl:equivalentclasses> <owl:class IRI=" <owl:class IRI=" </owl:equivalentclasses>
      <owl:equivalentclasses> <owl:class IRI=" <owl:class IRI=" </owl:equivalentclasses>
      <owl:equivalentclasses> <owl:class IRI="Person"/> <owl:class IRI=" </owl:equivalentclasses>
      <owl:equivalentclasses> <owl:class IRI=" <owl:class IRI=" </owl:equivalentclasses>
    </owl:ontology>
  The only proper value

18 Type Soundness
  Type-inference rules are written in such a way that any value that can be returned by an expression is guaranteed to conform to the static type inferred for the expression. This property of a type system is called type soundness. A consequence of this property is that a query that raises no type errors during static analysis will also raise no type errors during execution on valid input data. The importance of type soundness depends somewhat on which errors are classified as "type errors," as we will see below.
  A (statically verified) type safe program
    has some guaranteed behavior and thus can be transformed or optimized in aggressive ways
    may be more brittle
      fails hard on invalid input
      less input is valid

19 Data Representations
  Data and data structures have representations
    (More or less) physical embodiments
    (Ultimately) bits in a machine
  The same data can have distinct representations
    1 vs. one
  The same data structure can have distinct representations
    At different levels of abstraction
  One key distinction
    Internal ("in-memory")
      Location doesn't really matter
    External ("on disk")
    Generally: external representations are for exchange between (heterogeneous) systems

20 A Java Example (1)
  Consider a value of type int*
  We have several canonical external representations:
    Decimal:
    Hexadecimal: 1ADA3 (0x1ADA3 in source code)
    Octal: ( in source code)
  We have one (canonical) internal representation:
    32 bit, signed two's complement
    (Each digit is a bit, not a character)
  The representations are different
    Decimal size in memory: Approx** 48 bytes
    Internal rep: 4 bytes
  * We consider only ints, i.e., 32 bit integers
  ** See also:
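The gap between external and internal representations can be made concrete. The sketch below uses Python instead of Java, starts from the hexadecimal literal given on the slide, derives the other textual forms from it, and packs the value into the 4-byte two's complement layout with `struct`. The variable names are illustrative only.

```python
import struct

# External (textual) representation taken from the slide: hexadecimal 1ADA3.
value = int("1ADA3", 16)            # parse: external -> internal

decimal_text = str(value)           # serialise: internal -> external (decimal)
hex_text = format(value, "X")       # ... hexadecimal
octal_text = format(value, "o")     # ... octal

# One internal representation: 32-bit signed two's complement, big-endian.
internal = struct.pack(">i", value)

print(decimal_text, hex_text, octal_text)
print(len(internal), "bytes:", internal.hex())

# Round trip through the binary form gives back the same integer.
assert struct.unpack(">i", internal)[0] == value
```

All three strings denote the same integer, while the packed form always occupies exactly four bytes regardless of how many characters the textual forms need.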

21 A Java Example (2)
  We have APIs (the Integer class):
    Reading/Parsing/Deserializing/Unmarshalling
    Writing/Printing/Serializing/Marshalling
    ADT functions: +, -, /, *, <, >, etc.
      For examining and manipulating the internal rep

22 JSON (1)
  Javascript has a rich set of literals (ext. reps)
    Atomic (numbers, booleans, strings*)
      1, 2, true, "I'm a string"
    Composite
      Arrays: ordered lists with random access
        [1, 2, "one", "two"]
      Objects: associative arrays/dictionaries
        {"one": 1, "two": 2}
    These can nest!
      [{"one": 1, "o1": {"a1": [1, 2, 3.0], "a2": []}}]
  JSON == roughly this subset of Javascript
  The internal representation varies
    In JS, 1 represents a 64 bit, IEEE floating point number
    In Python's json module, 1 is parsed to an int, which in Python 3 has arbitrary precision rather than a fixed 32 bit two's complement width
  * Strings can be thought of as a composite, i.e., an array of characters, but not here
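How one JSON text maps to language-specific internal values can be checked directly with Python's standard `json` module. This is a small sketch; the sample text is the nested literal from the slide, and the variable names are illustrative.

```python
import json

text = '[{"one": 1, "o1": {"a1": [1, 2, 3.0], "a2": []}}]'

data = json.loads(text)                  # external -> internal
first = data[0]

print(type(first["one"]).__name__)       # JSON 1 becomes a Python int
print(type(first["o1"]["a1"][2]).__name__)  # JSON 3.0 becomes a Python float

# Python 3 ints are not limited to 32 or 64 bits.
big = json.loads("123456789012345678901234567890")
print(big + 1)

print(json.dumps(data))                  # internal -> external
```

The same literal `1` would be a 64 bit float in a JavaScript engine, so two systems exchanging this text agree on the external form but not on the internal one.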

23 JSON (2)
  {"menu": {
    "id": "file",
    "value": "File",
    "popup": {
      "menuitem": [
        {"value": "New", "onclick": "CreateNewDoc()"},
        {"value": "Open", "onclick": "OpenDoc()"},
        {"value": "Close", "onclick": "CloseDoc()"}
      ]
    }
  }}

  <menu id="file" value="File">
    <popup>
      <menuitem value="New" onclick="CreateNewDoc()" />
      <menuitem value="Open" onclick="OpenDoc()" />
      <menuitem value="Close" onclick="CloseDoc()" />
    </popup>
  </menu>

  Slightly different!

24 JSON (2.1)
  {"menu": [{ "id": "file", "value": "File"},
    "popup": [
      "menuitem": {"value": "New", "onclick": "CreateNewDoc()"},
      "menuitem": {"value": "Open", "onclick": "OpenDoc()"},
      "menuitem": {"value": "Close", "onclick": "CloseDoc()"}
    ]
  ] }}

  Still not right! Needed to preserve order!

  <menu id="file" value="File">
    <popup>
      <menuitem value="New" onclick="CreateNewDoc()" />
      <menuitem value="Open" onclick="OpenDoc()" />
      <menuitem value="Close" onclick="CloseDoc()" />
    </popup>
  </menu>

25 JSON (2.2)
  {"menu": [{"id": "file", "value": "File"},
    [{"popup": [{},
      [{"menuitem": [{"value": "New", "onclick": "CreateNewDoc()"}, []]},
       {"menuitem": [{"value": "Open", "onclick": "OpenDoc()"}, []]},
       {"menuitem": [{"value": "Close", "onclick": "CloseDoc()"}, []]}
      ]
    ]}]
  ]}

  <menu id="file" value="File">
    <popup>
      <menuitem value="New" onclick="CreateNewDoc()" />
      <menuitem value="Open" onclick="OpenDoc()" />
      <menuitem value="Close" onclick="CloseDoc()" />
    </popup>
  </menu>

26 JSON (2.1) Recipe
  Elements are mapped to objects
    With one pair "ElementName": contents
  Contents are a list
    First item is an object, the attributes
      Attributes are pairs of strings
    Second item is a list (of children)
  Empty elements require an explicit empty list
  No attributes requires an explicit empty object
  Cumbersome!
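The recipe is mechanical enough to write down as a short function. The sketch below is in Python with the standard `xml.etree.ElementTree` parser; the function name `element_to_json` is hypothetical, and text nodes are simply ignored, which is an assumption of this sketch since the recipe on the slide only covers elements and attributes.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Encode an element as {name: [attributes, children]}."""
    attributes = dict(elem.attrib)                          # {} if no attributes
    children = [element_to_json(child) for child in elem]   # [] if empty element
    return {elem.tag: [attributes, children]}


xml_text = """<menu id="file" value="File"><popup>
  <menuitem value="New" onclick="CreateNewDoc()"/>
  <menuitem value="Open" onclick="OpenDoc()"/>
  <menuitem value="Close" onclick="CloseDoc()"/>
</popup></menu>"""

encoded = element_to_json(ET.fromstring(xml_text))
print(json.dumps(encoded, indent=1))
```

Because children are kept in a list, sibling order survives the conversion, which is what the plain object encoding could not guarantee.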

27 JSON vs. XML (expressivity)
  Every XML WF DOM can be faithfully represented as a JSON object
  Every JSON object can be faithfully represented as an XML WF DOM
  Every WXS PSVI can be faithfully represented as a JSON object
  Every JSON object can be faithfully represented as a WXS PSVI

28 Conversion
  We can go from external to internal (e2i)
    Parsing, reading, loading, de-serializing, unmarshalling
  We can go from internal to external (i2e)
    Serializing, writing, printing, saving, marshalling
  Different systems may have different internals
    At least in detail
    Different applications may behave differently
  There and back again
    Roundtripping
      Internal to external to internal (i2e2i)
      External to internal to external (e2i2e)
    Ideally preserves key properties
      Which? When is it ok not to preserve?

29 What is an XML Document? Layers
  A series of octets
  A series of unicode characters
  A series of "events"
    SAX perspective
    E.g., Start/End tags
    Events are tokens
    Errors here mean no XML! (SAX ErrorHandler)
  A tree structure
    A DOM/Infoset
    Yay! XPath! XSLT! Etc.
  A tree of a certain shape
    A Validated Infoset
  An adorned tree of a certain shape
    A PSVI wrt a WXS
    Types in play

30 What is an XML Document? Layers
  (Same layers as on slide 29: octets, unicode characters, events, tree structure, tree of a certain shape, adorned tree of a certain shape.)
  "validate" moves from the tree towards the PSVI; "erase" moves back

31 What is an XML Document? Layers
  (Same layers as on slide 29.)
  Same inputs can have different meanings! (external validation)

32 What is an XML Document? Layers
  (Same layers as on slide 29.)
  Generally looks like
    <configuration xmlns=" edition="ee">
      <serialization method="xml" />
    </configuration>
  But can look otherwise!
    element configuration {
      attribute edition {"ee"},
      element serialization {attribute method {"xml"}}}
  Same meaning, different spelling

33 What is an XML Document? Layers
  (Same layers as on slide 29.)
  A picture (or document, or action, or ...)
    Application meaning
  Can have many...
    ...for the same meaning

34 The Essence of XML (with WXS)
  Thesis: "XML is touted as an external format for representing data."
  Two properties
    Self-describing
      Destroyed by external validation
    Round-tripping
      Destroyed by defaults and union types

35 The Essence of XML (with WXS)
  Roundtripping issues
    Internal to external and back
      Take an element, foo, with content {"one", "2", "3"}
        Its (simple) type is a list of union of integer and string
      Serialise
        <foo>one 2 3</foo>
      Parse and validate
        Content is {"one", 2, 3}
    External to internal and back
      001 to 1 to
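Both round-trip failures can be reproduced with a toy serialiser and validator. This is a Python sketch under stated assumptions: the functions `serialise` and `parse_validate` are hypothetical, and the union is assumed to try integer before string, mirroring a list of union of integer and string.

```python
def serialise(items):
    # Internal -> external: a list type is written as space-separated tokens.
    return " ".join(str(item) for item in items)


def parse_validate(text):
    # External -> internal: each token is matched against the union,
    # trying integer first and falling back to string (assumed order).
    result = []
    for token in text.split():
        try:
            result.append(int(token))
        except ValueError:
            result.append(token)
    return result


# Internal to external and back: three strings go out, not three strings return.
original = ["one", "2", "3"]
external = serialise(original)
roundtripped = parse_validate(external)
print(original, "->", repr(external), "->", roundtripped)

# External to internal and back: the lexical form "001" is not preserved.
print(repr("001"), "->", parse_validate("001"), "->", repr(serialise(parse_validate("001"))))
```

The serialised text carries no trace of which member of the union each token came from, so the validator has to guess, and the guess changes the data.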

36 The Essence of XML (with WXS)
  Conclusion: "So the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well."
  It's not obvious
    That the issues are serious (enough)
    That the problem solved is all that easy
    That there aren't other, worse issues

37 S'more Tree Grammars

38 Tree Grammars: a reminder
  Production rules
    are central to tree grammars
    reflect element declarations
    ...and are to be read as follows, e.g., for N → P (PA | (FEd, SEd*)):
  for each w ∈ nodes(T) with children w1 w2 ... wn, there exists a rule X → a e ∈ P such that
    r(w) = X,
    T(w) = a, and
    r(w1) r(w2) ... r(wn) matches e.
  [Figure: a node w labelled P with r(w) = N, whose children w1, w2, ..., wn are assigned r(w1) = FEd and r(w2) = ... = r(wn) = SEd; the children sequence is matched against the rule for N, then each child is checked against the rules for FEd and SEd.]

39 Tree Grammars: 3 more things
  A single-type grammar can have no more than one run on a tree.
  A regular grammar can have more than one run on a tree.
  BTW, w.l.o.g., we can assume that no two production rules have the same non-terminal on the left hand side and the same terminal.
    I.e., no N → P PA and N → P (Editor, Editor*).
    We can always rewrite those, e.g., to N → P (PA | (Editor, Editor*))
  ...so, how did we get here? From DTDs and XML schemas!

40 Tree Grammars and DTDs
  Since DTDs don't have types, just element names, they correspond to grammars of a peculiar, simple kind:
    <!ELEMENT T (N1,N2*)>
    <!ELEMENT N1 (M | (M,M))>
    <!ELEMENT N2 (#PCDATA)>
    <!ELEMENT M (#PCDATA)>
  F = (N, Σ, S, P) with
    N = {T, N1, N2, M, pcdata}
    Σ = {T, N1, N2, M, pcdata}
    S = {T}
    P = { T → T (N1, N2*),
          N1 → N1 (M | (M,M)),
          N2 → N2 pcdata,
          M → M pcdata,
          pcdata → pcdata ε }
  Tree grammars for DTDs are always local
    ...even if the DTD has a non-deterministic content model
    <!ELEMENT N1 (M | (M,M))> is not deterministic and thus illegal (but can be replaced with <!ELEMENT N1 (M, (M | ε))>)
  [Figure: example tree with root T, children N1 and N2, and pcdata leaves, nodes labelled by their positions.]

41 Remember?!
  In DTDs and in WXS, content models are further restricted (for compatibility with SGML)
    [DTD] deterministic (or 1-unambiguous),
      e.g., (M | (M,M)) is not deterministic, (M, (M | ε)) is.
      e.g., ((b, c) | (b, d)) is not deterministic, (b, (c | d)) is.
  From the XML 1.0 specification:
    As noted in Element Content, it is required that content models in element type declarations be deterministic. This requirement is for compatibility with SGML (which calls deterministic content models "unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors.
    More formally: a finite state automaton may be constructed from the content model using the standard algorithms, e.g. algorithm 3.5 in section 3.9 of Aho, Sethi, and Ullman [Aho/Ullman]. In many such algorithms, a follow set is constructed for each position in the regular expression (i.e., each leaf node in the syntax tree for the regular expression); if any position has a follow set in which more than one following position is labeled with the same element type name, then the content model is in error and may be reported as an error.

42 Tree Grammars and DTDs
  So, DTDs are local (and thus single-type)
    because they don't have types at all
    and not because their content model is deterministic!
    they are single-type even with a non-deterministic content model
  Hence we could extend DTDs with types and still be single-type
    ...provided we impose suitable restrictions

43 Tree Grammars and WXS
  Tree grammars also capture the basic, structural part of WXS:
    types (complex and anonymous)
    model groups (we ignore them)
    derivation by extension and restriction (we ignore them)
    substitution groups (we ignore them)
    integrity constraints like keys (must be ignored, don't fit into tree grammars)
  We only deal with simple XML schemas, but the general approach works for more
  To transform an XML schema S into a tree grammar G,
    1. we translate S into a generalized tree grammar
    2. then flatten the generalized tree grammar into a tree grammar G
  This will be done such that T validates against S iff T is accepted by G

44 Translating WXS into Tree Grammars
  Let S be a simple XML Schema
  For each top-level element in S of the form
    <xs:element name="mylist" type="blistt"></xs:element>
  add the following production rule to your grammar
    MYLIST → mylist BLIST^TYPE
  add MYLIST, BLIST^TYPE to non-terminals, add mylist to terminals
  For each top-level element in S of the form
    <xs:element name="mylist">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="ename" type="compt" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  add the following production rules to your grammar
    MYLIST → mylist (ENAME, ENAME*)      (what is the default for minOccurs?)
    ENAME → ename COMP^TYPE
  add MYLIST, ENAME, COMP^TYPE to non-terminals, add mylist, ename to terminals

45 Translating WXS into Tree Grammars
  For each top-level element in S of the form
    <xs:complexType name="blistt">
      <xs:sequence>
        <xs:element name="friend" type="persont" minOccurs="1" maxOccurs="2"/>
      </xs:sequence>
    </xs:complexType>
  add the following production rules to your grammar
    BLIST^TYPE → (FRIEND | (FRIEND, FRIEND))      %% generalized rule: to be expanded!
    FRIEND → friend PERSON^TYPE
  add BLIST^TYPE, FRIEND, PERSON^TYPE to non-terminals, add friend to terminals

46 Translating WXS into Tree Grammars
  For each top-level element in S of the form
    <xs:complexType name="bblistt">
      <xs:choice>
        <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="b" type="xs:string"/>
        </xs:sequence>
        <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="c" type="xs:string"/>
        </xs:sequence>
      </xs:choice>
    </xs:complexType>
    %% UPA violation: Oxygen complains!
  add the following production rules to your grammar
    BBLIST^TYPE → (A,B) | (A,C)      %% generalized rule -- to be expanded!
    A → A STRING^TYPE
    B → B STRING^TYPE
    C → C STRING^TYPE
  add BBLIST^TYPE, A, B, C, STRING^TYPE to non-terminals, add A, B, C to terminals

47 Translating WXS into Tree Grammars
  Consider the following case:
    <xs:complexType name="at">
      <xs:sequence>
        <xs:element name="n" type="alistt" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
    <xs:complexType name="bt">
      <xs:sequence>
        <xs:element name="n" type="blistt" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  To handle cases like the one above we can't always add rules
    AT^TYPE → N*
    BT^TYPE → N*
    N → N ??LIST^TYPE
  Instead, we translate these as
    AT^TYPE → N^AS^ALIST^TYPE*
    BT^TYPE → N^AS^BLIST^TYPE*
    N^AS^ALIST^TYPE → N ALIST^TYPE
    N^AS^BLIST^TYPE → N BLIST^TYPE

48 Translating WXS into Tree Grammars
  Our translation yields almost a tree grammar:
    it produces illegal rules of the form X → e, i.e., without a terminal
      e.g., BLIST^TYPE → (FRIEND | (FRIEND, FRIEND))
    our grammar model doesn't handle those (check the definition of a run)
  Hence we expand these illegal rules:
    pick an illegal rule X → e:
      remove X → e from the rule set
      replace all occurrences of X in the rule set with e
    until no illegal rules are left in the rule set
  E.g., MYLIST → mylist BLIST^TYPE would be transformed into
    MYLIST → mylist (FRIEND | (FRIEND, FRIEND))
  ...and if we had <xs:element name="yourlist" type="blist"/>
    then we also had YOURLIST → yourlist BLIST^TYPE
    and thus YOURLIST → yourlist (FRIEND | (FRIEND, FRIEND))
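The expansion loop can be sketched as plain text substitution over the rule set. This Python sketch is one possible encoding, not the course's reference implementation: legal rules are stored as `non-terminal -> (terminal, content model)`, illegal (generalized) rules as `non-terminal -> content model`, and the function name `expand` is hypothetical. It assumes that no illegal rule refers to an illegal non-terminal that leads back to itself without passing through an element rule.

```python
import re


def expand(legal, illegal):
    """Remove illegal rules X -> e by substituting e for X everywhere."""
    legal, illegal = dict(legal), dict(illegal)
    while illegal:
        name, model = illegal.popitem()          # pick and remove X -> e
        # Match X only as a whole name (names may contain letters, digits, ^).
        token = re.compile(r"(?<![\w^])" + re.escape(name) + r"(?![\w^])")

        def substitute(text, token=token, model=model):
            return token.sub(lambda _match: model, text)

        legal = {nt: (term, substitute(e)) for nt, (term, e) in legal.items()}
        illegal = {nt: substitute(e) for nt, e in illegal.items()}
    return legal


legal_rules = {
    "MYLIST": ("mylist", "BLIST^TYPE"),
    "YOURLIST": ("yourlist", "BLIST^TYPE"),
    "FRIEND": ("friend", "PERSON^TYPE"),
}
illegal_rules = {"BLIST^TYPE": "(FRIEND | (FRIEND,FRIEND))"}

for non_terminal, (terminal, model) in expand(legal_rules, illegal_rules).items():
    print(non_terminal, "->", terminal, model)
```

Each pass removes one illegal rule, so the loop runs once per illegal rule; cyclic type definitions are fine as long as every cycle passes through a rule that has a terminal.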

49 Translating WXS into Tree Grammars
  Expanding illegal rules even works with cyclic type definitions - try
    <xs:complexType name="nt">
      <xs:choice>
        <xs:element name="test2" type="at"/>
        <xs:element name="endelement" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
    <xs:complexType name="at">
      <xs:choice>
        <xs:element name="test1" type="nt"/>
        <xs:element name="endelement" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  This gives you these rules, including 2 illegal rules
    NT^TYPE → (TEST2 | ENDELEMENT)
    TEST2 → test2 AT^TYPE
    ENDELEMENT → EndElement STRING^TYPE
    AT^TYPE → (TEST1 | ENDELEMENT)
    TEST1 → test1 NT^TYPE
    ENDELEMENT → EndElement STRING^TYPE
  ...which can be expanded as follows:
    TEST2 → test2 (TEST1 | ENDELEMENT)
    ENDELEMENT → EndElement STRING^TYPE
    TEST1 → test1 (TEST2 | ENDELEMENT)
    ENDELEMENT → EndElement STRING^TYPE

50 WXS and Tree Grammars
  So, to transform an XML schema S into a tree grammar G,
    1. we translate S into a generalized tree grammar G'
    2. then expand G' into a tree grammar G
  Then any tree T validates against S iff T is accepted by G.
  So, what are the tree grammars we get as results?
    they are tree grammars
    are they single-type?
    are they local?
  Tree grammars corresponding to WXS are not local. E.g., consider
    N^AS^ALIST^TYPE → N ALIST^TYPE
    N^AS^BLIST^TYPE → N BLIST^TYPE
    N^AS^ALIST^TYPE and N^AS^BLIST^TYPE are competing!
  [Figure: nested classes of grammars, Loc inside ST inside Reg.]

51 WXS and Tree Grammars
  Tree grammars corresponding to WXS are single-type.
    This is ensured by the Unique Particle Attribution constraint in WXS.
  Tree grammars corresponding to DTDs are local,
    ...hence DTDs are less expressive than XML schemata.
  That is, there are tree languages that we can describe in WXS, but not in DTDs, e.g.:
    N = {Book, PA, Editor, A, Paper, F, L}
    Σ = {B, N, A, P, C}
    S = {Book, Paper}
    P = { Book → B Editor PA,
          Paper → P PA,
          Editor → N F,L,
          PA → N L,A,
          F → F ε,
          L → L ε,
          A → A ε }
  [Figure: nested classes of grammars, Loc inside ST inside Reg, and two example trees, one rooted at B and one rooted at P, each with an N child whose children are F, L or L, A.]
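The local / single-type / regular distinction depends only on which non-terminals compete, that is, share a terminal, and where they occur together. The Python sketch below checks this for small grammars. The encoding (`non-terminal -> (terminal, content model)`), the function names, and in particular the content models of the `wxs_like` grammar are assumptions made for illustration in the spirit of the slide's Book/Paper example, not a transcription of it.

```python
import re
from itertools import combinations


def competing_pairs(rules):
    """Pairs of distinct non-terminals whose rules use the same terminal."""
    return {frozenset((x, y)) for x, y in combinations(rules, 2)
            if rules[x][0] == rules[y][0]}


def names_in(model, rules):
    """Non-terminals mentioned in a content model."""
    return {tok for tok in re.findall(r"[\w^]+", model) if tok in rules}


def classify(rules, start):
    pairs = competing_pairs(rules)
    if not pairs:
        return "local"                      # no competing non-terminals at all
    scopes = [names_in(model, rules) for _, model in rules.values()]
    scopes.append(set(start))               # start symbols must not compete either
    if any(pair <= scope for pair in pairs for scope in scopes):
        return "regular"                    # competitors meet in one content model
    return "single-type"


dtd_like = {"T": ("T", "N1, N2*"), "N1": ("N1", "M | (M, M)"),
            "N2": ("N2", "pcdata"), "M": ("M", "pcdata"), "pcdata": ("pcdata", "")}
wxs_like = {"Book": ("B", "Editor"), "Paper": ("P", "PA"),
            "Editor": ("N", "F, L"), "PA": ("N", "L, A"),
            "F": ("F", ""), "L": ("L", ""), "A": ("A", "")}
mixed = dict(wxs_like, Book=("B", "Editor | PA"))

print(classify(dtd_like, {"T"}))
print(classify(wxs_like, {"Book", "Paper"}))
print(classify(mixed, {"Book", "Paper"}))
```

In `wxs_like` the non-terminals Editor and PA compete for the terminal N but never appear in the same content model, so the parent element always determines which one applies; `mixed` puts them side by side and loses that guarantee.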

52 Remember:
  In XML Schema, the content model is constrained as well
    to make validation easier & for compatibility with SGML
    e.g., through the Unique Particle Attribution constraint:
      A content model must be formed such that during validation of an element information item sequence, the particle component contained directly, indirectly or implicitly therein with which to attempt to validate each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence.
  Rephrasing: a content model M must be formed such that, during validation of an element E's child node sequence E1 ... Ek, we can, starting from i = 1 and increasing, associate each Ei with a single particle contained (possibly implicitly) in M
    without examining the content or attributes of Ei, and
    without any information about any Ej with j > i

53 Content models & types in DTD & WXS
  (we already know that)
  In WXS, we have a type hierarchy
    an element of a type X derived by restriction or extension from Y can be used in place of an element of type Y
    but you have to say so explicitly:
      <person phone="2"> <Name>Peter</Name> <DoB> </DoB></person>
      <person xsi:type="longpersontype" phone="5432"> <Name>Paul</Name> <DoB> </DoB> <address>manchester</address></person>
    we call this named typing: sub-types are declared (restriction or extension), and not inferred (by comparing structure)
  In DTDs, we don't have types!
  In order to prevent difficulties in WXS as caused by types, the Element Declarations Consistent constraint is imposed:
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" type="NewPersonType" minOccurs="0" maxOccurs="1"/>
        <xs:element name="person" type="OldPersonType" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>

54 Outlook: next steps
  We have now seen that
    DTDs correspond to local grammars
    WXS correspond to single-type grammars
    DTDs are structurally weaker than WXS
  RelaxNG: an even stronger schema language
    RelaxNG corresponds to regular grammars
  We will also look into how computationally expensive validation is
    against DTD/local grammar
    against WXS/single-type grammar
    against RelaxNG/regular grammar
    ...all roughly the same!
  [Figure: nested classes of grammars, Loc inside ST inside Reg.]

55 Relax NG, a very powerful schema language

56 Relax NG: yet another schema language
  Relax NG was designed to be a simpler schema language
    (described in a readable on-line book by Eric Van der Vlist)
  and allows us to describe (valid) XML documents in terms of their tree abstractions:
    no default attributes
    no entity declarations
    no key/uniqueness constraints
    minimal datatypes: only token and string, like DTDs (but a mechanism to use XSD datatypes)
  Since it is so simple/flexible
    it's (claimed to be) easy to use
    it doesn't have complex constraints on the description of element content, like determinism/1-unambiguity
    it's claimed to be reliable
    but you need other tools to do other things (like datatypes and attributes)

57 Relax NG: another side of Determinism
  Remember that DTDs and WXS required their content models to be
    [DTD] deterministic (and thus look-ahead-free)
    [WXS] deterministic (EDC, every matching child node sequence matches in exactly one way only)
    [WXS] the UPA constraint expresses both and other constraints: even more
  Determinism & single-typeness have a reason: some tools annotate a (valid) document while parsing:
    type information, to be exploited, e.g., for concise queries (remember the assignment?)
    default attribute values
  If your schema is not single-type, then tools validating the same document against the same schema may construct different PSVIs
    this can happen with different tools or different runs of the same tool

58 RelaxNG: another side of Validation
  Reasons why one would want to validate an XML document:
    ensure that structure is ok
    ensure that values in elements/attributes are of the correct type
    generate a PSVI to work with
    check constraints on co-occurrence of elements/how they are related
    check other integrity constraints, e.g., a person's age vs. their mother's age
    check constraints on elements/their values against external data
      postcode correctness
      VAT/tax/other numeric constraints
      spell checking
  ...only few of these checks can be carried out by validating against schemas...
  Relax NG was designed to
    1. validate structure and
    2. link to datatype validators to type check values of elements/attributes

59 Relax NG: basic principles
  Relax NG is based on patterns (similar to XPath expressions):
    a pattern is a description of a set of valid node sets
    we can view our example as different combinations of different parts, and design patterns for each
    enhanced flexibility
  <?xml version="1.0" encoding="utf-8"?>
  <people>
    <person age="41">
      <name> <first>harry</first> <last>potter</last> </name>
      <address>4 Main Road</address>
      <project type="epsrc" id="1">DeCompO</project>
      <project type="eu" id="3">TONES</project>
    </person>
    <person>...
  </people>

60 Relax NG: good to know
  Relax NG comes in 2 syntaxes
    the compact syntax: succinct, human readable
    the XML syntax: verbose, machine readable
  Trang converts between the two, phew! (and also into/from other schema languages)
  Trang can be used from Oxygen
  Compact syntax:
    grammar {
      start = element name { element first { text }, element last { text } }
    }
  XML syntax:
    <grammar xmlns=" xmlns:a=" datatypelibrary="
      <start>
        <element name="name">
          <element name="first"><text/></element>
          <element name="last"><text/></element>
        </element>
      </start>
    </grammar>

61 Relax NG - structure validation:
  3 kinds of patterns, for the 3 central nodes:
    text
    attribute
    element
  these can be combined into
    ordered groups
    unordered groups
    choices
  we can constrain cardinalities of patterns
  text nodes can be marked as data and linked
  we can specify libraries of patterns
  element name { element first { text }, element last { text } }
  is a RelaxNG schema for (parts of) this:
  <?xml version="1.0" encoding="utf-8"?>
  <people>
    <person age="41">
      <name> <first>harry</first> <last>potter</last> </name>
      <address>4 Main Road</address>
      <project type="epsrc" id="1">DeCompO</project>
      <project type="eu" id="3">TONES</project>
    </person>
    <person>...
  </people>

62 Relax NG: ordered groups
  we can name patterns
  in "chains"
  we can use ?, *, and + (use ? if optional):
  grammar {
    start = element people { people-content }
    people-content = element person { person-content }+
    person-content = attribute age { text },
                     element name { name-content },
                     element address { text }+,
                     element project { project-content }*
    name-content = element first { text },
                   element middle { text }?,
                   element last { text }
    project-content = attribute type { text },
                      attribute id { text },
                      text
  }
  is a RelaxNG schema for this
  <?xml version="1.0" encoding="utf-8"?>
  <people>
    <person age="41">
      <name> <first>harry</first> <last>potter</last> </name>
      <address>4 Main Road</address>
      <project type="epsrc" id="1">DeCompO</project>
      <project type="eu" id="3">TONES</project>
    </person>
    <person>...
  </people>

63 Relax NG: ordered groups in XML syntax (Trang knows)
  our schema in compact syntax:
  grammar {
    start = element people { people-content }
    people-content = element person { person-content }+
    person-content = attribute age { text },
                     element name { name-content },
                     element address { text }+,
                     element project { project-content }*
    name-content = element first { text },
                   element middle { text }?,
                   element last { text }
    project-content = attribute type { text },
                      attribute id { text },
                      text
  }
  use Trang to convert
  our schema in XML syntax:
  <?xml version="1.0" encoding="utf-8"?>
  <grammar xmlns="
    <start>
      <element name="people"> <ref name="people-content"/> </element>
    </start>
    <define name="people-content">
      <oneOrMore>
        <element name="person"> <ref name="person-content"/> </element>
      </oneOrMore>
    </define>
    <define name="person-content">
      <attribute name="age"/>
      <element name="name"> <ref name="name-content"/> </element>
      <oneOrMore>
        <element name="address"> <text/> </element>
      </oneOrMore>
      <zeroOrMore>
        <element name="project"> <ref name="project-content"/> </element>
      </zeroOrMore>
    </define>
    <define name="name-content">
      <element name="first"> <text/> </element>
      <optional>
        <element name="middle"> <text/> </element>

64 Relax NG: different styles
so far, we modelled element centric ... we can model content centric:
  grammar {
    start = element people { people-content }
    people-content = element person { person-content }+
    person-content = attribute age { text },
                     element name { name-content },
                     element address { text }+,
                     element project { project-content }*
    name-content = element first { text },
                   element middle { text }?,
                   element last { text }
    project-content = attribute type { text },
                      attribute id { text },
                      text }
  grammar {
    start = people-element
    people-element = element people { person-element+ }
    person-element = element person {
                       attribute age { text },
                       name-element,
                       address-element+,
                       project-element* }
    name-element = element name {
                     element first { text },
                     element middle { text }?,
                     element last { text } }
    address-element = element address { text }
    project-element = element project {
                        attribute type { text },
                        attribute id { text },
                        text } }

65 Relax NG - structure validation: ordered groups
we can combine patterns in fancy ways:
  grammar {
    start = element people { people-content }
    people-content = element person { person-content }+
    person-content = HR-stuff, contact-stuff
    HR-stuff = attribute age { text }, project-content
    contact-stuff = attribute phone { text },
                    element name { name-content },
                    element address { text }
    name-content = element first { text },
                   element middle { text }?,
                   element last { text }
    project-content = element project {
                        attribute type { text },
                        attribute id { text },
                        text }+ }
  <?xml version="1.0" encoding="utf-8"?>
  <people>
    <person age="41">
      <name> <first>harry</first> <last>potter</last> </name>
      <address>4 Main Road </address>
      <project type="epsrc" id="1"> DeCompO </project>
      <project type="eu" id="3"> TONES </project>
    </person>
    <person>...
  </people>

66 Relax NG: structure validation summary
Relax NG's specification of structure differs from DTDs and XSD:
  grammar oriented
  2 syntaxes with automatic translation
  flexible: we can gather different aspects of elements into different patterns
  unconstrained: no constraints regarding unambiguity/1-ambiguity/deterministic content model/Unique Particle Attribution/Element Declarations Consistent
as in XSD, we have an ALL construct for unordered groups: interleave, written &
here, the patterns must appear in the specified order (except for attributes, which are allowed to appear in any order in the start tag):
  element person { attribute age { text }, attribute phone { text }, name-element, address-element+, project-element* }
here, the patterns can appear in any order:
  element person { attribute age { text } & attribute phone { text } & name-element & address-element+ & project-element* }

67 Translating Relax NG into tree grammars by example 1
  grammar {
    start = AddressBook
    AddressBook = element addressbook { Card* }
    Card = element card { Inline }
    Inline = Name, Email+
    Name = element name { text }
    Email = element email { text } }
Translate into G = (N, Σ, S, P) with
  N = {AddressBook, Card, Inline, Name, Email, Pcdata}
  Σ = {addressbook, card, name, email, pcdata}
  S = {AddressBook}
  P = {AddressBook → addressbook Card*,
       Card → card Inline,
       Inline → Name, Email+,
       Name → name Pcdata,
       Email → email Pcdata,
       Pcdata → pcdata ϵ}
element y ⇒ y ∈ Σ ... possibly also an uppercased copy Y ∈ N
all other user-defined symbols X ⇒ X ∈ N
... translating Relax NG rules is easy (depending on Relax NG style) ... let's see one more

68 Translating Relax NG into tree grammars by example 2
  grammar {
    start = p-el
    p-el = element people { per-el+ }
    per-el = element person {
               attribute age { text },        (Ignore!)
               na-el, ad-el+, pro-el* }
    na-el = element name {
              element first { text },
              element middle { text }?,
              element last { text } }
    ad-el = element address { text }
    pro-el = element project {
               attribute type { text },       (Ignore!)
               attribute id { text },         (Ignore!)
               text } }
Translate into G = (N, Σ, S, P) with
  N = {P-EL, PER-EL, NA-EL, AD-EL, PRO-EL, FIRST, MIDDLE, LAST, Pcdata}
  Σ = {people, person, name, first, middle, last, address, project}
  S = {P-EL}
  P = {P-EL → people (PER-EL, PER-EL*),
       PER-EL → person (NA-EL, AD-EL, AD-EL*, PRO-EL*),
       NA-EL → name (FIRST, (MIDDLE | ϵ), LAST),
       FIRST → first Pcdata,
       MIDDLE → middle Pcdata,
       LAST → last Pcdata,
       AD-EL → address Pcdata,
       PRO-EL → project Pcdata,
       Pcdata → pcdata ϵ}
This Relax NG style makes translation of rules easy

69 Translating Relax NG into tree grammars by example 3
This Relax NG style makes translation of rules less easy and leads to generalized rules!
  grammar {
    start = element people { people-content }
    people-content = element person { person-content }+
    person-content = attribute age { text },        (Ignore!)
                     element name { name-content },
                     element address { text }+,
                     element project { project-content }*
    name-content = element first { text },
                   element middle { text }?,
                   element last { text }
    project-content = attribute type { text },      (Ignore!)
                      attribute id { text },        (Ignore!)
                      text }
Translate into G = (N, Σ, S, P) with
  N = {PEOPLE, P-C, PER-C, NA, NA-C, PERSON, PRO-C, ADR, PROJ, FIRST, MIDDLE, LAST, Pcdata}
  Σ = {people, person, name, first, middle, last, address, project}
  S = {PEOPLE}
  P = {PEOPLE → people P-C,
       P-C → PERSON, PERSON*,                       (expand!)
       PERSON → person PER-C,
       PER-C → NA, ADR, ADR*, PROJ*,                (expand!)
       NA → name NA-C,
       ADR → address Pcdata,
       PROJ → project PRO-C,
       PRO-C → pcdata ϵ,
       NA-C → FIRST, (MIDDLE | ϵ), LAST,            (expand!)
       FIRST → first Pcdata,
       MIDDLE → middle Pcdata,
       LAST → last Pcdata,
       Pcdata → pcdata ϵ}

70 Translating Relax NG into tree grammars by example 3
  ... people-content = element person { person-content }+
  ... person-content = attribute age { text },
                       element name { name-content },
                       element address { text }+,
                       element project { project-content }* ...
  PERSON → person PER-C,
  PER-C → NA, ADR, ADR*, PROJ,        (expand!)
  NA → name NA-C,
  ADR → address Pcdata, ...
Two things we have already seen when translating WXS:
1. we might need to introduce generalized rules, which can and need to be expanded, as for WXS:
   for each illegal rule X → e:
     remove X → e from the rule set
     replace all occurrences of X in the rule set with e
2. we might have to contextualise names and types of elements:

71 Translating Relax NG into tree grammars by example 4
2. we might have to contextualise names and types of elements, to handle schemas where the same element name is used in different contexts with different types:
  ... people-content = element person { person-content }+,
                       element friend { friend-content }+
  ... person-content = attribute age { text },
                       element name { name-content }, ...
      friend-content = attribute age { text },
                       element name { friend-name-content }, ...
  P-C → PERSON, PERSON*, FRIEND, FRIEND*
  PERSON → person PER-C,
  FRIEND → friend FRIE-C,
  PER-C → NA^NA-C, ...
  FRIE-C → NA^FRIE-NA-C, ...
  NA^NA-C → name NA-C,
  NA^FRIE-NA-C → name FRIE-NA-C, ...
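The expansion of generalized rules described above (remove each illegal rule X → e, then replace every occurrence of X by e) can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: rules are kept in two dictionaries, content models are plain strings, the names `expand` and `NAME` are hypothetical, and generalized rules are assumed not to refer to themselves.

```python
import re

# Hypothetical tokenizer: a non-terminal is a letter followed by letters, digits or hyphens.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

def expand(proper, generalized):
    """proper: {N: (tag, content)} for rules N -> tag content.
    generalized: {X: content} for illegal rules X -> content (no terminal).
    Returns the proper rules with every generalized non-terminal inlined."""
    def inline(content):
        # Replace each whole name token that has a generalized rule by its bracketed body.
        return NAME.sub(
            lambda m: "(" + generalized[m.group(0)] + ")"
            if m.group(0) in generalized else m.group(0),
            content)

    result = dict(proper)
    changed = True
    while changed:  # repeat, since a generalized body may mention another generalized name
        changed = False
        for n, (tag, content) in list(result.items()):
            new = inline(content)
            if new != content:
                result[n] = (tag, new)
                changed = True
    return result

# Example loosely following the PERSON / PER-C rules above.
proper = {"PERSON": ("person", "PER-C"), "NA": ("name", "NA-C")}
generalized = {"PER-C": "NA, ADR, ADR*, PROJ*", "NA-C": "FIRST, MIDDLE?, LAST"}
expanded = expand(proper, generalized)
```

After the call, `expanded["PERSON"]` holds the content model of PER-C in brackets, and PER-C itself no longer appears in any rule.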

72 Translating Relax NG into tree grammars
each Relax NG schema can be faithfully translated into a tree grammar:
local? no: the example on the previous slide leads to competing non-terminals (NA^NA-C and NA^FRIE-NA-C)
  ... NA^NA-C → name NA-C,
      NA^FRIE-NA-C → name FRIE-NA-C, ...
single-type? no: see the example below ... NA^NA-C and NA^FO-NA-C compete and occur in the same RHS
  ... person-content = attribute age { text },
                       (element name { name-content } | element name { foreign-name-content }), ...
  PER-C → NA^NA-C | NA^FO-NA-C
  NA^NA-C → name NA-C,
  NA^FO-NA-C → name FO-NA-C, ...
so is Relax NG as powerful as tree grammars?
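The distinction between local, single-type and regular grammars rests on competing non-terminals: two different non-terminals compete if their rules use the same terminal. A minimal Python sketch of that check follows; the function name `classify`, the dictionary layout and the small example grammars are assumptions made for illustration, not part of the slides.

```python
import re

# Hypothetical tokenizer for non-terminal names inside a content model.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

def classify(rules, start):
    """rules: {N: (tag, content)}; start: set of start non-terminals.
    Returns 'local', 'single-type' or 'regular'."""
    def compete(a, b):
        # Different non-terminals whose rules share the same terminal (tag).
        return a != b and rules[a][0] == rules[b][0]

    def has_competitors(names):
        names = [n for n in names if n in rules]
        return any(compete(a, b) for a in names for b in names)

    if not has_competitors(rules):
        return "local"            # no competing non-terminals at all
    groups = [start] + [set(NAME.findall(content)) for _, content in rules.values()]
    if not any(has_competitors(g) for g in groups):
        return "single-type"      # competitors never meet in one content model
    return "regular"

leaves = {"FIRST": ("first", ""), "LAST": ("last", ""), "NICK": ("nick", "")}
local_g = {"PEOPLE": ("people", "PERSON+"), "PERSON": ("person", "NA1"),
           "NA1": ("name", "FIRST, LAST"), **leaves}
single_g = {**local_g, "PEOPLE": ("people", "PERSON+, FRIEND+"),
            "FRIEND": ("friend", "NA2"), "NA2": ("name", "NICK")}
regular_g = {**single_g, "PERSON": ("person", "NA1 | NA2")}
```

In `single_g` the two name non-terminals compete but sit in different content models; in `regular_g` they occur in the same right-hand side, as in the foreign-name example above.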

73 Relax NG schema is indeed as powerful as tree grammars
Every tree grammar can be faithfully translated into a Relax NG schema.
Proof (not too hard): given a tree grammar G = (N, Σ, S, P),
1. translate each production rule N → t regexp in P into N = element t { regexp } (fortunately, the tree grammar regular expression syntax is very close to, and stricter than, Relax NG regular expression syntax)
2. put the resulting statements into a grammar, where N1, ..., Nk are all start symbols, i.e., S = {N1, ..., Nk}:
   grammar { start = N1 | ... | Nk ... }
3. call the resulting schema GS
Then T ∈ L(G) if and only if T validates against GS
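The proof idea above is mechanical enough to sketch in Python. The sketch assumes, beyond what the slide states, that content models are already written as valid Relax NG compact-syntax patterns over the defined names, that an empty content model becomes the Relax NG pattern `empty`, and that no non-terminal clashes with a Relax NG keyword. The function name `to_relax_ng` is hypothetical.

```python
def to_relax_ng(rules, start):
    """rules: {N: (tag, content)} for production rules N -> tag content.
    start: list of start symbols N1, ..., Nk.
    Returns the schema GS as a string in Relax NG compact syntax."""
    lines = ["grammar {", "  start = " + " | ".join(start)]
    for n, (tag, content) in rules.items():
        body = content if content else "empty"   # assumption: epsilon maps to 'empty'
        lines.append(f"  {n} = element {tag} {{ {body} }}")
    lines.append("}")
    return "\n".join(lines)

# Example: an address book grammar with text leaves written as 'text'.
rules = {
    "AddressBook": ("addressbook", "Card*"),
    "Card": ("card", "Name, Email+"),
    "Name": ("name", "text"),
    "Email": ("email", "text"),
}
schema = to_relax_ng(rules, ["AddressBook"])
```

Each production rule becomes exactly one named pattern, and the start symbols are joined with `|`, mirroring steps 1 and 2 of the proof.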

74 Tree Grammars and Schema Languages
[Figure: nested grammar classes Loc ⊆ ST ⊆ Reg, annotated with our knowledge: DTD (Loc), WXS (ST), Relax NG (Reg)]

75 Outlook: next steps
we have now seen that
  DTDs correspond to local grammars
  WXS corresponds to single-type grammars
  Relax NG corresponds to regular grammars
[Figure: nested grammar classes Loc ⊆ ST ⊆ Reg]
DTDs are structurally weaker than WXS, and WXS is structurally weaker than Relax NG
we will also look into how computationally expensive validation is
  against DTD/local grammar
  against WXS/single-type grammar
  against Relax NG/regular grammar
... all roughly the same!

76 How costly is validity testing? Does it matter against which kind of schema? Is single-type cheaper than general?

77 Schema Languages and Tree Grammars (see the paper by Murata, Lee, Mani, Kawaguchi)
We will look at the validation problem and at algorithms for validating a document against a schema:
  input: tree T, grammar G
  output: yes, if T ∈ L(G); no, otherwise

78 ValAlgo
  input: tree T, grammar G
  output: yes, if T ∈ L(G); no, otherwise
To design our schema validator,
1. we start with the easy case: assume that G is local (this automatically gives us a validator for the structural aspect of DTDs)
2. then expand the algorithm to single-type (this automatically gives us a validator for the structural aspect of WXS)
3. then expand to general tree grammars (... Relax NG)
we also assume that we have a subroutine MatchAlgo
  input: string w, regular expression e
  output: yes, if w ∈ L(e) (w matches e); no, otherwise
to see how to build that one (it's based on a translation of regular expressions into finite state machines, aka automata),
  remember your undergraduate studies (?)
  read it up, e.g., in the textbook by Hopcroft, Ullman
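For experimenting, the MatchAlgo subroutine can be approximated with Python's standard `re` module. Note the swap: `re` is a backtracking engine, not the automaton construction the slide refers to, so this is a stand-in for small examples only. The encoding (each non-terminal followed by `;`), the name `matches` and the tokenizer are assumptions of this sketch.

```python
import re

# A name token, or something to drop: commas and whitespace mean concatenation,
# and the symbol ϵ denotes the empty string.
TOKEN = re.compile(r"(?P<name>[A-Za-z][A-Za-z0-9-]*)|(?P<skip>[,\s]+|ϵ)")

def matches(w, e):
    """w: list of non-terminal names; e: content model such as '(C, C) | C'.
    Returns True if the string w is in L(e)."""
    def repl(m):
        if m.group("name"):
            # Each non-terminal becomes an atomic group ending in ';' so that
            # names cannot run into each other.
            return "(?:" + re.escape(m.group("name")) + ";)"
        return ""
    pattern = TOKEN.sub(repl, e)          # '(', ')', '|', '*', '+', '?' pass through
    return re.fullmatch(pattern, "".join(n + ";" for n in w)) is not None
```

For instance, `matches(["C", "C"], "(C, C) | C")` holds, while a string of three C's is rejected by the same expression.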

79 ValAlgo for local grammars
  input: XML doc/tree T, local grammar G
  output: yes, if T ∈ L(G); no, otherwise
let's start simple!
[Figure: nested grammar classes Loc ⊆ ST ⊆ Reg]

80 General idea of algorithm
our algorithm visits a tree in a depth-first, left-to-right manner
whenever we visit a node on our way
  down, we push relevant information for this node on stacks
  up, we pop relevant information for this node from stacks
hence, whenever we are at a node n during this traversal, all relevant information regarding all ancestors of n is (in reverse order) on our stacks
Traverse T in a depth-first, left-to-right manner
  When an element E is visited on way down,
    if there is a production rule N → a e in P with a = E's tag name
    then push N → a e onto R and push ϵ onto NT
    else report "not accepted" and stop
  When an element E is visited on way up,
    pop a rule N → a e out of R
    pop a string of non-terminals w out of NT
    if w matches e
    then pop a string w' of non-terminals out of NT and push w' N onto NT
    else report "not accepted" and stop

81 ValAlgo for local grammars (see the paper by Murata, Lee, Mani, Kawaguchi)
  input: XML doc/tree T, local grammar G
  output: yes, if T ∈ L(G); no, otherwise
Input: DOM tree for T, local tree grammar G = (N, Σ, S, P)
  NT is a stack of strings of non-terminals (to store the NTs of child nodes)
  R is a stack of production rules
Traverse T in a depth-first, left-to-right manner
  When an element E is visited on way down,
    if there is a production rule N → a e in P with a = E's tag name (locality ⇒ unique)
    then push N → a e onto R (store rule for E's content in R)
         and push ϵ onto NT (start remembering E's child nodes)
    else report "not accepted" and stop
  When an element E is visited on way up,
    pop a rule N → a e out of R (retrieve rule for E's content from R)
    pop a string of non-terminals w out of NT (retrieve E's child nodes)
    if w matches e
    then pop a string w' of non-terminals out of NT and push w' N onto NT (add E's non-terminal to its predecessor siblings)
    else report "not accepted" and stop
report "accepted" and stop
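The two-stack algorithm above can be tried out with the following Python sketch. Several choices are assumptions of this sketch rather than part of the slide: trees are nested `(tag, children)` tuples instead of a DOM, MatchAlgo is replaced by Python's backtracking `re` engine, NT is seeded with one empty string so that the root has somewhere to record its non-terminal, and acceptance additionally checks that the root's non-terminal is a start symbol. All function and variable names are hypothetical.

```python
import re

TOKEN = re.compile(r"(?P<name>[A-Za-z][A-Za-z0-9-]*)|(?P<skip>[,\s]+|ϵ)")

def matches(w, e):
    """Stand-in for MatchAlgo: is the list of non-terminals w in L(e)?"""
    pattern = TOKEN.sub(
        lambda m: "(?:" + re.escape(m.group("name")) + ";)" if m.group("name") else "", e)
    return re.fullmatch(pattern, "".join(n + ";" for n in w)) is not None

def validate(tree, rules, start):
    """tree: (tag, [children]); rules: {N: (tag, content)} of a local tree grammar;
    start: set of start symbols. Returns True if the tree is accepted."""
    by_tag = {tag: (n, content) for n, (tag, content) in rules.items()}  # locality: one rule per tag
    R, NT = [], [[]]          # assumption: NT starts with one empty string for the root's level

    def visit(node):
        tag, children = node
        if tag not in by_tag:             # way down: no rule N -> a e with a = tag
            return False
        R.append(by_tag[tag])             # store rule for E's content
        NT.append([])                     # start remembering E's child nodes
        for child in children:
            if not visit(child):
                return False
        n, content = R.pop()              # way up: retrieve rule for E's content
        w = NT.pop()                      # retrieve E's child nodes
        if not matches(w, content):
            return False
        NT[-1].append(n)                  # add E's non-terminal to its predecessor siblings
        return True

    return visit(tree) and NT[0][0] in start

# Grammar with P = { S -> a (B, B*), B -> b ((C, C) | C), C -> c (ϵ | C) };
# the tree shape is an assumed example.
P = {"S": ("a", "B, B*"), "B": ("b", "(C, C) | C"), "C": ("c", "ϵ | C")}
T = ("a", [("b", [("c", []), ("c", [])]), ("b", [("c", [])])])
print(validate(T, P, {"S"}))  # True
```

Using a list per stack entry for NT, and appending to the top entry, is equivalent to the pop w', push w' N step of the algorithm but avoids rebuilding the string each time.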

82 Let's see how the algorithm works
G = ({S, B, C}, {a, b, c}, {S}, P) with P = { S → a (B, B*), B → b ((C, C) | C), C → c (ϵ | C) }
[Figure: tree T with root a, two b children and three c leaves]
(slides 82 to 89 repeat the algorithm from the previous slide and animate the two stacks; stacks are listed top first)
R (stack of rules): empty
NT (stack of NT strings): empty

83 a is visited on way down
R: S → a (B, B*)
NT: ϵ

84 the first b is visited on way down
R: B → b ((C, C) | C); S → a (B, B*)
NT: ϵ; ϵ

85 the first c is visited on way down
R: C → c (ϵ | C); B → b ((C, C) | C); S → a (B, B*)
NT: ϵ; ϵ; ϵ

86 the first c is visited on way up
popped: rule C → c (ϵ | C) and w = ϵ
yes, ϵ ∈ L(ϵ | C)
R: B → b ((C, C) | C); S → a (B, B*)
NT: ϵ; ϵ

87 w' = ϵ is popped and w' C = C is pushed
R: B → b ((C, C) | C); S → a (B, B*)
NT: C; ϵ

88 the next c is visited on way down
R: C → c (ϵ | C); B → b ((C, C) | C); S → a (B, B*)
NT: ϵ; C; ϵ

89 this c is visited on way up
popped: rule C → c (ϵ | C) and w = ϵ
yes, ϵ ∈ L(ϵ | C)
R: B → b ((C, C) | C); S → a (B, B*)
NT: C; ϵ


More information

CS5363 Final Review. cs5363 1

CS5363 Final Review. cs5363 1 CS5363 Final Review cs5363 1 Programming language implementation Programming languages Tools for describing data and algorithms Instructing machines what to do Communicate between computers and programmers

More information

M359 Block5 - Lecture12 Eng/ Waleed Omar

M359 Block5 - Lecture12 Eng/ Waleed Omar Documents and markup languages The term XML stands for extensible Markup Language. Used to label the different parts of documents. Labeling helps in: Displaying the documents in a formatted way Querying

More information

Big Data 12. Querying

Big Data 12. Querying Ghislain Fourny Big Data 12. Querying pinkyone / 123RF Stock Photo Declarative Languages What vs. How 2 Functional Languages for let order by if + any else = then every while where return exit with Expression

More information

Big Data 10. Querying

Big Data 10. Querying Ghislain Fourny Big Data 10. Querying pinkyone / 123RF Stock Photo 1 Declarative Languages What vs. How 2 Functional Languages for let order by if + any else = then every while where return exit with Expression

More information
