XML Schema
Mario Arrigoni Neri
Well-formed XML and valid XML
Well-formedness is a purely syntactic property: proper tag nesting, a unique root element, etc.
Validity is more semantic, because it must take into account the meaning of the data and the intended use of the document.
A schema is a commitment on document structure, e.g.: "I'll send articles with a title, the list of authors, an abstract and a set of chapters, each one with its own title and some paragraphs. Each image will have a caption..."
XML in document access
- It can be hard for human beings to read: it is designed to be processed by software
- It is designed to describe documents
- It is a metalanguage for defining custom languages, simpler than SGML
- Focus on: the application, simple processing, a standardization layer (the parser), tree-shaped documents
[Figure: application, XML parser, DTD, XML document]
XML in communication
- XML is also a format for exchanging information
- Data communication between applications: interoperability
- Standard services to validate documents
- The tree structure is general enough to adapt to different applications, and well characterized enough to build generic parsers
[Figure: Appl. 1 (internal format 1) and Appl. 2 (internal format 2) exchange XML validated against a shared DTD]
Why XML Schema?
DTD is sometimes too weak for our requirements. DTD limits:
- Namespaces: it is quite difficult to integrate DTDs and namespaces, because DTD is part of the original XML core definition
- Text elements: it is not possible to define constraints on the text of elements and attributes. There is no concept of typed text, so a DTD accepts e.g.:
    <course code="Knowledge Engineering">
      <number-of-students>marco</number-of-students>
    </course>
- Mixed content: available only in the standard form (#PCDATA | a | b)*
- Documentation: no support except XML comments (which parsers may ignore)
- DTD is not written in XML!
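As a preview of what XML Schema adds, here is a minimal sketch of a schema under which the course example above would be rejected. The format for code (two capital letters followed by digits) is a hypothetical assumption made only for illustration; the constructs used are explained in the following slides.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="course">
    <xsd:complexType>
      <xsd:sequence>
        <!-- typed text: "marco" is not a non-negative integer, so it is rejected -->
        <xsd:element name="number-of-students" type="xsd:nonNegativeInteger"/>
      </xsd:sequence>
      <!-- constrained attribute: hypothetical code format such as "KE01" -->
      <xsd:attribute name="code">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:pattern value="[A-Z]{2}[0-9]+"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:attribute>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Under this sketch, <course code="KE01"><number-of-students>42</number-of-students></course> would be valid.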
XSD
XSD (XML Schema Definition) is a particular application of XML used to describe rules for validating other languages. It often replaces DTD.
- Supports qualification through namespaces
- Hierarchical type system
- Typed text
- Types for contents
- Defines reusable fragments of the definition
- Allows for mixed content
- Documentation written in XML
Drawback: it is more complex and verbose than DTD (roughly a 1:4 size ratio)
XML Schema structure
Namespace: http://www.w3.org/2001/XMLSchema
Root element: <schema>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    </xsd:schema>
It contains a set of definitions of types and elements. XML Schema uses types to describe constraints on the content of elements and attributes:
    <xsd:element name="..." type="..."/>
    <xsd:attribute name="..." type="..."/>
Types can be:
- Simple: an instance of a simple type can contain neither attributes nor markup. Basically they are subsets of #PCDATA and CDATA
- Complex: they correspond to element content and mixed content in DTD
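A minimal sketch of a complete schema combining the pieces above, with one simple-typed element, one complex-typed element and one attribute. The element names book and title and the attribute year are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- book has a complex type: it contains a child element and an attribute -->
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <!-- title has a simple type: text only, no attributes, no markup -->
        <xsd:element name="title" type="xsd:string"/>
      </xsd:sequence>
      <!-- attribute values always have a simple type -->
      <xsd:attribute name="year" type="xsd:gYear"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A matching instance would be <book year="2004"><title>XML Schema</title></book>.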
Simple types
As in classic programming languages, simple types can be built-in or user-defined. Types are qualified, e.g. xsd:string.
- Built-in types: string, boolean, decimal, float, date (e.g. 2004-01-10), time (e.g. 13:00:00+01:00), ID and IDREF (as in DTD), etc.
- User-defined (or derived) types: <xsd:simpleType name="...">
Anonymous types can be assigned on the basis of the XML nesting structure.
Derivation by restriction - 1
The most common approach is to restrict the available values. Each simple type has some features (facets) we can use to describe the restriction. Facets:
- length, minLength, maxLength: number of items (e.g. characters)
- minExclusive, minInclusive, maxExclusive, maxInclusive
- enumeration
- etc.
Derivation by restriction - 2
    <xsd:simpleType name="year">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="0"/>
        <xsd:maxInclusive value="9999"/>
      </xsd:restriction>
    </xsd:simpleType>
An alternative definition of the same type (a schema can contain only one of the two):
    <xsd:simpleType name="year">
      <xsd:restriction base="xsd:nonNegativeInteger">
        <xsd:totalDigits value="4"/>
      </xsd:restriction>
    </xsd:simpleType>
An enumeration:
    <xsd:simpleType name="salutation">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="Mr"/>
        <xsd:enumeration value="Mrs"/>
      </xsd:restriction>
    </xsd:simpleType>
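To see the facets at work, here is a sketch that declares two hypothetical elements, publicationYear and salutation, using the types above, followed by instance fragments and their expected outcome.

```xml
<!-- schema side (hypothetical declarations) -->
<xsd:element name="publicationYear" type="year"/>
<xsd:element name="salutation" type="salutation"/>

<!-- instance fragments -->
<publicationYear>2004</publicationYear>   <!-- valid -->
<publicationYear>12345</publicationYear>  <!-- invalid: above 9999 (five digits) -->
<publicationYear>-5</publicationYear>     <!-- invalid: negative -->
<salutation>Mrs</salutation>              <!-- valid: listed in the enumeration -->
<salutation>Dr</salutation>               <!-- invalid: not listed -->
```

Both definitions of year give the same outcome for these values.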
Derivation by pattern
A special case of restriction, with a specific syntax to describe values: regular expressions.
    a?        : (empty), a
    a+        : a, aa, aaaaa, ...
    a*        : (empty), a, aa, aaaa, ...
    [abcd] or (a|b|c|d) : a, b, c, d
    [a-z]     : a, b, c, ..., z
    a{2,4}    : aa, aaa, aaaa
    [^0-9]+   : a sequence of non-digits
Example:
    <xsd:simpleType name="phone">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="(0039-)?0[0-9]{1,3}-[0-9]+"/>
      </xsd:restriction>
    </xsd:simpleType>
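A sketch of values checked against the phone type, assuming the pattern is (0039-)?0[0-9]{1,3}-[0-9]+ with digit ranges written as [0-9]; the element name phone and the numbers are made up. An XSD pattern must match the whole value, so no ^ or $ anchors are needed.

```xml
<!-- hypothetical declaration -->
<xsd:element name="phone" type="phone"/>

<phone>02-1234567</phone>        <!-- valid: area code 02, then digits -->
<phone>0039-02-1234567</phone>   <!-- valid: optional international prefix -->
<phone>2-1234567</phone>         <!-- invalid: area code must start with 0 -->
<phone>02 1234567</phone>        <!-- invalid: a space instead of the dash -->
```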
Derivation by union
The admissible values are the union of the values of the member types:
    <xsd:simpleType name="Tpositive">
      <xsd:restriction base="xsd:decimal">
        <xsd:minExclusive value="0.0"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="Tfree">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="free"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="tprice">
      <xsd:union memberTypes="Tpositive Tfree"/>
    </xsd:simpleType>
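A sketch of how an element typed with the union behaves; the element name price is a hypothetical choice. A value is valid if it is valid for at least one member type.

```xml
<xsd:element name="price" type="tprice"/>

<price>12.50</price>   <!-- valid: matches Tpositive -->
<price>free</price>    <!-- valid: matches Tfree -->
<price>0</price>       <!-- invalid: not greater than 0.0, and not "free" -->
<price>gratis</price>  <!-- invalid: matches neither member type -->
```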
Derivation by list - 1
Up to now we have seen only scalar derivations. A list type is a simple type whose values are space-separated lists of items of another type: a homogeneous structured (simple) type, like an array.
    <xsd:simpleType name="Tpositive">
      <xsd:restriction base="xsd:decimal">
        <xsd:minExclusive value="0.0"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="TpositiveList">
      <xsd:list itemType="Tpositive"/>
    </xsd:simpleType>
    <xsd:element name="value" type="TpositiveList"/>
Instance:
    <value>1 2 34 88</value>
Derivation by list - 2
The length facet can also restrict a list type; it then counts the items of the list:
    <xsd:simpleType name="Tpositive">
      <xsd:restriction base="xsd:decimal">
        <xsd:minExclusive value="0.0"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="TpositiveList">
      <xsd:list itemType="Tpositive"/>
    </xsd:simpleType>
    <xsd:simpleType name="TsixPositive">
      <xsd:restriction base="TpositiveList">
        <xsd:length value="6"/>
      </xsd:restriction>
    </xsd:simpleType>
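A sketch of instance fragments for TsixPositive, using a hypothetical element sixValues:

```xml
<xsd:element name="sixValues" type="TsixPositive"/>

<sixValues>1 2 3.5 4 5 6</sixValues>  <!-- valid: six positive decimals -->
<sixValues>1 2 3</sixValues>          <!-- invalid: only three items -->
<sixValues>1 2 3 4 5 0</sixValues>    <!-- invalid: 0 is not a Tpositive -->
```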
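Anonymous types - 1
Static nesting vs. explicit reference: instead of referring to a named type, the type definition can be nested where it is used. Nested (anonymous) types have no name.
List with a nested item type:
    <xsd:simpleType name="TpositiveList">
      <xsd:list>
        <xsd:simpleType>
          <xsd:restriction base="xsd:decimal">
            <xsd:minExclusive value="0.0"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:list>
    </xsd:simpleType>
Restriction with a nested base type:
    <xsd:simpleType name="TpositiveList">
      <xsd:restriction>
        <xsd:simpleType> ... </xsd:simpleType>
      </xsd:restriction>
    </xsd:simpleType>
Union with nested member types:
    <xsd:simpleType name="tprice">
      <xsd:union>
        <xsd:simpleType>
          <xsd:restriction base="xsd:decimal">
            <xsd:minExclusive value="0.0"/>
          </xsd:restriction>
        </xsd:simpleType>
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="free"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>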
Anonymous types - 2
Anonymous types are used when a type is used only once. In DTD there is no distinction between elements (or attributes) and types (1-to-1 correspondence).
Named type:
    <xsd:simpleType name="Tprice"> ... </xsd:simpleType>
    <xsd:element name="price" type="Tprice"/>
Anonymous type:
    <xsd:element name="price">
      <xsd:simpleType> ... </xsd:simpleType>
    </xsd:element>
Complex types
Complex types are:
- Empty and generic content (EMPTY and ANY in DTD)
- Element content
- Mixed content
- Any element with attributes
They are heterogeneous aggregations, like structs in classic programming languages.
Content models ANY and EMPTY
ANY content is defined as xsd:anyType:
    <xsd:element name="memo" type="xsd:anyType"/>
EMPTY content is a complex type with no content:
    <xsd:complexType name="empty"/>
    <xsd:element name="br" type="empty"/>
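A sketch of instance fragments for the two declarations above; the child elements inside memo (who, when) are hypothetical and undeclared.

```xml
<!-- xsd:anyType accepts any well-formed content, including undeclared children -->
<memo>Call <who>Luca</who> before <when>Friday</when></memo>

<br/>          <!-- valid: no content -->
<br></br>      <!-- valid: equivalent to <br/> -->
<br>text</br>  <!-- invalid: the empty type allows no content -->
```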
Element content - 1
Special elements take the place of regular expressions (compared with DTD). Since XSD distinguishes types and instances, for each sub-element we have to define both the name (tag) and the type.
Sequence (A, B, ...):
    <xsd:sequence>
      <xsd:element name="A" type="Atype"/>
      <xsd:element name="B" type="Btype"/>
    </xsd:sequence>
Example:
    <xsd:complexType name="note">
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="from" type="xsd:string"/>
        <xsd:element name="to" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
Element content - 2
Choice (A | B | ...):
    <xsd:choice>
      <xsd:element name="A" type="Atype"/>
      <xsd:element name="B" type="Btype"/>
    </xsd:choice>
Set (A & B & ...): like sequence, but with no constraint on the order. The & connector existed in SGML but was eliminated from XML DTDs.
    <xsd:all>
      <xsd:element name="A" type="Atype"/>
      <xsd:element name="B" type="Btype"/>
    </xsd:all>
Element content - 3
A?, A+, A*: a more general construct defines the minimum and maximum cardinality of each sub-element through two attributes:
- minOccurs: minimum number of occurrences
- maxOccurs: maximum number of occurrences; can be unbounded
The default value of both is 1.
    A?  <xsd:element name="..." type="..." minOccurs="0"/>
    A+  <xsd:element name="..." type="..." maxOccurs="unbounded"/>
    A*  <xsd:element name="..." type="..." minOccurs="0" maxOccurs="unbounded"/>
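For example, the article structure described at the beginning (a title, a list of authors, an abstract and a set of chapters) might be sketched with these attributes as follows. The element names, making the abstract optional, and typing every child as a plain string are simplifying assumptions.

```xml
<!-- DTD equivalent: <!ELEMENT article (title, author+, abstract?, chapter*)> -->
<xsd:element name="article">
  <xsd:complexType>
    <xsd:sequence>
      <!-- exactly once (default minOccurs="1" maxOccurs="1") -->
      <xsd:element name="title" type="xsd:string"/>
      <!-- A+ -->
      <xsd:element name="author" type="xsd:string" maxOccurs="unbounded"/>
      <!-- A? -->
      <xsd:element name="abstract" type="xsd:string" minOccurs="0"/>
      <!-- A* -->
      <xsd:element name="chapter" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

Unlike the DTD operators, the two attributes also accept other bounds: minOccurs="2" maxOccurs="5" allows between two and five occurrences, which a DTD can only express by repeating the element.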
Complex element contents
Regular expressions as in DTD are expressed by nesting these constructs:
    <!ELEMENT section (title, (subtitle | abstract)?, para+)>
becomes
    <xsd:element name="section">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="title" type="xsd:string"/>
          <xsd:choice minOccurs="0">
            <xsd:element name="subtitle" type="xsd:string"/>
            <xsd:element name="abstract" type="xsd:string"/>
          </xsd:choice>
          <xsd:element name="para" type="xsd:string" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
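A sketch of two instances checked against the section declaration above (content made up):

```xml
<!-- valid: title, optional abstract, then one or more para -->
<section>
  <title>Introduction</title>
  <abstract>Why schemas matter</abstract>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</section>

<!-- invalid: subtitle and abstract are alternatives, and at least one para is required -->
<section>
  <title>Introduction</title>
  <subtitle>Overview</subtitle>
  <abstract>Why schemas matter</abstract>
</section>
```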
Mixed content - 1
Mixed content is a complex type with the attribute mixed="true".
    <!ELEMENT text (#PCDATA | bold | italic)*>
becomes
    <xsd:complexType name="TextType" mixed="true">
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="bold" type="xsd:string"/>
        <xsd:element name="italic" type="xsd:string"/>
      </xsd:choice>
    </xsd:complexType>
    <xsd:element name="text" type="TextType"/>
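A sketch of instances of the text element above: character data can be freely interleaved with any number of bold and italic children, in any order.

```xml
<text>XSD is <bold>more</bold> verbose than <italic>DTD</italic>, but <bold>typed</bold>.</text>

<!-- also valid: the choice may occur zero times, leaving only text -->
<text>Plain text only.</text>
```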
Mixed content - 2
Unlike DTD, XSD does not restrict mixed content to the (#PCDATA | ...)* form: any content model can be made mixed. With a sequence:
    <xsd:complexType name="text" mixed="true">
      <xsd:sequence>
        <xsd:element name="bold" type="xsd:string" minOccurs="0"/>
        <xsd:element name="italic" type="xsd:string" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
With a set:
    <xsd:complexType name="text" mixed="true">
      <xsd:all>
        <xsd:element name="bold" type="xsd:string"/>
        <xsd:element name="italic" type="xsd:string"/>
      </xsd:all>
    </xsd:complexType>
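With these definitions the order and number of the children still matter even though text may appear between them. A sketch of instance fragments, noting the outcome under each of the two alternative definitions:

```xml
<!-- valid with xsd:sequence and with xsd:all -->
<text>Some <bold>bold</bold> and some <italic>italic</italic> words.</text>

<!-- valid with xsd:sequence (bold is optional); invalid with xsd:all, where both children are required -->
<text>Only <italic>italic</italic> here.</text>

<!-- invalid with xsd:sequence (wrong order); valid with xsd:all -->
<text>First <italic>italic</italic>, then <bold>bold</bold>.</text>
```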
Derivation of complex types
Complex types can be derived both by restriction and by extension.
- Derivation by restriction: further constraining the values, by
  - narrowing minOccurs and maxOccurs
  - giving sub-elements and/or attributes more restrictive types
  - assigning fixed values to sub-elements and/or attributes
- Derivation by extension: adding sub-elements and/or attributes
The two approaches correspond to different meanings of inheritance in programming languages.
Derivation by restriction
DecoratedText is a text with at least one bold or italic element:
    <xsd:complexType name="DecoratedText" mixed="true">
      <xsd:complexContent>
        <xsd:restriction base="TextType">
          <xsd:choice minOccurs="1" maxOccurs="unbounded">
            <xsd:element name="bold" type="xsd:string"/>
            <xsd:element name="italic" type="xsd:string"/>
          </xsd:choice>
        </xsd:restriction>
      </xsd:complexContent>
    </xsd:complexType>
The result is a polymorphic type: it can be used in place of the original one.
Other possible restrictions: adding a default value, assigning a fixed value, narrowing minOccurs/maxOccurs.
Derivation by extension
Adding elements and/or attributes:
    <xsd:complexType name="LocalizedText" mixed="true">
      <xsd:complexContent>
        <xsd:extension base="TextType">
          <xsd:sequence>
            <xsd:element name="extract" type="xsd:string"/>
          </xsd:sequence>
          <xsd:attribute name="language" type="xsd:string"/>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>
Types and content models
Formally, sub-elements describe the content model, while the type also includes the attributes. Complex types can have either simple or complex content. A complex type with simple content, e.g. a decimal value with a valuta (currency) attribute:
    <xsd:complexType name="price">
      <xsd:simpleContent>
        <xsd:extension base="xsd:decimal">
          <xsd:attribute name="valuta" type="xsd:string"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
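A sketch of the price type in use, with a hypothetical element declaration: the text content must be a decimal and the valuta attribute carries a string.

```xml
<xsd:element name="price" type="price"/>

<price valuta="EUR">12.50</price>                   <!-- valid -->
<price valuta="EUR">cheap</price>                   <!-- invalid: content is not a decimal -->
<price valuta="EUR"><amount>12.50</amount></price>  <!-- invalid: simple content allows no child elements -->
```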
Global definitions
A definition placed directly inside the <schema> element is global. Instead of declaring the same element many times, we can refer to a global element through the ref attribute:
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="surname" type="xsd:string"/>
    <xsd:complexType name="fullname">
      <xsd:sequence>
        <xsd:element ref="name"/>
        <xsd:element ref="surname"/>
      </xsd:sequence>
    </xsd:complexType>
XSD and namespaces - 1
A schema defines a collection of types and elements in the same namespace, called the target namespace. The schema author can specify whether elements and attributes must be qualified, either with prefixes or with the default namespace.
XSD and namespaces - 2
The targetNamespace attribute sets the target namespace; the elementFormDefault and attributeFormDefault attributes say whether local elements and attributes must be qualified. The note prefix is bound to the target namespace, so note:tm refers to the type tm defined in this schema.
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:note="http://www.elet.polimi.it"
                targetNamespace="http://www.elet.polimi.it"
                elementFormDefault="unqualified"
                attributeFormDefault="unqualified">
      <xsd:simpleType name="tm"><xsd:restriction base="..."> ... </xsd:restriction></xsd:simpleType>
      <xsd:element name="note">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="from" type="xsd:string"/>
            <xsd:element name="to" type="xsd:string"/>
            <xsd:element name="message" type="note:tm"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>
XSD and namespaces - 3
Only global elements can be the document root. With the default values (unqualified), only global elements are qualified: local elements such as from, to and message must appear without a prefix. For this reason the following instance is an error:
    <?xml version="1.0"?>
    <note:note xmlns:note="http://www.elet.polimi.it">
      <note:from>marco</note:from>
      <note:to>luca</note:to>
      <note:message>test note</note:message>
    </note:note>
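For comparison, a sketch of an instance that should be accepted by the schema with elementFormDefault="unqualified": only the global element note carries the prefix. It assumes the target namespace is written as http://www.elet.polimi.it and that "test note" is a valid value for the message type tm, whose definition is elided.

```xml
<?xml version="1.0"?>
<note:note xmlns:note="http://www.elet.polimi.it">
  <!-- local elements are unqualified: no prefix and no default namespace -->
  <from>marco</from>
  <to>luca</to>
  <message>test note</message>
</note:note>
```

Setting elementFormDefault="qualified" in the schema would make the prefixed version valid instead.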
XSD and namespaces - 4
We can express relationships between namespaces with the xsd:any wildcard.
A list of elements from http://www.elet.polimi.it:
    <xsd:complexType name="list">
      <xsd:sequence>
        <xsd:element name="nodo" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:any namespace="http://www.elet.polimi.it"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
A list of elements not belonging to the target namespace (here http://www.elet.polimi.it):
    <xsd:sequence>
      <xsd:any namespace="##other"/>
    </xsd:sequence>
Modularity - 1
Reusable groups of elements and attributes (nome, cognome and matricola are Italian for first name, surname and registration number):
    <xsd:group name="namegroup">
      <xsd:sequence>
        <xsd:element name="nome" type="xsd:string"/>
        <xsd:element name="cognome" type="xsd:string"/>
      </xsd:sequence>
    </xsd:group>
    <xsd:attributeGroup name="attrgroup">
      <xsd:attribute name="matricola" type="xsd:integer"/>
    </xsd:attributeGroup>
    <xsd:element name="employee">
      <xsd:complexType>
        <xsd:group ref="namegroup"/>
        <xsd:attributeGroup ref="attrgroup"/>
      </xsd:complexType>
    </xsd:element>
Including an external schema from another namespace:
    <xsd:import namespace="..." schemaLocation="..."/>
(a schema with the same target namespace is included with <xsd:include schemaLocation="..."/>)
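A sketch of an instance of the employee element above (values made up): the group contributes the child elements and the attribute group contributes the attribute.

```xml
<employee matricola="12345">
  <nome>Mario</nome>
  <cognome>Rossi</cognome>
</employee>
```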
Modularity - 2
Substitution groups: each group connects a global element (the head) to a set of possible substitutes:
    <xsd:element name="shipcomment" type="xsd:string" substitutionGroup="comment"/>
    <xsd:element name="customercomment" type="xsd:string" substitutionGroup="comment"/>
shipcomment and customercomment can replace comment in the document.
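Substitution applies where the head element is referenced with ref, and the head must be a global element. A sketch, assuming a global comment element and a hypothetical order element that contains comments:

```xml
<!-- schema side: the head of the substitution group and an element that references it -->
<xsd:element name="comment" type="xsd:string"/>
<xsd:element name="order">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="comment" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<!-- instance: every child below matches the reference to comment -->
<order>
  <comment>Generic remark</comment>
  <shipcomment>Leave at the door</shipcomment>
  <customercomment>Gift wrap, please</customercomment>
</order>
```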
Annotations
- documentation contains human-readable descriptions
- appinfo contains (meta)information for applications
    <xsd:simpleType name="tipotesto">
      <xsd:annotation>
        <xsd:documentation>...</xsd:documentation>
        <xsd:appinfo>
          <hfp:hasFacet name="length"/>
        </xsd:appinfo>
      </xsd:annotation>
      ...
    </xsd:simpleType>
Using XSD 1
An instance document uses the namespace http://www.w3.org/2001/XMLSchema-instance.
If targetNamespace is not defined:
    <?xml version="1.0"?>
    <note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="http://www.elet.polimi.it/note.xsd">
    </note>
If targetNamespace is defined, xsi:schemaLocation contains pairs of namespace and schema location:
    <?xml version="1.0"?>
    <note:note xmlns:note="http://www.elet.polimi.it"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.elet.polimi.it http://www.elet.polimi.it/note.xsd">
    </note:note>
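Using XSD 2
If a derived type is used in place of the declared one, the instance must state it explicitly (dynamic type) through the xsi:type attribute. Why? The validator only knows the declared type of the element: without xsi:type it validates the content against the base type, which would reject the extra elements and attributes.
    <note:text xsi:type="note:LocalizedText" language="IT">
      <note:extract>estratto in lingua italiana</note:extract>
    </note:text>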
Other OO features 1
Abstract elements are used only as heads of substitution groups: they cannot appear in the document and must be replaced by a member of their substitution group.
    <xsd:element name="comment" type="xsd:string" abstract="true"/>
Abstract types require the use of a concrete subtype (through xsi:type):
    <xsd:complexType name="vehicle" abstract="true"/>
    <xsd:complexType name="car">
      <xsd:complexContent>
        <xsd:extension base="vehicle"/>
      </xsd:complexContent>
    </xsd:complexType>
    <xsd:element name="E" type="vehicle"/>
Instances:
    <E/>                  NO
    <E xsi:type="car"/>   YES
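Other OO features 2
- Final types and fixed facets: the final attribute prevents further derivation of a type, and fixed="true" on a facet prevents derived types from changing it
- <redefine>: like <include>, but allows the included schema to be modified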