Big Data 11. Data Models

Size: px
Start display at page:

Download "Big Data 11. Data Models"

Transcription

1 Ghislain Fourny Big Data 11. Data Models pinkyone / 123RF Stock Photo

2 CSV (Comma separated values) This is syntax ID,Last name,first name,theory, 1,Einstein,Albert,"General, Special Relativity" 2,Gödel,Kurt,"""Incompleteness"" Theorem" 2

3 CSV (Comma separated values) This is syntax ID,Last name,first name,theory, 1,Einstein,Albert,"General, Special Relativity" 2,Gödel,Kurt,"""Incompleteness"" Theorem" ID Last name First name Theory 1 Einstein Albert General, Special Relativity 2 Gödel Kurt "Incompleteness" Theorem This is a data model 3

4 Syntax vs. Data Models Physical view Syntax ID,Last name,first name,theory, 1,Einstein,Albert,"General, Special Relativity" 2,Gödel,Kurt,"""Incompleteness"" Theorem" 4

5 Syntax vs. Data Models Logical view Data Model ID Last name First name Theory 1 Einstein Albert General, Special Relativity 2 Gödel Kurt "Incompleteness" Theorem Physical view Syntax ID,Last name,first name,theory, 1,Einstein,Albert,"General, Special Relativity" 2,Gödel,Kurt,"""Incompleteness"" Theorem" 5

6 Syntax vs. Data Models Physical view Syntax <a> <d e="f"/> <c>this is <b>text</b>.</c> </a> 6

7 Syntax vs. Data Models a Logical view Data Model e = f d This is c b. text Physical view Syntax <a> <d e="f"/> <c>this is <b>text</b>.</c> </a> 7

8 Edge vs. Node labeling foo bar Labels are on the edges 8

9 Edge vs. Node labeling foo foo foobar bar bar Labels are on the edges Labels are on the nodes 9

10 XML Data models Information Set (Infoset) 10

11 XML Data models Information Set (Infoset) Post Schema-Validation Infoset (PSVI) 11

12 XML Data models Information Set (Infoset) Post Schema-Validation Infoset (PSVI) XQuery and XPath Data Model (XDM) 12

13 JSON Data Models "original" (implicit) JSON Data Model 13

14 JSON Data Models "original" (implicit) JSON Data Model JSON Schema Data Model 14

15 JSON Data Models "original" (implicit) JSON Data Model JSON Schema Data Model JSONiq Data Model (JDM) y/html/section-jsoniq-data-model.html 15

16 HTML/XML Data model Document Object Model (DOM) 16

17 grigory_bruev / 123RF Stock Photo XML Information Set 17

18 Information Set <?xml version="1.0" encoding="utf-8"?> <dc:metadata xmlns:dc=" <title xml:lang="en" year="2008" >Systems Group</title> <publisher>eth Zurich</publisher> </dc:metadata> 18

19 Information Set 19

20 The 11 XML Information Items Document Element Attribute Processing Instruction Character Comment Namespace Unexpanded Entity Reference DTD Unparsed Entity Notation 20 20

21 The 11 XML Information Items Document Element Attribute Processing Instruction Character Comment Namespace Unexpanded Entity Reference DTD Unparsed Entity Notation 21 21

22 Document Information Items Document Information Item doc [children] Element Information Item metadata [document element] Element Information Item metadata [notations] <empty> [unparsed entities] <empty> [base URI ] file:///users/bigdata/documents/info.xml [character encoding scheme] UTF-8 [standalone] <no value> [version]

23 Element Information Items Element Information Item metadata metadata [namespace name] [local name] metadata [prefix] dc [children] Element Information Items title, publisher [attributes] <empty> [namespace attributes] Attribute Information Item xmlns:dc [in-scope namespaces] Namespace Information Items dc->systems, xml->ns [base URI] file:///users/bigdata/documents/info.xml [parent] Document Information Item doc 23

24 Attribute Information Items Attribute Information Item xmlns:dc [namespace name] [local name] dc [prefix] xmlns [normalized value] [specified] true [attribute type] <no value> [references] unknown [owner element] Element Information Item metadata xmlns:dc 24

25 Namespace Information Items Namespace Information Item dc->systems [prefix] dc [namespace name] dc->systems Namespace Information Item xml->ns xml->ns [prefix] xml [namespace name] 25

26 XML Infoset - the tree doc xmlns:dc metadata xml->ns dc->systems xml->ns dc->systems title publisher dc->systems xml->ns lang year ETH Zurich Systems Group 26

27 Post-Schema-Validation Infoset Infoset + Types Post-Schema-Validation Infoset (PSVI) 27

28 Weerapat Kiatdumrong / 123RF Stock Photo XPath and XQuery Data Model 28

29 XDM: Sequences of Items (,,,,, ) 29

30 XDM: Sequence of one item 30

31 XDM: Sequence of one item = ( ) 31

32 XDM: Sequences are flat ((, ), ) 32

33 XDM: Sequences are flat ((, ), )=(,, ) 33

34 XDM: Items Atomic Node 34

35 XDM: Seven Kinds of XML Nodes Document node Element node Attribute node Text node Comment node Processing instruction node Namespace node 35

36 XDM: Seven Kinds of XML Nodes Infoset XDM 36

37 XDM vs. Infoset Infoset XDM xs:untyped 37

38 XDM: New Items in 3.0 and 3.1 Functions 38

39 XDM: New Items in 3.0 and 3.1 lorem ipsum dolor sit amet Functions Maps 39

40 XDM: New Items in 3.0 and 3.1 lorem ipsum dolor sit amet Functions Maps Arrays 40

41 XDM and Querying for let order by if + any else = then every while where return exit with Expression 41

42 Types 42

43 Type Systems Almost all type systems (Java, SQL, PSVI, JDM, Protocol buffers, Avro, Parquet, and so on) share the following properties: 43

44 Type Systems Almost all type systems (Java, SQL, PSVI, JDM, Protocol buffers, Avro, Parquet, and so on) share the following properties: - Distinction between atomic types and structured types 44

45 Type Systems Almost all type systems (Java, SQL, PSVI, JDM, Protocol buffers, Avro, Parquet, and so on) share the following properties: - Distinction between atomic types and structured types - Same categories of atomic types 45

46 Type Systems Almost all type systems (Java, SQL, PSVI, JDM, Protocol buffers, Avro, Parquet, and so on) share the following properties: - Distinction between atomic types and structured types - Same categories of atomic types - Lists and maps as structured types 46

47 Type Systems Almost all type systems (Java, SQL, PSVI, JDM, Protocol buffers, Avro, Parquet, and so on) share the following properties: - Distinction between atomic types and structured types - Same categories of atomic types - Lists and maps as structured types - Sequence type cardinalities 47

48 Types (General) Atomic Types vs. Structured Types 48

49 Atomic Types 49

50 Atomic Types Strings 50

51 Strings (Character sequences with monoid structure) "foo" "Zurich" "Ilsebill salzte nach." f o o Z u r i c h I l s e b i l l s a l z t e n a c h. 51

52 Atomic Types Strings Numbers 52

53 Interval-based integer types (exist as signed and unsigned) 8-bit (Java's byte) 16-bit (Java's short) 32-bit (Java's int) 64-bit (Java's long) 53

54 Arbitrary precision decimals (and integers) Any precision and scale

55 Float and Double IEEE 754 standard 32 bits 64 bits ca. 7 digits ca. 15 digits to to single precision double precision 55

56 Atomic Types Strings Numbers Booleans 56

57 Booleans TRUE t true y yes on 1 FALSE f false n no off 0 57

58 Atomic Types Strings Numbers Booleans Dates and Times 58

59 Dates and times Date Time Timestamp Duration 59

60 Dates (Gregorian calendar) Year + Month + Day 2017 August 1st (AD) 60

61 Times Hours + Minutes + Seconds 10 : 31 :

62 Timestamps Year + Month + Day + Hours + Minutes + Seconds 2017 August 1 st 10 : 31 : (AD) 62

63 Atomic Types Strings Numbers Booleans Dates and Times Time Intervals 63

64 Duration kinds Year Month Day Hour Minute Second Example: 2 years and 4 months 64

65 Duration kinds Year Month Day Hour Minute Second Example: 2 years and 4 months Example: 3 hours and 14 minutes 65

66 Atomic Types Strings Numbers Booleans Dates and Times Time Intervals Binaries 66

67 Atomic Types Strings Numbers Booleans Dates and Times Time Intervals Binaries Null 67

68 Lexical space vs. value space Value space Lexical space 68

69 Lexical space vs. value space "1" "01"... Value space Lexical space 69

70 Lexical space vs. value space "4" "04" "100b"... Value space Lexical space 70

71 Subtypes Supertype's value space Subtype's value space 71

72 Structured Types Data Structure Associative Arrays (a.k.a. maps) Ordered Lists Examples JSON Object, Protobuf Message, Set of XML Attributes JSON Array, XML Element, Protobuf repeated field 72

73 Structured Types Data Structure Associative Arrays (a.k.a. maps) Ordered Lists Examples JSON Object, Protobuf Message, Set of XML Attributes JSON Array, XML Element, Protobuf repeated field 73

74 Cardinality How many? Common sign Common adjective 74

75 Cardinality One How many? Common sign Common adjective required 75

76 Cardinality One How many? Common sign Zero or more * Common adjective required 76

77 Cardinality One How many? Common sign Zero or more * Common adjective required Zero or one? optional 77

78 Cardinality One How many? Common sign Zero or more * Common adjective required Zero or one? optional One or more + 78

79 wklzzz / 123RF Stock Photo JSON Data Model 79

80 JSON Values Strings Objects Numbers Booleans Arrays Null Atomic values Structured values 80

81 JSON Values Strings Numbers Booleans Null Objects String-to-Value map Arrays List of values Atomic values Structured values 81

82 JSON Values Strings Numbers Booleans Null Objects String-to-Value map Recursion Arrays List of values Atomic values Structured values 82

83 Tree-based visual model { } "foo" : true, "bar" : [ { "foobar" : "foo" }, null ] 83

84 Tree-based visual model { } "foo" : true, "bar" : [ { "foobar" : "foo" }, null ] object 84

85 Tree-based visual model { } "foo" : true, "bar" : [ { "foobar" : "foo" }, null ] foo true object bar array 85

86 Tree-based visual model { } "foo" : true, "bar" : [ { "foobar" : "foo" }, null ] foo true object object bar array null foobar foo 86

87 wklzzz / 123RF Stock Photo Protocol Buffers 87

88 Messages message Person { required string last_name = 1; repeated string first_name = 2; optional Title title = 3; optional Person boss = 4; } 88

89 Scalar types double, float int32, int64 and variants bool string bytes 89

90 Enums enum Title { MR = 1; MS = 2; MRS = 3; } 90

91 In C++ person.boss().first_name() 91

92 Validation Burak Cakmak / 123RF Stock Photo 92

93 Validation: The Pipeline Document 93

94 Validation: The Pipeline Document Well- Formedness 94

95 Validation: The Pipeline Document Well- Formedness Validation 95

96 On the oxygen Cheat Sheet Validity Well- Formedness 96

97 Validation vs. Annotation Validation 97

98 Validation vs. Annotation Validation Annotation 98

99 Validation 99

100 Validation 100

101 Validation 101

102 Validation 102

103 Validation 103

104 DTD Validation 104

105 Document Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a>& <a>& &&<d&e="f"/>& &&<c>this&is&<b>text</b>.</c>& </a>& & 105

106 Document Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a>& <a>& &&<d&e="f"/>& &&<c>this&is&<b>text</b>.</c>& </a>& & 106

107 Document Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& ]>& <a>& &&<d&e="f"/>& &&<c>this&is&<b>text</b>.</c>& </a>& & 107

108 Element Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& ]>& <a>& &&<d&e="f"/>& &&<c>this&is&<b>text</b>.</c>& </a>& & 108

109 Element Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&EMPTY>& ]>& <a/>& & Empty Content 109

110 Element Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&(#PCDATA)>& ]>& <a>& &&This&is&text.& </a>& & Simple Content 110

111 Element Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&(foo,&bar)>& <!ELEMENT&foo&EMPTY>& <!ELEMENT&bar&EMPTY>& ]>& <a>& &&<foo/>& &&<bar/>& </a>& & Complex Content 111

112 Element Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&(foo+,&bar*,&foobar?)>& <!ELEMENT&foo&EMPTY>& <!ELEMENT&bar&EMPTY>& <!ELEMENT&foobar&EMPTY>& ]>& <a>& &&<foo/>& &&<foo/>& &&<foo/>& &&<foobar/>& </a>& Complex Content 112

113 Element Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&(&bar*& &(foo foobar)+)?>& <!ELEMENT&foo&EMPTY>& <!ELEMENT&foobar&EMPTY>& ]>& <a>& &&<foo/>& &&<foobar/>& &&<foo/>& &&<foobar/>& </a>& & Complex Content 113

114 Element Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&(#PCDATA& &foo)*>& <!ELEMENT&foo&EMPTY>& ]>& <a>& &&<foo/>lorum<foo/>ipsum<foo/>& </a>& & Mixed Content 114

115 Element Type Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&ANY>& <!ELEMENT&foo&EMPTY>& <!ELEMENT&bar&EMPTY>& ]>& <a>& &&<foo/>& &&Lorem& &&<bar/>& &&Ipsum& &&<bar/>& &&<bar/>& </a>& & Mixed Content 115

116 Attribute-List Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&EMPTY>& <!ATTLIST&a&foo&CDATA&#REQUIRED>& ]>& <a&foo="this&is&a&"value""></a>& & 116

117 Attribute-List Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&EMPTY>& <!ATTLIST&a&foo&CDATA&#IMPLIED>& ]>& <a&foo="this&is&a&"value""></a>& & 117

118 Attribute-List Declaration <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE a [ <!ELEMENT a EMPTY> <!ATTLIST a foo CDATA #IMPLIED> ]> <a></a> 118

119 Attribute-List Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&EMPTY>& <!ATTLIST&a&foo&CDATA&"bar">& ]>& <a&foo="this&is&a&"value""></a>& & 119

120 Attribute-List Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&EMPTY>& <!ATTLIST&a&foo&CDATA&"bar">& ]>& <a></a>& & 120

121 Attribute-List Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&EMPTY>& <!ATTLIST&a&foo&CDATA&#FIXED&"bar">& ]>& <a&foo="bar"></a>& & 121

122 Attribute-List Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&EMPTY>& <!ATTLIST&a&foo&NMTOKEN&#REQUIRED>& ]>& <a&foo="123atoken456"></a>& & 122

123 Attribute-List Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&a&[& <!ELEMENT&a&EMPTY>& <!ATTLIST&a&foo&NMTOKENS&#REQUIRED>& ]>& <a&foo="123atoken&456"></a>& & 123

124 Attribute-List Declaration <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&root&[& <!ELEMENT&root&(foo+,&bar,&barlist)>& <!ELEMENT&foo&EMPTY>& <!ELEMENT&bar&EMPTY>& <!ELEMENT&barlist&EMPTY>& <!ATTLIST&foo&myid&ID&#REQUIRED>& <!ATTLIST&bar&ref&IDREF&#REQUIRED>& <!ATTLIST&barlist&ref&IDREFS&#REQUIRED>& ]>& <root>& &&<foo&myid="foobar"/>& &&<foo&myid="foobar2"/>& &&<bar&ref="foobar"/>& &&<barlist&ref="foobar&foobar2"/>& </root>& & 124

125 DTD Example: External Subset <?xml&version="1.0"&encoding="utf98"?>& <!DOCTYPE&root&SYSTEM&"schema.dtd">& <a&foo="bar"/>& 125

126 Warning: DTDs and Namespaces <!ELEMENT(eth((date,(president,(Rektor)>( <!ATTLIST(eth(xmlns(CDATA(#FIXED( " ((((((((((((((xmlns:xmldb(cdata(#fixed( " ((((((((((((((date(cdata(#implied( ((((((((((((((xmldb:date(cdata(#implied>( <!ELEMENT(date((#PCDATA)>( <!ELEMENT(president((#PCDATA)>( <!ATTLIST(president(number(CDATA(#IMPLIED>( <!ELEMENT(Rektor((#PCDATA)>( ( 126

127 Notations and Unparsed Entities <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE foo [ <!ENTITY presentation SYSTEM "/Users/bigdata/desktop/presentation.pptx" NDATA pptx> <!NOTATION pptx PUBLIC "powerpoint"> <!ELEMENT foo EMPTY> <!ATTLIST foo foo ENTITY #REQUIRED> ]> <foo foo="presentation"></foo> 127

128 XML Schema 128

129 Empty Schema <?xml&version="1.0"&encoding="utf98"?>& <xs:schema& &&xmlns:xs=" </xs:schema>& & 129

130 Simple Scenario <?xml&version="1.0"&encoding="utf98"?>& <xs:schema& &&xmlns:xs=" &&<xs:element&name="foo"&type="xs:string"/>& </xs:schema>& & & schema.xsd <?xml&version="1.0"&encoding="utf98"?>& <foo& &&xmlns:xsi=" &&xsi:nonamespaceschemalocation="schema.xsd">& &&This&is&text.& </foo>& & file.xml 130

131 Simple Scenario <?xml&version="1.0"&encoding="utf98"?>& <xs:schema& &&xmlns:xs=" &&<xs:element&name="foo"&type="xs:integer"/>& </xs:schema>& & & schema.xsd <?xml&version="1.0"&encoding="utf98"?>& <foo& &&xmlns:xsi=" &&xsi:nonamespaceschemalocation="schema.xsd">& &&142857& </foo>& & file.xml 131

132 Simple Types: Built-in Strings Numbers Booleans string anyuri QName decimal integer float double long int short byte positiveinteger nonnegativeinteger... unsignedlong unsignedint... boolean 132

133 Simple Types: Built-in Dates and Times Time Intervals Binaries Null - datetime time date gyearmonth gmonthday gyear gmonth gday datetimestamp duration yearmonthduration daytimeduration hexbinary base64binary 133

134 Dates T10:15:00Z 01:15:00-08:00 134

135 Durations P1Y2MT3H 135

136 User-defined types 136

137 User-defined types Restriction 137

138 User-defined types Restriction Union Not atomic 138

139 User-defined types Restriction Union Not atomic List Not atomic 139

140 Restriction <?xml&version="1.0"&encoding="utf98"?>& <xs:schema& &&xmlns:xs=" &&<xs:simpletype&name="myfixedlengthstring">& &&&&<xs:restriction&base="xs:string">& &&&&&&<xs:length&value="3"/>& &&&&</xs:restriction>& &&</xs:simpletype>& &&<xs:element&name="foo"&type="myfixedlengthstring"/>& </xs:schema>& schema.xsd <?xml&version="1.0"&encoding="utf98"?>& <foo& &&xmlns:xsi=" &&xsi:nonamespaceschemalocation="schema.xsd">zrh</foo>& & & file.xml 140

141 Restriction <xs:simpletype,name="myfixedlengthstring">,,,<xs:restriction,base="xs:string">,,,,,<xs:length,value="3"/>,,,</xs:restriction>, </xs:simpletype>,, <foo>zrh</foo>,,, 141

142 List <xs:simpletype,name="mylist">,,,<xs:list,itemtype="xs:string"/>, </xs:simpletype>,, <foo>foo,bar,foobar</foo>,, 142

143 Union <xs:simpletype,name="myunion">,,,<xs:union,membertypes="xs:integer,xs:boolean"/>, </xs:simpletype>,, <foo>true</foo>,,,, 143

144 Complex Types 144

145 Complex Types Empty <foo/> 145

146 Complex Types Empty <foo/> Simple Content <foo>text</foo> 146

147 Complex Types Empty <foo/> Simple Content Complex Content <foo>text</foo> <foo> <a/> <b/> </foo> 147

148 Complex Types Empty <foo/> Simple Content Complex Content Mixed Content <foo>text</foo> <foo> <a/> <b/> </foo> <foo> Text<a/>Text<b/> </foo> 148

149 Complex content <xs:complextype-name="complexcontent">- --<xs:sequence>- ----<xs:element-name="twotofour"-type="xs:string"-minoccurs="2"-maxoccurs="4"/>- ----<xs:element-name="zeroorone"-type="xs:boolean"-minoccurs="0"-maxoccurs="1"/>- --</xs:sequence>- </xs:complextype>- - <foo>- --<twotofour>foobar</twotofour>- --<twotofour>foobar</twotofour>- --<twotofour>foobar</twotofour>- --<zeroorone>true</zeroorone>- </foo>

150 Complex content <xs:complextype-name="complexcontent">- --<xs:sequence>- ----<xs:element-name="twotofour"-type="xs:string"-minoccurs="2"-maxoccurs="4"/>- ----<xs:element-name="zeroorone"-type="xs:boolean"-minoccurs="0"-maxoccurs="1"/>- --</xs:sequence>- </xs:complextype>- - <foo>- --<twotofour>foobar</twotofour>- --<twotofour>foobar</twotofour>- --<twotofour>foobar</twotofour>- --<zeroorone>true</zeroorone>- </foo>

151 Empty content <xs:complextype-name="emptytype">- --<xs:sequence/>- </xs:complextype>- - <foo/>

152 Simple content <xs:complextype-name="datecountry">- --<xs:simplecontent>- ----<xs:extension-base="xs:date"> <xs:attribute-name="country"-type="xs:string"/>- ----</xs:extension>- --</xs:simplecontent>- </xs:complextype>- - <foo-country="switzerland">2014d12d02</foo>

153 Mixed content <xs:complextype-name="mixedcontent"-mixed="true">- --<xs:sequence>- ----<xs:element-name="b"-type="xs:string"-minoccurs="0"-maxoccurs="unbounded"/>- --</xs:sequence>- </xs:complextype>- - <foo>some-text-and-some-<b>bold</b>-text.</foo>

154 Simple type on attributes <xs:complextype-name="withattribute">- --<xs:sequence/>- --<xs:attribute-name="country" type="xs:string" default="switzerland"/>- </xs:complextype>- - <foo-country="switzerland"/>

155 Named Types <xs:complextype-name="empty">---- <xs:sequence/>- </xs:complextype>- - <xs:element-name="c"-type="empty">- </xs:element>

156 Anonymous Types <xs:element*name="c">* **<xs:complextype>* ****<xs:sequence/>* **</xs:complextype>* </xs:element>* * 156

157 No namespaces <?xml&version="1.0"&encoding="utf98"?>& <xs:schema& &&xmlns:xs=" &&<xs:element&name="foo"&type="xs:string"/>& </xs:schema>& & & <?xml&version="1.0"&encoding="utf98"?>& <foo& &&xmlns:xsi=" &&xsi:nonamespaceschemalocation="schema.xsd">& &&This&is&text.& </foo>& & 157

158 With namespaces <?xml&version="1.0"&encoding="utf98"?>& <xs:schema& &&xmlns:xs=" &&<xs:element&name="foo"&type="xs:string"/>& </xs:schema>& & <?xml&version="1.0"&encoding="utf98"?>& <big:foo& &&xmlns:xsi=" &&xsi:schemalocation="! &&xmlns:big=" &&This&is&text.& </big:foo>& & & 158

159 Keys <xs:schema* ****xmlns:xs=" **<xs:element*name="root">* ****<xs:complextype>* ******..* ****</xs:complextype>* ****<xs:key*name="foodid">* ******<xs:selector*xpath="foo"/>* ****</xs:key>* **</xs:element>* </xs:schema>* * <?xml*version="1.0"*encoding="utfd8"?>* <root* **xmlns:xsi=" **xsi:nonamespaceschemalocation="schema.xsd">* **<foo*id="foo"/>* **<foo*id="bar"/>* **<foo*id="foobar"/>* </root>* * 159

160 Keys <xs:schema* ****xmlns:xs=" **<xs:element*name="root">* ****<xs:complextype>* ******..* ****</xs:complextype>* ****<xs:key*name="foodid">* What must be unique ******<xs:selector*xpath="foo"/>* ****</xs:key>* **</xs:element>* </xs:schema>* * <?xml*version="1.0"*encoding="utfd8"?>* <root* **xmlns:xsi=" **xsi:nonamespaceschemalocation="schema.xsd">* **<foo*id="foo"/>* **<foo*id="bar"/>* **<foo*id="foobar"/>* </root>* * 160

161 Keys <xs:schema* ****xmlns:xs=" **<xs:element*name="root">* ****<xs:complextype>* ******..* ****</xs:complextype>* ****<xs:key*name="foodid">* ******<xs:selector*xpath="foo"/>* ****</xs:key>* **</xs:element>* </xs:schema>* * <?xml*version="1.0"*encoding="utfd8"?>* <root* What must be unique What makes it unique **xmlns:xsi=" **xsi:nonamespaceschemalocation="schema.xsd">* **<foo*id="foo"/>* **<foo*id="bar"/>* **<foo*id="foobar"/>* </root>* * 161

162 Bonus material: The Schema of Schemas!!<xs:schema!!!!!xmlns:xs=" 162

163 Avro 163

164 Avro is language neutral 164

165 Interoperability 165

166 Interoperability Avro DataFile 166

167 Interoperability Avro DataFile 167

168 Unlike Protocol Buffer and Thrift... Schema (Language independent) 168

169 Unlike Protocol Buffer and Thrift... Schema (Language independent) Compilation 169

170 Unlike Protocol Buffer and Thrift... Schema (Language independent) Compilation Code 170

171 Unlike Protocol Buffer and Thrift... Schema (Language independent) Compilation More Code More Code Code More Code More Code 171

172 Unlike Protocol Buffer and Thrift... Schema (Language independent) Compilation More Code More Code Code More Code More Code Data 172

173 Avro processes schemas not known at compile time More Code More Code Code More Code More Code Schema Data + Known at runtime 173

174 Avro Schema Schema This is actually JSON 174

175 Avro Schema Schema Avro IDL But there is also another syntax for users that like C 175

176 Avro DataFile Avro DataFile An Avro DataFile is encoded in binary format 176

177 Avro DataFile DataFile JSONbased encoder But there is also another debuggingfriendly syntax 177

178 Avro types Atomic Types 178

179 Avro types Atomic Types vs. Structured Types 179

180 Avro atomic types Category Absence of value Boolean Number String Binary Types null boolean int long float double string enum bytes fixed (list of 8-bit unsigned bytes) 180

181 Avro atomic types { "type" : "null" } { "type" : "string" } { "type" : "boolean" } Primitive types 181

182 Avro atomic types { "type" : "null" } { "type" : "string" } { "type" : "boolean" } { } { } "type" : "enum", "name" : "cuttlery", "symbols" : [ "KNIFE", "FORK", "SPOON" ] "type" : "fixed", "name" : "MD5", "size" : 16 Primitive types Non-primitive types 182

183 Avro structured types Category Arrays Objects Types array map record 183

184 Avro arrays 1 foo 2 bar 3 foobar 4 foo 5 foobar 6 bar {a: 1, b: "foo"} 2 {a: 2, b: "foo"} 3 {a: 3, b: "bar"} 184

185 Avro arrays 1 foo 2 bar 3 foobar 4 foo 5 foobar 6 bar {a: 1, b: "foo"} 2 {a: 2, b: "foo"} 3 {a: 3, b: "bar"} { } "type" : "array", "items" : "string" 185

186 Avro arrays 1 foo 2 bar 3 foobar 4 foo 5 foobar 6 bar {a: 1, b: "foo"} 2 {a: 2, b: "foo"} 3 {a: 3, b: "bar"} { } "type" : "array", "items" : "string" All items in the array must have the same type. 186

187 Avro maps keys foo bar foobar values true false true 187

188 Avro maps keys foo bar foobar values true false true Must be strings 188

189 Avro maps keys foo bar foobar values true false true Must be same type 189

190 Avro maps keys foo bar foobar values true false true { } "type" : "map", "name" : "flags", "values" : "boolean" Must be same type 190

191 Avro records fields values name ETH canton ZH students 20,

192 Avro records { } fields name canton values ETH ZH students 20,000 "type" : "map", "name" : "university", "fields" : [ { "name" : "name", "type" : "string" }, { "name" : "canton", "type" : "cantonal-code" }, { "name" : "students", "type" : "long" } ] 192

193 Mappings Generic 193

194 Mappings Generic Specific 194

195 Mappings Generic Specific Reflexive 195

196 Avro DataFile Metadata (including schema) 196

197 Avro DataFile Metadata (including schema) Block 197

198 Avro DataFile Metadata (including schema) Block Block 198

199 Avro DataFile Metadata (including schema) Block Block Block 199

200 Avro DataFile Metadata (including schema) Block Block Block Block 200

201 Avro DataFile Metadata (including schema) Block Block Block Block Block 201

202 Avro DataFile Metadata (including schema) Specifies sync marker Block sync marker Block Block Block Block 202

203 Avro DataFiles are splittable 203

204 Avro DataFiles are splittable 204

205 Avro DataFiles are splittable 205

206 Schema resolution Old Schema 206

207 Schema resolution Old Schema Avro DataFile 207

208 Schema resolution Old Schema New Schema Avro DataFile 208

209 Schema resolution Old Schema New Schema Avro DataFile New field 209

210 Schema resolution Old Schema New Schema Avro DataFile New field Default value is taken 210

211 Schema resolution Old Schema New Schema Avro DataFile Removed field 211

212 Schema resolution Old Schema New Schema Avro DataFile Removed field Ignored (projection) 212

213 Schema resolution New Schema Old Schema Avro DataFile New field 213

214 Schema resolution New Schema Old Schema Avro DataFile New field Ignored (projection) 214

215 Schema resolution New Schema Old Schema Avro DataFile Removed field 215

216 Schema resolution New Schema Old Schema Avro DataFile Removed field Default value (error if none) 216

217 Parquet 217

218 YET ANOTHER format?

219 YET ANOTHER format? Don't we already have enough?

220 But this time, it's different

221 Row storage 221

222 Columnar storage 222

223 But... But I thought we were talking about heterogeneous collections of trees?

224 Mapping trees to tables... If our heterogeneous collection is really homogeneous and our trees are flat, this is easy 224

225 Mapping trees to tables... Otherwise, there are ways. Edge shredding Ad-hoc mappings Tree-encoding Dremel 225

226 Parquet atomic types Category Boolean Number String Binary Date Types boolean int32 int64 int96 double DECIMAL(precision, scale) UTF8 ENUM binary fixed_len_byte_array DATE 226

227 Parquet structured types Category Arrays Objects Types LIST MAP 227

228 Parquet schema definition Protocol buffer schemas Avro schemas Thrift schemas 228

Big Data for Engineers Spring Data Models

Big Data for Engineers Spring Data Models Ghislain Fourny Big Data for Engineers Spring 2018 11. Data Models pinkyone / 123RF Stock Photo CSV (Comma separated values) This is syntax ID,Last name,first name,theory, 1,Einstein,Albert,"General, Special

More information

Big Data 9. Data Models

Big Data 9. Data Models Ghislain Fourny Big Data 9. Data Models pinkyone / 123RF Stock Photo 1 Syntax vs. Data Models Physical view Syntax this is text. 2 Syntax vs. Data Models a Logical view

More information

Big Data Fall Data Models

Big Data Fall Data Models Ghislain Fourny Big Data Fall 2018 11. Data Models pinkyone / 123RF Stock Photo CSV (Comma separated values) This is syntax ID,Last name,first name,theory, 1,Einstein,Albert,"General, Special Relativity"

More information

Solution Sheet 5 XML Data Models and XQuery

Solution Sheet 5 XML Data Models and XQuery The Systems Group at ETH Zurich Big Data Fall Semester 2012 Prof. Dr. Donald Kossmann Prof. Dr. Nesime Tatbul Assistants: Martin Kaufmann Besmira Nushi 07.12.2012 Solution Sheet 5 XML Data Models and XQuery

More information

Introduction Syntax and Usage XML Databases Java Tutorial XML. November 5, 2008 XML

Introduction Syntax and Usage XML Databases Java Tutorial XML. November 5, 2008 XML Introduction Syntax and Usage Databases Java Tutorial November 5, 2008 Introduction Syntax and Usage Databases Java Tutorial Outline 1 Introduction 2 Syntax and Usage Syntax Well Formed and Valid Displaying

More information

Big Data Exercises. Fall 2018 Week 8 ETH Zurich. XML validation

Big Data Exercises. Fall 2018 Week 8 ETH Zurich. XML validation Big Data Exercises Fall 2018 Week 8 ETH Zurich XML validation Reading: (optional, but useful) XML in a Nutshell, Elliotte Rusty Harold, W. Scott Means, 3rd edition, 2005: Online via ETH Library 1. XML

More information

Information Systems for Engineers Fall Data Definition with SQL

Information Systems for Engineers Fall Data Definition with SQL Ghislain Fourny Information Systems for Engineers Fall 2018 3. Data Definition with SQL Rare Book and Manuscript Library, Columbia University. What does data look like? Relations 2 Reminder: relation 0

More information

Chapter 5: XPath/XQuery Data Model

Chapter 5: XPath/XQuery Data Model 5. XPath/XQuery Data Model 5-1 Chapter 5: XPath/XQuery Data Model References: Mary Fernández, Ashok Malhotra, Jonathan Marsh, Marton Nagy, Norman Walsh (Ed.): XQuery 1.0 and XPath 2.0 Data Model (XDM).

More information

Java EE 7: Back-end Server Application Development 4-2

Java EE 7: Back-end Server Application Development 4-2 Java EE 7: Back-end Server Application Development 4-2 XML describes data objects called XML documents that: Are composed of markup language for structuring the document data Support custom tags for data

More information

W3C XML Schemas For Publishing

W3C XML Schemas For Publishing W3C XML Schemas For Publishing 208 5.8.xml: Getting Started

More information

Marker s feedback version

Marker s feedback version Two hours Special instructions: This paper will be taken on-line and this is the paper format which will be available as a back-up UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Semi-structured Data

More information

Avro Specification

Avro Specification Table of contents 1 Introduction...2 2 Schema Declaration... 2 2.1 Primitive Types... 2 2.2 Complex Types...2 2.3 Names... 5 2.4 Aliases... 6 3 Data Serialization...6 3.1 Encodings... 7 3.2 Binary Encoding...7

More information

2006 Martin v. Löwis. Data-centric XML. XML Schema (Part 1)

2006 Martin v. Löwis. Data-centric XML. XML Schema (Part 1) Data-centric XML XML Schema (Part 1) Schema and DTD Disadvantages of DTD: separate, non-xml syntax very limited constraints on data types (just ID, IDREF, ) no support for sets (i.e. each element type

More information

Avro Specification

Avro Specification Table of contents 1 Introduction...2 2 Schema Declaration... 2 2.1 Primitive Types... 2 2.2 Complex Types...2 2.3 Names... 5 3 Data Serialization...6 3.1 Encodings... 6 3.2 Binary Encoding...6 3.3 JSON

More information

[MS-SSISPARAMS-Diff]: Integration Services Project Parameter File Format. Intellectual Property Rights Notice for Open Specifications Documentation

[MS-SSISPARAMS-Diff]: Integration Services Project Parameter File Format. Intellectual Property Rights Notice for Open Specifications Documentation [MS-SSISPARAMS-Diff]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation ( this documentation ) for

More information

Big Data 10. Querying

Big Data 10. Querying Ghislain Fourny Big Data 10. Querying pinkyone / 123RF Stock Photo 1 Declarative Languages What vs. How 2 Functional Languages for let order by if + any else = then every while where return exit with Expression

More information

XML and Web Services

XML and Web Services XML and Web Services Lecture 8 1 XML (Section 17) Outline XML syntax, semistructured data Document Type Definitions (DTDs) XML Schema Introduction to XML based Web Services 2 Additional Readings on XML

More information

Big Data 12. Querying

Big Data 12. Querying Ghislain Fourny Big Data 12. Querying pinkyone / 123RF Stock Photo Declarative Languages What vs. How 2 Functional Languages for let order by if + any else = then every while where return exit with Expression

More information

BEAWebLogic. Integration. Transforming Data Using XQuery Mapper

BEAWebLogic. Integration. Transforming Data Using XQuery Mapper BEAWebLogic Integration Transforming Data Using XQuery Mapper Version: 10.2 Document Revised: March 2008 Contents Introduction Overview of XQuery Mapper.............................................. 1-1

More information

[MS-DPAD]: Alert Definition Data Portability Overview. Intellectual Property Rights Notice for Open Specifications Documentation

[MS-DPAD]: Alert Definition Data Portability Overview. Intellectual Property Rights Notice for Open Specifications Documentation [MS-DPAD]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation ( this documentation ) for protocols,

More information

XML. Part II DTD (cont.) and XML Schema

XML. Part II DTD (cont.) and XML Schema XML Part II DTD (cont.) and XML Schema Attribute Declarations Declare a list of allowable attributes for each element These lists are called ATTLIST declarations Consists of 3 basic parts The ATTLIST keyword

More information

No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation.

No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation. [MS-DPAD]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation for protocols, file formats, languages,

More information

Intellectual Property Rights Notice for Open Specifications Documentation

Intellectual Property Rights Notice for Open Specifications Documentation [MS-SSISPARAMS-Diff]: Intellectual Property Rights tice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation for protocols, file formats,

More information

Modelling XML Applications (part 2)

Modelling XML Applications (part 2) Modelling XML Applications (part 2) Patryk Czarnik XML and Applications 2014/2015 Lecture 3 20.10.2014 Common design decisions Natural language Which natural language to use? It would be a nonsense not

More information

This Documentation, which includes embedded help systems and electronically distributed materials, (hereinafter referred to

This Documentation, which includes embedded help systems and electronically distributed materials, (hereinafter referred to This Documentation, which includes embedded help systems and electronically distributed materials, (hereinafter referred to as the Documentation ) is for your informational purposes only and is subject

More information

Using and defining new simple types. Basic structuring of definitions. Copyright , Sosnoski Software Solutions, Inc. All rights reserved.

Using and defining new simple types. Basic structuring of definitions. Copyright , Sosnoski Software Solutions, Inc. All rights reserved. IRIS Web Services Workshop II Session 1 Wednesday AM, September 21 Dennis M. Sosnoski What is XML? Extensible Markup Language XML is metalanguage for markup Enables markup languages for particular domains

More information

Faster XML data validation in a programming language with XML datatypes

Faster XML data validation in a programming language with XML datatypes Faster XML data validation in a programming language with XML datatypes Kurt Svensson Inobiz AB Kornhamnstorg 61, 103 12 Stockholm, Sweden kurt.svensson@inobiz.se Abstract EDI-C is a programming language

More information

The XQuery Data Model

The XQuery Data Model The XQuery Data Model 9. XQuery Data Model XQuery Type System Like for any other database query language, before we talk about the operators of the language, we have to specify exactly what it is that

More information

Oracle Berkeley DB XML. API Reference for C++ 12c Release 1

Oracle Berkeley DB XML. API Reference for C++ 12c Release 1 Oracle Berkeley DB XML API Reference for C++ 12c Release 1 Library Version 12.1.6.0 Legal Notice This documentation is distributed under an open source license. You may review the terms of this license

More information

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington CS330 Lecture April 8, 2003 1 Overview From HTML to XML DTDs Querying XML: XPath Transforming XML: XSLT

More information

XML Format Plug-in User s Guide. Version 10g Release 3 (10.3)

XML Format Plug-in User s Guide. Version 10g Release 3 (10.3) XML Format Plug-in User s Guide Version 10g Release 3 (10.3) XML... 4 TERMINOLOGY... 4 CREATING AN XML FORMAT... 5 CREATING AN XML FORMAT BASED ON AN EXISTING XML MESSAGE FORMAT... 5 CREATING AN EMPTY

More information

XML extensible Markup Language

XML extensible Markup Language extensible Markup Language Eshcar Hillel Sources: http://www.w3schools.com http://java.sun.com/webservices/jaxp/ learning/tutorial/index.html Tutorial Outline What is? syntax rules Schema Document Object

More information

Software Engineering Methods, XML extensible Markup Language. Tutorial Outline. An Example File: Note.xml XML 1

Software Engineering Methods, XML extensible Markup Language. Tutorial Outline. An Example File: Note.xml XML 1 extensible Markup Language Eshcar Hillel Sources: http://www.w3schools.com http://java.sun.com/webservices/jaxp/ learning/tutorial/index.html Tutorial Outline What is? syntax rules Schema Document Object

More information

XML. Document Type Definitions XML Schema. Database Systems and Concepts, CSCI 3030U, UOIT, Course Instructor: Jarek Szlichta

XML. Document Type Definitions XML Schema. Database Systems and Concepts, CSCI 3030U, UOIT, Course Instructor: Jarek Szlichta XML Document Type Definitions XML Schema 1 XML XML stands for extensible Markup Language. XML was designed to describe data. XML has come into common use for the interchange of data over the Internet.

More information

All About <xml> CS193D, 2/22/06

All About <xml> CS193D, 2/22/06 CS193D Handout 17 Winter 2005/2006 February 21, 2006 XML See also: Chapter 24 (709-728) All About CS193D, 2/22/06 XML is A markup language, but not really a language General purpose Cross-platform

More information

Part VII. Querying XML The XQuery Data Model. Marc H. Scholl (DBIS, Uni KN) XML and Databases Winter 2005/06 153

Part VII. Querying XML The XQuery Data Model. Marc H. Scholl (DBIS, Uni KN) XML and Databases Winter 2005/06 153 Part VII Querying XML The XQuery Data Model Marc H. Scholl (DBIS, Uni KN) XML and Databases Winter 2005/06 153 Outline of this part 1 Querying XML Documents Overview 2 The XQuery Data Model The XQuery

More information

XML Schema. Mario Alviano A.Y. 2017/2018. University of Calabria, Italy 1 / 28

XML Schema. Mario Alviano A.Y. 2017/2018. University of Calabria, Italy 1 / 28 1 / 28 XML Schema Mario Alviano University of Calabria, Italy A.Y. 2017/2018 Outline 2 / 28 1 Introduction 2 Elements 3 Simple and complex types 4 Attributes 5 Groups and built-in 6 Import of other schemes

More information

About the Data-Flow Complexity of Web Processes

About the Data-Flow Complexity of Web Processes Cardoso, J., "About the Data-Flow Complexity of Web Processes", 6th International Workshop on Business Process Modeling, Development, and Support: Business Processes and Support Systems: Design for Flexibility.

More information

Last week we saw how to use the DOM parser to read an XML document. The DOM parser can also be used to create and modify nodes.

Last week we saw how to use the DOM parser to read an XML document. The DOM parser can also be used to create and modify nodes. Distributed Software Development XML Schema Chris Brooks Department of Computer Science University of San Francisco 7-2: Modifying XML programmatically Last week we saw how to use the DOM parser to read

More information

Restricting complextypes that have mixed content

Restricting complextypes that have mixed content Restricting complextypes that have mixed content Roger L. Costello October 2012 complextype with mixed content (no attributes) Here is a complextype with mixed content:

More information

[MS-QDEFF]: Query Definition File Format. Intellectual Property Rights Notice for Open Specifications Documentation

[MS-QDEFF]: Query Definition File Format. Intellectual Property Rights Notice for Open Specifications Documentation [MS-QDEFF]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation for protocols, file formats, languages,

More information

CSC Web Technologies, Spring Web Data Exchange Formats

CSC Web Technologies, Spring Web Data Exchange Formats CSC 342 - Web Technologies, Spring 2017 Web Data Exchange Formats Web Data Exchange Data exchange is the process of transforming structured data from one format to another to facilitate data sharing between

More information

Framing how values are extracted from the data stream. Includes properties for alignment, length, and delimiters.

Framing how values are extracted from the data stream. Includes properties for alignment, length, and delimiters. DFDL Introduction For Beginners Lesson 3: DFDL properties In lesson 2 we learned that in the DFDL language, XML Schema conveys the basic structure of the data format being modeled, and DFDL properties

More information

XML Information Set. Working Draft of May 17, 1999

XML Information Set. Working Draft of May 17, 1999 XML Information Set Working Draft of May 17, 1999 This version: http://www.w3.org/tr/1999/wd-xml-infoset-19990517 Latest version: http://www.w3.org/tr/xml-infoset Editors: John Cowan David Megginson Copyright

More information

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11 !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... 7:4 @import Directive... 9:11 A Absolute Units of Length... 9:14 Addressing the First Line... 9:6 Assigning Meaning to XML Tags...

More information

Syntax XML Schema XML Techniques for E-Commerce, Budapest 2004

Syntax XML Schema XML Techniques for E-Commerce, Budapest 2004 Mag. iur. Dr. techn. Michael Sonntag Syntax XML Schema XML Techniques for E-Commerce, Budapest 2004 E-Mail: sonntag@fim.uni-linz.ac.at http://www.fim.uni-linz.ac.at/staff/sonntag.htm Michael Sonntag 2004

More information

[MS-QDEFF]: Query Definition File Format. Intellectual Property Rights Notice for Open Specifications Documentation

[MS-QDEFF]: Query Definition File Format. Intellectual Property Rights Notice for Open Specifications Documentation [MS-QDEFF]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation ( this documentation ) for protocols,

More information

[MS-MSL]: Mapping Specification Language File Format. Intellectual Property Rights Notice for Open Specifications Documentation

[MS-MSL]: Mapping Specification Language File Format. Intellectual Property Rights Notice for Open Specifications Documentation [MS-MSL]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation ( this documentation ) for protocols,

More information

PESC Compliant JSON Version /19/2018. A publication of the Technical Advisory Board Postsecondary Electronic Standards Council

PESC Compliant JSON Version /19/2018. A publication of the Technical Advisory Board Postsecondary Electronic Standards Council Version 0.5.0 10/19/2018 A publication of the Technical Advisory Board Postsecondary Electronic Standards Council 2018. All Rights Reserved. This document may be copied and furnished to others, and derivative

More information

CS 378 Big Data Programming

CS 378 Big Data Programming CS 378 Big Data Programming Lecture 9 Complex Writable Types AVRO, Protobuf CS 378 - Fall 2017 Big Data Programming 1 Review Assignment 4 WordStaQsQcs using Avro QuesQons/issues? MRUnit and AVRO AVRO keys,

More information

Framing how values are extracted from the data stream. Includes properties for alignment, length, and delimiters.

Framing how values are extracted from the data stream. Includes properties for alignment, length, and delimiters. DFDL Introduction For Beginners Lesson 3: DFDL properties Version Author Date Change 1 S Hanson 2011-01-24 Created 2 S Hanson 2011-03-30 Updated 3 S Hanson 2012-09-21 Corrections for errata 4 S Hanson

More information

MCS-274 Final Exam Serial #:

MCS-274 Final Exam Serial #: MCS-274 Final Exam Serial #: This exam is closed-book and mostly closed-notes. You may, however, use a single 8 1/2 by 11 sheet of paper with hand-written notes for reference. (Both sides of the sheet

More information

Chapter 1: Getting Started. You will learn:

Chapter 1: Getting Started. You will learn: Chapter 1: Getting Started SGML and SGML document components. What XML is. XML as compared to SGML and HTML. XML format. XML specifications. XML architecture. Data structure namespaces. Data delivery,

More information

Accessing Arbitrary Hierarchical Data

Accessing Arbitrary Hierarchical Data D.G.Muir February 2010 Accessing Arbitrary Hierarchical Data Accessing experimental data is relatively straightforward when data are regular and can be modelled using fixed size arrays of an atomic data

More information

Data Presentation and Markup Languages

Data Presentation and Markup Languages Data Presentation and Markup Languages MIE456 Tutorial Acknowledgements Some contents of this presentation are borrowed from a tutorial given at VLDB 2000, Cairo, Agypte (www.vldb.org) by D. Florescu &.

More information

Service Modeling Language (SML) Pratul Dublish Principal Program Manager Microsoft Corporation

Service Modeling Language (SML) Pratul Dublish Principal Program Manager Microsoft Corporation Service Modeling Language (SML) Pratul Dublish Principal Program Manager Microsoft Corporation 1 Outline Introduction to SML SML Model Inter-Document References URI scheme deref() extension function Schema-based

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1 Slide 27-1 Chapter 27 XML: Extensible Markup Language Chapter Outline Introduction Structured, Semi structured, and Unstructured Data. XML Hierarchical (Tree) Data Model. XML Documents, DTD, and XML Schema.

More information

Introduction to Semistructured Data and XML

Introduction to Semistructured Data and XML Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of Washington Database Management Systems, R. Ramakrishnan 1 How the Web is Today HTML documents often

More information

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML 7.1 Introduction extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML Lax syntactical rules Many complex features that are rarely used HTML is a markup language,

More information

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML Chapter 7 XML 7.1 Introduction extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML Lax syntactical rules Many complex features that are rarely used HTML

More information

SDPL : XML Basics 2. SDPL : XML Basics 1. SDPL : XML Basics 4. SDPL : XML Basics 3. SDPL : XML Basics 5

SDPL : XML Basics 2. SDPL : XML Basics 1. SDPL : XML Basics 4. SDPL : XML Basics 3. SDPL : XML Basics 5 2 Basics of XML and XML documents 2.1 XML and XML documents Survivor's Guide to XML, or XML for Computer Scientists / Dummies 2.1 XML and XML documents 2.2 Basics of XML DTDs 2.3 XML Namespaces XML 1.0

More information

Semistructured Content

Semistructured Content On our first day Semistructured Content 1 Structured data : database system tagged, typed well-defined semantic interpretation Semi-structured data: tagged - XML (HTML?) some help with semantic interpretation

More information

CS A1150 Databases, 2017 Exercise Round 6 (( ), Solutions

CS A1150 Databases, 2017 Exercise Round 6 (( ), Solutions CS A1150 Databases, 2017 Exercise Round 6 ((8.5. 12.5.2017), Solutions 1. (6 p.) Consider a database schema which consists of three relations, whose schemas are: Students(ID, name, program, year) Courses(code,

More information

Describing Document Types: The Schema Languages of XML Part 2

Describing Document Types: The Schema Languages of XML Part 2 Describing Document Types: The Schema Languages of XML Part 2 John Cowan 1 Copyright Copyright 2005 John Cowan Licensed under the GNU General Public License ABSOLUTELY NO WARRANTIES; USE AT YOUR OWN RISK

More information

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. M.Sc. in Advanced Computer Science. Date: Tuesday 20 th May 2008.

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. M.Sc. in Advanced Computer Science. Date: Tuesday 20 th May 2008. COMP60370 Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE M.Sc. in Advanced Computer Science Semi-Structured Data and the Web Date: Tuesday 20 th May 2008 Time: 09:45 11:45 Please answer

More information

Module 4. Implementation of XQuery. Part 2: Data Storage

Module 4. Implementation of XQuery. Part 2: Data Storage Module 4 Implementation of XQuery Part 2: Data Storage Aspects of XQuery Implementation Compile Time + Optimizations Operator Models Query Rewrite Runtime + Query Execution XML Data Representation XML

More information

So far, we've discussed the use of XML in creating web services. How does this work? What other things can we do with it?

So far, we've discussed the use of XML in creating web services. How does this work? What other things can we do with it? XML Page 1 XML and web services Monday, March 14, 2011 2:50 PM So far, we've discussed the use of XML in creating web services. How does this work? What other things can we do with it? XML Page 2 Where

More information

XML Schema Element and Attribute Reference

XML Schema Element and Attribute Reference E XML Schema Element and Attribute Reference all This appendix provides a full listing of all elements within the XML Schema Structures Recommendation (found at http://www.w3.org/tr/xmlschema-1/). The

More information

XML DTDs and Namespaces. CS174 Chris Pollett Oct 3, 2007.

XML DTDs and Namespaces. CS174 Chris Pollett Oct 3, 2007. XML DTDs and Namespaces CS174 Chris Pollett Oct 3, 2007. Outline Internal versus External DTDs Namespaces XML Schemas Internal versus External DTDs There are two ways to associate a DTD with an XML document:

More information

Introduction to Database Systems CSE 414

Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 14-15: XML CSE 414 - Spring 2013 1 Announcements Homework 4 solution will be posted tomorrow Midterm: Monday in class Open books, no notes beyond one hand-written

More information

Thrift specification - Remote Procedure Call

Thrift specification - Remote Procedure Call Erik van Oosten Revision History Revision 1.0 2016-09-27 EVO Initial version v1.1, 2016-10-05: Corrected integer type names. Small changes to section headers. Table of Contents 1.

More information

Tutorial 2: Validating Documents with DTDs

Tutorial 2: Validating Documents with DTDs 1. One way to create a valid document is to design a document type definition, or DTD, for the document. 2. As shown in the accompanying figure, the external subset would define some basic rules for all

More information

W3c Xml Schema 1.0 Data Type Datetime

W3c Xml Schema 1.0 Data Type Datetime W3c Xml Schema 1.0 Data Type Datetime The XML Schema recommendations define features, such as structures ((Schema Part 1)) and simple data types ((Schema Part 2)), that extend the XML Information Set with

More information

Semantic Web. XML and XML Schema. Morteza Amini. Sharif University of Technology Fall 94-95

Semantic Web. XML and XML Schema. Morteza Amini. Sharif University of Technology Fall 94-95 ه عا ی Semantic Web XML and XML Schema Morteza Amini Sharif University of Technology Fall 94-95 Outline Markup Languages XML Building Blocks XML Applications Namespaces XML Schema 2 Outline Markup Languages

More information

Markup Languages. Lecture 4. XML Schema

Markup Languages. Lecture 4. XML Schema Markup Languages Lecture 4. XML Schema Introduction to XML Schema XML Schema is an XML-based alternative to DTD. An XML schema describes the structure of an XML document. The XML Schema language is also

More information

PART. Oracle and the XML Standards

PART. Oracle and the XML Standards PART I Oracle and the XML Standards CHAPTER 1 Introducing XML 4 Oracle Database 10g XML & SQL E xtensible Markup Language (XML) is a meta-markup language, meaning that the language, as specified by the

More information

Modelling XML Applications

Modelling XML Applications Modelling XML Applications Patryk Czarnik XML and Applications 2013/2014 Lecture 2 14.10.2013 XML application (recall) XML application (zastosowanie XML) A concrete language with XML syntax Typically defined

More information

B4M36DS2, BE4M36DS2: Database Systems 2

B4M36DS2, BE4M36DS2: Database Systems 2 B4M36DS2, BE4M36DS2: Database Systems 2 h p://www.ksi.mff.cuni.cz/~svoboda/courses/171-b4m36ds2/ Lecture 2 Data Formats Mar n Svoboda mar n.svoboda@fel.cvut.cz 9. 10. 2017 Charles University in Prague,

More information

Request for Comments: Tail-f Systems December Partial Lock Remote Procedure Call (RPC) for NETCONF

Request for Comments: Tail-f Systems December Partial Lock Remote Procedure Call (RPC) for NETCONF Network Working Group Request for Comments: 5717 Category: Standards Track B. Lengyel Ericsson M. Bjorklund Tail-f Systems December 2009 Partial Lock Remote Procedure Call (RPC) for NETCONF Abstract The

More information

Chapter 3 Brief Overview of XML

Chapter 3 Brief Overview of XML Slide 3.1 Web Serv vices: Princ ciples & Te echno ology Chapter 3 Brief Overview of XML Mike P. Papazoglou & mikep@uvt.nl Slide 3.2 Topics XML document structure XML schemas reuse Document navigation and

More information

Apache Avro# Specification

Apache Avro# Specification Table of contents 1 Introduction...3 2 Schema Declaration... 3 2.1 Primitive Types... 3 2.2 Complex Types...3 2.3 Names... 6 2.4 Aliases... 7 3 Data Serialization...8 3.1 Encodings... 8 3.2 Binary Encoding...8

More information

Big Data Exercises. Fall 2016 Week 9 ETH Zurich

Big Data Exercises. Fall 2016 Week 9 ETH Zurich Big Data Exercises Fall 2016 Week 9 ETH Zurich Introduction This exercise will cover XQuery. You will be using oxygen (https://www.oxygenxml.com/xml_editor/software_archive_editor.html), AN XML/JSON development

More information

Complex type. This subset is enough to model the logical structure of all kinds of non-xml data.

Complex type. This subset is enough to model the logical structure of all kinds of non-xml data. DFDL Introduction For Beginners Lesson 2: DFDL language basics We have seen in lesson 1 how DFDL is not an entirely new language. Its foundation is XML Schema 1.0. Although XML Schema was created as a

More information

QVX File Format and QlikView Custom Connector

QVX File Format and QlikView Custom Connector QVX File Format and QlikView Custom Connector Contents 1 QVX File Format... 2 1.1 QvxTableHeader XML Schema... 2 1.1.1 QvxTableHeader Element... 4 1.1.2 QvxFieldHeader Element... 5 1.1.3 QvxFieldType Type...

More information

Big Data Exam ETH Zürich February 9, 2017 Answer sheet

Big Data Exam ETH Zürich February 9, 2017 Answer sheet Big Data Exam ETH Zürich February 9, 2017 Answer sheet Name: Legi-Number: 1. A B C D 2. A B C D 3. A B C D 4. A B C D 5. A B C D 6. A B C D 7. A B C D 8. A B C D 9. A B C D 10. A B C D 11. A B C D 12.

More information

Appendix H XML Quick Reference

Appendix H XML Quick Reference HTML Appendix H XML Quick Reference What Is XML? Extensible Markup Language (XML) is a subset of the Standard Generalized Markup Language (SGML). XML allows developers to create their own document elements

More information

Introduction to Database Systems CSE 414

Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 13: XML and XPath 1 Announcements Current assignments: Web quiz 4 due tonight, 11 pm Homework 4 due Wednesday night, 11 pm Midterm: next Monday, May 4,

More information

JSON - Overview JSon Terminology

JSON - Overview JSon Terminology Announcements Introduction to Database Systems CSE 414 Lecture 12: Json and SQL++ Office hours changes this week Check schedule HW 4 due next Tuesday Start early WQ 4 due tomorrow 1 2 JSON - Overview JSon

More information

CLIENT-SIDE XML SCHEMA VALIDATION

CLIENT-SIDE XML SCHEMA VALIDATION Factonomy Ltd The University of Edinburgh Aleksejs Goremikins Henry S. Thompson CLIENT-SIDE XML SCHEMA VALIDATION Edinburgh 2011 Motivation Key gap in the integration of XML into the global Web infrastructure

More information

XML. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior

XML. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior XML Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior XML INTRODUCTION 2 THE XML LANGUAGE XML: Extensible Markup Language Standard for the presentation and transmission of information.

More information

ADT 2005 Lecture 7 Chapter 10: XML

ADT 2005 Lecture 7 Chapter 10: XML ADT 2005 Lecture 7 Chapter 10: XML Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/ Database System Concepts Silberschatz, Korth and Sudarshan The Challenge: Comic Strip Finder The Challenge:

More information

Author: Irena Holubová Lecturer: Martin Svoboda

Author: Irena Holubová Lecturer: Martin Svoboda NPRG036 XML Technologies Lecture 1 Introduction, XML, DTD 19. 2. 2018 Author: Irena Holubová Lecturer: Martin Svoboda http://www.ksi.mff.cuni.cz/~svoboda/courses/172-nprg036/ Lecture Outline Introduction

More information

[MS-WORDSSP]: Word Automation Services Stored Procedures Protocol Specification

[MS-WORDSSP]: Word Automation Services Stored Procedures Protocol Specification [MS-WORDSSP]: Word Automation Services Stored Procedures Protocol Specification Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open

More information

CSI 3140 WWW Structures, Techniques and Standards. Representing Web Data: XML

CSI 3140 WWW Structures, Techniques and Standards. Representing Web Data: XML CSI 3140 WWW Structures, Techniques and Standards Representing Web Data: XML XML Example XML document: An XML document is one that follows certain syntax rules (most of which we followed for XHTML) Guy-Vincent

More information

Announcements. JSon Data Structures. JSon Syntax. JSon Semantics: a Tree! JSon Primitive Datatypes. Introduction to Database Systems CSE 414

Announcements. JSon Data Structures. JSon Syntax. JSon Semantics: a Tree! JSon Primitive Datatypes. Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 13: Json and SQL++ Announcements HW5 + WQ5 will be out tomorrow Both due in 1 week Midterm in class on Friday, 5/4 Covers everything (HW, WQ, lectures,

More information

Fall, 2005 CIS 550. Database and Information Systems Homework 5 Solutions

Fall, 2005 CIS 550. Database and Information Systems Homework 5 Solutions Fall, 2005 CIS 550 Database and Information Systems Homework 5 Solutions November 15, 2005; Due November 22, 2005 at 1:30 pm For this homework, you should test your answers using Galax., the same XQuery

More information

MANAGING INFORMATION (CSCU9T4) LECTURE 2: XML STRUCTURE

MANAGING INFORMATION (CSCU9T4) LECTURE 2: XML STRUCTURE MANAGING INFORMATION (CSCU9T4) LECTURE 2: XML STRUCTURE Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE XML Elements vs. Attributes Well-formed vs. Valid XML documents Document Type Definitions (DTDs)

More information

Work/Studies History. Programming XML / XSD. Database

Work/Studies History. Programming XML / XSD. Database Work/Studies History 1. What was your emphasis in your bachelor s work at XXX? 2. What was the most interesting project you worked on there? 3. What is your emphasis in your master s work here at UF? 4.

More information

[MS-ASNOTE]: Exchange ActiveSync: Notes Class Protocol. Intellectual Property Rights Notice for Open Specifications Documentation

[MS-ASNOTE]: Exchange ActiveSync: Notes Class Protocol. Intellectual Property Rights Notice for Open Specifications Documentation [MS-ASNOTE]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation ( this documentation ) for protocols,

More information

No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation.

No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation. [MS-MSL]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation for protocols, file formats, languages,

More information