Web Technologies and Applications: XML Basics
2nd Semester, 2018
Instructor: Prof. Young-guk Ha, Dept. of Computer Science & Engineering
Contents
- Introduction to XML
- XML Document Structure and Basic Syntax
Introduction to XML
XML (Extensible Markup Language) Overview (1)
- What "extensible" means in XML
  - Capable of being extended
  - You can define your own markup (tags)
- Markup (tags)
  - Information added to the content of a text that enhances its meaning
    - Demarcates or labels parts of the text
  - Types of markup in HTML
    - Semantic markup: describes the meaning of content, e.g., <TITLE>, <BODY>
    - Stylistic markup: describes how to present the content, e.g., <FONT>, <B>
    - Structural markup: describes the structure of content, e.g., <P>
XML (Extensible Markup Language) Overview (2)
- Markup language
  - A set of markup tags that can be placed in a text for a specific purpose
  - E.g., HTML, WML, VRML, SensorML, MathML, VoiceXML, ...
- XML
  - Extensible markup language = meta-markup language
  - A set of rules for building a markup language and for handling its documents
    - I.e., a family of technologies that describe how to define tags, transform documents, retrieve data, present data, and so on
- XML document
  - A document whose content is demarcated by XML tags
  - A set of new tag definitions written with XML tags
History of XML
- Timeline: 1970 GML (IBM) -> 1986 SGML -> 1991 HTML (WWW) -> 1998 XML
- 1986: SGML (Standard Generalized Markup Language) -> international standard (ISO)
- 1998: XML 1.0 -> de facto standard (W3C)
- 2004: XML 1.1
- 2006: XML 1.1 (2nd Edition)
- 2008: XML 1.0 (5th Edition)
Example of XML Document (1)
- All XML documents are made up of markup and content
  - Semi-structured documents
  - Markup and content complement each other
  - Markup creates an information entity with partitions
  - Markup creates labeled data in a handy package
- Example
    <?xml version="1.0"?>
    <letter priority="important">
      <to>john</to>
      <subject>cs760</subject>
      <message>
        Don't forget to attend the class
        <emphasis>on Friday</emphasis>
        Good luck to you.
      </message>
      <from>tomas</from>
    </letter>
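A minimal sketch of how a program might read the labeled parts of the letter example. It assumes Python's standard xml.etree.ElementTree module, which the slides do not mention; the function and variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The letter document from the slide, with attribute quotes restored.
LETTER_XML = """<?xml version="1.0"?>
<letter priority="important">
  <to>john</to>
  <subject>cs760</subject>
  <message>Don't forget to attend the class
    <emphasis>on Friday</emphasis> Good luck to you.</message>
  <from>tomas</from>
</letter>"""


def summarize_letter(xml_text):
    """Return the labeled parts of the letter as a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        "priority": root.get("priority"),          # attribute of <letter>
        "to": root.findtext("to"),
        "subject": root.findtext("subject"),
        "from": root.findtext("from"),
        "emphasis": root.findtext("message/emphasis"),
    }


if __name__ == "__main__":
    print(summarize_letter(LETTER_XML))
```

Because every piece of content is labeled by a tag, the program can ask for "to" or "subject" by name instead of guessing from position.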
Example of XML Document (2)
[Figure: (1) a real-world BMW car; (2) an XML authoring tool used to write an XML document about the BMW; (3) the resulting XML document about the BMW]
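XML vs. HTML (1)
- HTML uses only predefined tags; XML allows tags to be extended
- HTML tags mainly provide ways to display content on screen; XML tags provide ways to structure a document or to label its content
- XML tag names are case-sensitive
- [Example figure residue: "Hwayang-dong"; annotation: it is hard to tell that the value is a zip code]
XML vs. HTML (2)
- XML document
  - Labeling content with XML tags makes it possible to express its meaning
    <zip>450-3490</zip> Hwayang-dong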
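XML vs. Other Electronic Documents
- HWP and MS Word documents
  - Stored as non-standardized, proprietary binary files
  - No document structure information; content and style are mixed together
  - Hard for external programs to use the documents or to automate their processing
- XML documents
  - Stored as plain text files, so they are readable on every computing platform
  - Structure, content, and style are managed separately
    - Structure: defined with a DTD or XML Schema (document model)
    - Content: written to conform to the document model (valid XML document)
    - Style: defined separately to present the content (XSL, CSS)
  - Easy for external programs to use the documents and to process them automatically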
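Benefits of XML Documents
- Advantages of XML documents over other electronic documents
  - Data independence
    - The document structure (DTD, XML Schema) is separated from the content (document)
  - Multiple presentations
    - The same content can be presented in different ways (CSS, XSL)
  - Easy data exchange
    - Based on text and open web standards
  - Stronger data retrieval
    - As semi-structured documents, XML documents are easy to search (XPath, XQuery)
  - Easy transformation of the document structure
    - E.g., XML document -> HTML document (XSLT)
    - E.g., XML document -> binary documents such as MS Word, HWP, PDF (XSL-FO)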
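XML Technology Family
- Document structure: DTD, XML Schema, SOX
- Document style: XSLT, XSL-FO, XSL, CSS
- Document API: SAX, DOM, JDOM
- Document linking: XPath, XPointer, XLink
- Services: SOAP, WSDL, UDDI
- Derived languages: WML, XHTML, MathML
- Security: Encryption, Signature
- Storage and retrieval: XML-DBMS, NXD, XQuery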
XML Document Structure and Basic Syntax
XML Basic Terms (1)
- Element
  - Labeled container of content
  - Basic building block of XML documents
- Anatomy of the element "to"
    <to type="name">Hong Gildong</to>
  - Start tag: <to type="name">
  - Attribute: type="name"
  - Content: Hong Gildong
  - End tag: </to>
XML Basic Terms (2)
- Well-formed document
  - A document that follows the basic XML syntax, the minimum set of rules that lets browsers and other programs process it
    1) It contains only properly encoded, legal Unicode characters
    2) None of the special syntax characters such as "<" and "&" appear except in their markup-delineation roles
    3) The start, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping
    4) Element tags are case-sensitive; the start and end tags must match exactly
    5) There is a single root element that contains all the other elements
- Valid document
  - A document that conforms to its document model
    - DTD (Document Type Definition)
    - XML Schema
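The well-formedness rules can be tried out with any XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree (a non-validating parser not named in the slides); is_well_formed is a hypothetical helper, and it checks well-formedness only, not validity against a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    samples = [
        "<jumin><name>kim</name></jumin>",  # correctly nested
        "<jumin><name>kim</jumin></name>",  # overlapping tags (rule 3)
        "<name>kim</Name>",                 # case mismatch (rule 4)
        "<a/><b/>",                         # two root elements (rule 5)
        "<note>1 < 2</note>",               # raw "<" in content (rule 2)
    ]
    for sample in samples:
        print(is_well_formed(sample), sample)
```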
Examples of Well-Formed Documents
- There must be exactly one top-level (root) element
  - Well-formed: <jumin> ... </jumin>
- Tags must be correctly nested
  - Well-formed: <jumin><name>kim</name></jumin>
  - Not well-formed: <jumin><name>kim</jumin></name>
- Every element must have both a start tag and an end tag
  - Not well-formed: <name>kim or kim</name>
- The start-tag name and end-tag name must be identical (including letter case)
  - Well-formed: <name>kim</name>
  - Not well-formed: <name>kim</age>, <name>kim</Name>
Checking Well-Formed and Valid Documents
XML Document Structure
- Prolog (optional)
  - XML declaration
      <?xml version="1.0" encoding="euc-kr"?>
  - Document type declaration (optional)
      <!DOCTYPE memo [
        <!ELEMENT memo (to, ...)>
      ]>
- Elements (contents)
    <memo>
      <to what="name">Hong Gildong</to>
      <date>2002/04/05</date>
      <contents>Please call back</contents>
      <from>Heo Jun</from>
    </memo>
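Example of a Simple XML Document Structure
[Figure labels: XML declaration; XML document content (elements)]
Example of XML Document
[Figure labels: XML declaration; XML document content (elements)]
Tree View of the Example Document Structure
[Figure labels: root element; element; attribute; content]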
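A tree view like this can be produced by walking the parsed document. The sketch below assumes Python's standard xml.etree.ElementTree and a small memo document modeled on the memo example in these slides; tree_lines is a hypothetical helper and its output format is just one possible rendering.

```python
import xml.etree.ElementTree as ET

# Assumed sample document: a memo with one attribute on <to>.
MEMO_XML = """<memo>
  <to what="name">Hong Gildong</to>
  <date>2002/04/05</date>
  <contents>Please call back</contents>
  <from>Heo Jun</from>
</memo>"""


def tree_lines(element, depth=0):
    """Return one indented line per element: tag, attributes, text content."""
    line = "  " * depth + element.tag
    if element.attrib:
        line += " " + str(dict(element.attrib))
    text = (element.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # iterating an Element yields its child elements
        lines.extend(tree_lines(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(tree_lines(ET.fromstring(MEMO_XML))))
```

The root element sits at depth 0 and each nesting level adds one indent, which mirrors the root element / element / attribute / content roles in the tree view.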
Structure of XML Documents
- XML Document := Prolog? Element
- Prolog
  - Tips off the world that the document is marked up in XML
- Element
  - Root element (document element)
  - Other elements
Prolog
- Prolog := XMLDecl DocTypeDecl?
- The top of an XML document carries special information
  - XML declaration
    - States that the document is marked up in XML
    - Example: <?xml version="1.0"?>
  - Document type declaration
    - Defines the name of the root element
    - Defines the DTD (Document Type Definition) reference -> document model
XML Declaration
- XMLDecl := <?xml VersionInfo EncodingInfo? StandaloneInfo? ?>
  - version
    - E.g., version="1.0"
  - encoding
    - euc-kr: Korean encoding
    - UTF-8: 8-bit Unicode (default)
  - standalone
    - yes: no external file to load
    - no: some files to load (default)
      - When there is an external entity
      - When the DTD is in an external file
  - Note: the <? ... ?> syntax comes from SGML
- Examples
    <?xml version="1.0"?>
    <?xml version="1.0" encoding="euc-kr"?>
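To see the three parts of an XML declaration as a parser reports them, here is a minimal sketch using the XmlDeclHandler callback of Python's standard xml.parsers.expat module (an assumption; the slides name no parser). read_xml_declaration is a hypothetical helper.

```python
import xml.parsers.expat


def read_xml_declaration(data):
    """Return (version, encoding, standalone) from the XML declaration.

    expat reports standalone as 1 for "yes", 0 for "no", and -1 when the
    standalone clause is omitted. Returns None if there is no declaration.
    """
    found = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once when the parser reads the XML declaration.
    parser.XmlDeclHandler = (
        lambda version, encoding, standalone:
            found.append((version, encoding, standalone))
    )
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return found[0] if found else None


if __name__ == "__main__":
    doc = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><memo/>'
    print(read_xml_declaration(doc))
```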
Document Type Declaration
- DocTypeDecl := <!DOCTYPE root-element extid-of-dtd? ( [ internal-subset ] )? >
  - Note: the <! ... > and [ ... ] syntax comes from SGML
- Document type declaration
  - Defines the name of the root element
  - Defines the DTD (internal subset)
    - Used for document validity checking
    - Contains ELEMENT and ENTITY declarations
- External subset reference
  - extid-of-dtd refers to an external subset for the document type declaration
Document Type Declaration Example (1)
[Figure labels: root element; DTD]
Document Type Declaration Example (2)
[Figure label: external ID of DTD]
Element: Building Block of XML Documents
- Element := <name (att1="value1" att2="value2" ...)? > content </name>
- Empty Element := <name (att1="value1" att2="value2" ...)? />
- Example
    <Caution class="info">
      Start, End tag should be pair!
      Name is case-sensitive!
      Whitespace in content is preserved!
      Following element is empty element.
      <EmptyElement/>
    </Caution>
Element: Building Block of XML (cont'd)
- Naming rules
  - Starts with a letter or underscore (_)
  - Should not start with "xml" in any letter case (xml, Xml, xML, ..., XML)
  - Contains letters, numbers, hyphen (-), period (.), and underscore (_)
- Positioning rules for well-formed documents
  - The end tag must come after the start tag
  - Elements should be correctly nested
    - There should be no overlapping elements
    - An element's start and end tags must both reside in the same parent
Element: Building Block of XML (cont'd)
- Element definition examples
  - <Err>Case-sensitive</err> -> <Err>just do it</Err>
  - <1st>Don't Start with Number</1st> -> <first> ... </first>
  - <Xml_tag>Don't Start with xml</Xml_tag>
  - < err></err> -> <err></err>
  - <e rr></err> -> <err></err>
  - <emptyelement/>
    - Is equal to <emptyelement></emptyelement>
    - Is not equal to <emptyelement> </emptyelement> because whitespace is preserved in XML content
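The empty-element equivalence can be checked directly. This is a small sketch assuming Python's standard xml.etree.ElementTree; content_of is a hypothetical helper that maps "no text" to an empty string so the three forms can be compared.

```python
import xml.etree.ElementTree as ET


def content_of(xml_text):
    """Return the text content of the root element ('' if it has none)."""
    return ET.fromstring(xml_text).text or ""


if __name__ == "__main__":
    for doc in ("<emptyelement/>",
                "<emptyelement></emptyelement>",
                "<emptyelement> </emptyelement>"):
        # repr() makes the preserved single space visible
        print(repr(content_of(doc)), doc)
```

The first two forms yield the same empty content, while the third keeps its single space because the parser preserves whitespace in content.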
Attribute: More Muscle for Elements
- Attribute := name = "value" | 'value'
  - Gives elements unique properties
  - An element can have many attributes (unordered)
  - Attributes are separated by whitespace (not commas)
  - Attribute names must be unique within an element
  - If the attribute value itself contains double (or single) quotes, use single (or double) quotes around it
- Examples
  - <letter priority="high" type="1"/> == <letter type="1" priority="high"/>
  - <choice test='msg="hi"'> or <choice test="msg='hi'">
  - <team person="sue" person="joe"> -> <team person1="sue" person2="joe">
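The attribute rules can be observed with a parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree; attributes_of and parses are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def attributes_of(xml_text):
    """Return the root element's attributes as a dict (order-insensitive)."""
    return dict(ET.fromstring(xml_text).attrib)


def parses(xml_text):
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Attribute order does not matter.
    print(attributes_of('<letter priority="high" type="1"/>')
          == attributes_of('<letter type="1" priority="high"/>'))
    # Single quotes around a value that contains double quotes.
    print(attributes_of("<choice test='msg=\"hi\"'/>"))
    # Duplicate attribute names are rejected by the parser.
    print(parses('<team person="sue" person="joe"/>'))
```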
Attribute: More Muscle for Elements (cont'd)
- Attribute value types (in DTD)
  - ID
    - A validating XML parser warns you if the ID does not have a unique value throughout the document (attribute "no" in the example below)
  - IDREF(S)
    - A validating XML parser warns you if the IDREF points to a nonexistent element (attribute "with" in the example below)
  - Other types: ENUMERATED, CDATA, ENTITY(S), NMTOKEN(S)
- Example
    <part no="bolt-100"/>
    <part no="bolt-100"/>
    <part no="bolt-123"/>
    <part no="nut-123">
      <compatible with="bolt-123"/>
      <compatible with="bolt-456"/>
    </part>
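The two checks a validating parser performs for ID and IDREF can be imitated by hand. The sketch below is not a validating parser: it assumes Python's standard xml.etree.ElementTree, hard-codes "no" as the ID attribute and "with" as the IDREF attribute instead of reading a DTD, and wraps the parts in a hypothetical <parts> root element.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The parts example, wrapped in an assumed <parts> root element.
PARTS_XML = """<parts>
  <part no="bolt-100"/>
  <part no="bolt-100"/>
  <part no="bolt-123"/>
  <part no="nut-123">
    <compatible with="bolt-123"/>
    <compatible with="bolt-456"/>
  </part>
</parts>"""


def check_ids(xml_text, id_attr="no", ref_attr="with"):
    """Return (duplicate IDs, references to nonexistent IDs), both sorted."""
    root = ET.fromstring(xml_text)
    ids = [e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None]
    refs = [e.get(ref_attr) for e in root.iter() if e.get(ref_attr) is not None]
    counts = Counter(ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    dangling = sorted(r for r in refs if r not in counts)
    return duplicates, dangling


if __name__ == "__main__":
    print(check_ids(PARTS_XML))
```

Here "bolt-100" is reported because it is declared twice, and "bolt-456" because no part carries that ID.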
Entity: Placeholder for Content
- Entity
  - Contains a part of an XML document
  - Something like a macro in C (#define): declare once, use many times
  - Does not add anything semantically to the markup
  - Always eliminates an inconvenience
    - From standing in for impossible-to-type characters
    - To marking the place where a file should be imported (external entity)
- Example in the internal subset
    <!DOCTYPE letter ... [
      <!ENTITY w3url "http://www.w3.org/">
    ]>
    <letter>
      <message>Hi. John. W3 URL is &w3url;</message>
    </letter>
  After entity expansion:
    <message>Hi. John. W3 URL is http://www.w3.org/</message>
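Entity expansion can be observed with a parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree, which expands general entities declared in the internal subset; the document is the letter example with the elided part of the DOCTYPE left out, and message_text is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Letter example with the entity declared in the internal subset.
LETTER_XML = """<!DOCTYPE letter [
  <!ENTITY w3url "http://www.w3.org/">
]>
<letter>
  <message>Hi. John. W3 URL is &w3url;</message>
</letter>"""


def message_text(xml_text):
    """Return the text of <message> after the parser has expanded entities."""
    return ET.fromstring(xml_text).findtext("message")


if __name__ == "__main__":
    print(message_text(LETTER_XML))
```

The application never sees the reference &w3url; itself, only the replacement text, which is why an entity adds nothing semantically to the markup.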
Entity: Placeholder for Content (cont'd)
[Figure label: used in DTD]
Entity: Placeholder for Content (cont'd)
- Character entity
  - Predefined
    - Ampersand (&): amp
    - Apostrophe ('): apos
    - Greater than (>): gt
    - Less than (<): lt
    - Quotation mark ("): quot
  - Numbered (Unicode, from #0 to #65535 in the 16-bit range)
    - E.g., c with cedilla (ç): #231
    - Alphabetic, syllabic, and ideographic scripts: Latin, Greek, 20,000 Han ideographs, 11,000 Hangul syllables, ...
  - Named (user defined)
    - E.g., <!ENTITY cedilla "ç">, <!ENTITY name "Kim">
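A short sketch of character entities in both directions, assuming Python's standard library: xml.etree.ElementTree to decode references while parsing and xml.sax.saxutils.escape to produce them when writing. decode is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def decode(xml_text):
    """Return the root element's text with entity references resolved."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    # Predefined entities are resolved by the parser.
    print(decode("<t>&lt;to&gt; &amp; &apos;x&apos; &quot;y&quot;</t>"))
    # Numbered reference: #231 is c with cedilla.
    print(decode("<t>&#231;</t>"))
    # Going the other way: escape special characters before writing XML.
    print(escape("1 < 2 & 3 > 2"))
```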
Entity: Placeholder for Content (cont'd)
- Mixed-content entity
  - Contains content of unlimited length
  - Can include markup as well as text
    - Internal entity, e.g., <!ENTITY phone "<number>042-999-9999</number>">
    - External entity, e.g., <!ENTITY signature SYSTEM "./signature.xml">
Entity: Placeholder for Content (cont'd)
- Example -> external entity
Entity: Placeholder for Content (cont'd)
[Figure label: external entity imported from ./signature.xml]
Entity: Placeholder for Content (cont'd)
- External entity example
    <!ENTITY part1 SYSTEM "./p1.xml">                          -> local file
    <!ENTITY part2 SYSTEM "http://www.bobsbolts.com/p2.xml">   -> www.bobsbolts.com
    <!ENTITY part3 SYSTEM "http://www.tomsnuts.com/p3.xml">    -> www.tomsnuts.com
Entity: Placeholder for Content (cont'd)
- Unparsed entity
  - Should not be parsed by the XML parser
    - Tells the parser not to load the entity's content
    - Normally used for applications
  - May contain something other than text
    - E.g., binary image files
        <!ENTITY mypic SYSTEM "./erik.gif" NDATA GIF>
      -> GIF is the name of a notation (NDATA) declared as
        <!NOTATION GIF SYSTEM "image/gif">
Entity: Placeholder for Content (cont'd)
- Parameter entity
  - Occurs only in the document type declaration section
    - Preceded by % (not by &)
  - Parameter entity references are expanded immediately in the document type declaration
    - E.g., without a parameter entity
        <!ELEMENT burns (#PCDATA | quote)*>
        <!ELEMENT allen (#PCDATA | quote)*>
    - E.g., with a parameter entity
        <!ENTITY % pcont "#PCDATA | quote">
        <!ELEMENT burns (%pcont;)*>
        <!ELEMENT allen (%pcont;)*>
Miscellaneous Markups
- Comment := <!-- any_text_and_markup -->
  - Tells the parser to ignore those regions
  - Within a comment, "--" must not occur
  - E.g., <!-- <address>59 Sunspot Avenue</address> -->
- Processing Instruction := <?keyword data? ?>
  - Container for data targeted at specific applications or parsers
  - E.g., <?linebreak?>, <?xml version="1.0"?>
Miscellaneous Markups (cont'd)
- CDATA Section := <![CDATA[ any_text_and_markup ]]>
  - Tells the parser that the section contains no markup
    - It should be treated as regular text
  - Within a CDATA section, "]]>" must not occur
    - You can use ]]&gt; instead of ]]> (in ordinary content; entity references are not expanded inside a CDATA section)
  - E.g., using < and > in a CDATA section [example figure, labeled "with CDATA Section", not recoverable]
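A minimal sketch of CDATA handling, assuming Python's standard xml.etree.ElementTree. The helper wrap_in_cdata is hypothetical; splitting the section whenever the text contains "]]>" is a common workaround that is not taken from the slides.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text):
    """Wrap text in CDATA, splitting at any "]]>" so it never appears whole."""
    # "]]>" becomes "]]" + end of section + start of a new section + ">"
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def roundtrip(text):
    """Embed text in a <code> element via CDATA and read it back."""
    return ET.fromstring("<code>" + wrap_in_cdata(text) + "</code>").text


if __name__ == "__main__":
    # "<", ">" and "&" need no escaping inside a CDATA section.
    print(roundtrip("if (a < b && b > c) { }"))
    # Text that itself contains "]]>" still survives, thanks to the split.
    print(roundtrip("x[y[0]]>z"))
```

The parser hands the application plain text in both cases; the CDATA delimiters themselves are not part of the content.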
References
- XML 1.0 (Fifth Edition)
  - W3C Recommendation, 26 Nov. 2008
  - http://www.w3.org/TR/xml