Summarization of XML Documents
Hesham Elzentani, Prof. dr Mladen Veinović

Hesham Elzentani is a PhD student with the Faculty of Informatics and Computing, University of Singidunum, 32 Danijelova, Voždovac, Beograd, Serbia (hesham342002@gmail.com). Prof. dr Mladen Veinović is with the Faculty of Informatics and Computing, University of Singidunum, 32 Danijelova, Voždovac, Beograd, Serbia (mveinovic@singidunum.ac.rs).

Abstract
Extensible Markup Language (XML) has become a standard for data exchange and representation in many applications. An XML document is often too large and complex to understand and use directly, and in such cases a summarized version of the original document is useful. This paper introduces semantic XML document summaries, which present the important information available in an XML document; the three standards used to evaluate the final summarized XML document (document size, information content, and information importance); and approaches to the summarization of XML documents.

Index Terms
XML Documents, Summarization.

I. INTRODUCTION TO XML DOCUMENTS

The Extensible Markup Language (XML) was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996 [1]. XML is an open standard that provides the means to share data and information between computers and computer programs as unambiguously as possible. Once the data is transmitted, it is up to the receiving program to interpret it for some useful purpose, thus turning the data into information. Sometimes the data is rendered as HTML. At other times it is used to update or query a database, or in e-business and e-payment transactions. Although XML was originally intended as a means for Web publishing, its advantages have proven useful for things never intended to be rendered as Web pages [2].

XML is a text file format for representing tree structures in a standard form. If we abstract over less important details, the whole structure of an XML document is a tree of variable arity in which nodes (also called elements) are labeled, the leaves of the tree are text nodes, and the ordering among the children of a node is significant. XML can be seen as a concrete syntax for describing such tree structures using markup text. Fig. 1 shows an example of an XML document [3]. The XML specification does not fix a priori the set of labels allowed in an XML document, nor does it define any semantics for labels. Only well-formedness conditions are defined, in particular to ensure proper nesting of elements, which allows XML documents to be considered as trees. For instance, Fig. 2 gives a more visual tree representation of the well-formed XML document shown in Fig. 1 [3].

Fig. 1. Example of XML document.
Fig. 2. Sample Tree of a Well-Formed Document.

A. Elements and Attributes.
Three important terms for describing basic XML syntax are elements, attributes, and documents. Together they cover a large portion of the conceptual playing field for XML [4]. When we talk about XML, markup is usually divided into elements and attributes. An element is a start tag <...> and an end tag </...> together with everything between them. An attribute is a simple name-value pair in which the value is enclosed in single or double quotes. An attribute cannot stand by itself and must appear inside the start tag of an element. Two short examples follow [4]:

<Food>Ice Cream</Food>
<Food Flavour="Chocolate">Ice Cream</Food>

The first line is an element called Food that has Ice Cream as its element content. The second line is the same element with a name-value pair (an attribute) added to it. The name is Flavour and the value is Chocolate. Elements may also be empty, having no element content, as in the example below [4]:

<HealthyFood></HealthyFood>
<HealthyFood/>

The first line is an element called HealthyFood that has nothing inside it. The second line is shorthand for the same empty element [4].

B. Well-Formedness.
XML documents must be well-formed, that is, they must obey simple syntax rules for the legal positioning of elements and attributes. The data represented in an XML document is well-formed if the following conditions hold [4]:
1. There is exactly one root element.
2. Every start tag has a matching end tag.
3. No tag overlaps another tag.
4. All elements and attributes obey the naming constraints.

C. Validity.
A valid XML document is a well-formed document that also conforms to the stricter rules specified in a Document Type Definition (DTD) or in a schema (XSD). The DTD describes a document's structure, specifies which element types are allowed, and defines the properties of each element. A document that is only well-formed, by contrast, is not checked against an external DTD [5]. Schemas, like DTDs, describe the structure of the document, but schemas are more powerful than DTDs because they can also describe the structure of other information, such as databases. They also provide additional information about inheritance and data types in the XML document [5].

D. Schema Languages.
Schemas describe structural constraints for XML documents. There are many formalisms (called schema languages) for specifying schemas (or types). For instance, DTD (which is part of the XML specification), XML Schema (W3C), and RELAX NG (OASIS/ISO) are actively used by various applications. Each schema language has different constraint mechanisms and different expressiveness [3]. The purpose of an XML Schema, just like that of a DTD, is to define the legal building blocks of an XML document. XML Schema has better support for applications, document structure, attributes, and data typing. An XML Schema defines [9]:
1. Elements that can appear in a document.
2. Attributes that can appear in a document.
3. Which elements are child elements.
4. The order of child elements.
5. The number of child elements.
6. Whether an element is empty or can include text.
7. Data types for elements and attributes.
8. Default and fixed values for elements and attributes.
In general, XML Schema Structures specifies the XML Schema definition language, and XML Schema Datatypes specifies extensible data types for XML. Fig. 3 shows an example of a simple XML schema document that describes an address.

Fig. 3. XML schema document to describe an address.

E. XML Technologies.
W3C produces and manages the XML specifications and technologies. The following are some of them [7]:
1. XML Namespaces enable an XML document to contain elements and attributes without naming collisions.
2. XPointer is a system for addressing components of XML-based Internet media.
3. XSLT is a declarative, XML-based document transformation language.
4. XInclude defines the ability for XML documents to include all or part of an external file.
5. XML Signature defines the syntax and processing rules for creating digital signatures on XML content.
6. XML Encryption defines the syntax and processing rules for encrypting XML content.
7. XPath makes it possible to address individual parts of an XML document. XPath expressions can refer to all or part of the text, data, and values in XML documents.
8. XQuery is a query and functional programming language designed to query collections of XML data. XQuery uses XPath as a sublanguage.
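
As a rough illustration of the well-formedness rules of Section B and of the XPath item in the list above, the following Python sketch uses the standard library module xml.etree.ElementTree, which implements only a limited subset of XPath. The sample document, the Menu wrapper element, the Vanilla value, and the helper names are assumptions made for this example; they do not come from the cited sources.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample built around the <Food> examples of Section A.
# The <Menu> wrapper is an assumption, added so there is exactly one root.
SAMPLE = """<Menu>
  <Food Flavour="Chocolate">Ice Cream</Food>
  <Food Flavour="Vanilla">Ice Cream</Food>
  <HealthyFood/>
</Menu>"""


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def flavours(xml_text):
    """Flavour attribute of every Food child that has one, in document order."""
    root = ET.fromstring(xml_text)
    # [@Flavour] keeps only Food elements that carry the attribute.
    return [food.get("Flavour") for food in root.findall("./Food[@Flavour]")]


def foods_with_flavour(xml_text, flavour):
    """Text content of the Food children whose Flavour equals the given value."""
    root = ET.fromstring(xml_text)
    return [food.text for food in root.findall(f"./Food[@Flavour='{flavour}']")]


if __name__ == "__main__":
    print(is_well_formed(SAMPLE))
    print(is_well_formed("<a><b></a></b>"))  # overlapping tags are rejected
    print(flavours(SAMPLE))
    print(foods_with_flavour(SAMPLE, "Chocolate"))
```

A full XPath or XQuery processor supports many more axes and functions than this subset, so the sketch only shows the basic idea of addressing parts of a document by path and predicate.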
F. Processing XML Documents.
An XML document may be stored as a plain file in the file system of any operating system, or in a database management system (DBMS). There are three ways to process an XML document:
1. SAX (Simple API for XML) is an event-based, sequential-access parser API for XML documents. Processing is done one tag at a time, without loading the entire document into memory, and the API provides a mechanism for reading data from an XML document [3].
2. DOM (Document Object Model) is an API designed for structured documents in general. The document is first loaded entirely into memory and then processed, so DOM operates on the document as a whole [4].
3. A transformation language such as XSLT, which is a language for transforming XML documents into other XML documents and describes rules for transforming a source tree into a result tree [6].
SAX and DOM each have advantages and disadvantages. SAX normally requires less memory than DOM to process a document. However, SAX processing is limited to one direction: once an element or attribute has been processed, there is no way to return to it. In contrast, DOM can navigate throughout the document, in both the forward (root-to-leaf) direction and the reverse (leaf-to-root) direction. DOM does, however, require memory proportional to the document size, which may not be suitable in many practical situations.

G. XML Data Management.
Shredded and native database processing of XML documents also each have advantages and disadvantages. For shredded processing, a relational database engine is normally used. In this case, a document is mapped to (a set of) tables and columns, which breaks its native structure. For instance, a row may contain a document node, and each column can store information about that node (e.g., element/attribute name and/or value).
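
The row-per-node mapping just described can be sketched in Python with the standard sqlite3 and xml.etree.ElementTree modules. This is only one possible shredding scheme under stated assumptions: the table name node, its columns, the sample document, and the helper names are all hypothetical, and the sketch ignores mixed content (text that follows a child element).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample; the <Menu> wrapper is an assumption for this sketch.
SAMPLE = '<Menu><Food Flavour="Chocolate">Ice Cream</Food><HealthyFood/></Menu>'


def shred(xml_text, conn):
    """Store every element and attribute of the document as one table row."""
    conn.execute(
        "CREATE TABLE node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "kind TEXT, name TEXT, value TEXT)"
    )

    def insert(parent_id, kind, name, value):
        cur = conn.execute(
            "INSERT INTO node (parent_id, kind, name, value) VALUES (?, ?, ?, ?)",
            (parent_id, kind, name, value),
        )
        return cur.lastrowid

    def walk(elem, parent_id):
        # Leading text of the element, or NULL if it is empty or whitespace.
        text = (elem.text or "").strip() or None
        node_id = insert(parent_id, "element", elem.tag, text)
        for attr_name, attr_value in elem.attrib.items():
            insert(node_id, "attribute", attr_name, attr_value)
        for child in elem:  # ids grow in document order
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)


def attribute_values(conn, element_name, attr_name):
    """SQL counterpart of a path such as //Food/@Flavour on the shredded rows."""
    rows = conn.execute(
        "SELECT a.value FROM node AS a JOIN node AS e ON a.parent_id = e.id "
        "WHERE a.kind = 'attribute' AND a.name = ? "
        "AND e.kind = 'element' AND e.name = ? ORDER BY a.id",
        (attr_name, element_name),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    shred(SAMPLE, connection)
    print(attribute_values(connection, "Food", "Flavour"))
```

Even this small query needs a self-join to relate an attribute to its element, which hints at the mapping and reconstruction overhead of the shredded approach.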
Using a relational engine, one can reuse proven features of relational database management systems, such as transaction management and query processing. An additional software layer must be provided to map documents to tables and back. This layer imposes a non-negligible burden because a document that has been shredded for relational storage must be reconstructed when it is returned as the result of a query. Moreover, a shredded document is processed as relational data, without taking into account the specific needs and idiosyncrasies of native XML data management: the processing unit is a table rather than a document. A document query in a shredded scenario is written in SQL or in SQL/XML, an extension of SQL that enables specific document operations and (a limited form of) XPath/XQuery expressions. Pure XML data management systems (XDBMS), in turn, store an XML document while keeping its entire tree structure. Normally, B-trees are used as the supporting structure to hold the document order. An XDBMS uses XPath and/or XQuery for querying stored documents. Query results are also XML documents, which are sent back to the user with no need for remapping [8].

II. XML DOCUMENT SUMMARIZATION

Extensible Markup Language (XML) has become one of the de facto standards for data representation on the World Wide Web. Web services based on XML technologies have also emerged for exchanging structured information among organizations, and XML is used in many other applications. More and more data are stored in XML format [10][11]. To understand XML documents with complex structure and abundant data, a human being must spend a great deal of time reading them. When a document is very large and complex, it is impracticable, if not impossible, for a human being to read it in full. It is therefore necessary to present a human being with a summarized form of the original large and complex XML document.
Such a summarized XML document is also useful in other applications: querying XML documents, comparing two XML documents, and displaying or storing XML documents on a mobile or embedded device with limited CPU processing ability, display screen, and storage space. Although a summarized XML document is very useful, it is difficult to generate a good one, so the challenge of summarizing XML documents is how to generate such a summary. A summarized XML document should capture the core information of the original document so that a human being can gain a basic understanding of the original. It should also be smaller than the original document, considering storage space and complexity. A good summarized XML document can be evaluated by the following three standards [10]:

1. Document Size: The first goal of summarizing an XML document is to obtain an XML document whose size is acceptable, compared with the original one, for the specific application. In general, a smaller XML document is more readable for a human being than a larger one. Suppose the sizes of the original and the summarized XML document are S_{original} bytes and S_{summarized} bytes, respectively. The summarized ratio of the summarized XML document with respect to the original one, written here as R_{size}, is given in (1).

R_{size} = \frac{S_{summarized}}{S_{original}} \qquad (1)

2. Information Content: A perfect summarized XML document would contain the entire information content of the original one, i.e., it would be equivalent to the original in terms of information content. In reality, a smaller summarized XML document cannot contain the entire information content of an original document that has no redundant information. Suppose the numbers of element values in the original XML document without redundant information and in the summarized XML document are C_{original} and C_{summarized}, respectively. The information content ratio of the summarized XML document with respect to the original one, written here as R_{content}, is given in (2).

R_{content} = \frac{C_{summarized}}{C_{original}} \qquad (2)

3. Information Importance: Since a summarized XML document cannot in most cases contain the entire information content of the original one, it is necessary and practicable for it to contain the most important information of the original.

In addition, two kinds of summaries can be generated [24]:
1. A generic summary, which summarizes the entire contents of the document.
2. A query-biased summary, which summarizes those parts of the document that are relevant to the user's query.

A. Challenges in XML Summarization.
A summary is useful if, at the very least, it helps the user decide whether a particular document is worth looking into in its entirety. The best summary would encapsulate most or all of the salient points of the document, and in many cases it could serve as a replacement for the original document. However, generating the best summary involves satisfying two contradictory goals: maximum coverage and minimum space. A good balance between the size of the summary and its coverage is required (for example, see Fig. 4 and Fig. 5). To achieve these goals, the following criteria must be addressed [24]:
1. Informativeness: The notion of informativeness is intuitive. For a unit of information (an element or text) to be informative to the user, it has to be important in the document and presented concisely to the user. This follows directly from the goal of summarization, which is to present the salient points of the document to the user in a concise manner.
2. Non-redundancy: Redundancy in XML data is very explicit. In Fig. 4, the same element, say <keyword>, can occur multiple times, once for each keyword associated with a given movie. Clearly, it is not necessary to repeat all occurrences of the element in the summary. Instead, it can be represented concisely by a single element.
3. Coverage: Coverage is closely related to informativeness and refers to the amount of information in the summary relative to the data. We may choose, for example, to leave out certain elements because they are not important enough. While this may improve the readability of the summary, it simultaneously reduces the coverage.
4. Coherence: The context of an element in terms of its parent and/or siblings may sometimes be important. For example, in Fig. 4, in the context of movies, we cannot have a <prodyear> without also having the <title>. However, we can have the <title> without the <prodyear>.

Fig. 4. XML Document of a movie.
Fig. 5. A Concise Summary of Fig. 4.

III. RELATED WORKS

Text summarization uses extraction and abstraction methods. The summarization process analyzes the source text, determines its salient points, and synthesizes an appropriate output. Text summarization can be indicative, informative, or critical. In single-document text summarization, a ranking framework performs better than a classification framework. In general, text summarization focuses on free-flowing text in text datasets and is not always applicable to XML summarization, because XML documents also carry structural and semantic information [12][13].

XML schema summarization is a related topic that summarizes XML schemas rather than XML documents. A schema summary can be of great help, providing a succinct overview of the entire schema and making it possible to explore in depth only the relevant schema components [14].

XML structure summarization is another related topic that summarizes XML structures rather than XML documents. One example is clustering XML documents by structure, in which the XML documents are modeled as rooted, ordered, labeled trees in order to improve the performance of the distance calculation while maintaining or even improving its quality [15].

A significant disadvantage of XML is document size: tagging a set of data increases the space needed to store it, the bandwidth needed to transmit it, and the time needed to parse it. Compression techniques based on the document type (schema-based compression with RELAX NG, XMill, gzip, bzip2, Huffman coding, etc.) are therefore another related topic aimed at reducing the document size [16][17].

Other work focuses on constructing XML summaries for efficient query estimation. StatiX exploits schema transformation and schema validation to obtain statistics for query selectivity estimation in XML documents. StatiX uses histograms to summarize both the structure and the values in an XML document [18]. TreeSketch synopses can produce fast, accurate approximate answers for XML documents. Unlike earlier techniques that focus solely on selectivity estimation, TreeSketch synopses are much more effective in capturing the complete tree structure of the underlying XML database [19]. Bloom histogram is a framework for XML path selectivity estimation in a dynamic context. Compared to alternatives such as Path Tree and Markov Table, it is smaller yet offers superior accuracy [20]. XSEED is a method for estimating the cardinality of XPath queries that is accurate, robust, efficient, and adaptive to memory budgets. XSEED starts from a very small kernel and then incrementally updates the information in the synopsis.
With such an incremental construction, a synopsis structure can be dynamically configured to accommodate different memory budgets. Cardinality estimation based on XSEED can be performed very efficiently and accurately [21].

Veronica Mayorga and Neoklis Polyzotis proposed a method for summarizing XML data streams rather than XML documents. They introduced a technique for approximately answering a complex aggregate query over an XML stream using limited memory. The main novelty of the technique is that it supports XML queries with any combination of the common XPath axes, namely ancestor, descendant, parent, child, following, preceding, following-sibling, and preceding-sibling. At the heart of their method lies an efficient transform that reduces a continuous XML query to an equi-join query over relational streams [22].

Gudrun Fischer and Igor Jacy Lino Campista proposed a semi-automatic method for summarizing XML collections, in which the user can specify semantically relevant features of an XML collection in a template and define rules for summarization. The system assists the user in generating one or several such templates, selects the applicable templates for a given collection, and applies them for automatic summarization [23].

Maya Ramanath and Kondreddi Sarath Kumar focused on the generation of generic summaries of XML documents, using a summarization method based on the document itself alone. They proposed techniques to automatically generate concise, readable summaries of XML documents subject to a memory budget. The resulting summary is shorter but conveys the important information from the original document. They also addressed the three main issues that arise in producing such meaningful and concise summaries: which elements or text units are important and should be included in the summary, how the selected elements and text can be presented in a concise and coherent manner, and how to generate a semantic summary for different memory budgets [24].
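
To make the trade-off between summary size and coverage more concrete, the following toy Python sketch collapses repeated sibling elements (the non-redundancy criterion of Section II.A) and then computes the summarized ratio and the information content ratio defined in Section II. It is not the method of [24] or of any other cited work. The movie document, its values, the helper names, and the choice to count an "element value" as any element with non-empty text are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical movie document; only the tag names title, prodyear and
# keyword are taken from the discussion of Fig. 4, the values are invented.
MOVIE = (
    "<movie><title>Example Movie</title><prodyear>1999</prodyear>"
    "<keyword>drama</keyword><keyword>family</keyword>"
    "<keyword>road trip</keyword></movie>"
)


def collapse_repeats(elem):
    """Return a copy of elem that keeps only the first child of each tag."""
    summary = ET.Element(elem.tag, dict(elem.attrib))
    summary.text = elem.text
    seen = set()
    for child in elem:
        if child.tag in seen:
            continue  # a sibling with this tag is already in the summary
        seen.add(child.tag)
        summary.append(collapse_repeats(child))
    return summary


def size_ratio(original, summarized):
    """Summarized ratio: serialized summary bytes over original bytes."""
    return len(ET.tostring(summarized)) / len(ET.tostring(original))


def count_values(elem):
    """Number of elements with non-empty text (assumed 'element values')."""
    return sum(1 for e in elem.iter() if (e.text or "").strip())


def content_ratio(original, summarized):
    """Information content ratio: summary values over original values."""
    return count_values(summarized) / count_values(original)


if __name__ == "__main__":
    original = ET.fromstring(MOVIE)
    summary = collapse_repeats(original)
    print(ET.tostring(summary, encoding="unicode"))
    print(size_ratio(original, summary), content_ratio(original, summary))
```

Keeping only the first occurrence of each tag is the crudest possible choice. A real summarizer would rank elements by importance and respect coherence constraints such as keeping the title whenever the production year is kept.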
EXsum (Element-wise XML summarization) is a framework for summarizing XML document properties. It can capture statistical information on all important XPath axes related to (the nodes having) the same element name in a document [25].

IV. CONCLUSION

A summarized XML document can help a human being understand the original large and complex document well by presenting its core information clearly. In general, XML summarization remains an open problem. This paper gave a brief introduction to XML documents, XML schema languages, XML technologies, and the processing of XML documents using SAX, DOM, or transformation languages such as XSLT. It also introduced XML document summarization. A good summarized XML document can be evaluated by document size, information content, and information importance. Two kinds of summaries can be generated: a query-biased summary and a generic summary. Generating the best summary means satisfying two contradictory goals, maximum coverage and minimum space, and these goals are approached through informativeness, non-redundancy, coverage, and coherence. Finally, the paper surveyed related work on summarization.

REFERENCES
[1] W3C Recommendation, "Extensible Markup Language (XML) 1.1 (Second Edition)", 2006.
[2] Eric Lease Morgan, "Getting Started with XML: A Manual and Workshop".
[3] Pierre Geneves, "Logics for XML".
[4] Blake Dournaee, "XML Security".
[5] Borland Software Corporation (JBuilder), "XML Developer's Guide".
[6] W3C Recommendation, "XSL Transformations (XSLT) Version 2.0", 23 January 2007.
[7] W3C Recommendation, "XML Technology".
[8] Jose de Aguiar Moraes Filho, "Summarizing XML Documents: Contributions, Empirical Studies, and Challenges".
[9] W3schools.com.
[10] Teng Lv and Ping Yan, "A Framework of Summarizing XML Documents with Schemas", The International Arab Journal of Information Technology, Vol. 10, No. 1, January 2013.
[11] Dilek Basci and Sanjay Misra, "Entropy as a Measure of Quality of XML Schema Document", The International Arab Journal of Information Technology, Vol. 8, No. 1, January.
[12] Hahn U. and Mani I., "The Challenges of Automatic Summarization", Journal of Computer, vol. 33, no. 11.
[13] Amini M., Tombros A., Usunier N., and Lalmas M., "Learning-Based Summarization of XML Documents", Information Retrieval, vol. 10, no. 3.
[14] Yu C. and Jagadish H., "Schema Summarization", in Proceedings of the 32nd International Conference on Very Large Data Bases VLDB, Korea.
[15] Theodore Dalamagas, Tao Cheng, Klaas-Jan Winkel, and Timos Sellis, "A Methodology for Clustering XML Documents by Structure", Information Systems, vol. 31, no. 3.
[16] Christopher League and Kenjone Eng, "Schema-Based Compression of XML Data with Relax NG", Journal of Computers, vol. 2, no. 10, December.
[17] Sebastian Maneth, Nikolay Mihaylov, and Sherif Sakr, "XML Tree Structure Compression", in Proceedings of the 3rd International Workshop on XML Data Management Tools and Techniques, Italy.
[18] Juliana Freire, Jayant R. Haritsa, Maya Ramanath, Prasan Roy, and Jerome Simeon, "StatiX: Making XML Count", in Proceedings of the International Conference on Management of Data, USA.
[19] Neoklis Polyzotis, Minos Garofalakis, and Yannis Ioannidis, "Approximate XML Query Answers", in Proceedings of SIGMOD International Conference on Management of Data, France.
[20] Wei Wang, Haifeng Jiang, Hongjun Lu, and Jeffrey Xu Yu, "Bloom Histogram: Path Selectivity Estimation for XML Data with Updates", in Proceedings of the 30th International Conference on Very Large Data Bases VLDB, Canada.
[21] Ning Zhang, M. Tamer Ozsu, Ashraf Aboulnaga, and Ihab F. Ilyas, "XSEED: Accurate and Fast Cardinality Estimation for XPath Queries", in Proceedings of the 22nd International Conference on ICDE, USA, pp. 61.
[22] Veronica Mayorga and Neoklis Polyzotis, "Sketch-based Summarization of Ordered XML Streams", in Proceedings of IEEE 25th International Conference on ICDE, China.
[23] Gudrun Fischer and Igor Jacy Lino Campista, "A Template-Based Approach to Summarize XML Collections", in Proceedings of Lernen, Wissensentdeckung and Adaptivit, Germany.
[24] Maya Ramanath and Kondreddi Sarath Kumar, "A Rank-Rewrite Framework for Summarizing XML Documents", in Proceedings of 2nd International Workshop on Ranking in Databases, ICDE Workshop, Mexico.
[25] Jose de Aguiar and Theo Harder, "EXsum - An XML Summarization Framework", IDEAS, September 10-12, Coimbra, Portugal.
More informationEnhanced XML Retrieval with Flexible Constraints Evaluation
University of Milano Bicocca Department of Informatics, Systems and Communication (DISCo) Enhanced XML Retrieval with Flexible Constraints Evaluation Ph.D dissertation of Emanuele Panzeri Supervisor: Prof.
More informationChapter 2 XML, XML Schema, XSLT, and XPath
Summary Chapter 2 XML, XML Schema, XSLT, and XPath Ryan McAlister XML stands for Extensible Markup Language, meaning it uses tags to denote data much like HTML. Unlike HTML though it was designed to carry
More informationXML. COSC Dr. Ramon Lawrence. An attribute is a name-value pair declared inside an element. Comments. Page 3. COSC Dr.
COSC 304 Introduction to Database Systems XML Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca XML Extensible Markup Language (XML) is a markup language that allows for
More informationXQuery Optimization in Relational Database Systems
XQuery Optimization in Relational Database Systems Riham Abdel Kader Supervised by Maurice van Keulen Univeristy of Twente P.O. Box 217 7500 AE Enschede, The Netherlands r.abdelkader@utwente.nl ABSTRACT
More informationXML. Objectives. Duration. Audience. Pre-Requisites
XML XML - extensible Markup Language is a family of standardized data formats. XML is used for data transmission and storage. Common applications of XML include business to business transactions, web services
More informationSearching SNT in XML Documents Using Reduction Factor
Searching SNT in XML Documents Using Reduction Factor Mary Posonia A Department of computer science, Sathyabama University, Tamilnadu, Chennai, India maryposonia@sathyabamauniversity.ac.in http://www.sathyabamauniversity.ac.in
More informationEMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents
EMERGING TECHNOLOGIES XML Documents and Schemas for XML documents Outline 1. Introduction 2. Structure of XML data 3. XML Document Schema 3.1. Document Type Definition (DTD) 3.2. XMLSchema 4. Data Model
More informationInformatics 1: Data & Analysis
Informatics 1: Data & Analysis Lecture 9: Trees and XML Ian Stark School of Informatics The University of Edinburgh Tuesday 11 February 2014 Semester 2 Week 5 http://www.inf.ed.ac.uk/teaching/courses/inf1/da
More informationPath-based XML Relational Storage Approach