Information Retrieval from Structured Documents Represented by Attribute Grammars

Size: px
Start display at page:

Download "Information Retrieval from Structured Documents Represented by Attribute Grammars"

Transcription

1 Abstract Information Retrieval from Structured Documents Represented by Attribute Grammars Alda Lopes Gançarski * alda.lopes@lip6.fr Pedro Rangel Henriques ** prh@di.uminho.pt This paper presents a system for Information Retrieval (IR) from collections of structured documents represented by Attribute Grammars (AG). Each document corresponds to a syntactic tree with nodes decorated with sets of attributes. The values of these attributes correspond to characteristics which specify the semantics of the textual content and the structure in order to perform IR. First, we show how to generate such grammars. Then, we explain how the AG analyser constructs document representations to obtain a ranked list of the result to the queries. The relevance of the document components is estimated using textual and structural information. This information is given by a set of heuristics or, alternatively, is explicitly introduced into the system by a specific document. 1. Introduction It has been recently a tremendous growth of the specification of structured textual information using the standards Standard Generalised Markup Language (SGML), Hypertext Markup Language (HTML) and extensible Markup Language (XML). When a collection of documents share the kind of information they deal with, it is natural to think about describing that information in the same way for all of them. For example, business letters have almost all the same structure, so it is desirable that this structure would be described in a standard way. These ideas were at the origins of a standard for specifying markup languages, the SGML, as a format of exchanging documents. The specification of the structural elements and their hierarchical relations for a given type of documents is made through the Document Type Definition (DTD). Later, the HTML was developed as an SGML application to show the documents in the Web. This standard makes use of mainly presentation marks to describe how the textual parts must be displayed by the browser. More recently, XML appeared as a simplification of SGML adapted to exchange documents in the Web. XML documents have a description richer than the simple but limited one provided by HTML. We can take advantage of the markup information to represent structured documents in a standard way in order to extract results from them, whatever the system application, by means of a Attribute Grammar (AG) ([Gan01]). Not only it is a wide known and rather simple concept, but it also allows to uniformly represent heterogeneous sources of structured information, avoiding to develop new intermediate specifications or languages for specialised tasks. A Attribute Grammar (AG) consists of a context independent grammar extended by a set of attributes (and rules for their calculation) which specify the semantics of the analysed texts. If necessary, it also allows to impose contextual conditions to productions, based on attribute values. The result of the syntactic and semantic analysis of a text is an abstract syntax * Pierre et Marie Curie University, Laboratoire d Informatique de Paris 6, Paris, France. ** University of Minho, Department of Informatics, Braga, Portugal.

2 tree decorated with the attribute values (DAST). When an attribute value is calculated by the production where the respective symbol is derived (the symbol is on the left-hand side of the production) it is called synthesised attribute; otherwise, it is called inherited. Synthesised attributes make the propagation of the semantic information from the leaves up to the root of the syntax tree, while the inherited ones do the same but from the root down to the leaves or between sibling nodes. A AG in which all the attributes can be evaluated by visiting the nodes of the syntactic tree in a single left-to-right, depth-first traversal is said to be L-attributed [Sco00]. In this case, the semantic analysis can be interleaved with the syntactic analysis. The task we address here is Information Retrieval (IR). IR consists of retrieving from a collection the relevant documents to a query, while returning as few as possible of non-relevant documents. Moreover, the resulting documents should be ranked by their relevance to the query. In classical IR, a document is represented by the set of terms which are considered to describe the meaning of the text. The selection of those terms is called automatic indexing and consists of some operations done on the text, like stemming and elimination of non-informative words (stopwords). One possible textual representation is a vector in which the components store a measure of how representative is each term with respect to the meaning of the text. This measure is based both on the frequency of the term in a document and the number of documents that contain the term. The relevance of a document to a query is usually computed by the cosine between the vector that represents the text and the one that represents the query. In order to speed up the search when performing query processing, appropriate data structures are build over the text, called indexes [BYRN99]. Among several techniques, we chosed inverted files which are currently the best choice for most applications. An inverted file is a word-oriented mechanism composed of two elements: the vocabulary and the occurrences. The vocabulary is the set of different terms in the text. For each term, a list of all the text positions where the term appears is stored. The set of all lists is called the occurrences file. With the emerge of structured documents, the need appeared to construct an index for the structure where each element occurrence is associated with its father. To take advantage of the structural information of the documents, IR research evolved, first, retrieving documents taking into account the structural parts relevance ([Wil94]); then, enriching the query formats with structural information to retrieve certain parts of documents ([NBY95]); more recently, some works tried to establish the relevance ranking ([Lal97], [WFC99], [HTK00], [SN00]), but it is still an opened research area. We intend to retrieve parts of documents and to establish the relevance ranking of the results. For that, we use the structure information together with the textual content to make the relevance calculations. The information about the structure is expressed by a set of heuristics based on the analysis of different types of structured documents. Our system is composed of different modules, as shown in Figure 1. Using the DTD, the AG Generator module will automatically generate a specific AG, as we show in the section 2. The AG Analyser (section 3) makes a syntactic and semantic analysis of an input document to create the corresponding DAST. The textual contents and structure information from the document are used as arguments of various external functions which compute the attribute values (for example, a function can compute the frequency of a term in a text). As an alternative to the heuristics about the structure information (section 4), a document called Information about Structure for IR (ISIR) can be used by external functions to calculate the relevance of the document components (section 5). A query is a pair (ElemType, textual restriction) which expresses the type of element to return and a textual restriction over it (expressed as a natural language query). The result of analysing one document is the ranked list of the desired elements. After analysing all the documents, the resulting lists can be merged to have a final ranked list of all the relevant element occurrences. This result can be given to the

3 user in several ways; for example, each occurrence may have a pointer to the document it belongs to. DTD AG Generator Text and Structure External Functions Structured Document AG Analyser Attribute Values Structured Query DAST ISIR Fig. 1 - The IR system. 2. The AG Generator To represent a structured document by a DAST, we need to specify both the syntactic and the semantic components of the AG. The syntactic part of the AG corresponds to the structure of the document. Production rules can be automatically constructed from the element declarations using a pre-defined set of generic mapping rules. Each rule corresponds to a type of element declaration or a type of operator in the regular expressions of the element contents: For an element (ElemType) declared as a sequence of N components (Comp_1,, Comp_N) in the following DTD declaration: <!ELEMENT ElemType - - (Comp_1,, Comp_N)>, the corresponding production of the AG is: ElemType Comp_1 Comp_N. For an element declared as an alternative of components: <!ELEMENT ElemType - - (Comp_1 Comp_N)>, a new production of the AG is generated for each component: ElemType Comp_i, i=1..n. If a component (Comp) is formed by another regular expression, the productions of the AG vary according to the operator of the expression. For sequential expressions and alternative expressions, the productions are, respectively: Comp Comp_1 Comp_N and Comp Comp_i, i=1..n, where Comp is a new non-terminal symbol. For one or more occurrences of a component (Comp +), zero or more occurrences (Comp *) and zero or one occurrence (Comp?), a new symbol is created (NewSymbol) and the set of productions are, respectively: (Comp +) (Comp *) (Comp?) NewSymbol Comp NewSymbol NewSymbol Comp NewSymbol Comp NewSymbol NewSymbol ε 1 NewSymbol Comp NewSymbol ε The new symbols of the grammar (Comp and NewSymbol) do not have meaning, they are just syntactic constructs, so we will ignore them in the example below. The generated AG productions are, then, augmented with the semantic attribute calculations. The basic set of attributes for IR is: - identifier - the unique identifier for the element occurrence; 1 Empty production.

4 - elemtype - the type of an element; - textrep - the vector representation of a textual component; - relevance - the relevance of an element to the query; - select - a binary value to detect the elements of the type specified in the query; - oldranklist - the ranked list of relevant elements inherited from the partial DAST constructed until the present element; it is initialised with an empty list in the production of the root symbol; - newranklist - the ranked list of relevant elements updated with the information coming from the partial DAST with the present symbol as root. We show now a simple example composed by a DTD which describes mail letters, a corresponding instance, a query and the final DAST representation of the instance (Figure 2). DTD: <!DOCTYPE Letter SYSTEM [ <!ELEMENT Letter - - (Subject, Head, Message)> <!ELEMENT Subject - - (#PCDATA)> <!ELEMENT Head - - (#PCDATA)> <!ELEMENT Message - - (Paragraph+)> <!ELEMENT Paragraph - - (#PCDATA)>]> Instance:<Letter> <Subject>Vacations</Subject> <Head>14 Nov 2000, Dear Marianne: </Head> <Message> </Message> </Letter> <Paragraph> I am in vacations in Europe!<Paragraph> <Paragraph> I visited Portugal and Spain. Now I am in Paris, a city with lots of monuments and interesting things to see! Then I plan to visit you. Will you be there?</paragraph> <Paragraph>I hope to enjoy the rest of my vacations! I will visit nice places as much as possible! I whish you good vacations too!</paragraph> The receiver of this message wants paragraphs talking about vacations (query=(paragraph, vacations )). In the DAST, the synthesised attributes appear on the right hand side of the derivation arrows and the inherited ones, i.e. the ones that depend on attributes coming from the ascendants, on the left hand side. For better understanding, the relevance attribute has a very simple calculation function: it is the frequency of the textual restrictions in the textual content, here the frequency of the term vacations. Each symbol corresponding to an element is assigned a unique natural number identifier. The select attribute has always the value 0, except for the occurrences of the desired element (Paragraph). The old ranked list of occurrences of the desired element (oldranklist) for the symbol Letter is an empty list since there is no partial DAST constructed until that node. Then, it is inherited by Subject (for clarity, the evolution of the list is indicated with dotted arrows). The corresponding updated list (newranklist) remains empty because Subject is not the desired element. This list is inherited by Head and remains empty because Head is still not the desired element. The same empty list is passed to Message and then to the first Paragraph in the oldranklist attribute. The attribute select of this paragraph has the value 1. Thus, a relevance value of 1 is stored in relevance, which means that there is one occurrence of the term vacations. The new ranked list becomes a single list with this paragraph s identifier (5). In the following paragraph, this list is not altered since there are no occurrences of the word vacations. The last Paragraph has a relevance value of 2; consequently, the head of the ranked list becomes the identifier of this

5 paragraph (7, 5). This list is now passed up to Message and then to Letter. The final result of the IR process over the instance is the newranklist attribute of the root symbol, i.e., (7, 5), which means that the Paragraph identified by 7 ( I hope vacations too! ) is the most relevant to the query, followed by the Paragraph identified by 5 ( I am in vacations in Europe! ). Letter identifier = 1 elemtype=letter oldranklist = ( ) relevance =4 select = 0 ranklist = (7, 5) Subject Head Message identifier=4 elemtype=message oldranklist=() identifier=2 elemtype=subject identifier=3 elemtype=head oldranklist=() textrep= oldranklist=() textrep=... relevance=1 relevance=0 select=0 select=0 ranklist=( ) ranklist = ( ) Vacations 14 Nov 2000, Dear Marianne relevance=3 select=0 ranklist =(7, 5) Paragraph Paragraph Paragraph elemtype=paragraph elemtype=paragraph elemtype=paragraph identifier = 5 textrep= identifier=6 textrep= identifier=7 textrep= oldranklist=() select=1 oldranklist=(5) select=1 oldranklist=(5) select = 1 relevance=1 relevance=0 relevance =2 ranklist=(5) ranklist=(5) ranklist =(7, 5) I am in vacations in Europe! I visited be there? I wish you good vacations! Bye! Fig. 2 : The DAST representation of a structured document. 3. The AG Analyser To represent a document by a DAST, it is analysed syntactically and semantically. Since our AG is L-attributed, both the information concerning the position of an element in the syntactic tree and its attributes are computed and stored in the same step during the syntactic analysis. Before starting the user queries session, a partial DAST is created with the queryindependent attributes. This consists of saving the syntactic tree in a structure index, making the automatic indexation and frequencies calculations to produce the vector representation of the text and calculating other fixed attributes, like the element type. Figure 3 shows the text index (vocabulary and occurrences) and the structure index. In the vocabulary, each term (term) is associated with the number of elements where it appears (efreq, needed for the vector representation of the text) and a pointer to the correspondent list of occurrences in the occurrences file (occptr). For each occurrence, the system stores the element identifier where the term occurs (elemid) together with its frequency in the same element (tfreq, also needed for the textual representation). The structure index has the following information about each

6 element (elemid): its parent (parent), its type (type) and the maximum frequency of a term in its textual component (tfmax, still needed for the textual representation). When the user starts the queries, the system uses the previously created indexes to calculate the attributes related with the relevance of the document components. This attributes are calculated for each element following the order in which the elements appear in the structure index, i.e. the same as in the syntactic analysis (left-to-right, depth-first). In the example of section 3, the order is <2, 3, 5, 6, 7, 4, 1>. Following this order, a recursive algorithm maintains, for each element, a stack with the relevance of each of its components in order to calculate its own relevance. The same algorithm keeps the ranked list of the relevant elements in a vector and augments it with the elemid of each new relevant selected element. Dictionary Occurrences Structure Index term efreq occptr elemid tfreq elemid parent type tfmax Fig. 3 : The text index (vocabulary and occurrences) and the structure index of the system. 4. Information about Structure for IR In IR, only textual information is used to calculate the relevance of a document to a query. In our work, we want to improve the relevance evaluation using also structural information in two ways: allowing the user to specify parts of documents in the query; and developing the following heuristics based on the analysis of different types of structured documents: 1. Basically, a markup can be for presentation (procedural markup, such as Bold or Italic tags), for logical organization of the content (descriptive structural markup, such as Chapter and Section tags) or it can correspond to concepts defining the content (descriptive nominal markup, such as Car or Hotel tags) [Pit]. In the latter case, the name of the tags should be indexed because they are information that can be asked directly by the user (when, for instance, he wants documents about hotels for vacations). 2. The textual elements (elements with just a #PCDATA content) of a sequence are usually more related to each other than other sibling elements up in the structure. It is thus likely that, if one of them is relevant to the query, the majority of them are too. This is the case of a sequence of Line elements in a paragraph. 3. A textual component before a list of elements can be interpreted as a context for those elements. This is the case of an introduction of a chapter with respect to its sections. 4. The elements can be categorized in different levels of importance with respect to the relevance of a document (or an ascendant). Typically, the element Title of a scientific article has a much greater importance than a Footnote element. The system should be able to detect the existence of titles, headers or other tags which are easily assigned with a relative importance value. An alternative or a complement to the above heuristics, is the explicit introduction of that information in a particular document called Information about Structure for IR (ISIR). The specification of a ISIR is given to the system together with the DTD of the documents collection (recall Figure 1). We propose the following XML syntax for the ISIR: <!DOCTYPE ISIR [ <!ELEMENT ISIR (NominalMarkup?, Sequence?, Context*, Importance?)>

7 <!ELEMENT NominalMarkup (ElemType+)> <!ELEMENT ElemType (#PCDATA)> <!ELEMENT Sequence (TextType+)> <!ELEMENT TextType (#PCDATA)> <!ELEMENT Context (TextType, TextTypeList)> <!ELEMENT TextTypeList (TextType+)> <!ELEMENT Importance (ElemType, Weight)+> <!ELEMENT Weight (NUM)>]> Here, NominalMarkup specifies the tags that must be indexed; Sequence refers to a sequence of strongly related textual components (given by a sequence of TextType tags); Context defines a contextual relation between a textual component and a list of textual components (TextTypeList); Importance defines the attribution of a weight Weight to the element type ElemType. The element types are the elements declared in the DTD of the documents collection. The textual component types correspond to the different possible paths from the root to a textual content specified in the DTD: each time a #PCDATA content occurs, it is assigned a TextType unique type. A ISIR document looks like: <ISIR> <NominalMarkup> <ElemType> </ElemType> </NominalMarkup> <Sequence> <TextType> </TextType> </Sequence> <Context> <TextType> </TextType> <TextTypeList> <TextType> </TextType> </TextTypeList> </Context> <Importance><ElemType> </ElemType><Weight> </Weight>... </Importance> </ISIR> 5. The Relevance Calculation Relevance calculation is performed by dedicated functions of the External Functions module. To process a query, it is first necessary to identify the required elements and, eventually, the elements where the textual restriction will be evaluated. Let TextOcc be the occurrence of a textual component of type TextType in a document. The relevance of such a component, with respect to a textual restriction, is calculated by a function RelText of the type of the component, its textual representation (TextRep) and the structural information given by the heuristics or the ISIR: Relevance(TextOcc) = RelText(TextType, TextRep, Heuristics/ISIR). To show the role of the information from the heuristics (or the ISIR) in the calculation of this relevance, recall the example of section 3. If the content of Subject was taken into account as a context to the paragraphs (heuristic 3), and these ones were considered to be related to each other (heuristic 2), then the second paragraph would have a relevance value different from 0 (despites it does not have the term vacations), entering in the final ranked list. This is correct in the sense that this paragraph, like the others, also talks about the vacations of the sender and probably would interest the receiver. After analysing the relevance of the textual components of a document, the required elements ranked by their relevance are given to the user. The system calculates this relevance for all the elements in the path from the textual components up to the required elements. The calculation is done in a simplistic way by the average of the relevances of the N components (Comp n ) of an

8 element type that occurs in a document (ElemOcc). For textual components, this relevance is given by the function RelText; for structural components, it is weighted by the respective importance (Importance). We assume the importance of a textual component equal to the highest importance of the sibling elements. Thus, we have: Relevance(ElemOcc) = [Σ N Relevance(Comp n )*Importance n ] / Σ N Importance n 6. Conclusions and Future Work We have presented a system to make IR over collections of structured documents represented in DASTs. We extend the traditional IR approach, based on textual representations only, with heuristics about the structure in order to improve the quality of the relevance estimation. We proposed also the possibility of substituting the heuristics by a ISIR document with precise information concerning the structure of the documents. As future work, we intend to: (1) Analyse further the information about the structure that we can get from different types of structured documents to extend the ISIR and improve the retrieval quality. (2) Finish the implementation of the system with the approximation of the function RelText using statistical methods. For that we will seek for appropriate training sets, i.e. labelled structured documents collections and respective queries. Then, we will test the retrieval quality of the system calculating appropriate standard IR measures (namely Precision and Recall) and compare the results with the ones given using a simple textual representation and the cosine similarity. (3) Extend the AG specification with XML attributes, entities, hyperlinks and other multimedia information in the automatic generation of the AG. We believe that their semantic can improve the IR task. (4) Extend the system to accept more flexible queries. Financial Support: This work is granted by the Sub-Programa Ciência e Tecnologia do 2 Quadro Comunitário de Apoio of the "Fundação para a Ciência e a Tecnologia (FCT). References [Gan01] A. L. Gançarski, Using Attribute Grammars to Uniformly Represent Structured Documents Application to Information Retrieval, Third Network of Excellence Workshop on Interoperability and Mediation in Heterogeneous Digital Libraries, Darmstadt, Germany, [HTK00] Y. Hayashi, J. Tomota and G. Kikui, Searching text-rich XML documents with relevance ranking, ACM SIGIR'00 Workshop on XML and Information Retrieval, Athens, Greece, [Lal97] M. Lalmas, A dempster-shafer theory of evidence applied to structured documents: Modelling uncertainty, Research and Development in Information Retrieval, p , [NBY95] G. Navarro and R. Baeza-Yates, A language for queries on structure and contents of textual databases, ACM SIGIR 95, [Pit] D. V. Pitt, Standard Generalised Markup Language and the Transformation of Cataloging, [Sco00] M. L. Scott, Programming Languages Pragmatics, Morgan Kaufmann Publishers, San Francisco, California, [SN00] T. Schlieder and F. Naumann, Approximate tree embeeding for querying XML data, ACM SIGIR 2000 Workshop on XML and Information Retrieval, Athens, Greece, [WFC99] J. E. Wolff, H. Florke and A. B. Cremers, Xpres: a ranking approach to retrieval on structured documents, Technical report, University of Bonn, Germany, [Wil94] R. Wilkinson, Effective retrieval of structured documents, ACM SIGIR 94, 1994.

Using Attribute Grammars to Uniformly Represent Structured Documents - Application to Information Retrieval

Using Attribute Grammars to Uniformly Represent Structured Documents - Application to Information Retrieval Using Attribute Grammars to Uniformly Represent Structured Documents - Application to Information Retrieval Alda Lopes Gançarski Pierre et Marie Curie University, Laboratoire d Informatique de Paris 6,

More information

AG-Based interactive system to retrieve information from XML documents

AG-Based interactive system to retrieve information from XML documents AG-Based interactive system to retrieve information from XML documents Alda Lopes Gançarski University of Minho, Braga, Portugal, email: aldalopes@di.uminho.pt Member of Laboratoire d Informatique de Paris

More information

Inverted Indexes. Indexing and Searching, Modern Information Retrieval, Addison Wesley, 2010 p. 5

Inverted Indexes. Indexing and Searching, Modern Information Retrieval, Addison Wesley, 2010 p. 5 Inverted Indexes Indexing and Searching, Modern Information Retrieval, Addison Wesley, 2010 p. 5 Basic Concepts Inverted index: a word-oriented mechanism for indexing a text collection to speed up the

More information

A new generation of tools for SGML

A new generation of tools for SGML Article A new generation of tools for SGML R. W. Matzen Oklahoma State University Department of Computer Science EMAIL rmatzen@acm.org Exceptions are used in many standard DTDs, including HTML, because

More information

A QUERY BY EXAMPLE APPROACH FOR XML QUERYING

A QUERY BY EXAMPLE APPROACH FOR XML QUERYING A QUERY BY EXAMPLE APPROACH FOR XML QUERYING Flávio Xavier Ferreira, Daniela da Cruz, Pedro Rangel Henriques Department of Informatics, University of Minho Campus de Gualtar, 4710-057 Braga, Portugal flavioxavier@di.uminho.pt,

More information

extensible Markup Language

extensible Markup Language extensible Markup Language XML is rapidly becoming a widespread method of creating, controlling and managing data on the Web. XML Orientation XML is a method for putting structured data in a text file.

More information

The Wikipedia XML Corpus

The Wikipedia XML Corpus INEX REPORT The Wikipedia XML Corpus Ludovic Denoyer, Patrick Gallinari Laboratoire d Informatique de Paris 6 8 rue du capitaine Scott 75015 Paris http://www-connex.lip6.fr/denoyer/wikipediaxml {ludovic.denoyer,

More information

Chapter 13 XML: Extensible Markup Language

Chapter 13 XML: Extensible Markup Language Chapter 13 XML: Extensible Markup Language - Internet applications provide Web interfaces to databases (data sources) - Three-tier architecture Client V Application Programs Webserver V Database Server

More information

Comp 336/436 - Markup Languages. Fall Semester Week 2. Dr Nick Hayward

Comp 336/436 - Markup Languages. Fall Semester Week 2. Dr Nick Hayward Comp 336/436 - Markup Languages Fall Semester 2017 - Week 2 Dr Nick Hayward Digitisation - textual considerations comparable concerns with music in textual digitisation density of data is still a concern

More information

EMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents

EMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents EMERGING TECHNOLOGIES XML Documents and Schemas for XML documents Outline 1. Introduction 2. Structure of XML data 3. XML Document Schema 3.1. Document Type Definition (DTD) 3.2. XMLSchema 4. Data Model

More information

XML. extensible Markup Language. ... and its usefulness for linguists

XML. extensible Markup Language. ... and its usefulness for linguists XML extensible Markup Language... and its usefulness for linguists Thomas Mayer thomas.mayer@uni-konstanz.de Fachbereich Sprachwissenschaft, Universität Konstanz Seminar Computerlinguistik II (Miriam Butt)

More information

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward Comp 336/436 - Markup Languages Fall Semester 2017 - Week 4 Dr Nick Hayward XML - recap first version of XML became a W3C Recommendation in 1998 a useful format for data storage and exchange config files,

More information

Beyond DTDs: Constraining Data Content. Language Processing and Specification Group Computer Science Department University of Minho Portugal

Beyond DTDs: Constraining Data Content. Language Processing and Specification Group Computer Science Department University of Minho Portugal Beyond DTDs: Constraining Data Content José Carlos Ramalho Pedro Rangel Henriques Language Processing and Specification Group Computer Science Department University of Minho Portugal Work Background Project

More information

CHAPTER 2 MARKUP LANGUAGES: XHTML 1.0

CHAPTER 2 MARKUP LANGUAGES: XHTML 1.0 WEB TECHNOLOGIES A COMPUTER SCIENCE PERSPECTIVE CHAPTER 2 MARKUP LANGUAGES: XHTML 1.0 Modified by Ahmed Sallam Based on original slides by Jeffrey C. Jackson reserved. 0-13-185603-0 HTML HELLO WORLD! Document

More information

The XML Metalanguage

The XML Metalanguage The XML Metalanguage Mika Raento mika.raento@cs.helsinki.fi University of Helsinki Department of Computer Science Mika Raento The XML Metalanguage p.1/442 2003-09-15 Preliminaries Mika Raento The XML Metalanguage

More information

Processing Structural Constraints

Processing Structural Constraints SYNONYMS None Processing Structural Constraints Andrew Trotman Department of Computer Science University of Otago Dunedin New Zealand DEFINITION When searching unstructured plain-text the user is limited

More information

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11 !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... 7:4 @import Directive... 9:11 A Absolute Units of Length... 9:14 Addressing the First Line... 9:6 Assigning Meaning to XML Tags...

More information

A Web Service-Based System for Sharing Distributed XML Data Using Customizable Schema

A Web Service-Based System for Sharing Distributed XML Data Using Customizable Schema Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 A Web Service-Based System for Sharing Distributed XML Data Using Customizable

More information

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial.

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial. A tutorial report for SENG 609.22 Agent Based Software Engineering Course Instructor: Dr. Behrouz H. Far XML Tutorial Yanan Zhang Department of Electrical and Computer Engineering University of Calgary

More information

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward Comp 336/436 - Markup Languages Fall Semester 2018 - Week 4 Dr Nick Hayward XML - recap first version of XML became a W3C Recommendation in 1998 a useful format for data storage and exchange config files,

More information

A Universal Model for XML Information Retrieval

A Universal Model for XML Information Retrieval A Universal Model for XML Information Retrieval Maria Izabel M. Azevedo 1, Lucas Pantuza Amorim 2, and Nívio Ziviani 3 1 Department of Computer Science, State University of Montes Claros, Montes Claros,

More information

Glossary. ASCII: Standard binary codes to represent occidental characters in one byte.

Glossary. ASCII: Standard binary codes to represent occidental characters in one byte. Glossary ASCII: Standard binary codes to represent occidental characters in one byte. Ad hoc retrieval: standard retrieval task in which the user specifies his information need through a query which initiates

More information

Information management - Topic Maps visualization

Information management - Topic Maps visualization Information management - Topic Maps visualization Benedicte Le Grand Laboratoire d Informatique de Paris 6, Universite Pierre et Marie Curie, Paris, France Benedicte.Le-Grand@lip6.fr http://www-rp.lip6.fr/~blegrand

More information

The Specifications Exchange Service of an RM-ODP Framework

The Specifications Exchange Service of an RM-ODP Framework The Specifications Exchange Service of an RM-ODP Framework X. Blanc (*+), M-P. Gervais(*), J. Le Delliou(+) (*)Laboratoire d'informatique de Paris 6-8 rue du Capitaine Scott F75015 PARIS (+)EDF Research

More information

Introduction to XML. An Example XML Document. The following is a very simple XML document.

Introduction to XML. An Example XML Document. The following is a very simple XML document. Introduction to XML Extensible Markup Language (XML) was standardized in 1998 after 2 years of work. However, it developed out of SGML (Standard Generalized Markup Language), a product of the 1970s and

More information

Device Independent Principles for Adapted Content Delivery

Device Independent Principles for Adapted Content Delivery Device Independent Principles for Adapted Content Delivery Tayeb Lemlouma 1 and Nabil Layaïda 2 OPERA Project Zirst 655 Avenue de l Europe - 38330 Montbonnot, Saint Martin, France Tel: +33 4 7661 5281

More information

The Utrecht Blend: Basic Ingredients for an XML Retrieval System

The Utrecht Blend: Basic Ingredients for an XML Retrieval System The Utrecht Blend: Basic Ingredients for an XML Retrieval System Roelof van Zwol Centre for Content and Knowledge Engineering Utrecht University Utrecht, the Netherlands roelof@cs.uu.nl Virginia Dignum

More information

- What we actually mean by documents (the FRBR hierarchy) - What are the components of documents

- What we actually mean by documents (the FRBR hierarchy) - What are the components of documents Purpose of these slides Introduction to XML for parliamentary documents (and all other kinds of documents, actually) Prof. Fabio Vitali University of Bologna Part 1 Introduce the principal aspects of electronic

More information

AUTOMATIC ACQUISITION OF DIGITIZED NEWSPAPERS VIA INTERNET

AUTOMATIC ACQUISITION OF DIGITIZED NEWSPAPERS VIA INTERNET AUTOMATIC ACQUISITION OF DIGITIZED NEWSPAPERS VIA INTERNET Ismael Sanz, Rafael Berlanga, María José Aramburu and Francisco Toledo Departament d'informàtica Campus Penyeta Roja, Universitat Jaume I, E-12071

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Ricardo Baeza-Yates Berthier Ribeiro-Neto ACM Press NewYork Harlow, England London New York Boston. San Francisco. Toronto. Sydney Singapore Hong Kong Tokyo Seoul Taipei. New

More information

Introduction to XML. Yanlei Diao UMass Amherst April 17, Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.

Introduction to XML. Yanlei Diao UMass Amherst April 17, Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau. Introduction to XML Yanlei Diao UMass Amherst April 17, 2008 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau. 1 Structure in Data Representation Relational data is highly

More information

Focussed Structured Document Retrieval

Focussed Structured Document Retrieval Focussed Structured Document Retrieval Gabrialla Kazai, Mounia Lalmas and Thomas Roelleke Department of Computer Science, Queen Mary University of London, London E 4NS, England {gabs,mounia,thor}@dcs.qmul.ac.uk,

More information

Query Languages. Berlin Chen Reference: 1. Modern Information Retrieval, chapter 4

Query Languages. Berlin Chen Reference: 1. Modern Information Retrieval, chapter 4 Query Languages Berlin Chen 2005 Reference: 1. Modern Information Retrieval, chapter 4 Data retrieval Pattern-based querying The Kinds of Queries Retrieve docs that contains (or exactly match) the objects

More information

Introduction to XML Zdeněk Žabokrtský, Rudolf Rosa

Introduction to XML Zdeněk Žabokrtský, Rudolf Rosa NPFL092 Technology for Natural Language Processing Introduction to XML Zdeněk Žabokrtský, Rudolf Rosa November 28, 2018 Charles Univeristy in Prague Faculty of Mathematics and Physics Institute of Formal

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 13 Structured Text Retrieval with Mounia Lalmas Introduction Structuring Power Early Text Retrieval Models Evaluation Query Languages Structured Text Retrieval, Modern

More information

Plan for today. CS276B Text Retrieval and Mining Winter Vector spaces and XML. Text-centric XML retrieval. Vector spaces and XML

Plan for today. CS276B Text Retrieval and Mining Winter Vector spaces and XML. Text-centric XML retrieval. Vector spaces and XML CS276B Text Retrieval and Mining Winter 2005 Plan for today Vector space approaches to XML retrieval Evaluating text-centric retrieval Lecture 15 Text-centric XML retrieval Documents marked up as XML E.g.,

More information

Overview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54

Overview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54 Overview Lecture 16 Introduction to XML Boriana Koleva Room: C54 Email: bnk@cs.nott.ac.uk Introduction The Syntax of XML XML Document Structure Document Type Definitions Introduction Introduction SGML

More information

An Analysis of Approaches to XML Schema Inference

An Analysis of Approaches to XML Schema Inference An Analysis of Approaches to XML Schema Inference Irena Mlynkova irena.mlynkova@mff.cuni.cz Charles University Faculty of Mathematics and Physics Department of Software Engineering Prague, Czech Republic

More information

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington CS330 Lecture April 8, 2003 1 Overview From HTML to XML DTDs Querying XML: XPath Transforming XML: XSLT

More information

Introduction to Database Systems CSE 444

Introduction to Database Systems CSE 444 Introduction to Database Systems CSE 444 Lecture 25: XML 1 XML Outline XML Syntax Semistructured data DTDs XPath Coverage of XML is much better in new edition Readings Sections 11.1 11.3 and 12.1 [Subset

More information

Mounia Lalmas, Department of Computer Science, Queen Mary, University of London, United Kingdom,

Mounia Lalmas, Department of Computer Science, Queen Mary, University of London, United Kingdom, XML Retrieval Mounia Lalmas, Department of Computer Science, Queen Mary, University of London, United Kingdom, mounia@acm.org Andrew Trotman, Department of Computer Science, University of Otago, New Zealand,

More information

Information mining and information retrieval : methods and applications

Information mining and information retrieval : methods and applications Information mining and information retrieval : methods and applications J. Mothe, C. Chrisment Institut de Recherche en Informatique de Toulouse Université Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse

More information

Improving Suffix Tree Clustering Algorithm for Web Documents

Improving Suffix Tree Clustering Algorithm for Web Documents International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal

More information

SEARCH SEMI-STRUCTURED DATA ON WEB

SEARCH SEMI-STRUCTURED DATA ON WEB SEARCH SEMI-STRUCTURED DATA ON WEB Sabin-Corneliu Buraga 1, Teodora Rusu 2 1 Faculty of Computer Science, Al.I.Cuza University of Iaşi, Romania Berthelot Str., 16 6600 Iaşi, Romania, tel: +40 (32 201529,

More information

M359 Block5 - Lecture12 Eng/ Waleed Omar

M359 Block5 - Lecture12 Eng/ Waleed Omar Documents and markup languages The term XML stands for extensible Markup Language. Used to label the different parts of documents. Labeling helps in: Displaying the documents in a formatted way Querying

More information

Phrase Detection in the Wikipedia

Phrase Detection in the Wikipedia Phrase Detection in the Wikipedia Miro Lehtonen 1 and Antoine Doucet 1,2 1 Department of Computer Science P. O. Box 68 (Gustaf Hällströmin katu 2b) FI 00014 University of Helsinki Finland {Miro.Lehtonen,Antoine.Doucet}

More information

Ontology Extraction from Heterogeneous Documents

Ontology Extraction from Heterogeneous Documents Vol.3, Issue.2, March-April. 2013 pp-985-989 ISSN: 2249-6645 Ontology Extraction from Heterogeneous Documents Kirankumar Kataraki, 1 Sumana M 2 1 IV sem M.Tech/ Department of Information Science & Engg

More information

Text Properties and Languages

Text Properties and Languages Text Properties and Languages 1 Statistical Properties of Text How is the frequency of different words distributed? How fast does vocabulary size grow with the size of a corpus? Such factors affect the

More information

Metamorphosis An Environment to Achieve Semantic Interoperability with Topic Maps

Metamorphosis An Environment to Achieve Semantic Interoperability with Topic Maps Metamorphosis An Environment to Achieve Semantic Interoperability with Topic Maps Giovani Rubert Librelotto 1 and José Carlos Ramalho 2 and Pedro Rangel Henriques 2 1 UNIFRA Centro Universitário Franciscano

More information

1. Please, please, please look at the style sheets job aid that I sent to you some time ago in conjunction with this document.

1. Please, please, please look at the style sheets job aid that I sent to you some time ago in conjunction with this document. 1. Please, please, please look at the style sheets job aid that I sent to you some time ago in conjunction with this document. 2. W3Schools has a lovely html tutorial here (it s worth the time): http://www.w3schools.com/html/default.asp

More information

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Jorge Gracia, Eduardo Mena IIS Department, University of Zaragoza, Spain {jogracia,emena}@unizar.es Abstract. Ontology matching, the task

More information

HERA: Automatically Generating Hypermedia Front- Ends for Ad Hoc Data from Heterogeneous and Legacy Information Systems

HERA: Automatically Generating Hypermedia Front- Ends for Ad Hoc Data from Heterogeneous and Legacy Information Systems HERA: Automatically Generating Hypermedia Front- Ends for Ad Hoc Data from Heterogeneous and Legacy Information Systems Geert-Jan Houben 1,2 1 Eindhoven University of Technology, Dept. of Mathematics and

More information

DATA MODELS FOR SEMISTRUCTURED DATA

DATA MODELS FOR SEMISTRUCTURED DATA Chapter 2 DATA MODELS FOR SEMISTRUCTURED DATA Traditionally, real world semantics are captured in a data model, and mapped to the database schema. The real world semantics are modeled as constraints and

More information

Text Languages and Properties

Text Languages and Properties Text Languages and Properties Berlin Chen 2005 Reference: 1. Modern Information Retrieval, chapter 6 Documents A document is a single unit of information Typical text in digital form, but can also include

More information

CSI 3140 WWW Structures, Techniques and Standards. Markup Languages: XHTML 1.0

CSI 3140 WWW Structures, Techniques and Standards. Markup Languages: XHTML 1.0 CSI 3140 WWW Structures, Techniques and Standards Markup Languages: XHTML 1.0 HTML Hello World! Document Type Declaration Document Instance Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson

More information

Labelling & Classification using emerging protocols

Labelling & Classification using emerging protocols Labelling & Classification using emerging protocols "wheels you don't have to reinvent & bandwagons you can jump on" Stephen McGibbon Lotus Development Assumptions The business rationale and benefits of

More information

Chapter 1: Getting Started. You will learn:

Chapter 1: Getting Started. You will learn: Chapter 1: Getting Started SGML and SGML document components. What XML is. XML as compared to SGML and HTML. XML format. XML specifications. XML architecture. Data structure namespaces. Data delivery,

More information

An Approach To Web Content Mining

An Approach To Web Content Mining An Approach To Web Content Mining Nita Patil, Chhaya Das, Shreya Patanakar, Kshitija Pol Department of Computer Engg. Datta Meghe College of Engineering, Airoli, Navi Mumbai Abstract-With the research

More information

SXML: Streaming XML. Boris Rogge 1, Dimitri Van De Ville 1, Rik Van de Walle 1, Wilfried Philips 2 and Ignace Lemahieu 1

SXML: Streaming XML. Boris Rogge 1, Dimitri Van De Ville 1, Rik Van de Walle 1, Wilfried Philips 2 and Ignace Lemahieu 1 SXML: Streaming XML Boris Rogge 1, Dimitri Van De Ville 1, Rik Van de Walle 1, Wilfried Philips 2 and Ignace Lemahieu 1 1 University of Ghent 2 University of Ghent Elis - Medisip - Ibitech Telin Sint-Pietersnieuwstraat

More information

SDPL : XML Basics 2. SDPL : XML Basics 1. SDPL : XML Basics 4. SDPL : XML Basics 3. SDPL : XML Basics 5

SDPL : XML Basics 2. SDPL : XML Basics 1. SDPL : XML Basics 4. SDPL : XML Basics 3. SDPL : XML Basics 5 2 Basics of XML and XML documents 2.1 XML and XML documents Survivor's Guide to XML, or XML for Computer Scientists / Dummies 2.1 XML and XML documents 2.2 Basics of XML DTDs 2.3 XML Namespaces XML 1.0

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

More information

singly and doubly linked lists, one- and two-ended arrays, and circular arrays.

singly and doubly linked lists, one- and two-ended arrays, and circular arrays. 4.1 The Tree Data Structure We have already seen a number of data structures: singly and doubly linked lists, one- and two-ended arrays, and circular arrays. We will now look at a new data structure: the

More information

Contents. 1 Introduction Basic XML concepts Historical perspectives Query languages Contents... 2

Contents. 1 Introduction Basic XML concepts Historical perspectives Query languages Contents... 2 XML Retrieval 1 2 Contents Contents......................................................................... 2 1 Introduction...................................................................... 5 2 Basic

More information

Fundamentals of Web Programming a

Fundamentals of Web Programming a Fundamentals of Web Programming a Introduction to XML Teodor Rus rus@cs.uiowa.edu The University of Iowa, Department of Computer Science a Copyright 2009 Teodor Rus. These slides have been developed by

More information

Heading-Based Sectional Hierarchy Identification for HTML Documents

Heading-Based Sectional Hierarchy Identification for HTML Documents Heading-Based Sectional Hierarchy Identification for HTML Documents 1 Dept. of Computer Engineering, Boğaziçi University, Bebek, İstanbul, 34342, Turkey F. Canan Pembe 1,2 and Tunga Güngör 1 2 Dept. of

More information

Single-pass Static Semantic Check for Efficient Translation in YAPL

Single-pass Static Semantic Check for Efficient Translation in YAPL Single-pass Static Semantic Check for Efficient Translation in YAPL Zafiris Karaiskos, Panajotis Katsaros and Constantine Lazos Department of Informatics, Aristotle University Thessaloniki, 54124, Greece

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation

More information

Formal languages and computation models

Formal languages and computation models Formal languages and computation models Guy Perrier Bibliography John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman - Introduction to Automata Theory, Languages, and Computation - Addison Wesley, 2006.

More information

Semistructured Data Store Mapping with XML and Its Reconstruction

Semistructured Data Store Mapping with XML and Its Reconstruction Semistructured Data Store Mapping with XML and Its Reconstruction Enhong CHEN 1 Gongqing WU 1 Gabriela Lindemann 2 Mirjam Minor 2 1 Department of Computer Science University of Science and Technology of

More information

CSE 12 Abstract Syntax Trees

CSE 12 Abstract Syntax Trees CSE 12 Abstract Syntax Trees Compilers and Interpreters Parse Trees and Abstract Syntax Trees (AST's) Creating and Evaluating AST's The Table ADT and Symbol Tables 16 Using Algorithms and Data Structures

More information

10/24/12. What We Have Learned So Far. XML Outline. Where We are Going Next. XML vs Relational. What is XML? Introduction to Data Management CSE 344

10/24/12. What We Have Learned So Far. XML Outline. Where We are Going Next. XML vs Relational. What is XML? Introduction to Data Management CSE 344 What We Have Learned So Far Introduction to Data Management CSE 344 Lecture 12: XML and XPath A LOT about the relational model Hand s on experience using a relational DBMS From basic to pretty advanced

More information

Propositional Logic. Part I

Propositional Logic. Part I Part I Propositional Logic 1 Classical Logic and the Material Conditional 1.1 Introduction 1.1.1 The first purpose of this chapter is to review classical propositional logic, including semantic tableaux.

More information

CS 406: Syntax Directed Translation

CS 406: Syntax Directed Translation CS 406: Syntax Directed Translation Stefan D. Bruda Winter 2015 SYNTAX DIRECTED TRANSLATION Syntax-directed translation the source language translation is completely driven by the parser The parsing process

More information

(1) I (2) S (3) P allow subscribers to connect to the (4) often provide basic services such as (5) (6)

(1) I (2) S (3) P allow subscribers to connect to the (4) often provide basic services such as (5) (6) Collection of (1) Meta-network That is, a (2) of (3) Uses a standard set of protocols Also uses standards d for structuring t the information transferred (1) I (2) S (3) P allow subscribers to connect

More information

Evaluation of Semantic Actions in Predictive Non- Recursive Parsing

Evaluation of Semantic Actions in Predictive Non- Recursive Parsing Evaluation of Semantic Actions in Predictive Non- Recursive Parsing José L. Fuertes, Aurora Pérez Dept. LSIIS School of Computing. Technical University of Madrid Madrid, Spain Abstract To implement a syntax-directed

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 11: XML and XPath 1 XML Outline What is XML? Syntax Semistructured data DTDs XPath 2 What is XML? Stands for extensible Markup Language 1. Advanced, self-describing

More information

Indexing into Controlled Vocabularies with XML

Indexing into Controlled Vocabularies with XML Indexing into Controlled Vocabularies with XML Stephen C Arnold Bradley J Spenla College of Information Science and Technology Drexel University Philadelphia, PA, 19104, USA stephenarnold@cisdrexeledu

More information

68A8 Multimedia DataBases Information Retrieval - Exercises

68A8 Multimedia DataBases Information Retrieval - Exercises 68A8 Multimedia DataBases Information Retrieval - Exercises Marco Gori May 31, 2004 Quiz examples for MidTerm (some with partial solution) 1. About inner product similarity When using the Boolean model,

More information

Activity Report at SYSTRAN S.A.

Activity Report at SYSTRAN S.A. Activity Report at SYSTRAN S.A. Pierre Senellart September 2003 September 2004 1 Introduction I present here work I have done as a software engineer with SYSTRAN. SYSTRAN is a leading company in machine

More information

A Voting Method for XML Retrieval

A Voting Method for XML Retrieval A Voting Method for XML Retrieval Gilles Hubert 1 IRIT/SIG-EVI, 118 route de Narbonne, 31062 Toulouse cedex 4 2 ERT34, Institut Universitaire de Formation des Maîtres, 56 av. de l URSS, 31400 Toulouse

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction Adaptable and Adaptive Web Information Systems School of Computer Science and Information Systems Birkbeck College University of London Lecture 1: Introduction George Magoulas gmagoulas@dcs.bbk.ac.uk October

More information

Key differentiating technologies for mobile search

Key differentiating technologies for mobile search Key differentiating technologies for mobile search Orange Labs Michel PLU, ORANGE Labs - Research & Development Exploring the Future of Mobile Search Workshop, GHENT Some key differentiating technologies

More information

(2,4) Trees. 2/22/2006 (2,4) Trees 1

(2,4) Trees. 2/22/2006 (2,4) Trees 1 (2,4) Trees 9 2 5 7 10 14 2/22/2006 (2,4) Trees 1 Outline and Reading Multi-way search tree ( 10.4.1) Definition Search (2,4) tree ( 10.4.2) Definition Search Insertion Deletion Comparison of dictionary

More information

Binary Trees

Binary Trees Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what

More information

Additional Readings on XPath/XQuery Main source on XML, but hard to read:

Additional Readings on XPath/XQuery Main source on XML, but hard to read: Introduction to Database Systems CSE 444 Lecture 10 XML XML (4.6, 4.7) Syntax Semistructured data DTDs XML Outline April 21, 2008 1 2 Further Readings on XML Additional Readings on XPath/XQuery Main source

More information

Extending E-R for Modelling XML Keys

Extending E-R for Modelling XML Keys Extending E-R for Modelling XML Keys Martin Necasky Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic martin.necasky@mff.cuni.cz Jaroslav Pokorny Faculty of Mathematics and

More information

Chapter 3 (part 3) Describing Syntax and Semantics

Chapter 3 (part 3) Describing Syntax and Semantics Chapter 3 (part 3) Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings

More information

2. Write style rules for how you d like certain elements to look.

2. Write style rules for how you d like certain elements to look. CSS for presentation Cascading Style Sheet Orientation CSS Cascading Style Sheet is a language that allows the user to change the appearance or presentation of elements on the page: the size, style, and

More information

2. PRELIMINARIES MANICURE is specically designed to prepare text collections from printed materials for information retrieval applications. In this ca

2. PRELIMINARIES MANICURE is specically designed to prepare text collections from printed materials for information retrieval applications. In this ca The MANICURE Document Processing System Kazem Taghva, Allen Condit, Julie Borsack, John Kilburg, Changshi Wu, and Je Gilbreth Information Science Research Institute University of Nevada, Las Vegas ABSTRACT

More information

XML Clustering by Bit Vector

XML Clustering by Bit Vector XML Clustering by Bit Vector WOOSAENG KIM Department of Computer Science Kwangwoon University 26 Kwangwoon St. Nowongu, Seoul KOREA kwsrain@kw.ac.kr Abstract: - XML is increasingly important in data exchange

More information

FACULTY OF ENGINEERING B.E. 4/4 (CSE) II Semester (Old) Examination, June Subject : Information Retrieval Systems (Elective III) Estelar

FACULTY OF ENGINEERING B.E. 4/4 (CSE) II Semester (Old) Examination, June Subject : Information Retrieval Systems (Elective III) Estelar B.E. 4/4 (CSE) II Semester (Old) Examination, June 2014 Subject : Information Retrieval Systems Code No. 6306 / O 1 Define Information retrieval systems. 3 2 What is precision and recall? 3 3 List the

More information

XML Structures. Web Programming. Uta Priss ZELL, Ostfalia University. XML Introduction Syntax: well-formed Semantics: validity Issues

XML Structures. Web Programming. Uta Priss ZELL, Ostfalia University. XML Introduction Syntax: well-formed Semantics: validity Issues XML Structures Web Programming Uta Priss ZELL, Ostfalia University 2013 Web Programming XML1 Slide 1/32 Outline XML Introduction Syntax: well-formed Semantics: validity Issues Web Programming XML1 Slide

More information

An Intelligent System for Archiving and Retrieval of Audiovisual Material Based on the MPEG-7 Description Schemes

An Intelligent System for Archiving and Retrieval of Audiovisual Material Based on the MPEG-7 Description Schemes An Intelligent System for Archiving and Retrieval of Audiovisual Material Based on the MPEG-7 Description Schemes GIORGOS AKRIVAS, SPIROS IOANNOU, ELIAS KARAKOULAKIS, KOSTAS KARPOUZIS, YANNIS AVRITHIS

More information

HTML is a mark-up language, in that it specifies the roles the different parts of the document are to play.

HTML is a mark-up language, in that it specifies the roles the different parts of the document are to play. Introduction to HTML (5) HTML is a mark-up language, in that it specifies the roles the different parts of the document are to play. For example you may specify which section of a document is a top level

More information

Deep Web Content Mining

Deep Web Content Mining Deep Web Content Mining Shohreh Ajoudanian, and Mohammad Davarpanah Jazi Abstract The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased

More information

Department of Electronic Engineering FINAL YEAR PROJECT REPORT

Department of Electronic Engineering FINAL YEAR PROJECT REPORT Department of Electronic Engineering FINAL YEAR PROJECT REPORT BEngCE-2007/08-HCS-HCS-03-BECE Natural Language Understanding for Query in Web Search 1 Student Name: Sit Wing Sum Student ID: Supervisor:

More information

ONE MARKS QUESTION AND ANSWERS

ONE MARKS QUESTION AND ANSWERS ONE MARKS QUESTION AND ANSWERS Question No 1 Typical Configuration of Computer System: 1. What is a motherboard? The motherboard is the most important circuit board of the computer; all the components

More information

Lecture 21: Relational Programming II. November 15th, 2011

Lecture 21: Relational Programming II. November 15th, 2011 Lecture 21: Relational Programming II November 15th, 2011 Lecture Outline Relational Programming contd The Relational Model of Computation Programming with choice and Solve Search Strategies Relational

More information

Really quick guide to DocBook

Really quick guide to DocBook 1. Introduction Really quick guide to DocBook Ferry Boender This document is about DocBook. DocBook is a standard for creating, mostly technical, documents. DocBook s great advantage lies in the fact that

More information