A Bayesian Approach to WSD for the Retrieval of XML Documents

Marco Mesiti (1), Paolo Rosso (2), Marina Merlo (1)

(1) Dipartimento di Informatica e Scienze dell'Informazione, Università di Genova, Italy
{mesiti,mmerlo}@disi.unige.it
(2) Dpto. de Sistemas Informáticos y Computación, Univ. Politécnica de Valencia, Spain
prosso@dsic.upv.es

Abstract

Sources of XML documents are proliferating on the World Wide Web. An important feature of XML is that information on document structure is available on the Web together with document content. This information can be exploited to improve both document handling and query processing. In an environment as heterogeneous as the Web, it is not reasonable to assume that there are always XML documents which satisfy a given query. A metric for quantifying the structural similarity between an XML document and a query is therefore necessary. The aim is to develop a technique that allows for approximate querying, that is, querying based on structural similarity and on synonymy between the tags of XML documents. In this paper, we present an algorithm for the retrieval of XML documents which is based on the structural and semantic similarity of a document to a given query. For the semantic indexing of the tags of XML documents and queries, the naive Bayesian approach and the WordNet ontology were used.

1 Introduction

XML (eXtensible Markup Language) [14] is a markup language that has recently emerged as the most relevant standardisation effort for document representation and exchange on the Web. XML is a simplified version of SGML, designed for use on the Web. Its goal is to provide the same advantages as SGML while offering a language for creating documents on the Web that is easier to learn and use than SGML. HTML, derived from SGML for use on the Web, is a helpful tool for presenting documents, but it is not adequate for exchanging them.
The main advantages of XML with respect to HTML are related to the possibility of defining tags, nested document structures, and document types (called DTDs: Document Type Definitions) that describe the structure of documents. The building blocks of XML are nested, tagged elements. Each tagged element has a sequence of zero or more attribute-value pairs, and a sequence of zero or more subelements. These subelements may themselves be tagged elements, or they may be "tagless" segments of natural language text data. A well-formed XML document is a document that follows the grammar rules of XML [14], but it places no restriction on tags, attribute names, or nesting patterns. Alternatively, a document can be coupled with a DTD, which is essentially a grammar for restricting the tags and the structure of a document. An XML document conforming to a DTD is considered valid.

The need to shift from exact queries with boolean answers to approximate queries with ranked results has emerged as a requirement of XML query languages for searching the Web [7], and some approaches in this direction have been developed [3]. These approaches, however, consider similarity only between terms appearing in the documents (i.e., similarity of content), disregarding structural similarity. In this paper we propose an algorithm for matching an XML document against a DTD query with respect to its structure and its tag similarity. In fact, XML documents, because of their semantic tags, are self-describing. The aim is to exploit structural similarity in the development of XML-based search engines able to extract from the Web documents (or portions of documents) which are also semantically similar to a given query.

The remainder of the paper is structured as follows. Section 2 presents the matching algorithm based on structural similarity in the field of document classification. Section 3 deals with our tree representation of XML documents and queries, whereas Section 4 discusses our Bayesian approach for the disambiguation of queries and documents. Section 5 presents the semantic version of the matching algorithm, which labels the tags of DTD queries and XML documents with the concepts of the WordNet ontology. Section 6 reports some preliminary results on the retrieval of the XML documents which match a given query with respect to structure and synonymy between tags. Finally, Section 7 concludes the work by discussing extensions and applicability of the proposed technique.

2 From Classification to Retrieval of XML Documents

The approach we propose for querying XML documents relies on previous work carried out for XML document classification and for the disambiguation of natural language queries. In this section we present the main features of the classification approach, whereas in the following sections we discuss how to take semantics into account.

In [2] an approach for the classification of XML documents against a set of DTDs was proposed. The approach relies on a similarity measure that evaluates the degree of similarity between an XML document and a DTD by counting the exceeding and missing elements of the document with respect to the DTD (named, respectively, plus and minus elements). These elements are then weighted according to the level of the hierarchical structure at which they are detected and to the complexity of their structures. Starting from a tree representation of an XML document D and a DTD T, the similarity measure is computed by means of the matching algorithm Match, which produces a triple (p, m, c) for the pair (D, T). The triple evaluates the plus, minus, and common elements of the two structures, taking into account the weights associated with the plus and minus elements.
The algorithm is based on the idea of locally determining the best structure for a DTD element as soon as the information on the structure of its subelements in the document is known. The Match algorithm recursively visits the document and the DTD at the same time, from the root to the leaves, to match common elements. Specifically, two distinct phases can be distinguished:

1. in the first phase, moving down the trees from the roots, the parts of the trees to visit through recursive calls are determined, but no evaluation is performed;
2. when a terminal case is reached, on return from the recursive calls and going up the trees, the various alternatives are evaluated and the best one is selected.

Intuitively, in the first phase the DTD is used as a "guide" to detect the elements of the document that are covered by the DTD, disregarding the operators that bind together the subelements of an element. In the second phase, by contrast, the operators used in the DTD are considered in order to verify that elements are bound as prescribed by the DTD, and to define an evaluation of the missing or exceeding parts of the document with respect to the DTD.

The algorithm matches a document against a query and obtains a structural similarity value given by the function E, which is defined as follows:

$$E((p, m, c)) = \frac{c}{\alpha \cdot p + c + \beta \cdot m} \qquad (1)$$

The function is based on two real parameters $\alpha$ and $\beta$, such that $\alpha, \beta \geq 0$. Depending on the values assigned to these parameters, the function gives more relevance to plus elements with respect to minus elements, or vice versa. For example, if $\alpha = 0$ and $\beta = 1$, the function does not take plus elements into account in measuring similarity. Therefore, a document with only extra elements with respect to the ones specified in the DTD has a similarity measure equal to 1. By contrast, if $\alpha = 1$ and $\beta = 0$, the evaluation function does not take minus elements into account in the similarity measure. We assume that $\alpha = \beta = 1$, thus giving the same relevance to plus and minus elements.
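To make the role of the two parameters concrete, the evaluation function can be sketched in a few lines of Python. This is only an illustration: the names `Evaluation` and `similarity` are ours, the triple is assumed to have been produced already by a Match-like procedure, and returning 0 for an all-zero triple is a convention chosen for the sketch, not something stated in the paper.

```python
from typing import NamedTuple


class Evaluation(NamedTuple):
    """Weighted (plus, minus, common) triple for a document/DTD pair."""
    p: float  # weight of plus elements (in the document, not in the DTD)
    m: float  # weight of minus elements (required by the DTD, missing in the document)
    c: float  # weight of common elements


def similarity(triple: Evaluation, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Evaluate E((p, m, c)) = c / (alpha * p + c + beta * m)."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    denominator = alpha * triple.p + triple.c + beta * triple.m
    # Assumption of this sketch: an empty comparison scores 0.
    return triple.c / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    triple = Evaluation(p=2.0, m=1.0, c=5.0)
    print(similarity(triple))                        # alpha = beta = 1
    print(similarity(triple, alpha=0.0, beta=1.0))   # plus elements ignored
```

With alpha = 0 and beta = 1, a triple whose minus component is zero evaluates to 1 regardless of how many plus elements it has, which is the behaviour described in the text.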
An exhaustive description of the matching algorithm goes beyond the scope of this paper. For the detailed version see [8].

3 Documents and Queries as Trees

In the previous section we already mentioned the use of a tree representation for documents and queries. For the sake of clarity, we consider simple XML documents, that is, documents formed by nested elements, disregarding attributes and the order of elements (i.e., we focus on data-centric documents). Moreover, we consider queries defined for the retrieval of entire documents, disregarding queries that can return parts of documents.

(a)
<movie>
  <title>La vita e bella</title>
  <cast>
    <actor>R. Benigni</actor>
    <actor>N. Braschi</actor>
    <actor>M. Paredes</actor>
  </cast>
  <story> ::NLtext:: </story>
  <production>
    <producer>G. Braschi</producer>
    <producer>E. Ferri</producer>
  </production>
  <director>R. Benigni</director>
</movie>

(b)
movie
  title
    "La vita e bella"
  cast
    actor
      "R.Benigni"
    actor
      "N.Braschi"
    actor
      "M.Paredes"
  story
    "...NL text..."
  production
    producer
      "G.Braschi"
    producer
      "E.Ferri"
  director
    "R.Benigni"

Figure 1: XML document and corresponding tree representation

3.1 Tree Representation of XML Documents

In what follows we use the tree representation of XML documents presented in [2], in which an XML document is represented as a labelled tree. Each node of the tree represents an element or a value. The label associated with a node represents the corresponding tag name or value. The labels used to tag the tree belong to a set of element tags (EN) and to a set of values that the data content of an element can assume (V). In each tree that represents an XML document the root label belongs to EN, being the document element name. Figure 1(b) shows the tree representation of the XML document of Figure 1(a) (see footnote 1).

Footnote 1: Explicit direction of edges is omitted. All edges are oriented downward.

3.2 Queries as Labelled Trees

A query is also represented as a labelled tree. In the tree representation, in order to represent optional elements and alternatives of elements, a set of operators OP is introduced, consisting of a sequence operator, the OR operator, and the ? operator. The sequence operator represents a sequence of elements, the OR operator represents an alternative of elements, and the ? operator represents an optional element. In our representation of queries each node corresponds to an element tag, to an element type or value, to an operator, or to a predicate.

[Figure 2: Query expressed as a tree]
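The labelled-tree representation of Section 3.1 can be sketched with Python's standard `xml.etree.ElementTree` parser. The `Node` class and the helper names `to_tree` and `render` are illustrative assumptions, not part of the approach in [2]; the sketch ignores attributes, as the text does, but it keeps the element order produced by the parser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str               # a tag name (from EN) or a data value (from V)
    is_value: bool = False   # True for leaves that carry data content
    children: List["Node"] = field(default_factory=list)


def to_tree(element: ET.Element) -> Node:
    """Convert an element into a labelled tree, ignoring attributes."""
    node = Node(label=element.tag)
    text = (element.text or "").strip()
    if text:
        node.children.append(Node(label=text, is_value=True))
    for child in element:
        node.children.append(to_tree(child))
    return node


def render(node: Node, depth: int = 0) -> str:
    """Render the tree with two-space indentation, quoting value leaves."""
    label = f'"{node.label}"' if node.is_value else node.label
    lines = ["  " * depth + label]
    lines.extend(render(child, depth + 1) for child in node.children)
    return "\n".join(lines)


SAMPLE = (
    "<movie><title>La vita e bella</title>"
    "<cast><actor>R. Benigni</actor><actor>N. Braschi</actor></cast></movie>"
)

if __name__ == "__main__":
    print(render(to_tree(ET.fromstring(SAMPLE))))
```

The root label is always an element name, and every value appears as a leaf, matching the two label sets EN and V described above.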
In each tree which represents a query there is a single edge outgoing from the root, and the root label belongs to EN (it is the name of the main element of the documents we wish to retrieve by the query). Moreover, there can be more than one edge outgoing from a node only if the node is labelled by the sequence operator or by OR. All nodes labelled by types or values are leaves of the tree. Finally, if a node is labelled by a predicate, then it has only one child (i.e., a leaf of the tree labelled by a value). Figure 2 shows the tree representation of a query. We remark that the introduction of the operators in OP allows us to represent the structure of all kinds of DTDs.

4 Bayesian Approach for WSD: from NLP to XML

Word Sense Disambiguation (WSD) is the problem of assigning the appropriate meaning (or sense) to a given word in a text (or discourse). Resolving the ambiguity of words is a central problem for large-scale language understanding applications and their associated tasks [9]. Besides, WSD is one of the most important open problems in Natural Language Processing (NLP). The selection of the proper sense is a non-trivial undertaking given the phenomenon of polysemy. A word is disambiguated along with the surrounding words of the text in which it is embedded, that is, along with its context. To determine the context of each word to sense-tag, a window of a certain size is moved along the text. Words and context, which are already tagged with a label representing the corresponding syntactic category (noun, verb, adjective or adverb), are usually processed together with a lexical relations database like WordNet [13]. The WordNet ontology is organised around the notion of synset, that is, a set of synonyms. In fact, WordNet represents concepts as lists comprised of the lexical entries that can be used to express the concept.

[Figure 3: The query expressed as a tree]

Different machine learning algorithms, supervised and unsupervised, perform the WSD task of NLP. One of these statistical learning methods is the naive Bayesian approach which, assuming the independence of features (i.e., the words of the context), classifies a new example (i.e., sense-tags a new word) by assigning the class (i.e., the synset) that maximises the conditional probability of the class given the observed sequence of features (i.e., the words of its context) of that example. The naive Bayes formula is defined as follows [11], where $c_j$ ($j = 1, \ldots, n$) are the n words of the context, s is a sense of the word to disambiguate, and S is the set of its senses:

$$\max_{s \in S} \; P(s) \prod_{j=1}^{n} P(c_j \mid s) \qquad (2)$$

For instance, in the natural language sentence "What films did the director Fellini make after the date 1974?", if we want to disambiguate the word director, the words of its context could be date and film (the stemmed version of films). Model probabilities are estimated during the training process using relative frequencies. A sense-tagged corpus like SemCor, which consists of a portion of the Brown corpus tagged with WordNet senses, has to be used during the training and testing phases. In the end, director should be tagged with the synset which refers to the stage. To avoid the effect of zero counts, the simple flat discount smoothing technique can be used [15].
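As an illustration of formula (2) with probabilities estimated by relative frequencies, the following Python sketch trains on a handful of sense-tagged examples and picks the highest-scoring sense. Everything concrete in it is hypothetical: the sense labels `film.movie` and `film.photo`, the toy training examples, and the function names. It applies no smoothing, so a context word never seen with a sense drives that sense's score to zero, which is precisely the zero-count effect mentioned above.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Example = Tuple[str, Sequence[str]]  # (sense label, context words)


def train(examples: Iterable[Example]) -> Tuple[Counter, Dict[str, Counter]]:
    """Count sense occurrences and, per sense, the examples containing each context word."""
    sense_counts: Counter = Counter()
    context_counts: Dict[str, Counter] = {}
    for sense, context in examples:
        sense_counts[sense] += 1
        context_counts.setdefault(sense, Counter()).update(set(context))
    return sense_counts, context_counts


def disambiguate(
    context: Sequence[str],
    sense_counts: Counter,
    context_counts: Dict[str, Counter],
) -> Tuple[str, float]:
    """Return the sense maximising P(s) * prod_j P(c_j | s), with its score."""
    total = sum(sense_counts.values())
    best_sense, best_score = "", -1.0
    for sense, n_sense in sense_counts.items():
        score = n_sense / total                                # P(s)
        for word in context:
            score *= context_counts[sense][word] / n_sense     # P(c_j | s)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense, best_score


# Hypothetical sense-tagged occurrences of the lemma "film".
TRAINING = [
    ("film.movie", ["actor", "date"]),
    ("film.movie", ["actor", "cast"]),
    ("film.movie", ["cast", "date"]),
    ("film.photo", ["camera", "roll"]),
    ("film.photo", ["camera", "date"]),
]

if __name__ == "__main__":
    counts = train(TRAINING)
    print(disambiguate(["actor", "date"], *counts))
```

Working in log space would be the usual choice for longer contexts; plain products are kept here so that the code mirrors the formula term by term.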
A query like the one above, if expressed through a query language defined for XML (like XPath [6]), would be translated as:

/film/director="fellini"
/film/date>"1974"

Figure 3 shows the tree representation of the previous XPath query. During semantic indexing, each tag of the DTD query has to be tagged with the right WordNet concept (i.e., synset) in order to allow for the conceptual query of searching all documents which deal with the concept film and satisfy certain requirements. If, instead, the search for such documents were based only on the keyword film, the documents using an equivalent tag would not be retrieved.

In order to apply the naive Bayes NLP technique in the more structured XML world, we have to redefine what the context of a word (i.e., a tag) to disambiguate could be. In our setting, we define the surrounding words of a tag as its father, its sons and its brothers. For instance, the context of the tag director would be given by film and date. Figure 4 presents the smoothed disambiguation algorithm which was used.

$$s = \arg\max_{s_i \in S,\; P(s_i) \neq 0} \; P(s_i) \prod_{j=1}^{n} W_{ij}$$

where:

$$W_{ij} = \begin{cases} (1 - \lambda)\, P(c_j \mid s_i) & \text{if } P(c_j \mid s_i) \neq 0 \\[4pt] \dfrac{\lambda \sum_{k=1}^{n} P(c_k \mid s_i)}{WN - |W_i|} & \text{if } P(c_j \mid s_i) = 0 \end{cases}$$

$nsig_{s_i}$: occurrences of the lemma with sense $s_i$
$nlemma$: occurrences of the lemma to disambiguate
$nnamesig_{w_{ij}}$: occurrences of the context lemma $c_j$ with sense $s_i$
$P(s_i) = nsig_{s_i} / nlemma$
$P(c_j \mid s_i) = nnamesig_{w_{ij}} / nsig_{s_i}$
$0 < \lambda < 1$
$WN = 94474$, WordNet nouns
$W_i = \{ c_j \mid 1 \leq j \leq n,\; P(c_j \mid s_i) \neq 0 \}$

Figure 4: Smoothed disambiguation algorithm
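The structural notion of context (father, sons and brothers of a tag) can be sketched as follows. The helper name `tag_contexts` and the sample document are assumptions made for illustration; the sketch collects, for every tag name, the distinct tag names found in those three positions, using the standard `xml.etree.ElementTree` parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def tag_contexts(root: ET.Element) -> Dict[str, List[str]]:
    """Map each tag name to the tags of its father, its brothers and its sons."""
    contexts: Dict[str, List[str]] = {}

    def visit(element: ET.Element, parent: Optional[ET.Element]) -> None:
        context: List[str] = []
        if parent is not None:
            context.append(parent.tag)                                  # father
            context.extend(s.tag for s in parent if s is not element)   # brothers
        context.extend(child.tag for child in element)                  # sons
        bucket = contexts.setdefault(element.tag, [])
        for tag in context:
            # Keep each context tag once and never list a tag as its own context.
            if tag != element.tag and tag not in bucket:
                bucket.append(tag)
        for child in element:
            visit(child, element)

    visit(root, None)
    return contexts


SAMPLE = (
    "<movie><title>t</title>"
    "<cast><actor>a</actor><actor>b</actor></cast></movie>"
)

if __name__ == "__main__":
    for tag, context in tag_contexts(ET.fromstring(SAMPLE)).items():
        print(tag, "->", context)
```

The resulting lists play the role that the sliding window of words plays in plain text: they are the features $c_j$ handed to the naive Bayes classifier for each tag.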

5 A Semantic Matching Algorithm for the Retrieval of XML Documents

Our semantic matching algorithm for the retrieval of XML documents is based on the preliminary version which was developed for their classification. In fact, the query evaluation process can be seen as an information retrieval method that uses the hierarchical structure of the document for identifying possible answers to the query. A user wishing to retrieve documents of a certain type from a source may not know the structures used in the source for representing that type of document. Moreover, the possibility of evolving the structure of the DTDs may make queries already defined by the users unusable. For these reasons we aim to define a query resolution mechanism based on similarity.

In general, a query, expressed through a query language defined for XML (like XPath), is translated into a document, called the document template. The document template represents the structural and content constraints a document should satisfy in order to be an answer to the query. Thus, roughly speaking, a document template can be seen as a DTD with content constraints. Intuitively, a document that weakly conforms to the document template is an answer to the query.

Thus, our idea is to define an approach to querying XML documents based on the classification algorithm. The documents in a source are classified against the document template, and the documents satisfying its constraints are collected into a set (called the query answer set). Note that, because our approach to retrieving documents is based on the classification algorithm, the query answer set can contain documents that are similar to the document template, both in structure and in vocabulary. Since we are weakening the notion of conformance as far as document structure is concerned, the requirement of tag equality can be weakened as well, and tag similarity can be considered.
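A minimal sketch of weakened tag equality is given below. It anticipates the rules spelled out in the rest of this section: tags that are morphological variants count as equal, tags sharing a synset receive an affinity value (0.8 is the typical value cited from [4]), and one minus the affinity is added to the minus component when such tags are matched. The synset table is a hypothetical stand-in for WordNet with made-up identifiers, the plural-stripping rule is a deliberately crude substitute for stemming, and the sketch checks for any shared synset rather than comparing disambiguated senses.

```python
from typing import Dict, FrozenSet

SIGMA = 0.8  # affinity for synonymous tags (typical value cited in the text)

# Hypothetical stand-in for WordNet: tag -> identifiers of its synsets.
SYNSETS: Dict[str, FrozenSet[str]] = {
    "film": frozenset({"S1", "S2"}),   # S1: motion picture, S2: photographic film
    "movie": frozenset({"S1"}),
    "picture": frozenset({"S1"}),
    "date": frozenset({"S3"}),
}


def normalise(tag: str) -> str:
    """Crude stand-in for stemming: lowercase and strip a final plural 's'."""
    tag = tag.lower()
    return tag[:-1] if len(tag) > 3 and tag.endswith("s") else tag


def tag_affinity(document_tag: str, query_tag: str, sigma: float = SIGMA) -> float:
    """Return 1.0 for equivalent tags, sigma for synonyms, 0.0 otherwise."""
    a, b = normalise(document_tag), normalise(query_tag)
    if a == b:
        return 1.0
    if SYNSETS.get(a, frozenset()) & SYNSETS.get(b, frozenset()):
        return sigma
    return 0.0


def minus_increment(affinity: float) -> float:
    """Amount added to the minus component m when two matching tags are paired."""
    if affinity <= 0:
        raise ValueError("the tags do not match, so no increment applies")
    return 1.0 - affinity


if __name__ == "__main__":
    for pair in [("Films", "film"), ("film", "movie"), ("film", "date")]:
        print(pair, tag_affinity(*pair))
```

Equal tags therefore contribute nothing to m, while a synonym match contributes 0.2 under the default affinity, so an exact match always ranks above a synonym match with the same structure.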
The semantic version of the Match algorithm handles this possibility by relying on an ontology like WordNet for evaluating tag similarity. First of all, tags in the document and in the query need not be exactly the same: they are considered equivalent if they are morphological variants or share a stem (like a singular term and its plural, or two differently spelled versions of a term). Moreover, two tags are considered similar if, according to the WordNet ontology, they belong to the same synset (e.g., film and movie). A certain affinity value $\sigma$, with $0 \leq \sigma \leq 1$, can be assigned in order to take synonymy into account. A typical value is 0.8 (as considered in [4]). In the similarity measure, when two such tags are matched, the value $1 - \sigma$ is added to the component m of the subtree evaluation. In this way we capture the missing tag equality.

Tag similarity has been handled by extending the approach taken for DTDs with different elements with the same tag at the same level. Also in this case, different matches are possible for an element with the tag in the document. All possible matches must be considered and evaluated, and the best one (i.e., the one leading to the highest similarity) must be selected. In this case, however, the DTD could contain elements, subelements of the same element, whose tags are synonyms but whose structures are different. Thus, each subtree under an element tagged with a synonym of that tag in the document must be matched against all of them, resulting in the same subtree in the document being visited several times.

6 Experimental Results

Unfortunately, large XML repositories are still lacking: the generation of sources of XML data for evaluation purposes, as well as the development of evaluation methods, is indeed mentioned as the first research issue to be addressed [5].
To validate the technique used by our search engine we have performed some experiments on "real data" on movies, gathered from the Web, and on "synthetic data" generated in order to introduce some polysemy between XML tags (e.g., tags with film in the sense of photographic film). In the case of real data, we extracted over 1000 XML documents from HTML documents describing movies. These data come from two sources of related documents on the Web (the Filmup and 35mm sites). For queries like the one of Figure 5, the matching of the documents of the collection against the query was performed [10]. The structural and semantic similarity value was calculated for each match between the DTD query and each XML document, and a ranking of all the matches was obtained. Figure 5 shows a case of perfect matching (similarity value equal to 1) and two cases in which the similarity between the document template and the XML document is quite high (values equal to 0.92 and 0.83, respectively).

[Figure 5: Example of three different matches]

Precision and recall measures (see footnote 2) [1] were calculated for every DTD query. For instance, for the query of Figure 5, with a similarity threshold equal to 1 (i.e., we are interested in XML documents which exactly match the query) we obtained precision 1 and recall 0.27. When the similarity threshold is equal to 0.88 (i.e., we are interested in documents which are very similar to our query) we obtained an important improvement of the recall measure while still keeping a good precision: precision 0.8 and recall [...].

Footnote 2:
$$\text{precision} = \frac{|\{\text{relevant docs}\} \cap \{\text{retrieved docs}\}|}{|\{\text{retrieved docs}\}|} \qquad \text{recall} = \frac{|\{\text{relevant docs}\} \cap \{\text{retrieved docs}\}|}{|\{\text{relevant docs}\}|}$$

7 Conclusions and Further Work

In this paper an approach for querying XML sources was developed, relying on structural and semantic similarity. Just as a DTD defines the structural constraints a document must satisfy to be one of its instances, a query defines the structural and content constraints a document must satisfy to be an answer to the query. Based on this idea, our approach considers a query, expressed through an XML query language, and translates it into a document, named the document template. The document template represents the structural and content constraints a document should satisfy in order to be an answer to the query. Our idea was to define an approach to querying XML documents based on the classification algorithm. The documents in a source are classified against the document template, and the documents satisfying its constraints are collected. In order to be able to retrieve a greater number of documents matching a certain query (i.e.,
to increase the recall of the retrieved documents), the notion of structure and tag similarity was taken into account. By considering the similarity between tags, we allow for conceptual queries. We have proposed a measure to evaluate the structural and semantic similarity of a document with respect to a DTD. This measure can be used for the retrieval of those documents which match the query. The semantic indexing was done using WordNet synsets as tags. The corpus-based naive Bayes approach, generally used for sense-tagging unstructured text, was employed to disambiguate the senses of the tags of the structured labelled trees.

The first experimental results we obtained are promising, even if more experiments on real and synthetic data need to be performed, especially with both queries and documents indexed. Unfortunately, the lack of large XML repositories is still a problem, as is the lack of sense-tagged corpora for XML documents. In order to overcome the latter problem, for the WSD task we are considering the use of a fully automatic knowledge-based method which relies only on the lexical relations of the WordNet ontology and does not need any training data.

Acknowledgements

We wish to thank Stefania Lombardo for gathering movie HTML pages from the Filmup and 35mm Web sites and mapping them onto XML documents. The work of Paolo Rosso was partially supported by the research grant of the Vicerrectorado de Investigación, Desarrollo e Innovación (UPV) and by the Spanish research project (CICYT) TIC C02.

References

[1] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley.
[2] E. Bertino, G. Guerrini, I. Merlo and M. Mesiti. An Approach to Classify Semistructured Objects. In R. Guerraoui (Ed.), Proc. of the Thirteenth European Conference on Object-Oriented Programming, number 1628 in Lecture Notes in Computer Science.
[3] D. Carmel, Y. Maarek and A. Soffer. XML and Information Retrieval: a SIGIR 2000 Workshop. SIGMOD Record, 30(1):62-65.
[4] S. Castano, V. De Antonellis, M. G. Fugini and B. Pernici. Conceptual Schema Analysis: Techniques and Applications. ACM Transactions on Database Systems, 23(3), September.
[5] S. Ceri, P. Fraternali and S. Paraboschi. XML: Current Developments and Future Challenges for the Database Community. In C. Zaniolo, P. Lockemann, M. Scholl and T. Grust (Eds.), Proc. of the Seventh Int'l Conf. on Extending Database Technology, Lecture Notes in Computer Science. Springer.
[6] T. Chinenyanga and N. Kushmerick. An Expressive and Efficient Language for XML Information Retrieval. Journal of the American Society for Information Science and Technology, Special Issue on XML and Information Retrieval.
[7] M. N. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri and K. Shim. XTRACT: A System for Extracting Document Type Descriptors from XML Documents. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data.
[8] E. Bertino, G. Guerrini and M. Mesiti. Measuring the Structural Similarity among XML Documents and DTDs. Internal Report, Dept. of Informatica e Scienze dell'Informazione, Università di Genova, Italy, 2002 (also submitted to a journal).
[9] N. Ide and J. Veronis. Introduction to the Special Issue on Word Sense Disambiguation. Computational Linguistics, 24.
[10] M. Merlo. Un Approccio all'Interrogazione di Documenti XML basato su una Misura di Similarità tra Strutture. Thesis, Dept. of Informatica e Scienze dell'Informazione, Università di Genova, Italy, April.
[11] D. Jurafsky and J. Martin. Speech and Language Processing. Prentice Hall.
[12] S. Landes, C. Leacock and R. I. Tengi. Building Semantic Concordances. In C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA, USA.
[13] G. A. Miller. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39-41, November.
[14] W3C. Extensible Markup Language (XML) 1.0.
[15] S. Young and G. Bloothooft. Corpus-based Methods in Language and Speech Processing. ELSNET book edition, 1997.


More information

Semantic Indexing of Technical Documentation

Semantic Indexing of Technical Documentation Semantic Indexing of Technical Documentation Samaneh CHAGHERI 1, Catherine ROUSSEY 2, Sylvie CALABRETTO 1, Cyril DUMOULIN 3 1. Université de LYON, CNRS, LIRIS UMR 5205-INSA de Lyon 7, avenue Jean Capelle

More information

A hybrid method to categorize HTML documents

A hybrid method to categorize HTML documents Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper

More information

DTD Inference from XML Documents: The XTRACT Approach

DTD Inference from XML Documents: The XTRACT Approach DTD Inference from XML Documents: The XTRACT Approach Minos Garofalakis Bell Laboratories minos@bell-labs.com Aristides Gionis University of Helsinki gionis@cs.helsinki.fi S. Seshadri Strand Genomics seshadri@strandgenomics.com

More information

An Approach to Classify Semi-Structured Objects

An Approach to Classify Semi-Structured Objects An Approach to Classify Semi-Structured Objects Elisa Bertino 1, Giovanna Guerrini 2, Isabella Merlo 2, and Marco Mesiti 3 1 Dipartimento di Scienze dell Informazione Università degli Studi di Milano Via

More information

HIGH-SPEED ACCESS CONTROL FOR XML DOCUMENTS A Bitmap-based Approach

HIGH-SPEED ACCESS CONTROL FOR XML DOCUMENTS A Bitmap-based Approach HIGH-SPEED ACCESS CONTROL FOR XML DOCUMENTS A Bitmap-based Approach Jong P. Yoon Center for Advanced Computer Studies University of Louisiana Lafayette LA 70504-4330 Abstract: Key words: One of the important

More information
