A Voting Method for XML Retrieval


Gilles Hubert
1 IRIT/SIG-EVI, 118 route de Narbonne, Toulouse cedex 4
2 ERT34, Institut Universitaire de Formation des Maîtres, 56 av. de l'URSS, Toulouse
hubert@irit.fr

Abstract. This paper describes the retrieval approach proposed by the SIG/EVI group of the IRIT research centre for the INEX 2004 evaluation. The approach uses a voting method coupled with additional processes to answer content-only and content-and-structure queries. It is based on our previous work on automatic text categorization.

1 Introduction

The growing use of XML (eXtensible Markup Language) [3] has created a need for systems that search collections of XML documents. Consequently, a growing number of systems intend to provide means to retrieve relevant components among XML documents. XML retrieval systems need to take into account both content and structural aspects. Given the variety of proposed XML retrieval systems, it is interesting to evaluate their effectiveness. For that purpose, the INitiative for the Evaluation of XML retrieval (INEX) provides a testbed and scoring methods allowing participants to evaluate and compare their results.

The approaches underlying the systems participating in INEX can be classified into two categories [5]: model-oriented approaches and system-oriented approaches. Model-oriented approaches notably include approaches based on language models [11], [8], [1] or other probabilistic models [14], which obtained good results. System-oriented approaches extend textual document retrieval systems by adding XML-specific processing. Various systems in this category [10], [6], [13], [16] also obtained good results.

In this paper, we present an IR approach initially applied to the automatic categorization of structured documents according to concept hierarchies, and the evolution it underwent for XML retrieval, notably within the context of INEX.

Section 2 is a short presentation of the 2004 edition of the INEX initiative. Section 3 presents the context in which the method was initiated and its first application within INEX in 2003. The evolutions made to this approach for INEX 2004 are described in section 4. Section 5 presents the submitted runs and the obtained results. In section 6, we conclude by analysing the experiment and considering future work.

2 The INEX initiative

2.1 Collection

The INEX documents correspond to approximately 12,000 articles from the IEEE Computer Society's publications from 1995 to 2002, marked up in XML. All the documents conform to the same DTD. The collection gathers over eight million XML elements of varying length and granularity (e.g. title, paragraph or article).

2.2 Queries

INEX introduces two types of queries:
- CO (Content Only) queries describe the expected content of the XML elements to retrieve.
- CAS (Content and Structure) queries combine content and explicit references to the XML structure using a variant of XPath [4]. CAS topics contain indications about the structure of the expected XML elements and about the location of the expected content.

Both CO and CAS topics are made up of four parts: topic title, topic description, narrative and keywords. Within the ad hoc retrieval task, two types of tasks are defined: (1) the CO task, using CO queries, and (2) the VCAS task, using CAS queries, for which the structural constraints are considered as vague conditions.

3 A voting method in information retrieval

The approach we propose is derived from a process we first defined for textual document categorisation [7], [2]. Document categorisation aims to link documents with pre-defined categories. Our approach focuses on categories organised as a taxonomy. The original aspect of our approach is that it relies on a voting principle instead of a classical similarity computation. The association of a text with categories is based on the Vector Voting method [12]. The voting process evaluates the importance of the association between a given text and a given category. This method is similar to the HVV (Hyperlink Vector Voting) method used within the Web context to compute the relevance of a Web page with regard to the web sites referring to it [9]. In our context, the initial strategy considers that the more the category terms appear in the text, the stronger the link between the text and this category. Thus, this method relies on the terms describing each category and on their automatic extraction from the document to be categorised. The result is a list of categories annotating each document.

For INEX 2003, this categorisation process was applied. Every XML component was processed as a complete document. Every topic was considered as a category of a flat taxonomy. The result was a list of topics corresponding to each XML component. It was then reversed and reordered to fit the INEX result format. The results obtained for the submitted runs [15] led us to adapt the process to retrieval. The axes of this evolution were as follows:
- invert the voting process to estimate the relevance of each XML component according to each topic,
- modify the voting function to take into account the great variations of element sizes and to handle topics rather than categories,
- integrate the aggregation aspect of an XML element (i.e. elements composed of relevant elements),
- integrate structural constraint processing for CAS topics.

4 Evolution of the voting method within INEX

4.1 INEX collection pre-processing

From the INEX collection point of view, the documents are considered as sets of text chunks identified by xpaths. For each XML component, concepts are extracted automatically and saved with the xpath identifying the XML component in which they appear and their number of occurrences in the component. Concept extraction notably involves stop word removal. Optionally, further processes can be applied to concepts, such as stemming using Porter's algorithm. For the INEX 2004 experiments, all XML tags except text formatting tags (bold, italic, underline) were taken into account.

From the topic point of view, although our method can use all the parts constituting CO and CAS topics, we used only the title part for the INEX 2004 experiments, as requested. For both topic types, stop words are removed and terms can optionally be stemmed using Porter's algorithm.
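The pre-processing step described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the stop list, the formatting tag names (`b`, `it`, `u`), the positional xpath syntax and the choice to count all text nested inside a component are assumptions made for the example, and stemming is left out.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real run would use a full one.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}
# Hypothetical tag names standing in for bold/italic/underline formatting tags.
FORMATTING_TAGS = {"b", "it", "u"}


def extract_concepts(text):
    """Lower-case, split on non-alphanumerics, drop stop words, count occurrences."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def index_element(elem, xpath, index):
    """Store the concept counts of `elem` under `xpath`, then recurse into children."""
    # Assumption: a component's text includes the text of its descendants.
    index[xpath] = extract_concepts("".join(elem.itertext()))
    position = Counter()
    for child in elem:
        if child.tag in FORMATTING_TAGS:
            continue  # formatting tags do not define components of their own
        position[child.tag] += 1
        index_element(child, f"{xpath}/{child.tag}[{position[child.tag]}]", index)


def build_index(xml_text):
    """Map each component xpath to a Counter of concept occurrences."""
    root = ET.fromstring(xml_text)
    index = {}
    index_element(root, f"/{root.tag}[1]", index)
    return index


doc = "<article><bdy><p>XML retrieval of XML <b>documents</b></p></bdy></article>"
index = build_index(doc)
print(index["/article[1]/bdy[1]/p[1]"])
```

Keeping one counter per xpath mirrors the description of concepts being saved with the identifying xpath and their number of occurrences; any other storage layout would serve equally well.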

4.2 Voting function

The voting function must take into account the importance in the XML element of each term describing the topic and the importance of each term in the topic representation. We have studied different voting functions and the one providing the best results is described as follows:

$$\mathrm{Vote}(E, T) = \sum_{t \in T} F(t, E) \cdot \frac{F(t, T)}{S(T)}$$

where
- $T$ is the topic and $E$ is an XML element,
- $F(t, E)$ measures the importance of the term $t$ in the XML element $E$; it corresponds to the number of occurrences of the term $t$ in the element $E$,
- $F(t, T) / S(T)$ measures the importance of the term $t$ in the topic representation $T$; $F(t, T)$ corresponds to the number of occurrences of the term $t$ in the topic $T$ and $S(T)$ corresponds to the size (number of terms) of $T$.

The voting function combines two factors: the presence of a term in the element and the importance of this term in the topic.

4.3 Scoring function

The voting function is coupled with a third factor representing the importance of the topic presence within the XML element. The final function (scoring function) that computes the score of an XML element regarding a given topic is the following:

$$\mathrm{Score}(E, T) = \mathrm{Vote}(E, T) \cdot f\!\left(\frac{NT(T, E)}{S(T)}\right)$$

where $NT(T, E) / S(T)$ measures the presence rate of the terms representing the topic in the text (importance of the topic). $S(T)$ corresponds to the number of terms in the topic representation $T$ and $NT(T, E)$ corresponds to the number of terms of the topic $T$ that appear in the XML element $E$.
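As an illustration, the voting and scoring functions can be sketched as follows. This is a sketch under stated assumptions rather than the system's code: an element is represented by a plain dictionary of term counts, a topic by a list of terms, the topic size and the presence count are both taken over that list, and `f` defaults to the exponential, which is only one of the functions the paper reports trying.

```python
import math
from collections import Counter


def vote(element_counts, topic_terms):
    """Vote(E, T): sum over topic terms of F(t, E) * F(t, T) / S(T)."""
    topic_counts = Counter(topic_terms)  # F(t, T)
    topic_size = len(topic_terms)        # S(T), assumed to count every topic term
    return sum(element_counts.get(t, 0) * n / topic_size
               for t, n in topic_counts.items())


def presence_rate(element_counts, topic_terms):
    """NT(T, E) / S(T): share of the topic terms that occur in the element."""
    present = sum(1 for t in topic_terms if element_counts.get(t, 0) > 0)
    return present / len(topic_terms)


def score(element_counts, topic_terms, f=math.exp):
    """Score(E, T) = Vote(E, T) * f(NT(T, E) / S(T))."""
    rate = presence_rate(element_counts, topic_terms)
    return vote(element_counts, topic_terms) * f(rate)


element = {"xml": 3, "retrieval": 1}      # F(t, E) for a hypothetical element
topic = ["xml", "retrieval", "voting"]    # hypothetical topic title terms
print(vote(element, topic), presence_rate(element, topic), score(element, topic))
```

In this example two of the three topic terms occur in the element, so the vote is (3 + 1) / 3 and the presence rate is 2/3; an element containing none of the topic terms receives a score of zero whatever `f` is.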

Applying a function $f$ to the third factor (i.e. the presence rate of the terms representing the topic in the text) aims at varying the influence of this factor on the scoring function. We tried different functions $f$; for example, the initial function was the exponential:

$$f\!\left(\frac{NT(T, E)}{S(T)}\right) = e^{NT(T, E) / S(T)}$$

4.4 Additional processes for both CO and CAS topics

The scoring function is completed with the notion of coverage. The aim of the coverage is to ensure that only documents in which the topic is sufficiently represented will be selected for this topic. The coverage is a threshold corresponding to the percentage of terms from a topic that appear in a text. For example, a coverage of 50% implies that at least half of the terms describing a topic have to appear in the text of a document to select it.

$$\mathrm{Score}(E, T) = \begin{cases} \mathrm{Vote}(E, T) \cdot f\!\left(\dfrac{NT(T, E)}{S(T)}\right) & \text{if } \dfrac{NT(T, E)}{S(T)} \ge CT \\ 0 & \text{otherwise} \end{cases}$$

where $CT$ is a real constant ($CT \ge 0.0$) corresponding to the coverage threshold.

The hierarchical structure of XML has to be taken into account. The hypothesis on which our system is based is that an element containing a component selected as relevant is also relevant. Our system applies this hypothesis by propagating the score of an element to the elements it composes. The score propagated to the composed elements is decreased by applying a reducing factor. For every ancestor $E_a$ of $E$ such that $d(E_a, E) \cdot \alpha < 1$:

$$\mathrm{Score}(E_a, T) = \mathrm{Score}(E_a, T) + \left(1 - d(E_a, E) \cdot \alpha\right) \cdot \mathrm{Score}(E, T)$$

where
- $\alpha$ is a constant coefficient and $E$ is an XML element,
- $d(E_a, E)$ is the distance between $E_a$ and $E$ in the xpath associated to $E$ (e.g. in the xpath /article/bdy/s/ss1/p the distance between p and bdy is equal to 3, i.e. d(bdy, p) = 3).

This process tends to consider a composed element less relevant than the element it is composed of. However, an element composed of several relevant elements can obtain a score greater than that of one of its components. The hypothesis chosen for INEX is quite different, notably because of its two relevance dimensions: exhaustivity and specificity.
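The coverage threshold and the upward score propagation can be sketched together. The sketch makes several assumptions that the text does not settle: scores are held in a dictionary keyed by xpath, each element propagates its own score (before any propagated contribution) to all its ancestors, and the distance is the number of steps between the two elements in the xpath.

```python
import math


def covered_score(vote_value, rate, threshold, f=math.exp):
    """Score is Vote * f(rate) when the presence rate reaches CT, else 0.0."""
    return vote_value * f(rate) if rate >= threshold else 0.0


def propagate(scores, alpha):
    """Add (1 - d * alpha) * Score(E) to every ancestor at distance d, while d * alpha < 1."""
    result = dict(scores)
    for xpath, value in scores.items():
        steps = xpath.strip("/").split("/")
        for d in range(1, len(steps)):
            factor = 1 - d * alpha
            if factor <= 0:
                break  # the condition d * alpha < 1 no longer holds
            ancestor = "/" + "/".join(steps[:-d])
            result[ancestor] = result.get(ancestor, 0.0) + factor * value
    return result


single = propagate({"/article/bdy/s/ss1/p": 10.0}, alpha=0.1)
siblings = propagate({"/a/b/p1": 10.0, "/a/b/p2": 10.0}, alpha=0.1)
print(single["/article/bdy"], siblings["/a/b"])
```

With a coefficient of 0.1, the element bdy at distance 3 from p receives 70% of the score of p. The second call shows the effect noted in the text: a parent with two relevant children accumulates a score higher than either child.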
Considering exhaustivity, a composed element is considered at least as relevant as the most relevant of its components. Considering specificity, the relevance of an element composed of several relevant components is less than or equal to the relevance of the most relevant component. It would be interesting to evaluate the impact of this difference in relevance propagation on the retrieval results of our system.

In addition, in INEX, the terms constituting a topic title can carry either the prefix + or the prefix -. The sign + is used to emphasize a concept and the sign - denotes an unwanted concept. The + and - signs do not have strict semantics but just indicate preferences expressed by the topic's author. An element containing a term prefixed by - in the topic title can still be judged relevant to the information need. In the same way, an element can be judged relevant to the information need even if it does not contain a term prefixed by + in the topic title. To take into account the possibility of having prefixed terms, a coefficient is associated with each term. A coefficient is fixed for each case: term not prefixed, term with the prefix + and term with the prefix -.

$$\mathrm{Vote}(E, T) = \sum_{t \in T} F(t, E) \cdot sc(t) \cdot \frac{F(t, T)}{S(T)}$$

where
- $sc(t) = a$ if $t$ has the prefix - in the topic,
- $sc(t) = b$ if $t$ has no prefix in the topic,
- $sc(t) = c$ if $t$ has the prefix + in the topic,
- $a$, $b$, $c$ are real constants.

4.5 Specific processes for CAS topics

On the one hand, we take into account different types of constraints on content: structural constraints on the xpath of the elements which are expected to contain keywords (e.g. about(.//p,'+authorization +"access control" +security')) and constraints on the year of the article (e.g. //yr <='2000'). These kinds of structural constraints on content cover all the constraints appearing in the INEX CAS topics. The voting method applied to CO topics has been extended to take such constraints into account as follows:

$$\mathrm{Vote}(E, T) = \sum_{t \in T} F(t, E) \cdot (1 + \beta) \cdot \frac{F(t, T)}{S(T)}$$

where $\beta > 0.0$ if $E$ matches a structural constraint defined on $t$, and $\beta = 0.0$ otherwise.

On the other hand, an additional step identifies the structural constraints on target elements indicated in CAS topics. All the structural constraints defined on the target elements of topics are stored and processed in a post-voting step to enrich the results issued from the voting step. For VCAS evaluation, the target constraint specified in the topic does not have to be strictly satisfied. The constraint is rather regarded as a hint about the expected results, without eliminating the elements which do not satisfy it. To follow these principles, the scores of the result elements that match the expected xpaths are increased. A factor is applied to the score of matching elements as follows. If $R$ matches $X$ then

$$\mathrm{Score}(E, T) = \gamma \cdot \mathrm{Vote}(E, T) \cdot f\!\left(\frac{NT(T, E)}{S(T)}\right), \quad \gamma > 1.0$$

where
- $R$ is the location path (xpath) of the element $E$ from the root of the document,
- $X$ is the location path (xpath) defined as the target constraint in the topic.

5 Experiments

5.1 Experiment setup

Our experiments aim at evaluating the effectiveness of the evolution of the voting function and of the coefficient adjustments resulting from training performed on the INEX 2003 assessment testbed. The training phase only concerns the system processes applied to both CO and CAS topics.

Three runs based on the voting method were submitted to INEX 2004. Two runs were performed on CO topics and one run was performed on CAS topics. The runs on CO topics differ in the function $f$ used in the voting method.

The run labelled VTCO2004TC35xp400sC-515 uses the function:

$$\mathrm{Score}(E, T) = \mathrm{Vote}(E, T) \cdot \varphi^{\,NT(T, E) / S(T)}, \quad \varphi = 400$$

The run labelled VTCO2004TC35p4sC-515 uses the function:

$$\mathrm{Score}(E, T) = \mathrm{Vote}(E, T) \cdot \left(\frac{NT(T, E)}{S(T)}\right)^{\lambda}, \quad \lambda = 4$$
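To give a feel for how strongly each run function rewards topic coverage, the two functions $f$ can be compared numerically. This is a hedged sketch: the exponential form with base 400 and the power form with exponent 4 are a reading of the run labels (xp400, p4) and of the formulas given for the two CO runs, and the helper names are hypothetical.

```python
def exponential(phi):
    """f(rate) = phi ** rate, the form assumed for the xp400 run."""
    return lambda rate: phi ** rate


def power(lam):
    """f(rate) = rate ** lam, the form assumed for the p4 run."""
    return lambda rate: rate ** lam


f_xp400 = exponential(400.0)
f_p4 = power(4.0)

# Presence rates: the 35% coverage threshold, half of the terms, all the terms.
for rate in (0.35, 0.5, 1.0):
    print(rate, f_xp400(rate), f_p4(rate))
```

Under these assumptions, an element containing all the topic terms is weighted 20 times more than one containing half of them with the exponential function, and 16 times more with the power function, for the same vote.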

The run on CAS topics labelled VTCAS2004TC35xp200sC-515PP1 uses the function:

$$\mathrm{Score}(E, T) = \mathrm{Vote}(E, T) \cdot \varphi^{\,NT(T, E) / S(T)}, \quad \varphi = 200$$

The coefficient taking into account structural predicates associated with searched concepts was fixed to 1.0 (i.e. the vote of an element regarding a given concept is doubled when the element matches the structural constraint associated with the concept). The coefficient taking into account structural predicates for expected results was fixed to 2.0 (i.e. the score of an element matching the structural predicate is doubled). The values of these two coefficients were fixed arbitrarily.

For all submitted runs, the other parameters of the scoring function were the same. The coverage threshold was fixed to 35% (i.e. more than a third of the terms describing the topic must appear in the text to keep the XML component). The coefficients applied to take into account the signs + and -, used to emphasise a concept or to denote an unwanted one, were fixed to:
- +5.0 for concepts marked with + (the vote of these concepts increases the score of the elements in which they appear),
- -5.0 for concepts marked with - (the vote of these concepts reduces the score of the elements in which they appear),
- 1.0 for unmarked concepts.

The coefficient α used to propagate a component score through the hierarchical structure of the XML document was fixed to 0.1. The values of the parameters are those which gave the best results during a training phase done with INEX 2003 CO topics.

5.2 Results

The following table shows the preliminary results of the three runs based on the voting method:

Table 1. Results of the 3 runs performed using the voting method

Run                            Aggregate score    Rank
VTCO2004TC35xp400sC-515                           /70
VTCO2004TC35p4sC-515                              /70
VTCAS2004TC35xp200sC-515PP1                       /51
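The coefficient values of section 5.1 can be plugged into a small sketch of the vote for CAS topics. Combining the sign coefficient sc(t) and the structural factor (1 + β) in a single product is an assumption, since the paper gives the two formulas separately; the function names, and the convention that the caller passes the set of terms whose structural constraint the element matches, are hypothetical.

```python
from collections import Counter

SIGN_COEFFICIENT = {"+": 5.0, "-": -5.0, "": 1.0}  # sc(t) values from section 5.1
BETA = 1.0    # structural predicate on a searched concept: the vote is doubled
GAMMA = 2.0   # structural predicate on the target element: the score is doubled


def split_prefix(raw_term):
    """'+security' -> ('+', 'security'); 'access' -> ('', 'access')."""
    if raw_term[:1] in ("+", "-"):
        return raw_term[0], raw_term[1:]
    return "", raw_term


def cas_vote(element_counts, raw_topic_terms, constrained_terms=frozenset()):
    """Sum of F(t, E) * sc(t) * (1 + beta) * F(t, T) / S(T) over the topic terms."""
    parsed = [split_prefix(t) for t in raw_topic_terms]
    topic_counts = Counter(term for _, term in parsed)   # F(t, T)
    topic_size = len(parsed)                             # S(T)
    sign_of = {term: sign for sign, term in parsed}
    total = 0.0
    for term, n in topic_counts.items():
        # beta applies only when E matches the constraint defined on this term.
        beta = BETA if term in constrained_terms else 0.0
        total += (element_counts.get(term, 0) * SIGN_COEFFICIENT[sign_of[term]]
                  * (1 + beta) * n / topic_size)
    return total


def boost_target(score, matches_target):
    """Multiply the score by gamma when the element xpath matches the target constraint."""
    return GAMMA * score if matches_target else score


element = {"security": 2, "access": 1, "virus": 1}   # hypothetical term counts
topic = ["+security", "access", "-virus"]            # hypothetical topic title
print(cas_vote(element, topic), cas_vote(element, topic, {"security"}))
```

Without any matched constraint the vote is (2 × 5 + 1 − 5) / 3 = 2; when the element matches the constraint defined on "security", that term's contribution doubles and the vote becomes 16/3.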

The results of the two runs for CO topics are detailed in the following table:

Table 2. Detailed results of the 2 runs for CO topics

                VTCO2004TC35xp400sC-515          VTCO2004TC35p4sC-515
Quantisation    Average precision    Rank        Average precision    Rank
strict                               /70                              /70
generalised                          /70                              /70
so                                   /70                              /70
s3_e                                 /70                              /70
s3_e                                 /70                              /70
e3_s                                 /70                              /70
e3_s                                 /70                              /70

For CO topics, the run which obtained the best results is the run labelled VTCO2004TC35xp400sC-515. The best measures were obtained with the e3s321 quantisation, for which the average precision places the run at the 10th rank. The run labelled VTCO2004TC35p4sC-515 obtained slightly lower values for most of the quantisations.

Only the best results obtained for CO topics are presented in the following graphs, that is to say run VTCO2004TC35xp400sC-515 for the e3s321 quantisation.

Fig. 1. Precision/Recall curve of the CO run labelled VTCO2004TC35xp400sC-515 for e3s321 quantisation

Fig. 2. Rank of the CO run labelled VTCO2004TC35xp400sC-515 for e3s321 quantisation

For CAS topics, the run VTCAS2004TC35xp200sC-515PP1 was ranked at the 5th place. The results of the run are detailed in the following table:

Table 3. Detailed results of the run for CAS topics VTCAS2004TC35xp200sC-515PP1

Quantisation    Average precision    Rank
strict                               /51
generalised                          /51
so                                   /51
s3_e                                 /51
s3_e                                 /51
e3_s                                 /51
e3_s                                 /51

The best measures were obtained for the strict, e3s321 and e3s32 quantisations, for which the run is ranked 5th. The following figures present the results corresponding to the strict quantisation and the e3s321 quantisation.

Fig. 3. Precision/Recall curve of the VCAS run labelled VTCAS2004TC35xp200sC-515PP1 for strict quantisation

Fig. 4. Rank of the VCAS run labelled VTCAS2004TC35xp200sC-515PP1 for strict quantisation

Fig. 5. Precision/Recall curve of the VCAS run labelled VTCAS2004TC35xp200sC-515PP1 for e3s321 quantisation

Fig. 6. Rank of the VCAS run labelled VTCAS2004TC35xp200sC-515PP1 for e3s321 quantisation

6 Discussion and future work

Regarding the experiments that were performed and the obtained results, we can make the following observations:
- The chosen functions and parameters for the scoring method tend to favour exhaustivity rather than specificity. Indeed, the factor measuring the representation of the topic (i.e. NT(T,E)/S(T)) dominates the scoring function, and this factor is related to exhaustivity. It would be interesting to modify the scoring function to increase the number of elements judged as relevant with regard to specificity.
- The measures obtained using INEX 2003 CO topics were globally better. This suggests that our scoring method is more effective on certain queries. It would be interesting to identify the class (or classes) of queries for which the function works better and the class (or classes) for which it is less effective, and to understand why. The function could evolve to extend its effectiveness to other kinds of queries, or different functions could be applied to different query classes.
- The values of the coefficients applied for structural constraint matching were fixed arbitrarily. Additional experiments on INEX 2004 CAS topics will help us to adjust the values of these coefficients.
- The benefit of adding a relevance feedback process to our method remains to be evaluated. On the one hand, feedback from the first ranked elements of the assessments can be performed. This is the process chosen this year in the relevance feedback track. On the other hand, we plan to integrate a feedback process using the first ranked elements of an initial search performed with our system.

Acknowledgments

The research outlined in this paper is part of the project QUEST: Query reformulation for structured document retrieval, PAI Alliance N° 05768UJ. However, this publication only reflects the author's view.

References

1. Abolhassani, M., Fuhr, N.: Applying the Divergence from Randomness Approach for Content-Only Search in XML Documents. 26th European Conference on IR Research (ECIR), Lecture Notes in Computer Science (2004)
2. Augé, J., Englmeier, K., Hubert, G., Mothe, J.: Catégorisation automatique de textes basée sur des hiérarchies de concepts. 19ièmes Journées de Bases de Données Avancées (BDA), Lyon (2003) 69-87
3. Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., Yergeau, F.: Extensible Markup Language (XML) 1.0 (Third Edition). W3C Recommendation (2004)
4. Clark, J., DeRose, S.: XML Path Language (XPath). W3C Recommendation (1999)
5. Fuhr, N., Malik, S., Lalmas, M.: Overview of the INitiative for the Evaluation of XML Retrieval (INEX). Proceedings of the Second INEX Workshop, Dagstuhl, Germany (2004)
6. Geva, S., Leo-Spork, M.: XPath Inverted File for Information Retrieval. INEX 2003 Workshop Proceedings (2003)
7. IRAIA: Getting Orientation in Complex Information Spaces as an Emergent Behaviour of Autonomous Information Agents. European Information Societies Technology, IST
8. Kamps, J., de Rijke, M., Sigurbjörnsson, B.: Length normalization in XML retrieval. Proceedings of the 27th International Conference on Research and Development in Information Retrieval (SIGIR), New York NY, USA (2004)
9. Li, Y.: Toward a qualitative search engine. IEEE Internet Computing, vol. 2, no. 4 (1998)
10. List, J., Mihajlovic, V., de Vries, A. P., Ramirez, G., Hiemstra, D.: The TIJAH XML-IR system at INEX. INEX 2003 Workshop Proceedings (2003)
11. Ogilvie, P., Callan, J.: Using Language Models for Flat Text Queries in XML Retrieval. Proceedings of the Second INEX Workshop, Dagstuhl, Germany (2004)
12. Pauer, B., Holger, P.: Statfinder. Document Package Statfinder, Vers. 1.8 (2000)
13. Pehcevski, J., Thom, J., Vercoustre, A.M.: Enhancing Content-And-Structure Information Retrieval using a Native XML Database. Proceedings of The First Twente Data Management Workshop on XML Databases and Information Retrieval (TDM'04), Enschede, The Netherlands (2004)
14. Piwowarski, B., Vu, H.-T., Gallinari, P.: Bayesian Networks and INEX'03. Proceedings of the Second INEX Workshop, Dagstuhl, Germany (2003)
15. Sauvagnat, K., Hubert, G., Boughanem, M., Mothe, J.: IRIT at INEX. Proceedings of the Second INEX Workshop, Dagstuhl, Germany (2003)
16. Trotman, A., O'Keefe, R. A.: Identifying and Ranking Relevant Document Elements. INEX 2003 Workshop Proceedings (2003)


More information

Focussed Structured Document Retrieval

Focussed Structured Document Retrieval Focussed Structured Document Retrieval Gabrialla Kazai, Mounia Lalmas and Thomas Roelleke Department of Computer Science, Queen Mary University of London, London E 4NS, England {gabs,mounia,thor}@dcs.qmul.ac.uk,

More information

Passage Retrieval and other XML-Retrieval Tasks

Passage Retrieval and other XML-Retrieval Tasks Passage Retrieval and other XML-Retrieval Tasks Andrew Trotman Department of Computer Science University of Otago Dunedin, New Zealand andrew@cs.otago.ac.nz Shlomo Geva Faculty of Information Technology

More information

Evaluation Metrics. Jovan Pehcevski INRIA Rocquencourt, France

Evaluation Metrics. Jovan Pehcevski INRIA Rocquencourt, France Evaluation Metrics Jovan Pehcevski INRIA Rocquencourt, France jovan.pehcevski@inria.fr Benjamin Piwowarski Yahoo! Research Latin America bpiwowar@yahoo-inc.com SYNONYMS Performance metrics; Evaluation

More information

Focused Retrieval Using Topical Language and Structure

Focused Retrieval Using Topical Language and Structure Focused Retrieval Using Topical Language and Structure A.M. Kaptein Archives and Information Studies, University of Amsterdam Turfdraagsterpad 9, 1012 XT Amsterdam, The Netherlands a.m.kaptein@uva.nl Abstract

More information

Plan for today. CS276B Text Retrieval and Mining Winter Vector spaces and XML. Text-centric XML retrieval. Vector spaces and XML

Plan for today. CS276B Text Retrieval and Mining Winter Vector spaces and XML. Text-centric XML retrieval. Vector spaces and XML CS276B Text Retrieval and Mining Winter 2005 Plan for today Vector space approaches to XML retrieval Evaluating text-centric retrieval Lecture 15 Text-centric XML retrieval Documents marked up as XML E.g.,

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 13 Structured Text Retrieval with Mounia Lalmas Introduction Structuring Power Early Text Retrieval Models Evaluation Query Languages Structured Text Retrieval, Modern

More information

Sound and Complete Relevance Assessment for XML Retrieval

Sound and Complete Relevance Assessment for XML Retrieval Sound and Complete Relevance Assessment for XML Retrieval Benjamin Piwowarski Yahoo! Research Latin America Santiago, Chile Andrew Trotman University of Otago Dunedin, New Zealand Mounia Lalmas Queen Mary,

More information

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML Kikori-KS An Effective and Efficient Keyword Search System for Digital Libraries in XML Toshiyuki Shimizu 1, Norimasa Terada 2, and Masatoshi Yoshikawa 1 1 Graduate School of Informatics, Kyoto University

More information

Identifying and Ranking Relevant Document Elements

Identifying and Ranking Relevant Document Elements Identifying and Ranking Relevant Document Elements Andrew Trotman and Richard A. O Keefe Department of Computer Science University of Otago Dunedin, New Zealand andrew@cs.otago.ac.nz, ok@otago.ac.nz ABSTRACT

More information

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS SRIVANI SARIKONDA 1 PG Scholar Department of CSE P.SANDEEP REDDY 2 Associate professor Department of CSE DR.M.V.SIVA PRASAD 3 Principal Abstract:

More information

Articulating Information Needs in

Articulating Information Needs in Articulating Information Needs in XML Query Languages Jaap Kamps, Maarten Marx, Maarten de Rijke and Borkur Sigurbjornsson Gonçalo Antunes Motivation Users have increased access to documents with additional

More information

Specificity Aboutness in XML Retrieval

Specificity Aboutness in XML Retrieval Specificity Aboutness in XML Retrieval Tobias Blanke and Mounia Lalmas Department of Computing Science, University of Glasgow tobias.blanke@dcs.gla.ac.uk mounia@acm.org Abstract. This paper presents a

More information

XPath Inverted File for Information Retrieval

XPath Inverted File for Information Retrieval XPath Inverted File for Information Retrieval Shlomo Geva Centre for Information Technology Innovation Faculty of Information Technology Queensland University of Technology GPO Box 2434 Brisbane Q 4001

More information

University of Amsterdam at INEX 2009: Ad hoc, Book and Entity Ranking Tracks

University of Amsterdam at INEX 2009: Ad hoc, Book and Entity Ranking Tracks University of Amsterdam at INEX 2009: Ad hoc, Book and Entity Ranking Tracks Marijn Koolen 1, Rianne Kaptein 1, and Jaap Kamps 1,2 1 Archives and Information Studies, Faculty of Humanities, University

More information

The Importance of Length Normalization for XML Retrieval

The Importance of Length Normalization for XML Retrieval The Importance of Length Normalization for XML Retrieval Jaap Kamps, (kamps@science.uva.nl) Maarten de Rijke (mdr@science.uva.nl) and Börkur Sigurbjörnsson (borkur@science.uva.nl) Informatics Institute,

More information

Overview of INEX 2005

Overview of INEX 2005 Overview of INEX 2005 Saadia Malik 1, Gabriella Kazai 2, Mounia Lalmas 2, and Norbert Fuhr 1 1 Information Systems, University of Duisburg-Essen, Duisburg, Germany {malik,fuhr}@is.informatik.uni-duisburg.de

More information

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVII, Number 4, 2012 CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL IOAN BADARINZA AND ADRIAN STERCA Abstract. In this paper

More information

Evaluating the effectiveness of content-oriented XML retrieval

Evaluating the effectiveness of content-oriented XML retrieval Evaluating the effectiveness of content-oriented XML retrieval Norbert Gövert University of Dortmund Norbert Fuhr University of Duisburg-Essen Gabriella Kazai Queen Mary University of London Mounia Lalmas

More information

Book Recommendation based on Social Information

Book Recommendation based on Social Information Book Recommendation based on Social Information Chahinez Benkoussas and Patrice Bellot LSIS Aix-Marseille University chahinez.benkoussas@lsis.org patrice.bellot@lsis.org Abstract : In this paper, we present

More information

Evaluating a Conceptual Indexing Method by Utilizing WordNet

Evaluating a Conceptual Indexing Method by Utilizing WordNet Evaluating a Conceptual Indexing Method by Utilizing WordNet Mustapha Baziz, Mohand Boughanem, Nathalie Aussenac-Gilles IRIT/SIG Campus Univ. Toulouse III 118 Route de Narbonne F-31062 Toulouse Cedex 4

More information

CIRQuL - Complex Information Retrieval Query Language

CIRQuL - Complex Information Retrieval Query Language CIRQuL - Complex Information Retrieval Query Language Vojkan Mihajlovic, Djoerd Hiemstra, Peter M.G. Apers University of Twente, CTIT, Enschede, The Netherlands Abstract In this paper we will present a

More information

Effective Tweet Contextualization with Hashtags Performance Prediction and Multi-Document Summarization

Effective Tweet Contextualization with Hashtags Performance Prediction and Multi-Document Summarization Effective Tweet Contextualization with Hashtags Performance Prediction and Multi-Document Summarization Romain Deveaud 1 and Florian Boudin 2 1 LIA - University of Avignon romain.deveaud@univ-avignon.fr

More information

XML Retrieval More Efficient Using Compression Technique

XML Retrieval More Efficient Using Compression Technique XML Retrieval More Efficient Using Compression Technique Tanakorn Wichaiwong and Chuleerat Jaruskulchai Abstract In this paper, we report experimental results of our approach for retrieval large-scale

More information

DCU and 2010: Ad-hoc and Data-Centric tracks

DCU and 2010: Ad-hoc and Data-Centric tracks DCU and ISI@INEX 2010: Ad-hoc and Data-Centric tracks Debasis Ganguly 1, Johannes Leveling 1, Gareth J. F. Jones 1 Sauparna Palchowdhury 2, Sukomal Pal 2, and Mandar Mitra 2 1 CNGL, School of Computing,

More information

Navigating the User Query Space

Navigating the User Query Space Navigating the User Query Space Ronan Cummins 1, Mounia Lalmas 2, Colm O Riordan 3 and Joemon M. Jose 1 1 School of Computing Science, University of Glasgow, UK 2 Yahoo! Research, Barcelona, Spain 3 Dept.

More information

Information Retrieval (Part 1)

Information Retrieval (Part 1) Information Retrieval (Part 1) Fabio Aiolli http://www.math.unipd.it/~aiolli Dipartimento di Matematica Università di Padova Anno Accademico 2008/2009 1 Bibliographic References Copies of slides Selected

More information

Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track

Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track Alejandro Bellogín 1,2, Thaer Samar 1, Arjen P. de Vries 1, and Alan Said 1 1 Centrum Wiskunde

More information

External Query Reformulation for Text-based Image Retrieval

External Query Reformulation for Text-based Image Retrieval External Query Reformulation for Text-based Image Retrieval Jinming Min and Gareth J. F. Jones Centre for Next Generation Localisation School of Computing, Dublin City University Dublin 9, Ireland {jmin,gjones}@computing.dcu.ie

More information

XML Element Retrieval using terms propagation

XML Element Retrieval using terms propagation International Conference on Automation, Control, Engineering and Computer Science (ACECS'4) Proceedings - Copyright IPCO-204, pp.58-63 ISSN 2356-5608 XML Element Retrieval using terms propagation Samia

More information

Seven years of INEX interactive retrieval experiments lessons and challenges

Seven years of INEX interactive retrieval experiments lessons and challenges Seven years of INEX interactive retrieval experiments lessons and challenges Ragnar Nordlie and Nils Pharo Oslo and Akershus University College of Applied Sciences Postboks 4 St. Olavs plass, N-0130 Oslo,

More information

Overview of the INEX 2008 Ad Hoc Track

Overview of the INEX 2008 Ad Hoc Track Overview of the INEX 2008 Ad Hoc Track Jaap Kamps 1, Shlomo Geva 2, Andrew Trotman 3, Alan Woodley 2, and Marijn Koolen 1 1 University of Amsterdam, Amsterdam, The Netherlands {kamps,m.h.a.koolen}@uva.nl

More information

Using Attribute Grammars to Uniformly Represent Structured Documents - Application to Information Retrieval

Using Attribute Grammars to Uniformly Represent Structured Documents - Application to Information Retrieval Using Attribute Grammars to Uniformly Represent Structured Documents - Application to Information Retrieval Alda Lopes Gançarski Pierre et Marie Curie University, Laboratoire d Informatique de Paris 6,

More information

Configurable Indexing and Ranking for XML Information Retrieval

Configurable Indexing and Ranking for XML Information Retrieval Configurable Indexing and Ranking for XML Information Retrieval Shaorong Liu, Qinghua Zou and Wesley W. Chu UCL Computer Science Department, Los ngeles, C, US 90095 {sliu, zou, wwc}@cs.ucla.edu BSTRCT

More information

A Fusion Approach to XML Structured Document Retrieval

A Fusion Approach to XML Structured Document Retrieval A Fusion Approach to XML Structured Document Retrieval Ray R. Larson School of Information Management and Systems University of California, Berkeley Berkeley, CA 94720-4600 ray@sims.berkeley.edu 17 April

More information

Information Retrieval

Information Retrieval Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

More information

Robust Relevance-Based Language Models

Robust Relevance-Based Language Models Robust Relevance-Based Language Models Xiaoyan Li Department of Computer Science, Mount Holyoke College 50 College Street, South Hadley, MA 01075, USA Email: xli@mtholyoke.edu ABSTRACT We propose a new

More information

Mercure at trec6 2 IRIT/SIG. Campus Univ. Toulouse III. F Toulouse. fbougha,

Mercure at trec6 2 IRIT/SIG. Campus Univ. Toulouse III. F Toulouse.   fbougha, Mercure at trec6 M. Boughanem 1 2 C. Soule-Dupuy 2 3 1 MSI Universite de Limoges 123, Av. Albert Thomas F-87060 Limoges 2 IRIT/SIG Campus Univ. Toulouse III 118, Route de Narbonne F-31062 Toulouse 3 CERISS

More information

Overview of the INEX 2008 Ad Hoc Track

Overview of the INEX 2008 Ad Hoc Track Overview of the INEX 2008 Ad Hoc Track Jaap Kamps 1, Shlomo Geva 2, Andrew Trotman 3, Alan Woodley 2, and Marijn Koolen 1 1 University of Amsterdam, Amsterdam, The Netherlands {kamps,m.h.a.koolen}@uva.nl

More information

Research Article An Exponentiation Method for XML Element Retrieval

Research Article An Exponentiation Method for XML Element Retrieval e Scientific World Journal, Article ID 404518, 10 pages http://dx.doi.org/10.1155/2014/404518 Research Article An Exponentiation Method for XML Element Retrieval Tanakorn Wichaiwong Department of Computer

More information

second_language research_teaching sla vivian_cook language_department idl

second_language research_teaching sla vivian_cook language_department idl Using Implicit Relevance Feedback in a Web Search Assistant Maria Fasli and Udo Kruschwitz Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom fmfasli

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information

A BELIEF NETWORK MODEL FOR EXPERT SEARCH

A BELIEF NETWORK MODEL FOR EXPERT SEARCH A BELIEF NETWORK MODEL FOR EXPERT SEARCH Craig Macdonald, Iadh Ounis Department of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK craigm@dcs.gla.ac.uk, ounis@dcs.gla.ac.uk Keywords: Expert

More information

TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback

TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback RMIT @ TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback Ameer Albahem ameer.albahem@rmit.edu.au Lawrence Cavedon lawrence.cavedon@rmit.edu.au Damiano

More information

Federated Search. Jaime Arguello INLS 509: Information Retrieval November 21, Thursday, November 17, 16

Federated Search. Jaime Arguello INLS 509: Information Retrieval November 21, Thursday, November 17, 16 Federated Search Jaime Arguello INLS 509: Information Retrieval jarguell@email.unc.edu November 21, 2016 Up to this point... Classic information retrieval search from a single centralized index all ueries

More information

Aspects of an XML-Based Phraseology Database Application

Aspects of an XML-Based Phraseology Database Application Aspects of an XML-Based Phraseology Database Application Denis Helic 1 and Peter Ďurčo2 1 University of Technology Graz Insitute for Information Systems and Computer Media dhelic@iicm.edu 2 University

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

HYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL

HYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL International Journal of Mechanical Engineering & Computer Sciences, Vol.1, Issue 1, Jan-Jun, 2017, pp 12-17 HYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL BOMA P.

More information

Evaluating the eectiveness of content-oriented XML retrieval methods

Evaluating the eectiveness of content-oriented XML retrieval methods Evaluating the eectiveness of content-oriented XML retrieval methods Norbert Gövert (norbert.goevert@uni-dortmund.de) University of Dortmund, Germany Norbert Fuhr (fuhr@uni-duisburg.de) University of Duisburg-Essen,

More information

Automatic Generation of Query Sessions using Text Segmentation

Automatic Generation of Query Sessions using Text Segmentation Automatic Generation of Query Sessions using Text Segmentation Debasis Ganguly, Johannes Leveling, and Gareth J.F. Jones CNGL, School of Computing, Dublin City University, Dublin-9, Ireland {dganguly,

More information

Classification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014

Classification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014 Classification and retrieval of biomedical literatures: SNUMedinfo at CLEF QA track BioASQ 2014 Sungbin Choi, Jinwook Choi Medical Informatics Laboratory, Seoul National University, Seoul, Republic of

More information

Reducing Redundancy with Anchor Text and Spam Priors

Reducing Redundancy with Anchor Text and Spam Priors Reducing Redundancy with Anchor Text and Spam Priors Marijn Koolen 1 Jaap Kamps 1,2 1 Archives and Information Studies, Faculty of Humanities, University of Amsterdam 2 ISLA, Informatics Institute, University

More information

The overlap problem in content-oriented XML retrieval evaluation

The overlap problem in content-oriented XML retrieval evaluation The overlap problem in content-oriented XML retrieval evaluation Gabriella Kazai Queen Mary University of London London, E1 4NS UK gabs@dcs.qmul.ac.uk Mounia Lalmas Queen Mary University of London London,

More information

Information Retrieval from Structured Documents Represented by Attribute Grammars

Information Retrieval from Structured Documents Represented by Attribute Grammars Abstract Information Retrieval from Structured Documents Represented by Attribute Grammars Alda Lopes Gançarski * alda.lopes@lip6.fr Pedro Rangel Henriques ** prh@di.uminho.pt This paper presents a system

More information

A probabilistic description-oriented approach for categorising Web documents

A probabilistic description-oriented approach for categorising Web documents A probabilistic description-oriented approach for categorising Web documents Norbert Gövert Mounia Lalmas Norbert Fuhr University of Dortmund {goevert,mounia,fuhr}@ls6.cs.uni-dortmund.de Abstract The automatic

More information

DELOS WP7: Evaluation

DELOS WP7: Evaluation DELOS WP7: Evaluation Claus-Peter Klas Univ. of Duisburg-Essen, Germany (WP leader: Norbert Fuhr) WP Objectives Enable communication between evaluation experts and DL researchers/developers Continue existing

More information

Applying the IRStream Retrieval Engine to INEX 2003

Applying the IRStream Retrieval Engine to INEX 2003 Applying the IRStream Retrieval Engine to INEX 2003 Andreas Henrich, Volker Lüdecke University of Bamberg D-96045 Bamberg, Germany {andreas.henrich volker.luedecke}@wiai.unibamberg.de Günter Robbert University

More information

Extracting Output Schemas from XSLT Stylesheets and Their Possible Applications

Extracting Output Schemas from XSLT Stylesheets and Their Possible Applications Extracting Output Schemas from XSLT Stylesheets and Their Possible Applications Ruben Mes ruben.mes@ist.utl.pt José Borbinha jlb@ist.utl.pt Hugo Manguinhas hugo.manguinhas@ist.utl.pt Abstract XML is nowadays

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University

More information

LaHC at CLEF 2015 SBS Lab

LaHC at CLEF 2015 SBS Lab LaHC at CLEF 2015 SBS Lab Nawal Ould-Amer, Mathias Géry To cite this version: Nawal Ould-Amer, Mathias Géry. LaHC at CLEF 2015 SBS Lab. Conference and Labs of the Evaluation Forum, Sep 2015, Toulouse,

More information

The University of Amsterdam at the CLEF 2008 Domain Specific Track

The University of Amsterdam at the CLEF 2008 Domain Specific Track The University of Amsterdam at the CLEF 2008 Domain Specific Track Parsimonious Relevance and Concept Models Edgar Meij emeij@science.uva.nl ISLA, University of Amsterdam Maarten de Rijke mdr@science.uva.nl

More information

A Framework for the Theoretical Evaluation of XML Retrieval

A Framework for the Theoretical Evaluation of XML Retrieval Journal of the American Society for Information Science and Technology A Framework for the Theoretical Evaluation of XML Retrieval Tobias Blanke King s College London Centre for e-research, 26-29 Drury

More information

Information mining and information retrieval : methods and applications

Information mining and information retrieval : methods and applications Information mining and information retrieval : methods and applications J. Mothe, C. Chrisment Institut de Recherche en Informatique de Toulouse Université Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse

More information

XML RETRIEVAL. Introduction to Information Retrieval CS 150 Donald J. Patterson

XML RETRIEVAL. Introduction to Information Retrieval CS 150 Donald J. Patterson Introduction to Information Retrieval CS 150 Donald J. Patterson Content adapted from Manning, Raghavan, and Schütze http://www.informationretrieval.org OVERVIEW Introduction Basic XML Concepts Challenges

More information

Report on the SIGIR 2008 Workshop on Focused Retrieval

Report on the SIGIR 2008 Workshop on Focused Retrieval WORKSHOP REPORT Report on the SIGIR 2008 Workshop on Focused Retrieval Jaap Kamps 1 Shlomo Geva 2 Andrew Trotman 3 1 University of Amsterdam, Amsterdam, The Netherlands, kamps@uva.nl 2 Queensland University

More information