The Interpretation of CAS


Andrew Trotman (1) and Mounia Lalmas (2)

(1) Department of Computer Science, University of Otago, Dunedin, New Zealand, andrew@cs.otago.ac.nz
(2) Department of Computer Science, Queen Mary, University of London, London, UK, mounia@dcs.qmul.ac.uk

Abstract. There has been much debate over how to interpret the structure in queries that contain structural hints. At INEX 2003 and 2004 there were two interpretations: SCAS, in which the user-specified target element was interpreted strictly, and VCAS, in which it was interpreted vaguely. But how many ways are there in which the query could be interpreted? In the investigation at INEX 2005 (discussed herein) four different interpretations were proposed and compared on the same queries. Those interpretations (SSCAS, SVCAS, VSCAS, and VVCAS) are the four interpretations possible by interpreting the target elements, and the support elements, either strictly or vaguely. An analysis of the submitted runs shows that those that share an interpretation of the target element correlate - that is, the previous decision to divide CAS into SCAS and VCAS (as done at INEX 2003 and 2004) was sound. The analysis is supported by the fact that the best performing VSCAS run was submitted to the VVCAS task and the best performing SVCAS run was submitted to the SSCAS task.

1 Introduction

Does including a structural hint in a query make a precision difference and, if so, how should we interpret it? At INEX 2005 the ad hoc track has been investigating this question. Two experiments were conducted: the CO+S experiment and the CAS experiment.

In the CO+S experiment the participants were asked to submit topics with content only (CO) queries containing just search terms, and optionally an additional structured (+S) query specified in the NEXI [10] query language. Given these two different expressions of the same information need it is possible to compare the precision of queries containing structural hints to those that do not, for the same information need. The details of the CO+S experiment are beyond the scope of this paper.

In a separate experiment participants were asked to submit topics containing queries that contain content and structure (CAS) constraints specified in NEXI [10]. These topics were used to determine how the structural hints, necessarily present in a CAS topic, should be interpreted by a search engine. The two extreme views are the database view, that all structural constraints must be upheld, and the information retrieval view, that satisfying the information need is more important than following the structural constraints of the query.

This contribution discusses the mechanics of the CAS experiment, from the topic submission process and the document collection through to the evaluation methods. The different tasks are compared using Pearson's product moment correlation coefficient, showing that there were essentially only two tasks, those that in previous years have gone by the names VCAS and SCAS. Further analysis shows that of the tasks SSCAS is the easiest and VVCAS the hardest.

2 CAS Queries

Laboratory experiments in information retrieval following the Cranfield methodology (described by Voorhees [12]) require a document collection, a series of queries (known as topics), and a series of judgments (decisions as to which documents are relevant to which topics). In element retrieval this same process is followed - except with respect to a document element rather than a whole document.

Content and structure queries differ from content only queries in so far as they contain structural hints. Two types of structural hints are present: those that specify where to look (support elements) and those that specify what to return to the user (target elements). In INEX topic 258

    //article[about(.,intellectual property)]//sec[about(., copyright law)]

the search engine is being asked to identify documents about intellectual property and from those extract sections about copyright law. The target element is //article//sec (extract //article//sec elements). The support elements are //article (with support from //article about intellectual property) and //article//sec (and support from //article//sec about copyright law). Full details of the syntax of CAS queries are given by Trotman and Sigurbjörnsson [10]. The applicability of this language to XML evaluation in the context of INEX is also discussed by Trotman and Sigurbjörnsson [11].

2.1 Query Complexity

The simplest CAS queries contain only a single structural constraint. Topic 270,

    //article//sec[about(., introduction information retrieval)]

asks for //article//sec elements about introduction information retrieval. A more complex (multiple clause) query can be decomposed into a series of single constraint (single clause) queries (or child queries). Topic 258,

    //article[about(.,intellectual property)]//sec[about(., copyright law)]

could be written as a series of single constraint queries, each of which must be satisfied. In this case it is decomposed into topic 259,

    //article[about(.,intellectual property)]

and topic 281,

    //article//sec[about(., copyright law)]

If both hold true of a document then the (parent) query is true of that document - and the target element constraints can be considered. The same decomposition property holds true for all multiple constraint CAS topics (so long as the target element is preserved) - it is inherent in the distributive nature of the query language.
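The decomposition can be pictured with a small sketch (not from the paper; the decompose helper and its regular expressions are illustrative assumptions and cover only simple queries of this shape, not the full NEXI grammar):

    import re

    def decompose(cas_query):
        # Split a NEXI CAS query of the shape //path[about(...)]//path[about(...)]
        # into its single-clause child queries (illustrative sketch only).
        steps = re.findall(r'//[^\[/]+(?:\[[^\]]*\])?', cas_query)
        children = []
        prefix = ''                                   # structural path so far, filters removed
        for step in steps:
            path = re.sub(r'\[[^\]]*\]', '', step)    # e.g. //article
            clause = re.search(r'\[[^\]]*\]', step)   # e.g. [about(., copyright law)]
            if clause:
                children.append(prefix + path + clause.group(0))
            prefix += path
        return children

    print(decompose('//article[about(.,intellectual property)]'
                    '//sec[about(., copyright law)]'))
    # ['//article[about(.,intellectual property)]',
    #  '//article//sec[about(., copyright law)]']

Applied to topic 258 this yields exactly the castitles of its child topics 259 and 281.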

Having separate parent and child topics makes it possible to look at different interpretations of the same topic. As a topic is judged according to the narrative, the judgments are by definition vague. Strict conformance of these judgments to the target element can be generated using a simple filter. This is the approach taken at INEX 2003 and 2004 for the so-called SCAS and VCAS tasks. But what about the sub-clauses of these topics? Should they be interpreted strictly or vaguely? With the judgments for the child topics, vague and strict conformance to these can also be determined. With the combination of child and parent judgments it is possible to look at many different interpretations of the same topic.

2.2 Topic Format

INEX captures not only the query but also the information need of the user. These are stored together in XML. Methods not dissimilar to this have been used at TREC [2] and INEX [1] for many years. As an example, INEX topic 258:

    <?xml version="1.0" encoding="iso "?>
    <!DOCTYPE inex_topic SYSTEM "topic.dtd">
    <inex_topic topic_id="258" query_type="CAS" ct_no="72">
    <InitialTopicStatement>
    I have to give a computer science lesson on intellectual property and
    I'm looking for information or examples on copyright law to illustrate
    it. As I'm looking for something which is specific, I don't think I can
    find a whole article about it. I'm consequently looking for section
    elements.
    </InitialTopicStatement>
    <castitle>
    //article[about(.,intellectual property)]//sec[about(., copyright law)]
    </castitle>
    <description>
    Return sections about copyright law (information or examples) in an
    article about intellectual property.
    </description>
    <narrative>
    I have to give a computer science lesson on intellectual property, and
    I'm looking for information or examples on copyright law to illustrate
    it. More precisely, I'd like to have information about authors' rights
    and how to protect your creation. As I'm looking for something which is
    specific, I don't think I can find a whole article about it. I'm
    consequently looking for section elements. Information or examples can
    concern copyright on software, multimedia or operating systems.
    Copyright on literary work can help but only for examples. Information
    concerning domain names and trademarks is not relevant.
    </narrative>
    </inex_topic>

The topic contains several parts, all discussing the same information need:

<InitialTopicStatement>: a description of why the user has chosen to use a search engine, and what it is that the user hopes to achieve.

<castitle>: the CAS query specified in the NEXI language [10].

<description>: a natural language expression of the information need using the same terms as are found in the <castitle>. This element is used by the natural language track at INEX [13].

<narrative>: a description of the information need and what makes a result relevant. When judgments are made they are made against this description, so it is important that it precisely describes the difference between relevant and irrelevant results. For experiments that additionally take into account the context of a query (such as the interactive track [8]), the purpose for which the information is needed (the work-task) is also given in the narrative.

Both the parent query and the child queries are stored in this way - but an additional element, the <parent> element, is present in child topics. This element stores the castitle of the child's parent. This method of linking children to parents was chosen over using identifiers as it was considered less likely to be prone to human input error.
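As an illustration of how such a topic file might be consumed (a sketch only, not the INEX tooling; the load_topic helper and the file name are assumptions), the parts can be read with a standard XML parser:

    import xml.etree.ElementTree as ET

    def load_topic(path):
        # Read an INEX-style topic file and return its parts as a dictionary.
        # Sketch only: assumes the elements shown in the example above
        # (castitle, description, narrative, and, for child topics, parent).
        root = ET.parse(path).getroot()               # <inex_topic ...>
        topic = {'topic_id': root.get('topic_id')}
        for field in ('castitle', 'description', 'narrative', 'parent'):
            node = root.find(field)
            topic[field] = node.text.strip() if node is not None and node.text else None
        return topic

    # topic = load_topic('topic-259.xml')
    # print(topic['castitle'], '<- child of ->', topic['parent'])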

2.3 Query Interpretation

A contentious point about CAS queries is their interpretation. The strict view is that the structural hints are constraints and the search engine should follow them to ensure returning elements that satisfy the user. The vague view is that the structural hints are only hints and can be down-played so long as a returned element is relevant in the mind of the user (it satisfies the information need).

A single clause query might be interpreted strictly or vaguely - that is, the constraint might be followed or it might be ignored. If, for example, a user asks for an article abstract about information retrieval, then perhaps an article introduction might just as well satisfy the need - or perhaps not. With multiple clause queries there are many possible interpretations. In the CAS experiment at INEX 2005, the strict and vague interpretations are applied to both the target element and the support elements. This gives four interpretations, written XYCAS, where X is the target element and Y is the support element, and either X or Y can be S for strict or V for vague. Those interpretations are:

VVCAS: the target element constraint is vague and the support element constraints are vague. This is the information retrieval view of the topic.

SVCAS: the target element constraint is strict, but the support element constraints are vague.

VSCAS: the target element constraint is vague, but the support element constraints are followed strictly.

SSCAS: both the target element constraint and the support element constraints are followed strictly. This is the database view.

3 Document Collection

The document collection used in the experiments was the INEX IEEE document collection version 1.8. This collection contains 16,819 documents taken from IEEE transactions and magazines published between 1995 and . The total size of the source XML is 705MB. This is the latest version of the INEX collection at publication date.

4 Data Acquisition

This section discusses the acquisition of the queries from the participants and the verification that they are representative of previous years. It also discusses the acquisition of the judgments and the construction of the different judgment sets.

4.1 Query Acquisition

The document collection was distributed to the participating organizations. Each was then asked to submit one CAS topic along with any associated single clause child-topics (should they exist). These topics then went through a selection process in which queries were parsed for syntactic correctness, checked for semantic correctness and consistency, and validated against their child topics. A total of 17 queries passed this selection process.

Table 1. A breakdown of the complexity of INEX 2005 CAS topics shows that they are representative of previous years

    Clauses:    1          2          3          4+
    2003:       7 (23%)    12 (40%)   6 (20%)    5 (17%)
    2004:       4 (12%)    22 (65%)   4 (12%)    4 (12%)
    2005:       3 (18%)    12 (71%)   2 (12%)    0 (0%)

The breakdown of CAS topic complexity (excluding child-topics) for each of INEX 2003, 2004, and 2005 is given in Table 1. From visual inspection it can be seen that the breakdown in 2005 is representative of previous years: most queries contain two clauses, with approximately the same number of three and one clause topics. In 2005 there were no topics with more than 3 clauses.

4.2 Child Topics

Table 2. The 17 topics and the topic numbers of their children

Each topic and child topic was given a unique identifier (stored in the topic_id attribute of the inex_topic tag). Table 2 shows which topics are parent topics and which topics are their children. Topic 258, for example, has topics 259 and 281 as children, whereas topic 260 is a single clause query and has no children.

It may appear at the outset that these child topics can be used as part of the evaluation, giving a total of 47 topics. This, however, is not the case. The guidelines for topic development [7] identify that, for evaluation purposes, queries must be diverse, independent, and representative. Using both the parent and the children topics for computing performance violates the independence requirement - and weights the evaluation in favor of longer topics (which have more children). Using just the child topics, and discarding the parents, violates the requirement that topics are representative. In Table 1, the breakdown of topics from previous years is shown. Most topics have two clauses, whereas child topics (by definition) have only one. The children, without their parents, are not representative.

4.3 Judgment Acquisition

The topics and child-topics were distributed to the participants. Each participating group was invited to submit up to two runs for each CAS task. At least one was required for VVCAS. A run consisted of at most 1,500 ranked results for each parent and child topic. There were no restrictions on which part of the topic was used to generate the query - participants were permitted to use the narrative, or description, or castitle if they so chose.

These results were then pooled in a similar manner to that used at TREC (and shown to be robust there by Zobel [15]). The details of the INEX pooling method are given by Piwowarski and Lalmas [6] and a discussion of the robustness is provided by Woodley and Geva [14]. The pool identifies which documents and elements the search engines considered relevant to the query.
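As a rough illustration of the pooling idea (a sketch assuming simple top-k pooling, not the exact INEX procedure; pool_runs and the run format are invented for the example), a topic's pool is the union of the highest-ranked results of every submitted run:

    from collections import defaultdict

    def pool_runs(runs, depth=100):
        # Build a per-topic judgment pool from the union of the top 'depth'
        # results of every submitted run.
        # runs: list of runs, each mapping a topic id to a ranked list of
        # (document id, element path) pairs.
        pool = defaultdict(set)
        for run in runs:
            for topic_id, ranking in run.items():
                pool[topic_id].update(ranking[:depth])
        return pool

    # run_a = {'258': [('co/2000/r7108', '/article[1]/bdy[1]/sec[2]')]}
    # run_b = {'258': [('an/2001/a1019', '/article[1]/bdy[1]/sec[1]')]}
    # pool = pool_runs([run_a, run_b], depth=1500)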

Using a graphical interface to the document collection (the 2005 version of X-Rai [5, 6]), the original author of the query (where possible) was asked to identify which elements of which documents in the judgment pool were, in fact, relevant to the information need. Assessors first highlighted relevant passages in the text and then assigned relevance values to all elements in this region on a three-point scale: highly exhaustive, partly exhaustive, or too small. This assessment was performed for the parent topics in isolation from the child topics - and not necessarily by the same assessor. As a topic may contain many different expressions of the information need (for example the description and the castitle), all judgments were made with reference to the description contained in the topic narrative.

4.4 CAS Relevance Assessments

Table 3. Topics assessed by more than one assessor; listed for each set against each topic is the pool-id of the assessments (set-a is the official set, set-b the other)

In a separate experiment the consistency of the judgments is being measured across multiple assessors. This is done by asking two or more judges to assess the same topic, without knowledge of each other's decisions. Of the CAS topics, those listed in Table 3 were multiply judged. The consequence of this multiple assessment process is that there is no single set of relevance assessments. In line with INEX 2004, the assessments are divided into two groups: set-a and set-b (see Pehcevski et al. [4] and Trotman [9] for a discussion of the 2004 results of this experiment). The INEX 2005 assignment was made based on the proportion of completion at the date the first relevance assessments were released. Those judgments that, from visual inspection, appeared most complete were assigned to set-a, while the other was assigned to set-b. In this way set-a, the set used to generate the official results, was the most complete and therefore the most reliable.

Internal to X-Rai (the online assessment tool), each assessment of each topic by each judge is given an internal identifier - the pool id. Table 3 also shows which pool ids were assigned to which judgment set.

4.5 CAS Relevance Sets

From set-a, four sets of judgments were generated, one for each of the four CAS interpretations - each derived from the same initial set of judgments.

VVCAS: the assessments as made by the assessors (against the narrative). These assessments are unmodified from those collected by INEX from the assessor.

SVCAS: those VVCAS judgments that strictly satisfy the target element constraint. This set of judgments was computed by taking the VVCAS judgments and removing all judgments that did not satisfy the target element constraint. This was done by a simple matching process: all those elements that were, indeed, the target element were included and those that were not were removed. Topic 260 is an exception. In this case the target element is specified as //bdy//*. To satisfy this constraint all descendants of //bdy (excluding //bdy itself) are considered to strictly comply.

VSCAS: a relevant element is not required to satisfy the target constraint; however, the document must satisfy all other constraints specified in the query. That is, for a multiple clause topic, an element is relevant only if it comes from a document in which the child-topics are strictly adhered to. In all except two topics, given the conjunction of documents relevant to the child topics, this is any relevant element from the VVCAS set that comes from this conjunction. In one exception (topic 247), this conjunction is replaced with a disjunction. In the other exception (topic 250) there are (presently) no judgments as the assessment task has not been completed.

SSCAS: those VSCAS judgments that satisfy the target element constraint. These are computed from the VSCAS judgments in the same way that SVCAS judgments are computed from VVCAS judgments - strict conformance to the target element.

The guidelines for topic development [7] identify groups of tags that are equivalent. For example, for historic paper publishing reasons the sec, ss1, ss2 and ss3 tags are all used to identify sections of documents in the collection. Strict conformance to a given structural constraint occurs with reference to this equivalence list - //article//bdy//ss1 strictly conforms to //article//sec.

These relevance sets are considered to be the full recall base for each interpretation of CAS. Different metrics and quantization functions could further reduce the relevance sets. For example, in the case of strict quantization only those elements that conform to the interpretation of CAS and further conform to the interpretation of strict are considered relevant.
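A minimal sketch of the kind of target-element filter described above (assuming a simplified judgment representation and a small equivalence table; svcas_from_vvcas and the helper names are invented for illustration):

    # Tags treated as equivalent to 'sec' when matching a target element
    # (an illustrative subset of the equivalence groups in [7]).
    EQUIVALENT = {'sec': {'sec', 'ss1', 'ss2', 'ss3'}}

    def matches_target(element_path, target_tag):
        # True if the final tag of an element path matches the target tag,
        # taking the tag-equivalence groups into account.
        last_step = element_path.rstrip('/').split('/')[-1]   # e.g. 'ss1[2]'
        last_tag = last_step.split('[')[0]                    # e.g. 'ss1'
        return last_tag in EQUIVALENT.get(target_tag, {target_tag})

    def svcas_from_vvcas(vvcas_judgments, target_tag):
        # Keep only those VVCAS judgments whose element strictly conforms
        # to the target element constraint.
        return [j for j in vvcas_judgments if matches_target(j['path'], target_tag)]

    # vvcas = [{'doc': 'co/2000/r7108', 'path': '/article[1]/bdy[1]/ss1[2]'}]
    # print(svcas_from_vvcas(vvcas, 'sec'))   # kept: ss1 is equivalent to sec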

5 Measurement

The official metric used to report the performance of a system at INEX 2005 is MAep, the mean average effort-precision computed at 1,500 elements. This measure is described by Kazai and Lalmas [3]. The results (produced using xcgeval) for the INEX 2005 CAS task are available from INEX.

There were 99 runs submitted to the CAS tasks, of which 25 were SSCAS, 23 SVCAS, 23 VSCAS, and 28 VVCAS (submissions version 1 and judgments version 7 are used throughout). Of the 17 topics used for evaluation (the parent topics of Table 2), judgments currently exist for only 10 topics - at the time of writing the assessment task had not been completed for the other 7 topics. Of those 10 topics, only 7 have any elements that strictly conform to their child topic structural constraints. The comparison of systems herein is based only on these topics.

Table 4. Number of relevant elements for each topic using generalised quantization

In Table 4 and Table 5 the number of relevant elements for each topic of each task is shown. The judgments for strict quantization are highly sparse - for the SSCAS task, there are only 4 topics with highly specific and highly exhaustive judgments. It does not seem reasonable to draw any conclusions from only 4 topics, so the remainder of this analysis applies only to the generalised quantization of results.

By taking all runs submitted to any CAS task and correlating the performance on one task with the performance on another (say, VVCAS with SSCAS), it is possible to see whether a search engine designed for one interpretation also performs well on the other interpretations (and therefore whether there is any material difference between the tasks). That is, if a search engine is designed to answer in one way but the user expects results in another, does it matter? Taking all the CAS runs (including the unofficial runs), the IBM Haifa Research Lab run VVCAS no phrase no tags submitted to the VVCAS task performs best using the VVCAS judgments, but if the user need included a strict interpretation of the topic (it was evaluated using the SSCAS judgments) then it drops to position 50.

Table 5. Number of relevant elements for each topic using strict quantization

By comparing the performance of runs submitted to each task it is possible to determine whether one task is inherently easier, or harder, than the others. With a harder task there is more room for improvement - further investigation into that task might result in improvements all round.

5.1 Do the Judgment Sets Correlate?

Table 6. Pearson's product moment correlation coefficient between each pair of CAS tasks

Table 6 shows the Pearson's product moment correlation coefficient computed for all runs when scored on each task. Scores close to 1 show a positive correlation, those close to -1 a negative correlation, and those near 0 show no correlation. It is clear from the table that VVCAS and VSCAS are strongly correlated: a search strategy that performs well at one task performs well at the other. SSCAS and SVCAS, both with a strict interpretation of the target element, are less strongly correlated. There is little correlation between a strict interpretation of the target element and a vague interpretation of the target element (SVCAS and VSCAS, for example).
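The correlation itself is straightforward to compute once every run has a score under each judgment set; a small sketch (assuming per-run MAep scores stored in dictionaries keyed by run name - the variable names are illustrative, not from the paper):

    from math import sqrt

    def pearson(xs, ys):
        # Pearson's product moment correlation coefficient of two equal-length lists.
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
        sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
        return cov / (sd_x * sd_y)

    # maep_vvcas and maep_sscas map run name -> MAep under that judgment set.
    # names = sorted(set(maep_vvcas) & set(maep_sscas))
    # r = pearson([maep_vvcas[n] for n in names], [maep_sscas[n] for n in names])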

Figure 1 shows this correlation for the vague target element tasks. There is a cluster of best-scoring runs at the top-right of the graph. They are runs that have performed well at both VVCAS and VSCAS. These four runs are those from the IBM Haifa Research Lab. Although different runs perform best on the VVCAS and VSCAS tasks, both best runs were submitted to the VVCAS task - providing further evidence of the correlation of the two tasks.

Figure 2 shows the same for the strict target element tasks. The cluster is not seen. The best performing run measured on the SVCAS task was submitted to the SSCAS task (again IBM Haifa Research Lab). These same runs were only bettered by the four from the University of Tampere when measured on the SSCAS task. Although Tampere produced runs that performed well at the SSCAS task and not at the SVCAS task, IBM Haifa Research Lab produced runs that performed well at both tasks - again further evidence of the correlation of the two tasks.

Figure 3 shows the performance at SSCAS against VVCAS. It is clear from this figure that the runs that perform well at one task do not perform well at the other. It appears, from visual inspection, that they are average performers at each other's tasks.

Fig. 1. Plot of performance (MAep) of all submitted runs using VVCAS and VSCAS shows a strong correlation of one to the other

Fig. 2. Plot of performance (MAep) of all submitted runs using SVCAS and SSCAS shows a strong correlation of one to the other

Fig. 3. Plot of performance (MAep) of all submitted runs using VVCAS and SSCAS shows little correlation of one to the other

Table 7. Mean performance of the top 21 runs from each task (mean, standard deviation, best, and worst MAep for SSCAS, SVCAS, VSCAS, and VVCAS)

5.2 Task Complexity

For each task the performance (MAep) of each of the top performing 21 runs submitted to that task was computed. This number was chosen because different numbers of runs were submitted to each task, and for all tasks there were at least 21 runs with a non-zero score. Table 7 presents the performance of the best run, the mean computed over all runs submitted to the task, and the worst run submitted to each task.

It can be seen that the best performing run was submitted to the SSCAS task, and for that task the average run performs better than the average run from the other tasks. From this we deduce that the SSCAS task is easier than the other tasks. This task may be easiest because the required structural constraints are specified explicitly in the query and the search engine can use them as a filter to remove known non-relevant elements from the result list.

Normally it is invalid to measure the performance of two different search engines by measuring the performance of one on one collection and the second on a second collection (or set of topics, or judgments). In this experiment the document collection and topics are fixed, the judgments are derived from a single common source, and mean performance across several search engines is compared. We believe this comparison is sound.

6 Conclusions

The Pearson's correlation shows that there are only two materially different interpretations of the query: those with a strict interpretation of the target element and those with a vague interpretation of the target element (the database and the information retrieval views). It is possible to ignore the interpretation of the child elements and concentrate only on the target elements. In previous years, INEX has made a distinction between strict and vague conformance to the target element, but has disregarded conformance to child constraints (the so-called SCAS and VCAS tasks). This finding suggests the experiments of previous years did, indeed, make the correct distinction. Checking child constraints does not appear worthwhile for content-oriented XML retrieval.

The vague task has proven more difficult than the strict task. Strict conformance to the target element can be computed as a filter of a vague run - from those vague elements, remove all that do not conform to the target element constraint.

The vague interpretation of CAS is a better place to concentrate research effort. If the CAS task continues in future years, a single set of topics, without the child topics, is all that is necessary for evaluation, and participants should concentrate on the vague interpretation of topics.

References

1. Fuhr, N., Gövert, N., Kazai, G., and Lalmas, M. (2002). INEX: Initiative for the evaluation of XML retrieval. In Proceedings of the ACM SIGIR 2002 Workshop on XML and Information Retrieval.
2. Harman, D. (1993). Overview of the first TREC conference. In Proceedings of the 16th ACM SIGIR Conference on Information Retrieval.
3. Kazai, G., and Lalmas, M. (2006). INEX 2005 evaluation metrics. In Advances in XML Information Retrieval and Evaluation: Fourth Workshop of the INitiative for the Evaluation of XML Retrieval (INEX 2005), Lecture Notes in Computer Science, Vol. 3977, Springer-Verlag.
4. Pehcevski, J., Thom, J. A., and Vercoustre, A.-M. (2005). Users and assessors in the context of INEX: Are relevance dimensions relevant? In Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology, Second Edition.
5. Piwowarski, B., and Lalmas, M. (2004). Interface pour l'évaluation de systèmes de recherche sur des documents XML. In Proceedings of the Première COnférence en Recherche d'Information et Applications (CORIA 04).
6. Piwowarski, B., and Lalmas, M. (2004). Providing consistent and exhaustive relevance assessments for XML retrieval evaluation. In Proceedings of the 13th ACM Conference on Information and Knowledge Management.
7. Sigurbjörnsson, B., Trotman, A., Geva, S., Lalmas, M., Larsen, B., and Malik, S. (2005). INEX 2005 guidelines for topic development. In Proceedings of INEX 2005.
8. Tombros, A., Larsen, B., and Malik, S. (2004). The interactive track at INEX 2004. In Proceedings of INEX 2004.
9. Trotman, A. (2005). Wanted: Element retrieval users. In Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology, Second Edition.
10. Trotman, A., and Sigurbjörnsson, B. (2004). Narrowed Extended XPath I (NEXI). In Proceedings of INEX 2004.
11. Trotman, A., and Sigurbjörnsson, B. (2004). NEXI, now and next. In Proceedings of INEX 2004.
12. Voorhees, E. M. (2001). The philosophy of information retrieval evaluation. In Proceedings of the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems.
13. Woodley, A., and Geva, S. (2004). NLPX at INEX 2004. In Proceedings of INEX 2004.
14. Woodley, A., and Geva, S. (2005). Fine tuning INEX. In Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology, Second Edition.
15. Zobel, J. (1998). How reliable are the results of large-scale information retrieval experiments? In Proceedings of the 21st ACM SIGIR Conference on Information Retrieval.
