An Exploration of Query Term Deletion

Hao Wu and Hui Fang
University of Delaware, Newark DE 19716, USA
haowu@ece.udel.edu, hfang@ece.udel.edu

Abstract. Many search users fail to formulate queries that represent their information needs well. As a result, they often need to reformulate their queries in order to find satisfying results. Query term deletion is one of the commonly used strategies for query reformulation. In this paper, we study the problem in the context of the TREC Session track. We first show that the reformulated queries used in the Session track lead to less satisfying search results than the original queries, which suggests that the current simulation strategy used in the Session track for term deletion might not be a good method. Moreover, we examine the potential of query term deletion, propose two simple strategies that automatically select terms for deletion, and discuss why the simulation strategy for term deletion fails to produce better queries.

1 Introduction

It is often difficult for search users to describe their information needs using keyword queries. Given an information need, the first query formulated by the user may not retrieve many relevant documents, and the user often needs to reformulate the query in order to find more satisfying search results. Existing studies on query log analysis show that around half of all queries are reformulated by users [2, 1].

Query term deletion is one of the commonly used query reformulation strategies. Intuitively, term deletion can benefit over-specified queries, so that the returned search results have greater coverage of the relevant information. It can also benefit ill-specified queries, so that non-relevant documents attracted by noisy query terms are not returned. A common belief is that search users are capable of selecting terms for deletion and that the reformulated queries lead to more satisfying search results. Thus, existing work on query term deletion has mainly focused on predicting which terms search users would delete during query reformulation [1].

In this paper, we focus on the problem of query term deletion in the context of the TREC Session track. Specifically, we first analyze the query sessions in the TREC 2010 Session track and compare the retrieval performance of the original queries with that of the queries reformulated through term deletion. We find that query reformulation based on term deletion hurts performance. The failure analysis suggests that the simulation used by the Session track tends to delete more terms than necessary.

We also conduct a set of experiments to study the potential of query term deletion. Specifically, we aim to answer the following question: if the decision on which terms to delete could be made correctly, would it be possible to improve retrieval performance with the reformulated queries? Experiment results suggest that query term deletion, if done appropriately, can significantly improve retrieval performance. Motivated by this potential, we explore two simple strategies that automatically delete query terms. The first strategy deletes less important terms, while the second deletes terms that over-specify the information need. Empirical evaluation shows that both strategies are effective in improving retrieval performance.

2 Analysis of Term Deletion in Query Sessions

In this section, we conduct experiments to study whether search users are capable of selecting appropriate query terms for deletion. We use the collection provided in the TREC 2010 Session track [2]. The data set is the ClueWeb09 Category B collection. There are 150 topics, and each of them has two queries: one is the original query and the other is the reformulated query. Out of these, 26 topics are reformulated through query term deletion, and 22 of those have relevance judgments. Our experiments are based on these 22 topics. The retrieval function is the Dirichlet prior method [3] with µ = 500. Retrieval performance is evaluated with nDCG@20.

Table 1. An example of ideal query term deletion

  Candidate query after term deletion   nDCG@20
  disneyland                            0.026
  special                               0.000
  offers                                0.000
  disneyland special                    0.017
  disneyland offers                     0.166
  special offers                        0.000

Table 2. Comparison among different versions of queries

  Queries        nDCG@20   Average query length
  Original       0.160     4.455
  Reformulated   0.138     2.455
  Ideal          0.252     3.091

We conduct two sets of experiments. First, we compare the retrieval performance of the original and the reformulated queries in order to see whether the simulated search users are good at term deletion. Second, we examine the potential of query term deletion. The goal is to study whether query term deletion, if done appropriately, can lead to better retrieval performance. Specifically, for every query, we generate all possible candidate queries after term deletion, evaluate the retrieval performance of each candidate, and choose the one with the best performance.
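For concreteness, the exhaustive search over deletion candidates can be sketched as follows. This is a minimal illustration rather than the authors' implementation: run_query stands in for any retrieval system that ranks documents with the Dirichlet prior method [3] (query likelihood with the standard smoothing p(w|d) = (c(w,d) + µ·p(w|C)) / (|d| + µ) and µ = 500), and ndcg_at_20 for a standard nDCG@20 computation over the track's relevance judgments.

    from itertools import combinations

    def candidate_queries(terms):
        """Yield every query reachable by term deletion, preserving term
        order. The unmodified query is included so that 'delete nothing'
        can win (cf. the over-deletion example in Table 5)."""
        for k in range(1, len(terms) + 1):
            for kept in combinations(range(len(terms)), k):
                yield [terms[i] for i in kept]

    def ideal_query(terms, run_query, ndcg_at_20):
        """Return the deletion candidate with the highest nDCG@20.
        run_query and ndcg_at_20 are placeholders for the retrieval
        system and the evaluation routine, respectively."""
        best, best_score = None, -1.0
        for candidate in candidate_queries(terms):
            score = ndcg_at_20(run_query(candidate))
            if score > best_score:
                best, best_score = candidate, score
        return best, best_score

A query of n terms yields 2^n - 1 candidates, so the search is exhaustive yet cheap at the query lengths in Table 2 (about 4.5 terms on average).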

For example, consider the query "disneyland special offers", whose nDCG@20 is 0.103. Table 1 lists all possible candidate queries after term deletion and their corresponding performance. The candidate query with the best performance is referred to as the ideal query in this paper.

Table 2 compares the performance of the original, reformulated, and ideal queries. We can see that simulated search users delete two query terms on average, which is more than the ideal queries delete (about 1.4 on average). Moreover, we make two interesting observations. First, the retrieval performance of the reformulated queries is worse than that of the original queries. This suggests that the simulated users may not be capable of identifying the query terms that hinder retrieval performance, and thus that the simulation strategy used in the Session track might not be a good one. Second, if we had an oracle that always makes the correct decisions on which terms to delete, retrieval performance could be improved significantly. Clearly, it is worth studying what effective strategies for query term deletion look like.

3 Methods for Automatic Term Deletion

In this section, we study methods for automatic query term deletion. When formulating a query, users may include terms that are less effective at identifying relevant information. Thus, one possible solution is to delete query terms based on their importance; we refer to this method as Imp. A common measure of term importance is a term's IDF value, so we delete the query terms with the lowest IDF values.

Over-specifying an information need is another common reason for query term deletion. Here, we need to delete the terms that cause over-specification without changing the information need. One possible solution is to delete terms that are more semantically related to the other query terms; we refer to this method as Sim. The rationale is that deleting such terms hurts performance less, because the related terms that remain can still bring in most of the relevant information. Following previous studies [4], we compute term semantic similarity as the expected mutual information (MI) of a term pair over the document collection. Thus, in this second method, we delete the query terms with the highest average mutual information values.

We have now described two strategies for query term deletion: one keeps deleting the query terms with the lowest IDF, and the other keeps deleting the terms with the highest average MI values. The remaining challenge is deciding when to stop. We explore two stopping criteria. The first uses a threshold as a cut-off: we delete terms whose IDF values are lower than the threshold, or terms whose MI values are higher than the threshold. The second uses a percentage of the original query length as a cut-off: we stop deleting once the specified percentage of the query terms has been removed. Since we have two deletion strategies and two stopping criteria, there are four methods to compare; a sketch of all four follows.
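The sketch below shows one way to implement the four combinations, under our assumptions rather than as the authors' exact code: df and num_docs are collection statistics, and avg_mi (not shown) would be the average expected mutual information of a term with the other query terms, computed over the collection as in [4].

    import math

    def idf(term, df, num_docs):
        """Inverse document frequency from collection statistics."""
        return math.log(num_docs / df.get(term, 1))

    def delete_by_threshold(terms, score, threshold, delete_below=True):
        """Imp: delete terms whose IDF falls below the threshold
        (delete_below=True); Sim: delete terms whose average MI rises
        above it (delete_below=False)."""
        if delete_below:
            return [t for t in terms if score(t) >= threshold]
        return [t for t in terms if score(t) <= threshold]

    def delete_by_percentage(terms, score, pct, delete_lowest=True):
        """Delete the worst pct fraction of terms: lowest IDF for Imp,
        highest average MI for Sim."""
        k = int(len(terms) * pct)
        if k == 0:
            return list(terms)
        ranked = sorted(terms, key=score)   # ascending by score
        doomed = set(ranked[:k] if delete_lowest else ranked[-k:])
        return [t for t in terms if t not in doomed]

    # Example: Imp + percentage with the 20% cut-off recommended below.
    # reduced = delete_by_percentage(query_terms,
    #                                lambda t: idf(t, df, num_docs), 0.20)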

Table 3 shows the performance comparison of the four methods.

Table 3. Performance comparison

  Imp + threshold
  Cut-off    IDF < 0 (no deletion)  IDF < 0.5  IDF < 1.0  IDF < 1.5  IDF < 2.0
  nDCG@20    0.160                  0.168      0.167      0.154      0.148

  Imp + percentage
  Cut-off    0% (no deletion)  20% terms  30% terms  40% terms  50% terms
  nDCG@20    0.160             0.177      0.176      0.145      0.134

  Sim + threshold
  Cut-off    (no deletion)  MI > 2.5  MI > 2.0  MI > 1.5  MI > 1.0
  nDCG@20    0.160          0.160     0.163     0.179     0.098

  Sim + percentage
  Cut-off    0% (no deletion)  20% terms  30% terms  40% terms  50% terms
  nDCG@20    0.160             0.163      0.156      0.150      0.106

We see that both term deletion strategies can improve retrieval performance if proper parameters are chosen. When deleting terms based on term importance (i.e., IDF), using a percentage of the query terms as the cut-off is more effective. The reason is that longer queries tend to include more unimportant terms; if we delete terms based on their importance, the number of deleted terms should therefore be proportional to the original query length, i.e., we need to delete more terms from longer queries and fewer from shorter ones. When deleting terms based on semantic similarity (i.e., MI), using a threshold yields better performance, because not all queries are over-specified and it is unreasonable to delete the same percentage of terms from every query. In summary, deleting 20% of the original query terms seems to consistently improve retrieval performance. This is consistent with the results in Table 2. In particular, we suggest deleting the 20% of the original query terms with the lowest IDF values, because it gives good performance and the parameter value is not collection-dependent.

4 Why did simulated users fail?

In this section, we discuss why the simulation used in the Session track fails to choose appropriate terms for deletion. We first look into what kind of terms are deleted by the simulation during reformulation and what kind of terms should have been deleted. Table 4 shows the term statistics in these two situations.

Table 4. Comparison between the reformulated queries and the ideal queries

                 Terms deleted by the simulation   Terms deleted by the oracle
  Total number   44                                30
  Average IDF    2.484                             2.373
  Average MI     1.150                             1.335

We see that the simulated users, like the oracle, tend to delete terms with lower IDF and higher MI values, but they often delete more terms than necessary. We then conduct a per-query analysis to compare the queries formulated by the simulation with the ideal queries, and classify the simulated users' mistakes into three categories: (1) over-deletion: deleting more terms than necessary; (2) under-deletion: deleting fewer terms than necessary; (3) error-deletion: deleting the wrong terms. One way to make these categories precise is sketched below.
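This is our illustrative, set-based formulation of the three categories, not code from the paper; each query is represented as a set of terms.

    def classify_mistake(original, ideal, reformulated):
        """Label a simulated reformulation by comparing its deleted terms
        with the oracle's. All arguments are sets of query terms."""
        oracle_deleted = original - ideal
        user_deleted = original - reformulated
        if user_deleted == oracle_deleted:
            return "correct"
        if oracle_deleted < user_deleted:   # superset of the oracle's deletions
            return "over-deletion"
        if user_deleted < oracle_deleted:   # subset: missed needed deletions
            return "under-deletion"
        return "error-deletion"             # deleted at least one wrong term

For instance, in the over-deletion row of Table 5 the oracle deletes nothing, so any user deletion is a proper superset of the oracle's deletions.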

Table 5 summarizes these three error types.

Table 5. Analysis of errors

  Error type       Num. of queries   Original query                      Ideal query         User's reformulated query
  Over-deletion    9                 diabetes education videos books     diabetes education videos books      diabetes education
  Under-deletion   1                 windows defender reports problems   defender problems   windows defender software problems
  Error-deletion   8                 wall hung toilet kohler             hung toilet kohler  wall hung toilet

It is interesting to see that the number of over-deletion errors is high. Since error-deletion is much easier to commit than over-deletion or under-deletion, this shows that the simulated users have the ability to differentiate the usefulness of query terms, but they delete terms too aggressively.

To summarize our findings so far, the simulation is good at deciding which terms to delete, but it tends to delete more terms than necessary. This is probably the main reason why the reformulated queries perform worse than the original ones. To further verify this hypothesis, we conduct another set of experiments in which we force the system to delete, for every query, the same number of terms as the simulation deleted. The results are shown in Table 6. Clearly, the simulated users are better at selecting which terms to delete than the system based on the two proposed term deletion methods. Thus, we expect that if simulated search users were more conservative in deleting query terms, the reformulated queries would lead to more satisfying search results.

Table 6. Performance comparison when deleting the same number of terms as the simulation did for every query

            Simulation's deletion   Deleting terms with lowest IDF   Deleting terms with highest MI
  nDCG@20   0.138                   0.107                            0.120

5 Conclusions and Future Work

We study the problem of query term deletion in the context of the TREC Session track. We propose two simple strategies for automatic term deletion, and experiment results show that both methods are effective if the parameter values are set appropriately. We also find that the poor query reformulation of the simulated sessions is caused by the fact that the simulation tends to delete more terms than necessary, even though it is good at picking the terms that need to be deleted. In the future, we plan to conduct similar studies on real query sessions and over multiple collections. We also plan to explore other strategies for automatic query term deletion.

References

1. R. Jones and D. C. Fain. Query word deletion prediction. In Proceedings of SIGIR'03, 2003.
2. E. Kanoulas, B. Carterette, P. Clough, and M. Sanderson. Session track overview. In Proceedings of TREC'10, 2010.
3. C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of SIGIR'01, 2001.
4. W. Zheng and H. Fang. Query aspect based term weighting regularization in information retrieval. In Proceedings of ECIR'10, 2010.