Using Implicit Relevance Feedback in a Web Search Assistant

Maria Fasli and Udo Kruschwitz
Department of Computer Science, University of Essex,
Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom
{mfasli, udo}@essex.ac.uk

Abstract. The explosive growth of information on the World Wide Web demands effective intelligent search and filtering methods. Consequently, techniques have been developed that extract conceptual information from documents to build domain models automatically. The model we build is a taxonomy of conceptual terms that is used in a search assistant to help the user navigate to the right set of required documents. We monitor the dialogue steps performed by users to get feedback about the quality of choices proposed by the system and to adjust the model without manual intervention. Thus, we employ implicit relevance feedback to improve the domain model. Unlike in traditional relevance feedback and collaborative filtering tasks, we do not need explicitly expressed user opinions. Moreover, we aim at improving the domain model as a whole rather than trying to build individual user profiles.

1 Introduction

In recent years there has been an explosive growth in the sheer volume of information available on the World Wide Web. This information is free and fairly unstructured. Search engines employing standard information retrieval techniques can help to get to some particular piece of information quickly. However, a common phenomenon is that users find it difficult to express their actual information need as a query. Smaller domains like local Web sites face the same problems. For example, a query frequently found in the log files of our sample domain, the University of Essex Web site, is "languages". Someone submitting this request might have a clear idea about what sort of documents should be retrieved by the search engine, e.g. information about the Modern Languages Unit (which is the best match Google (http://www.google.com) could find in our domain).
But there are far more than 1,000 documents which contain the query term, despite the fact that the domain consists of fewer than 30,000 indexable pages. Other top-ranked documents retrieved by Google contain information about natural, controlled, and Pidgin languages. In addition, there is a large number of documents related to various types of computer languages, such as java.
One way to help the user get to the best matching documents is to apply some automatically acquired representation of the actual data sources (a "domain model"), something that is feasible for limited domains. We build such a domain model by exploiting markup found in the documents. The result is a set of hierarchies of related terms. These relations are used to initiate simple dialogue steps by displaying candidate terms for query refinement alongside the most highly ranked documents retrieved for a user query. The user's choice to pick a query refinement term proposed by the dialogue system, or to select some option considered relevant, can be interpreted as implicit relevance feedback. We propose to learn from one user in order to help the next user with a similar request, as in collaborative filtering. But, unlike in classical collaborative filtering, we do not distinguish a number of user groups. We basically have one large group of users: those who submit queries to the search engine of the particular site. Thus, we aim at improving the domain model of that site rather than individual user profiles.

2 Related Work

Relevance feedback is a method used to enhance information retrieval results [8, 2]. A user initially submits a query, and the system returns a small number of documents. The user then indicates which of the returned documents are relevant to the query. However, judging the relevance of documents can become time-consuming, and users would often prefer not to provide such explicit judgements. By observing the users' actions rather than expecting explicit user feedback on results, we introduce the idea of implicit relevance feedback. Actions the user performs, in our case dialogue steps, are judged to be relevant; everything else is judged as irrelevant. Our solution can be seen as a particular application of collaborative filtering. Collaborative filtering is based on identifying the opinions and preferences of similar users in order to predict the preferences of, and recommend items to, other users.
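The interpretation of a dialogue step as a set of relevance judgements can be sketched as follows. This is a minimal illustration; the function name and data layout are our own assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch: one dialogue step as implicit relevance feedback.
# The term the user picked counts as relevant; every other term offered
# in the same step counts as irrelevant.

def implicit_judgements(offered_terms, chosen_term):
    """Map one dialogue step onto implicit relevance judgements."""
    return {term: (term == chosen_term) for term in offered_terms}

# Example: the system offered four refinement terms for "languages"
# and the user picked "second language".
judgements = implicit_judgements(
    ["second language", "language department", "idl", "java"],
    chosen_term="second language")
# "second language" -> True, all other offered terms -> False
```

No explicit rating is ever requested: the single click yields one positive and several negative judgements per dialogue step.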
These techniques are used in a variety of recommender systems, ranging from recommending news (e.g. GroupLens [7]) to recommending movies (e.g. Video Recommender [4]). The Community Search Assistant described in [3] is a software agent which can be used to augment any kind of search engine. The agent works in parallel with the search engine itself and builds a graph of related queries which can be included in addition to the engine's results. The user can then traverse the graph of related queries in an ordered way. Determining relatedness depends on the documents returned by the various queries and not on the terms used for the queries themselves. Furthermore, the use of the search assistant agent enables a form of collaborative search by allowing the users to draw on the knowledge base of queries submitted by others. Internet search engines have also started incorporating simple collaborative filtering techniques in order to improve search. Such efforts include the popularity engine built by DirectHit (http://www.directhit.com), which operates using a simple voting mechanism. The popularity engine works by simply tracking the queries input by users and the
links that the users follow. Users vote by following a link, and a search in such an engine will therefore return the most popular results for that query.

3 Improving the Domain Model

The search system we apply relies on a sophisticated indexing process that extracts a taxonomy of related concepts from the raw documents. The indexing process determines whether an index term extracted from a document represents conceptual information by evaluating the number and nature of the various markup environments it is found in. Co-occurrence of different conceptual index terms in the same document defines a notion of related concepts. This was explained in detail in [6]. This taxonomy is mainly used in a query refinement task, i.e. if the user query returns a large number of matching documents. In that case the dialogue component determines a set of conceptual terms related to the query. Those terms are selected based on their ability to describe only a subset of the documents defined by the original user query. The user is asked to choose one. To use the introductory example, a query for "languages" would trigger the dialogue system to offer the following conceptual terms as possible constraints: second language, language department, idl, linguistic, spanish, java, etc. The strategy applied to determine good discriminating terms (like java in the above example) is to check all concepts related to any of the input terms. This is computationally fairly cheap, since there are far fewer concepts than keywords, and much of the calculation can be performed offline [5]. Three factors then determine whether a term is selected as a good discriminator: (1) the number of related concepts, (2) the frequency of each of those concepts, and (3) the weight of each of the related concept relations. The frequency of a concept is initially determined by the number of documents for which it was selected as a conceptual index term, as opposed to just a normal keyword index term.
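A minimal sketch of this selection step, together with the feedback-driven weight adjustments described next, might look as follows. The class layout, the scoring formula, and the learning rate are illustrative assumptions; the paper does not give exact equations:

```python
# Hypothetical sketch of discriminator scoring and implicit feedback
# weight updates over the concept taxonomy.  All names and constants
# are assumptions for illustration only.

class Taxonomy:
    def __init__(self):
        self.related = {}   # concept -> {related concept: relation weight}
        self.freq = {}      # concept -> conceptual-index frequency

    def add_concept(self, concept, related, freq):
        # Relation weights start out equal and sum up to 1.
        w = 1.0 / len(related)
        self.related[concept] = {r: w for r in related}
        self.freq.update(freq)

    def discriminator_score(self, concept, candidate):
        # Assumed scoring: relation weight times candidate frequency;
        # the number of related concepts enters implicitly through the
        # normalised relation weights.
        return self.related[concept][candidate] * self.freq.get(candidate, 0)

    def feedback(self, concept, offered, chosen, rate=0.1):
        # Increase the weight of the relation the user chose; decrease
        # the relations that were offered but ignored.
        weights = self.related[concept]
        for term in offered:
            if term == chosen:
                weights[term] += rate
            else:
                weights[term] = max(0.0, weights[term] - rate / len(offered))

# Example run with three related concepts for "language".
t = Taxonomy()
t.add_concept("language", ["second language", "idl", "java"],
              {"second language": 12, "idl": 3, "java": 25})
t.feedback("language", offered=["second language", "idl"],
           chosen="second language")
```

After the feedback step the chosen relation outweighs the ignored one, while relations that were never offered keep their initial weight, mirroring the "only adjust relations already in place" behaviour described in the text.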
In addition, for every concept in the taxonomy the weights associated with each of its identified related concepts are initially equal and sum up to 1. These weights change if: (1) a concept is offered and selected by the user (increase), or (2) a concept is offered and not selected (decrease). This will only change the weights of relations already in place. The result is that the good parts of the taxonomy will gain importance, while the rest becomes less and less relevant. However, this does not allow the creation of new links overlooked in the automatic construction of the model. We are currently experimenting with this. For example, a user may decide not to choose any of the offered terms, but instead input "query languages". This will implicitly introduce a new pair of related concepts, which may become more important over time. Since we keep track of the dialogue history, we only increase the weights associated with the links between any new input and the most recent input. That ensures that we do not run into a combinatorial explosion. The document ranking function we implemented is essentially based on the vector space model. In addition to that, different weights are given to index terms found
in particular markup contexts (e.g. keywords in titles are more important than in free text). This is not new; search engines like Google use similar ranking functions [1]. However, our function goes beyond that in a number of ways. First of all, conceptual terms which were extracted during indexing are given a higher weight than other terms. Moreover, every term has a weight which increases with the relative frequency of this term in the pool of all queries submitted to the search system so far. Finally, every concept term has a weight increasing with the frequency with which this term has been selected in a dialogue step, relative to the collection of all options offered by the system so far. None of the weights in the ranking function has a particularly strong impact on the overall weight of a document.

Finally, a word about the heterogeneous nature of our methods, which also allow explicit relevance feedback. If documents are displayed, they come with a box next to them where a user can judge a document to be relevant or not. Since we keep track of the dialogue history, we can again adjust the weights accordingly. This is not implemented yet, but it fits into the framework since it is just another parameter in the equation.

4 Example

Fig. 1. Partial concept tree for the example query "languages":

  language
    second_language: research_teaching, sla, vivian_cook
    language_department: language_processing, linguistic_university, mphil
    idl: corba, computer_network, odgm

In the example we reduce words to their base forms but apply no stemming. We use the introductory query ("languages"). For the calculation of index terms related to the query term we apply some fairly strict thresholds, i.e. frequent terms are not considered by the system. This is the reason why the compound term english language does not appear to be related to language. Our experience shows that better discriminating terms can be found by applying stricter thresholds. Figure 1 displays part of the originally constructed hierarchy for the conceptual term language. Only the three most relevant related concept terms are presented on the top two levels. It must be interpreted as follows: the system determined the most important concepts that would constrain the original query in order to get to a smaller set of relevant documents. If the user decides to choose second language, then the new query to be evaluated against the database would contain languages as well as second language as query terms. Again, a large number of matching documents exists for this new query, and one option would be to select a new term offered by the search system, e.g. sla (which stands for `second language acquisition'). Alternatively, the user could ignore the proposed options completely and enter some input like "english" to continue. The order in which the terms are presented to the user represents their relative importance with respect to the current query applied to the domain model.

Following a trial period, the example structure in Figure 1 has changed significantly. Apparently, users querying our system for "languages" were mainly interested in the linguistic sense of the query term. The relation between language and idl (`interface definition language') has disappeared from the list of most relevant related concepts. The fact that efl (`English as a foreign language') has become the most relevant potential refinement term for the "languages" query does not reflect a new relation between the terms language and efl, but an increased importance of a relation which existed before but initially had a much lower weight assigned to it. These changes show that observing real users' behaviour can help in getting to a more appropriate domain model.

References

1. Brin, S., and Page, L. The Anatomy of a Large-Scale Hypertextual Web Search Engine. In Proceedings of the Seventh International World Wide Web Conference (WWW7) (Brisbane, 1998).
2. Buckley, C., Salton, G., and Allan, J.
The effect of adding relevance information in a relevance feedback environment. In Proceedings of the 17th Annual International ACM SIGIR Conference (1994), pp. 292-301.
3. Glance, N. Community Search Assistant. In Proceedings of the AAAI-2000 Workshop on Artificial Intelligence for Web Search (Austin, TX, 2000), Technical Report WS-00-01, AAAI Press.
4. Hill, W., Stead, L., Rosenstein, M., and Furnas, G. Recommending and evaluating choices in a virtual community of use. In Proceedings of CHI'95 (New York, 1995), ACM.
5. Kruschwitz, U. A Rapidly Acquired Domain Model Derived from Markup Structure. In ESSLLI Workshop on Semantic Knowledge Acquisition and Categorisation (Helsinki, 2001). To appear.
6. Kruschwitz, U. Exploiting Structure for Intelligent Web Search. In Proceedings of the 34th Hawaii International Conference on System Sciences (HICSS) (Maui, Hawaii, 2001), IEEE.
7. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of ACM CSCW'94 (1994), pp. 175-186.
8. van Rijsbergen, C. J. Information Retrieval. Butterworths, 1979.