A Frequency Mining-based Algorithm for Re-Ranking Web Search Engine Retrievals


M. Barouni-Ebrahimi, Ebrahim Bagheri and Ali A. Ghorbani
Faculty of Computer Science, University of New Brunswick, Fredericton, Canada

Abstract. Conventional web search engines retrieve too many documents for the majority of submitted queries; they therefore achieve good recall, since they return far more pages than a user can look at. Precision, however, is the critical factor under these conditions, because the most related documents should be presented at the top of the list. In this paper, we propose an online page re-rank model that relies on users' clickthrough feedback as well as frequent phrases from past queries. The method is compared with a similar page re-rank algorithm called I-SPY. The results show that the proposed method ranks the more related pages at the top of the retrieved list while monitoring a smaller number of query phrases in a hit-matrix. On thirteen months of queries from the University of New Brunswick's search engine, the hit-matrix of our algorithm was on average 30 times smaller, while it re-ranked the web search results more effectively. The proposed re-rank method can be extended to support user community-based searches as well as domain-specific web search engines.

1 Introduction

The incredible growth in the volume of information accessible on the Web has made it harder to reach the information actually required, leaving users lost in hyperspace despite their use of powerful web search engines. The large number of web pages recommended by a search engine for a single query leaves a user wondering which pages may be most suitable for their purpose. One obvious reason why the results of search engines do not always reflect user requirements is that the queries sent by users are likely not to correspond to their intentions.
A lack of knowledge of, or unfamiliarity with, the specific keywords and phrases of the target domain leads users to type the keywords that come most easily to mind. Our observation of the AOL query log shows that the average number of keywords in a query is 2.14, and it is even lower in the query log obtained from the University of New Brunswick's search engine. A shorter query forces the engine to search a larger search space and hence allows more documents in that space to satisfy the minimal requirements. Consequently, this will result in too many

results from the search engine; hence, the returned documents need to be ranked based on their relative importance. Common web search engines such as Google and Yahoo employ different types of parameters and consider various issues in order to rank the retrieved pages. Furthermore, some researchers have recently focused on re-ranking the outputs of conventional web search engines. In this approach, the algorithms take advantage of an already ranked result set from a conventional web search engine and apply a more specific algorithm to re-rank the provided result list. In [20], a re-rank method is proposed that tracks the submitted queries as well as user feedback on the related pages (the basic implicit user feedback being the user's clicks on the retrieved pages). A hit-matrix is used to keep track of the number of user clicks for each specific query. In this method the number of queries increases rapidly, which has a negative effect on the performance of the algorithm. Another problem is that some queries do not receive enough user clicks to validate the relevancy of the clicked pages. To address these problems, a similarity metric is used to unify similar queries, but this approach has a direct effect on the accuracy of the results. Our approach is to substitute the queries with frequent phrases mined from past submitted queries. The number of frequent phrases is much smaller than the number of queries. Experimental results show that the number of frequent phrases in our proposed model converges to a constant value and is independent of the number of submitted queries. The number of user clicks associated with a specific frequent phrase is also much larger than the number associated with a specific query. In this paper, a model is proposed to re-rank the retrieved pages of a conventional web search engine.
The proposed model provides solutions to the following issues: 1) extracting frequent phrases from a query log requires appropriate stream mining techniques, since query logs have conceptually evolved into data streams that result from an endless and continuous sequence of queries, known as query streams; 2) mining frequent phrases from the query stream while keeping track of user clicks is not a straightforward task; and 3) a query may contain more than one frequent phrase. Each frequent phrase has a list of related pages (based on user clicks in the hit-matrix), so an appropriate method is needed to combine the lists and prepare an ordered list of pages related to the submitted query. The rest of the paper is organized as follows. In the next section, the background on web search engines as well as the related work is briefly reviewed. The Page Rank Reviser (P2R) algorithm is described in Section 3. The experiments and the discussion of the efficiency of the proposed model are provided in Section 4. The paper is finally concluded in Section 5.

2 Background

Retrieving documents related to a user request from a repository is the responsibility of a search engine, which is done through indexing mechanisms [25]. Appropriate term weighting methods, capable of distinguishing the important terms in the documents from the less significant ones, are a crucial issue in automatic text retrieval systems. The term frequency-inverse document frequency (tf-idf) weight is a well-known metric in the area of information retrieval. In [19], various methods for term weighting have been compared. The survey was extended ten years later in [30] to include newer methods. Due to the dynamic nature of the web, the indexing mechanism of web search engines is a far more complicated task. For a long time, web search engines applied traditional information retrieval methods for term weighting. The output of a web search engine for a user query was a document list ranked in decreasing order of the computed similarity between the query and the documents. In 1998, the idea of ranking web pages based on the link structure of the web was successfully implemented in Google [16]. Later, a method was proposed in [11] to improve the Google PageRank idea by computing a set of topic-sensitive PageRank vectors rather than a single, generic PageRank vector. The diversity of users' interests led researchers to work on search engines whose retrieved documents change adaptively with respect to the users' preferences [6, 20, 27, 29]. Another approach is to guide the user in preparing more appropriate queries by extracting terms related to the user query from past queries (query space) as well as from documents in the repository (document space). In the query space, different techniques have been exploited to mine web search query logs for query recommendation [1, 28] as well as query expansion [7, 22]. Query recommendation suggests, for a new query, related queries extracted from the query logs.
In the latter case, however, mining the past query transactions leads to the expansion of the newly submitted user query, which results from concatenating new related items to the initial query expression. Recently, a variety of investigations have been carried out on interactive query completion [2, 4, 5, 8, 24] while the query is being formulated. Refining the output of common web search engines (e.g. Google) to create more appropriate results for a user query is an effective technique in the web search area. Personalized web search is still widely studied through creating user profiles by analyzing user interests and activities [21, 23]. The search results are adapted based on the user profiles. In [10], a metasearch engine architecture has been proposed that allows users to provide preferences explicitly in the form of an information need category. Users' preferences are then applied to the queries to create more appropriate queries by appending new terms. The results are also reordered based on user preferences. Community Search Assistant [9] is a software agent that suggests a list of related search results for a newly submitted query. It creates a query graph in which each node is a past user query. The queries of the query graph that are related to a newly submitted query are then suggested to the user. Each query is followed by a list of top search results. In [26], a re-rank algorithm has been proposed based on access logs. Every web site is supposed to keep a set of access logs, which record the browsing behavior of its users together with the time, duration and URL. By mining user logs within the websites, the retrieved pages are re-ranked to present the more popular websites at the top of the list (popularity


is based on the user visits). In [12], Joachims proposes the use of a Support Vector Machine (SVM) for learning a retrieval function to improve WWW search engine performance. This work has been further extended by Radlinski and Joachims, who employ the concept of query chains to model user preferences [17, 18].

3 Page Rank Reviser Algorithm

In [20], a web search method called I-SPY is proposed that relies on past users' selections to re-rank the query results for the needs of communities of users. The algorithm is provided with user queries separated by community. Users need to log in to be assigned to a specific community. The algorithm sends the submitted query to a conventional web search engine. The results are re-ranked based on the past community selections. The number of users in a community selecting each page of the results is stored in a hit-matrix. As shown in Figure 1, $q_i$ is a query submitted by a user, $url_j$ is a page of the results for $q_i$, and $H_{ij}$ is the number of users selecting $url_j$ for $q_i$. For a new query, the result is re-ordered based on the relevancy of the retrieved pages to the submitted query using the hit-matrix. The relevance value of the page $url_j$ to the query $q_i$ is calculated as follows:

\[ \mathrm{Relevance}(url_j, q_i) = \frac{H_{ij}}{\sum_{j=1}^{n} H_{ij}} \tag{1} \]

Fig. 1. The structure of the Hit-Matrix.

Smyth et al. [20] have reported that only 15% of the queries observed in their experiments were duplicated. This causes two problems. First, the number of queries increases rapidly. This makes the hit-matrix very large, which affects the performance of the system. Second, for some queries, the number of selections may not be large enough to be considered valid by the algorithm. Therefore, a similarity between queries calculated by Equation 2 is used to increase query

duplication. Two queries are considered to be the same if they are within a given similarity threshold:

\[ \mathrm{Sim}(q, q') = \frac{|q \cap q'|}{|q \cup q'|} \tag{2} \]

where $q$ and $q'$ are two queries, $|q \cap q'|$ is the number of identical items in $q$ and $q'$, and $|q \cup q'|$ is the number of distinct items in $q$ and $q'$. This helps the system to have a smaller hit-matrix as well as more hits for a specific page of a submitted query. However, it has a negative effect on precision. We improve this approach by manipulating the phrases within the queries rather than considering the whole query as a single element. Therefore, the queries $q_1$ to $q_m$ in the hit-matrix are replaced with the frequent phrases extracted using an online single-pass algorithm called Online Frequent Sequence Discovery (OFSD), which mines the set of all frequent sequences in a data stream whose frequency rates satisfy a minimum frequency rate [3]. Informally speaking, search engine query logs are provided to the OFSD algorithm, which parses these query streams, extracts the most frequently observed phrases within the query stream and inserts them into a set called the candidate set. The queries in the previous form of the hit-matrix are replaced with the frequent phrases in the candidate set from OFSD. In this way, the number of rows in the hit-matrix is much smaller, while there are more precise hits. Figure 2 shows the architecture of our proposed model. The OFSD algorithm receives the submitted queries and updates the hit-matrix. The User Feedback Provider gets the user feedback to update the hits in the hit-matrix. The most reliable feedback is to explicitly ask the user how related a page is to the submitted query. However, this is not an efficient solution, since users may not cooperate in this regard. Another solution is to monitor the period of time that the user stays on a page. A period longer than a specific threshold indicates the relevancy of the page to the submitted query.
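For reference, the two I-SPY quantities defined in Equations 1 and 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the I-SPY implementation: the dictionary layout of the hit-matrix, the function names, and the lower-casing and whitespace tokenisation of queries are all assumptions made here.

```python
from typing import Dict

# Hypothetical hit-matrix layout (an assumption of this sketch):
# query -> {url -> number of users who selected that url, i.e. H_ij}.
HitMatrix = Dict[str, Dict[str, int]]


def relevance(hits: HitMatrix, query: str, url: str) -> float:
    """Equation 1: share of all selections for `query` that went to `url`."""
    row = hits.get(query, {})
    total = sum(row.values())
    return row.get(url, 0) / total if total else 0.0


def similarity(q1: str, q2: str) -> float:
    """Equation 2: identical terms divided by distinct terms of two queries."""
    # Lower-casing and splitting on whitespace are assumptions of this sketch.
    a, b = set(q1.lower().split()), set(q2.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


hits = {"student loans": {"url1": 3, "url2": 1}}
print(relevance(hits, "student loans", "url1"))
print(similarity("student financial services", "financial services"))
```

With a threshold of 1, only queries with identical term sets are merged; lower thresholds merge more queries into one row of the hit-matrix.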
It has been discussed in [14, 15] that counting the clicked pages for a given query is not an optimal solution for identifying page relevancy; however, it can be used as a basic relevancy measure. The Page Rank Reviser algorithm gets the frequent phrases of the submitted query as well as the hits of the phrases (see Figure 2). It then re-ranks the output of a conventional web search engine by employing the frequent hits. As an example, assume that the query "student financial services" has been submitted to a web search engine and two retrieved pages $url_1$ and $url_2$ have been selected. Further, suppose that two phrases $P_1$ = "student financial services" and $P_2$ = "financial services" have already been extracted as frequent phrases by the OFSD algorithm. The hit-numbers of $url_1$ and $url_2$ are incremented for both $P_1$ and $P_2$ in the hit-matrix. To track the frequency rate of each $url_j$ related to a phrase $P_i$, the quantity $F(P_i, url_j)$, which represents the frequency rate of the page $url_j$ for the phrase $P_i$, is defined as:

\[ F(P_i, url_j) = \frac{N_{url_j}}{CN_{P_i} - t_{url_j} + 1} \tag{3} \]


where $N_{url_j}$ denotes the number of clicks on $url_j$ for the submitted phrase $P_i$ ($H_{ij}$ in the hit-matrix) and $t_{url_j}$ represents the birth number of the page $url_j$ for the phrase $P_i$, which marks the first time that $url_j$ was observed for $P_i$.

Fig. 2. The architecture of the Page Re-rank model.

To re-rank the retrieved pages of a conventional web search engine for a submitted query, the query is divided into a set of the longest frequent phrases, called $QFL = \{P_1, \ldots, P_n\}$. There is no overlap between the phrases in QFL. For example, a query such as "student financial services" may be divided into the two frequent phrases "student" and "financial services", based on the frequent phrases in the hit-matrix. For each phrase $P_i$ in QFL, the related urls from the hit-matrix that satisfy a minimum frequency rate are extracted and ordered by their frequency rates; the resulting list is called $ranklist(P_i)$. The position of $url_j$ in $ranklist(P_i)$ is the rank of $url_j$ for the phrase $P_i$, called $Rank(P_i, url_j)$. For example, $Rank(P_i, url_j) = 1$ means that $url_j$ is the most frequently clicked url for the phrase $P_i$ in $ranklist(P_i)$ based on Equation 3. For each $url_i$ in the retrieved pages of the conventional web search engine, a rank list is assigned as follows:

\[ ranklist(url_i) = \{Rank(P_1, url_i), \ldots, Rank(P_n, url_i)\} \tag{4} \]

If $url_i$ is not in $ranklist(P_j)$, the maximum rank is assigned to it (meaning that $url_i$ is not related to the phrase $P_j$).

Definition 1 Let $Rank(P_j, url_i)$ be the position of $url_i$ in $ranklist(P_j)$ for the phrase $P_j$. The priority of $url_i$, denoted $Rank(url_i)$, is defined as:

\[ Rank(url_i) = \mu(url_i) + \sigma(url_i) \tag{5} \]

where $\mu(url_i)$ is the arithmetic mean of $ranklist(url_i)$, which is calculated as follows:

\[ \mu(url_i) = \frac{\sum_{j=1}^{n} Rank(P_j, url_i)}{n} \tag{6} \]

and $\sigma(url_i)$ is the standard deviation of $ranklist(url_i)$, which is calculated as follows:

\[ \sigma(url_i) = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left( Rank(P_j, url_i) - \mu(url_i) \right)^2} \tag{7} \]

Higher priority urls, which possess a smaller $Rank(url_i)$, represent more important urls. The average of the ranks is important, since a url that is ranked near the top for the phrases in QFL should be at the top of the list. On the other hand, the average alone is not a sufficient factor, which is why the standard deviation is added to the mean in Equation 5. Assume a scenario in which:

QFL = {P_1, P_2},
ranklist(P_1) = {url_1, url_2, url_3},
ranklist(P_2) = {url_4, url_2, url_1},
searchresult = {url_5, url_6, url_2, url_1},
µ(url_1) = µ(url_2) = 2, σ(url_1) = 1, σ(url_2) = 0.

Although the average ranks of $url_1$ and $url_2$ are the same, $url_2$ is a more related page for the submitted query than $url_1$, because it has the same rank for both phrases $P_1$ and $P_2$ of the submitted query. The final output would be:

outputresult = {url_2, url_1, url_5, url_6}.

4 Experiments

In this section, the P2R algorithm is compared with I-SPY [20] using the query log collected from the University of New Brunswick's Web search engine. We believe that I-SPY and P2R are comparable over the UNB query log: the I-SPY method is intended to function over community search engines, and because the queries directed to the UNB Web search engine are domain specific (related to university issues), it can be considered a community search engine and a suitable testbed for both algorithms. In the following, the evaluation method is described first and then the results of the experiments are discussed.
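The priority computation of Equations 4 to 7 in Section 3 can be illustrated with a short Python sketch that reproduces the scenario given there. It is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, QFL is assumed to be non-empty, and the "maximum rank" given to a url missing from a rank list is assumed to be one more than the length of the longest rank list (4 in the scenario), since the paper does not state its value.

```python
from statistics import mean, pstdev
from typing import Dict, List


def priority(url: str, ranklists: Dict[str, List[str]], max_rank: int) -> float:
    """Equation 5: mean plus population standard deviation of the url's ranks."""
    # Equation 4: one rank per phrase; urls absent from a list get max_rank.
    ranks = [rl.index(url) + 1 if url in rl else max_rank
             for rl in ranklists.values()]
    return mean(ranks) + pstdev(ranks)


def rerank(search_result: List[str], ranklists: Dict[str, List[str]],
           max_rank: int) -> List[str]:
    """Order urls by ascending priority; ties keep the search engine's order."""
    return sorted(search_result, key=lambda u: priority(u, ranklists, max_rank))


# Scenario from Section 3.
ranklists = {
    "P1": ["url1", "url2", "url3"],
    "P2": ["url4", "url2", "url1"],
}
search_result = ["url5", "url6", "url2", "url1"]
# Assumed value of the "maximum rank": longest rank list plus one.
max_rank = max(len(rl) for rl in ranklists.values()) + 1

print(rerank(search_result, ranklists, max_rank))
```

Because Python's sort is stable, pages that never appear in any rank list (url5 and url6 here) keep the relative order given by the underlying search engine.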
The employed UNB query log consists of 13 months of queries collected from November 2006 through to November 2007. The details of the distribution of queries in the query log can be seen in Figure 3.

4.1 Evaluation Method

Each query in the UNB query log is followed by a list of clicked urls along with their ranks in the retrieved pages. Since there is no way to know whether a clicked

url was actually related to the query in the view of the user, we borrow the idea from [20] in which each selected page is assumed to be related to the query. Further analysis and justification of the suitability of such an evaluation model can be found in [13]. A simple but effective metric, called click satisfaction, has been employed to evaluate our re-rank method.

Fig. 3. Query distribution in the UNB query log.

Definition 2 Let $BR(q)$ be the optimal ranking given in the best case by a web search engine (which would be the best url in the first position, the second best in the second rank, etc.), $DR(q)$ be the ranking proposed by a re-rank algorithm, $clicked(q)$ represent the set of clicked pages for a query $q$ in the query log, $R_{url}$ be the optimal rank of the url, $N_{url}$ be the rank of the url after re-rank and $|clicked(q)|$ be the number of clicked pages for $q$. The click satisfaction rate, denoted $ClickSatisfaction(q)$, which shows the degree of retrieval rank optimization by a given re-rank algorithm, is defined as follows:

\[ BR(q) = BaseRank(q) = \sum_{url \in clicked(q)} R_{url} \tag{8} \]

\[ DR(q) = DerivedRank(q) = \sum_{url \in clicked(q)} N_{url} \tag{9} \]

\[ ClickSatisfaction(q) = \frac{DR(q) - BR(q)}{|clicked(q)|} \tag{10} \]
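A minimal Python sketch of Equations 8 to 10 may help make the metric concrete. The function name and argument layout are hypothetical, and the sketch follows one reading of Definition 2, namely that the optimal ranks $R_{url}$ of $k$ clicked pages are simply 1 to $k$; this reading is consistent with the BaseRank of 6 reported for three clicked pages in this section's example, whose re-ranked positions (2, 3 and 13) are used as input here.

```python
from typing import Dict, Iterable


def click_satisfaction(clicked: Iterable[str], new_rank: Dict[str, int]) -> float:
    """Average distance of the clicked pages from their optimal positions.

    `new_rank` maps each clicked url to its rank after re-ranking (N_url).
    """
    urls = list(clicked)
    if not urls:
        raise ValueError("click satisfaction is undefined without clicks")
    # Equation 8: in the optimal ranking the k clicked pages occupy ranks 1..k.
    base_rank = sum(range(1, len(urls) + 1))
    # Equation 9: sum of the ranks the clicked pages actually received.
    derived_rank = sum(new_rank[u] for u in urls)
    # Equation 10: zero means the re-ranking is optimal for this query.
    return (derived_rank - base_rank) / len(urls)


p2r_ranks = {"url1": 2, "url2": 3, "url3": 13}
print(click_satisfaction(["url1", "url2", "url3"], p2r_ranks))
```

Feeding the same function the ranks of the original search result instead of the re-ranked ones gives the baseline value against which a re-rank algorithm can be compared.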

Fig. 4. The shaded areas represent the click satisfaction rate of each algorithm in the UNB query log.

Lower values for the click satisfaction metric show better performance of the re-ranking algorithm. For example, assume that for a submitted query $q_1$, three urls have been clicked: $url_1$ with the rank value of 2, $url_2$ with the rank value of 8 and $url_3$ with the rank value of 10. The BaseRank would be 6, since in the optimal ranking the three clicked urls occupy ranks 1, 2 and 3. The P2R algorithm is applied to the query $q_1$ and the ranks of the three urls are extracted. In this case, assume that these are $url_1$ with the rank value of 2, $url_2$ with the rank value of 3 and $url_3$ with the rank value of 13. The DerivedRank is 18. Based on Equation 10, the ClickSatisfaction is (18 - 6)/3 = 4. It is important to note that a click satisfaction value equal to zero for a given query shows that the re-rank algorithm has an optimal performance.

4.2 Results

The proposed algorithm, P2R, has been compared with the results obtained from the I-SPY algorithm. The I-SPY algorithm has been employed in two different settings, Sim1 and Sim25, where the similarity threshold values are set to 1 and 0.25, respectively. The results obtained from the UNB search engine were also used as the baseline. There are two important issues that need to be considered in the evaluation. First, the algorithms need to show good performance with regard to the click satisfaction metric, i.e. the algorithm that yields the smallest value for

this metric has the best performance with regard to the re-rank of the search engine results. Second, the algorithms should not possess a high space complexity, which means that the size of the hit-matrix should be small enough for the algorithm to perform the required calculations in a timely manner. A small hit-matrix allows faster inference from the data and requires less storage space. As can be seen in Figure 4, the P2R algorithm has the lowest click satisfaction rate compared with the other models, which is an indicator of its better re-ranking performance. Furthermore, this figure shows that the two different settings of the I-SPY algorithm do not differ significantly with respect to the re-rank of the result pages. It can also be seen in Figure 5 that the P2R algorithm requires and stores a significantly smaller hit-matrix than the I-SPY algorithm in both settings. This is a major advantage, since the P2R algorithm is able to achieve better performance in re-ranking the search engine results while storing a much smaller hit-matrix than the I-SPY algorithm. More interestingly, as can be seen in Figure 5, the size of the P2R hit-matrix is bounded and does not grow beyond a certain size because of its internal pruning process, while the size of the hit-matrix stored by the I-SPY algorithm grows significantly larger as new queries are observed. On average, the size of the P2R hit-matrix is 30 times smaller than that of the I-SPY algorithm.

Fig. 5. The amount of information stored in the hit-matrix by each of the algorithms for the UNB query log.


In the second set of our experiments, we employed a query log obtained from the AOL web search engine. The distribution of queries in this query log is shown in Figure 6. It can be seen that the number of queries submitted to the AOL search engine is much larger than the number submitted to the UNB web search engine. We have used seven days from this query log to evaluate and compare the re-rank algorithms. As can be seen in Figure 7, the P2R algorithm still outperforms the I-SPY algorithm. The improvements gained by the P2R algorithm over the I-SPY algorithm in the AOL query log are smaller than those obtained in the UNB query log. This is due to the fact that the P2R algorithm is intended for domain-specific applications, and since AOL is a general-purpose web search engine, P2R cannot perform optimally; however, even in such a setting, it performs better than the I-SPY algorithm and the baseline. Analogous to the previous experiment, the size of the hit-matrix saved by the P2R algorithm is significantly smaller than that employed by the I-SPY algorithm. The size of the hit-matrix in the P2R algorithm converges towards approximately 23,000 phrases after a couple of days' worth of queries, while it exceeds 500,000 after seven days in the I-SPY algorithm.

Fig. 6. Query distribution in the AOL query log.

4.3 Discussions

There are advantages to extracting frequent phrases from the query space rather than the document space. The extracted phrases are usually specific and well defined. Vague phrases are not repeated frequently; therefore, they are filtered out by the OFSD algorithm. The stemming process is not applicable, since common web search engines are sensitive to the different forms of a word


even with the same morphological root (e.g. the top results for the two words "structure" and "structures" are different in Google search).

Fig. 7. The shaded areas represent the click satisfaction rate of each algorithm in the AOL query log.

Due to the dynamic nature of the web, the frequency rates of the phrases as well as of the related urls in the hit-matrix may change over time. A page may be related to a phrase for a period of time, during which it is frequently clicked by users. Changing the content of the page may reduce the clicks to the point where it is no longer frequently clicked for that specific phrase. Because of the frequency rate tracking, the algorithm is able to extract short-term frequent phrases as well as frequent related urls, and will gradually forget them once they are no longer frequent. Similar to other re-rank algorithms, we take advantage of the state-of-the-art algorithms for indexing and for avoiding spamdexing by getting results from common web search engines. In our method, we do not show any url from the hit-matrix unless it is in the current search result. In this way, if a web site is removed from the search results by the common web search engine, it will be filtered out from the hit-matrix as well. Pruning the hit-matrix is an important issue, since deleting the infrequent phrases and urls in the hit-matrix is done by sequentially reading all the elements in the hit-matrix. It would be of order O(n), where n represents the number of rows in the hit-matrix (the maximum number of columns is a

constant). This remarkably increases the overhead of the algorithm. To reduce the order of the process, the hit-matrix is pruned only when the following condition holds:

\[ t_c - t_l > N_{cs} \log(N_{cs}) \tag{11} \]

where $t_c$ represents the current transaction number, $t_l$ denotes the transaction number of the last prune and $N_{cs}$ is the number of rows in the hit-matrix after the last prune. The condition reduces the order of the OFSD algorithm to O(log n) in the average case. As discussed in [2], n can be controlled in OFSD by dynamically choosing appropriate parameters in order to keep the model real-time while efficiently re-ranking the search results (see [2] for more details).

5 Conclusions

In this paper, the P2R algorithm has been proposed, which applies a hit-matrix to the retrieved pages of common web search engines to refine the search results. The hit-matrix keeps track of the frequently clicked pages for each frequent phrase of the queries. The frequent phrases are extracted from a query stream by the OFSD algorithm in one pass (each query is processed only once). The results show the advantage of P2R over the I-SPY model. On the queries of the UNB query log, the hit-matrix was much smaller in the proposed model, while the click satisfaction metric showed significantly better performance for the P2R algorithm based on the proposed evaluation method. Since the size of the hit-matrix is considered the main performance bottleneck of our algorithm as well as of the I-SPY model, our approach significantly increases performance by reducing the size of the hit-matrix. The experimental results based on the AOL query log and the UNB query log confirm that the P2R algorithm is intended for domain-specific web search engines and performs better in such cases than with general-purpose search engines.

References

1. Baeza-Yates, R., Hurtado, C., and Mendoza, M. Query recommendation using query logs in search engines.
Lecture Notes in Computer Science: Current Trends in Database Technology - EDBT 2004 Workshops (2004).
2. Barouni-Ebrahimi, M., and Ghorbani, A. A. On query completion in web search engines based on query stream mining. In International Conference on Web Intelligence (WI '07) (2-5 Nov. 2007).
3. Barouni-Ebrahimi, M., and Ghorbani, A. A. An online frequency rate based algorithm for mining frequent sequences in evolving data streams. In International Conference on Information Technology and Management (ICITM '07) (Hong Kong, 2007).
4. Barouni-Ebrahimi, M., Zafarani, R., Bagheri, E., and Ghorbani, A. A. Semantic search guidance: Learn from history. In Proceedings of NIPS Workshop on Machine Learning for Web Search (2007).

5. Bast, H., and Weber, I. Type less, find more: fast autocompletion search with a succinct index. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '06) (New York, NY, USA, 2006).
6. Castellucci, A., Ianni, G., Vasile, D., and Costa, S. Searching and surfing the web using a semi-adaptive meta-engine. International Conference on Information Technology: Coding and Computing (ITCC '01) (2001).
7. Cui, H., Wen, J.-R., Nie, J.-Y., and Ma, W.-Y. Probabilistic query expansion using query logs. In Proceedings of the 11th international conference on World Wide Web (WWW '02) (New York, USA, 2002).
8. Ensan, F., Bagheri, E., and Kahani, M. Applying collective experience for crafting suitable search engine query recommendations. In Fifth IEEE/ACM Conference on Communication Networks and Services Research (CNSR '07) (2007).
9. Glance, N. S. Community search assistant. In Proceedings of the 6th international conference on Intelligent user interfaces (IUI '01) (New York, NY, USA, 2001), ACM Press.
10. Glover, E. J., Lawrence, S., Gordon, M. D., Birmingham, W. P., and Giles, C. L. Web search your way. Communications of the ACM 44, 12 (2001).
11. Haveliwala, T. H. Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering 15, 4 (2003).
12. Joachims, T. Optimizing search engines using clickthrough data. In KDD (2002).
13. Joachims, T. Evaluating Retrieval Performance Using Clickthrough Data. Text Mining (2003).
14. Joachims, T., Granka, L. A., Pan, B., Hembrooke, H., and Gay, G. Accurately interpreting clickthrough data as implicit feedback. In SIGIR (2005).
15. Joachims, T., Granka, L. A., Pan, B., Hembrooke, H., Radlinski, F., and Gay, G. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems 25, 2 (2007).
16. Page, L., Brin, S., Motwani, R., and Winograd, T. The pagerank citation ranking: Bringing order to the web. Tech. Rep.
17. Radlinski, F., and Joachims, T. Query chains: learning to rank from implicit feedback. In KDD (2005).
18. Radlinski, F., and Joachims, T. Active exploration for learning rankings from clickthrough data. In KDD (2007).
19. Salton, G., and Buckley, C. Term-weighting approaches in automatic text retrieval. Inf. Process. Manage. 24, 5 (1988).
20. Smyth, B., Balfe, E., Freyne, J., Briggs, P., Coyle, M., and Boydell, O. Exploiting query repetition and regularity in an adaptive community-based web search engine. User Modeling and User-Adapted Interaction 14, 5 (2005).
21. Sugiyama, K., Hatano, K., and Yoshikawa, M. Adaptive web search based on user profile constructed without any effort from users. In Proceedings of the 13th international conference on World Wide Web (WWW '04) (New York, NY, USA, 2004).

22. Sun, R., Ong, C.-H., and Chua, T.-S. Mining dependency relations for query expansion in passage retrieval. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '06) (New York, USA, 2006).
23. Teevan, J., Dumais, S. T., and Horvitz, E. Personalizing search via automated analysis of interests and activities. In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (New York, NY, USA, 2005), ACM Press.
24. White, R. W., and Marchionini, G. Examining the effectiveness of real-time query expansion. Information Processing and Management 43, 3 (2007).
25. Witten, I. H., Moffat, A., and Bell, T. C. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishing, San Francisco.
26. Xue, G.-R., Zeng, H.-J., Chen, Z., Ma, W.-Y., and Lu, C.-J. Log mining to improve the performance of site search. In WISEW '02: Proceedings of the Third International Conference on Web Information Systems Engineering (Workshops) (Washington, DC, USA, 2002), IEEE Computer Society.
27. Zhang, W., Xu, B., and Yang, H. Development of a self-adaptive web search engine. Proceedings of 3rd International Workshop on Web Site Evolution (WSE '01) (2001).
28. Zhang, Z., and Nasraoui, O. Mining search engine query logs for query recommendation. In Proceedings of the 15th international conference on World Wide Web (WWW '06) (New York, USA, 2006).
29. Zhixiang Chen, Xiannong Meng, R. H. F. B. Z. Features: Real-time adaptive feature and document learning for web search. Journal of the American Society for Information Science and Technology 52, 8 (2001).
30. Zobel, J., and Moffat, A. Exploring the similarity space. SIGIR Forum 32, 1 (1998).
