A Frequency Mining-based Algorithm for Re-Ranking Web Search Engine Retrievals

M. Barouni-Ebrahimi, Ebrahim Bagheri and Ali A. Ghorbani
Faculty of Computer Science, University of New Brunswick, Fredericton, Canada

Abstract. Conventional web search engines retrieve too many documents for the majority of submitted queries; they therefore have good recall, since there are far more pages than a user can look at. Precision, however, is critical in these conditions, because the most related documents should be presented at the top of the list. In this paper, we propose an online page re-rank model that relies on users' clickthrough feedback as well as frequent phrases from past queries. The method is compared with a similar page re-rank algorithm called I-SPY. The results show the efficiency of the proposed method in ranking more related pages at the top of the retrieved list while monitoring a smaller number of query phrases in a hit-matrix. Using thirteen months of queries from the University of New Brunswick's search engine, the hit-matrix in our algorithm was on average 30 times smaller, while it showed better performance in re-ranking web search results. The proposed re-rank method can be extended to support community-based searches as well as domain-specific web search engines.

1 Introduction

The incredible growth in the volume of accessible information on the Web has made accessing the required information confusing, leaving users lost in hyperspace despite their use of powerful web search engines. The large number of web pages a search engine recommends for a single query leaves users wondering which pages best suit their purpose. One obvious reason why search engine results do not always reflect user requirements is that the queries users submit often do not accurately express their intentions.
The lack of user knowledge, or unfamiliarity with the specific keywords and phrases of the target domain, leads users to type whatever keywords easily come to mind. Our observation of the AOL query log shows that the average number of keywords in a query is 2.14, and it is even lower in the query log obtained from the University of New Brunswick's search engine. A shorter query causes the engine to search a larger space and hence allows more documents to satisfy the minimal requirements. Consequently, the search engine returns too many results, and the returned documents need to be ranked based on their relative importance.

Common web search engines such as Google and Yahoo employ different types of parameters and consider various issues in order to rank the retrieved pages. Furthermore, some researchers have recently focused on re-ranking the outputs of conventional web search engines. In this approach, an algorithm takes an already ranked result set from a conventional web search engine and applies a more specific algorithm to re-rank it.

In [20], a re-rank method is proposed that tracks the submitted queries as well as user feedback on the related pages (the basic implicit user feedback is the user's clicks on the retrieved pages). A hit-matrix is used to keep track of the number of user clicks for each specific query. In this method the number of queries increases rapidly, which degrades the performance of the algorithm. Another problem is that some queries do not receive enough user clicks to validate the relevancy of the clicked pages. To address these problems, a similarity metric is used to unify similar queries, but this approach directly reduces the accuracy of the results.

Our approach is to substitute the queries with frequent phrases mined from past submitted queries. The number of frequent phrases is much smaller than the number of queries. Experimental results show that the number of frequent phrases in our proposed model converges to a constant value and is independent of the number of submitted queries. The number of user clicks associated with a specific frequent phrase is also much larger than the number associated with a specific query. In this paper, a model is proposed to re-rank the retrieved pages of a conventional web search engine.
The proposed model addresses the following issues: 1) extracting frequent phrases from a query log requires appropriate stream mining techniques, since query logs have conceptually evolved into data streams, i.e., endless and continuous sequences of queries known as query streams; 2) mining frequent phrases from the query stream while keeping track of user clicks is not a straightforward task; and 3) a query may contain more than one frequent phrase. Each frequent phrase has a list of related pages (based on user clicks in the hit-matrix), so an appropriate method is needed to combine these lists and produce an ordered list of pages related to the submitted query.

The rest of the paper is organized as follows. Section 2 briefly reviews background on web search engines and related work. The Page Rank Reviser (P2R) algorithm is described in Section 3. Section 4 presents the experiments and discusses the efficiency of the proposed model. Finally, the paper is concluded in Section 5.

2 Background

A search engine is responsible for retrieving documents related to a user request from a repository, which is done through indexing mechanisms [25]. Appropriate term weighting methods, capable of distinguishing important terms in the documents from less significant ones, are a crucial issue in automatic text retrieval systems. The term frequency-inverse document frequency (tf-idf) weight is a well-known metric in information retrieval. Various term weighting methods were compared in [19], and the survey was extended ten years later in [30] to include newer methods. Due to the dynamic nature of the web, indexing is a far more complicated task for web search engines. For a long time, web search engines applied traditional information retrieval methods for term weighting, and the output of a web search engine for a user query was a list of documents ranked in decreasing order of the computed similarity between the query and the documents. In 1998, the idea of ranking web pages based on the link structure of the web was successfully implemented in Google [16]. Later, [11] proposed improving Google's PageRank by computing a set of topic-sensitive PageRank vectors rather than a single, generic PageRank vector.

The diversity of user interests led researchers to work on search engines whose retrieved documents adapt to user preferences [6, 20, 27, 29]. Another approach is to guide users in formulating more appropriate queries by extracting terms related to the user query from past queries (query space) as well as from documents in the repository (document space). In the query space, different techniques have been used to mine web search query logs for query recommendation [1, 28] as well as query expansion [7, 22]. Query recommendation suggests related queries, extracted from query logs, for a new query.
In query expansion, however, mining past query transactions leads to the expansion of the newly submitted query by concatenating new related items to the initial query expression. Recently, a variety of investigations have been carried out on interactive query completion [2, 5, 24, 4, 8], which assists users while the query is being formulated.

Refining the output of common web search engines (e.g. Google) to produce more appropriate results for a user query is an effective technique in the web search area. Personalized web search is still widely studied through creating user profiles by analyzing user interests and activities [21, 23]; the search results are then adapted based on these profiles. In [10], a metasearch engine architecture was proposed that allows users to provide preferences explicitly in the form of an information need category. These preferences are then applied to the queries by appending new terms, producing more appropriate queries, and the results are also reordered based on user preferences. Community Search Assistant [9] is a software agent that suggests a list of related search results for a newly submitted query. It creates a query graph in which each node is a past user query; the queries in the graph that are related to a newly submitted query are suggested to the user, each followed by a list of top search results. In [26], a re-rank algorithm based on access logs was proposed. Every web site is assumed to keep a set of access logs that record the browsing behavior of its users, together with the time, duration and URL. By mining these logs, the retrieved pages are re-ranked to present the more popular websites at the top of the list (popularity is based on user visits). In [12], Joachims proposes using a Support Vector Machine (SVM) to learn a retrieval function that improves WWW search engine performance. This work was further extended by Radlinski and Joachims, who employ the concept of query chains to model user preferences [17, 18].

3 Page Rank Reviser Algorithm

In [20], a web search method called I-SPY is proposed that relies on past users' selections to re-rank query results for the needs of communities of users. The algorithm is provided with user queries separated by community; users need to log in to be assigned to a specific community. The algorithm sends the submitted query to a conventional web search engine, and the results are re-ranked based on the past selections of the community. The number of users in a community selecting each page of the results is stored in a hit-matrix. As shown in Figure 1, $q_i$ is a query submitted by a user, $url_j$ is a page of the results for $q_i$, and $H_{ij}$ is the number of users selecting $url_j$ for $q_i$.

Fig. 1. The structure of the Hit-Matrix.

For a new query, the result is re-ordered based on the relevancy of the retrieved pages to the submitted query using the hit-matrix. The relevance of the page $url_j$ to the query $q_i$ is calculated as follows:

$$\mathrm{Relevance}(url_j, q_i) = \frac{H_{ij}}{\sum_{k=1}^{n} H_{ik}} \qquad (1)$$

Smyth et al. [20] reported that only 15% of the queries observed in their experiments were duplicates. This causes two problems. First, the number of queries increases rapidly, making the hit-matrix very large, which affects the performance of the system. Second, some queries may not receive enough selections to be considered valid by the algorithm. Therefore, a similarity between queries, calculated by Equation 2, is used to increase query duplication: two queries are considered the same if their similarity reaches a given threshold.

$$\mathrm{Sim}(q, q') = \frac{|q \cap q'|}{|q \cup q'|} \qquad (2)$$

where $q$ and $q'$ are two queries, $|q \cap q'|$ is the number of identical items in $q$ and $q'$, and $|q \cup q'|$ is the number of distinct items in $q$ and $q'$. This gives the system a smaller hit-matrix as well as more hits for a specific page of a submitted query; however, it has a negative effect on precision.

We improve this approach by manipulating the phrases within the queries rather than treating the whole query as a single element. The queries $q_1$ to $q_m$ in the hit-matrix are therefore replaced with frequent phrases extracted by an online single-pass algorithm called Online Frequent Sequence Discovery (OFSD), which mines the set of all frequent sequences in a data stream whose frequency rates satisfy a minimum frequency rate [3]. Informally, search engine query logs are provided to the OFSD algorithm, which parses these query streams, extracts the most frequently observed phrases in the query stream, and inserts them into a set called the candidate set. The queries in the previous form of the hit-matrix are replaced with the frequent phrases in the OFSD candidate set. In this way, the hit-matrix has far fewer rows, while the hits it records are more precise.

Figure 2 shows the architecture of our proposed model. The OFSD algorithm receives the submitted queries and updates the hit-matrix. The User Feedback Provider gets the user feedback to update the hits in the hit-matrix. The most reliable feedback is to explicitly ask the user how related a page is to the submitted query. However, this is not an efficient solution, since users may not cooperate. Another solution is to monitor how long the user stays on a page; a period longer than a specific threshold indicates that the page is relevant to the submitted query.
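To make the I-SPY baseline concrete, the following Python sketch (our illustration, not code from [20]) keeps a hit-matrix as nested dictionaries, scores a page with Equation 1, and merges stored queries with the term-overlap similarity of Equation 2. Whitespace tokenization, lower-casing, and the names `HitMatrix`, `similarity` and `matching_queries` are assumptions made for this example.

```python
from __future__ import annotations

from collections import defaultdict


class HitMatrix:
    """I-SPY-style hit-matrix: hits[query][url] plays the role of H_ij."""

    def __init__(self) -> None:
        # Rows are queries, columns are URLs, cells count user selections.
        self.hits = defaultdict(lambda: defaultdict(int))

    def record_click(self, query: str, url: str) -> None:
        """One more user selected `url` for `query`."""
        self.hits[query][url] += 1

    def relevance(self, query: str, url: str) -> float:
        """Equation 1: H_ij divided by all hits recorded for query q_i."""
        row = self.hits.get(query)  # .get does not create an empty row
        if not row:
            return 0.0
        return row.get(url, 0) / sum(row.values())


def similarity(q1: str, q2: str) -> float:
    """Equation 2: identical terms over distinct terms (Jaccard overlap)."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def matching_queries(hm: HitMatrix, query: str, threshold: float) -> list[str]:
    """Stored queries treated as the same query under the similarity threshold."""
    return [q for q in hm.hits if similarity(q, query) >= threshold]


if __name__ == "__main__":
    hm = HitMatrix()
    for url in ("url1", "url1", "url2"):
        hm.record_click("student financial services", url)
    print(hm.relevance("student financial services", "url1"))  # 2 of 3 hits
    print(matching_queries(hm, "financial services", 0.25))
```

With a threshold of 1 only queries with identical term sets are merged (the Sim1 setting in Section 4); a threshold of 0.25 (Sim25) merges far more aggressively, which densifies the hit counts at the cost of precision.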
It has been argued in [14, 15] that counting the clicked pages for a given query is not an optimal way to identify page relevancy; however, it can be used as a basic relevancy measure. The Page Rank Reviser algorithm gets the frequent phrases of the submitted query as well as the hits of those phrases (see Figure 2). It then re-ranks the output of a conventional web search engine using these frequent hits. As an example, assume that the query "student financial services" has been submitted to a web search engine and two retrieved pages, $url_1$ and $url_2$, have been selected. Further, suppose that two phrases, $P_1$ = "student financial services" and $P_2$ = "financial services", have already been extracted as frequent phrases by the OFSD algorithm. The hit numbers of $url_1$ and $url_2$ are then incremented for both $P_1$ and $P_2$ in the hit-matrix. To track the frequency rate of each $url_j$ related to a phrase $P_i$, the frequency rate of the page $url_j$ for the phrase $P_i$, $F(P_i, url_j)$, is defined as:

$$F(P_i, url_j) = \frac{N_{url_j}}{CN_{P_i} - t_{url_j} + 1} \qquad (3)$$

where $N_{url_j}$ denotes the number of clicks on $url_j$ for the submitted phrase $P_i$ ($H_{ij}$ in the hit-matrix) and $t_{url_j}$ represents the birth number of the page $url_j$ for the phrase $P_i$, i.e., the point at which $url_j$ was first observed for $P_i$.

Fig. 2. The architecture of the Page Re-rank model.

To re-rank the retrieved pages of a conventional web search engine for a submitted query, the query is divided into a set of the longest frequent phrases, called $QFL = \{P_1, \ldots, P_n\}$. The phrases in QFL do not overlap. For example, the query "student financial services" may be divided into the two frequent phrases "student" and "financial services", depending on the frequent phrases in the hit-matrix. For each phrase $P_i$ in QFL, the related URLs in the hit-matrix that satisfy a minimum frequency rate are extracted and ordered by their frequency rates; this ordered list is called $ranklist(P_i)$. The position of $url_j$ in $ranklist(P_i)$ is the rank of $url_j$ for the phrase $P_i$, denoted $Rank(P_i, url_j)$. For example, $Rank(P_i, url_j) = 1$ means that $url_j$ is the most frequently clicked URL for the phrase $P_i$ according to Equation 3. Each $url_i$ among the pages retrieved by the conventional web search engine is assigned a rank list as follows:

$$ranklist(url_i) = \{Rank(P_1, url_i), \ldots, Rank(P_n, url_i)\} \qquad (4)$$

If $url_i$ is not in $ranklist(P_j)$, the maximum rank is assigned to it (meaning that $url_i$ is not related to the phrase $P_j$).

Definition 1. Let $Rank(P_j, url_i)$ be the position of $url_i$ in $ranklist(P_j)$ for the phrase $P_j$. The priority of $url_i$, denoted $Rank(url_i)$, is defined as:

$$Rank(url_i) = \mu(url_i) + \sigma(url_i) \qquad (5)$$

where $\mu(url_i)$ is the arithmetic mean of $ranklist(url_i)$, calculated as follows:

$$\mu(url_i) = \frac{1}{n} \sum_{j=1}^{n} Rank(P_j, url_i) \qquad (6)$$

and $\sigma(url_i)$ is the standard deviation of $ranklist(url_i)$, calculated as follows:

$$\sigma(url_i) = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left( Rank(P_j, url_i) - \mu(url_i) \right)^2} \qquad (7)$$

Higher-priority URLs, which have a smaller $Rank(url_i)$, are more important. The average of the ranks matters, since a URL with low ranks for the phrases in QFL should be at the top of the list. On the other hand, the average alone is not sufficient, so the standard deviation is added to the mean in Equation 5. Consider the following scenario:

$QFL = \{P_1, P_2\}$
$ranklist(P_1) = \{url_1, url_2, url_3\}$
$ranklist(P_2) = \{url_4, url_2, url_1\}$
$searchresult = \{url_5, url_6, url_2, url_1\}$

Here $url_1$ has ranks 1 and 3 while $url_2$ has ranks 2 and 2, so $\mu(url_1) = \mu(url_2) = 2$, $\sigma(url_1) = 1$ and $\sigma(url_2) = 0$. Although the average ranks of $url_1$ and $url_2$ are the same, $url_2$ is more related to the submitted query than $url_1$, because it has the same rank for both phrases $P_1$ and $P_2$ of the submitted query. The pages $url_5$ and $url_6$ appear in neither rank list, so they receive the maximum rank and are placed last. The final output would be:

$outputresult = \{url_2, url_1, url_5, url_6\}$

4 Experiments

In this section, the P2R algorithm is compared with I-SPY [20] using the query log collected from the University of New Brunswick's web search engine. We believe that I-SPY and P2R are comparable on the UNB query log: the I-SPY method is intended to function over community search engines, and because the queries directed to the UNB web search engine are domain specific (related to university issues), it can be considered a community search engine and a suitable testbed for both algorithms. In the following, the evaluation method is first described and then the results of the experiments are discussed.
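Before turning to the data, the scoring steps of Section 3 (Equations 3 to 7) can be sketched in Python as below. This is a hypothetical illustration, not the authors' implementation, and several details are our assumptions because the text does not pin them down: $CN_{P_i}$ in Equation 3 is read as the number of times $P_i$ has been observed so far; QFL is built with a greedy left-to-right longest match that skips words covered by no frequent phrase; and the "maximum rank" for a URL missing from a rank list is taken as one more than the length of the longest list. All function names are illustrative.

```python
from __future__ import annotations

from statistics import mean, pstdev


def frequency_rate(clicks: int, phrase_count: int, birth: int) -> float:
    """Equation 3: F(P, url) = N_url / (CN_P - t_url + 1).

    Assumption: phrase_count (CN_P) is how often P has been observed so far and
    birth (t_url) is the occurrence number of P when url was first observed.
    """
    return clicks / (phrase_count - birth + 1)


def segment(query: str, frequent: set[str]) -> list[str]:
    """Split a query into non-overlapping longest frequent phrases (QFL).

    Hypothetical greedy left-to-right longest match.
    """
    words, phrases, i = query.split(), [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest span first
            candidate = " ".join(words[i:j])
            if candidate in frequent:
                phrases.append(candidate)
                i = j
                break
        else:
            i += 1  # no frequent phrase starts at this word
    return phrases


def build_ranklist(rates: dict[str, float], min_rate: float) -> list[str]:
    """ranklist(P): URLs meeting the minimum frequency rate, highest rate first."""
    kept = [url for url, rate in rates.items() if rate >= min_rate]
    return sorted(kept, key=lambda url: rates[url], reverse=True)


def rerank(results: list[str], ranklists: list[list[str]],
           missing_rank: int | None = None) -> list[str]:
    """Order results by Rank(url) = mean + population std of per-phrase ranks."""
    if missing_rank is None:
        # Assumption: "maximum rank" = one past the longest rank list.
        missing_rank = max((len(r) for r in ranklists), default=0) + 1

    def priority(url: str) -> float:
        ranks = [r.index(url) + 1 if url in r else missing_rank for r in ranklists]
        return mean(ranks) + pstdev(ranks) if ranks else float(missing_rank)

    # sorted() is stable, so ties keep the search engine's original order.
    return sorted(results, key=priority)


if __name__ == "__main__":
    print(segment("student financial services", {"student", "financial services"}))
    ranklists = [["url1", "url2", "url3"], ["url4", "url2", "url1"]]
    print(rerank(["url5", "url6", "url2", "url1"], ranklists))
```

Run on the scenario above, the sketch returns `['url2', 'url1', 'url5', 'url6']`: $url_2$ has priority 2, $url_1$ has priority 3, and the unmatched $url_5$ and $url_6$ tie and keep their original order. Relying on a stable sort for ties is a design choice of this sketch, not something the paper specifies.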
The UNB query log used here consists of 13 months of queries collected from November 2006 through November 2007. The distribution of queries in the query log is shown in Figure 3.

Fig. 3. Query distribution in the UNB query log.

4.1 Evaluation Method

Each query in the UNB query log is followed by a list of clicked URLs along with their ranks in the retrieved pages. Since there is no way to know whether a clicked URL was actually related to the query from the user's point of view, we borrow the idea from [20] that each selected page is assumed to be related to the query. Further analysis and justification of the suitability of such an evaluation model can be found in [13]. A simple but effective metric, called click satisfaction, has been employed to evaluate our re-rank method.

Definition 2. Let $BR(q)$ be the optimal ranking given in the best case by a web search engine (the best URL in the first position, the second best in the second position, etc.), $DR(q)$ the ranking proposed by a re-rank algorithm, $clicked(q)$ the set of clicked pages for a query $q$ in the query log, $R_{url}$ the optimal rank of $url$, $N_{url}$ the rank of $url$ after re-ranking, and $|clicked(q)|$ the number of clicked pages for $q$. The click satisfaction rate, denoted $ClickSatisfaction(q)$, which shows the degree of retrieval rank optimization achieved by a given re-rank algorithm, is defined as follows:

$$BR(q) = BaseRank(q) = \sum_{url \in clicked(q)} R_{url} \qquad (8)$$

$$DR(q) = DerivedRank(q) = \sum_{url \in clicked(q)} N_{url} \qquad (9)$$

$$ClickSatisfaction(q) = \frac{DR(q) - BR(q)}{|clicked(q)|} \qquad (10)$$
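Equations 8 to 10 reduce to simple arithmetic once the positions of the clicked pages are known. A minimal Python sketch, assuming (as in the worked example for $q_1$ in this section) that the best case places the $k$ clicked pages at positions 1 to $k$, so that BaseRank is $1 + 2 + \cdots + k$; the helper names are hypothetical:

```python
from __future__ import annotations


def derived_ranks(reranked: list[str], clicked: list[str]) -> list[int]:
    """N_url for each clicked page: its 1-based position in the re-ranked list.

    Assumes every clicked URL appears in the re-ranked list.
    """
    return [reranked.index(url) + 1 for url in clicked]


def click_satisfaction(ranks: list[int]) -> float:
    """Equation 10 for one query, given the post-re-rank positions N_url."""
    k = len(ranks)  # |clicked(q)|
    if k == 0:
        raise ValueError("query has no clicked pages")
    base_rank = k * (k + 1) // 2  # Equation 8: clicked pages at positions 1..k
    derived_rank = sum(ranks)     # Equation 9
    return (derived_rank - base_rank) / k


if __name__ == "__main__":
    print(click_satisfaction([2, 3, 13]))  # (18 - 6) / 3 = 4.0
```

For the example query $q_1$, the original positions 2, 8 and 10 would give (20 - 6)/3, about 4.67, while the re-ranked positions 2, 3 and 13 give 4.0; a perfect re-ranking gives 0.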

Fig. 4. The shaded areas represent the click satisfaction rate of each algorithm in the UNB query log.

Lower values of the click satisfaction metric indicate better performance of the re-ranking algorithm. For example, assume that for a submitted query $q_1$, three URLs have been clicked: $url_1$ with rank 2, $url_2$ with rank 8 and $url_3$ with rank 10. In the best case these three pages would occupy positions 1, 2 and 3, so the BaseRank is 1 + 2 + 3 = 6. The P2R algorithm is applied to $q_1$ and the new ranks of the three URLs are extracted; assume they are 2 for $url_1$, 3 for $url_2$ and 13 for $url_3$. The DerivedRank is then 2 + 3 + 13 = 18, and by Equation 10 the ClickSatisfaction is (18 - 6)/3 = 4. Note that a click satisfaction value of zero for a given query means that the re-rank algorithm has performed optimally.

4.2 Results

The proposed algorithm, P2R, has been compared with the I-SPY algorithm, which was run in two settings, Sim1 and Sim25, where the similarity threshold is set to 1 and 0.25, respectively. The results obtained from the UNB search engine were also used as the baseline. Two important issues need to be considered in the evaluation. First, the algorithms need to perform well with respect to the click satisfaction metric, i.e., the algorithm that yields the smallest value for this metric performs best in re-ranking the search engine results. Second, the algorithms should not have a high space complexity, which means that the hit-matrix should be small enough for the algorithm to perform the required calculations in a timely manner. A small hit-matrix allows faster inference from the data and requires less storage space.

Fig. 5. The amount of information stored in the hit-matrix by each of the algorithms for the UNB query log.

As can be seen in Figure 4, the P2R algorithm has the lowest click satisfaction rate of the compared models, which indicates better re-ranking performance. The figure also shows that the two settings of the I-SPY algorithm do not differ significantly in re-ranking the result pages. Furthermore, Figure 5 shows that the P2R algorithm stores a significantly smaller hit-matrix than the I-SPY algorithm in both settings. This is a major advantage, since P2R achieves better re-ranking performance while storing a much smaller hit-matrix. More interestingly, the size of the P2R hit-matrix is bounded and does not grow beyond a certain size because of its internal pruning process, while the hit-matrix stored by I-SPY grows significantly larger as new queries are observed. On average, the P2R hit-matrix is 30 times smaller than that of the I-SPY algorithm.

Fig. 6. Query distribution in the AOL query log.

In the second set of experiments, we employed a query log obtained from the AOL web search engine. The distribution of queries in this query log is shown in Figure 6. The number of queries submitted to the AOL search engine is much larger than the number submitted to the UNB web search engine. We used seven days of this query log to evaluate and compare the re-rank algorithms. As can be seen in Figure 7, the P2R algorithm still outperforms the I-SPY algorithm, although its improvements over I-SPY are smaller on the AOL query log than on the UNB query log. This is because the P2R algorithm is intended for domain-specific applications; since AOL is a general-purpose web search engine, P2R cannot perform optimally. Even in this setting, however, it performs better than both the I-SPY algorithm and the baseline. As in the previous experiment, the hit-matrix stored by the P2R algorithm is significantly smaller than that of the I-SPY algorithm: it converges to approximately 23,000 phrases after a couple of days' worth of queries, while the I-SPY hit-matrix exceeds 500,000 after seven days.

Fig. 7. The shaded areas represent the click satisfaction rate of each algorithm in the AOL query log.

4.3 Discussions

Extracting frequent phrases from the query space rather than the document space has several advantages. The extracted phrases are usually specific and well defined. Vague phrases are rarely repeated frequently, so they are filtered out by the OFSD algorithm. Stemming is not applicable, since common web search engines are sensitive to different forms of a word even when they share the same morphological root (e.g., the top Google results for "structure" and "structures" differ).

Due to the dynamic nature of the web, the frequency rates of the phrases, as well as of the related URLs in the hit-matrix, may change over time. A page may be related to a phrase for a period of time and be frequently clicked by users; changes to its content may then reduce its clicks until it is no longer frequently clicked for that phrase. Because it tracks frequency rates, the algorithm can extract short-term frequent phrases and frequently related URLs, and gradually forgets them once they are no longer frequent.

Like other re-rank algorithms, we take advantage of the state-of-the-art indexing and spamdexing-avoidance techniques of common web search engines by obtaining our results from them. In our method, no URL from the hit-matrix is shown unless it is in the current search result. In this way, if a web site is removed from the search results by the common web search engine, it is effectively filtered out of the hit-matrix as well.

Pruning the hit-matrix is an important issue, since deleting the infrequent phrases and URLs requires a sequential read of all elements of the hit-matrix. This is of order O(n), where n is the number of rows in the hit-matrix (the maximum number of columns is a constant), and performing it after every query would remarkably increase the overhead of the algorithm. To reduce this cost, the hit-matrix is pruned only when the following condition holds:

$$t_c - t_l > N_{cs} \log(N_{cs}) \qquad (11)$$

where $t_c$ is the current transaction number, $t_l$ is the transaction number of the last prune, and $N_{cs}$ is the number of rows in the hit-matrix after the last prune. This condition reduces the order of the OFSD algorithm to O(log n) in the average case. As discussed in [2], n can be controlled in OFSD by dynamically choosing appropriate parameters in order to keep the model real-time while efficiently re-ranking the search results (see [2] for more details).

5 Conclusions

In this paper, the P2R algorithm has been proposed, which applies a hit-matrix to the pages retrieved by common web search engines in order to refine the search results. The hit-matrix keeps track of the frequently clicked pages for each frequent phrase of the queries. The frequent phrases are extracted from a query stream by the OFSD algorithm in one pass (each query is processed only once). The results show the advantage of P2R over the I-SPY model. On the UNB query log, the hit-matrix was much smaller in the proposed model, while the click satisfaction metric showed significantly better performance for the P2R algorithm under the proposed evaluation method. Since the size of the hit-matrix is the main performance bottleneck of both our algorithm and the I-SPY model, our approach significantly increases performance by reducing the size of the hit-matrix. The experimental results on the AOL and UNB query logs confirm that the P2R algorithm is intended for domain-specific web search engines, where it performs better than on general-purpose search engines.

References

1. Baeza-Yates, R., Hurtado, C., and Mendoza, M. Query recommendation using query logs in search engines. Lecture Notes in Computer Science: Current Trends in Database Technology - EDBT 2004 Workshops (2004).
2. Barouni-Ebrahimi, M., and Ghorbani, A. A. On query completion in web search engines based on query stream mining. In International Conference on Web Intelligence (WI 07) (2-5 Nov. 2007).
3. Barouni-Ebrahimi, M., and Ghorbani, A. A. An online frequency rate based algorithm for mining frequent sequences in evolving data streams. In International Conference on Information Technology and Management (ICITM 07) (Hong Kong, 2007).
4. Barouni-Ebrahimi, M., Zafarani, R., Bagheri, E., and Ghorbani, A. A. Semantic search guidance: Learn from history. In Proceedings of NIPS Workshop on Machine Learning for Web Search (2007).
5. Bast, H., and Weber, I. Type less, find more: fast autocompletion search with a succinct index. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 06) (New York, NY, USA, 2006).
6. Castellucci, A., Ianni, G., Vasile, D., and Costa, S. Searching and surfing the web using a semi-adaptive meta-engine. In International Conference on Information Technology: Coding and Computing (ITCC 01) (2001).
7. Cui, H., Wen, J.-R., Nie, J.-Y., and Ma, W.-Y. Probabilistic query expansion using query logs. In Proceedings of the 11th International Conference on World Wide Web (WWW 02) (New York, NY, USA, 2002).
8. Ensan, F., Bagheri, E., and Kahani, M. Applying collective experience for crafting suitable search engine query recommendations. In Fifth IEEE/ACM Conference on Communication Networks and Services Research (CNSR 07) (2007).
9. Glance, N. S. Community search assistant. In Proceedings of the 6th International Conference on Intelligent User Interfaces (IUI 01) (New York, NY, USA, 2001), ACM Press.
10. Glover, E. J., Lawrence, S., Gordon, M. D., Birmingham, W. P., and Giles, C. L. Web search your way. Communications of the ACM 44, 12 (2001).
11. Haveliwala, T. H. Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering 15, 4 (2003).
12. Joachims, T. Optimizing search engines using clickthrough data. In KDD (2002).
13. Joachims, T. Evaluating retrieval performance using clickthrough data. In Text Mining (2003).
14. Joachims, T., Granka, L. A., Pan, B., Hembrooke, H., and Gay, G. Accurately interpreting clickthrough data as implicit feedback. In SIGIR (2005).
15. Joachims, T., Granka, L. A., Pan, B., Hembrooke, H., Radlinski, F., and Gay, G. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems 25, 2 (2007).
16. Page, L., Brin, S., Motwani, R., and Winograd, T. The PageRank citation ranking: Bringing order to the web. Tech. rep.
17. Radlinski, F., and Joachims, T. Query chains: learning to rank from implicit feedback. In KDD (2005).
18. Radlinski, F., and Joachims, T. Active exploration for learning rankings from clickthrough data. In KDD (2007).
19. Salton, G., and Buckley, C. Term-weighting approaches in automatic text retrieval. Information Processing and Management 24, 5 (1988).
20. Smyth, B., Balfe, E., Freyne, J., Briggs, P., Coyle, M., and Boydell, O. Exploiting query repetition and regularity in an adaptive community-based web search engine. User Modeling and User-Adapted Interaction 14, 5 (2005).
21. Sugiyama, K., Hatano, K., and Yoshikawa, M. Adaptive web search based on user profile constructed without any effort from users. In Proceedings of the 13th International Conference on World Wide Web (WWW 04) (New York, NY, USA, 2004).
22. Sun, R., Ong, C.-H., and Chua, T.-S. Mining dependency relations for query expansion in passage retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 06) (New York, NY, USA, 2006).
23. Teevan, J., Dumais, S. T., and Horvitz, E. Personalizing search via automated analysis of interests and activities. In SIGIR 05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA, 2005), ACM Press.
24. White, R. W., and Marchionini, G. Examining the effectiveness of real-time query expansion. Information Processing and Management 43, 3 (2007).
25. Witten, I. H., Moffat, A., and Bell, T. C. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishing, San Francisco.
26. Xue, G.-R., Zeng, H.-J., Chen, Z., Ma, W.-Y., and Lu, C.-J. Log mining to improve the performance of site search. In WISEW 02: Proceedings of the Third International Conference on Web Information Systems Engineering (Workshops) (Washington, DC, USA, 2002), IEEE Computer Society.
27. Zhang, W., Xu, B., and Yang, H. Development of a self-adaptive web search engine. In Proceedings of the 3rd International Workshop on Web Site Evolution (WSE 01) (2001).
28. Zhang, Z., and Nasraoui, O. Mining search engine query logs for query recommendation. In Proceedings of the 15th International Conference on World Wide Web (WWW 06) (New York, NY, USA, 2006).
29. Chen, Z., Meng, X., Fowler, R. H., and Zhu, B. Features: Real-time adaptive feature and document learning for web search. Journal of the American Society for Information Science and Technology 52, 8 (2001).
30. Zobel, J., and Moffat, A. Exploring the similarity space. SIGIR Forum 32, 1 (1998).

More information

University of Delaware at Diversity Task of Web Track 2010

University of Delaware at Diversity Task of Web Track 2010 University of Delaware at Diversity Task of Web Track 2010 Wei Zheng 1, Xuanhui Wang 2, and Hui Fang 1 1 Department of ECE, University of Delaware 2 Yahoo! Abstract We report our systems and experiments

More information

SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB SEARCH- A REVIEW Neha Dewangan 1, Rugraj 2

SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB SEARCH- A REVIEW Neha Dewangan 1, Rugraj 2 SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB SEARCH- A REVIEW Neha Dewangan 1, Rugraj 2 1 PG student, Department of Computer Engineering, Alard College of Engineering and Management 2 Department of

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Interface. Dispatcher. Meta Searcher. Index DataBase. Parser & Indexer. Ranker

Interface. Dispatcher. Meta Searcher. Index DataBase. Parser & Indexer. Ranker WebSail: From On-line Learning to Web Search Zhixiang Chen Xiannong Meng Binhai Zhu y Richard H. Fowler Department of Computer Science, University of Texas-Pan American Edinburg, TX 78539, USA. Emails:

More information

PageRank and related algorithms

PageRank and related algorithms PageRank and related algorithms PageRank and HITS Jacob Kogan Department of Mathematics and Statistics University of Maryland, Baltimore County Baltimore, Maryland 21250 kogan@umbc.edu May 15, 2006 Basic

More information