An Improved Usage-Based Ranking
Chen Ding 1, Chi-Hung Chi 1,2, and Tiejian Luo 2
1 School of Computing, National University of Singapore, Lower Kent Ridge Road, Singapore
chich@comp.nus.edu.sg
2 The Graduate School of Chinese Academy of Sciences, Beijing, P.R. China

Abstract. A good ranking is critical to a positive searching experience. With usage data collected from past searching activities, ranking could be improved over current approaches, which are largely based on text or link information. In this paper, we propose a usage-based ranking algorithm. It calculates the rank score from the time users spend on a page, taking into account the effect propagated from linked pages, which improves on methods determined simply by selection frequency. It also applies several heuristics to further improve the accuracy of the top-positioned results.

1 Introduction

The ranking module is a key component of a web information retrieval system: by improving the quality and accuracy of the top-ranked results, it relieves users of the cognitive overload of identifying the most relevant documents in a huge result list. The key to a good ranking for web searching is to make full use of the sources available on the web instead of being confined to pure text information. One example is the link connectivity of the web graph, which has been investigated extensively ([2], [4], [7]).

In traditional IR systems, users looking for specific information often spend some time providing feedback to refine the search, and such feedback can improve the final ranking. On the web, however, the dominant searches are informal ones. Without clear and urgent information requirements in mind, and with information on the same topic easily accessible from different web resources, users are unlikely to spend much time on a single search. Thus, explicit feedback is quite rare on the web.
However, web tracking techniques make it easy to capture user behavior while users browse web content. From information such as which links users click and how long they spend on a page, the users' degree of satisfaction with the relevance of a web page can be estimated. This is a kind of implicit feedback from users. We believe that such usage data could be a good source for relevance judgment and quality ranking.

X. Meng, J. Su, and Y. Wang (Eds.): WAIM 2002, LNCS 2419, pp , Springer-Verlag Berlin Heidelberg 2002
Usage data has been investigated in many studies, but normally it is collected from a single web site and used to better present the site or to help user navigation ([5], [8], [13]). Little work uses usage data in web information retrieval systems, especially in the ranking algorithm. Systems that do use usage data in ranking [10] determine the relevance of a web page by its selection frequency. This measurement is not an accurate indicator of real relevance. The time spent reading the page, operations such as saving, printing, or bookmarking the page, and the action of following links in the page are all good indicators, perhaps better than simple selection frequency. It is therefore worth exploring how to apply this kind of actual user behavior to the ranking mechanism. The purpose of this study is to develop a more accurate ranking algorithm that utilizes usage data.

In this paper, we develop a usage-based ranking algorithm. The time spent reading and navigating a web page constitutes the basic rank score, and several heuristics are applied to further increase the precision of the top-positioned results. We believe that such a ranking could supplement current algorithms (e.g., text-based, link-based) and provide high accuracy and quality.

2 Related Work

The traditional relevance feedback mechanism ([11]) benefits one search session by refining the initial query towards the relevant documents. Other users submitting the same query cannot benefit from it, so the performance improvement from relevance feedback is on a per-user basis. On the web, by contrast, implicit feedback is collected from many users and is not aimed at benefiting a single user's retrieval experience. Its underlying rationale is closer to collaborative filtering.
The relevance or quality of a web page can be determined only from a large amount of collaborative judgment data, since users who submit the same query usually share some opinions on the relevance of the results. Collaborative filtering is therefore closely related to our work.

Collaborative filtering uses other people's knowledge as recommendations, which serve as inputs to a system that aggregates them and directs the results to the appropriate recipients ([9]). It relies on the fact that people's tastes are not randomly distributed: there are general trends and patterns within the taste of a person as well as between groups of people. Most early collaborative filtering systems ([1]) use explicit judgments provided by users to recommend documents. The major disadvantage is that they require user participation, which is normally not desired for web searches. To address this problem, several systems have been developed that try to extract user judgments by observing user activities and then conjecturing their evaluation of the documents. We review two systems that aim to improve web searching.
KSS ([10]) uses a proxy to record users' access patterns when they browse the web. The number of times a link is accessed is stored in the proxy and annotated beside the link to indicate its value. KSS also has a meta search engine that attempts to find the best search engines for a given query based on the selection frequency of results. The number of previous accesses can also be used to rank result lists merged from multiple search engines.

In the DirectHit search engine, each result URL in the result list points back to the search engine server first and is then redirected to the target. In this way, what users actually follow in a result list can be recorded in the server log. The engine gradually accumulates data that identifies which result pages are popular and which are not. In later searches for the same query, the returned pages can then be ranked by their popularity (i.e., the access count). The exact ranking mechanism is not publicly known.

3 Usage-Based Ranking

The general idea of usage-based ranking is to monitor which pages users choose from the result list and what actions they take on these pages. From this behavior, the users' judgment of the relevance of these pages can be inferred, and usage-based scores are then assigned to the pages the next time other users submit the same query.

Intuitively, if a page is selected more frequently, it is more likely to be judged relevant; if it is selected less frequently, it is less likely to be relevant. It therefore seems natural to assign a score proportional to selection frequency. DirectHit and KSS use this kind of selection-frequency method to judge the relevance of a web page. However, it is not very accurate.
For instance, if a user clicks on a web page and immediately returns to the result list because the page is not relevant, and if this pattern is observed across many different clicks, then it is not correct to judge the relevance of this page by its selection frequency. The cause might be an inadvertent human mistake, a misleading page title, or a returned summary that does not represent the real content. Selection frequency is therefore not a good measurement, and the time spent on a page may be better. If users spend some time reading through a page, the page is more likely to be relevant than if users merely click on it. The usage-based score for page relevance is thus better measured by the time spent on the page; [3] and [6] confirm this observation.

Naturally, the longer the page, the longer users can spend on it. Sometimes users spend less time on a page simply because it is short, although its content may be very relevant. To compensate for this effect, the time duration should be normalized by the length of the web page.

Most web pages contain hyperlinks to other pages. When two pages are connected because of their content, the relevance of one page to a query can imply the relevance of the other. Hence, a higher percentage of links accessed from a page could be a strong indication of the relevance of the page. This is particularly important for index pages, which contain many related links and on which users spend less time than on content pages. In this way, in addition to text information, the hyper information can also contribute to the relevance of a page. The total time spent on linked pages is, likewise, a more appropriate measurement than the access percentage. Thus, when users follow hyperlinks in a page, the time spent on the linked pages can be propagated to this page to increase its relevance.

From the above analysis, time duration and access via hyperlinks are the two major factors for measuring relevance. They are used to calculate a basic usage-based rank score. The hyperlink effect can be propagated recursively, with a fading factor, along the link hierarchy, in which the first-level nodes are search results and higher-level nodes are expanded from the first level by hyperlinks.

Apart from duration, the usage-based rank is also related to the latest time at which the page was accessed from the search results. Of two pages with the same duration value, the one with the later access time should have a higher rank score, because it is more likely to reflect current user interest in that query. The ranking formula is as follows:

$$URank^{0}_{Q,D} = \sum_{i=1}^{n_{Q,D}} \left( \frac{1}{lt_Q - lt_i} \cdot Dur(D, i) \right)$$

$$Dur(D, i) = dur(D, i) + F_u \cdot \sum_{LD \,\in\, \text{linked pages from } D} dur(LD, i)$$

where $lt_Q$ is the latest access time of query Q and $lt_i$ is the latest access time of document D in the i-th access for Q; $n_{Q,D}$ is the number of accesses for D from Q; $dur(D, i)$ is the time spent on D in the i-th access, normalized by the length of D; and $F_u$ is the fading factor for durations propagated from linked pages.

After the basic score for the web page has been calculated, an adjustment value is applied to it if certain conditions hold.
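As a rough illustration of the basic score, the sketch below sums length-normalized durations over the logged accesses of one page for one query, weighting each access by the reciprocal of its distance from the latest query time. It is only a sketch under stated assumptions: the `Visit` record, the default fading factor of 0.5, the `max_depth` cut-off, and the clamping of the time gap are all hypothetical and are not specified in the paper. With `max_depth=1` only directly linked pages contribute; larger values follow the recursive propagation described in the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Visit:
    """One logged page view (hypothetical record, not from the paper)."""
    seconds: float       # raw time spent on the page
    page_length: float   # page length used for normalization (unit is an assumption)
    linked: List["Visit"] = field(default_factory=list)  # pages reached via hyperlinks


def propagated_duration(visit: Visit, fading: float = 0.5, max_depth: int = 3) -> float:
    """Dur(D, i): length-normalized time on D plus faded time from linked pages."""
    own = visit.seconds / visit.page_length
    if max_depth <= 0:
        return own
    linked_total = sum(
        propagated_duration(child, fading, max_depth - 1) for child in visit.linked
    )
    return own + fading * linked_total


def base_urank(accesses: List[Tuple[float, Visit]], lt_query: float) -> float:
    """Basic score: each access's Dur(D, i) weighted by 1 / (lt_Q - lt_i)."""
    score = 0.0
    for lt_i, visit in accesses:
        # Assumption: clamp the gap so an access at time lt_Q does not divide by zero.
        gap = max(lt_query - lt_i, 1.0)
        score += propagated_duration(visit) / gap
    return score


# Two accesses of the same page, given as (access time, visit) pairs.
accesses = [
    (90.0, Visit(60.0, 2.0, linked=[Visit(20.0, 1.0)])),  # Dur = 30 + 0.5 * 20
    (98.0, Visit(10.0, 1.0)),                              # Dur = 10
]
print(base_urank(accesses, lt_query=100.0))  # 40 / 10 + 10 / 2
```

The more recent access contributes more to the total even though its duration is shorter, which is the recency effect the formula is meant to capture.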
The main purpose of the score adjustment is to decrease the score of high-positioned pages that previous users judged not very relevant, and to increase the score of low-positioned pages that previous judgments show to be quite relevant. We derived four heuristics.

Heuristic 1. If a web page has a high rank and its selection frequency is less than the average selection frequency for the query, then it receives a negative adjustment value, computed as follows:

$$\Delta URank_{Q,D} = \left( \frac{clickrate(Q, D)}{avg(clickrate(Q))} - 1 \right) \cdot \left( HR\_THRES - rank'(Q, D) \right)$$

$$clickrate(Q, D) = \frac{freq(Q, D)}{freq(Q)}$$
where $freq(Q, D)$ is the selection frequency of D for Q, which has the same value as $n_{Q,D}$; $freq(Q)$ is the frequency of Q; $rank'(Q, D)$ is the average rank position of D in previous searches for Q; and the average value is taken over all result documents for Q. When the rank of a document is less than HR_THRES, it is considered to have a high rank.

Heuristic 2. If a web page has a high rank and its average duration is less than the lower bound for the duration value, LB_DUR, then it receives a negative adjustment value:

$$\Delta URank_{Q,D} = \left( \frac{\frac{1}{n_{Q,D}} \sum_{i=1}^{n_{Q,D}} Dur(D, i)}{LB\_DUR} - 1 \right) \cdot \left( HR\_THRES - rank'(Q, D) \right)$$

Heuristic 3. If a web page has a high rank but has never been accessed, then it receives a negative adjustment value:

$$\Delta URank_{Q,D} = \left( \frac{hrfreq(Q, D)}{freq(Q)} - HRFREQ\_THRES \right) \cdot \left( HR\_THRES - rank'(Q, D) \right)$$

where $hrfreq(Q, D)$ measures how many times a document D occurs in a high position of the ranked list for Q and is accessed, and HRFREQ_THRES is a threshold value for $hrfreq(Q, D)$.

Heuristic 4. If a document has a low rank and its selection frequency is larger than a threshold value LB_CLICKRATE, it receives a positive adjustment value:

$$\Delta URank_{Q,D} = \left( 1 - \frac{LB\_CLICKRATE}{clickrate(Q, D)} \right) \cdot \left( rank'(Q, D) - LR\_THRES \right)$$

When the rank of a document is larger than LR_THRES, it is considered to have a low rank.

After the basic score and the adjustment value of the web page are computed, the reliability of the combined value should be measured from statistical data, and the final score should be further adjusted by this reliability factor. The reliability of the rank score can be determined by the query frequency, the page selection frequency for a given query, and other factors. The final usage-based rank score is therefore the basic rank score adjusted by a value (either negative or positive) and then multiplied by a reliability factor:

$$URank_{Q,D} = rf(Q) \cdot \left( URank^{0}_{Q,D} + \Delta URank_{Q,D} \right)$$

$$rf(Q) = lt_Q \cdot freq(Q) \cdot \left( lt_Q - ft_Q \right)$$
where $rf(Q)$ is the reliability factor for the rank value and $ft_Q$ is the time of the first submission of Q. The reliability factor is determined by the usage data collected for the query. The more recent the latest submission time of the query, the more reliable the usage-based rank for this query; the more frequently the query is submitted, the more reliable the rank; and the longer the query has existed in the query database, the more reliable the rank.

In the above calculation, all thresholds were selected through iterative tests on real log data. All duration and rank position values are normalized before the calculation.

4 Experiment

Since our algorithm targets general queries, we chose 21 queries on general topics for the experiment: intellectual property law Singapore, mobile agent, Ian Thorpe, information retrieval, travel Europe, World Wide Web conference, classical guitar, machine learning researcher, free web server software, amusement park Japan, MP3 Discman, client server computing, concord aircraft, Internet research Singapore, computer science department National University Singapore, information agent, ATM technology, movie awards, quest correct information web hyper search engine, Scatter Gather Douglass Cutting, and WAP specification.

After the query set was specified, each query was submitted to an existing search engine (Google) to collect the top 200 results as the test database. This number was considered large enough because users usually review only the top 20 to 50 result documents.

Usage-based ranking alone may not work when no usage data is available for a query, so it should complement an existing algorithm. We chose a ranking algorithm based on both text and link information as the basis. To obtain its rank scores, we downloaded the full documents and performed the whole indexing and ranking procedure on them.
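The usage-based score that enters the experiment is the basic score plus an adjustment, scaled by the reliability factor of Section 3. A minimal sketch of that step is given here for reference. The threshold values are placeholders rather than the values tuned on the log data, the function names are hypothetical, and the formulas follow one reading of the Section 3 definitions. Heuristic 3 is left out, and how several applicable heuristics are combined is not stated in the paper, so each function simply returns its own adjustment (or 0.0 when its condition does not hold).

```python
from typing import Sequence

# Placeholder thresholds (assumptions, not the paper's tuned values).
HR_THRES = 10.0        # average ranks below this count as "high"
LR_THRES = 30.0        # average ranks above this count as "low"
LB_DUR = 5.0           # lower bound on average propagated duration
LB_CLICKRATE = 0.2     # lower bound on click rate for low-ranked pages


def h1_low_clickrate(rank_avg: float, clickrate: float, avg_clickrate: float) -> float:
    """Heuristic 1: high-ranked page selected less often than average."""
    if rank_avg < HR_THRES and clickrate < avg_clickrate:
        return (clickrate / avg_clickrate - 1.0) * (HR_THRES - rank_avg)
    return 0.0


def h2_short_duration(rank_avg: float, durations: Sequence[float]) -> float:
    """Heuristic 2: high-ranked page whose average duration is below LB_DUR."""
    if not durations:
        return 0.0
    avg_dur = sum(durations) / len(durations)
    if rank_avg < HR_THRES and avg_dur < LB_DUR:
        return (avg_dur / LB_DUR - 1.0) * (HR_THRES - rank_avg)
    return 0.0


def h4_popular_low_rank(rank_avg: float, clickrate: float) -> float:
    """Heuristic 4: low-ranked page that users select often anyway."""
    if rank_avg > LR_THRES and clickrate > LB_CLICKRATE:
        return (1.0 - LB_CLICKRATE / clickrate) * (rank_avg - LR_THRES)
    return 0.0


def reliability(lt_q: float, freq_q: float, ft_q: float) -> float:
    """rf(Q): grows with recency, query frequency, and the age of the query."""
    return lt_q * freq_q * (lt_q - ft_q)


def final_urank(base: float, adjustment: float,
                lt_q: float, freq_q: float, ft_q: float) -> float:
    """Final score: reliability factor times (basic score + adjustment)."""
    return reliability(lt_q, freq_q, ft_q) * (base + adjustment)


# A page at average rank 2 whose click rate is half the query average.
adj = h1_low_clickrate(rank_avg=2.0, clickrate=0.1, avg_clickrate=0.2)
print(adj)  # negative: the page is pushed down
print(final_urank(base=9.0, adjustment=adj, lt_q=1.0, freq_q=0.5, ft_q=0.2))
```

Keeping each heuristic in its own function makes it easy to test the sign of every adjustment separately before choosing a combination rule.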
The final rank was a linear combination of the basic rank score and the usage-based rank score.

We defined two sessions in the experiment. In session 1, evaluators judged the relevance of the results returned by the basic ranking algorithm. The whole evaluation procedure was logged by the proxy server, and the usage-based ranking was calculated from the evaluation results. In session 2, the new rankings (the combination of basic ranking and usage-based ranking) were presented to different evaluators to see whether an improvement was made.

To evaluate performance, we used the top-n precision value to measure relevance. The precision value for the top n results is defined as the ratio of the number of relevant results within the top n results to n.
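The two quantities used in the evaluation can be sketched in a few lines. The weight `alpha` of the linear combination is not given in the paper, so the default of 0.5 below is purely an assumption, as are the function names and the toy document identifiers.

```python
from typing import Sequence, Set


def combined_rank(basic_score: float, usage_score: float, alpha: float = 0.5) -> float:
    """Linear combination of the basic and usage-based scores (alpha is hypothetical)."""
    return alpha * basic_score + (1.0 - alpha) * usage_score


def precision_at_n(ranked_docs: Sequence[str], relevant: Set[str], n: int) -> float:
    """Top-n precision: relevant results within the top n, divided by n."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for doc in ranked_docs[:n] if doc in relevant)
    return hits / n


ranking = ["a", "b", "c", "d"]       # toy result list, best first
judged_relevant = {"a", "c"}         # toy relevance judgments
print(precision_at_n(ranking, judged_relevant, 2))  # 1 relevant result in the top 2
print(precision_at_n(ranking, judged_relevant, 4))  # 2 relevant results in the top 4
```

Note that the denominator is always n, as in the definition above, so a result list shorter than n is penalized rather than renormalized.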
Figure 1 compares the top-30 precision values for Google and for our ranking algorithm (basic plus usage-based). The results from session 1 differ from those from session 2 because relevance was judged by different groups of people. The figure shows the comparison for all 21 queries as well as for the average query, the average general query, and the average specific query.

[Figure 1: chart titled "Precision Comparisons". Vertical axis: Top 30 Precision (0% to 90%). Horizontal axis: Queries (ipls, mba, it, ir, te, csnus, wwwc, cg, mlr, fwss, apj, mp3d, csc, ca, quest, irs, waps, ia, atmt, mva, scatter, avg, avg-g, avg-s). Series: Google, session1, session2.]

Fig. 1. Comparison Graph of Top 30 Precision Values for 21 Queries

From the figure, we can see that for most queries, the precision values from session 2 were better than those from session 1, and both were better than those from Google. Given that the precision values judged by the new users were comparable to those judged by the earlier users, this indicates that incorporating usage data collected from previous searches can benefit later searches conducted by different users. The experimental results thus support the effectiveness of our proposed ranking algorithm. The improvement over the Google results implies that usage-based ranking can further enhance the text-and-link-based Google ranking algorithm and produce a better ranking list. The overall conclusion from these observations is that usage-based ranking can improve retrieval effectiveness when it is combined with other ranking algorithms. In this sense, our proposed ranking algorithm achieved the expected performance.
5 Conclusion

This study shows that usage data from past searching experiences can benefit later searches if it is utilized in the ranking module. In our proposed usage-based ranking algorithm, the basic rank score is calculated from the time users spend reading a page and browsing the pages connected to it; high-ranked pages may receive a negative adjustment value if their positions do not match their actual usage; and low-ranked pages may receive a positive adjustment value if users tend to dig them out from low positions.

References

1. Balabanovic, Y. Shoham, "Fab: Content-based Collaborative Recommendation," Communications of the ACM, 40(3), pp ,
2. J. M. Kleinberg, "Authoritative Sources in a Hyperlinked Environment," IBM Research Report RJ 10076,
3. J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, J. Riedl, "GroupLens: Applying Collaborative Filtering to Usenet News," Communications of the ACM, 40(3), pp ,
4. M. Marchiori, "The Quest for Correct Information on the Web: Hyper Search Engines," Proceedings of the 6th World Wide Web Conference (WWW6),
5. B. Mobasher, R. Cooley, J. Srivastava, "Automatic Personalization Based on Web Usage Mining," Technical Report TR99-010, Department of Computer Science, DePaul University,
6. M. Morita, Y. Shinoda, "Information Filtering Based on User Behavior Analysis and Best Match Text Retrieval," Proceedings of 17th ACM SIGIR,
7. L. Page, S. Brin, R. Motwani, T. Winograd, "The PageRank Citation Ranking: Bringing Order to the Web," Stanford University working paper SIDL-WP ,
8. M. Perkowitz, O. Etzioni, "Towards Adaptive Web Sites: Conceptual Framework and Case Study," Proceedings of the 8th World Wide Web Conference (WWW8),
9. P. Resnick, H. Varian, "Recommender Systems," Communications of the ACM, 40(3),
10. G. Rodriguez-Mula, H. Garcia-Molina, A. Paepcke, "Collaborative Value Filtering on the Web," Proceedings of the 7th World Wide Web Conference (WWW7),
11. G. Salton, M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, Inc.,
13. T. W. Yan, M. Jacobsen, H. Garcia-Molina, U. Dayal, "From User Access Patterns to Dynamic Hypertext Linking," Proceedings of the 5th World Wide Web Conference (WWW5), 1996.
Recommendation Models for User Accesses to Web Pages (Invited Paper) Ṣule Gündüz 1 and M. Tamer Özsu2 1 Department of Computer Science, Istanbul Technical University Istanbul, Turkey, 34390 gunduz@cs.itu.edu.tr
More informationRecent Researches on Web Page Ranking
Recent Researches on Web Page Pradipta Biswas School of Information Technology Indian Institute of Technology Kharagpur, India Importance of Web Page Internet Surfers generally do not bother to go through
More information5 Choosing keywords Initially choosing keywords Frequent and rare keywords Evaluating the competition rates of search
Seo tutorial Seo tutorial Introduction to seo... 4 1. General seo information... 5 1.1 History of search engines... 5 1.2 Common search engine principles... 6 2. Internal ranking factors... 8 2.1 Web page
More informationTREC 2017 Dynamic Domain Track Overview
TREC 2017 Dynamic Domain Track Overview Grace Hui Yang Zhiwen Tang Ian Soboroff Georgetown University Georgetown University NIST huiyang@cs.georgetown.edu zt79@georgetown.edu ian.soboroff@nist.gov 1. Introduction
More informationAssociation-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications
Association-Rules-Based Recommender System for Personalization in Adaptive Web-Based Applications Daniel Mican, Nicolae Tomai Babes-Bolyai University, Dept. of Business Information Systems, Str. Theodor
More informationLink Recommendation Method Based on Web Content and Usage Mining
Link Recommendation Method Based on Web Content and Usage Mining Przemys law Kazienko and Maciej Kiewra Wroc law University of Technology, Wyb. Wyspiańskiego 27, Wroc law, Poland, kazienko@pwr.wroc.pl,
More informationDATA MINING II - 1DL460. Spring 2014"
DATA MINING II - 1DL460 Spring 2014" A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt14 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
More informationAssisting Trustworthiness Based Web Services Selection Using the Fidelity of Websites *
Assisting Trustworthiness Based Web Services Selection Using the Fidelity of Websites * Lijie Wang, Fei Liu, Ge Li **, Liang Gu, Liangjie Zhang, and Bing Xie Software Institute, School of Electronic Engineering
More informationBetter Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web
Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl
More informationAn Improved Computation of the PageRank Algorithm 1
An Improved Computation of the PageRank Algorithm Sung Jin Kim, Sang Ho Lee School of Computing, Soongsil University, Korea ace@nowuri.net, shlee@computing.ssu.ac.kr http://orion.soongsil.ac.kr/ Abstract.
More informationOptimizing Search Engines using Click-through Data
Optimizing Search Engines using Click-through Data By Sameep - 100050003 Rahee - 100050028 Anil - 100050082 1 Overview Web Search Engines : Creating a good information retrieval system Previous Approaches
More informationHYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL
International Journal of Mechanical Engineering & Computer Sciences, Vol.1, Issue 1, Jan-Jun, 2017, pp 12-17 HYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL BOMA P.
More informationApplying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task
Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Walid Magdy, Gareth J.F. Jones Centre for Next Generation Localisation School of Computing Dublin City University,
More informationA Modified Algorithm to Handle Dangling Pages using Hypothetical Node
A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal
More informationUsing Bloom Filters to Speed Up HITS-like Ranking Algorithms
Using Bloom Filters to Speed Up HITS-like Ranking Algorithms Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy Microsoft Research, Mountain View CA 94043, USA Abstract. This paper describes a technique
More informationReading Time: A Method for Improving the Ranking Scores of Web Pages
Reading Time: A Method for Improving the Ranking Scores of Web Pages Shweta Agarwal Asst. Prof., CS&IT Deptt. MIT, Moradabad, U.P. India Bharat Bhushan Agarwal Asst. Prof., CS&IT Deptt. IFTM, Moradabad,
More informationInformation Gathering Support Interface by the Overview Presentation of Web Search Results
Information Gathering Support Interface by the Overview Presentation of Web Search Results Takumi Kobayashi Kazuo Misue Buntarou Shizuki Jiro Tanaka Graduate School of Systems and Information Engineering
More informationTHE STUDY OF WEB MINING - A SURVEY
THE STUDY OF WEB MINING - A SURVEY Ashish Gupta, Anil Khandekar Abstract over the year s web mining is the very fast growing research field. Web mining contains two research areas: Data mining and World
More informationLET:Towards More Precise Clustering of Search Results
LET:Towards More Precise Clustering of Search Results Yi Zhang, Lidong Bing,Yexin Wang, Yan Zhang State Key Laboratory on Machine Perception Peking University,100871 Beijing, China {zhangyi, bingld,wangyx,zhy}@cis.pku.edu.cn
More informationSemantic Clickstream Mining
Semantic Clickstream Mining Mehrdad Jalali 1, and Norwati Mustapha 2 1 Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran 2 Department of Computer Science, Universiti
More informationWeighted Page Content Rank for Ordering Web Search Result
Weighted Page Content Rank for Ordering Web Search Result Abstract: POOJA SHARMA B.S. Anangpuria Institute of Technology and Management Faridabad, Haryana, India DEEPAK TYAGI St. Anne Mary Education Society,
More informationRoadmap. Roadmap. Ranking Web Pages. PageRank. Roadmap. Random Walks in Ranking Query Results in Semistructured Databases
Roadmap Random Walks in Ranking Query in Vagelis Hristidis Roadmap Ranking Web Pages Rank according to Relevance of page to query Quality of page Roadmap PageRank Stanford project Lawrence Page, Sergey
More informationJoining Collaborative and Content-based Filtering
Joining Collaborative and Content-based Filtering 1 Patrick Baudisch Integrated Publication and Information Systems Institute IPSI German National Research Center for Information Technology GMD 64293 Darmstadt,
More informationINTRODUCTION. Chapter GENERAL
Chapter 1 INTRODUCTION 1.1 GENERAL The World Wide Web (WWW) [1] is a system of interlinked hypertext documents accessed via the Internet. It is an interactive world of shared information through which
More informationCompact Encoding of the Web Graph Exploiting Various Power Laws
Compact Encoding of the Web Graph Exploiting Various Power Laws Statistical Reason Behind Link Database Yasuhito Asano, Tsuyoshi Ito 2, Hiroshi Imai 2, Masashi Toyoda 3, and Masaru Kitsuregawa 3 Department
More informationAn Adaptive Approach in Web Search Algorithm
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 15 (2014), pp. 1575-1581 International Research Publications House http://www. irphouse.com An Adaptive Approach
More informationA SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 A SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS Satwinder Kaur 1 & Alisha Gupta 2 1 Research Scholar (M.tech
More informationSocial Navigation Support in E-Learning: What are the Real Footprints?
Social Navigation Support in E-Learning: What are the Real Footprints? Rosta Farzan 2 and Peter Brusilovsky 1,2 1 School of Information Sciences and 2 Intelligent Systems Program Pittsburgh PA 15260, USA
More informationWeb Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy
Text Technologies for Data Science INFR11145 Web Search Instructor: Walid Magdy 14-Nov-2017 Lecture Objectives Learn about: Working with Massive data Link analysis (PageRank) Anchor text 2 1 The Web Document
More informationAnatomy of a search engine. Design criteria of a search engine Architecture Data structures
Anatomy of a search engine Design criteria of a search engine Architecture Data structures Step-1: Crawling the web Google has a fast distributed crawling system Each crawler keeps roughly 300 connection
More informationTitle: Artificial Intelligence: an illustration of one approach.
Name : Salleh Ahshim Student ID: Title: Artificial Intelligence: an illustration of one approach. Introduction This essay will examine how different Web Crawling algorithms and heuristics that are being
More informationPart 11: Collaborative Filtering. Francesco Ricci
Part : Collaborative Filtering Francesco Ricci Content An example of a Collaborative Filtering system: MovieLens The collaborative filtering method n Similarity of users n Methods for building the rating
More information5/13/2009. Introduction. Introduction. Introduction. Introduction. Introduction
Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007) Two types of technologies are widely used to overcome
More informationNavigation Retrieval with Site Anchor Text
Navigation Retrieval with Site Anchor Text Hideki Kawai Kenji Tateishi Toshikazu Fukushima NEC Internet Systems Research Labs. 8916-47, Takayama-cho, Ikoma-city, Nara, JAPAN {h-kawai@ab, k-tateishi@bq,
More informationContext based Re-ranking of Web Documents (CReWD)
Context based Re-ranking of Web Documents (CReWD) Arijit Banerjee, Jagadish Venkatraman Graduate Students, Department of Computer Science, Stanford University arijitb@stanford.edu, jagadish@stanford.edu}
More informationA novel approach of web search based on community wisdom
University of Wollongong Research Online Faculty of Engineering - Papers (Archive) Faculty of Engineering and Information Sciences 2008 A novel approach of web search based on community wisdom Weiliang
More informationProject Report. An Introduction to Collaborative Filtering
Project Report An Introduction to Collaborative Filtering Siobhán Grayson 12254530 COMP30030 School of Computer Science and Informatics College of Engineering, Mathematical & Physical Sciences University
More informationA Constrained Spreading Activation Approach to Collaborative Filtering
A Constrained Spreading Activation Approach to Collaborative Filtering Josephine Griffith 1, Colm O Riordan 1, and Humphrey Sorensen 2 1 Dept. of Information Technology, National University of Ireland,
More informationRetrieval Evaluation. Hongning Wang
Retrieval Evaluation Hongning Wang CS@UVa What we have learned so far Indexed corpus Crawler Ranking procedure Research attention Doc Analyzer Doc Rep (Index) Query Rep Feedback (Query) Evaluation User
More informationSurvey Paper on Web Usage Mining for Web Personalization
ISSN 2278 0211 (Online) Survey Paper on Web Usage Mining for Web Personalization Namdev Anwat Department of Computer Engineering Matoshri College of Engineering & Research Center, Eklahare, Nashik University
More informationE-Business s Page Ranking with Ant Colony Algorithm
E-Business s Page Ranking with Ant Colony Algorithm Asst. Prof. Chonawat Srisa-an, Ph.D. Faculty of Information Technology, Rangsit University 52/347 Phaholyothin Rd. Lakok Pathumthani, 12000 chonawat@rangsit.rsu.ac.th,
More informationPopularity Weighted Ranking for Academic Digital Libraries
Popularity Weighted Ranking for Academic Digital Libraries Yang Sun and C. Lee Giles Information Sciences and Technology The Pennsylvania State University University Park, PA, 16801, USA Abstract. We propose
More informationEvaluating Implicit Measures to Improve Web Search
Evaluating Implicit Measures to Improve Web Search Abstract Steve Fox, Kuldeep Karnawat, Mark Mydland, Susan Dumais, and Thomas White {stevef, kuldeepk, markmyd, sdumais, tomwh}@microsoft.com Of growing
More informationCS224W Final Report Emergence of Global Status Hierarchy in Social Networks
CS224W Final Report Emergence of Global Status Hierarchy in Social Networks Group 0: Yue Chen, Jia Ji, Yizheng Liao December 0, 202 Introduction Social network analysis provides insights into a wide range
More informationInternational Journal of Advancements in Research & Technology, Volume 2, Issue 6, June ISSN
International Journal of Advancements in Research & Technology, Volume 2, Issue 6, June-2013 159 Re-ranking the Results Based on user profile. Email: anuradhakale20@yahoo.com Anuradha R. Kale, Prof. V.T.
More informationA New Measure of the Cluster Hypothesis
A New Measure of the Cluster Hypothesis Mark D. Smucker 1 and James Allan 2 1 Department of Management Sciences University of Waterloo 2 Center for Intelligent Information Retrieval Department of Computer
More informationAn Adaptive Agent for Web Exploration Based on Concept Hierarchies
An Adaptive Agent for Web Exploration Based on Concept Hierarchies Scott Parent, Bamshad Mobasher, Steve Lytinen School of Computer Science, Telecommunication and Information Systems DePaul University
More informationUNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.
UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index
More informationOn Finding Power Method in Spreading Activation Search
On Finding Power Method in Spreading Activation Search Ján Suchal Slovak University of Technology Faculty of Informatics and Information Technologies Institute of Informatics and Software Engineering Ilkovičova
More informationijade Reporter An Intelligent Multi-agent Based Context Aware News Reporting System
ijade Reporter An Intelligent Multi-agent Based Context Aware Reporting System Eddie C.L. Chan and Raymond S.T. Lee The Department of Computing, The Hong Kong Polytechnic University, Hung Hong, Kowloon,
More informationNBA 600: Day 15 Online Search 116 March Daniel Huttenlocher
NBA 600: Day 15 Online Search 116 March 2004 Daniel Huttenlocher Today s Class Finish up network effects topic from last week Searching, browsing, navigating Reading Beyond Google No longer available on
More information