Recent Researches in Ranking Bias of Modern Search Engines
|
|
- Darlene Taylor
- 5 years ago
- Views:
Transcription
1 DEWS2007 C7-7 Web,, {hirate,yoshida,yamana}@yama.info.waseda.ac.jp Web Web Web Recent Researches in Ranking Bias of Modern Search Engines Yu HIRATE,, Yasuaki YOSHIDA, and Hayato YAMANA, Media Network Center, Waseda University Totsuka-cho, Shinjuku-ku, Tokyo, , Japan Science and Engineering, Waseda University Okubo, Shinjuku-ku, Tokyo, , Japan National Institute of Informatics Hitotsubashi, Chiyoda-ku, Tokyo Japan {hirate,yoshida,yamana}@yama.info.waseda.ac.jp Abstract Recently, modern web search engines are recognized as fundamental tools for information retrieval from the web. It means the information that we are able to obtain may be depend on rankings of search engines. However, if search engines return unfair rankings, then following the suggestion comes up. Is the information that we obtain from the web biased according to the rankings? From this point of view, many researchers and politicians investigated about search engine rankings. We define such unfairness as ranking bias, and surveyed several current researches about ranking bias. In this paper, we report these researches by dividing two categories, comparing index coverages, analysis of ranking algorithms. Also, we report researches about comparing set of results or ranking of search engine. Key words Search Engine, ranking, bias, index, ranking algorithm [1] Web [2] Web Web [3] [4] Web
2 [5] CNET Olsen 2002 Google [6] 2005 Dogpile.com [7] Google [8], Yahoo! [9], MSN [10], Ask Jeeves [11] % [12] 2 CS CS 2 rich-get-richer Googlearchy , rich-get-richer [13] rich-get-richer [13] Top10 Top10 Top10 Top10 Top10 Top10 [14] 2. 2 rich-get-richer UCLA Cho [13] Web rich-get-richer Web Open Directory [15] PageRank [16] PageRank 4 2 PageRank
3 3. 1 ( [13] ) 2 PageRank ( [13] ) PageRank PageRank PageRank rich-get-richer 1 PageRank PageRank 2 rich-get-richer 2. 3 Googlearchy [17] [18] rich-and-richer Web [17] [19] [20] Google Googlearchy [17] [18] [13] Googlocracy [18] PageRank [16] PageRank PageRank 2.1 rich-and-richer Web PageRank PageRank 3. 3 Web Web [21] [22] [23] [24]
4 3.2 [13] [14] [31] [32] [35] [36] [37] Bharat [21] 99 Henzinger [22] 2004 Vaughan [23] 2006 Yossef [24] DEC Bharat 4 97 [21] AltaVista [25], Excite [26], HotBot [27], Infoseek 5 [28] [21] Compaq Henzinger 6 98 [22] Web AltaVista [25], HotBot [27], Excite [26], Infoseek [28], Google [8], Lycos [29] Western Ontario Vaughan [23] Google [8] AltaVista [25] AlltheWeb [30] Web Google Google 5 Go.com Yahoo! 6 Google 7 Google site: AltaVista host: AlltheWeb 3 4 ( [23] ) 1 Google, MSN, Yahoo! [24] Page from Google MSN Yahoo! Indexed by Google 46% 45% MSN 55% 51% Yahoo! 44% 22% 4 Google, MSN,Yahoo! [24] AltaVista Technion Bar-Yossef 2006 [24] Google [8] Yahoo! [9] MSN [10] [24] Query-Pool Query-Pool URL Yahoo! [24] PageRank [13] [14] [31] [32] V R P
5 [13] [31] [14] [32] 2.1 rich-get-richer rich-get-richer [13] [31] UCLA Cho [13] 60 [31] t p V (p, t) t p P (p, t) [13] (1) t V (p, t) P (p, t) (1) P (p, t) t P (p, t) 3 (0 < = t < 13) (13 < = t < 25) 8 [14] Q(p)(0 < = Q(p) < = 1) 1 5 ( [14] ) 6 ( [14] ) (t > = 25) Q(p) V (p, t) P (p, t) 2.25 (2) [4] V (p, t) R(p, t) V (p, t) R(p, t) 1.5 (3) Stanford Web- Base [33] [34] R(p, t) P (p, t) R(p, t) P (p, t) 1.5 (4) P (p, t) [13] [31] Indiana Fortunato 2006 [31] 2 2 [13] V (p, t) P (p, t) 2:25
6 7. 1 [35] [36] Mowshowitz 9 [35] [36] [35] [36] [35] [36] 7 ( [31] ) [31] V (p, t) P (p, t) 1.8 V (p, t) P (p, t) 7 7 [31] [13] [31] V (p, t) P (p, t) Googlearchy Googlocracy 6. 2 [14] UCLA Cho [14] PageRank [16] 3.2 PageRank [14] Q(p) p Q(p) P (p, t) 6. 3 [32] CMU Pandey [32] Randomized Rank Promotion Randomized Rank Promotion 5 P (p, t) [32] 7. [35] [36] [37] [35] [36] [37] E B(E) E C Sim(E) B(E) = 1 Sim(E) (5) E 1, E 2, E 3 q 1 E 1 URL (u 1, u 2, u 3) E 2 URL (u 4, u 3, u 5) E 3 URL (u 3, u 1, u 4) S S 1 = (1, 1, 1, 0, 0), S 2 = (0, 0, 1, 1, 1), S 3 = (1, 0, 1, 1, 0) S S = (2, 1, 3, 2, 1) (6) Sim(E i) S S i E 1, E 2, E 3 B(E 1), B(E 2), B(E 3) B(E 1) = 1 = {3 ( )} 0.5 (7) B(E 2) = 1 = {3 ( )} 0.5 (8) B(E 3) = 1 = {3 ( )} 0.5 (9) [36] About, Ah-ha 10, AltaVista [25], AOLSearch, FastSearch, Google [8], LookSmart, Lycos [29], MSN [10], Netscape, Overture, Sprinks, Teoma, TureSearch, WiseNut, Yahoo! [9] 16 [38] [39] 5 25 [38] [35] [36] 10 Enhance Interactive
7 8 [38] [36] 10 DNA evidence Google 10 [37] 11 Bondi beach Yahoo! 10 [37] 9 [39] [36] 8 [39] AOL Search, Google, Netscape, Yahoo!Ah-ha MSN, Sprinks, Teoma, TrueSearch [36] Bar-llan Bar-llan [37] [37] Google, Yahoo!, Teoma Google Image, Yahoo! Image, Picsearch [40] 2 13 Spearman s footrule [41] [42] Fagin s measure [43] M measure [37] 4 11 Teoma Ask.com [11] 12 US elections 2004, organic food, DNA evidence 13 Twin towers, Bondi beach Google DNA evidence 11 Yahoo! Bondi beach 10 Google DNA evidence [37] 11 Yahoo! Bondi beach 10 Google Google Yahoo! [37] Picsearch [40]
8 API [1], 2006, R D, [2] Y. Hirate, S. Kato, and H. Yamana, Web Structure in Proc. of WAW2006, [3] T. Josvhims, Optimizing Search Engine using Clickthrough Data, Proc. of ACM SIGKDD2002, pp , [4] R. Lempel and S. Moran, Predictive Caching and Prefetching of Query Results in Search Engines, Proc. of WWW2003, pp , [5],, [6] S. Olsen, Does search engine s power threaten Web s independence?, [7] Dogpile Web Search Home Page, [8] Google, [9] Yahoo!, [10] MSN, [11] Ask.com Search Engine, [12] Dogpile.com, Different Engines, Different Results, [13] J. Cho and S. Roy, Impact of Search Engines on Page Popularity, Proc. of WWW2004, pp , [14] J. Cho, S. Roy, and R. E. Adams, Page Quality: In Search of an Unbiased Web Ranking, Proc. of ACM SIG- MOD2005, pp , [15] Open Directory, [16] L. Page, S. Brin, R. Motwani, and T. Winograd, The Pagerank Citation Ranking: Bringing Order to the Web, Technical Report, Stanford University, [17] M. Hindman, K. Tsioutsiouliklis, and J. A. Johnson, Googlearchy: How a Few Heavily-Linked Sites Dominate Politics on the Web, Annual Meeting of the Midwest Political Science Association, [18] F. Menczer, S. Fortunato, A. Flammini, and A. Vespignani, Googlearchy or Googlocracy?, IEEE Spectrum, [19] R. Albert, A.-L. Barabasi, and H. Jeong, Diameter of the World Wide Web, Nature, Vol. 401(6749), pp , [20] A. Broder, R.Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, Graph Structure in the Web, The International Journal of Computer and Telecommunication Networking, Vol. 33, pp , [21] K. Bharat and A. Broder, A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines, Computer Networks and ISDN Systems, Vol. 30, Issue 1 7, pp , [22] M. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork, Measuring Index Quality using Random Walks on the Web, Proc. of WWW1999, pp , [23] L. Vaughan and M. Thelwall, Search Engine Coverage Bias: Evidence and Possible Causes, Information Processing and Management, Vol. 40, No. 4, pp , [24] Z. Bar-Yossef and M. Gurevich, Random Sampling from a Search Engine s Index, Proc. of WWW2006, pp , [25] AltaVista, [26] Excite, [27] HotBot, [28] Go.com, [29] Lycos, [30] AlltheWeb, [31] S. Fortunato, A. Flammini, F. Menczer, and A. Vespignani, The Egalitarian Effect of Search Engine, eprint arxive:cs.cy/ , [32] S. Pandey, S. Roy, C. Olston, J. Cho, and S. Chakrabarti, Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results, Proc. of VLDB2005, pp , [33] H. Hirai, S. Raghavan, H. G-Molina, and A. Paepcke, Webbase: A repository of the Web, Proc. of WWW2000, pp , [34] The Stanford WebBase Project, testbed/doc2/webbase/ [35] A. Mowshowitz and A. Kawaguchi, Bias on the Web, Communications of the ACM, Vol. 45, No. 9, pp , [36] A. Mowshowiz and A. Kawaguchi, Measuring Search Engine Bias, Information Processing and Management, Vol. 41, pp , [37] J. Bar-llan, M. Mat-Hassan and M. Levene, Methods for Comparing Rankings of Search Engine Results, eprint arxive:cs.ir/050539, [38] West Publishing, West s analysis of American low, Eagan, MN: Thomson-West, [39] Library of Congress Classification System, [40] Picsearch - Image Search for Pictures and Images, [41] P. Diaconis, R. L. Graham, Spearman s Footrule as a Measure of Disarray, Journal of the Royal Statistical Society, Series B(Methodological), Vol. 39, pp , [42] C. Dwork, R. Kumar, M. Naor and D. Sivakumar, Rank Aggregation Methods for the Web, Proc. of WWW2001, pp , [43] R. Fagin, R. Kumar and D. Sivakumar, Comparing Top k Lists, SIAM Journal on Discrete Mathematics, Vol. 17, No. 1, pp , 2003.
CS Search Engine Technology
CS236620 - Search Engine Technology Ronny Lempel Winter 2008/9 The course consists of 14 2-hour meetings, divided into 4 main parts. It aims to cover both engineering and theoretical aspects of search
More informationarxiv:cs/ v2 [cs.cy] 23 Aug 2006
The egalitarian effect of search engines Santo Fortunato 1,2 santo@indiana.edu Alessandro Flammini 1 aflammin@indiana.edu Filippo Menczer 1 fil@indiana.edu Alessandro Vespignani 1 alexv@indiana.edu arxiv:cs/0511005v2
More informationSearch Engines Considered Harmful In Search of an Unbiased Web Ranking
Search Engines Considered Harmful In Search of an Unbiased Web Ranking Junghoo John Cho cho@cs.ucla.edu UCLA Search Engines Considered Harmful Junghoo John Cho 1/38 Motivation If you are not indexed by
More informationReview: Searching the Web [Arasu 2001]
Review: Searching the Web [Arasu 2001] Gareth Cronin University of Auckland gareth@cronin.co.nz The authors of Searching the Web present an overview of the state of current technologies employed in the
More informationLink Analysis in Web Information Retrieval
Link Analysis in Web Information Retrieval Monika Henzinger Google Incorporated Mountain View, California monika@google.com Abstract The analysis of the hyperlink structure of the web has led to significant
More informationLecture 17 November 7
CS 559: Algorithmic Aspects of Computer Networks Fall 2007 Lecture 17 November 7 Lecturer: John Byers BOSTON UNIVERSITY Scribe: Flavio Esposito In this lecture, the last part of the PageRank paper has
More informationImpact of Search Engines on Page Popularity
Impact of Search Engines on Page Popularity Junghoo John Cho (cho@cs.ucla.edu) Sourashis Roy (roys@cs.ucla.edu) University of California, Los Angeles Impact of Search Engines on Page Popularity J. Cho,
More informationSearch Engines Considered Harmful In Search of an Unbiased Web Ranking
Search Engines Considered Harmful In Search of an Unbiased Web Ranking Junghoo John Cho cho@cs.ucla.edu UCLA Search Engines Considered Harmful Junghoo John Cho 1/45 World-Wide Web 10 years ago With Web
More informationarxiv:cs/ v1 [cs.ir] 26 Apr 2002
Navigating the Small World Web by Textual Cues arxiv:cs/0204054v1 [cs.ir] 26 Apr 2002 Filippo Menczer Department of Management Sciences The University of Iowa Iowa City, IA 52242 Phone: (319) 335-0884
More informationThe topology of the Web as a complex, scale-free network is now
Topical interests and the mitigation of search engine bias S. Fortunato*, A. Flammini*, F. Menczer*, and A. Vespignani* *School of Informatics, Indiana University, Bloomington, IN 47406; Fakultät für Physik,
More informationBreadth-First Search Crawling Yields High-Quality Pages
Breadth-First Search Crawling Yields High-Quality Pages Marc Najork Compaq Systems Research Center 13 Lytton Avenue Palo Alto, CA 9431, USA marc.najork@compaq.com Janet L. Wiener Compaq Systems Research
More informationSocial and Technological Network Data Analytics. Lecture 5: Structure of the Web, Search and Power Laws. Prof Cecilia Mascolo
Social and Technological Network Data Analytics Lecture 5: Structure of the Web, Search and Power Laws Prof Cecilia Mascolo In This Lecture We describe power law networks and their properties and show
More informationA Metasearch Engine Using the Measure of Uniquenes
1112 8610 2 1 1 113 0033 7 3 1 1112 8610 2 1 1 E-mail: yoko@dblab.is.ocha.ac.jp, moris@k.u-tokyo.ac.jp, masunaga@is.ocha.ac.jp, WWW,,..,,.,,.,,., WWW, Yahoo!,. Web,, A Metasearch Engine Using the Measure
More informationon the WorldWideWeb Abstract. The pages and hyperlinks of the World Wide Web may be
Average-clicks: A New Measure of Distance on the WorldWideWeb Yutaka Matsuo 12,Yukio Ohsawa 23, and Mitsuru Ishizuka 1 1 University oftokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, JAPAN, matsuo@miv.t.u-tokyo.ac.jp,
More informationA Random Walk Web Crawler with Orthogonally Coupled Heuristics
A Random Walk Web Crawler with Orthogonally Coupled Heuristics Andrew Walker and Michael P. Evans School of Mathematics, Kingston University, Kingston-Upon-Thames, Surrey, UK Applied Informatics and Semiotics
More informationSearch Engine Coverage Bias: Evidence and Possible Causes
Search Engine Coverage Bias: Evidence and Possible Causes Liwen Vaughan * Faculty of Information and Media Studies University of Western Ontario London, Ontario, N6A 5B7, Canada Phone: (519) 661-2111 ext.
More informationOn near-uniform URL sampling
Computer Networks 33 (2000) 295 308 www.elsevier.com/locate/comnet On near-uniform URL sampling Monika R. Henzinger a,ł, Allan Heydon b, Michael Mitzenmacher c, Marc Najork b a Google, Inc. 2400 Bayshore
More informationParallel Closed Frequent Pattern Mining on PC Cluster
DEWS2005 3C-i5 PC, 169-8555 3-4-1 169-8555 3-4-1 101-8430 2-1-2 E-mail: {eigo,hirate}@yama.info.waseda.ac.jp, yamana@waseda.jp FPclose PC 32 PC 2% 30.9 PC Parallel Closed Frequent Pattern Mining on PC
More informationInformation Retrieval. Lecture 9 - Web search basics
Information Retrieval Lecture 9 - Web search basics Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 30 Introduction Up to now: techniques for general
More informationLecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule
Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule 1 How big is the Web How big is the Web? In the past, this question
More informationA FAST COMMUNITY BASED ALGORITHM FOR GENERATING WEB CRAWLER SEEDS SET
A FAST COMMUNITY BASED ALGORITHM FOR GENERATING WEB CRAWLER SEEDS SET Shervin Daneshpajouh, Mojtaba Mohammadi Nasiri¹ Computer Engineering Department, Sharif University of Technology, Tehran, Iran daneshpajouh@ce.sharif.edu,
More informationBrian D. Davison Department of Computer Science Rutgers, The State University of New Jersey New Brunswick, NJ USA
From: AAAI Technical Report WS-00-01. Compilation copyright 2000, AAAI (www.aaai.org). All rights reserved. Recognizing Nepotistic Links on the Web Brian D. Davison Department of Computer Science Rutgers,
More informationCompact Encoding of the Web Graph Exploiting Various Power Laws
Compact Encoding of the Web Graph Exploiting Various Power Laws Statistical Reason Behind Link Database Yasuhito Asano, Tsuyoshi Ito 2, Hiroshi Imai 2, Masashi Toyoda 3, and Masaru Kitsuregawa 3 Department
More informationAN EFFICIENT COLLECTION METHOD OF OFFICIAL WEBSITES BY ROBOT PROGRAM
AN EFFICIENT COLLECTION METHOD OF OFFICIAL WEBSITES BY ROBOT PROGRAM Masahito Yamamoto, Hidenori Kawamura and Azuma Ohuchi Graduate School of Information Science and Technology, Hokkaido University, Japan
More informationA SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 A SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS Satwinder Kaur 1 & Alisha Gupta 2 1 Research Scholar (M.tech
More informationEfficient PageRank approximation via graph aggregation
Inf Retrieval (2006) 9: 123 138 DOI 10.1007/s10791-006-7146-1 Efficient PageRank approximation via graph aggregation A. Z. Broder R. Lempel F. Maghoul J. Pedersen Received: 28 March 2004 / Revised: 30
More informationI/O-Efficient Techniques for Computing Pagerank
I/O-Efficient Techniques for Computing Pagerank Yen-Yu Chen Qingqing Gan Torsten Suel CIS Department Polytechnic University Brooklyn, NY 11201 {yenyu, qq gan, suel}@photon.poly.edu Categories and Subject
More informationMeasures of Ignorance on the Web
Measures of Ignorance on the Web Siddhartha Reddy K. Srinath Srinivasa Mandar R. Mutalikdesai International Institute of Information Technology 26/C, Electronics City, Hosur Road Bangalore, India 561 {siddhartha.reddy.k,
More informationURL-Enhanced Adaptive Page-Refresh Models
URL-Enhanced Adaptive Page-Refresh Models Robert Warren, Dana Wilkinson, Alejandro López-Ortiz School of Computer Science, University of Waterloo {rhwarren, d3wilkin, alopez-o}@uwaterloo.ca Technical Report
More informationReliability Verification of Search Engines Hit Counts: How to Select a Reliable Hit Count for a Query
Reliability Verification of Search Engines Hit Counts: How to Select a Reliable Hit Count for a Query Takuya Funahashi and Hayato Yamana Computer Science and Engineering Div., Waseda University, 3-4-1
More informationLocal Methods for Estimating PageRank Values
Local Methods for Estimating PageRank Values Yen-Yu Chen Qingqing Gan Torsten Suel CIS Department Polytechnic University Brooklyn, NY 11201 yenyu, qq gan, suel @photon.poly.edu Abstract The Google search
More informationAnatomy of a search engine. Design criteria of a search engine Architecture Data structures
Anatomy of a search engine Design criteria of a search engine Architecture Data structures Step-1: Crawling the web Google has a fast distributed crawling system Each crawler keeps roughly 300 connection
More informationA STUDY ON THE EVOLUTION OF THE WEB
A STUDY ON THE EVOLUTION OF THE WEB Alexandros Ntoulas, Junghoo Cho, Hyun Kyu Cho 2, Hyeonsung Cho 2, and Young-Jo Cho 2 Summary We seek to gain improved insight into how Web search engines should cope
More information1.1 Our Solution: Random Walks for Uniform Sampling In order to estimate the results of aggregate queries or the fraction of all web pages that would
Approximating Aggregate Queries about Web Pages via Random Walks Λ Ziv Bar-Yossef y Alexander Berg Steve Chien z Jittat Fakcharoenphol x Dror Weitz Computer Science Division University of California at
More informationHYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL
International Journal of Mechanical Engineering & Computer Sciences, Vol.1, Issue 1, Jan-Jun, 2017, pp 12-17 HYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL BOMA P.
More informationWhy is the Internet so important? Reaching clients on the Internet
Why is the Internet so important? According to a recent study,* 70% of U.S. households use the Internet to search for local goods and services. *March 2005, The Kelsey Group Reaching clients 1 Reaching
More informationStructural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques
Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques Kouhei Sugiyama, Hiroyuki Ohsaki and Makoto Imase Graduate School of Information Science and Technology,
More informationSet Cover at Web Scale
Set Cover at Web Scale Stergios Stergiou Yahoo! Labs Sunnyvale, CA USA stergios@yahoo-inc.com Kostas Tsioutsiouliklis Yahoo! Labs Sunnyvale, CA USA kostas@yahoo-inc.com ABSTRACT The classic Set Cover problem
More informationSurvey on Web Structure Mining
Survey on Web Structure Mining Hiep T. Nguyen Tri, Nam Hoai Nguyen Department of Electronics and Computer Engineering Chonnam National University Republic of Korea Email: tuanhiep1232@gmail.com Abstract
More informationA BEGINNER S GUIDE TO THE WEBGRAPH: PROPERTIES, MODELS AND ALGORITHMS.
A BEGINNER S GUIDE TO THE WEBGRAPH: PROPERTIES, MODELS AND ALGORITHMS. Debora Donato Luigi Laura Stefano Millozzi Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza {donato,laura,millozzi}@dis.uniroma1.it
More informationRandom Sampling of Search Engine s Index Using Monte Carlo Simulation Method
Random Sampling of Search Engine s Index Using Monte Carlo Simulation Method Sajib Kumer Sinha University of Windsor Getting uniform random samples from a search engine s index is a challenging problem
More informationTopical TrustRank: Using Topicality to Combat Web Spam
Topical TrustRank: Using Topicality to Combat Web Spam Baoning Wu Vinay Goel Brian D. Davison Department of Computer Science & Engineering Lehigh University Bethlehem, PA 18015 USA {baw4,vig204,davison}@cse.lehigh.edu
More informationSupport System- Pioneering approach for Web Data Mining
Support System- Pioneering approach for Web Data Mining Geeta Kataria 1, Surbhi Kaushik 2, Nidhi Narang 3 and Sunny Dahiya 4 1,2,3,4 Computer Science Department Kurukshetra University Sonepat, India ABSTRACT
More informationA Method for Finding Search Keywords Associated with Topics using Global Web Access Logs and Web Communities
THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. E-mail: {otsuka,kitsure}@tkl.iis.u-tokyo.ac.jp URL A Method for Finding Search Keywords Associated with
More informationWeb Structure Mining using Link Analysis Algorithms
Web Structure Mining using Link Analysis Algorithms Ronak Jain Aditya Chavan Sindhu Nair Assistant Professor Abstract- The World Wide Web is a huge repository of data which includes audio, text and video.
More informationThe Structure of E-Government - Developing a Methodology for Quantitative Evaluation -
The Structure of E-Government - Developing a Methodology for Quantitative Evaluation - Vaclav Petricek :: UCL Computer Science Tobias Escher :: UCL Political Science Ingemar Cox :: UCL Computer Science
More informationDATA SEARCH ENGINE INTRODUCTION
D DATA SEARCH ENGINE INTRODUCTION The World Wide Web was first developed by Tim Berners- Lee and his colleagues in 1990. In just over a decade, it has become the largest information source in human history.
More informationWhat do the Neighbours Think? Computing Web Page Reputations
What do the Neighbours Think? Computing Web Page Reputations Alberto O. Mendelzon Department of Computer Science University of Toronto mendel@cs.toronto.edu Davood Rafiei Department of Computing Science
More informationSearch Engine Techniques: A Review
Search Engine Techniques: A Review 53 Shivanshu Rastogi Email: rshivanshu1145@gmail.com Zubair Iqbal Email: zubairiqbal17@gmail.com Prabal Bhatnagar Email: prabal_bhatnagar@yahoo.com ABSTRACT The World
More informationarxiv:cs/ v1 [cs.ir] 4 Mar 2005
arxiv:cs/0503011v1 [cs.ir] 4 Mar 2005 Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results Sandeep Pandey Sourashis Roy Christopher Olston Carnegie Mellon University
More informationCompetitors Analysis Report
Competitors Analysis Report Copyright 2010 XceedAgents. Do not copy or distribute without prior permission 1 of 6 Document History Document Created By: Date of Creation: June 25 10 Approved By: Last Modified
More informationKohei Arai 1 Graduate School of Science and Engineering Saga University Saga City, Japan
Numerical Representation of Web Sites of Remote Sensing Satellite Data Providers and Its Application to Knowledge Based Information Retrievals with Natural Language Kohei Arai 1 Graduate School of Science
More informationDeep Web Crawling and Mining for Building Advanced Search Application
Deep Web Crawling and Mining for Building Advanced Search Application Zhigang Hua, Dan Hou, Yu Liu, Xin Sun, Yanbing Yu {hua, houdan, yuliu, xinsun, yyu}@cc.gatech.edu College of computing, Georgia Tech
More informationA study of results overlap and uniqueness among major Web search engines
Information Processing and Management 42 (2006) 1379 1391 www.elsevier.com/locate/infoproman A study of results overlap and uniqueness among major Web search engines Amanda Spink a, *, Bernard J. Jansen
More informationWorld Wide Web has specific challenges and opportunities
6. Web Search Motivation Web search, as offered by commercial search engines such as Google, Bing, and DuckDuckGo, is arguably one of the most popular applications of IR methods today World Wide Web has
More informationSearching the Web [Arasu 01]
Searching the Web [Arasu 01] Most user simply browse the web Google, Yahoo, Lycos, Ask Others do more specialized searches web search engines submit queries by specifying lists of keywords receive web
More informationA Modified Algorithm to Handle Dangling Pages using Hypothetical Node
A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal
More informationMeasuring index quality using random walks on the Web
ELSEVIER Measuring index quality using random walks on the Web Monika R. Henzinger Ł,1, Allan Heydon 1, Michael Mitzenmacher 1, Marc Najork 1 Compaq Computer Corporation Systems Research Center, 130 Lytton
More informationMAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds
MAE 298, Lecture 9 April 30, 2007 Web search and decentralized search on small-worlds Search for information Assume some resource of interest is stored at the vertices of a network: Web pages Files in
More informationTime-aware Authority Ranking
Klaus Berberich 1 Michalis Vazirgiannis 1,2 Gerhard Weikum 1 1 Max Planck Institute for Informatics {kberberi, mvazirgi, weikum}@mpi-inf.mpg.de 2 Athens University of Economics & Business {mvazirg}@aueb.gr
More informationQuery Modifications Patterns During Web Searching
Bernard J. Jansen The Pennsylvania State University jjansen@ist.psu.edu Query Modifications Patterns During Web Searching Amanda Spink Queensland University of Technology ah.spink@qut.edu.au Bhuva Narayan
More informationComputer Engineering, University of Pune, Pune, Maharashtra, India 5. Sinhgad Academy of Engineering, University of Pune, Pune, Maharashtra, India
Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Performance
More informationAn Overview of Search Engine. Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia
An Overview of Search Engine Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia haixu@microsoft.com July 24, 2007 1 Outline History of Search Engine Difference Between Software and
More informationAgreeing to Disagree: Search Engines and their Public Interfaces
Agreeing to Disagree: Search Engines and their Public Interfaces ABSTRACT Frank McCown Old Dominion University Computer Science Department Norfolk, Virginia, USA 23529 fmccown@cs.odu.edu Many studies of
More informationPersonalizing PageRank Based on Domain Profiles
Personalizing PageRank Based on Domain Profiles Mehmet S. Aktas, Mehmet A. Nacar, and Filippo Menczer Computer Science Department Indiana University Bloomington, IN 47405 USA {maktas,mnacar,fil}@indiana.edu
More informationWeb Crawling. Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India
Web Crawling Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India - 382 481. Abstract- A web crawler is a relatively simple automated program
More informationHow to Get Your Website Listed on Major Search Engines
Contents Introduction 1 Submitting via Global Forms 1 Preparing to Submit 2 Submitting to the Top 3 Search Engines 3 Paid Listings 4 Understanding META Tags 5 Adding META Tags to Your Web Site 5 Introduction
More informationWEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW
ISSN: 9 694 (ONLINE) ICTACT JOURNAL ON COMMUNICATION TECHNOLOGY, MARCH, VOL:, ISSUE: WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW V Lakshmi Praba and T Vasantha Department of Computer
More information~ Ian Hunneybell: WWWT Revision Notes (15/06/2006) ~
. Search Engines, history and different types In the beginning there was Archie (990, indexed computer files) and Gopher (99, indexed plain text documents). Lycos (994) and AltaVista (995) were amongst
More informationOptimizing Search Engines using Click-through Data
Optimizing Search Engines using Click-through Data By Sameep - 100050003 Rahee - 100050028 Anil - 100050082 1 Overview Web Search Engines : Creating a good information retrieval system Previous Approaches
More informationWeb Searching, Search Engines and Information Retrieval
Web Searching, Search Engines and Information Retrieval Item type Authors Citation Publisher Journal Journal Article (On-line/Unpaginated) Lewandowski, Dirk Web Searching, Search Engines and Information
More informationDesign and implementation of an incremental crawler for large scale web. archives
DEWS2007 B9-5 Web, 247 850 5 53 8505 4 6 E-mail: ttamura@acm.org, kitsure@tkl.iis.u-tokyo.ac.jp Web Web Web Web Web Web Web URL Web Web PC Web Web Design and implementation of an incremental crawler for
More informationAgreeing to Disagree: Search Engines and Their Public Interfaces
Agreeing to Disagree: Search Engines and Their Public Interfaces ABSTRACT Frank McCown Old Dominion University Computer Science Department Norfolk, Virginia, USA 23529 fmccown@cs.odu.edu Google, Yahoo
More informationGraph structure in the web
页码,1/13 Graph structure in the web Andrei Broder 1, Ravi Kumar 2, Farzin Maghoul 1, Prabhakar Raghavan 2, Sridhar Rajagopalan 2, Raymie Stata 3, Andrew Tomkins 2, Janet Wiener 3 1: AltaVista Company, San
More informationAccessibility of INGO FAST 1997 ARTVILLE, LLC. 32 Spring 2000 intelligence
Accessibility of INGO FAST 1997 ARTVILLE, LLC 32 Spring 2000 intelligence On the Web Information On the Web Steve Lawrence C. Lee Giles Search engines do not index sites equally, may not index new pages
More informationSearch Engines. Information Technology and Social Life March 2, Ask difference between a search engine and a directory
Search Engines Information Technology and Social Life March 2, 2005 Ask difference between a search engine and a directory 1 Search Engine History A search engine is a program designed to help find files
More informationInformation Retrieval Issues on the World Wide Web
Information Retrieval Issues on the World Wide Web Ashraf Ali 1 Department of Computer Science, Singhania University Pacheri Bari, Rajasthan aali1979@rediffmail.com Dr. Israr Ahmad 2 Department of Computer
More informationThe Design of Model for Tibetan Language Search System
International Conference on Chemical, Material and Food Engineering (CMFE-2015) The Design of Model for Tibetan Language Search System Wang Zhong School of Information Science and Engineering Lanzhou University
More informationThe PageRank Citation Ranking: Bringing Order to the Web
The PageRank Citation Ranking: Bringing Order to the Web Marlon Dias msdias@dcc.ufmg.br Information Retrieval DCC/UFMG - 2017 Introduction Paper: The PageRank Citation Ranking: Bringing Order to the Web,
More informationAdding the Temporal Dimension to Search - A Case Study in Publication Search
Adding the emporal Dimension to Search - A Case Study in Publication Search Philip S. Yu IBM.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 psyu@us.ibm.com Xin Li, Bing Liu Department of
More informationWelcome to the class of Web and Information Retrieval! Min Zhang
Welcome to the class of Web and Information Retrieval! Min Zhang z-m@tsinghua.edu.cn Coffee Time The Sixth Sense By 费伦 Min Zhang z-m@tsinghua.edu.cn III. Key Techniques of Web IR Using web specific page
More informationUsing SiteRank for P2P Web Retrieval
Using SiteRank for P2P Web Retrieval Jie Wu Karl Aberer March 24, 2004 EPFL Technical Report ID: IC/2004/31 School of Computer and Communication Sciences Swiss Federal Institute of Technology (EPF), Lausanne
More informationOn the Diversity Within Internet Search Engines: A Clustering Approach Using Query Based Metrics
On the Diversity Within Internet Search Engines: A Clustering Approach Using Query Based Metrics Sibel Adalı, Brandeis Hill, Malik Magdon-Ismail Rensselaer Polytechnic Institute {sibel, hillb, magdon}@cs.rpi.edu
More information5 Choosing keywords Initially choosing keywords Frequent and rare keywords Evaluating the competition rates of search
Seo tutorial Seo tutorial Introduction to seo... 4 1. General seo information... 5 1.1 History of search engines... 5 1.2 Common search engine principles... 6 2. Internal ranking factors... 8 2.1 Web page
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationWeighted PageRank using the Rank Improvement
International Journal of Scientific and Research Publications, Volume 3, Issue 7, July 2013 1 Weighted PageRank using the Rank Improvement Rashmi Rani *, Vinod Jain ** * B.S.Anangpuria. Institute of Technology
More informationAn Improved Computation of the PageRank Algorithm 1
An Improved Computation of the PageRank Algorithm Sung Jin Kim, Sang Ho Lee School of Computing, Soongsil University, Korea ace@nowuri.net, shlee@computing.ssu.ac.kr http://orion.soongsil.ac.kr/ Abstract.
More informationUsing Bloom Filters to Speed Up HITS-like Ranking Algorithms
Using Bloom Filters to Speed Up HITS-like Ranking Algorithms Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy Microsoft Research, Mountain View CA 94043, USA Abstract. This paper describes a technique
More informationWeighted Page Rank Algorithm Based on Number of Visits of Links of Web Page
International Journal of Soft Computing and Engineering (IJSCE) ISSN: 31-307, Volume-, Issue-3, July 01 Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page Neelam Tyagi, Simple
More informationOnline Research Methodology. Dr. David R. Fletcher
Online Research Methodology Dr. David R. Fletcher drf@xpastor.org www.xpastor.org Areas of Discussion Archived Databases DTS Library Databases Search Engines Footnotes & Bibliographies Archived Databases
More informationImpact of Search Engines on Page Popularity
Introduction October 10, 2007 Outline Introduction Introduction 1 Introduction Introduction 2 Number of incoming links 3 Random-surfer model Case study 4 Search-dominant model 5 Summary and conclusion
More informationLink Analysis and Site Structure in Information Retrieval
In: Dittrich, Klaus; König, Wolfgang; Oberweis, Andreas; Rannenberg, Kai; Wahlster, Wolfgang (Hrsg.): Informatik 2003: Innovative Informatikanwendungen. Beiträge der 33. Jahrestagung der Gesellschaft für
More informationCRAWLING THE WEB: DISCOVERY AND MAINTENANCE OF LARGE-SCALE WEB DATA
CRAWLING THE WEB: DISCOVERY AND MAINTENANCE OF LARGE-SCALE WEB DATA An Implementation Amit Chawla 11/M.Tech/01, CSE Department Sat Priya Group of Institutions, Rohtak (Haryana), INDIA anshmahi@gmail.com
More informationCompressing the Graph Structure of the Web
Compressing the Graph Structure of the Web Torsten Suel Jun Yuan Abstract A large amount of research has recently focused on the graph structure (or link structure) of the World Wide Web. This structure
More informationA subjective measure of web search quality
Information Sciences 169 (2005) 365 381 www.elsevier.com/locate/ins A subjective measure of web search quality M.M. Sufyan Beg * Department of Electrical Engineering, Indian Institute of Technology, Delhi,
More informationAn Adaptive Approach in Web Search Algorithm
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 15 (2014), pp. 1575-1581 International Research Publications House http://www. irphouse.com An Adaptive Approach
More informationBuilding a Terabyte-scale Web Data Collection "NW1000G-04" in the NTCIR-5 WEB Task
ISSN 346-5597 NII Technical Report Building a Terabyte-scale Web Data Collection "NW000G-04" in the NTCIR-5 WEB Task Masao Takaku, Keizo Oyama, Akiko Aizawa, Haruko Ishikawa, Kengo Minamide, Shin Kato,
More informationMeasuring Similarity to Detect
Measuring Similarity to Detect Qualified Links Xiaoguang Qi, Lan Nie, and Brian D. Davison Dept. of Computer Science & Engineering Lehigh University Introduction Approach Experiments Discussion & Conclusion
More informationEstimating Page Importance based on Page Accessing Frequency
Estimating Page Importance based on Page Accessing Frequency Komal Sachdeva Assistant Professor Manav Rachna College of Engineering, Faridabad, India Ashutosh Dixit, Ph.D Associate Professor YMCA University
More informationWeb Hosts Growth 2: Narrative Overview Timeline of Developments 1991: Web introduced at CERN 1993: Mosaic popularizes the Web 130 servers to 10,000 in
The Web s Missing Links: The Search Engine & Portal Industry Thomas Haigh The Haigh Group & University of Wisconsin, Milwaukee thaigh@computer.org ETH, Informatik Lunch, 18 June 2007 About Me Dual training
More information