Recent Researches in Ranking Bias of Modern Search Engines

Size: px
Start display at page:

Download "Recent Researches in Ranking Bias of Modern Search Engines"

Transcription

1 DEWS2007 C7-7 Web,, {hirate,yoshida,yamana}@yama.info.waseda.ac.jp Web Web Web Recent Researches in Ranking Bias of Modern Search Engines Yu HIRATE,, Yasuaki YOSHIDA, and Hayato YAMANA, Media Network Center, Waseda University Totsuka-cho, Shinjuku-ku, Tokyo, , Japan Science and Engineering, Waseda University Okubo, Shinjuku-ku, Tokyo, , Japan National Institute of Informatics Hitotsubashi, Chiyoda-ku, Tokyo Japan {hirate,yoshida,yamana}@yama.info.waseda.ac.jp Abstract Recently, modern web search engines are recognized as fundamental tools for information retrieval from the web. It means the information that we are able to obtain may be depend on rankings of search engines. However, if search engines return unfair rankings, then following the suggestion comes up. Is the information that we obtain from the web biased according to the rankings? From this point of view, many researchers and politicians investigated about search engine rankings. We define such unfairness as ranking bias, and surveyed several current researches about ranking bias. In this paper, we report these researches by dividing two categories, comparing index coverages, analysis of ranking algorithms. Also, we report researches about comparing set of results or ranking of search engine. Key words Search Engine, ranking, bias, index, ranking algorithm [1] Web [2] Web Web [3] [4] Web

2 [5] CNET Olsen 2002 Google [6] 2005 Dogpile.com [7] Google [8], Yahoo! [9], MSN [10], Ask Jeeves [11] % [12] 2 CS CS 2 rich-get-richer Googlearchy , rich-get-richer [13] rich-get-richer [13] Top10 Top10 Top10 Top10 Top10 Top10 [14] 2. 2 rich-get-richer UCLA Cho [13] Web rich-get-richer Web Open Directory [15] PageRank [16] PageRank 4 2 PageRank

3 3. 1 ( [13] ) 2 PageRank ( [13] ) PageRank PageRank PageRank rich-get-richer 1 PageRank PageRank 2 rich-get-richer 2. 3 Googlearchy [17] [18] rich-and-richer Web [17] [19] [20] Google Googlearchy [17] [18] [13] Googlocracy [18] PageRank [16] PageRank PageRank 2.1 rich-and-richer Web PageRank PageRank 3. 3 Web Web [21] [22] [23] [24]

4 3.2 [13] [14] [31] [32] [35] [36] [37] Bharat [21] 99 Henzinger [22] 2004 Vaughan [23] 2006 Yossef [24] DEC Bharat 4 97 [21] AltaVista [25], Excite [26], HotBot [27], Infoseek 5 [28] [21] Compaq Henzinger 6 98 [22] Web AltaVista [25], HotBot [27], Excite [26], Infoseek [28], Google [8], Lycos [29] Western Ontario Vaughan [23] Google [8] AltaVista [25] AlltheWeb [30] Web Google Google 5 Go.com Yahoo! 6 Google 7 Google site: AltaVista host: AlltheWeb 3 4 ( [23] ) 1 Google, MSN, Yahoo! [24] Page from Google MSN Yahoo! Indexed by Google 46% 45% MSN 55% 51% Yahoo! 44% 22% 4 Google, MSN,Yahoo! [24] AltaVista Technion Bar-Yossef 2006 [24] Google [8] Yahoo! [9] MSN [10] [24] Query-Pool Query-Pool URL Yahoo! [24] PageRank [13] [14] [31] [32] V R P

5 [13] [31] [14] [32] 2.1 rich-get-richer rich-get-richer [13] [31] UCLA Cho [13] 60 [31] t p V (p, t) t p P (p, t) [13] (1) t V (p, t) P (p, t) (1) P (p, t) t P (p, t) 3 (0 < = t < 13) (13 < = t < 25) 8 [14] Q(p)(0 < = Q(p) < = 1) 1 5 ( [14] ) 6 ( [14] ) (t > = 25) Q(p) V (p, t) P (p, t) 2.25 (2) [4] V (p, t) R(p, t) V (p, t) R(p, t) 1.5 (3) Stanford Web- Base [33] [34] R(p, t) P (p, t) R(p, t) P (p, t) 1.5 (4) P (p, t) [13] [31] Indiana Fortunato 2006 [31] 2 2 [13] V (p, t) P (p, t) 2:25

6 7. 1 [35] [36] Mowshowitz 9 [35] [36] [35] [36] [35] [36] 7 ( [31] ) [31] V (p, t) P (p, t) 1.8 V (p, t) P (p, t) 7 7 [31] [13] [31] V (p, t) P (p, t) Googlearchy Googlocracy 6. 2 [14] UCLA Cho [14] PageRank [16] 3.2 PageRank [14] Q(p) p Q(p) P (p, t) 6. 3 [32] CMU Pandey [32] Randomized Rank Promotion Randomized Rank Promotion 5 P (p, t) [32] 7. [35] [36] [37] [35] [36] [37] E B(E) E C Sim(E) B(E) = 1 Sim(E) (5) E 1, E 2, E 3 q 1 E 1 URL (u 1, u 2, u 3) E 2 URL (u 4, u 3, u 5) E 3 URL (u 3, u 1, u 4) S S 1 = (1, 1, 1, 0, 0), S 2 = (0, 0, 1, 1, 1), S 3 = (1, 0, 1, 1, 0) S S = (2, 1, 3, 2, 1) (6) Sim(E i) S S i E 1, E 2, E 3 B(E 1), B(E 2), B(E 3) B(E 1) = 1 = {3 ( )} 0.5 (7) B(E 2) = 1 = {3 ( )} 0.5 (8) B(E 3) = 1 = {3 ( )} 0.5 (9) [36] About, Ah-ha 10, AltaVista [25], AOLSearch, FastSearch, Google [8], LookSmart, Lycos [29], MSN [10], Netscape, Overture, Sprinks, Teoma, TureSearch, WiseNut, Yahoo! [9] 16 [38] [39] 5 25 [38] [35] [36] 10 Enhance Interactive

7 8 [38] [36] 10 DNA evidence Google 10 [37] 11 Bondi beach Yahoo! 10 [37] 9 [39] [36] 8 [39] AOL Search, Google, Netscape, Yahoo!Ah-ha MSN, Sprinks, Teoma, TrueSearch [36] Bar-llan Bar-llan [37] [37] Google, Yahoo!, Teoma Google Image, Yahoo! Image, Picsearch [40] 2 13 Spearman s footrule [41] [42] Fagin s measure [43] M measure [37] 4 11 Teoma Ask.com [11] 12 US elections 2004, organic food, DNA evidence 13 Twin towers, Bondi beach Google DNA evidence 11 Yahoo! Bondi beach 10 Google DNA evidence [37] 11 Yahoo! Bondi beach 10 Google Google Yahoo! [37] Picsearch [40]

8 API [1], 2006, R D, [2] Y. Hirate, S. Kato, and H. Yamana, Web Structure in Proc. of WAW2006, [3] T. Josvhims, Optimizing Search Engine using Clickthrough Data, Proc. of ACM SIGKDD2002, pp , [4] R. Lempel and S. Moran, Predictive Caching and Prefetching of Query Results in Search Engines, Proc. of WWW2003, pp , [5],, [6] S. Olsen, Does search engine s power threaten Web s independence?, [7] Dogpile Web Search Home Page, [8] Google, [9] Yahoo!, [10] MSN, [11] Ask.com Search Engine, [12] Dogpile.com, Different Engines, Different Results, [13] J. Cho and S. Roy, Impact of Search Engines on Page Popularity, Proc. of WWW2004, pp , [14] J. Cho, S. Roy, and R. E. Adams, Page Quality: In Search of an Unbiased Web Ranking, Proc. of ACM SIG- MOD2005, pp , [15] Open Directory, [16] L. Page, S. Brin, R. Motwani, and T. Winograd, The Pagerank Citation Ranking: Bringing Order to the Web, Technical Report, Stanford University, [17] M. Hindman, K. Tsioutsiouliklis, and J. A. Johnson, Googlearchy: How a Few Heavily-Linked Sites Dominate Politics on the Web, Annual Meeting of the Midwest Political Science Association, [18] F. Menczer, S. Fortunato, A. Flammini, and A. Vespignani, Googlearchy or Googlocracy?, IEEE Spectrum, [19] R. Albert, A.-L. Barabasi, and H. Jeong, Diameter of the World Wide Web, Nature, Vol. 401(6749), pp , [20] A. Broder, R.Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, Graph Structure in the Web, The International Journal of Computer and Telecommunication Networking, Vol. 33, pp , [21] K. Bharat and A. Broder, A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines, Computer Networks and ISDN Systems, Vol. 30, Issue 1 7, pp , [22] M. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork, Measuring Index Quality using Random Walks on the Web, Proc. of WWW1999, pp , [23] L. Vaughan and M. Thelwall, Search Engine Coverage Bias: Evidence and Possible Causes, Information Processing and Management, Vol. 40, No. 4, pp , [24] Z. Bar-Yossef and M. Gurevich, Random Sampling from a Search Engine s Index, Proc. of WWW2006, pp , [25] AltaVista, [26] Excite, [27] HotBot, [28] Go.com, [29] Lycos, [30] AlltheWeb, [31] S. Fortunato, A. Flammini, F. Menczer, and A. Vespignani, The Egalitarian Effect of Search Engine, eprint arxive:cs.cy/ , [32] S. Pandey, S. Roy, C. Olston, J. Cho, and S. Chakrabarti, Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results, Proc. of VLDB2005, pp , [33] H. Hirai, S. Raghavan, H. G-Molina, and A. Paepcke, Webbase: A repository of the Web, Proc. of WWW2000, pp , [34] The Stanford WebBase Project, testbed/doc2/webbase/ [35] A. Mowshowitz and A. Kawaguchi, Bias on the Web, Communications of the ACM, Vol. 45, No. 9, pp , [36] A. Mowshowiz and A. Kawaguchi, Measuring Search Engine Bias, Information Processing and Management, Vol. 41, pp , [37] J. Bar-llan, M. Mat-Hassan and M. Levene, Methods for Comparing Rankings of Search Engine Results, eprint arxive:cs.ir/050539, [38] West Publishing, West s analysis of American low, Eagan, MN: Thomson-West, [39] Library of Congress Classification System, [40] Picsearch - Image Search for Pictures and Images, [41] P. Diaconis, R. L. Graham, Spearman s Footrule as a Measure of Disarray, Journal of the Royal Statistical Society, Series B(Methodological), Vol. 39, pp , [42] C. Dwork, R. Kumar, M. Naor and D. Sivakumar, Rank Aggregation Methods for the Web, Proc. of WWW2001, pp , [43] R. Fagin, R. Kumar and D. Sivakumar, Comparing Top k Lists, SIAM Journal on Discrete Mathematics, Vol. 17, No. 1, pp , 2003.

CS Search Engine Technology

CS Search Engine Technology CS236620 - Search Engine Technology Ronny Lempel Winter 2008/9 The course consists of 14 2-hour meetings, divided into 4 main parts. It aims to cover both engineering and theoretical aspects of search

More information

arxiv:cs/ v2 [cs.cy] 23 Aug 2006

arxiv:cs/ v2 [cs.cy] 23 Aug 2006 The egalitarian effect of search engines Santo Fortunato 1,2 santo@indiana.edu Alessandro Flammini 1 aflammin@indiana.edu Filippo Menczer 1 fil@indiana.edu Alessandro Vespignani 1 alexv@indiana.edu arxiv:cs/0511005v2

More information

Search Engines Considered Harmful In Search of an Unbiased Web Ranking

Search Engines Considered Harmful In Search of an Unbiased Web Ranking Search Engines Considered Harmful In Search of an Unbiased Web Ranking Junghoo John Cho cho@cs.ucla.edu UCLA Search Engines Considered Harmful Junghoo John Cho 1/38 Motivation If you are not indexed by

More information

Review: Searching the Web [Arasu 2001]

Review: Searching the Web [Arasu 2001] Review: Searching the Web [Arasu 2001] Gareth Cronin University of Auckland gareth@cronin.co.nz The authors of Searching the Web present an overview of the state of current technologies employed in the

More information

Link Analysis in Web Information Retrieval

Link Analysis in Web Information Retrieval Link Analysis in Web Information Retrieval Monika Henzinger Google Incorporated Mountain View, California monika@google.com Abstract The analysis of the hyperlink structure of the web has led to significant

More information

Lecture 17 November 7

Lecture 17 November 7 CS 559: Algorithmic Aspects of Computer Networks Fall 2007 Lecture 17 November 7 Lecturer: John Byers BOSTON UNIVERSITY Scribe: Flavio Esposito In this lecture, the last part of the PageRank paper has

More information

Impact of Search Engines on Page Popularity

Impact of Search Engines on Page Popularity Impact of Search Engines on Page Popularity Junghoo John Cho (cho@cs.ucla.edu) Sourashis Roy (roys@cs.ucla.edu) University of California, Los Angeles Impact of Search Engines on Page Popularity J. Cho,

More information

Search Engines Considered Harmful In Search of an Unbiased Web Ranking

Search Engines Considered Harmful In Search of an Unbiased Web Ranking Search Engines Considered Harmful In Search of an Unbiased Web Ranking Junghoo John Cho cho@cs.ucla.edu UCLA Search Engines Considered Harmful Junghoo John Cho 1/45 World-Wide Web 10 years ago With Web

More information

arxiv:cs/ v1 [cs.ir] 26 Apr 2002

arxiv:cs/ v1 [cs.ir] 26 Apr 2002 Navigating the Small World Web by Textual Cues arxiv:cs/0204054v1 [cs.ir] 26 Apr 2002 Filippo Menczer Department of Management Sciences The University of Iowa Iowa City, IA 52242 Phone: (319) 335-0884

More information

The topology of the Web as a complex, scale-free network is now

The topology of the Web as a complex, scale-free network is now Topical interests and the mitigation of search engine bias S. Fortunato*, A. Flammini*, F. Menczer*, and A. Vespignani* *School of Informatics, Indiana University, Bloomington, IN 47406; Fakultät für Physik,

More information

Breadth-First Search Crawling Yields High-Quality Pages

Breadth-First Search Crawling Yields High-Quality Pages Breadth-First Search Crawling Yields High-Quality Pages Marc Najork Compaq Systems Research Center 13 Lytton Avenue Palo Alto, CA 9431, USA marc.najork@compaq.com Janet L. Wiener Compaq Systems Research

More information

Social and Technological Network Data Analytics. Lecture 5: Structure of the Web, Search and Power Laws. Prof Cecilia Mascolo

Social and Technological Network Data Analytics. Lecture 5: Structure of the Web, Search and Power Laws. Prof Cecilia Mascolo Social and Technological Network Data Analytics Lecture 5: Structure of the Web, Search and Power Laws Prof Cecilia Mascolo In This Lecture We describe power law networks and their properties and show

More information

A Metasearch Engine Using the Measure of Uniquenes

A Metasearch Engine Using the Measure of Uniquenes 1112 8610 2 1 1 113 0033 7 3 1 1112 8610 2 1 1 E-mail: yoko@dblab.is.ocha.ac.jp, moris@k.u-tokyo.ac.jp, masunaga@is.ocha.ac.jp, WWW,,..,,.,,.,,., WWW, Yahoo!,. Web,, A Metasearch Engine Using the Measure

More information

on the WorldWideWeb Abstract. The pages and hyperlinks of the World Wide Web may be

on the WorldWideWeb Abstract. The pages and hyperlinks of the World Wide Web may be Average-clicks: A New Measure of Distance on the WorldWideWeb Yutaka Matsuo 12,Yukio Ohsawa 23, and Mitsuru Ishizuka 1 1 University oftokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, JAPAN, matsuo@miv.t.u-tokyo.ac.jp,

More information

A Random Walk Web Crawler with Orthogonally Coupled Heuristics

A Random Walk Web Crawler with Orthogonally Coupled Heuristics A Random Walk Web Crawler with Orthogonally Coupled Heuristics Andrew Walker and Michael P. Evans School of Mathematics, Kingston University, Kingston-Upon-Thames, Surrey, UK Applied Informatics and Semiotics

More information

Search Engine Coverage Bias: Evidence and Possible Causes

Search Engine Coverage Bias: Evidence and Possible Causes Search Engine Coverage Bias: Evidence and Possible Causes Liwen Vaughan * Faculty of Information and Media Studies University of Western Ontario London, Ontario, N6A 5B7, Canada Phone: (519) 661-2111 ext.

More information

On near-uniform URL sampling

On near-uniform URL sampling Computer Networks 33 (2000) 295 308 www.elsevier.com/locate/comnet On near-uniform URL sampling Monika R. Henzinger a,ł, Allan Heydon b, Michael Mitzenmacher c, Marc Najork b a Google, Inc. 2400 Bayshore

More information

Parallel Closed Frequent Pattern Mining on PC Cluster

Parallel Closed Frequent Pattern Mining on PC Cluster DEWS2005 3C-i5 PC, 169-8555 3-4-1 169-8555 3-4-1 101-8430 2-1-2 E-mail: {eigo,hirate}@yama.info.waseda.ac.jp, yamana@waseda.jp FPclose PC 32 PC 2% 30.9 PC Parallel Closed Frequent Pattern Mining on PC

More information

Information Retrieval. Lecture 9 - Web search basics

Information Retrieval. Lecture 9 - Web search basics Information Retrieval Lecture 9 - Web search basics Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 30 Introduction Up to now: techniques for general

More information

Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule

Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule 1 How big is the Web How big is the Web? In the past, this question

More information

A FAST COMMUNITY BASED ALGORITHM FOR GENERATING WEB CRAWLER SEEDS SET

A FAST COMMUNITY BASED ALGORITHM FOR GENERATING WEB CRAWLER SEEDS SET A FAST COMMUNITY BASED ALGORITHM FOR GENERATING WEB CRAWLER SEEDS SET Shervin Daneshpajouh, Mojtaba Mohammadi Nasiri¹ Computer Engineering Department, Sharif University of Technology, Tehran, Iran daneshpajouh@ce.sharif.edu,

More information

Brian D. Davison Department of Computer Science Rutgers, The State University of New Jersey New Brunswick, NJ USA

Brian D. Davison Department of Computer Science Rutgers, The State University of New Jersey New Brunswick, NJ USA From: AAAI Technical Report WS-00-01. Compilation copyright 2000, AAAI (www.aaai.org). All rights reserved. Recognizing Nepotistic Links on the Web Brian D. Davison Department of Computer Science Rutgers,

More information

Compact Encoding of the Web Graph Exploiting Various Power Laws

Compact Encoding of the Web Graph Exploiting Various Power Laws Compact Encoding of the Web Graph Exploiting Various Power Laws Statistical Reason Behind Link Database Yasuhito Asano, Tsuyoshi Ito 2, Hiroshi Imai 2, Masashi Toyoda 3, and Masaru Kitsuregawa 3 Department

More information

AN EFFICIENT COLLECTION METHOD OF OFFICIAL WEBSITES BY ROBOT PROGRAM

AN EFFICIENT COLLECTION METHOD OF OFFICIAL WEBSITES BY ROBOT PROGRAM AN EFFICIENT COLLECTION METHOD OF OFFICIAL WEBSITES BY ROBOT PROGRAM Masahito Yamamoto, Hidenori Kawamura and Azuma Ohuchi Graduate School of Information Science and Technology, Hokkaido University, Japan

More information

A SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS

A SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 A SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS Satwinder Kaur 1 & Alisha Gupta 2 1 Research Scholar (M.tech

More information

Efficient PageRank approximation via graph aggregation

Efficient PageRank approximation via graph aggregation Inf Retrieval (2006) 9: 123 138 DOI 10.1007/s10791-006-7146-1 Efficient PageRank approximation via graph aggregation A. Z. Broder R. Lempel F. Maghoul J. Pedersen Received: 28 March 2004 / Revised: 30

More information

I/O-Efficient Techniques for Computing Pagerank

I/O-Efficient Techniques for Computing Pagerank I/O-Efficient Techniques for Computing Pagerank Yen-Yu Chen Qingqing Gan Torsten Suel CIS Department Polytechnic University Brooklyn, NY 11201 {yenyu, qq gan, suel}@photon.poly.edu Categories and Subject

More information

Measures of Ignorance on the Web

Measures of Ignorance on the Web Measures of Ignorance on the Web Siddhartha Reddy K. Srinath Srinivasa Mandar R. Mutalikdesai International Institute of Information Technology 26/C, Electronics City, Hosur Road Bangalore, India 561 {siddhartha.reddy.k,

More information

URL-Enhanced Adaptive Page-Refresh Models

URL-Enhanced Adaptive Page-Refresh Models URL-Enhanced Adaptive Page-Refresh Models Robert Warren, Dana Wilkinson, Alejandro López-Ortiz School of Computer Science, University of Waterloo {rhwarren, d3wilkin, alopez-o}@uwaterloo.ca Technical Report

More information

Reliability Verification of Search Engines Hit Counts: How to Select a Reliable Hit Count for a Query

Reliability Verification of Search Engines Hit Counts: How to Select a Reliable Hit Count for a Query Reliability Verification of Search Engines Hit Counts: How to Select a Reliable Hit Count for a Query Takuya Funahashi and Hayato Yamana Computer Science and Engineering Div., Waseda University, 3-4-1

More information

Local Methods for Estimating PageRank Values

Local Methods for Estimating PageRank Values Local Methods for Estimating PageRank Values Yen-Yu Chen Qingqing Gan Torsten Suel CIS Department Polytechnic University Brooklyn, NY 11201 yenyu, qq gan, suel @photon.poly.edu Abstract The Google search

More information

Anatomy of a search engine. Design criteria of a search engine Architecture Data structures

Anatomy of a search engine. Design criteria of a search engine Architecture Data structures Anatomy of a search engine Design criteria of a search engine Architecture Data structures Step-1: Crawling the web Google has a fast distributed crawling system Each crawler keeps roughly 300 connection

More information

A STUDY ON THE EVOLUTION OF THE WEB

A STUDY ON THE EVOLUTION OF THE WEB A STUDY ON THE EVOLUTION OF THE WEB Alexandros Ntoulas, Junghoo Cho, Hyun Kyu Cho 2, Hyeonsung Cho 2, and Young-Jo Cho 2 Summary We seek to gain improved insight into how Web search engines should cope

More information

1.1 Our Solution: Random Walks for Uniform Sampling In order to estimate the results of aggregate queries or the fraction of all web pages that would

1.1 Our Solution: Random Walks for Uniform Sampling In order to estimate the results of aggregate queries or the fraction of all web pages that would Approximating Aggregate Queries about Web Pages via Random Walks Λ Ziv Bar-Yossef y Alexander Berg Steve Chien z Jittat Fakcharoenphol x Dror Weitz Computer Science Division University of California at

More information

HYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL

HYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL International Journal of Mechanical Engineering & Computer Sciences, Vol.1, Issue 1, Jan-Jun, 2017, pp 12-17 HYBRIDIZED MODEL FOR EFFICIENT MATCHING AND DATA PREDICTION IN INFORMATION RETRIEVAL BOMA P.

More information

Why is the Internet so important? Reaching clients on the Internet

Why is the Internet so important? Reaching clients on the Internet Why is the Internet so important? According to a recent study,* 70% of U.S. households use the Internet to search for local goods and services. *March 2005, The Kelsey Group Reaching clients 1 Reaching

More information

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques Kouhei Sugiyama, Hiroyuki Ohsaki and Makoto Imase Graduate School of Information Science and Technology,

More information

Set Cover at Web Scale

Set Cover at Web Scale Set Cover at Web Scale Stergios Stergiou Yahoo! Labs Sunnyvale, CA USA stergios@yahoo-inc.com Kostas Tsioutsiouliklis Yahoo! Labs Sunnyvale, CA USA kostas@yahoo-inc.com ABSTRACT The classic Set Cover problem

More information

Survey on Web Structure Mining

Survey on Web Structure Mining Survey on Web Structure Mining Hiep T. Nguyen Tri, Nam Hoai Nguyen Department of Electronics and Computer Engineering Chonnam National University Republic of Korea Email: tuanhiep1232@gmail.com Abstract

More information

A BEGINNER S GUIDE TO THE WEBGRAPH: PROPERTIES, MODELS AND ALGORITHMS.

A BEGINNER S GUIDE TO THE WEBGRAPH: PROPERTIES, MODELS AND ALGORITHMS. A BEGINNER S GUIDE TO THE WEBGRAPH: PROPERTIES, MODELS AND ALGORITHMS. Debora Donato Luigi Laura Stefano Millozzi Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza {donato,laura,millozzi}@dis.uniroma1.it

More information

Random Sampling of Search Engine s Index Using Monte Carlo Simulation Method

Random Sampling of Search Engine s Index Using Monte Carlo Simulation Method Random Sampling of Search Engine s Index Using Monte Carlo Simulation Method Sajib Kumer Sinha University of Windsor Getting uniform random samples from a search engine s index is a challenging problem

More information

Topical TrustRank: Using Topicality to Combat Web Spam

Topical TrustRank: Using Topicality to Combat Web Spam Topical TrustRank: Using Topicality to Combat Web Spam Baoning Wu Vinay Goel Brian D. Davison Department of Computer Science & Engineering Lehigh University Bethlehem, PA 18015 USA {baw4,vig204,davison}@cse.lehigh.edu

More information

Support System- Pioneering approach for Web Data Mining

Support System- Pioneering approach for Web Data Mining Support System- Pioneering approach for Web Data Mining Geeta Kataria 1, Surbhi Kaushik 2, Nidhi Narang 3 and Sunny Dahiya 4 1,2,3,4 Computer Science Department Kurukshetra University Sonepat, India ABSTRACT

More information

A Method for Finding Search Keywords Associated with Topics using Global Web Access Logs and Web Communities

A Method for Finding Search Keywords Associated with Topics using Global Web Access Logs and Web Communities THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. E-mail: {otsuka,kitsure}@tkl.iis.u-tokyo.ac.jp URL A Method for Finding Search Keywords Associated with

More information

Web Structure Mining using Link Analysis Algorithms

Web Structure Mining using Link Analysis Algorithms Web Structure Mining using Link Analysis Algorithms Ronak Jain Aditya Chavan Sindhu Nair Assistant Professor Abstract- The World Wide Web is a huge repository of data which includes audio, text and video.

More information

The Structure of E-Government - Developing a Methodology for Quantitative Evaluation -

The Structure of E-Government - Developing a Methodology for Quantitative Evaluation - The Structure of E-Government - Developing a Methodology for Quantitative Evaluation - Vaclav Petricek :: UCL Computer Science Tobias Escher :: UCL Political Science Ingemar Cox :: UCL Computer Science

More information

DATA SEARCH ENGINE INTRODUCTION

DATA SEARCH ENGINE INTRODUCTION D DATA SEARCH ENGINE INTRODUCTION The World Wide Web was first developed by Tim Berners- Lee and his colleagues in 1990. In just over a decade, it has become the largest information source in human history.

More information

What do the Neighbours Think? Computing Web Page Reputations

What do the Neighbours Think? Computing Web Page Reputations What do the Neighbours Think? Computing Web Page Reputations Alberto O. Mendelzon Department of Computer Science University of Toronto mendel@cs.toronto.edu Davood Rafiei Department of Computing Science

More information

Search Engine Techniques: A Review

Search Engine Techniques: A Review Search Engine Techniques: A Review 53 Shivanshu Rastogi Email: rshivanshu1145@gmail.com Zubair Iqbal Email: zubairiqbal17@gmail.com Prabal Bhatnagar Email: prabal_bhatnagar@yahoo.com ABSTRACT The World

More information

arxiv:cs/ v1 [cs.ir] 4 Mar 2005

arxiv:cs/ v1 [cs.ir] 4 Mar 2005 arxiv:cs/0503011v1 [cs.ir] 4 Mar 2005 Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results Sandeep Pandey Sourashis Roy Christopher Olston Carnegie Mellon University

More information

Competitors Analysis Report

Competitors Analysis Report Competitors Analysis Report Copyright 2010 XceedAgents. Do not copy or distribute without prior permission 1 of 6 Document History Document Created By: Date of Creation: June 25 10 Approved By: Last Modified

More information

Kohei Arai 1 Graduate School of Science and Engineering Saga University Saga City, Japan

Kohei Arai 1 Graduate School of Science and Engineering Saga University Saga City, Japan Numerical Representation of Web Sites of Remote Sensing Satellite Data Providers and Its Application to Knowledge Based Information Retrievals with Natural Language Kohei Arai 1 Graduate School of Science

More information

Deep Web Crawling and Mining for Building Advanced Search Application

Deep Web Crawling and Mining for Building Advanced Search Application Deep Web Crawling and Mining for Building Advanced Search Application Zhigang Hua, Dan Hou, Yu Liu, Xin Sun, Yanbing Yu {hua, houdan, yuliu, xinsun, yyu}@cc.gatech.edu College of computing, Georgia Tech

More information

A study of results overlap and uniqueness among major Web search engines

A study of results overlap and uniqueness among major Web search engines Information Processing and Management 42 (2006) 1379 1391 www.elsevier.com/locate/infoproman A study of results overlap and uniqueness among major Web search engines Amanda Spink a, *, Bernard J. Jansen

More information

World Wide Web has specific challenges and opportunities

World Wide Web has specific challenges and opportunities 6. Web Search Motivation Web search, as offered by commercial search engines such as Google, Bing, and DuckDuckGo, is arguably one of the most popular applications of IR methods today World Wide Web has

More information

Searching the Web [Arasu 01]

Searching the Web [Arasu 01] Searching the Web [Arasu 01] Most user simply browse the web Google, Yahoo, Lycos, Ask Others do more specialized searches web search engines submit queries by specifying lists of keywords receive web

More information

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal

More information

Measuring index quality using random walks on the Web

Measuring index quality using random walks on the Web ELSEVIER Measuring index quality using random walks on the Web Monika R. Henzinger Ł,1, Allan Heydon 1, Michael Mitzenmacher 1, Marc Najork 1 Compaq Computer Corporation Systems Research Center, 130 Lytton

More information

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds MAE 298, Lecture 9 April 30, 2007 Web search and decentralized search on small-worlds Search for information Assume some resource of interest is stored at the vertices of a network: Web pages Files in

More information

Time-aware Authority Ranking

Time-aware Authority Ranking Klaus Berberich 1 Michalis Vazirgiannis 1,2 Gerhard Weikum 1 1 Max Planck Institute for Informatics {kberberi, mvazirgi, weikum}@mpi-inf.mpg.de 2 Athens University of Economics & Business {mvazirg}@aueb.gr

More information

Query Modifications Patterns During Web Searching

Query Modifications Patterns During Web Searching Bernard J. Jansen The Pennsylvania State University jjansen@ist.psu.edu Query Modifications Patterns During Web Searching Amanda Spink Queensland University of Technology ah.spink@qut.edu.au Bhuva Narayan

More information

Computer Engineering, University of Pune, Pune, Maharashtra, India 5. Sinhgad Academy of Engineering, University of Pune, Pune, Maharashtra, India

Computer Engineering, University of Pune, Pune, Maharashtra, India 5. Sinhgad Academy of Engineering, University of Pune, Pune, Maharashtra, India Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Performance

More information

An Overview of Search Engine. Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia

An Overview of Search Engine. Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia An Overview of Search Engine Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia haixu@microsoft.com July 24, 2007 1 Outline History of Search Engine Difference Between Software and

More information

Agreeing to Disagree: Search Engines and their Public Interfaces

Agreeing to Disagree: Search Engines and their Public Interfaces Agreeing to Disagree: Search Engines and their Public Interfaces ABSTRACT Frank McCown Old Dominion University Computer Science Department Norfolk, Virginia, USA 23529 fmccown@cs.odu.edu Many studies of

More information

Personalizing PageRank Based on Domain Profiles

Personalizing PageRank Based on Domain Profiles Personalizing PageRank Based on Domain Profiles Mehmet S. Aktas, Mehmet A. Nacar, and Filippo Menczer Computer Science Department Indiana University Bloomington, IN 47405 USA {maktas,mnacar,fil}@indiana.edu

More information

Web Crawling. Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India

Web Crawling. Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India Web Crawling Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India - 382 481. Abstract- A web crawler is a relatively simple automated program

More information

How to Get Your Website Listed on Major Search Engines

How to Get Your Website Listed on Major Search Engines Contents Introduction 1 Submitting via Global Forms 1 Preparing to Submit 2 Submitting to the Top 3 Search Engines 3 Paid Listings 4 Understanding META Tags 5 Adding META Tags to Your Web Site 5 Introduction

More information

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW ISSN: 9 694 (ONLINE) ICTACT JOURNAL ON COMMUNICATION TECHNOLOGY, MARCH, VOL:, ISSUE: WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW V Lakshmi Praba and T Vasantha Department of Computer

More information

~ Ian Hunneybell: WWWT Revision Notes (15/06/2006) ~

~ Ian Hunneybell: WWWT Revision Notes (15/06/2006) ~ . Search Engines, history and different types In the beginning there was Archie (990, indexed computer files) and Gopher (99, indexed plain text documents). Lycos (994) and AltaVista (995) were amongst

More information

Optimizing Search Engines using Click-through Data

Optimizing Search Engines using Click-through Data Optimizing Search Engines using Click-through Data By Sameep - 100050003 Rahee - 100050028 Anil - 100050082 1 Overview Web Search Engines : Creating a good information retrieval system Previous Approaches

More information

Web Searching, Search Engines and Information Retrieval

Web Searching, Search Engines and Information Retrieval Web Searching, Search Engines and Information Retrieval Item type Authors Citation Publisher Journal Journal Article (On-line/Unpaginated) Lewandowski, Dirk Web Searching, Search Engines and Information

More information

Design and implementation of an incremental crawler for large scale web. archives

Design and implementation of an incremental crawler for large scale web. archives DEWS2007 B9-5 Web, 247 850 5 53 8505 4 6 E-mail: ttamura@acm.org, kitsure@tkl.iis.u-tokyo.ac.jp Web Web Web Web Web Web Web URL Web Web PC Web Web Design and implementation of an incremental crawler for

More information

Agreeing to Disagree: Search Engines and Their Public Interfaces

Agreeing to Disagree: Search Engines and Their Public Interfaces Agreeing to Disagree: Search Engines and Their Public Interfaces ABSTRACT Frank McCown Old Dominion University Computer Science Department Norfolk, Virginia, USA 23529 fmccown@cs.odu.edu Google, Yahoo

More information

Graph structure in the web

Graph structure in the web 页码,1/13 Graph structure in the web Andrei Broder 1, Ravi Kumar 2, Farzin Maghoul 1, Prabhakar Raghavan 2, Sridhar Rajagopalan 2, Raymie Stata 3, Andrew Tomkins 2, Janet Wiener 3 1: AltaVista Company, San

More information

Accessibility of INGO FAST 1997 ARTVILLE, LLC. 32 Spring 2000 intelligence

Accessibility of INGO FAST 1997 ARTVILLE, LLC. 32 Spring 2000 intelligence Accessibility of INGO FAST 1997 ARTVILLE, LLC 32 Spring 2000 intelligence On the Web Information On the Web Steve Lawrence C. Lee Giles Search engines do not index sites equally, may not index new pages

More information

Search Engines. Information Technology and Social Life March 2, Ask difference between a search engine and a directory

Search Engines. Information Technology and Social Life March 2, Ask difference between a search engine and a directory Search Engines Information Technology and Social Life March 2, 2005 Ask difference between a search engine and a directory 1 Search Engine History A search engine is a program designed to help find files

More information

Information Retrieval Issues on the World Wide Web

Information Retrieval Issues on the World Wide Web Information Retrieval Issues on the World Wide Web Ashraf Ali 1 Department of Computer Science, Singhania University Pacheri Bari, Rajasthan aali1979@rediffmail.com Dr. Israr Ahmad 2 Department of Computer

More information

The Design of Model for Tibetan Language Search System

The Design of Model for Tibetan Language Search System International Conference on Chemical, Material and Food Engineering (CMFE-2015) The Design of Model for Tibetan Language Search System Wang Zhong School of Information Science and Engineering Lanzhou University

More information

The PageRank Citation Ranking: Bringing Order to the Web

The PageRank Citation Ranking: Bringing Order to the Web The PageRank Citation Ranking: Bringing Order to the Web Marlon Dias msdias@dcc.ufmg.br Information Retrieval DCC/UFMG - 2017 Introduction Paper: The PageRank Citation Ranking: Bringing Order to the Web,

More information

Adding the Temporal Dimension to Search - A Case Study in Publication Search

Adding the Temporal Dimension to Search - A Case Study in Publication Search Adding the emporal Dimension to Search - A Case Study in Publication Search Philip S. Yu IBM.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 psyu@us.ibm.com Xin Li, Bing Liu Department of

More information

Welcome to the class of Web and Information Retrieval! Min Zhang

Welcome to the class of Web and Information Retrieval! Min Zhang Welcome to the class of Web and Information Retrieval! Min Zhang z-m@tsinghua.edu.cn Coffee Time The Sixth Sense By 费伦 Min Zhang z-m@tsinghua.edu.cn III. Key Techniques of Web IR Using web specific page

More information

Using SiteRank for P2P Web Retrieval

Using SiteRank for P2P Web Retrieval Using SiteRank for P2P Web Retrieval Jie Wu Karl Aberer March 24, 2004 EPFL Technical Report ID: IC/2004/31 School of Computer and Communication Sciences Swiss Federal Institute of Technology (EPF), Lausanne

More information

On the Diversity Within Internet Search Engines: A Clustering Approach Using Query Based Metrics

On the Diversity Within Internet Search Engines: A Clustering Approach Using Query Based Metrics On the Diversity Within Internet Search Engines: A Clustering Approach Using Query Based Metrics Sibel Adalı, Brandeis Hill, Malik Magdon-Ismail Rensselaer Polytechnic Institute {sibel, hillb, magdon}@cs.rpi.edu

More information

5 Choosing keywords Initially choosing keywords Frequent and rare keywords Evaluating the competition rates of search

5 Choosing keywords Initially choosing keywords Frequent and rare keywords Evaluating the competition rates of search Seo tutorial Seo tutorial Introduction to seo... 4 1. General seo information... 5 1.1 History of search engines... 5 1.2 Common search engine principles... 6 2. Internal ranking factors... 8 2.1 Web page

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Weighted PageRank using the Rank Improvement

Weighted PageRank using the Rank Improvement International Journal of Scientific and Research Publications, Volume 3, Issue 7, July 2013 1 Weighted PageRank using the Rank Improvement Rashmi Rani *, Vinod Jain ** * B.S.Anangpuria. Institute of Technology

More information

An Improved Computation of the PageRank Algorithm 1

An Improved Computation of the PageRank Algorithm 1 An Improved Computation of the PageRank Algorithm Sung Jin Kim, Sang Ho Lee School of Computing, Soongsil University, Korea ace@nowuri.net, shlee@computing.ssu.ac.kr http://orion.soongsil.ac.kr/ Abstract.

More information

Using Bloom Filters to Speed Up HITS-like Ranking Algorithms

Using Bloom Filters to Speed Up HITS-like Ranking Algorithms Using Bloom Filters to Speed Up HITS-like Ranking Algorithms Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy Microsoft Research, Mountain View CA 94043, USA Abstract. This paper describes a technique

More information

Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page

Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page International Journal of Soft Computing and Engineering (IJSCE) ISSN: 31-307, Volume-, Issue-3, July 01 Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page Neelam Tyagi, Simple

More information

Online Research Methodology. Dr. David R. Fletcher

Online Research Methodology. Dr. David R. Fletcher Online Research Methodology Dr. David R. Fletcher drf@xpastor.org www.xpastor.org Areas of Discussion Archived Databases DTS Library Databases Search Engines Footnotes & Bibliographies Archived Databases

More information

Impact of Search Engines on Page Popularity

Impact of Search Engines on Page Popularity Introduction October 10, 2007 Outline Introduction Introduction 1 Introduction Introduction 2 Number of incoming links 3 Random-surfer model Case study 4 Search-dominant model 5 Summary and conclusion

More information

Link Analysis and Site Structure in Information Retrieval

Link Analysis and Site Structure in Information Retrieval In: Dittrich, Klaus; König, Wolfgang; Oberweis, Andreas; Rannenberg, Kai; Wahlster, Wolfgang (Hrsg.): Informatik 2003: Innovative Informatikanwendungen. Beiträge der 33. Jahrestagung der Gesellschaft für

More information

CRAWLING THE WEB: DISCOVERY AND MAINTENANCE OF LARGE-SCALE WEB DATA

CRAWLING THE WEB: DISCOVERY AND MAINTENANCE OF LARGE-SCALE WEB DATA CRAWLING THE WEB: DISCOVERY AND MAINTENANCE OF LARGE-SCALE WEB DATA An Implementation Amit Chawla 11/M.Tech/01, CSE Department Sat Priya Group of Institutions, Rohtak (Haryana), INDIA anshmahi@gmail.com

More information

Compressing the Graph Structure of the Web

Compressing the Graph Structure of the Web Compressing the Graph Structure of the Web Torsten Suel Jun Yuan Abstract A large amount of research has recently focused on the graph structure (or link structure) of the World Wide Web. This structure

More information

A subjective measure of web search quality

A subjective measure of web search quality Information Sciences 169 (2005) 365 381 www.elsevier.com/locate/ins A subjective measure of web search quality M.M. Sufyan Beg * Department of Electrical Engineering, Indian Institute of Technology, Delhi,

More information

An Adaptive Approach in Web Search Algorithm

An Adaptive Approach in Web Search Algorithm International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 15 (2014), pp. 1575-1581 International Research Publications House http://www. irphouse.com An Adaptive Approach

More information

Building a Terabyte-scale Web Data Collection "NW1000G-04" in the NTCIR-5 WEB Task

Building a Terabyte-scale Web Data Collection NW1000G-04 in the NTCIR-5 WEB Task ISSN 346-5597 NII Technical Report Building a Terabyte-scale Web Data Collection "NW000G-04" in the NTCIR-5 WEB Task Masao Takaku, Keizo Oyama, Akiko Aizawa, Haruko Ishikawa, Kengo Minamide, Shin Kato,

More information

Measuring Similarity to Detect

Measuring Similarity to Detect Measuring Similarity to Detect Qualified Links Xiaoguang Qi, Lan Nie, and Brian D. Davison Dept. of Computer Science & Engineering Lehigh University Introduction Approach Experiments Discussion & Conclusion

More information

Estimating Page Importance based on Page Accessing Frequency

Estimating Page Importance based on Page Accessing Frequency Estimating Page Importance based on Page Accessing Frequency Komal Sachdeva Assistant Professor Manav Rachna College of Engineering, Faridabad, India Ashutosh Dixit, Ph.D Associate Professor YMCA University

More information

Web Hosts Growth 2: Narrative Overview Timeline of Developments 1991: Web introduced at CERN 1993: Mosaic popularizes the Web 130 servers to 10,000 in

Web Hosts Growth 2: Narrative Overview Timeline of Developments 1991: Web introduced at CERN 1993: Mosaic popularizes the Web 130 servers to 10,000 in The Web s Missing Links: The Search Engine & Portal Industry Thomas Haigh The Haigh Group & University of Wisconsin, Milwaukee thaigh@computer.org ETH, Informatik Lunch, 18 June 2007 About Me Dual training

More information