Text and Image Metasearch on the Web

Size: px
Start display at page:

Download "Text and Image Metasearch on the Web"

Transcription

1 Appears in International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 99, CSREA Press, pp , Copyright c CSREA Press. Text and Image Metasearch on the Web ËØ Ú Ä ÛÖ Ò Ò º Ä Ð Æ Ê Ö ÁÒ Ø ØÙØ ÁÒ Ô Ò Ò Ï Ý ÈÖ Ò ØÓÒ Æ ¼ ¼ Ð ÛÖ Ò Ð Ö Ö ºÒ ºÒ ºÓÑ Abstract As the Web continues to increase in size, the relative coverage of Web search engines is decreasing, and search tools that combine the results of multiple search engines are becoming more valuable. This paper provides details of the text and image metasearch functions of the Inquirus search engine developed at the NEC Research Institute. For text metasearch, we describe features including the use of link information in metasearch, and provide statistics on the usage and performance of Inquirus and the Web search engines. For image metasearch, Inquirus queries multiple image search engines on the Web, downloads the actual images, and creates image thumbnails for display to the user. Inquirus handles image search engines that return direct links to images, and engines that return links to HTML pages. For the engines that return HTML pages, Inquirus analyzes the text on the pages in order to predict which images are most likely to correspond to the query. The individual image search engines tend to excel at different classes of queries, and the combination of engines is surprisingly effective at finding images corresponding to a given query. Both the text and image metasearch functions of Inquirus are surprisingly fast, and we describe the parallel architecture of the engine that provides this efficiency. Keywords: text search, image search, metasearch, parallel Web search. We have previously developed Inquirus, a metasearch engine for the Web [3]. Inquirus has a number of differences from typical metasearch engines. Inquirus is fundamentally architectured around downloading and analyzing individual Web pages referenced by the search engines, as opposed to simply retrieving and merging search engine responses. This means that Inquirus is always up to date and working with the current contents of pages, and allows extensive analysis to be done on page contents. The page analysis functions performed by Inquirus include extracting query-sensitive summaries that make relevance assessment easier for the user, using a consistent relevance measure across all pages (difficult for a typical metasearch engine), filtering non-relevant documents, improved duplicate detection based on query term context, query term highlighting and quick jump links when viewing pages, query expansion using co-occurring morphological variants, and clustering using the co-occurrence of words and phrases. Inquirus returns results progressively as they are processed, and the parallel architecture of Inquirus allows it to typically return the first result faster than a standard search engine can respond. While the use of multiple search engines improves coverage of the Web, we note that most of the advantages of Inquirus are also applicable even if only one search engine is used. This paper provides details of the image metasearch functions of Inquirus, as well as providing details and statistics for text metasearch not covered in previous papers. 1 Introduction Limitations of the search services on the Web have led to the introduction of metasearch engines [1, ]. A metasearch engine searches the Web by making requests to multiple search engines such as AltaVista or Infoseek. The results from the individual search engines are combined into a single result set. Advantages of metasearch engines include a consistent interface to multiple engines and improved coverage. Image Metasearch There are a number of image databases on the Web, e.g. Yahoo Image Surfer ( ØØÔ»» ÙÖ ºÝ ÓÓº ÓÑ») and AltaVista PhotoFinder ( ØØÔ»» Ñ º ÐØ Ú Ø ºÓÑ»). Many of the large Web search engines also allow searching for images using keywords, e.g. Lycos ( ØØÔ»»ÛÛÛºÐÝÓ ºÓÑ»), HotBot ( ØØÔ»»ÛÛÛº ÓØ ÓغÓÑ»), and AltaVista ( ØØÔ»» ÛÛÛº ÐØ Ú Ø ºÓÑ»). However, these databases tend to index many different images and tend to excel at dif-

2 ferent queries. The same concepts that have been applied to text metasearch can also be applied to images. The following sections describe the motivation for and operation of the keyword based image metasearch functions of Inquirus..1 Previous Work Several companies [for example, PLS (ÛÛÛºÔÐ ºÓÑ), Lexis-Nexis (ÛÛÛºÐ Ü ¹Ò Ü ºÓÑ), DIALOG (ÛÛÛº ÐÓ ºÓÑ), and Verity (ÛÛÛºÚ Ö ØݺÓÑ)] have produced systems that integrate results from multiple heterogeneous databases [4]. Many text metasearch services exist such as the popular and useful MetaCrawler service [1]. MetaSEEk [5] is a metasearch engine for images. MetaSEEk targets query by example as opposed to keyword search, which is the focus of the image metasearch functions of Inquirus.. Motivation The principle motivations behind the image metasearch functions of Inquirus are similar to the motivations behind the creation of text metasearch engines: the poor precision, limited coverage, limited availability, and limited user interfaces of the image search engines. Expanding on these points: Poor precision. Automatically determining the relevance of images to keyword based queries is a difficult task and the various image search engines typically use different means of assessing relevance. For a given query, it is difficult to know which image database will produce the best results. Limited coverage. The search engines that index publicly available images on the Web (e.g. Lycos) do not index the Web exhaustively [6]. As with their coverage of the Web in general, their coverage of images is limited. The different engines tend to index different images, so coverage can be improved by combining the results of multiple engines. Limited availability. Due to search engine and/or network difficulties, we have observed that the engine which responds the quickest varies over time. Limited user interfaces. Many of the image search engines support a different query syntax, e.g. some engines do not support the use of phrases in queries. One advantage of metasearch techniques is that queries can be modified appropriately for individual engines..3 Architecture Figure 1 shows a simplified control flow diagram for the image metasearch algorithm we have used. The engine consists of two main logical parts: the image metasearch code and a parallel page retrieval daemon. Pseudocode for (a simplified version of) the search code is as follows: ÈÖÓ Ø Ö ÕÙ Ø ØÓ ÝÒØ Ü Ò Ö Ø ºº Ö ÙÐ Ö ÜÔÖ ÓÒ Û Ö Ù ØÓºº Ñ Ø ÕÙ ÖÝ Ø ÖÑ Ë Ò Ö ÕÙ Ø ÑÓ ÔÔÖÓÔÖ Ø Ðݵ Ö Ð Ú ÒØ Ñ Ö Ò Ò ØÓ Ðкº ÄÓÓÔ ÓÖ Ô Ö ØÖ Ú ÙÒØ Ð Ñ Ü ÑÙÑ ÒÙÑ Öºº Ó Ñ ÓÖ ÐÐ Ô Ö ØÖ Ú Ò ÐÓÓÔ Á Ô ÖÓÑ Ö Ò Ò È Ö Ö Ò Ò Ö ÔÓÒ ºº ÜØÖ Ø Ò Ø Ò ÒÝ Ð Ò ÓÖºº Ø Ò ÜØ Ø Ó Ö ÙÐØ Ë Ò Ö ÕÙ Ø ÓÖ ÐÐ Ó Ø Ø Ë Ò Ö ÕÙ Ø ÓÖ Ø Ò ÜØ Ø Ó ºº Ö ÙÐØ ÔÔÐ Ð Ð Ô Ò Ñ Ð Ò Á Ñ Ñ Ø ÔÐ Ý Ö Ø Ö ºº Ø Ò Ñ ØÓ Ø ÔÐ Ý ÕÙ Ù Ò ÐÝÞ ÕÙ ÖÝ Ø ÖÑ ÐÓ Ø ÓÒ Ò Ø ºº Ô Ò ÔÖ Ø Û Òݵ Ó ºº Ø Ñ ÓÒ Ø Ô ÓÖÖ ÔÓÒ ºº ØÓ Ø ÕÙ ÖÝ ¹ Ò Ö ÕÙ Ø ØÓºº ÓÛÒÐÓ Ø Ñ Á Ò Ñ Ö Ò Ø ÔÐ Ý ÕÙ Ù Ò Ö Ø Ò Ð Ñ ÑÓÒØ Ó Ø ºº Ñ Ò Ø ÕÙ Ù ÔÐ Ý Ø ÑÓÒØ Ð Ð ºº Ñ Û Ö ÔÓÖØ ÓÒ Ó Ø ºº Ñ ÓÖÖ ÔÓÒ Ò ØÓ Ø ºº ÓÖ Ò Ð Ò Ú Ù Ð Ñ ÓÛ ºº Ø Ð Ô ÓÖ Ø ÓÖ Ò Ð Ñ Á ÒÝ Ñ Ö Ò Ø ÔÐ Ý ÕÙ Ù Ò Ö Ø Ò Ð Ñ ÑÓÒØ Ó Ø Ñ ºº Ò Ø ÕÙ Ù ÔÐ Ý Ø ÑÓÒØ Ð Ð Ñ ºº Û Ö ÔÓÖØ ÓÒ Ó Ø Ñ ºº ÓÖÖ ÔÓÒ Ò ØÓ Ø ÓÖ Ò Ð Ò Ú Ù Ðºº Ñ ÓÛ Ø Ð Ô ÓÖ Ø ºº ÓÖ Ò Ð Ñ ÈÖ ÒØ ÙÑÑ ÖÝ Ø Ø Ø As indicated in the pseudocode, for those engines that return a list of Web pages, the engine analyzes the pages in order to predict which (if any) of the images on the pages correspond to the query. The mechanism for doing this is currently relatively simple: the engine first looks for query terms in the filename of image files. If none are found then it locates the image closest to the query terms.

3 Figure 1. Simplified control flow for image metasearch. Interactions with the page retrieval daemon are shown in gray. Very small or very thin images are filtered out these images are typically icons or separators. Note that the engine displays images in groups of five, creating a montage of the five thumbnails. An image map is used to display the appropriate page when the individual thumbnails are clicked on. The montage technique was used because some browser implementations were prone to crash when presented with a large number of the smaller thumbnail images. A few notes about the architecture of the engine: 1. The engine does not rely on the validity of URLs provided by the image search engines all images are downloaded and processed, thereby filtering out links which are no longer valid.. The engine presents thumbnails of all images. Because of the low precision of the search engines this helps users to determine the relevance of an image more easily and more rapidly, when compared to engines that only return a URL referencing an image or a page containing an image. 3. Results are returned progressively as the images are downloaded and analyzed, rather than after all images are downloaded. 4. When clicking on a thumbnail, the full image is displayed in another window. For images contained on Web pages, the pages are shown with the query terms highlighted. 5. All images are cached locally, which results in a substantial improvement in browsing speed, and a reduction in frustration waiting for images to load. 6. The engine detects and filters out identical images using a checksum on the files. Figure shows the response of Inquirus to the image query ÙÒ Ø. 3

4 Find: Home Options Help Feedback Add URL Analysis Papers About sunset Web Usenet Web & Usenet News Images All Hits: 1 Context: 1 Cluster: No Tracking: No Searching for: sunset using: Yahoo HotBot AltaVista Photo Finder. [... section deleted...] Engine Response Total Retrieved Processed Duplicates AltaVista Images Yes HotBot Images Yes Photo Finder Yes Yahoo Images Yes Figure. Sample response of Inquirus for an image search with the keyword ÙÒ Ø..4 Efficiency A first impression of the image metasearch technique presented here may be that the technique would be too slow. However, it is actually surprisingly fast. The first results from the engine typically display within a few seconds and the remaining images typically display faster than the user browses the currently displayed images (using a T1 connection). Part of the perceived speed is due to the parallel architecture of the engine. Figure 3 shows the median time for the first of Ò engines to respond when queries are made simultaneously to Ò Web search engines (as happens in Inquirus). Figure 4 shows the median time for the first of Ò arbitrary Web pages to be downloaded when queries are made simultaneously to all Ò pages. We use the median because the distribution of response times is significantly skewed, as shown in Figure 5, which shows a histogram of page retrieval times for arbitrary Web pages. It can be seen that downloading pages in parallel can result in the first pages being retrieved significantly faster than the average page retrieval times, which helps explain why the image metasearch engine can start displaying results quickly. One potential drawback of this image metasearch technique is that it uses a significant amount of bandwidth. We note simply that bandwidth is increasing rapidly and that the bandwidth issue may become less important in the future. Certainly, the bandwidth requirements are far less than brute force search of the Web. 3 Text Metasearch Details of many of the text metasearch features of Inquirus have been provided before [3]. This section provides details of the use of links by Inquirus, and statistics on the usage of the service and the performance of search engines. 4

5 Median Time for the First of n Web Search Engines to Respond.5 Median Time to Respond Number of Search Engines Figure 3. Median time (seconds) for the first of Ò Web search engines to respond. Median Time to Download the First of n Web Pages Median Time to Respond Number of Pages Figure 4. Median time (seconds) to download the first of Ò pages requested simultaneously. 3.1 Using Links to Improve Metasearch Standard search engines are making increasing use of the links between pages. For example, Google [7] uses PageRank [8] as part of its method for ranking pages. PageRank considers the numbers of links to pages (pages that are linked to more often or by more highly linked pages get ranked higher). Inquirus does not know the number of links to all pages and hence does not use the same technique. However, Inquirus does make use of link information. Often a query can return pages that link to a desired page before returning the desired page. These pages may be found because the links can contain better descriptions of pages than the pages themselves. For example, a researcher might have a homepage that does not contain their name in the title, but another page might link to their page and include the researcher s name in the link. As shown in Figure 6, Inquirus displays links in page summaries, so it is easy to see and follow these links. Inquirus also analyzes the graph created by links between pages in the result set, in order to identify authoritative pages (those with many links to them), and pages that have links to many authorities ( hubs ). The results are similar to the hubs and authorities computation performed by Kleinberg and others [9, 1], although the algorithm currently used by Inquirus simply looks at the number of links to and from pages within the result set. Figure 7 shows an example. The algorithms used by Kleinberg and others are typically iterative and augment the result set with neighboring pages in the link graph of the Web. Inquirus does not do this because it would add substantially to the computational requirements. We plan to compare our algorithm with the other proposed algorithms. 3. Usage and Performance Statistics Inquirus is used by employees of the NEC Research Institute. We analyzed the last 5 queries sent to Inquirus in April Users performed an average of 7.7 queries each. On average,.5 pages are viewed per query. Since Inquirus shows query-sensitive summaries, this suggests 5

6 Frequency All Pages Response Time Time Figure 5. Response time (seconds) for arbitrary Web pages. Response times greater than 15 seconds are shown in the last bin. Find: Home Options Help Feedback Add URL Analysis Papers About david waltz Web Usenet Web & Usenet News Images All Hits: 5 Context: 1 Cluster: No Tracking: No View: Main Vote Ranked Hubs & Authorities Duplicates Sites Partial Suggestions Summary Searching for: "david waltz" using: HotBot Google MSN Infoseek AltaVista Excite Lycos Northern Light Snap Fast Yahoo. Tip: You can search for links to a page by simply entering the URL. JAIR Masthead A 3m 1k meta: Adobe PageMill 3. Mac Editors and Staff, 1999 Executive Editor Michael Wellman Associate Editors Craig Boutilier Bernhard.../ Toby Walsh Usama Fayyad Sridhar Mahadevan David Waltz Stephanie Forrest Maja Mataric Brian... [... section deleted...] Figure 6. Sample response of Inquirus with the query Ú Û ÐØÞ. Although the first result is not David Waltz s homepage, the query-sensitive summary contains a link to his homepage. Although the link is not underlined (links are shown in a different color), the link can be directly followed. that users are finding valuable documents. There was an average of.9 words used per query, which is higher than typically reported values for Web search engines [11]. 51% of queries used phrases, which is much higher than typically reported for Web search engines [11]. The higher number of words per query and the increased use of phrases may be due to the user population (primarily scientists) and/or the query suggestions that Inquirus makes [3]. For example, when a two term query without phrases returns many pages that contain the terms as a phrase, Inquirus suggests the use of phrases. Many users may follow the suggestions and use phrases in the future if they obtain improved results. Few users read help information in our experience, and we believe that query-sensitive suggestions can be a good way to teach users about query syntax, query formulation, and features of the search service. Figure 8 shows the median response time of various Web search engines as a function of time. The median is used because the distribution of response times is significantly skewed. We can see that the response time varies significantly across engines and across time. 4 Summary We have discussed the text and image metasearch functions of the Inquirus search service. Inquirus provides a number of advantages for text metasearch by analyzing the current contents of pages. Inquirus demonstrates that real-time text and image metasearch, including downloading and analysis of the matching pages and images, is feasible. For image metasearch, Inquirus queries multiple image search engines on the Web, downloads the actual 6

7 Hubs and authorities Authorities Hubs 1: CNNfn - the financial network 33: News and Financial Information 7: Barron's ( 134: Yahoo! Finance ( 6: Wall Street Journal ( 16: The Computerplaza-Finance Links 5: ( 117: NetGuide: Money guide, Financial News section 5: Bloomberg Personal ( 115: NetGuide: Money guide, Financial News section 5: PC Quotes ( 16: The Search Beat's Business News Cube 4: NYSE ( 81: Financial News Online, Financial Links for Investo... 4: The Economist ( 81: Financial News Online, Financial Links for Investo... 4: Data Broadcasting Corporation ( 78: Active Lynx Showcase: Investing: Financial News : Briefing.com ( 75: A s i a n N e t - Finance Forum Figure 7. Sample of hubs and authorities computation for the query Ò Ò Ð Ò Û. Median Engine Response Time Median Engine Response Time (Seconds) AltaVista Excite Google HotBot Infoseek Lycos Microsoft Northern Light SNAP Yahoo 11 Apr Apr 99 5 Apr 99 May 99 Date Figure 8. The median time for search engines to respond. images, and creates thumbnails for display to the user. For image search engines that return a list of pages, Inquirus analyzes the text on the page in order to predict which image is most likely to correspond to the query. The individual image search engines tend to excel at different classes of queries, and the combination of engines is surprisingly effective at finding images corresponding to a given query. Implementation of Inquirus shows that it is surprisingly fast, given that it downloads and processes all of the images. References [1] E. Selberg and O. Etzioni. The MetaCrawler architecture for resource aggregation on the Web. IEEE Expert, pages 11 14, January February [] D. Dreilinger and A.E. Howe. An information gathering agent for querying web search engines. Technical Report CS , Computer Science Department, Colorado State University, [4] E. Selberg and O. Etzioni. Multi-service search and comparison using the MetaCrawler. In Proceedings of the 1995 World Wide Web Conference, [5] Shih-Fu Chang, John R. Smith, Mandis Beigi, and Ana Benitez. Visual information retrieval from large distributed online repositories. Communications of the ACM, 4:63 71, [6] Steve Lawrence and C. Lee Giles. Searching the World Wide Web. Science, 8(536):98 1, [7] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Seventh International World Wide Web Conference, Brisbane, Australia, [8] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web [9] J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proceedings ACM-SIAM Symposium on Discrete Algorithms, pages , San Francisco, California, 5 7 January [1] K. Bharat and M.R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. In SIGIR Conference on Research and Development in Information Retrieval, [11] B. Jansen, A. Spink, J. Bateman, and T. Saracevic. Real life information retrieval: A study of user queries on the web. In Proceedings SIGIR, volume 3, pages 5 17, [3] Steve Lawrence and C. Lee Giles. Context and page analysis for improved web search. IEEE Internet Computing, (4):38 46,

Architecture of a Metasearch Engine that Supports User Information Needs

Architecture of a Metasearch Engine that Supports User Information Needs In Proceedings of the Eighth International Conference on Information Knowledge Management, (CIKM-99), pp. 210-216, Copyright 1999, ACM Architecture of a Metasearch Engine that Supports User Information

More information

Accessibility of INGO FAST 1997 ARTVILLE, LLC. 32 Spring 2000 intelligence

Accessibility of INGO FAST 1997 ARTVILLE, LLC. 32 Spring 2000 intelligence Accessibility of INGO FAST 1997 ARTVILLE, LLC 32 Spring 2000 intelligence On the Web Information On the Web Steve Lawrence C. Lee Giles Search engines do not index sites equally, may not index new pages

More information

Models, Notation, Goals

Models, Notation, Goals Scope Ë ÕÙ Ò Ð Ò ÐÝ Ó ÝÒ Ñ ÑÓ Ð Ü Ô Ö Ñ Ö ² Ñ ¹Ú ÖÝ Ò Ú Ö Ð Ö ÒÙÑ Ö Ð ÔÓ Ö ÓÖ ÔÔÖÓÜ Ñ ÓÒ ß À ÓÖ Ð Ô Ö Ô Ú ß Ë ÑÙÐ ÓÒ Ñ Ó ß ËÑÓÓ Ò ² Ö Ò Ö Ò Ô Ö Ñ Ö ÑÔÐ ß Ã ÖÒ Ð Ñ Ó ÚÓÐÙ ÓÒ Ñ Ó ÓÑ Ò Ô Ö Ð Ð Ö Ò Ð ÓÖ Ñ

More information

Eric J. Glover, Steve Lawrence, Michael D. Gordon, William P. Birmingham, and C. Lee Giles

Eric J. Glover, Steve Lawrence, Michael D. Gordon, William P. Birmingham, and C. Lee Giles Improving Web searching with user preferences. WEB SEARCH YOUR WAY The World Wide Web was estimated at over 800 million indexable pages circa 1998 [9]. When searching the Web, a user can be overwhelmed

More information

Control-Flow Graph and. Local Optimizations

Control-Flow Graph and. Local Optimizations Control-Flow Graph and - Part 2 Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler Design Outline of the Lecture What is

More information

Ancillary Software Development at GSI. Michael Reese. Outline: Motivation Old Software New Software

Ancillary Software Development at GSI. Michael Reese. Outline: Motivation Old Software New Software Ancillary Software Development at GSI Michael Reese Outline: Motivation Old Software New Software Work supported by BMBF NuSTAR.DA - TP 6, FKZ: BMBF 05P12RDFN8 (TP 6). March 20, 2013 AGATA week 2013 at

More information

Lecture 17 November 7

Lecture 17 November 7 CS 559: Algorithmic Aspects of Computer Networks Fall 2007 Lecture 17 November 7 Lecturer: John Byers BOSTON UNIVERSITY Scribe: Flavio Esposito In this lecture, the last part of the PageRank paper has

More information

Using SmartXplorer to achieve timing closure

Using SmartXplorer to achieve timing closure Using SmartXplorer to achieve timing closure The main purpose of Xilinx SmartXplorer is to achieve timing closure where the default place-and-route (PAR) strategy results in a near miss. It can be much

More information

How to Implement DOTGO Engines. CMRL Version 1.0

How to Implement DOTGO Engines. CMRL Version 1.0 How to Implement DOTGO Engines CMRL Version 1.0 Copyright c 2009 DOTGO. All rights reserved. Contents 1 Introduction 3 2 A Simple Example 3 2.1 The CMRL Document................................ 3 2.2 The

More information

Query Modifications Patterns During Web Searching

Query Modifications Patterns During Web Searching Bernard J. Jansen The Pennsylvania State University jjansen@ist.psu.edu Query Modifications Patterns During Web Searching Amanda Spink Queensland University of Technology ah.spink@qut.edu.au Bhuva Narayan

More information

Recent Researches on Web Page Ranking

Recent Researches on Web Page Ranking Recent Researches on Web Page Pradipta Biswas School of Information Technology Indian Institute of Technology Kharagpur, India Importance of Web Page Internet Surfers generally do not bother to go through

More information

Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule

Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule 1 How big is the Web How big is the Web? In the past, this question

More information

³ ÁÒØÖÓÙØÓÒ ½º ÐÙ ØÖ ÜÔÒ ÓÒ Ò ÌÒ ÓÖ ÓÖ ¾º ÌÛÓ¹ÓÝ ÈÖÓÔÖØ Ó ÓÑÔÐÜ ÆÙÐ º ËÙÑÑÖÝ Ò ÓÒÐÙ ÓÒ º ² ± ÇÆÌÆÌË Åº ÐÚÓÐ ¾ Ëʼ

³ ÁÒØÖÓÙØÓÒ ½º ÐÙ ØÖ ÜÔÒ ÓÒ Ò ÌÒ ÓÖ ÓÖ ¾º ÌÛÓ¹ÓÝ ÈÖÓÔÖØ Ó ÓÑÔÐÜ ÆÙÐ º ËÙÑÑÖÝ Ò ÓÒÐÙ ÓÒ º ² ± ÇÆÌÆÌË Åº ÐÚÓÐ ¾ Ëʼ È Ò È Æ ÇÊÊÄÌÁÇÆË È ÅÁÍŹÏÁÀÌ ÆÍÄÁ ÁÆ Åº ÐÚÓÐ ÂÄ ÇØÓÖ ¾¼¼ ½ ³ ÁÒØÖÓÙØÓÒ ½º ÐÙ ØÖ ÜÔÒ ÓÒ Ò ÌÒ ÓÖ ÓÖ ¾º ÌÛÓ¹ÓÝ ÈÖÓÔÖØ Ó ÓÑÔÐÜ ÆÙÐ º ËÙÑÑÖÝ Ò ÓÒÐÙ ÓÒ º ² ± ÇÆÌÆÌË Åº ÐÚÓÐ ¾ Ëʼ À Ò Ò Ò ÛØ À Ç Òµ Ú Ò µ Ç Òµ

More information

AN EFFICIENT COLLECTION METHOD OF OFFICIAL WEBSITES BY ROBOT PROGRAM

AN EFFICIENT COLLECTION METHOD OF OFFICIAL WEBSITES BY ROBOT PROGRAM AN EFFICIENT COLLECTION METHOD OF OFFICIAL WEBSITES BY ROBOT PROGRAM Masahito Yamamoto, Hidenori Kawamura and Azuma Ohuchi Graduate School of Information Science and Technology, Hokkaido University, Japan

More information

COMP5331: Knowledge Discovery and Data Mining

COMP5331: Knowledge Discovery and Data Mining COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd, Jon M. Kleinberg 1 1 PageRank

More information

Introduction to Scientific Typesetting Lesson 11: Foreign Languages, Columns, and Section Titles

Introduction to Scientific Typesetting Lesson 11: Foreign Languages, Columns, and Section Titles Introduction to Scientific Typesetting Lesson 11: Foreign Languages,, and Ryan Higginbottom January 19, 2012 1 Ð The Package 2 Without Ð What s the Problem? With Ð Using Another Language Typing in Spanish

More information

Link Analysis and Web Search

Link Analysis and Web Search Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html

More information

Graphs (MTAT , 4 AP / 6 ECTS) Lectures: Fri 12-14, hall 405 Exercises: Mon 14-16, hall 315 või N 12-14, aud. 405

Graphs (MTAT , 4 AP / 6 ECTS) Lectures: Fri 12-14, hall 405 Exercises: Mon 14-16, hall 315 või N 12-14, aud. 405 Graphs (MTAT.05.080, 4 AP / 6 ECTS) Lectures: Fri 12-14, hall 405 Exercises: Mon 14-16, hall 315 või N 12-14, aud. 405 homepage: http://www.ut.ee/~peeter_l/teaching/graafid08s (contains slides) For grade:

More information

Information Retrieval. Lecture 4: Search engines and linkage algorithms

Information Retrieval. Lecture 4: Search engines and linkage algorithms Information Retrieval Lecture 4: Search engines and linkage algorithms Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk Today 2

More information

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai. UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index

More information

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds MAE 298, Lecture 9 April 30, 2007 Web search and decentralized search on small-worlds Search for information Assume some resource of interest is stored at the vertices of a network: Web pages Files in

More information

Word Disambiguation in Web Search

Word Disambiguation in Web Search Word Disambiguation in Web Search Rekha Jain Computer Science, Banasthali University, Rajasthan, India Email: rekha_leo2003@rediffmail.com G.N. Purohit Computer Science, Banasthali University, Rajasthan,

More information

SEARCH ENGINE INSIDE OUT

SEARCH ENGINE INSIDE OUT SEARCH ENGINE INSIDE OUT From Technical Views r86526020 r88526016 r88526028 b85506013 b85506010 April 11,2000 Outline Why Search Engine so important Search Engine Architecture Crawling Subsystem Indexing

More information

Anatomy of a search engine. Design criteria of a search engine Architecture Data structures

Anatomy of a search engine. Design criteria of a search engine Architecture Data structures Anatomy of a search engine Design criteria of a search engine Architecture Data structures Step-1: Crawling the web Google has a fast distributed crawling system Each crawler keeps roughly 300 connection

More information

An Esterel Virtual Machine

An Esterel Virtual Machine An Esterel Virtual Machine Stephen A. Edwards Columbia University Octopi Workshop Chalmers University of Technology Gothenburg, Sweden December 28 An Esterel Virtual Machine Goal: Run big Esterel programs

More information

Concurrent Execution

Concurrent Execution Concurrent Execution Overview: concepts and definitions modelling: parallel composition action interleaving algebraic laws shared actions composite processes process labelling, action relabeling and hiding

More information

Breadth-First Search Crawling Yields High-Quality Pages

Breadth-First Search Crawling Yields High-Quality Pages Breadth-First Search Crawling Yields High-Quality Pages Marc Najork Compaq Systems Research Center 13 Lytton Avenue Palo Alto, CA 9431, USA marc.najork@compaq.com Janet L. Wiener Compaq Systems Research

More information

Probabilistic analysis of algorithms: What s it good for?

Probabilistic analysis of algorithms: What s it good for? Probabilistic analysis of algorithms: What s it good for? Conrado Martínez Univ. Politècnica de Catalunya, Spain February 2008 The goal Given some algorithm taking inputs from some set Á, we would like

More information

Directory Search Engines Searching the Yahoo Directory

Directory Search Engines Searching the Yahoo Directory Searching on the WWW Directory Oriented Search Engines Often looking for some specific information WWW has a growing collection of Search Engines to aid in locating information The Search Engines return

More information

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group Information Retrieval Lecture 4: Web Search Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk (Lecture Notes after Stephen Clark)

More information

International Journal of Scientific & Engineering Research Volume 2, Issue 12, December ISSN Web Search Engine

International Journal of Scientific & Engineering Research Volume 2, Issue 12, December ISSN Web Search Engine International Journal of Scientific & Engineering Research Volume 2, Issue 12, December-2011 1 Web Search Engine G.Hanumantha Rao*, G.NarenderΨ, B.Srinivasa Rao+, M.Srilatha* Abstract This paper explains

More information

Using USB Hot-Plug For UMTS Short Message Service. Technical Brief from Missing Link Electronics:

Using USB Hot-Plug For UMTS Short Message Service. Technical Brief from Missing Link Electronics: Technical Brief 20100507 from Missing Link Electronics: Using USB Hot-Plug For UMTS Short Message Service This Technical Brief describes how the USB hot-plug capabilities of the MLE Soft Hardware Platform

More information

A Model for Interactive Web Information Retrieval

A Model for Interactive Web Information Retrieval A Model for Interactive Web Information Retrieval Orland Hoeber and Xue Dong Yang University of Regina, Regina, SK S4S 0A2, Canada {hoeber, yang}@uregina.ca Abstract. The interaction model supported by

More information

Lecture 20: Classification and Regression Trees

Lecture 20: Classification and Regression Trees Fall, 2017 Outline Basic Ideas Basic Ideas Tree Construction Algorithm Parameter Tuning Choice of Impurity Measure Missing Values Characteristics of Classification Trees Main Characteristics: very flexible,

More information

Event List Management In Distributed Simulation

Event List Management In Distributed Simulation Event List Management In Distributed Simulation Jörgen Dahl ½, Malolan Chetlur ¾, and Philip A Wilsey ½ ½ Experimental Computing Laboratory, Dept of ECECS, PO Box 20030, Cincinnati, OH 522 0030, philipwilsey@ieeeorg

More information

Mesh Smoothing via Mean and Median Filtering Applied to Face Normals

Mesh Smoothing via Mean and Median Filtering Applied to Face Normals Mesh Smoothing via Mean and ing Applied to Face Normals Ý Hirokazu Yagou Yutaka Ohtake Ý Alexander G. Belyaev Ý Shape Modeling Lab, University of Aizu, Aizu-Wakamatsu 965-8580 Japan Computer Graphics Group,

More information

Link Analysis in Web Information Retrieval

Link Analysis in Web Information Retrieval Link Analysis in Web Information Retrieval Monika Henzinger Google Incorporated Mountain View, California monika@google.com Abstract The analysis of the hyperlink structure of the web has led to significant

More information

Deep Web Crawling and Mining for Building Advanced Search Application

Deep Web Crawling and Mining for Building Advanced Search Application Deep Web Crawling and Mining for Building Advanced Search Application Zhigang Hua, Dan Hou, Yu Liu, Xin Sun, Yanbing Yu {hua, houdan, yuliu, xinsun, yyu}@cc.gatech.edu College of computing, Georgia Tech

More information

A Parallel Computing Architecture for Information Processing Over the Internet

A Parallel Computing Architecture for Information Processing Over the Internet A Parallel Computing Architecture for Information Processing Over the Internet Wendy A. Lawrence-Fowler, Xiannong Meng, Richard H. Fowler, Zhixiang Chen Department of Computer Science, University of Texas

More information

To Evaluate the Performance of Metasearch Engines: A Comparative Study

To Evaluate the Performance of Metasearch Engines: A Comparative Study To Evaluate the Performance of Metasearch Engines: A Comparative Study SATINDER BALL 1, RAJENDAR NATH 2 1 Assistant Professor, Department of Computer Science and ApplicationsVaish College of Engineering,

More information

THE AUSTRALIAN NATIONAL UNIVERSITY Practice Final Examination, October 2012

THE AUSTRALIAN NATIONAL UNIVERSITY Practice Final Examination, October 2012 THE AUSTRALIAN NATIONAL UNIVERSITY Practice Final Examination, October 2012 COMP2310 / COMP6310 (Concurrent and Distributed Systems ) Writing Period: 3 hours duration Study Period: 15 minutes duration

More information

Web Structure Mining using Link Analysis Algorithms

Web Structure Mining using Link Analysis Algorithms Web Structure Mining using Link Analysis Algorithms Ronak Jain Aditya Chavan Sindhu Nair Assistant Professor Abstract- The World Wide Web is a huge repository of data which includes audio, text and video.

More information

Searching the Web for Information

Searching the Web for Information Search Xin Liu Searching the Web for Information How a Search Engine Works Basic parts: 1. Crawler: Visits sites on the Internet, discovering Web pages 2. Indexer: building an index to the Web's content

More information

Dynamic Visualization of Hubs and Authorities during Web Search

Dynamic Visualization of Hubs and Authorities during Web Search Dynamic Visualization of Hubs and Authorities during Web Search Richard H. Fowler 1, David Navarro, Wendy A. Lawrence-Fowler, Xusheng Wang Department of Computer Science University of Texas Pan American

More information

A Comparison of Mesh Smoothing Methods

A Comparison of Mesh Smoothing Methods A Comparison of Mesh Smoothing Methods Alexander Belyaev Yutaka Ohtake Computer Graphics Group, Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany Phone: [+49](681)9325-408 Fax: [+49](681)9325-499

More information

Enabling Users to Visually Evaluate the Effectiveness of Different Search Queries or Engines

Enabling Users to Visually Evaluate the Effectiveness of Different Search Queries or Engines Appears in WWW 04 Workshop: Measuring Web Effectiveness: The User Perspective, New York, NY, May 18, 2004 Enabling Users to Visually Evaluate the Effectiveness of Different Search Queries or Engines Anselm

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

On Finding Power Method in Spreading Activation Search

On Finding Power Method in Spreading Activation Search On Finding Power Method in Spreading Activation Search Ján Suchal Slovak University of Technology Faculty of Informatics and Information Technologies Institute of Informatics and Software Engineering Ilkovičova

More information

Finding Neighbor Communities in the Web using Inter-Site Graph

Finding Neighbor Communities in the Web using Inter-Site Graph Finding Neighbor Communities in the Web using Inter-Site Graph Yasuhito Asano 1, Hiroshi Imai 2, Masashi Toyoda 3, and Masaru Kitsuregawa 3 1 Graduate School of Information Sciences, Tohoku University

More information

Finding Code on the World Wide Web: A Preliminary Investigation

Finding Code on the World Wide Web: A Preliminary Investigation TO APPEAR IN Proc. First Int. Workshop on Source Code Analysis and Manipulation (SCAM 2001), NOV. 2001. 1 Finding Code on the World Wide Web: A Preliminary Investigation James M. Bieman Vanessa Murdock

More information

PageRank and related algorithms

PageRank and related algorithms PageRank and related algorithms PageRank and HITS Jacob Kogan Department of Mathematics and Statistics University of Maryland, Baltimore County Baltimore, Maryland 21250 kogan@umbc.edu May 15, 2006 Basic

More information

Searching the Web [Arasu 01]

Searching the Web [Arasu 01] Searching the Web [Arasu 01] Most user simply browse the web Google, Yahoo, Lycos, Ask Others do more specialized searches web search engines submit queries by specifying lists of keywords receive web

More information

From Clarity to Efficiency for Distributed Algorithms

From Clarity to Efficiency for Distributed Algorithms From Clarity to Efficiency for Distributed Algorithms Yanhong A. Liu Scott D. Stoller Bo Lin Michael Gorbovitski Computer Science Department, State University of New York at Stony Brook, Stony Brook, NY

More information

A Query-Level Examination of End User Searching Behaviour on the Excite Search Engine. Dietmar Wolfram

A Query-Level Examination of End User Searching Behaviour on the Excite Search Engine. Dietmar Wolfram A Query-Level Examination of End User Searching Behaviour on the Excite Search Engine Dietmar Wolfram University of Wisconsin Milwaukee Abstract This study presents an analysis of selected characteristics

More information

Optimal Time Bounds for Approximate Clustering

Optimal Time Bounds for Approximate Clustering Optimal Time Bounds for Approximate Clustering Ramgopal R. Mettu C. Greg Plaxton Department of Computer Science University of Texas at Austin Austin, TX 78712, U.S.A. ramgopal, plaxton@cs.utexas.edu Abstract

More information

CS-C Data Science Chapter 9: Searching for relevant pages on the Web: Random walks on the Web. Jaakko Hollmén, Department of Computer Science

CS-C Data Science Chapter 9: Searching for relevant pages on the Web: Random walks on the Web. Jaakko Hollmén, Department of Computer Science CS-C3160 - Data Science Chapter 9: Searching for relevant pages on the Web: Random walks on the Web Jaakko Hollmén, Department of Computer Science 30.10.2017-18.12.2017 1 Contents of this chapter Story

More information

Web Search Using a Graph-Based Discovery System

Web Search Using a Graph-Based Discovery System From: FLAIRS-01 Proceedings. Copyright 2001, AAAI (www.aaai.org). All rights reserved. Structural Web Search Using a Graph-Based Discovery System Nifish Manocha, Diane J. Cook, and Lawrence B. Holder Department

More information

Domain Specific Search Engine for Students

Domain Specific Search Engine for Students Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam

More information

Concurrent Architectures - Unix: Sockets, Select & Signals

Concurrent Architectures - Unix: Sockets, Select & Signals Concurrent Architectures - Unix: Sockets, Select & Signals Assignment 1: Drop In Labs reminder check compiles in CS labs & you have submitted all your files in StReAMS! formatting your work: why to 80

More information

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal

More information

Evaluating the Effectiveness of Term Frequency Histograms for Supporting Interactive Web Search Tasks

Evaluating the Effectiveness of Term Frequency Histograms for Supporting Interactive Web Search Tasks Evaluating the Effectiveness of Term Frequency Histograms for Supporting Interactive Web Search Tasks ABSTRACT Orland Hoeber Department of Computer Science Memorial University of Newfoundland St. John

More information

Searching in All the Right Places. How Is Information Organized? Chapter 5: Searching for Truth: Locating Information on the WWW

Searching in All the Right Places. How Is Information Organized? Chapter 5: Searching for Truth: Locating Information on the WWW Chapter 5: Searching for Truth: Locating Information on the WWW Fluency with Information Technology Third Edition by Lawrence Snyder Searching in All the Right Places The Obvious and Familiar To find tax

More information

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW ISSN: 9 694 (ONLINE) ICTACT JOURNAL ON COMMUNICATION TECHNOLOGY, MARCH, VOL:, ISSUE: WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW V Lakshmi Praba and T Vasantha Department of Computer

More information

An Improved Computation of the PageRank Algorithm 1

An Improved Computation of the PageRank Algorithm 1 An Improved Computation of the PageRank Algorithm Sung Jin Kim, Sang Ho Lee School of Computing, Soongsil University, Korea ace@nowuri.net, shlee@computing.ssu.ac.kr http://orion.soongsil.ac.kr/ Abstract.

More information

Topical Locality in the Web

Topical Locality in the Web Topical Locality in the Web Brian D. Davison Department of Computer Science Rutgers, The State University of New Jersey New Brunswick, NJ 893 USA davison@cs.rutgers.edu Abstract Most web pages are linked

More information

Analysis of Meta-Search engines using the Meta-Meta- Search tool SSIR

Analysis of Meta-Search engines using the Meta-Meta- Search tool SSIR 2010 International Journal of Computer Applications (0975 8887 Analysis of Meta-Search engines using the Meta-Meta- Search tool SSIR Manoj M Senior Research Fellow Computational Modelling and Simulation

More information

The application of Randomized HITS algorithm in the fund trading network

The application of Randomized HITS algorithm in the fund trading network The application of Randomized HITS algorithm in the fund trading network Xingyu Xu 1, Zhen Wang 1,Chunhe Tao 1,Haifeng He 1 1 The Third Research Institute of Ministry of Public Security,China Abstract.

More information

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Web Search Instructor: Walid Magdy 14-Nov-2017 Lecture Objectives Learn about: Working with Massive data Link analysis (PageRank) Anchor text 2 1 The Web Document

More information

Compressing the Graph Structure of the Web

Compressing the Graph Structure of the Web Compressing the Graph Structure of the Web Torsten Suel Jun Yuan Abstract A large amount of research has recently focused on the graph structure (or link structure) of the World Wide Web. This structure

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Lecture #3: PageRank Algorithm The Mathematics of Google Search Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,

More information

Recent Researches in Ranking Bias of Modern Search Engines

Recent Researches in Ranking Bias of Modern Search Engines DEWS2007 C7-7 Web,, 169 8050 1 104 169 8555 3 4 1 101 8430 2 1 2 E-mail: {hirate,yoshida,yamana}@yama.info.waseda.ac.jp Web Web Web Recent Researches in Ranking Bias of Modern Search Engines Yu HIRATE,,

More information

Secrets of Profitable Freelance Writing

Secrets of Profitable Freelance Writing Secrets of Profitable Freelance Writing Proven Strategies for Finding High Paying Writing Jobs Online Nathan Segal Cover by Nathan Segal Editing Precision Proofreading Nathan Segal 2014 Secrets of Profitable

More information

A Survey of Google's PageRank

A Survey of Google's PageRank http://pr.efactory.de/ A Survey of Google's PageRank Within the past few years, Google has become the far most utilized search engine worldwide. A decisive factor therefore was, besides high performance

More information

Web Search Basics. Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University

Web Search Basics. Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University Web Search Basics Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction

More information

On the Complexity of List Scheduling Algorithms for Distributed-Memory Systems.

On the Complexity of List Scheduling Algorithms for Distributed-Memory Systems. On the Complexity of List Scheduling Algorithms for Distributed-Memory Systems. Andrei Rădulescu Arjan J.C. van Gemund Faculty of Information Technology and Systems Delft University of Technology P.O.Box

More information

Parallel Functional Reactive Programming

Parallel Functional Reactive Programming Parallel Functional Reactive Programming John Peterson, Valery Trifonov, and Andrei Serjantov Yale University Ô Ø Ö ÓÒ¹ Ó Ò ºÝ Ð º Ù ØÖ ÓÒÓÚ¹Ú Ð ÖÝ ºÝ Ð º Ù Ò Ö º Ö ÒØÓÚÝ Ð º Ù Abstract In this paper,

More information

CSE 3. How Is Information Organized? Searching in All the Right Places. Design of Hierarchies

CSE 3. How Is Information Organized? Searching in All the Right Places. Design of Hierarchies CSE 3 Comics Updates Shortcut(s)/Tip(s) of the Day Web Proxy Server PrimoPDF How Computers Work Ch 30 Chapter 5: Searching for Truth: Locating Information on the WWW Fluency with Information Technology

More information

A brief history of Google

A brief history of Google the math behind Sat 25 March 2006 A brief history of Google 1995-7 The Stanford days (aka Backrub(!?)) 1998 Yahoo! wouldn't buy (but they might invest...) 1999 Finally out of beta! Sergey Brin Larry Page

More information

Abstract. 1. Introduction

Abstract. 1. Introduction A Visualization System using Data Mining Techniques for Identifying Information Sources on the Web Richard H. Fowler, Tarkan Karadayi, Zhixiang Chen, Xiaodong Meng, Wendy A. L. Fowler Department of Computer

More information

A STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE

A STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE A STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE Bohar Singh 1, Gursewak Singh 2 1, 2 Computer Science and Application, Govt College Sri Muktsar sahib Abstract The World Wide Web is a popular

More information

You 2 Software

You 2 Software PrismaCards Enter text for languages with exotic fonts You 2 Software http://www.you2.de info@you2.de Introduction To work in PrismaCards and other programs with complex fonts for different languages you

More information

Optimizing Search Engines using Click-through Data

Optimizing Search Engines using Click-through Data Optimizing Search Engines using Click-through Data By Sameep - 100050003 Rahee - 100050028 Anil - 100050082 1 Overview Web Search Engines : Creating a good information retrieval system Previous Approaches

More information

Constraint Logic Programming (CLP): a short tutorial

Constraint Logic Programming (CLP): a short tutorial Constraint Logic Programming (CLP): a short tutorial What is CLP? the use of a rich and powerful language to model optimization problems modelling based on variables, domains and constraints DCC/FCUP Inês

More information

Personalized Information Retrieval

Personalized Information Retrieval Personalized Information Retrieval Shihn Yuarn Chen Traditional Information Retrieval Content based approaches Statistical and natural language techniques Results that contain a specific set of words or

More information

Internet Power Searching: The Advanced Manual

Internet Power Searching: The Advanced Manual Internet Power Searching: The Advanced Manual Phil Bradley NEAL-SCHUMAN PUBLISHERS INC. NEW YORK, LONDON Contents зт figures асе An introduction to the Internet An overview of the Internet What the Internet

More information

DSPTricks A Set of Macros for Digital Signal Processing Plots

DSPTricks A Set of Macros for Digital Signal Processing Plots DSPTricks A Set of Macros for Digital Signal Processing Plots Paolo Prandoni The package DSPTricks is a set of L A TEX macros for plotting the kind of graphs and figures that are usually employed in digital

More information

D. BOONE/CORBIS. Structural Web Search Using a Graph-Based Discovery System. 20 Spring 2001 intelligence

D. BOONE/CORBIS. Structural Web Search Using a Graph-Based Discovery System. 20 Spring 2001 intelligence D. BOONE/CORBIS Structural Web Search Using a Graph-Based Discovery System 20 Spring 2001 intelligence Cover Story Our structural search engine searches not only for particular words or topics, but also

More information

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 3.0.

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 3.0. Range: This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version.. isclaimer The shapes of the reference glyphs used in these code charts

More information

Self-Organizing Maps of Web Link Information

Self-Organizing Maps of Web Link Information Self-Organizing Maps of Web Link Information Sami Laakso, Jorma Laaksonen, Markus Koskela, and Erkki Oja Laboratory of Computer and Information Science Helsinki University of Technology P.O. Box 5400,

More information

The Anatomy of a Large-Scale Hypertextual Web Search Engine

The Anatomy of a Large-Scale Hypertextual Web Search Engine The Anatomy of a Large-Scale Hypertextual Web Search Engine Article by: Larry Page and Sergey Brin Computer Networks 30(1-7):107-117, 1998 1 1. Introduction The authors: Lawrence Page, Sergey Brin started

More information

LET:Towards More Precise Clustering of Search Results

LET:Towards More Precise Clustering of Search Results LET:Towards More Precise Clustering of Search Results Yi Zhang, Lidong Bing,Yexin Wang, Yan Zhang State Key Laboratory on Machine Perception Peking University,100871 Beijing, China {zhangyi, bingld,wangyx,zhy}@cis.pku.edu.cn

More information

A New Technique for Ranking Web Pages and Adwords

A New Technique for Ranking Web Pages and Adwords A New Technique for Ranking Web Pages and Adwords K. P. Shyam Sharath Jagannathan Maheswari Rajavel, Ph.D ABSTRACT Web mining is an active research area which mainly deals with the application on data

More information

An Overview of Search Engine. Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia

An Overview of Search Engine. Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia An Overview of Search Engine Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia haixu@microsoft.com July 24, 2007 1 Outline History of Search Engine Difference Between Software and

More information

COMP 4601 Hubs and Authorities

COMP 4601 Hubs and Authorities COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one

More information

Ticc: A Tool for Interface Compatibility and Composition

Ticc: A Tool for Interface Compatibility and Composition ÒØÖ Ö Ò Î Ö Ø ÓÒ Ì Ò Ð Ê ÔÓÖØ ÒÙÑ Ö ¾¼¼ º Ì ÌÓÓÐ ÓÖ ÁÒØ Ö ÓÑÔ Ø Ð ØÝ Ò ÓÑÔÓ Ø ÓÒº Ð Ö º Ì ÓÑ Å ÖÓ ÐÐ ÄÙ Ð ÖÓ Äº Ë ÐÚ Ü Ð Ä Ý Î Û Ò Ø Ê Ñ Ò Èº Ê ÓÝ Ì ÛÓÖ Û Ô ÖØ ÐÐÝ ÙÔÔÓÖØ Ý Ê Ö ÒØ ¾º ¼º¼¾ ØØÔ»»ÛÛÛºÙÐ º

More information

On near-uniform URL sampling

On near-uniform URL sampling Computer Networks 33 (2000) 295 308 www.elsevier.com/locate/comnet On near-uniform URL sampling Monika R. Henzinger a,ł, Allan Heydon b, Michael Mitzenmacher c, Marc Najork b a Google, Inc. 2400 Bayshore

More information

WWW and Web Browser. 6.1 Objectives In this chapter we will learn about:

WWW and Web Browser. 6.1 Objectives In this chapter we will learn about: WWW and Web Browser 6.0 Introduction WWW stands for World Wide Web. WWW is a collection of interlinked hypertext pages on the Internet. Hypertext is text that references some other information that can

More information

Using Clusters on the Vivisimo Web Search Engine

Using Clusters on the Vivisimo Web Search Engine Using Clusters on the Vivisimo Web Search Engine Sherry Koshman and Amanda Spink School of Information Sciences University of Pittsburgh 135 N. Bellefield Ave., Pittsburgh, PA 15237 skoshman@sis.pitt.edu,

More information

Search Engines. Information Technology and Social Life March 2, Ask difference between a search engine and a directory

Search Engines. Information Technology and Social Life March 2, Ask difference between a search engine and a directory Search Engines Information Technology and Social Life March 2, 2005 Ask difference between a search engine and a directory 1 Search Engine History A search engine is a program designed to help find files

More information