信息检索与搜索引擎 Introduction to Information Retrieval GESC1007
1 信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities Spring
2 Last week, we discussed: evaluation in an information retrieval system. Today: Web search engines; solution of the second assignment; about the final exam.
3 Course schedule ( 日程安排 )
Lecture 1: Introduction (Chapter 1), Boolean retrieval
Lecture 2: Term vocabulary and posting lists (Chapter 2)
Lecture 3: Dictionaries and tolerant retrieval (Chapter 3)
Lecture 4: Index construction (Chapter 4)
Lecture 5: Scoring, term weighting, the vector space model (Chapter 6)
Lecture 6: A complete search system (Chapter 7)
Lecture 7: Evaluation in information retrieval
Lecture 8: Web search engines, advanced topics, conclusion
Final exam
4 WEB SEARCH ENGINES
5 19.1 The Web. What is special about the Web? The very large number of documents; the lack of coordination in the creation of the documents; the diversity of backgrounds and motives of the participants.
6 The Web. The Web is a set of webpages ( 网页 ). Webpages are written in a language called HTML. (Figure labels: Webpage, HTML.)
7 The Web Browser. Webpages are stored on servers ( 服务器 ). To access a webpage, one uses software called a Web browser ( 浏览器 ). (Figure labels: Internet, server of HITSZ, home.)
8 The Web Browser. Webpages are stored on servers ( 服务器 ). To access a webpage, one uses software called a Web browser ( 浏览器 ). Webpages are sent over the Internet using the HTTP protocol (HTTP 协议 ). (Figure labels: Internet, server of HITSZ, home.)
9 The Web. The idea of the Web: each webpage contains links to other webpages (hyperlinks - 超链接 ). Each webpage has an address (URL). Creating a webpage is not difficult. Webpages have become one of the best ways to supply and consume information.
10 The Web. Billions of webpages contain information, but if we cannot search this information, it is useless. Historically, there have been two ways of searching for information: search engines (Baidu, Bing, etc.) and directories (Yahoo!, etc.).
11 Web directories ( 网络目录 ). Web directory: a list of websites, organized by category.
12 Web directories ( 网络目录 ). A Web directory contains only the best webpages for each category. Problems: Web directories are created by humans, which takes a lot of time. They are not convenient for searching: a user needs to know how to find information within the categories, and there can be thousands of categories. The information in the categories is often outdated. For these reasons, Web directories have mostly disappeared.
13 Web search engines. Baidu, Bing, etc. They adapt information retrieval techniques to search billions of documents. The techniques are adapted in terms of: indexing, processing queries, and ranking documents.
14 Web search engines. Why are they popular? They can answer queries quickly, they can index a very large number of documents, and they are almost always up to date. Fifteen years ago, the results returned by Web search engines were not very good. Novel ranking techniques ( 排序技术 ) and spam-fighting techniques ( 反垃圾邮件技术 ) have since been proposed to obtain better results.
15 19.2 Web characteristics. The Web is mainly decentralized ( 分散 ). It contains many languages and many different types of content. Some webpages contain only pictures and no text. The Web contains a lot of unreliable information. How can a search engine know which websites can be trusted?
16 Size of the Web. 1995: 30 million webpages indexed by AltaVista. 2017: 4.48 billion webpages. Note: only static webpages are counted. Dynamic webpage: a page whose content is generated in real time for the user.
17 The Web graph. The Web can be viewed as a graph ( 图 ). Each webpage is a vertex ( 顶点 ). A link between two webpages is an edge ( 图的边 ). The Web is a directed graph ( 有向图 ). (Figure: webpages A, B, C, ..., F.)
18 The Web graph. Two types of links: in-links, which go to a page, and out-links, which leave a page. (Figure: node B has 3 in-links and 1 out-link.)
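As a small illustration of in-links and out-links, the sketch below counts both for every page of a directed graph. The edge list is a hypothetical example, not the graph from the slide; it is only chosen so that page B has 3 in-links and 1 out-link.

```python
from collections import defaultdict

# Hypothetical directed Web graph: each pair (source, target) is one hyperlink.
links = [
    ("A", "B"), ("C", "B"), ("D", "B"),  # three pages link to B
    ("B", "E"),                          # B links to one page
    ("E", "F"), ("F", "A"),
]

def degree_counts(edges):
    """Return (in_links, out_links): dictionaries mapping page -> link count."""
    in_links = defaultdict(int)
    out_links = defaultdict(int)
    for source, target in edges:
        out_links[source] += 1  # the link leaves `source`
        in_links[target] += 1   # the link arrives at `target`
    return in_links, out_links

in_links, out_links = degree_counts(links)
print("B:", in_links["B"], "in-links,", out_links["B"], "out-link(s)")
```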
19 The Web graph. Not all webpages are equally popular: many webpages have few in-links, and few webpages have many in-links. The number of in-links per webpage follows a power law distribution ( 幂律分布 ). (Figure axes: number of webpages vs. number of in-links.)
20 Spam. For some queries, there is huge competition to appear high in the results of search engines, e.g. Beijing real-estate ( 房地产 ). Thus, many people modify their website to try to appear first in the search results, e.g. by writing «Beijing real-estate» many times in a webpage to increase the term frequency, or by writing invisible text in the background color of the webpage (e.g. white).
21 Spam detection. Nowadays, search engines use many sophisticated methods to detect spam (repeated keywords, etc.). Websites that try to cheat may be blocked from search engines. Thus, some people have developed new techniques to cheat search engines ( 欺骗搜索引擎 ).
22 Cloaking ( 伪装 ). One such technique is cloaking: some websites try to cheat by showing different content to search engines and to users. This is a problem that did not exist in traditional IR.
23 Paid inclusion. Paid inclusion: a website can also pay a search engine to appear high in the results. Some search engines do not allow paid inclusion.
25 Doorway page. Doorway page: a page containing carefully chosen text so that it ranks highly in search engines for some keywords. The page then links to another page containing commercial content. A website may have many doorway pages. (Figure: two doorway pages linking to another webpage.)
26 Link analysis. To reduce the problem of spam on the Web, many search engines perform link analysis. Basic idea: rank a page higher, or treat it as more reliable, if it has many in-links, e.g. the PageRank algorithm.
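The slide only names PageRank, so the following is a minimal sketch of the usual power-iteration formulation rather than the method of any particular search engine. The toy graph, the damping factor of 0.85, the number of iterations, and the handling of pages without out-links are all assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping page -> list of pages it links to. Returns page -> score."""
    pages = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline score (random jump).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical graph: A, C and D all link to B; B links back to A.
toy_graph = {"A": ["B"], "C": ["B"], "D": ["B"], "B": ["A"]}
scores = pagerank(toy_graph)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most in-links (B) ends up with the highest score, which matches the basic idea stated on the slide.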
27 Link analysis. But some people create fake links to increase the popularity of their webpages. There is thus a continuing battle between spammers and search engines.
28 19.3 Advertising ( 广告 ). Two main advertisement models: 1) Cost per view: the goal is to show some content to the user (branding). An image is typically used. A company may pay to display the image 1000 times.
29 Advertising ( 广告 ). Two main advertisement models: 2) Cost per click: the goal is that people click on an advertisement to visit the website of the advertiser (initiating a transaction). The website may then ask the person to buy something. An image or text with a link may be used. A company may pay for 1000 clicks.
30 19.3 Advertising ( 广告 ). Today, many search engines earn money from advertising. Some display search results and advertisements separately (search results vs. sponsored search results). Other search engines combine search results and advertisements.
31 (Figure labels: Search results; Sponsored search results.)
32 Online advertisement networks. There are many advertisement networks. Bing Ads: provides pay-per-click advertisements for Bing and Yahoo. AdWords: sells advertisements on various websites.
33 Click spam. Click spam: a company clicks on its competitors' advertisements to use up their advertising budget. This may be done using automated software. A search engine must use techniques to block click spam.
34 Example: AllAdvantage. It was an online advertising company.
35 19.4 Search user experience ( 用户体验 ). It is also important to understand the users of search engines. For traditional IR systems: users often received training on how to search and write queries. For Web search engines: users may not know or care about how to write queries. Usually, people use 2 or 3 keywords in a query, and they do not use special operators (wildcard queries, Boolean operators).
36 Search user experience ( 用户体验 ). The more people use a search engine, the more money it can earn. How can a search engine get more users? By increasing the precision of the first few results, by updating the index frequently, by having a larger index, and by offering a website that is simple, easy to use, and very fast, so that a user can quickly find what he or she is looking for.
37 Three types of user queries. 1) Informational queries: seek general information on a broad topic, e.g. how to play piano. There is no single webpage that contains all the information that the user wants. The user generally wants to combine information from several webpages.
38 Three types of user queries. 2) Navigational queries: seek the website or home page of a given entity, e.g. find the webpage of Huawei ( 华为 ). The user expects the first result to be the webpage of the entity (e.g. Huawei). The user only needs one document and wants a very high precision (1).
39 Three types of user queries. 3) Transactional queries: the user wants to make a transaction, e.g. reserve a hotel room in Guangzhou or buy train tickets. The search engine should provide links to service providers.
40 Three types of user queries. For a given query, it can be difficult to identify the type of the query. Identifying the type of a query is useful for selecting the most relevant results and for displaying relevant advertisements (e.g. advertisements about train tickets).
45 Components of a Web search engine ( 网络爬虫 )
46 Index size. How can we compare the sizes of the indexes of two search engines (e.g. Baidu vs Bing)? This may be difficult to evaluate. A search engine may index only the first few thousand words of a page. A search engine may display a page in its results that is not in its index (because some other page in its index links to that page). Search engines may organize their indexes in tiers (tiered indexes). For general queries, only the main page of a website may be shown, and other pages may not be shown.
47 Index size. Some techniques have been developed to compare the sizes of search engine indexes. Hypothesis: each search engine indexes only one part of the Web, chosen randomly. One such technique is the capture-recapture method.
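48 Capture-recapture method. Take two search engines E1 and E2. Take a random page from E1 and check whether it is in E2; repeating this gives a ratio x, the fraction of the pages of E1 that are also in E2. Take a random page from E2 and check whether it is in E1; this gives a ratio y. If E1 and E2 are independent and uniform random subsets of the Web, we should have $x\,|E_1| \approx y\,|E_2|$, and therefore $|E_1| / |E_2| \approx y / x$. More details are in the book.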
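The capture-recapture idea can be tried on a toy example. In the sketch below, two hypothetical indexes are drawn as independent uniform random subsets of an artificial "Web" of numbered pages, and the ratio y / x is used as an estimate of |E1| / |E2|. The index sizes, the sample size, and the function names are assumptions made for illustration; real engines can only be sampled indirectly through queries.

```python
import random

def overlap_fraction(own_index, other_index, sample_size, rng):
    """Fraction of a random sample of `own_index` that also appears in `other_index`."""
    sample = rng.sample(sorted(own_index), sample_size)
    return sum(1 for page in sample if page in other_index) / sample_size

def estimate_size_ratio(e1, e2, sample_size=5000, seed=0):
    """Estimate |E1| / |E2| as y / x, under the independence assumption."""
    rng = random.Random(seed)
    x = overlap_fraction(e1, e2, sample_size, rng)  # fraction of E1 found in E2
    y = overlap_fraction(e2, e1, sample_size, rng)  # fraction of E2 found in E1
    return y / x

# Toy Web of 100,000 pages; E1 indexes 40,000 of them and E2 indexes 20,000.
builder = random.Random(42)
web = range(100_000)
e1 = set(builder.sample(web, 40_000))
e2 = set(builder.sample(web, 20_000))

ratio = estimate_size_ratio(e1, e2)
print("estimated |E1| / |E2|:", round(ratio, 2), "(true value: 2.0)")
```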
49 19.6 Near-duplicates ( 近似重复 ). Another issue: the Web may contain multiple copies of the same webpage. Up to 40% of webpages are duplicates ( 重复 ) of other pages. Some of these copies are legitimate ( 合法的 ); others are not. Search engines try to avoid indexing duplicates to reduce the size of their indexes.
50 Detecting duplicates. How can duplicates be detected? We do not want to compare billions of webpages with each other. Simple approach: calculate a fingerprint (hash), which is a number, for each webpage. If two pages have the same fingerprint, they may be duplicates, so we need to compare them. If they are duplicates, only one of them is indexed.
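The simple fingerprint approach can be sketched as follows. This is a minimal illustration that only catches copies that are identical after normalizing case and whitespace; it does not detect near-duplicates. The choice of SHA-256, the 64-bit truncation, the normalization, and the example.com URLs are assumptions, not details from the slides.

```python
import hashlib
from collections import defaultdict

def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def fingerprint(text):
    """Map the normalized page text to a 64-bit number."""
    digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
    return int(digest[:16], 16)

def find_duplicates(pages):
    """pages: dict url -> text. Returns lists of URLs whose text is the same."""
    buckets = defaultdict(list)
    for url, text in pages.items():
        buckets[fingerprint(text)].append(url)
    groups = []
    for urls in buckets.values():
        # Same fingerprint only means "possible duplicates": confirm by comparing.
        confirmed = defaultdict(list)
        for url in urls:
            confirmed[normalize(pages[url])].append(url)
        groups.extend(group for group in confirmed.values() if len(group) > 1)
    return groups

pages = {
    "http://example.com/a": "Real estate in  Beijing",
    "http://example.com/b": "Train tickets to Guangzhou",
    "http://example.com/copy": "real estate in beijing",
}
duplicates = find_duplicates(pages)
print(duplicates)
```

Only pages that share a fingerprint are compared with each other, which avoids comparing every pair of pages.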
51 20 Web crawling (Web 信息发现 ). Web crawling: the process by which a search engine gathers pages from the Web to index them. Goals: collect information about webpages, collect information about the links between webpages, and do this quickly!
52 Web crawler ( 网络爬虫 ). A Web crawler must have the following features ( 特征 ): 1) Robustness: some websites try to cheat and may generate an infinite number of pages to mislead Web crawlers. Web crawlers must be able to avoid these «traps» ( 陷阱 ).
53 Web crawler ( 网络爬虫 ). 2) Politeness ( 礼貌 ): a Web crawler should be polite. It should not visit a website too often; otherwise, the owner of the website may not be happy. 3) Efficiency: the Web crawler should be able to efficiently index a huge number of webpages.
54 Web crawler ( 网络爬虫 ). 4) Quality: the Web crawler should try to index the highest-quality or most useful webpages first. It must be able to assign different priority levels to different webpages. 5) Extensibility: a Web crawler should work with different technologies, different languages, different data formats, etc.
55 Crawling. How does a Web crawler index websites? The crawler begins with one or more URLs (webpage addresses). The crawler visits one of these webpages and extracts its text and links. The text is indexed. The links are used to find more webpages. The crawler then continues by visiting other webpages.
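The crawling loop described on this slide can be sketched as a breadth-first traversal. To keep the example self-contained and runnable offline, the "Web" here is a hypothetical in-memory dictionary and `fetch` is a stand-in for an HTTP download; a real crawler would also need politeness delays, robots.txt checks, and HTML parsing, which are left out.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, list of out-links).
TOY_WEB = {
    "http://example.com/": ("home page", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/b"]),
    "http://example.com/b": ("page b", ["http://example.com/", "http://example.com/missing"]),
}

def fetch(url):
    """Stand-in for downloading a page; returns (text, links) or None if unavailable."""
    return TOY_WEB.get(url)

def crawl(seed_urls, max_pages=100):
    """Visit pages starting from the seeds; return a dict url -> indexed text."""
    frontier = deque(seed_urls)  # URLs waiting to be visited
    seen = set(seed_urls)        # URLs already queued, so no page is visited twice
    index = {}
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:
            continue             # unreachable page: skip it
        text, links = page
        index[url] = text        # "index" the text
        for link in links:       # use the links to discover more pages
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index

index = crawl(["http://example.com/"])
print(list(index))
```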
56 Crawling. A Web crawler should not visit the same webpage twice. How fast must a crawler be to crawl the Web? Crawling 4 billion webpages in 1 month requires about 1540 webpages per second! A Web crawler may be designed to visit popular websites more often than less popular websites.
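The figure of about 1540 webpages per second can be checked with a short calculation. The only assumption is that a month is taken as 30 days.

```python
pages = 4_000_000_000                  # 4 billion webpages
seconds_per_month = 30 * 24 * 60 * 60  # assuming a 30-day month
rate = pages / seconds_per_month

print(f"{rate:.0f} webpages per second")
```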
57 Robot exclusion. Some people do not want Web crawlers to index their website. To achieve this, they can put a file named robots.txt on the website to tell Web crawlers to ignore it. (Figure label: name of a search engine.)
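As an illustration, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser` module and asks whether a given crawler may fetch a URL. The file contents, the crawler names `ExampleBot` and `OtherBot`, and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ExampleBot must ignore the whole site;
# all other crawlers must only stay out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = parser.can_fetch("ExampleBot", "http://example.com/index.html")
allowed = parser.can_fetch("OtherBot", "http://example.com/index.html")
private = parser.can_fetch("OtherBot", "http://example.com/private/data.html")
print(blocked, allowed, private)
```

The `User-agent` line is where the name of a crawler goes; `*` stands for every crawler not named elsewhere in the file.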
58 Crawling. Generally, a search engine will have many computers working as Web crawlers. These Web crawlers may be located in different places: China, Europe, America, etc. They must work together: they must split the work and avoid visiting the same websites multiple times. This can be challenging!
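One common way to split the work, shown here only as a sketch and not as the scheme of any particular search engine, is to hash the host name of each URL so that every website is always handled by the same crawler. The function name, the use of SHA-256, and the number of crawlers are assumptions.

```python
import hashlib
from urllib.parse import urlsplit

def assign_crawler(url, num_crawlers):
    """Return the number (0 .. num_crawlers - 1) of the crawler responsible for this URL."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # All pages of one host map to the same crawler, so no website is crawled twice.
    return int.from_bytes(digest[:8], "big") % num_crawlers

urls = [
    "http://example.com/a",
    "http://example.com/b?page=2",
    "http://example.org/",
]
for url in urls:
    print(url, "-> crawler", assign_crawler(url, 4))
```

Assigning by host rather than by page also makes politeness easier, because a single crawler controls how often each website is visited.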
59 Distributed index. For a Web search engine, the index may be very large. Moreover, many users may want to access the index at the same time. Thus, the index is stored on several computers.
60 Link analysis. Many search engines consider the links between websites to be important information for ranking webpages. Link analysis: analyzing the links between websites to derive useful information. A link from a website A to another website B is considered an endorsement ( 认可 ) of website B by A. (Figure: A links to B.)
61 Link analysis. When analyzing links, we can also analyze the context of each link in a webpage (the text of the link), e.g. «The real-estate market in Shenzhen». This is useful because webpage B may not provide an accurate description of itself. (Figure: A links to B.)
62 Link analysis. In fact, there is often a gap between the terms in a webpage and how Web users would describe the page. The text used in a link is useful, but some of its terms may not be, e.g. «Click here for information about Shenzhen». We can use the TF-IDF measure to filter out unimportant words. (Figure: A links to B.)
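A minimal sketch of this filtering is shown below. Each link text is treated as a small document, and words that occur in most link texts (such as «click» and «here») receive a low TF-IDF weight and are dropped. The collection of link texts, the threshold of 0.5, and the plain logarithmic IDF formula are assumptions chosen for the example.

```python
import math
from collections import Counter

# Hypothetical collection of link (anchor) texts found on the Web.
anchor_texts = [
    "click here for information about Shenzhen",
    "click here for the real-estate market in Shenzhen",
    "click here for more about train tickets",
    "click here to read about us",
]

def tfidf_weights(anchor, collection):
    """Return term -> TF-IDF weight for the terms of one link text."""
    documents = [set(text.lower().split()) for text in collection]
    term_counts = Counter(anchor.lower().split())
    weights = {}
    for term, count in term_counts.items():
        df = sum(1 for document in documents if term in document)
        idf = math.log(len(documents) / df) if df else 0.0
        weights[term] = count * idf
    return weights

def important_terms(anchor, collection, threshold=0.5):
    """Keep only the terms whose TF-IDF weight exceeds the threshold."""
    weights = tfidf_weights(anchor, collection)
    return [term for term, weight in weights.items() if weight > threshold]

kept = important_terms(anchor_texts[0], anchor_texts)
print(kept)
```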
63 Link analysis. Thanks to the analysis of the text of links, if we search for «big blue», we may find the webpage of IBM. This is great, but there can be side effects. For example, if we search for «miserable failure», we can find the page of George W. Bush.
64 This is because many people have purposely linked to the page of George W. Bush with the text «miserable failure» to fool the search engines.
65 Another example
66 Link analysis. Search engines use various techniques to avoid this problem. Some search engines consider not only the text of links, but also the text before and after a link.
67 FINAL EXAM
69 Some questions 69
70 Some questions 70
71 Some questions 71
72 Conclusion. Today: Web search engines. I wish you good preparation for the final exam! Goodbye!
73 References Manning, C. D., Raghavan, P., Schütze, H. Introduction to information retrieval. Cambridge: Cambridge University Press,
Mobile Travel Trends in China Nov 2013 Qunar is the world s largest Chinese travel platform Background Monthly Unique Visitors (in mm) Founded: 2005 Headquarters: Beijing, China Employees: 1699 Listed:
More informationPrevious on Computer Networks Class 18. ICMP: Internet Control Message Protocol IP Protocol Actually a IP packet
ICMP: Internet Control Message Protocol IP Protocol Actually a IP packet 前 4 个字节都是一样的 0 8 16 31 类型代码检验和 ( 这 4 个字节取决于 ICMP 报文的类型 ) ICMP 的数据部分 ( 长度取决于类型 ) ICMP 报文 首部 数据部分 IP 数据报 ICMP: Internet Control Message
More informationDP Project Development Pvt. Ltd.
Search Engine Optimization Training Syllabus Training that makes you focus on the correct business: Today's market is competitive and one has to be top in his field to make profits and stay in the business.
More informationInformation Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group
Information Retrieval Lecture 4: Web Search Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk (Lecture Notes after Stephen Clark)
More information搜索引擎优化. Search Engine Optimization 赵卫东博士复旦大学软件学院
搜索引擎优化 Search Engine Optimization 赵卫东博士复旦大学软件学院 2009 10 23 It is not easy to design a good website? user perspective search engine Internet marketing search engine A web search engine has the following
More informationThis presentation is copyrighted by ProSites, Inc. No part of this presentation can be copied, reproduced, displayed or changed without the express
This presentation is copyrighted by ProSites, Inc. No part of this presentation can be copied, reproduced, displayed or changed without the express written permission of ProSites, Inc. Logos or third party
More informationSEO and UAEX.EDU GETTING YOUR WEB PAGES FOUND IN GOOGLE
SEO and UAEX.EDU GETTING YOUR WEB PAGES FOUND IN GOOGLE What is Search Engine Optimization? SEO is a marketing discipline focused on growing visibility in organic (non-paid) search engine results. Why
More information云计算入门 Introduction to Cloud Computing GESC1001
Lecture #3 云计算入门 Introduction to Cloud Computing GESC1001 Philippe Fournier-Viger Professor School of Humanities and Social Sciences philfv8@yahoo.com Fall 2018 1 Course schedule Part 1 Part 2 Part 3 Introduction
More informationAn Introduction to Search Engines and Web Navigation
An Introduction to Search Engines and Web Navigation MARK LEVENE ADDISON-WESLEY Ал imprint of Pearson Education Harlow, England London New York Boston San Francisco Toronto Sydney Tokyo Singapore Hong
More informationWebsite Name. Project Code: # SEO Recommendations Report. Version: 1.0
Website Name Project Code: #10001 Version: 1.0 DocID: SEO/site/rec Issue Date: DD-MM-YYYY Prepared By: - Owned By: Rave Infosys Reviewed By: - Approved By: - 3111 N University Dr. #604 Coral Springs FL
More informationHow to Get Your Website Listed on Major Search Engines
Contents Introduction 1 Submitting via Global Forms 1 Preparing to Submit 2 Submitting to the Top 3 Search Engines 3 Paid Listings 4 Understanding META Tags 5 Adding META Tags to Your Web Site 5 Introduction
More informationTHE HISTORY & EVOLUTION OF SEARCH
THE HISTORY & EVOLUTION OF SEARCH Duration : 1 Hour 30 Minutes Let s talk about The History Of Search Crawling & Indexing Crawlers / Spiders Datacenters Answer Machine Relevancy (200+ Factors)
More informationOutline. Motivations (1/3) Distributed File Systems. Motivations (3/3) Motivations (2/3)
Outline TFS: Tianwang File System -Performance Gain with Variable Chunk Size in GFS-like File Systems Authors: Zhifeng Yang, Qichen Tu, Kai Fan, Lei Zhu, Rishan Chen, Bo Peng Introduction (what s it all
More informationSearching the Deep Web
Searching the Deep Web 1 What is Deep Web? Information accessed only through HTML form pages database queries results embedded in HTML pages Also can included other information on Web can t directly index
More informationMicrosoft RemoteFX: USB 和设备重定向 姓名 : 张天民 职务 : 高级讲师 公司 : 东方瑞通 ( 北京 ) 咨询服务有限公司
Microsoft RemoteFX: USB 和设备重定向 姓名 : 张天民 职务 : 高级讲师 公司 : 东方瑞通 ( 北京 ) 咨询服务有限公司 RemoteFX 中新的 USB 重定向特性 在 RDS 中所有设备重定向机制 VDI 部署场景讨论 : 瘦客户端和胖客户端 (Thin&Rich). 用户体验 : 演示使用新的 USB 重定向功能 81% 4 本地和远程的一致的体验 (Close
More informationUNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.
UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index
More informationRelevant?!? Algoritmi per IR. Goal of a Search Engine. Prof. Paolo Ferragina, Algoritmi per "Information Retrieval" Web Search
Algoritmi per IR Web Search Goal of a Search Engine Retrieve docs that are relevant for the user query Doc: file word or pdf, web page, email, blog, e-book,... Query: paradigm bag of words Relevant?!?
More informationExecuted by Rocky Sir, tech Head Suven Consultants & Technology Pvt Ltd. seo.suven.net 1
Executed by Rocky Sir, tech Head Suven Consultants & Technology Pvt Ltd. seo.suven.net 1 1. Parts of a Search Engine Every search engine has the 3 basic parts: a crawler an index (or catalog) matching
More informationBROKERS MISSING THEIR CUSTOMERS Victor Lund Partner
BROKERS MISSING THEIR CUSTOMERS Victor Lund Partner 805-709-6696 victor@wavgroup.com http://waves.wavgroup.com Brokers Missing Consumers on Search If I ask a brokerage how their website ranks for top keywords
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationHomework: Exercise 19. Homework: Exercise 21. Homework: Exercise 20. Homework: Exercise 22. Detour: Apache Lucene
Homework: Exercise 19 Are the following statements true or false? Information Retrieval and Web Search Engines In a Boolean retrieval system, stemming never lowers precision Lecture 10: Introduction to
More informationHigh Quality Inbound Links For Your Website Success
Axandra How To Get ö Benefit from tested linking strategies and get more targeted visitors. High Quality Inbound Links For Your Website Success How to: ü Ü Build high quality inbound links from related
More informationSearch Engine Optimization
Search Engine Optimization A necessary campaign for heightened corporate awareness What is SEO? Definition: The practice of building or transforming a Web site so that its content is seen as highly readable,
More informationCC PROCESAMIENTO MASIVO DE DATOS OTOÑO 2018
CC5212-1 PROCESAMIENTO MASIVO DE DATOS OTOÑO 2018 Lecture 6 Information Retrieval: Crawling & Indexing Aidan Hogan aidhog@gmail.com MANAGING TEXT DATA Information Overload If we didn t have search Contains
More informationJargon Buster. Ad Network. Analytics or Web Analytics Tools. Avatar. App (Application) Blog. Banner Ad
D I G I TA L M A R K E T I N G Jargon Buster Ad Network A platform connecting advertisers with publishers who want to host their ads. The advertiser pays the network every time an agreed event takes place,
More informationBasic Internet Skills
The Internet might seem intimidating at first - a vast global communications network with billions of webpages. But in this lesson, we simplify and explain the basics about the Internet using a conversational
More informationDetecting Spam Web Pages
Detecting Spam Web Pages Marc Najork Microsoft Research Silicon Valley About me 1989-1993: UIUC (home of NCSA Mosaic) 1993-2001: Digital Equipment/Compaq Started working on web search in 1997 Mercator
More informationMultiprotocol Label Switching The future of IP Backbone Technology
Multiprotocol Label Switching The future of IP Backbone Technology Computer Network Architecture For Postgraduates Chen Zhenxiang School of Information Science and Technology. University of Jinan (c) Chen
More informationONLINE EVALUATION FOR: Company Name
ONLINE EVALUATION FOR: Company Name Address Phone URL media advertising design P.O. Box 2430 Issaquah, WA 98027 (800) 597-1686 platypuslocal.com SUMMARY A Thank You From Platypus: Thank you for purchasing
More information60-538: Information Retrieval
60-538: Information Retrieval September 7, 2017 1 / 48 Outline 1 what is IR 2 3 2 / 48 Outline 1 what is IR 2 3 3 / 48 IR not long time ago 4 / 48 5 / 48 now IR is mostly about search engines there are
More informationNBA 600: Day 15 Online Search 116 March Daniel Huttenlocher
NBA 600: Day 15 Online Search 116 March 2004 Daniel Huttenlocher Today s Class Finish up network effects topic from last week Searching, browsing, navigating Reading Beyond Google No longer available on
More informationELEVATESEO. INTERNET TRAFFIC SALES TEAM PRODUCT INFOSHEETS. JUNE V1.0 WEBSITE RANKING STATS. Internet Traffic
SALES TEAM PRODUCT INFOSHEETS. JUNE 2017. V1.0 1 INTERNET TRAFFIC Internet Traffic Most of your internet traffic will be provided from the major search engines. Social Media services and other referring
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 10: Introduction to Web Retrieval June 22, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig
More information