信息检索与搜索引擎 Introduction to Information Retrieval GESC1007

1 信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities Spring

2 Last week, we discussed a complete search system. Today: a brief review of last week, evaluation in an information retrieval system, and the second assignment.

3 Course schedule ( 日程安排 ). Lecture 1: Introduction (Chapter 1), Boolean retrieval. Lecture 2: Term vocabulary and posting lists (Chapter 2). Lecture 3: Dictionaries and tolerant retrieval (Chapter 3). Lecture 4: Index construction (Chapter 4). Lecture 5: Scoring, term weighting, the vector space model (Chapter 6). Lecture 6: A complete search system (Chapter 7). Lecture 7: Evaluation in information retrieval. Lecture 8: Web search engines, advanced topics, conclusion. Final exam.

4 LAST WEEK

5 1) Initially, we have a set of documents.

6 2) Linguistic processing is applied to these documents (tokenization, stemming, language detection, etc.). Each document becomes a set of terms.

7 3) The IR system keeps a copy of each document in a cache ( 缓存 ). This is useful to generate snippets ( 片段 ).

8 Snippet: a short text that accompanies each document in the result list of a search engine.

9 4) A copy of each document is given to indexers. These programs create different kinds of indexes: positional indexes, indexes for spelling correction, and structures for inexact retrieval.

10 5) When a user searches using a free-text query, the query parser transforms the query, and spelling correction is applied.

11 6) The indexes are then used to answer the query. Documents are scored and ranked.

12 7) A page of results is generated and shown to the user.

13 EVALUATION IN AN INFORMATION RETRIEVAL SYSTEM Chapter 8, pdf p

14 Introduction. In previous chapters, we have discussed many techniques. Which techniques should be used in an IR system? Should we use stop lists? Should we use stemming? Should we use TF-IDF?

15 Different search engines (e.g. Baidu, Bing) will show different results. How can we measure the effectiveness of an IR system?

16 Introduction. We will discuss: how to measure the effectiveness of an IR system; document collections used to evaluate an IR system; relevant vs. non-relevant documents; evaluation methodology for unranked retrieval results; evaluation methodology for ranked retrieval results.

17 User utility. We discussed the concept of document relevance ( 文件关联 ) for a query. Relevance is not the most important measure. User utility: what makes the user happy? Speed of response; size of the index; relevance of the results; user interface design ( 用户界面设计 ): clarity ( 清晰 ), layout ( 布局 ), and responsiveness ( 响应能力 ) of the user interface; the generation of high-quality snippets ( 片段 ).

18 User utility. What makes the user happy? Speed of response; size of the index; relevance of the results; user interface design ( 用户界面设计 ): clarity ( 清晰 ), layout ( 布局 ), and responsiveness ( 响应能力 ) of the user interface; the generation of high-quality snippets ( 片段 ).

19 Snippet: a short text that accompanies each document in the result list of a search engine.

20 8.1 How to evaluate an IR system? To evaluate an IR system, we can use testing data ( 测试数据 ) made of three parts: a collection of documents, a set of test queries, and a set of relevance judgments indicating which documents are relevant for each query.

21 Traditional evaluation approach. The standard approach is to consider whether the retrieved documents are relevant or not for a query. We use the set of test queries to evaluate whether an IR system returns relevant results. The relevance judgments are also called the ground truth ( 地面的真理 ) or gold standard ( 金标准 ). It is recommended to use at least 50 queries to evaluate an IR system.

22 Information needs vs queries. There is a distinction between a query and an information need. A user has an information need (wants to find some information) and expresses it as a query. But the same query may correspond to different information needs. Example: for the query PYTHON, is the user looking for an animal or for a programming language?

23 Information needs vs queries. To evaluate an IR system, we will make a simplification. We will suppose that a document is either relevant (1) or irrelevant (0) for a query. In real life, a document may be partially relevant, but we will ignore this for now.

24 Tuning an IR system. IR systems have parameters ( 参数 ) that can be adjusted ( 调整 ), e.g. we can use different scoring functions to retrieve documents. Depending on how the parameters are adjusted, the IR system may perform better or worse on the test data. To adjust the parameters, we should use data that is different from the testing data. Otherwise, it would be like cheating: the parameters would be fitted to the testing data, and the measured performance would be too optimistic.

25 1) Tuning an IR system: run the IR system on the training data ( 训练数据 ), check whether the results are good, adjust the parameters, and repeat. 2) Testing the IR system: run the IR system on the testing data ( 测试数据 ) and check whether the results are good.

26 8.2 Standard test collections. If we develop a new IR system, we could create our own data for training and testing our system. However, there exist some standard collections of documents, which can be used for training and testing an IR system. A few examples follow.

27 The GOV2 collection GOV2: a collection of 25 million webpages. Size of the data: 426 GB to_data.html Provided by the University of Glasgow, UK. Useful for researchers and companies working on the development of web search engines and IR systems. But more than 100 times smaller than the number of webpages on the Internet. 27

28 Reuters. Two document collections of news articles. Reuters-21578: 21,578 news articles. Reuters-RCV1: 806,791 news articles. This data is especially useful for testing systems that classify news articles into categories.

29 8.3 Evaluation of unranked retrieval results. There exist many measures to evaluate whether the results of an IR system are good or not. Some popular measures: precision ( 准确率 ), recall ( 召回 ), accuracy, etc.

30 Precision ( 准确率 ). Precision: what fraction of the returned results are relevant to the user query? Example: a person searches for webpages about Beijing. The search engine returns 5 relevant webpages and 5 irrelevant webpages. Precision = 5 / 10 = 0.5 (50 %). Precision = P(relevant | retrieved).

31 Contingency table ( 应急表 ). Precision and recall can also be expressed in terms of a contingency table:

                 Relevant                     Non-relevant
Retrieved        True positive ( 真阳性 )     False positive ( 假阳性 )
Not retrieved    False negative ( 假阴性 )    True negative ( 真阴性 )

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

32 Recall ( 召回 ). Recall: what fraction of the relevant documents in a collection were returned by the system? Example: a database contains 1000 documents about HITSZ. The user searches for documents about HITSZ. Only 100 documents about HITSZ are retrieved. Recall = 100 / 1000 = 0.1 (10 %). Recall = P(retrieved | relevant).
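The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the made-up document identifiers are assumptions, and returning 0.0 for an empty set is a convention chosen here, not something stated in the slides.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant: TP / (TP + FP)."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0  # convention chosen for this sketch when nothing is returned
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved: TP / (TP + FN)."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0  # convention chosen for this sketch when nothing is relevant
    return len(retrieved & relevant) / len(relevant)


# Beijing example: 10 pages are returned, 5 of them are relevant.
beijing_retrieved = [f"page{i}" for i in range(10)]
beijing_relevant = beijing_retrieved[:5]
print(precision(beijing_retrieved, beijing_relevant))  # 0.5

# HITSZ example: 1000 relevant documents exist, only 100 of them are retrieved.
hitsz_relevant = range(1000)
hitsz_retrieved = range(100)
print(recall(hitsz_retrieved, hitsz_relevant))  # 0.1
```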

33 Accuracy ( 精确 ). Accuracy: the fraction of documents that are correctly identified (as relevant or non-relevant). Example: there are 1000 documents. The IR system correctly identifies 600 documents (as relevant or irrelevant). The IR system incorrectly identifies 400 documents (as relevant or irrelevant). Accuracy = 600 / 1000 = 0.6 (60 %).

34 Limitations of accuracy. Accuracy has a problem: the distribution is skewed ( 偏态分布 ). Generally, over 99.9 % of documents are irrelevant. Thus, an IR system that considers ALL documents as irrelevant has a high accuracy! But such a system would not be good for the user. In a real IR system, identifying some documents as relevant may produce many false positives.
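To make the skew problem concrete, here is a small sketch with hypothetical numbers (a collection of 100,000 documents where only 100 are relevant, i.e. 99.9 % are irrelevant). The function names are assumptions made for this example. A system that returns nothing gets a very high accuracy but a recall of zero.

```python
def confusion_counts(retrieved, relevant, collection_size):
    """Return (TP, FP, FN, TN) for one query over a collection of known size."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    fp = len(retrieved - relevant)
    fn = len(relevant - retrieved)
    tn = collection_size - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all documents that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical collection: 100,000 documents, 100 of them relevant.
collection_size = 100_000
relevant = set(range(100))
retrieved = set()  # a useless system that considers ALL documents irrelevant

tp, fp, fn, tn = confusion_counts(retrieved, relevant, collection_size)
print(accuracy(tp, fp, fn, tn))  # 0.999
print(tp / (tp + fn))            # recall: 0.0
```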

35 Limitations of accuracy. A user can tolerate seeing irrelevant documents in the results, as long as there are some relevant documents. For a Web surfer, precision is the most important: every result on the first page should be relevant (high precision). It is OK if some documents are missing (low recall).

36 Limitations of accuracy. For a professional searcher, precision can be low, but recall should be high (all relevant documents should be found). Precision and recall are generally inversely related: if precision increases, recall tends to decrease; if recall increases, precision tends to decrease.

37 F1-measure (F1 度量 ). A trade-off between precision and recall: F1 = 2 × Precision × Recall / (Precision + Recall). In this formula, precision and recall have the same importance. We could change the formula to put more importance on precision or on recall.
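As a sketch of how the formula can be changed to favor precision or recall, the standard weighted F-measure uses a parameter beta: F = (1 + beta²) × P × R / (beta² × P + R). With beta = 1 this is the F1-measure; beta > 1 gives more weight to recall and beta < 1 gives more weight to precision. The function name and the example values below are assumptions for illustration.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when both values are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


p, r = 0.5, 0.1  # hypothetical precision and recall of one system
print(round(f_measure(p, r), 3))            # F1, equal importance: 0.167
print(round(f_measure(p, r, beta=2.0), 3))  # emphasizes recall: 0.119
print(round(f_measure(p, r, beta=0.5), 3))  # emphasizes precision: 0.278
```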

38 8.4 Evaluation of ranked retrieval results. The previous measures (precision, recall, F1-measure) do not consider how documents are ranked. But in search engines, documents are usually ranked. Thus, we need new measures that consider how results are ranked.

39 Precision-recall curve. We take the top k documents for a query (e.g. the top 10 documents) and compute the precision and the recall after each document. We create a graph to see how the precision changes when the recall changes. If the i-th document is irrelevant, recall stays the same but precision drops. If the j-th document is relevant, recall increases and precision increases (or stays at 1).

40 Precision-recall curve. The curve has a lot of jiggles. Solution: use the interpolated precision. At a certain recall level r, we only keep the highest precision found at any recall level greater than or equal to r. (Illustration for the same curve.)
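A minimal sketch of the interpolation step, assuming a ranked list of document identifiers and a non-empty set of relevant documents (both made up here). The interpolated precision at recall level r is taken as the highest precision observed at any recall greater than or equal to r, which is the usual textbook definition.

```python
def precision_recall_points(ranking, relevant):
    """Return one (recall, precision) pair after each position of the ranking."""
    relevant = set(relevant)  # assumed to be non-empty
    points = []
    hits = 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points


def interpolated_precision(points, r):
    """Highest precision at any recall level >= r (0.0 if there is none)."""
    return max((p for rec, p in points if rec >= r), default=0.0)


ranking = ["d1", "d2", "d3", "d4", "d5"]  # hypothetical ranked result list
relevant = {"d1", "d3", "d5"}             # hypothetical relevance judgments
points = precision_recall_points(ranking, relevant)

for r in (0.0, 0.5, 1.0):
    print(r, round(interpolated_precision(points, r), 3))
```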

41 11-point interpolated average precision (11 点插值平均精度 ). With the previous precision-recall graph, we can evaluate the result of a single query. But what if we have more than one query? We use the 11-point interpolated average precision: for each information need (query), we calculate the interpolated precision at the 11 recall levels 0.0, 0.1, ..., 1.0. For each recall level, we calculate the average of the interpolated precision over all queries. We visualize the result using a graph.
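The procedure can be sketched as follows, assuming each query comes with a ranked result list and a non-empty set of relevant documents. The two queries below are made up for illustration, and the function names are not from the course material.

```python
def interpolated_11pt(ranking, relevant):
    """Interpolated precision of one query at recall levels 0.0, 0.1, ..., 1.0."""
    relevant = set(relevant)  # assumed to be non-empty
    points = []               # (recall, precision) after each ranked document
    hits = 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    levels = [i / 10 for i in range(11)]
    return [max((p for rec, p in points if rec >= level), default=0.0)
            for level in levels]


def eleven_point_average(runs):
    """Average the 11 interpolated precision values over all queries."""
    per_query = [interpolated_11pt(ranking, relevant) for ranking, relevant in runs]
    return [sum(column) / len(per_query) for column in zip(*per_query)]


# Two hypothetical queries: (ranked results, relevant documents).
runs = [
    (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5"}),
    (["a", "b"], {"b"}),
]
curve = eleven_point_average(runs)
print([round(value, 3) for value in curve])
```

Plotting `curve` against the recall levels gives one averaged precision-recall graph for the whole set of queries.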

43 Precision-recall graphs can be used to compare two IR systems 43

44 11-point interpolated average precision 44

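45 Mean average precision. MAP: for each information need (query), we compute the precision of the top results each time a relevant document is retrieved, and we average these precision values. This value is then averaged over all information needs:

$$\mathrm{MAP}(Q) = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{k=1}^{m_j} \mathrm{Precision}(R_{jk})$$

where |Q| is the number of queries, j ranges over the queries, m_j is the number of relevant documents for query j, k ranges over these relevant documents, and R_{jk} is the set of ranked results from the top result down to the k-th relevant document.

46 Mean average precision. This is an interesting measure because it produces a single number rather than a curve. Moreover, it is not necessary to do interpolation or to specify recall levels. The average precision can vary greatly between queries. Thus, it is necessary to use many queries for testing an IR system.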
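A minimal sketch of MAP, assuming the usual convention that a relevant document that is never retrieved contributes a precision of 0. The queries and document identifiers are hypothetical, and the function names are chosen for this example.

```python
def average_precision(ranking, relevant):
    """Average of the precision values measured at each relevant document."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention chosen for this sketch
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision of the top k results
    # Relevant documents that were never retrieved add 0 to the sum.
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of the average precision over all queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries: (ranked results, relevant documents).
runs = [
    (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5"}),
    (["a", "b"], {"b"}),
]
print(round(mean_average_precision(runs), 3))  # 0.628
```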

47 Several other measures Precision at k: the precision at the k-th document in the search results. R-precision ROC Curve Sensitivity Specificity. 47
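Two of these measures are easy to sketch. Precision at k is defined above; for R-precision, the sketch uses the standard definition (the precision of the top R results, where R is the number of relevant documents for the query). The function names and the data are assumptions for illustration.

```python
def precision_at_k(ranking, relevant, k):
    """Precision of the top k results (k must be a positive integer)."""
    relevant = set(relevant)
    return sum(doc in relevant for doc in ranking[:k]) / k


def r_precision(ranking, relevant):
    """Precision of the top R results, where R = number of relevant documents."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention chosen for this sketch
    return precision_at_k(ranking, relevant, len(relevant))


ranking = ["d1", "d2", "d3", "d4", "d5"]  # hypothetical ranked result list
relevant = {"d1", "d3", "d5"}             # hypothetical relevance judgments
print(precision_at_k(ranking, relevant, 2))      # 0.5
print(round(r_precision(ranking, relevant), 3))  # 0.667
```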

48 8.5 Assessing relevance. To assess the relevance of results, we need some testing data (documents, queries, relevance judgments). Appropriate queries for the test documents may be selected by domain experts. Providing relevance judgments for all documents is time-consuming. Solution: we can use a subset of all documents for evaluating each query.

49 Relevance judgments. Another problem: the relevance judgments of humans are variable and may differ from person to person. Solution: measure the agreement between different judges with the Kappa statistic: Kappa = (P(A) - P(E)) / (1 - P(E)), where P(A) = proportion of times the judges agreed and P(E) = proportion of times the judges would be expected to agree by chance.

50 Calculating the Kappa statistic 50
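A sketch of the calculation for two judges, using Kappa = (P(A) - P(E)) / (1 - P(E)). The counts are illustrative, not course data. Estimating P(E) from marginals pooled over both judges is one common choice and is an assumption of this sketch; other ways of estimating chance agreement exist.

```python
def kappa(both_yes, both_no, only_judge1_yes, only_judge2_yes):
    """Kappa statistic for two judges giving yes/no relevance judgments."""
    n = both_yes + both_no + only_judge1_yes + only_judge2_yes
    # P(A): proportion of documents on which the two judges agree.
    p_agree = (both_yes + both_no) / n
    # Pooled marginals: probability of a "relevant" judgment over both judges.
    p_yes = (2 * both_yes + only_judge1_yes + only_judge2_yes) / (2 * n)
    p_no = 1 - p_yes
    # P(E): probability that the two judges agree by chance.
    p_chance = p_yes ** 2 + p_no ** 2
    return (p_agree - p_chance) / (1 - p_chance)


# Illustrative counts for 400 judged documents.
value = kappa(both_yes=300, both_no=70, only_judge1_yes=20, only_judge2_yes=10)
print(round(value, 3))  # 0.776
```

The function divides by zero when P(E) equals 1, i.e. when both judges give the same answer for every document; a real implementation would need to handle that case.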

51 How to interpret the Kappa stat.? In general: above 0.8, good agreement between judges; between 0.67 and 0.8, fair agreement; below 0.67, weak agreement (the data should not be used for evaluation). For some real data (TREC, see book), it was found that the Kappa was generally between 0.67 and

52 The concept of relevance. We have discussed various measures to evaluate IR systems. These measures are useful for tuning the parameters of an IR system to ensure that it returns relevant documents. However, the measures may not reflect what the users really want, so an IR system will only be as good as these measures. In practice, this approach still works quite well.

53 The concept of relevance. Should we tune the parameters of an IR system by hand ( 手动 )? This would be time-consuming! Search engine companies such as Baidu and Bing instead use machine learning ( 机器学习 ). Machine learning automatically searches for parameter settings that give the best performance on the evaluation measures, e.g. to choose weights for the scoring function.

54 Limitations of relevance. Limitations of the concept of relevance: the relevance of one document is treated as independent of the relevance of other documents. A document is either relevant or irrelevant (there is no in-between). Relevance is viewed as absolute, but it varies among people. Testing with one collection of documents or one population of users may not translate well to other documents or populations.

55 Some solutions. Define relevance using different degrees of relevance, e.g. 0 = irrelevant, 0.7 = high relevance, 1.0 = very high relevance. Measure marginal relevance: how relevant is a document after the user has viewed other documents? E.g. a duplicate of a document that the user has already seen adds little.

56 System issues. Besides retrieval quality, we may want to evaluate the following aspects of an IR system: How fast does it index? (documents / hour) How fast does it search? (speed / index size) How expressive is the query language? How fast can the IR system answer complex queries?

57 System issues. How large is the document collection? (number of documents) Does the collection cover many topics? Most of these criteria are measurable (speed, size, etc.).

58 User utility. We would like to evaluate user happiness by considering the relevance, speed, and user interface of the system. A happy user finds what he or she wants and tends to use the same Web search engine again (we can measure how many users come back). There also exist surveys comparing how many users use each Web search engine.

59 User happiness. User happiness is hard to measure. For this reason, relevance is often measured instead. To measure user satisfaction (happiness), we need to do user studies ( 用户研究 ): we ask users to do some tasks with the IR system; we observe them using the IR system, take notes, and calculate measures; we can also interview the users.

60 8.6 User satisfaction. We may use objective measures ( 客观的措施 ): the time to complete a task, the number of result pages that the user looks at. We may use subjective measures ( 主观的措施 ): a score for user satisfaction, user comments on the search interface. We may use both qualitative ( 定性的措施 ) and quantitative ( 定量的措施 ) measures.

61 User satisfaction. User studies are very useful (e.g. to evaluate the user interface). But user studies are expensive and time-consuming. It is difficult to do good user studies: we need to design the study well and to interpret the results.

62 User utility. For e-commerce: we may measure the time to purchase. We may measure the fraction of searchers who buy something, e.g. 50 % of searchers bought some product. User happiness may not be the most important: the store owner's happiness may be more important (how much money is made).

63 User utility. For an enterprise, school, or government: the most important metric is probably user productivity. How much time do users spend to find the information that they need? Another aspect: information security.

64 Improving an IR system: A/B testing. If an IR system is used by many users, it is possible to try different versions of the system with different users. We can compare user satisfaction for the two groups of users to decide which version is better. Usually, this is done to test small modifications. Usually, only 1 to 10 % of users are selected randomly to test the new version of the system.

65 Improving an IR system: example. We want to improve the scoring function of the IR system. We can ask two groups of users to use different versions of the IR system (A/B testing). We can compare the number of clicks on the top search results for the two versions of the IR system. This can help us to choose the best scoring function.
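The A/B testing idea can be sketched as below. Everything here is an assumption made for illustration: the function names, the 5 % share, and the log format. Instead of drawing random numbers, the sketch hashes the user identifier so that the same user always sees the same version; this is a swapped-in technique, not something described in the slides. A real comparison would also need a statistical significance test, which is not shown.

```python
import hashlib


def assign_variant(user_id, treatment_fraction=0.05):
    """Send a fixed, pseudo-random share of users to the new version "B"."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # stable bucket in 0..9999 per user
    return "B" if bucket < treatment_fraction * 10_000 else "A"


def click_through_rates(log):
    """log: iterable of (user_id, clicked_top_result) pairs, one per search."""
    stats = {"A": [0, 0], "B": [0, 0]}  # variant -> [clicks, searches]
    for user_id, clicked in log:
        variant = assign_variant(user_id)
        stats[variant][0] += int(clicked)
        stats[variant][1] += 1
    return {v: (c / n if n else 0.0) for v, (c, n) in stats.items()}


# Hypothetical search log: every third search ends with a click on the top result.
users = [f"user{i}" for i in range(10_000)]
log = [(user, i % 3 == 0) for i, user in enumerate(users)]
print(click_through_rates(log))
```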

66 Improving an IR system. Why is A/B testing popular? It is easy to do A/B testing. We can do multiple tests to see if multiple changes are good or bad. Results are easy to understand. But it requires enough users.

67 Results snippets. An IR system should show some relevant information to the user about each document found. The standard way is to show a snippet (a short summary of the document). Usually, a snippet consists of the title of the document and a summary, which is automatically extracted.

68 Snippets. Two types of snippets: static snippets, which are always the same (they are independent of the query), and dynamic snippets, which are different for different queries. Simple approach to create a static snippet: take the first two sentences or the first 50 words of a document, or show some metadata such as title, date, and author. The snippet is created when indexing documents.
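A minimal sketch of the simple static approach. It combines the two rules by taking the first two sentences and then cutting the result at 50 words, which is one possible reading of "two sentences or 50 words". The function name and the naive sentence splitting on ".", "!" and "?" are assumptions of this example.

```python
import re


def static_snippet(text, max_sentences=2, max_words=50):
    """First sentences of a document, cut to a maximum number of words."""
    # Naive sentence splitting: break after ".", "!" or "?" followed by spaces.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = " ".join(sentences[:max_sentences]).split()
    snippet = " ".join(words[:max_words])
    return snippet + (" ..." if len(words) > max_words else "")


document = "IR is fun. It has many parts. This third sentence is dropped."
print(static_snippet(document))  # IR is fun. It has many parts.
```

Because the snippet does not depend on the query, it can be computed once at indexing time and stored with the document.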

69 Snippets: text summarization. Some researchers develop techniques to automatically summarize a text. It is a difficult research problem. How? Try to select the most important sentences of a text: the first or last sentences of paragraphs, or sentences with key terms.

70 Dynamic summaries. Display some part of the document. This part should let the user evaluate whether the document is useful for the query. Usually, an IR system selects a part that contains many of the query terms, and includes some words appearing before and after. Users generally like dynamic summaries. But providing dynamic summaries is more complicated than providing static summaries. We could use positional indexes, but we still need to keep a copy of the documents to generate the snippets.
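One very simple way to build a dynamic summary is to slide a fixed-size window over the cached text and keep the window that contains the most distinct query terms. This is only a sketch under that assumption: the function name, the window size, and the punctuation handling are made up for illustration, and a real system would use its positional index rather than scan the whole document.

```python
def dynamic_snippet(text, query, window=10):
    """Return the window of words that contains the most distinct query terms."""
    words = text.split()
    query_terms = {term.lower() for term in query.split()}

    def normalize(word):
        # Lowercase and drop surrounding punctuation before matching.
        return word.lower().strip(".,;:!?()\"'")

    best_start, best_score = 0, -1
    for start in range(max(1, len(words) - window + 1)):
        chunk = words[start:start + window]
        score = len({normalize(w) for w in chunk} & query_terms)
        if score > best_score:  # keep the earliest best window
            best_start, best_score = start, score

    snippet = " ".join(words[best_start:best_start + window])
    prefix = "... " if best_start > 0 else ""
    suffix = " ..." if best_start + window < len(words) else ""
    return prefix + snippet + suffix


text = ("The python is a large snake found in Asia and Africa. Many programmers "
        "also use the Python programming language for data analysis.")
print(dynamic_snippet(text, "python programming", window=6))
```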

71 Snippets. Generating snippets should be fast. Snippets should not be too long. If a document changes, we should also update the snippets.

72 Conclusion. Today: evaluation of information retrieval systems, second assignment. See you next week!

73 References Manning, C. D., Raghavan, P., Schütze, H. Introduction to information retrieval. Cambridge: Cambridge University Press,

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2019 1 Last week We have discussed about:

More information

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2019 1 Introduction Philippe Fournier-Viger

More information

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2018 1 Last week What is Information Retrieval

More information

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2019 1 Last week We have discussed: Evaluation

More information

Information Retrieval

Information Retrieval Information Retrieval Lecture 7 - Evaluation in Information Retrieval Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 29 Introduction Framework

More information

Retrieval Evaluation. Hongning Wang

Retrieval Evaluation. Hongning Wang Retrieval Evaluation Hongning Wang CS@UVa What we have learned so far Indexed corpus Crawler Ranking procedure Research attention Doc Analyzer Doc Rep (Index) Query Rep Feedback (Query) Evaluation User

More information

Information Retrieval. Lecture 7 - Evaluation in Information Retrieval. Introduction. Overview. Standard test collection. Wintersemester 2007

Information Retrieval. Lecture 7 - Evaluation in Information Retrieval. Introduction. Overview. Standard test collection. Wintersemester 2007 Information Retrieval Lecture 7 - Evaluation in Information Retrieval Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1 / 29 Introduction Framework

More information

Information Retrieval. Lecture 7

Information Retrieval. Lecture 7 Information Retrieval Lecture 7 Recap of the last lecture Vector space scoring Efficiency considerations Nearest neighbors and approximations This lecture Evaluating a search engine Benchmarks Precision

More information

Information Retrieval

Information Retrieval Information Retrieval ETH Zürich, Fall 2012 Thomas Hofmann LECTURE 6 EVALUATION 24.10.2012 Information Retrieval, ETHZ 2012 1 Today s Overview 1. User-Centric Evaluation 2. Evaluation via Relevance Assessment

More information

Search Evaluation. Tao Yang CS293S Slides partially based on text book [CMS] [MRS]

Search Evaluation. Tao Yang CS293S Slides partially based on text book [CMS] [MRS] Search Evaluation Tao Yang CS293S Slides partially based on text book [CMS] [MRS] Table of Content Search Engine Evaluation Metrics for relevancy Precision/recall F-measure MAP NDCG Difficulties in Evaluating

More information

Evaluation. David Kauchak cs160 Fall 2009 adapted from:

Evaluation. David Kauchak cs160 Fall 2009 adapted from: Evaluation David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture8-evaluation.ppt Administrative How are things going? Slides Points Zipf s law IR Evaluation For

More information

CSCI 5417 Information Retrieval Systems. Jim Martin!

CSCI 5417 Information Retrieval Systems. Jim Martin! CSCI 5417 Information Retrieval Systems Jim Martin! Lecture 7 9/13/2011 Today Review Efficient scoring schemes Approximate scoring Evaluating IR systems 1 Normal Cosine Scoring Speedups... Compute the

More information

THIS LECTURE. How do we know if our results are any good? Results summaries: Evaluating a search engine. Making our good results usable to a user

THIS LECTURE. How do we know if our results are any good? Results summaries: Evaluating a search engine. Making our good results usable to a user EVALUATION Sec. 6.2 THIS LECTURE How do we know if our results are any good? Evaluating a search engine Benchmarks Precision and recall Results summaries: Making our good results usable to a user 2 3 EVALUATING

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 8: Evaluation & Result Summaries Hinrich Schütze Center for Information and Language Processing, University of Munich 2013-05-07

More information

Overview. Lecture 6: Evaluation. Summary: Ranked retrieval. Overview. Information Retrieval Computer Science Tripos Part II.

Overview. Lecture 6: Evaluation. Summary: Ranked retrieval. Overview. Information Retrieval Computer Science Tripos Part II. Overview Lecture 6: Evaluation Information Retrieval Computer Science Tripos Part II Recap/Catchup 2 Introduction Ronan Cummins 3 Unranked evaluation Natural Language and Information Processing (NLIP)

More information

云计算入门 Introduction to Cloud Computing GESC1001

云计算入门 Introduction to Cloud Computing GESC1001 Lecture #6 云计算入门 Introduction to Cloud Computing GESC1001 Philippe Fournier-Viger Professor School of Humanities and Social Sciences philfv8@yahoo.com Fall 2017 1 Introduction Last week: how cloud applications

More information

Part 7: Evaluation of IR Systems Francesco Ricci

Part 7: Evaluation of IR Systems Francesco Ricci Part 7: Evaluation of IR Systems Francesco Ricci Most of these slides comes from the course: Information Retrieval and Web Search, Christopher Manning and Prabhakar Raghavan 1 This lecture Sec. 6.2 p How

More information

Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology

Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

云计算入门 Introduction to Cloud Computing GESC1001

云计算入门 Introduction to Cloud Computing GESC1001 Lecture #3 云计算入门 Introduction to Cloud Computing GESC1001 Philippe Fournier-Viger Professor School of Humanities and Social Sciences philfv8@yahoo.com Fall 2018 1 Course schedule Part 1 Part 2 Part 3 Introduction

More information

Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology

Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2015 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007

信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2019 1 Last week We have discussed in

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS3245 Information Retrieval Lecture 9: IR Evaluation 9 Ch. 7 Last Time The VSM Reloaded optimized for your pleasure! Improvements to the computation and selection

More information

Web Information Retrieval. Exercises Evaluation in information retrieval

Web Information Retrieval. Exercises Evaluation in information retrieval Web Information Retrieval Exercises Evaluation in information retrieval Evaluating an IR system Note: information need is translated into a query Relevance is assessed relative to the information need

More information

CS47300: Web Information Search and Management

CS47300: Web Information Search and Management CS47300: Web Information Search and Management Prof. Chris Clifton 27 August 2018 Material adapted from course created by Dr. Luo Si, now leading Alibaba research group 1 AD-hoc IR: Basic Process Information

More information

CS6322: Information Retrieval Sanda Harabagiu. Lecture 13: Evaluation

CS6322: Information Retrieval Sanda Harabagiu. Lecture 13: Evaluation Sanda Harabagiu Lecture 13: Evaluation Sec. 6.2 This lecture How do we know if our results are any good? Evaluating a search engine Benchmarks Precision and recall Results summaries: Making our good results

More information

CS54701: Information Retrieval

CS54701: Information Retrieval CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful

More information

Logitech G302 Daedalus Prime Setup Guide 设置指南

Logitech G302 Daedalus Prime Setup Guide 设置指南 Logitech G302 Daedalus Prime Setup Guide 设置指南 Logitech G302 Daedalus Prime Contents / 目录 English................. 3 简体中文................. 6 2 Logitech G302 Daedalus Prime 1 On 2 USB Your Daedalus Prime

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Lecture 5: Evaluation Ruixuan Li http://idc.hust.edu.cn/~rxli/ Sec. 6.2 This lecture How do we know if our results are any good? Evaluating a search engine Benchmarks

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology

Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology Evaluating search engines CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

数据挖掘 Introduction to Data Mining

数据挖掘 Introduction to Data Mining 数据挖掘 Introduction to Data Mining Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2019 S8700113C 1 Introduction Last week: Association Analysis

More information

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann

Search Engines Chapter 8 Evaluating Search Engines Felix Naumann Search Engines Chapter 8 Evaluating Search Engines 9.7.2009 Felix Naumann Evaluation 2 Evaluation is key to building effective and efficient search engines. Drives advancement of search engines When intuition

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS276 Information Retrieval and Web Search Chris Manning, Pandu Nayak and Prabhakar Raghavan Evaluation 1 Situation Thanks to your stellar performance in CS276, you

More information

Machine Vision Market Analysis of 2015 Isabel Yang

Machine Vision Market Analysis of 2015 Isabel Yang Machine Vision Market Analysis of 2015 Isabel Yang CHINA Machine Vision Union Content 1 1.Machine Vision Market Analysis of 2015 Revenue of Machine Vision Industry in China 4,000 3,500 2012-2015 (Unit:

More information

梁永健. W K Leung. 华为企业业务 BG 解决方案销售部 CTO Chief Technology Officer, Solution Sales, Huawei

梁永健. W K Leung. 华为企业业务 BG 解决方案销售部 CTO Chief Technology Officer, Solution Sales, Huawei 梁永健 W K Leung 华为企业业务 BG 解决方案销售部 CTO Chief Technology Officer, Solution Sales, Huawei Network Threats ICT 移动化云计算社交化大数据 Mobile Cloud Social Big Data 网络威胁 APT Mobile threats Web threats Worms Trojans Botnet

More information

Presentation Title. By Author The MathWorks, Inc. 1

Presentation Title. By Author The MathWorks, Inc. 1 Presentation Title By Author 2014 The MathWorks, Inc. 1 4G LTE 轻松入门 陈建平 MathWorks 中国 2014 The MathWorks, Inc. 2 大纲 4G 综述 LTE 系统工具箱的应用 黄金参考模型 点到点链路级仿真 信号发生和分析 信号信息恢复 4G 系统的并行仿真加速 3 无线标准的演化 * *Although ETSI

More information

A Benchmark For Stroke Extraction of Chinese Characters

A Benchmark For Stroke Extraction of Chinese Characters 2015-09-29 13:04:51 http://www.cnki.net/kcms/detail/11.2442.n.20150929.1304.006.html 北京大学学报 ( 自然科学版 ) Acta Scientiarum Naturalium Universitatis Pekinensis doi: 10.13209/j.0479-8023.2016.025 A Benchmark

More information
