信息检索与搜索引擎 Introduction to Information Retrieval GESC1007

1 信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities Spring

2 Last week We discussed in more detail how indexes are created: tokenization, normalization, lemmatization; phrase queries using positional indexes.

3 Course schedule ( 日程安排 ) Lectures 1 to 7. Topics: Introduction; Boolean retrieval ( 布尔检索模型 ); Term vocabulary and posting lists; Dictionaries and tolerant retrieval; Index construction and compression; Scoring, weighting, and the vector space model; Computing scores, and a complete search system; Evaluation in information retrieval; Web search engines, advanced topics, and conclusion.

4 About the last lecture Normalization ( 规范化 ): the process of converting tokens to a standard form. Stemming: removing the ends of words (simple), e.g. cars → car, airplanes → airplane. Lemmatization: converting a word to a common base form called a lemma (more complex), e.g. am, are, is → be.

5 CHAPTER 3 DICTIONARIES AND TOLERANT RETRIEVAL PDF p.86-5

6 Previous weeks Boolean retrieval model ( 布尔检索模型 ), using Boolean operators: Shenzhen AND food. Phrase ( 短语 ) queries: "Airplane tickets from Beijing". Proximity queries: Shenzhen (within 5 words of) City. To find documents, we have used a dictionary ( 词典 ), which is the term-lookup part of an inverted index ( 倒排索引 ).

7 Today How to deal with typographical errors ( 打字错误 )? Shenzhen vs Shenzhennn often made by accident ( 无意地 ) How to deal with different spellings ( 拼法 )? Color vs Colour analyze vs analyse How to deal with phonetically similar terms ( 发音相似的词 )? concede vs conceed right vs write vs rite vs wright 7

8 Wildcard queries ( 通配符查询 ) Wildcard (*) query: a query containing the wildcard ( 通配符 ) character *. The * stands for any sequence of zero or more characters, e.g. automat* to search for: automated, automation, automata. When should we use wildcard queries? When we want documents containing variants of a query term; when we are uncertain about how to spell a query term, e.g. Sydney vs Sidney.

9 Searching for documents Given A set of documents An inverted index (dictionary 词典 ) A query ( 查询 ) we can search for documents. Several steps for searching 9

10 Example: how does an IR system answer Boolean queries? Dictionary (term → postings): China → Book1, Book 20, Book 34; City → Book1, Book2, Book 7, Book 20, ...; Located → Book1, Book3, Book 5, Book 9, ...; Shenzhen → (postings not shown). The query is: CITY AND CHINA. 1) Locate CITY in the dictionary. 2) Retrieve its postings. 3) Locate CHINA in the dictionary. 4) Retrieve its postings. 5) Compute the intersection ( 交集 ) of the two lists. RESULT: Book 1, Book 20.
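The five steps above can be sketched in Python as follows. This is only a minimal illustration: the names `index`, `intersect` and `boolean_and` are made up for this example, and the postings are the book numbers from the slide, stored as sorted lists.

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists by walking both in parallel."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


# Toy dictionary: term -> sorted list of book numbers (values from the slide).
index = {
    "china": [1, 20, 34],
    "city": [1, 2, 7, 20],
}


def boolean_and(index, term1, term2):
    # Steps 1-4: locate each term and retrieve its postings.
    # Step 5: intersect the two lists.
    return intersect(index.get(term1, []), index.get(term2, []))


print(boolean_and(index, "city", "china"))  # [1, 20]
```

Because both lists are sorted, the intersection needs only one pass over each list.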

15 How to quickly search for terms in a dictionary? There are different approaches. Choosing an approach depends on: the number of terms in the dictionary (few or many?); whether the set of terms is static or dynamic (are new terms added? are some terms removed?); the relative frequencies ( 相对频率 ) with which terms are accessed (are some terms much more popular than others?).

16 Approach 1: Hashing ( 散列 ) Basic idea: A hash function ( 散列函数 ) is used to associate a positive number with each term of the dictionary. Example: h(shenzhen) =

17 Approach 1: Hashing ( 散列 ) Example: We can define the hash function as the number of letters in a word: h(term) = number of letters. h(china) = 5, h(shenzhen) = 8, h(city) = 4, h(located) = 7. These numbers are called «hash values» ( 散列值 ).

18 Approach 1: Hashing ( 散列 ) The dictionary is built so that each term is stored under its hash value. Dictionary (hash value: term → postings): 4: City → Book1, Book2, Book 10, ..., Book20; 5: China → Book1, Book 20, ...; 7: Located; 8: Shenzhen → Book1, Book3.

19 Approach 1: Hashing ( 散列 ) When searching in a dictionary, the hash function is used to quickly find the terms of the query.

20 Approach 1: Hashing ( 散列 ) Query: City AND Shenzhen. First compute the hash values: h(city) = 4 and h(shenzhen) = 8. Slot 4 gives City → Book1, Book2, Book 10, ..., Book20 and slot 8 gives Shenzhen → Book1, Book3. Result (intersection of the two postings lists): Book 1.

24 Advantage of Hashing ( 散列 ) Using a hash function ( 散列函数 ) is very fast for searching in a dictionary. By calculating the value of the hash function, we can directly find where a term is located in the dictionary.

25 Problem of Hashing ( 散列 ) However, it is possible that many terms have the same value for the hash function (this is a collision 冲突 ). In this case, the lookup is still slow, because all terms with the same value must be compared one by one. In our example: most words in English have fewer than 17 letters. Thus, there will be many collisions.

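26 Example of collisions in the dictionary: the four-letter terms City, Maze, Quiz and Jury all have the hash value 4, so slot 4 must hold all of them (City → Book1, Book2, Book 10, ..., Book20), and each lookup in this slot has to compare several terms.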

27 Problem of Hashing ( 散列 ) We could reduce that problem by using a better hash function ( 散列函数 ): h(term) = sum of the letters when converted to numbers (a = 1, b = 2, ..., z = 26). h(city) = c + i + t + y = 3 + 9 + 20 + 25 = 57. This works better because terms are less likely to have the same number.
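To make the difference between the two hash functions concrete, here is a small Python sketch. The function names `h_length`, `h_letter_sum` and `build_table` are invented for this illustration, and the letter-sum function assumes terms made only of the letters a to z.

```python
from collections import defaultdict


def h_length(term):
    # First hash function from the slides: number of letters.
    return len(term)


def h_letter_sum(term):
    # Second hash function: a = 1, b = 2, ..., z = 26 (assumes letters a-z only).
    return sum(ord(c) - ord("a") + 1 for c in term.lower())


def build_table(terms, h):
    # Group terms by hash value; a slot with several terms is a collision.
    table = defaultdict(list)
    for term in terms:
        table[h(term)].append(term)
    return dict(table)


terms = ["city", "maze", "quiz", "jury", "china", "shenzhen"]
by_length = build_table(terms, h_length)
by_sum = build_table(terms, h_letter_sum)

print(by_length[4])          # ['city', 'maze', 'quiz', 'jury']
print(h_letter_sum("city"))  # 57
```

With the letter sum, these six terms all land in different slots. Collisions are still possible, for example between words that use the same letters in a different order, so this hash function is better but not perfect.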

28 Problem of Hashing ( 散列 ) There is no simple way of finding variants of the same query term, e.g. resume vs résumé: those two words may not have the same hash value. We also cannot answer wildcard queries such as automat* to search for automated, automation.

29 Approach 2: Search tree ( 搜索树 ) Basic idea: To be able to search quickly, a tree will be used. The terms will be inserted in the tree. The tree will be used to quickly search for the terms. 29

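30 Illustration: a search tree. The root ( 根节点 ) has two children, a-m and n-z. Node a-m has children a-h and h-m; node n-z has children n-r and s-z. These are internal nodes ( 内部节点 ). The terms are stored below them: city under a-h, located under h-m, shenzhen under s-z.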

32 Description of a search tree A search tree is a tree where each node can have several child nodes. To search for a term, we start from the root ( 根节点 ) of the tree. Each internal node ( 内部节点 ) in the tree has a test to decide which child node should be explored. The search ends when the term is found. EXAMPLE 32

33 Searching CITY: the search always starts from the root ( 根节点 ) of the tree. At the root, the first letter c falls in the range a-m, so we follow a-m. There, c falls in the range a-h, so we follow a-h, where the term city is found.

38 Searching Shenzhen: starting again from the root ( 根节点 ), the first letter s falls in the range n-z, so we follow n-z. There, s falls in the range s-z, so we follow s-z, where the term shenzhen is found.

43 Approach 2: Search tree ( 搜索树 ) Advantages: a search tree ( 搜索树 ) makes it possible to quickly find terms in a dictionary to answer a query. It also makes it possible to find all terms that match a prefix ( 前缀 ), e.g. automat* (a type of wildcard query).

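44 Searching automat*: the search always starts from the root ( 根节点 ) of the tree. The first letter a leads to a-m and then to a-h, where all terms starting with automat are found next to each other: automated, automation. The other terms (located, shenzhen) are in other branches.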
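A prefix search of this kind can be sketched in Python. This is an illustration under an assumption: instead of building an actual tree, it keeps the terms in a sorted list and uses binary search from the standard `bisect` module, which gives the same ordered access that a search tree provides. The name `prefix_search` is made up for this example.

```python
import bisect


def prefix_search(sorted_terms, prefix):
    """Return all terms starting with `prefix` from a sorted list of terms."""
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    # Terms sharing the prefix are stored next to each other.
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


terms = sorted(["automated", "automation", "automata",
                "city", "located", "shenzhen"])
print(prefix_search(terms, "automat"))
# ['automata', 'automated', 'automation']
```

The important property is the same as in the tree: the terms are kept in order, so all terms with a common prefix form one contiguous block.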

45 Technical details There are many types of search trees: binary tree ( 二叉树 ): a tree where each node has no more than two children. B-tree (B 树 ): a balanced tree in which all leaves are at the same depth and a node can have many children. B+ tree (B+ 树 ). We will not discuss these details.

46 How to apply this to Chinese? In English there is an order between letters: A, B, C, ..., X, Y, Z. In Chinese, there is no single standard ordering of the characters. Dictionaries may be organized semantically, phonetically (pinyin), by number of strokes, etc.

47 When to use wildcard queries? When the user is uncertain of the spelling of a term: S*dney for Sydney or Sidney. When the user wants to find spelling variations of the same word: col*r for color or colour.

48 When to use wildcard queries? When the user wants to find variations of a term: judic* for judicial or judiciary. When the user wants to find a word that may be written differently in another language: Universit* of Stuttgart for University, Université, Universität.

49 Trailing wildcard queries Trailing wildcard query: the * symbol appears at the end of a term. automat* judic* These queries can be easily handled using a search tree with a dictionary. 49

50 Leading wildcard queries Leading wildcard query: the * symbol appears at the beginning of a term, e.g. *mobile matches automobile, mobile, immobile. How to handle these queries? Solution: use a reverse search tree where the terms are read backward. Thus two trees are needed: one for trailing wildcard queries and one for leading wildcard queries.

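51 Reverse search tree The terms are inserted backward: city becomes ytic, located becomes detacol, shenzhen becomes nehznehs. The root has two children, a-m and n-z; a-m has children a-h and h-m; n-z has children n-s and t-z. Searching CITY: the search always starts from the root of the tree, using the reversed term ytic. The letter y falls in n-z and then in t-z, where city is found.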

53 Other wildcard queries? But what if the wildcard * is not at the end or beginning of a term, e.g. S*dney? We would like to handle queries where the * symbol can appear anywhere in a term.

54 Queries with one wildcard (*) Using a search tree and a reverse search tree, an IR system can answer any query containing one wildcard (*). How? Example: S*dney. Use the search tree to find all terms starting with S. Use the reverse search tree to find all terms ending with dney. Calculate the intersection of these two sets of terms. Then, find the documents corresponding to these terms in the dictionary as usual.

55 Words that start with S (S*): Sidney, Shanghai, Shenzhen. Words that end with dney (*dney): Kidney, Sidney. Words that match S*dney (the intersection): Sidney.
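The single-wildcard procedure can be sketched as follows. As an assumption for this illustration, the two search trees are replaced by two sorted lists (one with the terms, one with the reversed terms) searched with `bisect`. The helper names are invented, and the query must contain exactly one `*`.

```python
import bisect


def prefix_search(sorted_terms, prefix):
    # All terms starting with `prefix` form one contiguous block.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def single_wildcard(terms, query):
    """Answer a query with exactly one '*', e.g. 's*dney'."""
    prefix, suffix = query.split("*")
    forward = sorted(terms)                    # stands in for the search tree
    backward = sorted(t[::-1] for t in terms)  # stands in for the reverse tree
    starts = set(prefix_search(forward, prefix))
    ends = {t[::-1] for t in prefix_search(backward, suffix[::-1])}
    # The length check rejects terms where prefix and suffix would overlap.
    return sorted(t for t in starts & ends
                  if len(t) >= len(prefix) + len(suffix))


terms = ["sidney", "shanghai", "shenzhen", "kidney"]
print(single_wildcard(terms, "s*dney"))  # ['sidney']
```

The same function also covers trailing queries (`sh*`) and leading queries (`*dney`), because an empty prefix or suffix matches every term.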

56 General wildcard queries General wildcard query: a query containing one or more wildcards (*) transf*mat* *an* How to answer these queries? Two techniques 56

57 Permuterm indexes The permuterm index is a special type of dictionary used for general wildcard queries. A special symbol $ is used to mark the end of each term: hello$, Shenzhen$, Beijing$.

58 In a permuterm index, all rotations of a term link to the original term. For example, the rotations of hello$ are hello$, ello$h, llo$he, lo$hel, o$hell and $hello, and each of them points to hello. All rotations of all terms (the permuterm vocabulary) are inserted in the search tree.

59 Searching with a permuterm index Example 1: the query m*n. Append $ and rotate the query so that the * symbol appears at the end: m*n$ → n$m*. Then, the search tree is used to find the rotations that start with n$m. We find rotations such as n$ma (term: man) and n$moro (term: moron).

60 Searching with a permuterm index Example 2: the query fi*mo*er. First search the tree for all rotations starting with er$fi (that is, answer the simpler query fi*er): fishmonger, filibuster. Then keep only the terms that also contain mo in the middle: fishmonger.
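The two permuterm examples can be reproduced with a short Python sketch. The names `rotations`, `build_permuterm`, `permuterm_lookup` and `two_wildcards` are made up here, and for brevity the rotations are kept in a plain Python dict that is scanned linearly; a real system would store them in a search tree so that the prefix lookup is fast.

```python
def rotations(term):
    """All rotations of term + '$', e.g. man -> man$, an$m, n$ma, $man."""
    s = term + "$"
    return [s[i:] + s[:i] for i in range(len(s))]


def build_permuterm(terms):
    # Every rotation points back to its original term.
    index = {}
    for term in terms:
        for rot in rotations(term):
            index[rot] = term
    return index


def permuterm_lookup(index, query):
    """Answer a query with one '*': X*Y is rotated to Y$X*."""
    prefix, suffix = query.split("*")
    key = suffix + "$" + prefix
    # Linear scan used only to keep the sketch short.
    return sorted({t for rot, t in index.items() if rot.startswith(key)})


def two_wildcards(index, first, middle, last):
    """Answer first*middle*last: look up first*last, then filter on middle."""
    candidates = permuterm_lookup(index, first + "*" + last)
    return [t for t in candidates
            if middle in t[len(first):len(t) - len(last)]]


index = build_permuterm(["man", "moron", "name", "fishmonger", "filibuster"])
print(permuterm_lookup(index, "m*n"))          # ['man', 'moron']
print(two_wildcards(index, "fi", "mo", "er"))  # ['fishmonger']
```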

61 Permuterm indexes Advantage: can be used to answer all types of wildcard queries. Disadvantage: we need to store all rotations of each term in the dictionary, so the dictionary can become quite big. For English, this can make the dictionary roughly ten times larger.

62 k-gram indexes This is another type of index for answering general wildcard queries. k-gram: a sequence of k characters e.g. 3-grams from the word castle: $ca, cas, ast, stl, tle, le$ 62

63 k-gram index The dictionary of a k-gram index contains all k-grams that occur in any term of the vocabulary. Each k-gram points to the terms that contain it, e.g. cas → castle.

64 k-gram index: answering queries Answering a wildcard query, e.g. re*ve, with a 3-gram index: we look up all terms containing the 3-gram $re (terms starting with re); we look up all terms containing the 3-gram ve$ (terms ending with ve); we intersect the two sets of terms, e.g. remove, relive, retrieve; then, we use a standard dictionary to find the documents matching these terms.

65 A problem Query: red*. If we use the previous approach on a 3-gram index, we look up the 3-grams $re and red. We will find some words such as retired, which contains both $re and red but does not match the query red*. Thus, for each term found, we still need to compare the term with the query to ensure that it really matches.
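The k-gram lookup and the extra checking step can be sketched in Python. This is an illustration only: the function names are invented, and the final check uses `fnmatchcase` from the standard `fnmatch` module, which treats `*` as "any sequence of characters" (it also gives `?` and `[` a special meaning, so the sketch assumes terms made of plain letters).

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def kgrams(text, k=3):
    # All substrings of length k; empty set if the text is shorter than k.
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def build_kgram_index(terms, k=3):
    index = defaultdict(set)
    for term in terms:
        for gram in kgrams("$" + term + "$", k):
            index[gram].add(term)
    return index


def wildcard_search(index, all_terms, query, k=3, post_filter=True):
    # Collect the k-grams of every piece of "$query$" between wildcards.
    grams = set()
    for piece in ("$" + query + "$").split("*"):
        grams |= kgrams(piece, k)
    # Keep the terms that contain all of these k-grams.
    candidates = set(all_terms)
    for gram in grams:
        candidates &= index.get(gram, set())
    if post_filter:
        # Compare each candidate with the query itself.
        candidates = {t for t in candidates if fnmatchcase(t, query)}
    return sorted(candidates)


terms = ["red", "redo", "retired", "remove", "relive", "retrieve", "castle"]
index = build_kgram_index(terms)
print(wildcard_search(index, terms, "re*ve"))
# ['relive', 'remove', 'retrieve']
print(wildcard_search(index, terms, "red*", post_filter=False))
# ['red', 'redo', 'retired']
print(wildcard_search(index, terms, "red*"))
# ['red', 'redo']
```

Without the last step, retired is returned for red* because it contains both $re and red; the comparison with the query removes it.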

66 More complex queries Many search engines allow complex queries such as: re*d AND fe*ri. Those queries can be answered with the techniques that we have discussed. Find all documents matching re*d. Find all documents matching fe*ri. Find the intersection of these two sets of documents. Such queries may be slow, as they require more processing.

67 SPELLING CORRECTION S*d*n*y 67

68 Spelling correction We will learn two techniques for dealing with spelling errors. e.g. carot instead of carrot 68

69 Two principles for spelling correction 1. To correct a misspelled word, it is generally better to choose the nearest word (the most similar word): carot → carrot or carotid? 2. If several correctly spelled words are equally similar to the misspelled word, then we should choose the most common word: grnt → grunt or grant? The most common word can be the most frequent in the text collection, or the most frequently used in queries by other users.

70 How do search engines handle spelling errors? Several strategies are possible for the query carot: retrieve documents containing carot as well as the corrected term carrot; retrieve documents containing carrot only if the term carot is not in the dictionary; retrieve documents containing carrot only if the term carot returns few documents (fewer than a given number); show a suggested spelling to the user and let the user choose: "Did you mean carrot?"

72 Forms of spelling correction Isolated-term correction: we attempt to correct a single query term, e.g. carot → carrot. Context-sensitive correction: we consider the whole query to try to fix spelling errors, e.g. flew form Heathrow → flew from Heathrow.

73 Edit distance ( 编辑距离 ) The edit distance between two terms s1 and s2 is the minimum number of edit operations to transform s1 into s2. Three operations: insert a character delete a character replace a character with another 73

74 Example editdistance( cat, dog) = 3 editdistance( cat, cat ) = 0 editdistance( cat, car ) = 1 editdistance( cat, cart ) = 1 editdistance( cat, category ) = 5 74

75 Spelling correction with edit distance To correct the spelling of a term (e.g. carot), we search for the terms that have the smallest edit distance to this term. editdistance(carot, carrot) = 1, editdistance(carot, carotid) = 2. But calculating this may be expensive (we don't want to compare the query term with every term in the dictionary). Solution?
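The edit distance with the three operations above (insert, delete, replace) can be computed with the classic dynamic programming table. The sketch below is one common way to write it in Python; the function name `edit_distance` is chosen for this example, and every operation has cost 1.

```python
def edit_distance(s1, s2):
    """Minimum number of insertions, deletions and replacements
    needed to transform s1 into s2."""
    m, n = len(s1), len(s2)
    # d[i][j] = edit distance between s1[:i] and s2[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of s1[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of s2[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a character
                d[i][j - 1] + 1,         # insert a character
                d[i - 1][j - 1] + cost,  # replace (or keep) a character
            )
    return d[m][n]


print(edit_distance("cat", "dog"))       # 3
print(edit_distance("carot", "carrot"))  # 1
```

The table has (len(s1) + 1) x (len(s2) + 1) cells, which is why comparing a query term with every term of a large dictionary is expensive.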

76 Solution We can use some heuristics ( 启发式 ) Only search for words beginning with the same letter as the query term. An alternative: use multiple rotations of the query term using a permuterm index to search for terms similar to the query term, while omitting some letters (see book p. 60). 76

77 k-gram index for spelling correction Using k-grams is another way of reducing the number of candidate terms for spelling correction. Consider a query term q. We retrieve the terms that contain k-grams of q. Among them, we keep those having the smallest edit distance to q.

78 Example query = bord Using 2-grams, we find candidate terms that share 2-grams with bord. Using the edit distance, we may find that border or lord are more likely than boardroom. We can eliminate terms that are too different immediately (e.g. by comparing term lengths).
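The bord example can be sketched in Python as follows. Everything here is illustrative: the small vocabulary, the function names and the length threshold `max_len_diff` are assumptions made for this sketch, and the compact edit distance keeps only two rows of the table.

```python
from collections import defaultdict


def bigrams(term):
    return {term[i:i + 2] for i in range(len(term) - 1)}


def edit_distance(a, b):
    # Compact two-row version of the edit distance table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct(query, vocabulary, max_len_diff=2):
    # 2-gram index: bigram -> terms containing it.
    index = defaultdict(set)
    for term in vocabulary:
        for gram in bigrams(term):
            index[gram].add(term)
    # Candidates share at least one 2-gram with the query.
    candidates = set()
    for gram in bigrams(query):
        candidates |= index.get(gram, set())
    # Cheap filter: drop terms whose length is too different.
    candidates = {t for t in candidates
                  if abs(len(t) - len(query)) <= max_len_diff}
    if not candidates:
        return []
    best = min(edit_distance(query, t) for t in candidates)
    return sorted(t for t in candidates if edit_distance(query, t) == best)


vocabulary = ["border", "lord", "boardroom", "carrot"]
print(correct("bord", vocabulary))   # ['lord']
print(correct("carot", vocabulary))  # ['carrot']
```

Here boardroom shares the 2-grams bo and rd with bord, but it is removed by the length filter before any edit distance is computed.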

79 Variations Some types of errors are more frequent than others. We can use weights to indicate that some edit operations are more likely than others, e.g. inserting a character may be less likely than replacing a character with another.

80 Context-sensitive spelling correction Isolated-term correction may fail for some queries such as: flew form Heathrow → flew from Heathrow, because every word is spelled correctly. A simple approach to consider the context: even if the words are spelled correctly, apply spelling correction to each of them. Generate all combinations of corrected terms to create new queries. Execute all these queries on the search engine. Return the results for the query that has the largest number of results. This method can be time-consuming!
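The simple approach above can be sketched in Python. This is a toy illustration under several assumptions: the tiny `documents` list plays the role of the collection, `count_hits` counts documents containing the query as an exact phrase, and the `alternatives` table (the possible corrections of each term) is written by hand instead of being produced by a spelling corrector.

```python
from itertools import product

# Toy collection standing in for the search engine (hypothetical data).
documents = [
    "we flew from heathrow to shenzhen",
    "she flew from heathrow yesterday",
    "please fill in the form",
]


def count_hits(terms):
    # Number of documents containing the terms as a phrase.
    phrase = " ".join(terms)
    return sum(phrase in doc for doc in documents)


def best_query(query_terms, alternatives, count_hits):
    # For each term, consider the term itself and its possible corrections.
    options = [[t] + alternatives.get(t, []) for t in query_terms]
    # Generate every combination and keep the one with the most results.
    candidates = [list(combo) for combo in product(*options)]
    return max(candidates, key=count_hits)


alternatives = {"flew": ["flea"], "form": ["from", "fore"]}
query = ["flew", "form", "heathrow"]
print(best_query(query, alternatives, count_hits))
# ['flew', 'from', 'heathrow']
```

With 2 options for flew and 3 for form, 6 queries are executed. The number of combinations is the product of the number of options per term, which is why this method quickly becomes time-consuming.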

81 Alternatives We may use heuristics to reduce the number of possibilities. A heuristic: consider only the most frequent combinations of query terms according to previous queries from other users. For example, we keep flew from but not flea from or flew fore.

82 Conclusion Today, we discussed in more detail how to search in dictionaries. We discussed wildcard queries. We discussed spelling correction. The PPT slides are on the website. QQ group:

83 References Manning, C. D., Raghavan, P., Schütze, H. Introduction to information retrieval. Cambridge: Cambridge University Press,


More information

Boolean Queries. Keywords combined with Boolean operators:

Boolean Queries. Keywords combined with Boolean operators: Query Languages 1 Boolean Queries Keywords combined with Boolean operators: OR: (e 1 OR e 2 ) AND: (e 1 AND e 2 ) BUT: (e 1 BUT e 2 ) Satisfy e 1 but not e 2 Negation only allowed using BUT to allow efficient

More information

60-538: Information Retrieval

60-538: Information Retrieval 60-538: Information Retrieval September 7, 2017 1 / 48 Outline 1 what is IR 2 3 2 / 48 Outline 1 what is IR 2 3 3 / 48 IR not long time ago 4 / 48 5 / 48 now IR is mostly about search engines there are

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS3245 Information Retrieval Lecture 4: Dictionaries and Tolerant Retrieval4 Last Time: Terms and Postings Details Ch. 2 Skip pointers Encoding a tree-like structure

More information

Lecture 05: Basic Python Programming

Lecture 05: Basic Python Programming BI296: Linux and Shell Programming Lecture 05: Basic Python Programming Maoying,Wu ricket.woo@gmail.com Dept. of Bioinformatics & Biostatistics Shanghai Jiao Tong University Spring, 2017 Maoying Wu (CBB)

More information

Recap: lecture 2 CS276A Information Retrieval

Recap: lecture 2 CS276A Information Retrieval Recap: lecture 2 CS276A Information Retrieval Stemming, tokenization etc. Faster postings merges Phrase queries Lecture 3 This lecture Index compression Space estimation Corpus size for estimates Consider

More information

Previous on Computer Networks Class 18. ICMP: Internet Control Message Protocol IP Protocol Actually a IP packet

Previous on Computer Networks Class 18. ICMP: Internet Control Message Protocol IP Protocol Actually a IP packet ICMP: Internet Control Message Protocol IP Protocol Actually a IP packet 前 4 个字节都是一样的 0 8 16 31 类型代码检验和 ( 这 4 个字节取决于 ICMP 报文的类型 ) ICMP 的数据部分 ( 长度取决于类型 ) ICMP 报文 首部 数据部分 IP 数据报 ICMP: Internet Control Message

More information

Text Analytics. Index-Structures for Information Retrieval. Ulf Leser

Text Analytics. Index-Structures for Information Retrieval. Ulf Leser Text Analytics Index-Structures for Information Retrieval Ulf Leser Content of this Lecture Inverted files Storage structures Phrase and proximity search Building and updating the index Using a RDBMS Ulf

More information

Part 2: Boolean Retrieval Francesco Ricci

Part 2: Boolean Retrieval Francesco Ricci Part 2: Boolean Retrieval Francesco Ricci Most of these slides comes from the course: Information Retrieval and Web Search, Christopher Manning and Prabhakar Raghavan Content p Term document matrix p Information

More information

Text Analytics. Index-Structures for Information Retrieval. Ulf Leser

Text Analytics. Index-Structures for Information Retrieval. Ulf Leser Text Analytics Index-Structures for Information Retrieval Ulf Leser Content of this Lecture Inverted files Storage structures Phrase and proximity search Building and updating the index Using a RDBMS Ulf

More information

Web Information Retrieval. Lecture 4 Dictionaries, Index Compression

Web Information Retrieval. Lecture 4 Dictionaries, Index Compression Web Information Retrieval Lecture 4 Dictionaries, Index Compression Recap: lecture 2,3 Stemming, tokenization etc. Faster postings merges Phrase queries Index construction This lecture Dictionary data

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Lecture 2: Preprocessing 1 Ch. 1 Recap of the previous lecture Basic inverted indexes: Structure: Dictionary and Postings Key step in construction: Sorting Boolean

More information

Indexing and Searching

Indexing and Searching Indexing and Searching Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Modern Information Retrieval, chapter 9 2. Information Retrieval:

More information

Recap of the previous lecture. Recall the basic indexing pipeline. Plan for this lecture. Parsing a document. Introduction to Information Retrieval

Recap of the previous lecture. Recall the basic indexing pipeline. Plan for this lecture. Parsing a document. Introduction to Information Retrieval Ch. Introduction to Information Retrieval Recap of the previous lecture Basic inverted indexes: Structure: Dictionary and Postings Lecture 2: The term vocabulary and postings lists Key step in construction:

More information

Technology: Anti-social Networking 科技 : 反社交网络

Technology: Anti-social Networking 科技 : 反社交网络 Technology: Anti-social Networking 科技 : 反社交网络 1 Technology: Anti-social Networking 科技 : 反社交网络 The Growth of Online Communities 社交网络使用的增长 Read the text below and do the activity that follows. 阅读下面的短文, 然后完成练习

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

3 Keynote Speech:

3 Keynote Speech: 3 Keynote Speech: Digital Tools for Chinese Language Learning and Teaching: CKC Code and its Online Dictionary By Dr. Esther S. C. Chan & Dr. K. H. Tse The Hong Kong Institute of Education When we study,

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar Raghavan Lecture 2: The term vocabulary Ch. 1 Recap of the previous lecture Basic inverted

More information

CS 525: Advanced Database Organization 04: Indexing

CS 525: Advanced Database Organization 04: Indexing CS 5: Advanced Database Organization 04: Indexing Boris Glavic Part 04 Indexing & Hashing value record? value Slides: adapted from a course taught by Hector Garcia-Molina, Stanford InfoLab CS 5 Notes 4

More information

Information Retrieval

Information Retrieval Information Retrieval Natural Language Processing: Lecture 12 30.11.2017 Kairit Sirts Homework 4 things that seemed to work Bidirectional LSTM instead of unidirectional Change LSTM activation to sigmoid

More information

上汽通用汽车供应商门户网站项目 (SGMSP) User Guide 用户手册 上汽通用汽车有限公司 2014 上汽通用汽车有限公司未经授权, 不得以任何形式使用本文档所包括的任何部分

上汽通用汽车供应商门户网站项目 (SGMSP) User Guide 用户手册 上汽通用汽车有限公司 2014 上汽通用汽车有限公司未经授权, 不得以任何形式使用本文档所包括的任何部分 上汽通用汽车供应商门户网站项目 (SGMSP) User Guide 用户手册 上汽通用汽车有限公司 2014 上汽通用汽车有限公司未经授权, 不得以任何形式使用本文档所包括的任何部分 SGM IT < 上汽通用汽车供应商门户网站项目 (SGMSP)> 工作产品名称 :< User Guide 用户手册 > Current Version: Owner: < 曹昌晔 > Date Created:

More information

Information Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science Information Retrieval CS 6900 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Boolean Retrieval vs. Ranked Retrieval Many users (professionals) prefer

More information

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Information Retrieval and Organisation Dell Zhang Birkbeck, University of London 2016/17 IR Chapter 02 The Term Vocabulary and Postings Lists Constructing Inverted Indexes The major steps in constructing

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Information Retrieval Systems Jim Martin! Lecture 4 9/1/2011 Today Finish up spelling correction Realistic indexing Block merge Single-pass in memory Distributed indexing Next HW details 1 Query

More information

Today s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan

Today s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan Today s topic CS347 Clustering documents Lecture 8 May 7, 2001 Prabhakar Raghavan Why cluster documents Given a corpus, partition it into groups of related docs Recursively, can induce a tree of topics

More information

CSE 562 Database Systems

CSE 562 Database Systems Goal of Indexing CSE 562 Database Systems Indexing Some slides are based or modified from originals by Database Systems: The Complete Book, Pearson Prentice Hall 2 nd Edition 08 Garcia-Molina, Ullman,

More information

vector space retrieval many slides courtesy James Amherst

vector space retrieval many slides courtesy James Amherst vector space retrieval many slides courtesy James Allan@umass Amherst 1 what is a retrieval model? Model is an idealization or abstraction of an actual process Mathematical models are used to study the

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Lecture 4: Index Construction 1 Plan Last lecture: Dictionary data structures Tolerant retrieval Wildcards Spell correction Soundex a-hu hy-m n-z $m mace madden mo

More information

数据挖掘 Introduction to Data Mining

数据挖掘 Introduction to Data Mining 数据挖掘 Introduction to Data Mining Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2019 S8700113C 1 Introduction Last week: Classification (Part

More information

GUJARAT TECHNOLOGICAL UNIVERSITY

GUJARAT TECHNOLOGICAL UNIVERSITY GUJARAT TECHNOLOGICAL UNIVERSITY INFORMATION TECHNOLOGY DATA COMPRESSION AND DATA RETRIVAL SUBJECT CODE: 2161603 B.E. 6 th SEMESTER Type of course: Core Prerequisite: None Rationale: Data compression refers

More information

Lecture 11: Packet forwarding

Lecture 11: Packet forwarding Lecture 11: Packet forwarding Anirudh Sivaraman 2017/10/23 This week we ll talk about the data plane. Recall that the routing layer broadly consists of two parts: (1) the control plane that computes routes

More information

Bi-monthly report. Tianyi Luo

Bi-monthly report. Tianyi Luo Bi-monthly report Tianyi Luo 1 Work done in this week Write a crawler plus based on keywords (Support Chinese and English) Modify a Sina weibo crawler (340M/day) Offline learning to rank module is completed

More information

IN4325 Indexing and query processing. Claudia Hauff (WIS, TU Delft)

IN4325 Indexing and query processing. Claudia Hauff (WIS, TU Delft) IN4325 Indexing and query processing Claudia Hauff (WIS, TU Delft) The big picture Information need Topic the user wants to know more about The essence of IR Query Translation of need into an input for

More information

nbns-list netbios-type network next-server option reset dhcp server conflict 1-34

nbns-list netbios-type network next-server option reset dhcp server conflict 1-34 目录 1 DHCP 1-1 1.1 DHCP 公共命令 1-1 1.1.1 dhcp dscp 1-1 1.1.2 dhcp enable 1-1 1.1.3 dhcp select 1-2 1.2 DHCP 服务器配置命令 1-3 1.2.1 address range 1-3 1.2.2 bims-server 1-4 1.2.3 bootfile-name 1-5 1.2.4 class 1-6

More information

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi. Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 18 Tries Today we are going to be talking about another data

More information

Natural Language Processing and Information Retrieval

Natural Language Processing and Information Retrieval Natural Language Processing and Information Retrieval Indexing and Vector Space Models Alessandro Moschitti Department of Computer Science and Information Engineering University of Trento Email: moschitti@disi.unitn.it

More information

XML allows your content to be created in one workflow, at one cost, to reach all your readers XML 的优势 : 只需一次加工和投入, 到达所有读者的手中

XML allows your content to be created in one workflow, at one cost, to reach all your readers XML 的优势 : 只需一次加工和投入, 到达所有读者的手中 XML allows your content to be created in one workflow, at one cost, to reach all your readers XML 的优势 : 只需一次加工和投入, 到达所有读者的手中 We can format your materials to be read.. in print 印刷 XML Conversions online

More information

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time B + -TREES MOTIVATION An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations finish within O(log N) time The theoretical conclusion

More information

Information Retrieval. Lecture 5 - The vector space model. Introduction. Overview. Term weighting. Wintersemester 2007

Information Retrieval. Lecture 5 - The vector space model. Introduction. Overview. Term weighting. Wintersemester 2007 Information Retrieval Lecture 5 - The vector space model Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 28 Introduction Boolean model: all documents

More information

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 3: Dictionaries and tolerant retrieval Paul Ginsparg Cornell University,

More information

OTAD Application Note

OTAD Application Note OTAD Application Note Document Title: OTAD Application Note Version: 1.0 Date: 2011-08-30 Status: Document Control ID: Release _OTAD_Application_Note_CN_V1.0 Copyright Shanghai SIMCom Wireless Solutions

More information

Command Dictionary CUSTOM

Command Dictionary CUSTOM 命令模式 CUSTOM [(filename)] [parameters] Executes a "custom-designed" command which has been provided by special programming using the GHS Programming Interface. 通过 GHS 程序接口, 执行一个 用户设计 的命令, 该命令由其他特殊程序提供 参数说明

More information

Information Retrieval

Information Retrieval Information Retrieval Suan Lee - Information Retrieval - 04 Index Construction 1 04 Index Construction - Information Retrieval - 04 Index Construction 2 Plan Last lecture: Dictionary data structures Tolerant

More information

CS 206 Introduction to Computer Science II

CS 206 Introduction to Computer Science II CS 206 Introduction to Computer Science II 04 / 25 / 2018 Instructor: Michael Eckmann Today s Topics Questions? Comments? Balanced Binary Search trees AVL trees / Compression Uses binary trees Balanced

More information

Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency

Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency Ralf Moeller Hamburg Univ. of Technology Acknowledgement Slides taken from presentation material for the following

More information

Boolean retrieval & basics of indexing CE-324: Modern Information Retrieval Sharif University of Technology

Boolean retrieval & basics of indexing CE-324: Modern Information Retrieval Sharif University of Technology Boolean retrieval & basics of indexing CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2013 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,

More information

Information Retrieval

Information Retrieval Information Retrieval Suan Lee - Information Retrieval - 05 Index Compression 1 05 Index Compression - Information Retrieval - 05 Index Compression 2 Last lecture index construction Sort-based indexing

More information

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 7: Scores in a Complete Search System Paul Ginsparg Cornell University, Ithaca,

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS4611: Information Retrieval Professor M. P. Schellekens Assistant: Ang Gao Slides adapted from P. Nayak and P. Raghavan Information Retrieval Lecture 2: The term

More information

Models for Document & Query Representation. Ziawasch Abedjan

Models for Document & Query Representation. Ziawasch Abedjan Models for Document & Query Representation Ziawasch Abedjan Overview Introduction & Definition Boolean retrieval Vector Space Model Probabilistic Information Retrieval Language Model Approach Summary Overview

More information

Lecture 5: Information Retrieval using the Vector Space Model

Lecture 5: Information Retrieval using the Vector Space Model Lecture 5: Information Retrieval using the Vector Space Model Trevor Cohn (tcohn@unimelb.edu.au) Slide credits: William Webber COMP90042, 2015, Semester 1 What we ll learn today How to take a user query

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS3245 Information Retrieval Lecture 6: Index Compression 6 Last Time: index construction Sort- based indexing Blocked Sort- Based Indexing Merge sort is effective

More information

实验三十三 DEIGRP 的配置 一 实验目的 二 应用环境 三 实验设备 四 实验拓扑 五 实验要求 六 实验步骤 1. 掌握 DEIGRP 的配置方法 2. 理解 DEIGRP 协议的工作过程

实验三十三 DEIGRP 的配置 一 实验目的 二 应用环境 三 实验设备 四 实验拓扑 五 实验要求 六 实验步骤 1. 掌握 DEIGRP 的配置方法 2. 理解 DEIGRP 协议的工作过程 实验三十三 DEIGRP 的配置 一 实验目的 1. 掌握 DEIGRP 的配置方法 2. 理解 DEIGRP 协议的工作过程 二 应用环境 由于 RIP 协议的诸多问题, 神州数码开发了与 EIGRP 完全兼容的 DEIGRP, 支持变长子网 掩码 路由选择参考更多因素, 如带宽等等 三 实验设备 1. DCR-1751 三台 2. CR-V35FC 一条 3. CR-V35MT 一条 四 实验拓扑

More information

: Operating System 计算机原理与设计

: Operating System 计算机原理与设计 .. 0117401: Operating System 计算机原理与设计 Chapter 11: File system interface( 文件系统接口 ) 陈香兰 xlanchen@ustc.edu.cn http://staff.ustc.edu.cn/~xlanchen Computer Application Laboratory, CS, USTC @ Hefei Embedded

More information