Search engines. Børge Svingen Chief Technology Officer, Open AdExchange
Transcription
1 Search engines Børge Svingen Chief Technology Officer, Open AdExchange
2 Information retrieval (IR) IR: Looking for information in data. Much research in IR since the 60s. Late 90s: The first Internet search engines: Excite, FTPSearch, AltaVista, AllTheWeb, Inktomi, HotBot.
3 The difference between IR and search engines Information retrieval: Small data sets. Homogeneous data sets. High precision. Expert users. Search engines: Large data sets. Heterogeneous data sets. Speed. Now: IR and search engines are merging.
4 Applications of search engines Search engines are now used for three main purposes: Web search E-commerce: Typically online shopping sites. Enterprise search: Searching internal data in businesses.
5 Information retrieval models Three main groups of IR methodologies: Set theoretic models: Documents represented as sets of words. Boolean and fuzzy models are most common. Algebraic models: Documents represented as algebraic constructs, typically vectors. Probabilistic models: Documents are retrieved based on probabilistic estimates of relevance.
6 Relevancy Relevancy: The degree to which a document is relevant to a query. Some definitions: A query $q$. A set of documents $D$. A relevancy function $R(d, q)$ for a document $d$ and query $q$. A relevancy cut-off value $R_0$: $R(d, q) > R_0$ is good enough. A set of relevant documents $D_{rel}$, where $R(d, q) > R_0$ for all $d \in D_{rel}$. A result set $D_{res}$.
7 Recall and precision Two important performance measures: Recall tells us what fraction of the relevant documents are in the result set: $\text{recall} = \frac{|D_{rel} \cap D_{res}|}{|D_{rel}|}$. Precision tells us what fraction of the documents in the result set are relevant: $\text{precision} = \frac{|D_{rel} \cap D_{res}|}{|D_{res}|}$.
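As an illustration of the two measures, here is a minimal Python sketch that treats the relevant documents and the result set as plain sets of document identifiers. The function name `recall_precision` and the sample identifiers are made up for this example and do not come from the slides.

```python
def recall_precision(relevant, results):
    """Return (recall, precision) for a set of relevant documents and a result set."""
    relevant, results = set(relevant), set(results)
    hits = relevant & results  # relevant documents that were actually returned
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(results) if results else 0.0
    return recall, precision


# Hypothetical example: four relevant documents, three returned, two of them relevant.
d_rel = {"d1", "d2", "d3", "d4"}
d_res = {"d2", "d3", "d5"}
recall, precision = recall_precision(d_rel, d_res)
```

Here two of the four relevant documents are found, and two of the three returned documents are relevant, so recall is 0.5 and precision is 2/3.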
8 Expanding scope Increasing types of information are used by search engines to calculate relevancy: Traditional IR: Relevancy decided by document content. Web search: Started including other information about documents, e.g. link graphs. Mobile search: Takes into account the device from which a search is being performed, in order to return only content that can be used by the device. Personalized search: Uses personal interests and behavioral history to give different results to different users.
9 PageRank Uses the link graph to estimate document quality. Query independent. Looks at the graph of links between documents. Assumption: Better documents are linked to by more documents. Disadvantage: Really measures popularity, not quality.
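The slide gives only the idea behind PageRank. A common way to compute it is power iteration with a damping factor, sketched below. The damping value 0.85, the even redistribution of rank from pages without outgoing links, and the toy link graph are assumptions of this sketch, not details taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets a small baseline share, independent of links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outgoing links: spread its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: pages a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

Page c is linked to by the most pages and ends up with the highest rank, which matches the assumption stated on the slide and also shows why the score reflects popularity.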
10 Components of a search engine Crawler Connectors Data preparation Indexes Query processing Result processing
11 Crawling The purpose of the crawler is to retrieve content from the web. The problem: No centralized catalog of all web pages. The solution: Start with a number of seed URLs. Retrieve the web pages. Analyze web pages for links, giving new URLs. Repeat... Crawling is difficult...
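The crawl loop described above can be sketched as a breadth-first traversal. To keep the example runnable without network access, the hypothetical `fetch_links` callback stands in for downloading a page and extracting its links, and `toy_web` is an invented in-memory link table. A real crawler also has to deal with politeness, duplicates, traps and failures, which is part of why crawling is difficult.

```python
from collections import deque


def crawl(seed_urls, fetch_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque(seed_urls)  # URLs waiting to be retrieved
    seen = set(seed_urls)        # URLs already queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # Analyze the page for links, giving new URLs.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web": URL -> links found on that page.
toy_web = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}
pages = crawl(["http://example.com/"], lambda url: toy_web.get(url, []))
```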
12 Connectors Retrieves content from different sources: Applications Content management systems servers Databases Etc. Responsible for keeping content up to date.
13 Data preparation Prepare data for indexing: Normalize content Metadata enrichment Categorization Linguistic analysis Etc.
14 Query and result processing Prepares queries for searching, and results for presentation: Normalize content Metadata enrichment Categorization Linguistic analysis Etc. Same as for content...
15 Why indexes How to search terabytes of data? Linear search takes too long. Answer: Use an index. An index is a mapping from a term to the set of documents containing the term.
16 How to index How to choose the right type of index? Many types of indexes available. Different index types have different space and time complexities. Different index types perform differently for different types of queries. The choice of index type depends on the application: What data to search. What queries will be used. Level of expertise of the users.
17 Suffix trees A suffix tree is a compact suffix trie. A suffix trie is a trie containing all suffixes of a string. Basic observation: Every substring is the prefix of a suffix. Can be built in linear time. Main advantage: Suffix trees allow substring matching. Disadvantage: A suffix tree can take considerably more space than the original data.
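18 Suffix tree example Example string: ABCABC$ [Figure: suffix tree for ABCABC$. The root has edges labeled ABC, BC, C and $; each of the ABC, BC and C edges branches into ABC$ and $.]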
19 Suffix arrays Suffix arrays are a more space-efficient alternative to suffix trees. Example string: ABCABC$. Sorted suffixes: ABC$, ABCABC$, BC$, BCABC$, C$, CABC$.
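A small Python sketch of a suffix array for the example string. It uses a naive construction (sorting the suffixes directly), which is far slower than the specialized algorithms a real system would use, and it also lists the suffix consisting of only the terminator `$`. The function names are invented for this example. Substring search relies on the observation that every substring is the prefix of a suffix.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return all start positions of pattern in text, using binary search."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        # Compare only the first len(pattern) characters of the suffix.
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    matches = []
    # All suffixes starting with the pattern are adjacent in the array.
    while lo < len(suffix_array) and text.startswith(pattern, suffix_array[lo]):
        matches.append(suffix_array[lo])
        lo += 1
    return sorted(matches)


text = "ABCABC$"
suffix_array = build_suffix_array(text)
sorted_suffixes = [text[i:] for i in suffix_array]
bc_positions = find_occurrences(text, suffix_array, "BC")
```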
20 Inverted files Inverted files look at individual terms. Each term points to documents containing the term (with positions). Advantage: Creates smaller index than original data set. Disadvantage: Limited queries, no substrings. Inverted files often refer to a dictionary instead of actual terms.
21 Inverted files, example Two documents: doc1: This is a test. doc2: So is this. Index entries as (document, position): a: (doc1,2); is: (doc1,1), (doc2,1); so: (doc2,0); test: (doc1,3); this: (doc1,0), (doc2,2).
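The example index can be reproduced with a few lines of Python. This is a minimal sketch: the tokenization (lowercasing and splitting on word characters) is an assumption chosen to match the example, and `build_inverted_index` is an invented name.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a list of (document id, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        terms = re.findall(r"\w+", text.lower())
        for position, term in enumerate(terms):
            index[term].append((doc_id, position))
    return dict(index)


docs = {"doc1": "This is a test.", "doc2": "So is this."}
index = build_inverted_index(docs)

# A single-term query is a dictionary lookup.
docs_with_this = sorted({doc_id for doc_id, _ in index.get("this", [])})
```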
22 Scaling search engines Search engines need to handle huge scaling requirements. There are two main dimensions in which to scale: Data volume Query volume
23 Scaling linearly In the following, a linearly scalable search architecture will be described. Required hardware is O(data volume) Required hardware is O(query volume)
24 Data partitioning A data collection $D$ is given. On this collection an equivalence relation $\sim$ is defined. The equivalence classes form a partition $P = \{D_i\}$. This means that $D_i \cap D_j = \emptyset$ for all distinct $D_i, D_j \in P$, and $\bigcup_{D_i \in P} D_i = D$. On the subsets of $D$, a function $\sigma : \mathcal{P}(D) \to \mathbb{N}$ gives a measure of the actual data size.
25 Equivalence relation properties Being an equivalence relation, $\sim$ fulfills the following requirements: $(d, d) \in {\sim}$ for all $d \in D$ (reflexivity). $(d_1, d_2) \in {\sim} \Rightarrow (d_2, d_1) \in {\sim}$ (symmetry). $(d_1, d_2) \in {\sim}$ and $(d_2, d_3) \in {\sim} \Rightarrow (d_1, d_3) \in {\sim}$ (transitivity).
26 Query definitions A set of queries $Q$ is given. Each $q \in Q$ is of the form $q = \{q', P'\}$, where $q'$ is a query representation (for instance, a query string) and $P'$ is the subset of $P$ that is of relevance to the query, so that $P' \subseteq P$.
27 Time distribution of queries It is assumed that the arrival of the queries in $Q$ follows a Poisson distribution characterized by the average $\lambda$. This means that the probability of $k$ queries arriving during a time unit is $P(k) = \frac{e^{-\lambda} \lambda^k}{k!}$. The numbers of queries arriving in non-overlapping intervals are therefore considered independent.
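The Poisson probability can be evaluated directly. The sketch below is only a numerical illustration; the rate of 3 queries per time unit is an arbitrary example value.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k arrivals in one time unit, for average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 3.0  # hypothetical average number of queries per time unit
p_none = poisson_pmf(0, lam)  # probability that no query arrives
p_at_most_five = sum(poisson_pmf(k, lam) for k in range(6))
total = sum(poisson_pmf(k, lam) for k in range(60))  # should be very close to 1
```

The tail probability `1 - p_at_most_five` is the kind of figure that matters when sizing nodes for peak rather than average load.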
28 Types of nodes We have four types of nodes: Processing nodes. Query distribution nodes. Result accumulation nodes. Data preprocessing nodes.
29 Distributed architecture
30 Processing nodes The set of nodes $N_{proc}$ is used to solve the set of queries $Q$. A function $\varphi : N_{proc} \to P$ specifies how the data set $D$ is distributed to the set of nodes.
31 Query distribution nodes The set of nodes $N_{distr}$ distributes the query $q = \{q', P'\}$ to the set of processing nodes used to process the query. This set is given by the function $\delta : Q \to \mathcal{P}(N_{proc})$.
32 Result accumulation nodes Upon completion of the query processing, the results are accumulated by the set of nodes $N_{acc}$.
33 Data preprocessing nodes In some cases the data $D$ on which the queries will work needs to be preprocessed. A set of nodes $N_{pre}$ will serve this task. These nodes will not be discussed further.
34 Problem solving steps To evaluate a query $q = \{q', P'\}$, the following steps are performed: 1. Distribution. The query $q = \{q', P'\}$ is distributed to the subset $\delta(q)$ of $N_{proc}$. $\delta$ is chosen so that the partitions held by the selected nodes cover the relevant ones: $P' \subseteq \{\varphi(N) : N \in \delta(q)\}$. 2. Parallel evaluation. The query $q$ will be evaluated in parallel on the processing nodes $\delta(q)$. 3. Result accumulation and merging. Upon completion of the parallel solving process, the results from the processing nodes are accumulated and merged into the final result.
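The three steps can be sketched in Python with a thread pool standing in for the processing nodes. Everything concrete here is hypothetical: the node names, the tiny per-node document collections, and the word-matching evaluation are invented to make the distribute, evaluate and merge flow visible, and they say nothing about how a real engine implements each step.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data distribution: each processing node holds one partition.
node_partitions = {
    "node0": {"doc1": "this is a test", "doc2": "so is this"},
    "node1": {"doc3": "another test document"},
    "node2": {"doc4": "nothing relevant here"},
}


def evaluate_on_node(node, term):
    """Evaluate the query on a single node's partition."""
    partition = node_partitions[node]
    return [doc_id for doc_id, text in partition.items() if term in text.split()]


def run_query(term, relevant_nodes):
    # 1. Distribution: select the processing nodes that hold relevant partitions.
    targets = sorted(relevant_nodes)
    # 2. Parallel evaluation on the selected nodes.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        partial_results = list(pool.map(lambda node: evaluate_on_node(node, term), targets))
    # 3. Result accumulation and merging into the final result.
    return sorted(doc_id for part in partial_results for doc_id in part)


hits = run_query("test", {"node0", "node1", "node2"})
```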
35 Performance specifications Each processing node $N \in N_{proc}$ is assumed to have the following performance specifications: An average of $k_{proc}$ queries can be handled in a time unit. Up to a data amount of $\sigma_{max}$ can be handled. It is assumed that the equivalence relation $\sim$ is chosen so that $\max_{D_i \in P} \sigma(D_i) \le \sigma_{max}$. Each query distribution node $N \in N_{distr}$ is assumed to be able to distribute queries to up to $k_{distr}$ other nodes. Each result accumulation node $N \in N_{acc}$ is assumed to be able to accumulate results from up to $k_{acc}$ other nodes.
36 Two-dimensional scalability Processing nodes organized in a matrix: Each column contains a full replica of all data. All nodes in a row hold the same data. The distribution and accumulation nodes are organized as trees: Queries are first distributed to columns. Each query goes to a single column. Queries are then distributed to rows. Each query goes to all rows in a column.
37 Data distribution tree
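38 The matrix $\begin{pmatrix} N_{proc,1,1} & N_{proc,1,2} & \cdots & N_{proc,1,c} \\ N_{proc,2,1} & N_{proc,2,2} & \cdots & N_{proc,2,c} \\ \vdots & \vdots & \ddots & \vdots \\ N_{proc,r,1} & N_{proc,r,2} & \cdots & N_{proc,r,c} \end{pmatrix}$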
39 Fault tolerance General principles: It is not acceptable that some of the data in D is not available. It is acceptable that the performance goes down until the error is corrected. Simple strategy: Don't use columns with faulty nodes. (Complex topic...)
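A sketch of how a query could be routed in the node matrix while applying the simple strategy of not using columns with faulty nodes. Nodes are identified by hypothetical zero-based `(row, column)` pairs, and the round-robin choice among healthy columns is an assumption of this example; the slides do not say how a column is selected.

```python
def healthy_columns(num_columns, faulty_nodes):
    """Columns that contain no faulty node, so they still hold a full replica."""
    bad = {column for (_row, column) in faulty_nodes}
    return [column for column in range(num_columns) if column not in bad]


def route_query(query_number, num_rows, num_columns, faulty_nodes=()):
    """Return the (row, column) nodes that should evaluate this query."""
    columns = healthy_columns(num_columns, faulty_nodes)
    if not columns:
        raise RuntimeError("no fully healthy column available")
    # Each query goes to a single column (round-robin is an assumption here) ...
    column = columns[query_number % len(columns)]
    # ... and to all rows in that column, so that all data is searched.
    return [(row, column) for row in range(num_rows)]


# Hypothetical 3 x 4 matrix with one faulty node in column 2.
targets = route_query(query_number=5, num_rows=3, num_columns=4, faulty_nodes={(1, 2)})
```

With one column out of service all data is still searched, but the remaining columns each receive a larger share of the queries, which is the accepted drop in performance.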
40 Linear scalability $r = \frac{\sigma(D)}{\sigma_{max}}$, $c = \frac{\lambda}{k_{proc}}$, $|N_{proc}| = cr$, $|N_{distr}| = |N_{proc}| - 1$, $|N_{acc}| = |N_{proc}| - 1$. Somewhat simplified... Assumes worst case conditions, no extra fault tolerance.
41 Linear scalability, proof $|N| = |N_{proc}| + |N_{distr}| + |N_{acc}| = |N_{proc}| + |N_{proc}| - 1 + |N_{proc}| - 1 = 3|N_{proc}| - 2 = 3\,\frac{\sigma(D)}{\sigma_{max}}\,\frac{\lambda}{k_{proc}} - 2$
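The node counts can be checked numerically. In the sketch below the number of rows is the data volume divided by the per-node capacity, and the number of columns is the query rate divided by the per-node query capacity. Rounding up with `math.ceil` is an assumption added here so that the counts are whole nodes, and the input figures are invented.

```python
import math


def node_counts(data_size, sigma_max, query_rate, k_proc):
    """Number of nodes of each type needed for the given data and query volumes."""
    rows = math.ceil(data_size / sigma_max)   # partitions needed to hold the data
    columns = math.ceil(query_rate / k_proc)  # replicas needed for the query load
    n_proc = rows * columns
    n_distr = n_proc - 1
    n_acc = n_proc - 1
    return {
        "rows": rows,
        "columns": columns,
        "proc": n_proc,
        "distr": n_distr,
        "acc": n_acc,
        "total": n_proc + n_distr + n_acc,
    }


# Hypothetical sizing: 1000 data units at 100 per node, 500 queries per time unit at 50 per node.
counts = node_counts(data_size=1000, sigma_max=100, query_rate=500, k_proc=50)
doubled_data = node_counts(data_size=2000, sigma_max=100, query_rate=500, k_proc=50)
```

Doubling the data volume doubles the number of processing nodes, and the total stays at three times the processing nodes minus two.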
42 Pattern Matching Chip (PMC)
43 PMC overview [Figure: blocks labeled Data, Data distribution, Pattern matching, Result processing, Match reports.]
44 The Comparison Element [Figure: comparison element with >, =, NOT, OR and MUX blocks.]
45 Searching for a string [Figure: comparison elements configured as = c, = b, = a.]
46 Searching for a regular expression [Figure: several rows of comparison elements configured as = c, = b, = a, annotated with character sequences over a, b and c.]
47 The Processing Element [Figure: processing element with signals sc, res[i], res[i-1], ff_out[i], ff_out[i+1], ff_in and D.]
48 Binary distribution tree [Figure: binary distribution tree between a data source and the results, with MUXes M[0], M[2], M[4], M[6] and an input from the neighbouring tree. Legend: M[i] is the MUX for PE[i]; node types are MUXed distribution node, simple distribution node, MUXed PE, and PE shifting data.]
49 Implementation of binary distribution tree [Figure: MUX (M) stages numbered 0 to 7, feeding RESULTS.]
50 Binary distribution tree with sequence control [Figure: MUX (M) stages numbered 0 to 7 feeding RESULTS, with sequence control (sc) and signals res[i], res[i-1], ff_out[i], ff_out[i+1], ff_in.]
51 Larger binary distribution tree with sequence control [Figure: larger tree of MUXes (M) feeding RESULTS, with sequence control (sc) and signals res[i], res[i-1], ff_out[i], ff_out[i+1], ff_in.]
52 The result selector [Figure: result selector with signals res1, eq1, res2, eq2, sel, doc, res, eq.]
53 Result selector operations COMPARE [C]: Performs alphabetical/numerical comparison. (L == R) [==]: Compares L and R. (L > R) [>]: Compares L and R. (L ≥ R) [≥]: Compares L and R. L + R [+]: Adds L and R. ((L+R) ≥ C) [≥C]: Compares (L + R) to C. ((L+R) ≤ C) [≤C]: Compares (L + R) to C. ((L+R) == C) [==C]: Compares (L + R) to C.
54 Implementing all boolean functions 1/2 F0 (0): A [≥3] B, null. F1 (A AND B): A [≥2] B. F2 (A AND NOT B): A [>] B. F3 (Transfer A): A [+] B, B subtree not used, always generates 0. F4 (A NOR NOT B): B [>] A, subtrees swapped. F5 (Transfer B): A [+] B, A subtree not used, always generates 0. F6 (A XOR B): A [==1] B. F7 (A OR B): A [≥1] B.
55 Implementing all boolean functions 2/2 F8 (A NOR B): A [≤0] B. F9 (A XNOR B): A [==] B, equivalence. F10 (NOT B): A [≤0] B, A subtree not used, always generates 0. F11 (A OR NOT B): A [≥] B, implication: if B then A else true. F12 (NOT A): A [≤0] B, B subtree not used, always generates 0. F13 (NOT A OR B): B [≥] A, implication, subtrees swapped, see F11. F14 (A NAND B): A [≤1] B. F15 (1): A [≥0] B, identity.
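The claim that all sixteen two-input boolean functions can be built from the result selector operations can be checked by brute force. The sketch below models each selector operation as ordinary integer arithmetic on 0/1 inputs. The comparison directions (at least, at most) are inferred from the function names in the two tables, and the numbering convention, where the truth table for (A,B) = (0,0), (0,1), (1,0), (1,1) read as a 4-bit number gives the function index, is likewise an assumption that happens to match every name listed.

```python
from itertools import product


def expected(n, a, b):
    """Value of boolean function F<n> at inputs (a, b) under the assumed numbering."""
    return (n >> (3 - (2 * a + b))) & 1


# Each function expressed only with selector-style operations on 0/1 values:
# sum compared to a constant, direct comparison, or plain addition with one
# subtree forced to 0.
selector = {
    0: lambda a, b: int(a + b >= 3),
    1: lambda a, b: int(a + b >= 2),
    2: lambda a, b: int(a > b),
    3: lambda a, b: a + 0,             # B subtree generates 0
    4: lambda a, b: int(b > a),        # subtrees swapped
    5: lambda a, b: 0 + b,             # A subtree generates 0
    6: lambda a, b: int(a + b == 1),
    7: lambda a, b: int(a + b >= 1),
    8: lambda a, b: int(a + b <= 0),
    9: lambda a, b: int(a == b),
    10: lambda a, b: int(0 + b <= 0),  # A subtree generates 0
    11: lambda a, b: int(a >= b),
    12: lambda a, b: int(a + 0 <= 0),  # B subtree generates 0
    13: lambda a, b: int(b >= a),      # subtrees swapped
    14: lambda a, b: int(a + b <= 1),
    15: lambda a, b: int(a + b >= 0),
}

mismatches = [
    (n, a, b)
    for n in range(16)
    for a, b in product((0, 1), repeat=2)
    if selector[n](a, b) != expected(n, a, b)
]
```

An empty `mismatches` list means every table entry produces the truth table its name promises.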
56 Data distribution tree with result selectors [Figure: MUXes numbered 0 to 7 and seven result selectors (sel).]
57 Data distribution and result gathering [Figure: comparison elements CE0 to CE7 connected through sc, lat and res blocks.]
58 The end.
More informationTechnology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas
Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas 1 RTL Design Flow HDL RTL Synthesis Manual Design Module Generators Library netlist
More informationGate-Level Minimization
MEC520 디지털공학 Gate-Level Minimization Jee-Hwan Ryu School of Mechanical Engineering Gate-Level Minimization-The Map Method Truth table is unique Many different algebraic expression Boolean expressions may
More informationRelevant?!? Algoritmi per IR. Goal of a Search Engine. Prof. Paolo Ferragina, Algoritmi per "Information Retrieval" Web Search
Algoritmi per IR Web Search Goal of a Search Engine Retrieve docs that are relevant for the user query Doc: file word or pdf, web page, email, blog, e-book,... Query: paradigm bag of words Relevant?!?
More informationVALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER
VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 603 203 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER CS6007-INFORMATION RETRIEVAL Regulation 2013 Academic Year 2018
More informationVLSI Test Technology and Reliability (ET4076)
VLSI Test Technology and Reliability (ET476) Lecture 5 Combinational Circuit Test Generation (Chapter 7) Said Hamdioui Computer Engineering Lab elft University of Technology 29-2 Learning aims of today
More informationCS/ECE 374 Fall Homework 1. Due Tuesday, September 6, 2016 at 8pm
CSECE 374 Fall 2016 Homework 1 Due Tuesday, September 6, 2016 at 8pm Starting with this homework, groups of up to three people can submit joint solutions. Each problem should be submitted by exactly one
More informationSearch Engines. Information Retrieval in Practice
Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Web Crawler Finds and downloads web pages automatically provides the collection for searching Web is huge and constantly
More informationPage 1. Outline. A Good Reference and a Caveat. Testing. ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems. Testing and Design for Test
Page Outline ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems Testing and Design for Test Copyright 24 Daniel J. Sorin Duke University Introduction and Terminology Test Generation for Single
More informationMining Frequent Patterns without Candidate Generation
Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview
More informationCMPUT 403: Strings. Zachary Friggstad. March 11, 2016
CMPUT 403: Strings Zachary Friggstad March 11, 2016 Outline Tries Suffix Arrays Knuth-Morris-Pratt Pattern Matching Tries Given a dictionary D of strings and a query string s, determine if s is in D. Using
More informationLogistics. CSE Case Studies. Indexing & Retrieval in Google. Review: AltaVista. BigTable. Index Stream Readers (ISRs) Advanced Search
CSE 454 - Case Studies Indexing & Retrieval in Google Some slides from http://www.cs.huji.ac.il/~sdbi/2000/google/index.htm Logistics For next class Read: How to implement PageRank Efficiently Projects
More informationChapter 4: Mining Frequent Patterns, Associations and Correlations
Chapter 4: Mining Frequent Patterns, Associations and Correlations 4.1 Basic Concepts 4.2 Frequent Itemset Mining Methods 4.3 Which Patterns Are Interesting? Pattern Evaluation Methods 4.4 Summary Frequent
More informationInformation Networks. Hacettepe University Department of Information Management DOK 422: Information Networks
Information Networks Hacettepe University Department of Information Management DOK 422: Information Networks Search engines Some Slides taken from: Ray Larson Search engines Web Crawling Web Search Engines
More informationSuffix Trees and Arrays
Suffix Trees and Arrays Yufei Tao KAIST May 1, 2013 We will discuss the following substring matching problem: Problem (Substring Matching) Let σ be a single string of n characters. Given a query string
More informationIntelligent flexible query answering Using Fuzzy Ontologies
International Conference on Control, Engineering & Information Technology (CEIT 14) Proceedings - Copyright IPCO-2014, pp. 262-277 ISSN 2356-5608 Intelligent flexible query answering Using Fuzzy Ontologies
More informationInformation Retrieval and Organisation
Information Retrieval and Organisation Dell Zhang Birkbeck, University of London 2015/16 IR Chapter 04 Index Construction Hardware In this chapter we will look at how to construct an inverted index Many
More informationCOMP combinational logic 1 Jan. 18, 2016
In lectures 1 and 2, we looked at representations of numbers. For the case of integers, we saw that we could perform addition of two numbers using a binary representation and using the same algorithm that
More informationALGORITHMS EXAMINATION Department of Computer Science New York University December 17, 2007
ALGORITHMS EXAMINATION Department of Computer Science New York University December 17, 2007 This examination is a three hour exam. All questions carry the same weight. Answer all of the following six questions.
More informationImproving Memory Repair by Selective Row Partitioning
200 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems Improving Memory Repair by Selective Row Partitioning Muhammad Tauseef Rab, Asad Amin Bawa, and Nur A. Touba Computer
More informationCHAPTER THREE INFORMATION RETRIEVAL SYSTEM
CHAPTER THREE INFORMATION RETRIEVAL SYSTEM 3.1 INTRODUCTION Search engine is one of the most effective and prominent method to find information online. It has become an essential part of life for almost
More informationCSCI 5417 Information Retrieval Systems Jim Martin!
CSCI 5417 Information Retrieval Systems Jim Martin! Lecture 4 9/1/2011 Today Finish up spelling correction Realistic indexing Block merge Single-pass in memory Distributed indexing Next HW details 1 Query
More informationBawar Abid Abdalla. Assistant Lecturer Software Engineering Department Koya University
Logic Design First Stage Lecture No.5 Boolean Algebra Bawar Abid Abdalla Assistant Lecturer Software Engineering Department Koya University Boolean Operations Laws of Boolean Algebra Rules of Boolean Algebra
More informationWhat s new in SharePoint Search 2010 for end users. IW109 Mirjam van Olst
What s new in SharePoint Search 2010 for end users IW109 Mirjam van Olst About Mirjam Microsoft Certified Master SharePoint 2007 MVP SharePoint Server SharePoint Architect at Macaw Co-organizer DIWUG and
More information