Information Retrieval CSCI
- Meagan Morgan
Transcription
1 Information Retrieval CSCI My name is Anwar Alhenshiri My is: I prefer: The course website is: 5/6/2012 1
2 Introductory Lecture: Information Retrieval
3 Q6. When you gather information from the Web (say, for a project, a school report you are doing, or a trip you are planning to take), how many times do you usually change your query to get relevant results? A- Never, I always get what I want in the first attempt B- One time C- Two times D- Three times E- More
4-8 IR and Databases
             IR                      Databases
Data         unstructured            structured
Attributes   vague                   well defined
Queries      keyword and features    SQL defined
Results      imprecise               exact
9 Definition: Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). (Manning et al., 2009)
10 Introduction [Diagram labels: Docs DB, Index Terms, Doc abstract, match, Information Need, Query]
11 Definitions: A database is a collection of documents. A document is a sequence of terms, expressing ideas about some topic in a natural language. A term is a semantic unit: a word, a phrase, or potentially the root of a word. A query is a request for documents pertaining to some topic.
12 So, what would be the job of an information retrieval system?
13 Information Retrieval: An Information Retrieval (IR) system attempts to find relevant documents to respond to a user's request. The real problem boils down to matching the language of the query to the language of the document.
14 IR Problems: Simply matching on words is a very brittle approach. One word can have many different meanings. Consider the word "take": take a place at the table; take money to the bank; take a picture; take a lot of time; take drugs.
15 IR Problems, cont'd: You can't even tell what part of speech a word has: "I saw her duck". A query that searches for pictures of a duck will find documents that contain "I saw her duck away from the ball falling from the sky".
16 IR Problems, cont'd: Proper nouns often use regular old nouns. Consider a document containing "a man named Abraham owned a Lincoln". A word-matching query for "Abraham Lincoln" may well find this document.
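The brittleness of plain word matching can be seen in a few lines of Python. This is only a minimal sketch: the helper names `tokenize` and `matches_all_words` are made up for illustration, and the matcher simply checks that every query word appears somewhere in the document.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def matches_all_words(query, document):
    """Return True if every query word occurs somewhere in the document."""
    doc_words = set(tokenize(document))
    return all(word in doc_words for word in tokenize(query))

# The document is about a car owner, yet the query for the president matches.
document = "A man named Abraham owned a Lincoln"
print(matches_all_words("Abraham Lincoln", document))
```

The matcher has no notion of word order, adjacency, or meaning, which is why the document about a man and his car is returned for a query about the president.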
17 What is Different about IR from the Rest of Computer Science? Most algorithms in computer science have a right answer. Consider the two problems: sort the following ten integers; find the highest integer. Now consider: find the document most relevant to "hippos in the zoo".
18 Measuring Effectiveness: heuristic vs. exact. An exact algorithm is deemed incorrect if it does not produce the right answer. A heuristic tries to guess something close to the right answer. Heuristics are measured by how close they come to a right answer. IR techniques are essentially heuristics because we do not know the right answer, so we have to measure how close to the right answer we can come.
19 Information Retrieval
20 Back to the Definition of IR
21 Unstructured (text) vs. Structured (database) data in 1996. Market capitalization/capitalisation (often market cap) is a measurement of the size of a business enterprise (corporation), equal to the share price times the number of shares outstanding of a public company.
22 Unstructured (text) vs. Structured (database) data in [...]
23 Unstructured Data: Typically refers to free text. Allows keyword queries, including operators, and more sophisticated "concept" queries, e.g., find all web pages dealing with drug abuse. Classic model for searching text documents.
24 Semi-Structured Data: In fact, almost no data is "unstructured". E.g., this slide has distinctly identified zones such as the Title and Bullets. This facilitates "semi-structured" search such as: Title contains data AND Bullets contain search, to say nothing of linguistic structure.
25 More Sophisticated Semi-Structured Search: Title is about Object Oriented Programming AND Author something like stro*rup, where * is the wild-card operator. Issues: How do you process "about"? How do you rank results? This is the focus of XML search (IIR chapter 10).
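The wild-card part of such a query can be illustrated with Python's standard `fnmatch` module, where `*` matches any run of characters. This is a sketch only: the author list is invented for the example, and a real engine would answer wild-card queries from an index rather than by scanning every name.

```python
from fnmatch import fnmatchcase

def wildcard_match(pattern, candidates):
    """Return the candidates that match a pattern where * stands for any run of characters."""
    # Lowercase both sides so the match is case-insensitive on every platform.
    return [name for name in candidates if fnmatchcase(name.lower(), pattern.lower())]

# Hypothetical author field values, including a misspelling.
authors = ["Stroustrup", "Strostrup", "Kernighan", "Strong"]
print(wildcard_match("stro*rup", authors))
```

Note that `fnmatch` also gives special meaning to `?` and `[...]`, so it is only a stand-in for the single `*` operator mentioned on the slide.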
26 The Web and its Challenges: Unusual and diverse documents. Unusual and diverse users, queries, and information needs. Beyond terms, exploit ideas from social networks: link analysis, clickstreams, ... How do search engines work? And how can we make them better?
27 More Sophisticated Information Retrieval: Cross-language information retrieval. Question answering. Summarization. Text mining.
28 Basic assumptions of Information Retrieval (Sec. 1.1): Collection: a fixed set of documents. Goal: retrieve documents with information that is relevant to the user's information need and helps the user complete a task.
29 Again from the Definition: A document is a set of terms, a bag of terms, or a sequence of terms. Each choice has consequences. "Term" is used instead of "word" to signal more general possibilities: serial numbers, nonsense, etc.
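The three document models differ in what they keep: a sequence keeps order and repeats, a bag keeps counts but not order, and a set keeps only presence. A minimal sketch, using an arbitrary example sentence and whitespace tokenization as simplifying assumptions:

```python
from collections import Counter

text = "to be or not to be"

sequence = text.split()       # order and repetitions preserved
bag = Counter(sequence)       # term -> number of occurrences, order dropped
term_set = set(sequence)      # only which terms occur

print(sequence)
print(bag["to"], bag["or"])
print(sorted(term_set))
```

Phrase queries need the sequence, frequency-based weighting needs at least the bag, and plain Boolean matching can work from the set alone.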
30 Modeling: Query. A basic query is one term. A multi-term query is a list of terms: OR model (some terms), AND model (all terms), or a Boolean combination of terms. Other constraints?
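The AND and OR models can be sketched over documents represented as sets of terms. The function name `search`, the `mode` parameter, and the three tiny documents below are all assumptions made for illustration, not part of the course material.

```python
def search(query_terms, documents, mode="AND"):
    """Return ids of documents containing all (AND) or at least one (OR) of the query terms."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    combine = all if mode == "AND" else any
    return [doc_id for doc_id, terms in documents.items()
            if combine(term in terms for term in query_terms)]

# Hypothetical collection: document id -> set of terms.
documents = {
    "d1": {"hippos", "zoo", "feeding"},
    "d2": {"zoo", "tickets"},
    "d3": {"hippos", "river"},
}

print(search(["hippos", "zoo"], documents, mode="AND"))
print(search(["hippos", "zoo"], documents, mode="OR"))
```

The AND model returns fewer, more specific documents; the OR model returns every document that mentions any query term.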
31 Information Retrieval (NOTION): The user wants information from a collection of "objects": the information need. The user formulates the need as a "query", in the language of the information retrieval system. The system finds objects that "satisfy" the query. The system presents the objects to the user in a "useful form". The user determines which objects from among those presented are relevant.
32 Information Retrieval (NOTION), cont'd: Define each of the words in quotes: information object, query, satisfying objects, useful presentation. The notion of relevance is critical: what does the user really want? There is insufficient structure for exact retrieval. Develop algorithms for the search and retrieval tasks.
33 Documents: Early digital searches: a digital card catalog with subject classifications and keywords. "Full text": words + English structure, no "meta-structure". Classic study: Gerard Salton, SMART project, 1960s.
34 Scaling: What attributes have changed from the 1960s to the online searches of today? Some answers: much, much larger collections; heterogeneous collections; dynamic collections (documents come, go, change); decentralized / distributed collections; more diverse users (use for relevance?); more demanding users; more complex queries; much, much more computing resources.
35 Scaling, cont'd: How do these changes change the problem? Some thoughts: lower concentration of clues, i.e., important words; computing power through clustering; more complex algorithms; others?
36 The Classic Search Model [Diagram: TASK (get rid of mice in a politically correct way) -> Info Need (info about removing mice without killing them) -> Verbal form (how do I trap mice alive?) -> Query (mouse trap) -> SEARCH ENGINE over the Corpus -> Query Results -> Refinement. Possible failures between the stages: Misconception? Mistranslation? Misformulation?]
37 Readings: Please read Chapter 1 of the Modern Information Retrieval book. Credits:
38 Information Retrieval: Zipf's Law
39 Zipf's Law: Zipf's law states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Meaning that: Too-frequent terms are less important. Too-rare terms are less important!?
40 A log-log plot of Words in Wikipedia [Figure: plot of word frequency in Wikipedia (November 27, 2006), in log-log coordinates.] x is the rank of a word in the frequency table; y is the total number of the word's occurrences. The most popular words are "the", "of" and "and", as expected. Zipf's law corresponds to the upper linear portion of the curve, roughly following the green (1/x) line. The Zipf constant of an English corpus is close to [...]
41 Zipf's Law, cont'd: Zipf's law is most easily observed by plotting the data on a log-log graph, with the axes being X = log(rank order) and Y = log(frequency). For example, the word "the" would appear at x = log(1), y = log(69971), where 69971 is the number of times the word "the" appears in a hypothetical corpus and 1 is the rank of the word "the". The data conform to Zipf's law to the extent that the plot is almost linear.
42 Zipf's Law, cont'd: To apply Zipf's law, the terms in the corpus of interest may be organized in two columns. The first column carries the ranks of the terms, starting with the most common term at rank 1 and increasing as less common terms appear. Ties do not matter. The second column holds the number of times each term appears in the corpus.
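The rank and frequency columns described here can be built with a counter. The sketch below assumes whitespace tokenization and a toy sentence; `rank_frequency_table` is a name chosen for the example, and tied terms simply receive consecutive ranks in the order the counter returns them.

```python
from collections import Counter

def rank_frequency_table(tokens):
    """Return (rank, term, frequency) rows, with the most common term at rank 1."""
    counts = Counter(tokens)
    return [(rank, term, freq)
            for rank, (term, freq) in enumerate(counts.most_common(), start=1)]

tokens = "the cat and the dog and the bird".split()
for rank, term, freq in rank_frequency_table(tokens):
    print(rank, term, freq)
```

Plotting log(rank) against log(frequency) from these rows gives the log-log graph used to check Zipf's law.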
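43 Zipf's Law Formally, cont'd: What we can determine from Zipf's law:
$$R \cdot P = A$$
where $R$ is the rank of a term, $P = n/N$ is the probability of occurrence of a term that appears $n$ times among the $N$ word occurrences in the corpus, and $A$ is the Zipf constant. Writing $R_n$ for the rank of a term that occurs $n$ times:
$$R_n \cdot \frac{n}{N} = A \quad\Rightarrow\quad R_n = \frac{AN}{n}$$
The number of terms that occur $n$ times is
$$I_n = R_n - R_{n+1} = \frac{AN}{n} - \frac{AN}{n+1} = \frac{AN}{n(n+1)}$$
When the frequency is 1, we get the highest rank, $R_1 = AN$, which is the number of distinct terms. Dividing by it gives the fraction of terms that occur $n$ times:
$$\frac{I_n}{AN} = \frac{1}{n(n+1)}$$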
44 Example
n    1/(n(n+1))    Percentage
1    1/2           50%
2    1/6           16.7%
3    1/12          8.3%
4    1/20          5%
50% of terms occur once. Around 66% of terms occur at most twice. Around 75% of terms occur at most three times.
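The percentages on this slide follow from the ratio 1/(n(n+1)) derived on the previous slide. A small sketch with exact fractions makes the arithmetic explicit; the two function names are invented for the example.

```python
from fractions import Fraction

def fraction_occurring(n):
    """Fraction of distinct terms that occur exactly n times, under Zipf's law."""
    return Fraction(1, n * (n + 1))

def fraction_occurring_at_most(n):
    """Fraction of distinct terms that occur n times or fewer."""
    return sum((fraction_occurring(k) for k in range(1, n + 1)), Fraction(0))

for n in range(1, 5):
    print(n, fraction_occurring(n), fraction_occurring_at_most(n))
```

The cumulative sum telescopes to n/(n+1), which gives 1/2, 2/3 and 3/4 for n = 1, 2, 3, matching the 50%, roughly 66% and 75% quoted above.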
45 Zipf's Law: Allows us to decide which terms to keep and which terms to ignore when indexing. If Zipf's law holds, the log-log plot will be a straight line with a slope close to -1. See figure.
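One way to check the slope claim is to fit a straight line to log(frequency) against log(rank). The sketch below uses an ordinary least-squares fit written out by hand and tests it on synthetic frequencies proportional to 1/rank; both the function name `loglog_slope` and the synthetic data are assumptions for illustration, not measurements from a real corpus.

```python
import math

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(frequencies, reverse=True)   # rank 1 = most frequent
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Synthetic, perfectly Zipfian frequencies: f(rank) = 1000 / rank.
ideal = [1000.0 / rank for rank in range(1, 51)]
print(loglog_slope(ideal))
```

On real text the fitted slope is only approximately -1, and the fit is usually worst for the very rarest terms, where integer counts force many ties.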
46 Summary: The simplest case of Zipf's law is a "1/f function". Given a set of Zipfian distributed frequencies, sorted from most common to least common, the second most common frequency will occur 1/2 as often as the first. The third most common frequency will occur 1/3 as often as the first. The nth most common frequency will occur 1/n as often as the first. However, this cannot hold exactly, because items must occur an integer number of times: there cannot be 2.5 occurrences of a word. Nevertheless, over fairly wide ranges, and to a fairly good approximation, many natural phenomena obey Zipf's law.
Language engineering and Domain Specific Languages Perdita Stevens School of Informatics University of Edinburgh Plan 1. Defining languages 2. General purpose languages vs domain specific languages 3.
More informationDatabase Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.
Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 18 Transaction Processing and Database Manager In the previous
More information21. Search Models and UIs for IR
21. Search Models and UIs for IR INFO 202-10 November 2008 Bob Glushko Plan for Today's Lecture The "Classical" Model of Search and the "Classical" UI for IR Web-based Search Best practices for UIs in
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Queries on streams
More informationStructural Text Features. Structural Features
Structural Text Features CISC489/689 010, Lecture #13 Monday, April 6 th Ben CartereGe Structural Features So far we have mainly focused on vanilla features of terms in documents Term frequency, document
More informationRanking of ads. Sponsored Search
Sponsored Search Ranking of ads Goto model: Rank according to how much advertiser pays Current model: Balance auction price and relevance Irrelevant ads (few click-throughs) Decrease opportunities for
More informationIndex Compression. David Kauchak cs160 Fall 2009 adapted from:
Index Compression David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture5-indexcompression.ppt Administrative Homework 2 Assignment 1 Assignment 2 Pair programming?
More informationOverview of Web Mining Techniques and its Application towards Web
Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous
More informationWeb Information Retrieval using WordNet
Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT
More informationPlan. Language engineering and Domain Specific Languages. Language designer defines syntax. How to define language
Plan Language engineering and Domain Specific Languages Perdita Stevens School of Informatics University of Edinburgh 1. Defining languages 2. General purpose languages vs domain specific languages 3.
More informationInformation Retrieval
Information Retrieval Suan Lee - Information Retrieval - 05 Index Compression 1 05 Index Compression - Information Retrieval - 05 Index Compression 2 Last lecture index construction Sort-based indexing
More informationCOMP6237 Data Mining Searching and Ranking
COMP6237 Data Mining Searching and Ranking Jonathon Hare jsh2@ecs.soton.ac.uk Note: portions of these slides are from those by ChengXiang Cheng Zhai at UIUC https://class.coursera.org/textretrieval-001
More informationIntroduction to Data Management. Lecture #1 (Course Trailer )
Introduction to Data Management Lecture #1 (Course Trailer ) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics! Welcome to my biggest
More informationGuide to the Enterprise Catalogue
Guide to the Enterprise Catalogue Welcome to the new Enterprise online catalogue at the Mississauga Library System. We hope that you will find the catalogue easy to use. This handout should get you started.
More informationIndexing and Query Processing. What will we cover?
Indexing and Query Processing CS 510 Winter 2007 1 What will we cover? Key concepts and terminology Inverted index structures Organization, creation, maintenance Compression Distribution Answering queries
More informationNatural Language Processing
Natural Language Processing Information Retrieval Potsdam, 14 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the course book Outline 2 1 Introduction 2 Indexing Block Document
More informationInformatics 1: Data & Analysis
Informatics 1: Data & Analysis Lecture 9: Trees and XML Ian Stark School of Informatics The University of Edinburgh Tuesday 11 February 2014 Semester 2 Week 5 http://www.inf.ed.ac.uk/teaching/courses/inf1/da
More informationTask Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Task Description: Finding Similar Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 11, 2017 Sham Kakade 2017 1 Document
More information