VK Multimedia Information Systems
|
|
- Gabriella Daniel
- 6 years ago
- Views:
Transcription
1 VK Multimedia Information Systems Mathias Lux, This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0
2 Information Retrieval Basics: Agenda Information Retrieval History Information Retrieval & Data Retrieval Searching & Browsing Information Retrieval Models
3 Information Retrieval History Currently there are no museums for IR IR is the process of searching through a document collection based on a particular information need.
4 IR Key Concepts Searching Indexing, Ranking Document Collection Textual, Visual, Auditive Particular Needs Query, User based
5 A History of Libraries Libraries are perfect examples for document collections. Wall paintings in caves e.g. Altamira, ~ 18,500 years old Writing in clay, stone, bones e.g. Mesopotamian cuneiforms, ~ BC e.g. Chinese tortoise-shell carvings, ~ BC e.g. Hieroglyphic inscriptions, Narmer Palette ~ BC
6 A History of Libraries (ctd.) Papyrus Specific plant (subtropical) Organized in rolls, e.g. in Alexandria Parchment Independence from papyrus Sewed together in books Paper Invented in China (bones and bamboo too heavy, silk too expensive) Invention spread -> in 1120 first paper mill in Europe
7 A History of Libraries (ctd.) Gutenberg s printing press (1454) Inexpensive reproduction e.g. Gutenberg Bible Organization & Storage Dewey Decimal System (DDC, 1872) Card Catalog (early 1900s) Microfilm (1930s) MARC (Machine Readable Cataloging, 1960s) Digital computers (1940s+)
8 Library & Archives today Partially converted to electronic catalogues From a certain time point on ( ) Often based on proprietary systems Digitization happens slowly No full text search available Problems with preservation Storage devices & formats
9 History of Searching Browsing Like finding information yourself Catalogs Organized in taxonomies, keywords, etc. Content Based Searching SELECT * FROM books WHERE title= %Search% Information Retrieval ranking, models, weighting link analysis, LSA,...
10 History of IR Starts with development of computers Term Information Retrieval coined by Mooers in 1950 Mooers, C. (March 1950). "The theory of digital handling of nonnumerical information and its implications to machine economics". Proceedings of the meeting of the Association for Computing Machinery at Rutgers University. Two main periods (Spark Jones u. Willett) : academic research models and basics main topics: search & indexing : commercial applications improvement of basic methods
11 A Challenge: The World Wide Web First actual implementation of Hypertext Interconnected documents Linked and referenced World Wide Web (1989, T. Berners-Lee) Unidirectional links (target is not aware) Links are not typed Simple document format & communication protocol (HTML & HTTP) Distributed and not controlled
12 Some IR History Milestones Book Automatic Information Organization and Retrieval, Gerard Salton (1968) Vector space model Paper A statistical interpretation of term specificity and its application in retrieval, Karen Sparck Jones (1972) IDF weighting Book Information Retrieval of C.J. Rijsbergen (1975) Probabilistic model
13 Some IR History Milestones Paper Indexing by Latent Semantic Analysis, S. Deerwester, Susan Dumais, G. W. Furnas, T. K. Landauer, R. Harshman (1990). Latent Semantic Indexing Paper Some simple effective approximations to the 2- Poisson model for probabilistic weighted retrieval Robertsen & Walker (1994) BM25 weighting scheme Paper The Anatomy of a Large-Scale Hypertextual Web Search Engine, Sergey Brin & Larry Page (1998) World wide web retrieval
14 Information Retrieval Basics: Agenda Information Retrieval History Information Retrieval & Data Retrieval Searching & Browsing Information Retrieval Models
15 Organizational: References in the Library Modern Information Retrieval, Ricardo Baeza-Yates & Berthier Ribeiro-Neto, Addison Wesley Google's Pagerank and Beyond: The Science of Search Engine Rankings, Amy N. Langville & Carl D. Meyer, University Presses of CA Readings in Information Retrieval, Karen Sparck Jones, Peter Willett, Morgan Kaufmann
16 Organizational: References On the WWW Skriptum Information Retrieval, Norbert Fuhr, Lecture Notes on Information Retrieval - Univ. Dortmund, Updated in 2002 Information Retrieval 2nd Edt., C.J. Rijsbergen, Butterworth, London 1979 Through me: Lectures on Information Retrieval: Third European Summer-School, Essir 2000 Varenna, Italy, Revised Lectures, Maristella Agosti, Fabio Crestani & Gabriela Pasi (eds.), Lecture Notes in Computer Science, Springer 2000
17 Information Retrieval & Data Retrieval Information Retrieval Data Retrieval Information Level Search Engine Teoma / Google Data Level Data Base Oracle / MySQL
18 Information Retrieval & Data Retrieval Information Retrieval Content Based Search Query ambigous Results ranked by relevance Error tolerant Multiple iterations Examples Search for synonyms Bag of Words Data Retrieval Search for Patterns and String Query formal & unambigous Results not ranked Not error tolerant Clearly defined result set Examples Search for patterns SQL Statement Retrieval is nearly always a combination of both.
19 Information Retrieval Basics: Agenda Information Retrieval History Information Retrieval & Data Retrieval Searching & Browsing Information Retrieval Models
20 Information Retrieval Basics: Searching A user has an information need, which needs to be satisfied. Two different approaches: Browsing Searching
21 Searching & Browsing Searching Explicit information need Definition through query Result lists e.g. Google Browsing User Not necessarily explicit need Navigation through repositories Searching Browsing Docum ents
22 Browsing Flat Browsing User navigates through set of documents No implied ordering, explicit ordering possible Examples: One single directory, one single file Structure Guided Browsing An explicit structure is available for navigation Mostly hierarchical (file directories) Can be generic digraph (WWW) Examples: File systems, World Wide Web
23 Searching Query defines Information Need Ad Hoc Searching Search when you need it Query is created to fit the need Information Filtering Make sets of documents smaller Query is filter criterion Information Push Same as filtering, delivery is different
24 Information Retrieval Basics: Agenda Information Retrieval History Information Retrieval & Data Retrieval Searching & Browsing Information Retrieval Models
25 Information Retrieval System Architecture Aspects Query & languages IR models Documents Internal representation Pre- and post-processing Relevance feedback HCI
26 Information Retrieval Models Boolean Model Set theory & Boolean algebra Vector Model Non binary weights on dimensions Partial match Probabilistic Model Modeling IR in a probabilistic framework
27 Formal Definition of Models An information retrieval model is a quadruple [D, Q, F, R(q i, d j )] D is a set of logical views (or representations) for the documents in the collection. Q is a set of logical views (or representations) for the user needs or queries. F is a framework for modeling document representations, queries and their relationship. R(q i, d j ) is a ranking function which associates a real number with a query q i of Q and a document d j of D.
28 Definitions in Context of Text Retrieval index term word of a document expressing (part of) document semantics weight w i,j quantifies the importance of index term t i for document d j index term vector for document d j (having t different terms in all documents): d ( w, w,..., w ) j 1, j 2, j t, j
29 Boolean Model Based on set theory and Boolean algebra Set of index terms Query is Boolean expression Intuitive concept: Wide usage in bibliographic system Easy implementation and simple formalisms Drawbacks: Binary decision components (true/false) No relevance scale (relevant or not)
30 Boolean Model: Example k a (1,0,0) (1,1,0) k b (1,1,1) k c q k ( k k ) a b c
31 Boolean Model: DNF q k ( k k )... q (1,1,1) (1,1,0) (1,0,0) a b c dnf Express queries in disjunctive normal form (disjunction of conjunctive components) Each of the components is a binary weighted vector associated with (k a,k b,k c ) Weights w i,j {0,1}
32 Boolean Model: Ranking function sim( d, q) j 1 if qcc ( qcc qdnf ) ( k, g ( j ) ( )) i i d gi qcc 0 otherwise similarity is one if one of the conjunctive components in the query is exactly the same as the document term vector.
33 Boolean Model Advantages Clean formalisms Simplicity Disadvantages Might lead to too few / many results No notion of partial match Sequential ordering of terms not taken into account.
34 Vector Model Integrates the notion of partial match Non-binary weights (terms & queries) Degree of similarity computed d ( w, w,..., w ) j 1, j 2, j t, j q ( w, w,..., w ) 1, q 2, q t, q
35 Vector model: Similarity sim( d, q) j d d j j q q t i 1 w w i, j i, q t t 2 2 wi, j wi, q i 1 i 1
36 Vector Model: Example
37 Another Example: Document & Query: D = The quick brown fox jumps over the lazy dog Q = brown lazy fox Results: (1,1,1,1,1,1,1,2) t * (1,1,1,0,0,0,0,0) t = 3 sqrt(11) * sqrt(3) = sqrt(3*11) = sqrt(33) Similarity = 3 / sqrt(33) ~= sim( d, q) j d d j j q q t i 1 w w i, j i, q t t 2 2 wi, j wi, q i 1 i 1
38 Term weighting: TF*IDF Term weighting increases retrieval performance Term frequency How often does a term occur in a document? Most intuitive approach Inverse Document Frequency What is the information content of a term for a document collection? Compare to Information Theory of Shannon
39 Example: IDF 300 documents corpus 3 Term occurs in few documents: High weight for ranking, high discrimination 2,5 2 Term occurs in nearly every document: Low weight for ranking, low discrimination idf 1,5 1 0, Docum ent Frequency
40 Definitions: Normalized Term Frequency f i, j i, j freq i, j max ( freq ) l l, j... normalized term frequency freq... raw term frequency of term i in document j Maximum is computed over all terms in a document Terms which are not present in a document have a raw frequency of 0
41 Definitions: Inverse Document Frequency idf N n i i N log... inverse document frequency for term i n i... number of documents in the corpus... number of document in the corpus which contain term i Note that idf i is independent from the document. Note that the whole corpus has to be taken into account.
42 Why log(...) in IDF? idf value docum ent frequency Logarithm ic IDF No logarithm
43 TF*IDF TF*IDF is a very prominent weighting scheme Works fine, much better than TF or Boolean Quite easy to implement w f log i, j i, j N n i
44 Weighting of query terms w iq, 0.5 fiq, (0.5 ) log max ( f ) N n l l, q i Also using IDF of the corpus But TF is normalized differently TF > 0.5 Note: the query is not part of the corpus!
45 Vector Model Advantages Weighting schemes improve retrieval performance Partial matching allows retrieving documents that approximate query conditions Cosine coefficient allows ranked list output Disadvantages Term are assumed to be mutually independent
46 Simple example (i) Scenario Given a document corpus on birds: nearly each document (say 99%) contains the word bird someone is searching for a document about sparrow nest construction with a query sparrow bird nest construction Exactly the document which would satisfy the user needs does not have the word bird in it.
47 Simple example (ii) TF*IDF weighting knows upon the low discrimative power of the term bird The weight of this term is near to zero This term has virtually no influence on the result list.
48 Exercise 01 Given a document collection... Find the results to a query... Employing the Boolean model Employing the vector model (with TF*IDF) Some hints: Excel: Sheet on homepage Use functions Summenprodukt & Quadratesumme
49 Exercise 01 Document collection (6 documents) spatz, amsel, vogel, drossel, fink, falke, flug spatz, vogel, flug, nest, amsel, amsel, amsel kuckuck, nest, nest, ei, ei, ei, flug, amsel, amsel, vogel amsel, elster, elster, drossel, vogel, ei falke, katze, nest, nest, flug, vogel spatz, spatz, konstruktion, nest, ei Queries: spatz, vogel, nest, konstruktion amsel, ei, nest
50 Exercise 01 d1 d2 d3 d4 d6 d6 idf amsel drossel 1 1 ei elster 2 falke 1 1 fink 1 flug katze 1 konstruktion 1 kuckuck 1 nest spatz vogel ITEC, Klagenfurt University, Austria Multimedia
51 Exercise 02 Create a term vector of a text file with a language of your choice, raw frequency Create a graph
52 Exercise 02 Ideas... Make sure you remove ^[A-Za-z0-9] Make sure to normalize the case ie. by using lower case Print term + \t + value Import output to Excel
53 Don t forget.. To send me the results of Exercise 01 and the graph from Exercise 02
54 Thanks... for your attention!
VK Multimedia Information Systems
VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at Dienstags, 16.oo Uhr c.t., E.2.69 This work is licensed under a Creative Commons Attribution-NonCommercial- ShareAlike 2.0 License.
More informationVK Multimedia Information Systems
VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Information Retrieval Basics: Agenda Vector
More informationModern Information Retrieval
Modern Information Retrieval Chapter 3 Modeling Part I: Classic Models Introduction to IR Models Basic Concepts The Boolean Model Term Weighting The Vector Model Probabilistic Model Chap 03: Modeling,
More informationInformation Retrieval
s Information Retrieval Information system management system Model Processing of queries/updates Queries Answer Access to stored data Patrick Lambrix Department of Computer and Information Science Linköpings
More informationmodern database systems lecture 4 : information retrieval
modern database systems lecture 4 : information retrieval Aristides Gionis Michael Mathioudakis spring 2016 in perspective structured data relational data RDBMS MySQL semi-structured data data-graph representation
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval (Supplementary Material) Zhou Shuigeng March 23, 2007 Advanced Distributed Computing 1 Text Databases and IR Text databases (document databases) Large collections
More informationA NEW CLUSTER MERGING ALGORITHM OF SUFFIX TREE CLUSTERING
A NEW CLUSTER MERGING ALGORITHM OF SUFFIX TREE CLUSTERING Jianhua Wang, Ruixu Li Computer Science Department, Yantai University, Yantai, Shandong, China Abstract: Key words: Document clustering methods
More informationIntroduction to Information Retrieval. Lecture Outline
Introduction to Information Retrieval Lecture 1 CS 410/510 Information Retrieval on the Internet Lecture Outline IR systems Overview IR systems vs. DBMS Types, facets of interest User tasks Document representations
More informationDuke University. Information Searching Models. Xianjue Huang. Math of the Universe. Hubert Bray
Duke University Information Searching Models Xianjue Huang Math of the Universe Hubert Bray 24 July 2017 Introduction Information searching happens in our daily life, and even before the computers were
More informationFall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12
Fall 2016 CS646: Information Retrieval Lecture 2 - Introduction to Search Result Ranking Jiepu Jiang University of Massachusetts Amherst 2016/09/12 More course information Programming Prerequisites Proficiency
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Introduction to IR models and methods Rada Mihalcea (Some of the slides in this slide set come from IR courses taught at UT Austin and Stanford) Information Retrieval
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 1: Introduction October 23 rd, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig What is Information
More informationMultimedia Information Systems
Multimedia Information Systems Samson Cheung EE 639, Fall 2004 Lecture 6: Text Information Retrieval 1 Digital Video Library Meta-Data Meta-Data Similarity Similarity Search Search Analog Video Archive
More informationCS/INFO 431 Web Information Systems Spring Carl Lagoze Sadat Shami
CS/INFO 431 Web Information Systems Spring 2008 Carl Lagoze Sadat Shami What is the Web? Thinking beyond the browser The Web provides: an architecture and a data layer and the site A web information system
More informationInformation Retrieval. (M&S Ch 15)
Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion
More informationVK Multimedia Information Systems
VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Results Exercise 01 Exercise 02 Retrieval
More informationRepresentation/Indexing (fig 1.2) IR models - overview (fig 2.1) IR models - vector space. Weighting TF*IDF. U s e r. T a s k s
Summary agenda Summary: EITN01 Web Intelligence and Information Retrieval Anders Ardö EIT Electrical and Information Technology, Lund University March 13, 2013 A Ardö, EIT Summary: EITN01 Web Intelligence
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More informationDepartment of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _
COURSE DELIVERY PLAN - THEORY Page 1 of 6 Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _ LP: CS6007 Rev. No: 01 Date: 27/06/2017 Sub.
More informationSocial Information Retrieval
Social Information Retrieval Sebastian Marius Kirsch kirschs@informatik.uni-bonn.de th November 00 Format of this talk about my diploma thesis advised by Prof. Dr. Armin B. Cremers inspired by research
More informationInformation Retrieval. Information Retrieval and Web Search
Information Retrieval and Web Search Introduction to IR models and methods Information Retrieval The indexing and retrieval of textual documents. Searching for pages on the World Wide Web is the most recent
More informationDesigning and Building an Automatic Information Retrieval System for Handling the Arabic Data
American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far
More informationBasic techniques. Text processing; term weighting; vector space model; inverted index; Web Search
Basic techniques Text processing; term weighting; vector space model; inverted index; Web Search Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis Query processing
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Course overview Instructor: Rada Mihalcea What is this course about? Processing Indexing Retrieving textual data (or audio, video, geo-spatial,, data) Fits in four
More informationBoolean Model. Hongning Wang
Boolean Model Hongning Wang CS@UVa Abstraction of search engine architecture Indexed corpus Crawler Ranking procedure Doc Analyzer Doc Representation Query Rep Feedback (Query) Evaluation User Indexer
More informationWhat is Information Retrieval (IR)? Information Retrieval vs. Databases. What is Information Retrieval (IR)? Why Should I Know about All This?
What is Information Retrieval (IR)? Information Retrieval and Web Search Engines Lecture 1: Introduction November 5, 2008 Wolf-Tilo Balke with Joachim Selke Institut für Informationssysteme Technische
More informationCHAPTER 8 Multimedia Information Retrieval
CHAPTER 8 Multimedia Information Retrieval Introduction Text has been the predominant medium for the communication of information. With the availability of better computing capabilities such as availability
More informationVannevar Bush. Information Retrieval. Prophetic: Hypertext. Historic Vision 2/8/17
Information Retrieval Vannevar Bush Director of the Office of Scientific Research and Development (1941-1947) Vannevar Bush,1890-1974 End of WW2 - what next big challenge for scientists? 1 Historic Vision
More informationEnabling Users to Visually Evaluate the Effectiveness of Different Search Queries or Engines
Appears in WWW 04 Workshop: Measuring Web Effectiveness: The User Perspective, New York, NY, May 18, 2004 Enabling Users to Visually Evaluate the Effectiveness of Different Search Queries or Engines Anselm
More informationVK Multimedia Information Systems
VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at Dienstags, 16.oo Uhr This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Agenda Evaluations
More informationLRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier
LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier Wang Ding, Songnian Yu, Shanqing Yu, Wei Wei, and Qianfeng Wang School of Computer Engineering and Science, Shanghai University, 200072
More informationInternational Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.
A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish
More informationSearch Engine Architecture. Hongning Wang
Search Engine Architecture Hongning Wang CS@UVa CS@UVa CS4501: Information Retrieval 2 Document Analyzer Classical search engine architecture The Anatomy of a Large-Scale Hypertextual Web Search Engine
More informationAuthoritative K-Means for Clustering of Web Search Results
Authoritative K-Means for Clustering of Web Search Results Gaojie He Master in Information Systems Submission date: June 2010 Supervisor: Kjetil Nørvåg, IDI Co-supervisor: Robert Neumayer, IDI Norwegian
More informationInformation Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes
CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten
More informationInformation Retrieval. hussein suleman uct cs
Information Management Information Retrieval hussein suleman uct cs 303 2004 Introduction Information retrieval is the process of locating the most relevant information to satisfy a specific information
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation
More informationIn = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most
In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100
More informationModern Information Retrieval
Modern Information Retrieval Ricardo Baeza-Yates Berthier Ribeiro-Neto ACM Press NewYork Harlow, England London New York Boston. San Francisco. Toronto. Sydney Singapore Hong Kong Tokyo Seoul Taipei. New
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search IR models: Boolean model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Browsing boolean vector probabilistic
More informationInformation Retrieval and Organisation
Information Retrieval and Organisation Dell Zhang Birkbeck, University of London 2016/17 IR Chapter 00 Motivation What is Information Retrieval? The meaning of the term Information Retrieval (IR) can be
More information60-538: Information Retrieval
60-538: Information Retrieval September 7, 2017 1 / 48 Outline 1 what is IR 2 3 2 / 48 Outline 1 what is IR 2 3 3 / 48 IR not long time ago 4 / 48 5 / 48 now IR is mostly about search engines there are
More informationCOMP6237 Data Mining Searching and Ranking
COMP6237 Data Mining Searching and Ranking Jonathon Hare jsh2@ecs.soton.ac.uk Note: portions of these slides are from those by ChengXiang Cheng Zhai at UIUC https://class.coursera.org/textretrieval-001
More informationModern information retrieval
Modern information retrieval Modelling Saif Rababah 1 Introduction IR systems usually adopt index terms to process queries Index term: a keyword or group of selected words any word (more general) Stemming
More informationDocument indexing, similarities and retrieval in large scale text collections
Document indexing, similarities and retrieval in large scale text collections Eric Gaussier Univ. Grenoble Alpes - LIG Eric.Gaussier@imag.fr Eric Gaussier Document indexing, similarities & retrieval 1
More informationOntology Driven Semantic Search
Ontology Driven Semantic Search DARIO BONINO, FULVIO CORNO, LAURA FARINETTI, ALESSIO BOSCA Dipartimento di Automatica ed Informatica Politecnico di Torino Corso Duca degli Abruzzi, 10129 Torino ITALY {dario.bonino,
More informationQuery Languages. Berlin Chen Reference: 1. Modern Information Retrieval, chapter 4
Query Languages Berlin Chen 2005 Reference: 1. Modern Information Retrieval, chapter 4 Data retrieval Pattern-based querying The Kinds of Queries Retrieve docs that contains (or exactly match) the objects
More informationOutline. Lecture 2: EITN01 Web Intelligence and Information Retrieval. Previous lecture. Representation/Indexing (fig 1.
Outline Lecture 2: EITN01 Web Intelligence and Information Retrieval Anders Ardö EIT Electrical and Information Technology, Lund University January 23, 2013 A. Ardö, EIT Lecture 2: EITN01 Web Intelligence
More informationCSE 494: Information Retrieval, Mining and Integration on the Internet
CSE 494: Information Retrieval, Mining and Integration on the Internet Midterm. 18 th Oct 2011 (Instructor: Subbarao Kambhampati) In-class Duration: Duration of the class 1hr 15min (75min) Total points:
More informationCS377: Database Systems Text data and information. Li Xiong Department of Mathematics and Computer Science Emory University
CS377: Database Systems Text data and information retrieval Li Xiong Department of Mathematics and Computer Science Emory University Outline Information Retrieval (IR) Concepts Text Preprocessing Inverted
More informationInformation Retrieval and Data Mining Part 1 Information Retrieval
Information Retrieval and Data Mining Part 1 Information Retrieval 2005/6, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Information Retrieval - 1 1 Today's Question 1. Information
More informationRanking models in Information Retrieval: A Survey
Ranking models in Information Retrieval: A Survey R.Suganya Devi Research Scholar Department of Computer Science and Engineering College of Engineering, Guindy, Chennai, Tamilnadu, India Dr D Manjula Professor
More informationInternational ejournals
Available online at www.internationalejournals.com International ejournals ISSN 0976 1411 International ejournal of Mathematics and Engineering 112 (2011) 1023-1029 ANALYZING THE REQUIREMENTS FOR TEXT
More information10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues
COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization
More informationCSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 1)"
CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 1)" All slides Addison Wesley, Donald Metzler, and Anton Leuski, 2008, 2012! Retrieval Models" Provide
More informationInformation Retrieval
Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have
More informationClassic IR Models 5/6/2012 1
Classic IR Models 5/6/2012 1 Classic IR Models Idea Each document is represented by index terms. An index term is basically a (word) whose semantics give meaning to the document. Not all index terms are
More informationOptimizing Search Engines using Click-through Data
Optimizing Search Engines using Click-through Data By Sameep - 100050003 Rahee - 100050028 Anil - 100050082 1 Overview Web Search Engines : Creating a good information retrieval system Previous Approaches
More informationIntroduction & Administrivia
Introduction & Administrivia Information Retrieval Evangelos Kanoulas ekanoulas@uva.nl Section 1: Unstructured data Sec. 8.1 2 Big Data Growth of global data volume data everywhere! Web data: observation,
More informationChapter 1 AN INTRODUCTION TO TEXT MINING. 1. Introduction. Charu C. Aggarwal. ChengXiang Zhai
Chapter 1 AN INTRODUCTION TO TEXT MINING Charu C. Aggarwal IBM T. J. Watson Research Center Yorktown Heights, NY charu@us.ibm.com ChengXiang Zhai University of Illinois at Urbana-Champaign Urbana, IL czhai@cs.uiuc.edu
More informationContent-based Management of Document Access. Control
Content-based Management of Document Access Control Edgar Weippl, Ismail Khalil Ibrahim Software Competence Center Hagenberg Hauptstr. 99, A-4232 Hagenberg, Austria {edgar.weippl, ismail.khalil-ibrahim}@scch.at
More informationTwo-Dimensional Visualization for Internet Resource Discovery. Shih-Hao Li and Peter B. Danzig. University of Southern California
Two-Dimensional Visualization for Internet Resource Discovery Shih-Hao Li and Peter B. Danzig Computer Science Department University of Southern California Los Angeles, California 90089-0781 fshli, danzigg@cs.usc.edu
More informationCS 6320 Natural Language Processing
CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic
More informationCS506/606 - Topics in Information Retrieval
CS506/606 - Topics in Information Retrieval Instructors: Class time: Steven Bedrick, Brian Roark, Emily Prud hommeaux Tu/Th 11:00 a.m. - 12:30 p.m. September 25 - December 6, 2012 Class location: WCC 403
More informationLevel of analysis Finding Out About Chapter 3: 25 Sept 01 R. K. Belew
Overview The fascination with the subliminal, the camouflaged, and the encrypted is ancient. Getting a computer to munch away at long strings of letters from the Old Testament is not that different from
More informationMultimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency
Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency Ralf Moeller Hamburg Univ. of Technology Acknowledgement Slides taken from presentation material for the following
More informationTEXT CHAPTER 5. W. Bruce Croft BACKGROUND
41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia
More informationhighest cosine coecient [5] are returned. Notice that a query can hit documents without having common terms because the k indexing dimensions indicate
Searching Information Servers Based on Customized Proles Technical Report USC-CS-96-636 Shih-Hao Li and Peter B. Danzig Computer Science Department University of Southern California Los Angeles, California
More informationMounia Lalmas, Department of Computer Science, Queen Mary, University of London, United Kingdom,
XML Retrieval Mounia Lalmas, Department of Computer Science, Queen Mary, University of London, United Kingdom, mounia@acm.org Andrew Trotman, Department of Computer Science, University of Otago, New Zealand,
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationInformation Retrieval: Retrieval Models
CS473: Web Information Retrieval & Management CS-473 Web Information Retrieval & Management Information Retrieval: Retrieval Models Luo Si Department of Computer Science Purdue University Retrieval Models
More informationA Model for Interactive Web Information Retrieval
A Model for Interactive Web Information Retrieval Orland Hoeber and Xue Dong Yang University of Regina, Regina, SK S4S 0A2, Canada {hoeber, yang}@uregina.ca Abstract. The interaction model supported by
More informationKohei Arai 1 Graduate School of Science and Engineering Saga University Saga City, Japan
Numerical Representation of Web Sites of Remote Sensing Satellite Data Providers and Its Application to Knowledge Based Information Retrievals with Natural Language Kohei Arai 1 Graduate School of Science
More informationINFSCI 2140 Information Storage and Retrieval Lecture 2: Models of Information Retrieval: Boolean model. Final Group Projects
INFSCI 2140 Information Storage and Retrieval Lecture 2: Models of Information Retrieval: Boolean model Peter Brusilovsky http://www2.sis.pitt.edu/~peterb/2140-051/ Final Group Projects Groups of variable
More informationHomework: Exercise 1. Homework: Exercise 2b. Homework: Exercise 2a. Homework: Exercise 2d. Homework: Exercise 2c
Homework: Exercise Information Retrieval and Web Search Engines How to classify a book about the Boolean retrieval model using the 998 ACM Computing Classification System? http://www.acm.org/about/class/ccs98-html
More informationIntroduction to Information Retrieval. (COSC 488) Spring Nazli Goharian. Course Outline
Introduction to Information Retrieval (COSC 488) Spring 2012 Nazli Goharian nazli@cs.georgetown.edu Course Outline Introduction Retrieval Strategies (Models) Retrieval Utilities Evaluation Indexing Efficiency
More informationSession 10: Information Retrieval
INFM 63: Information Technology and Organizational Context Session : Information Retrieval Jimmy Lin The ischool University of Maryland Thursday, November 7, 23 Information Retrieval What you search for!
More informationAN ENCODING TECHNIQUE BASED ON WORD IMPORTANCE FOR THE CLUSTERING OF WEB DOCUMENTS
- - Proceedings of the 9th international Conference on Neural Information Processing (ICONIP OZ), Vol. 5 Lip0 Wag, Jagath C. Rajapakse, Kunihiko Fukushima, Soo-Young Lee, and Xin Yao (Editors). AN ENCODING
More informationA probabilistic description-oriented approach for categorising Web documents
A probabilistic description-oriented approach for categorising Web documents Norbert Gövert Mounia Lalmas Norbert Fuhr University of Dortmund {goevert,mounia,fuhr}@ls6.cs.uni-dortmund.de Abstract The automatic
More informationInforma/on Retrieval. Text Search. CISC437/637, Lecture #23 Ben CartereAe. Consider a database consis/ng of long textual informa/on fields
Informa/on Retrieval CISC437/637, Lecture #23 Ben CartereAe Copyright Ben CartereAe 1 Text Search Consider a database consis/ng of long textual informa/on fields News ar/cles, patents, web pages, books,
More informationRe-ranking Documents Based on Query-Independent Document Specificity
Re-ranking Documents Based on Query-Independent Document Specificity Lei Zheng and Ingemar J. Cox Department of Computer Science University College London London, WC1E 6BT, United Kingdom lei.zheng@ucl.ac.uk,
More informationLink Analysis and Web Search
Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html
More informationInformation Retrieval (Part 1)
Information Retrieval (Part 1) Fabio Aiolli http://www.math.unipd.it/~aiolli Dipartimento di Matematica Università di Padova Anno Accademico 2008/2009 1 Bibliographic References Copies of slides Selected
More informationInformation Search and Retrieval System in Libraries
Information Search and Retrieval System in Libraries N Rupsing Naik A Madhava Rao Abstract A digital library comprises diverse collections of digital objects representing text, sound, maps, videos, photos,
More informationInformation Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.
Introduction to Information Retrieval Ethan Phelps-Goodman Some slides taken from http://www.cs.utexas.edu/users/mooney/ir-course/ Information Retrieval (IR) The indexing and retrieval of textual documents.
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationA Model for Information Retrieval Agent System Based on Keywords Distribution
A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr
More informationChapter 6: Information Retrieval and Web Search. An introduction
Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods
More informationRelevance Feedback and Query Reformulation. Lecture 10 CS 510 Information Retrieval on the Internet Thanks to Susan Price. Outline
Relevance Feedback and Query Reformulation Lecture 10 CS 510 Information Retrieval on the Internet Thanks to Susan Price IR on the Internet, Spring 2010 1 Outline Query reformulation Sources of relevance
More informationSimilarity search in multimedia databases
Similarity search in multimedia databases Performance evaluation for similarity calculations in multimedia databases JO TRYTI AND JOHAN CARLSSON Bachelor s Thesis at CSC Supervisor: Michael Minock Examiner:
More informationAdaptable and Adaptive Web Information Systems. Lecture 1: Introduction
Adaptable and Adaptive Web Information Systems School of Computer Science and Information Systems Birkbeck College University of London Lecture 1: Introduction George Magoulas gmagoulas@dcs.bbk.ac.uk October
More informationvector space retrieval many slides courtesy James Amherst
vector space retrieval many slides courtesy James Allan@umass Amherst 1 what is a retrieval model? Model is an idealization or abstraction of an actual process Mathematical models are used to study the
More informationText Analytics (Text Mining)
CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,
More informationRetrieval of Web Documents Using a Fuzzy Hierarchical Clustering
International Journal of Computer Applications (97 8887) Volume No., August 2 Retrieval of Documents Using a Fuzzy Hierarchical Clustering Deepti Gupta Lecturer School of Computer Science and Information
More informationLetter Pair Similarity Classification and URL Ranking Based on Feedback Approach
Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach P.T.Shijili 1 P.G Student, Department of CSE, Dr.Nallini Institute of Engineering & Technology, Dharapuram, Tamilnadu, India
More informationText Mining for Documents Annotation and Ontology Support
Text Mining for Documents Annotation and Ontology Support Jan Paralic and Peter Bednar Department of Cybernetics and Artificial Intelligence, Technical University of Kosice, Letná 9, 041 20 Kosice, Slovakia
More informationWeb Information Retrieval using WordNet
Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT
More informationInformation Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Information Retrieval CS 6900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Information Retrieval Information Retrieval (IR) is finding material of an unstructured
More informationDocument Clustering for Mediated Information Access The WebCluster Project
Document Clustering for Mediated Information Access The WebCluster Project School of Communication, Information and Library Sciences Rutgers University The original WebCluster project was conducted at
More informationLecture #3: PageRank Algorithm The Mathematics of Google Search
Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,
More information