Introduction to Information Retrieval
- Katherine Sutton
1 Introduction to Information Retrieval WS 2008/ Information Systems Group Mohammed AbuJarour
2 Contents 2 Basics of Information Retrieval (IR) Foundations: extensible Markup Language (XML) Probability & Statistics
3 What is IR? 3 IR by examples: Credit card example. Searching for a book or paper in a library system. Use Google to find a restaurant in Potsdam. Look through a product catalog to find an item. Browse a movie catalog to find an interesting movie.
4 What is IR? 4 IR by definition (scientifically): Representation of information items. Storage of information items. Organization of information items. Access to information items. Characterizing the user's information need is not a simple problem. Example: Find all web pages of researchers who studied in Germany, participated in at least 3 EU-funded projects, and have been doing research in IR for more than 10 years! This description cannot be used directly to retrieve what the user needs. It is first translated into a query, typically a set of keywords. IR system (query) → relevant information.
5 Data vs. Information Retrieval 5
Data Retrieval: Deals with data that has well-defined structure and semantics. Queries are clearly defined conditions, e.g., regular expressions or relational algebra. Each element in the result must satisfy the conditions in the query.
Information Retrieval: Deals with natural language text that is usually not well structured and may be semantically ambiguous. Queries are a set of keywords or terms. An element in the result might be inaccurate; a small number of errors is tolerated.
Example (structured data):
Name | Grade | Major
Michael | B | Physics
Martin | C | Mathematics
John | A | Bioinformatics
6 Unstructured (text) vs. Structured (database) Data in 1996 and
[Figure: unstructured vs. structured data; values not recoverable.]
7 The World Wide Web 7 Huge amount of information. Unusual and diverse documents, e.g., HTML, XHTML, XML, multimedia, etc. Unusual and diverse users, queries, and information needs.
[Figure: number of webpages, size in billions of webpages. GYWA = sorted on Google, Yahoo!, Windows Live Search (MSN Search), and Ask. YGWA = sorted on Yahoo!, Google, Windows Live Search (MSN Search), and Ask.]
8 The Retrieval Process 8
[Figure: the retrieval process. Components: user interface, text operations, query operations, indexing, searching, ranking, DB manager module, index (inverted file), text database. Flows: user need, text, logical view, query, user feedback, retrieved docs, ranked docs.]
9 Basic Concepts in IR 9 Documents: whatever units we have decided to build a retrieval system over, e.g., web page, XML file, PDF file, article, paper, book chapter, product, etc. Collection (Corpus): the group of documents over which we perform retrieval. Information need: the topic about which the user desires to know more. Query: what the user conveys to the computer (system) in an attempt to communicate the information need. Relevance: the degree to which the user perceives a document as containing information of value with respect to their personal information need. DocID: the unique serial number for each document in the collection.
10 Basic Concepts in IR 10 The effectiveness of an IR system: the quality of its search results. Precision (Präzision): the fraction of the returned results that are relevant to the information need. Recall (Ausbeute): the fraction of the relevant documents in the collection that were returned by the system.
[Figure: the collection, the sets Relevant and Retrieved, and their overlap (relevant and retrieved).]
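The two definitions above can be written down directly as set operations. The following is a minimal sketch, assuming documents are identified by DocIDs; the function name `precision_recall` and the sample DocIDs are hypothetical and not part of the lecture.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of DocIDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were also returned
    # Convention assumed here: an empty denominator yields 0.0.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical DocIDs, chosen only for illustration.
retrieved_docs = [1, 2, 3, 4]
relevant_docs = [2, 4, 5, 6, 7]
p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned documents are relevant, and two of the five relevant documents were returned, so the two measures differ even though they share the same numerator.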
17 Contents 11 Basics of Information Retrieval (IR) Foundations: extensible Markup Language (XML) Probability & Statistics
18 extensible Markup Language (XML) 12 Overview of XML: XML was designed to carry and transfer data, not to display it. XML tags are not predefined; you must define your own tags. XML is designed to be self-descriptive. XML documents may conform to well-defined schemata, e.g., DTD, XSD. XML is a W3C Recommendation.
19 extensible Markup Language (XML) 13 Elements and Relationships
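As an illustration of elements and their relationships, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module and lists its parent/child edges. The bookstore document and the helper `relationships` are assumptions made for this example; the slide's own figure is not in the transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names and values are invented for illustration.
XML_DOC = """
<bookstore>
  <book category="web">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
</bookstore>
"""

root = ET.fromstring(XML_DOC)


def relationships(parent):
    """Yield (parent tag, child tag) pairs for every parent/child edge."""
    for child in parent:
        yield parent.tag, child.tag
        yield from relationships(child)


edges = list(relationships(root))
for parent_tag, child_tag in edges:
    print(f"{parent_tag} is the parent of {child_tag}")

# Children of the same parent are siblings.
first_book = root[0]
siblings = [child.tag for child in first_book]
print("siblings under the first book:", siblings)
```

The root element has no parent, every other element has exactly one, and attributes such as `category` hang off their element rather than forming separate child elements.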
23 extensible Markup Language (XML) 14 Modeling XML Documents: XML documents are modeled as trees. Tree traversal algorithms: preorder, inorder, postorder. Pre + Post: the pair of preorder and postorder ranks identifies each node uniquely.
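A minimal sketch of the pre/post numbering, assuming ranks start at 0 and the tree is an `xml.etree.ElementTree` element tree. The function names and the tiny sample document are hypothetical. The descendant test at the end relies on a well-known property of this encoding that the slide itself does not state.

```python
import xml.etree.ElementTree as ET


def pre_post_ranks(root):
    """Map each element to its (preorder rank, postorder rank) pair."""
    ranks = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]  # assigned when the node is first reached
        counters["pre"] += 1
        for child in node:
            visit(child)
        ranks[node] = (pre, counters["post"])  # post rank: after all children
        counters["post"] += 1

    visit(root)
    return ranks


# Hypothetical document: a has the children b and d; b has the child c.
root = ET.fromstring("<a><b><c/></b><d/></a>")
ranks = pre_post_ranks(root)
by_tag = {node.tag: pair for node, pair in ranks.items()}
print(by_tag)


def is_descendant(v, u):
    """v is a descendant of u iff pre(v) > pre(u) and post(v) < post(u)."""
    return ranks[v][0] > ranks[u][0] and ranks[v][1] < ranks[u][1]


print(is_descendant(root[0][0], root))  # c below a
print(is_descendant(root[1], root[0]))  # d is not below b
```

Because the pair is unique per node, it can serve as a node identifier, and ancestor/descendant questions reduce to two integer comparisons.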
29 extensible Markup Language (XML) 15 What is XPath? XPath is a language for finding information in an XML document. It is used to navigate through elements and attributes. XPath is a syntax for defining parts of an XML document. XPath uses path expressions to select nodes or node-sets in an XML document; a node is selected by following a path or steps. XPath contains a library of standard functions. XPath is a W3C Recommendation.
30 extensible Markup Language (XML) 16 Examples: XPath Syntax
XPath expression | Description
/bookstore | Selects the root element bookstore
bookstore/book | Selects all book elements that are children of bookstore
//book | Selects all book elements no matter where they are in the document
bookstore//book | Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang | Selects all attributes that are named lang
/bookstore/book[price>35.00]/title | Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
//* | Selects all elements in the document
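Some of these path expressions can be tried with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. The sketch below assumes a hypothetical bookstore document; because ElementTree has no numeric comparison predicates, the `price>35.00` condition is applied in plain Python instead of inside the path.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document used only for illustration.
root = ET.fromstring("""
<bookstore>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
  <book><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <section><book><title>Nested Book</title><price>50.00</price></book></section>
</bookstore>
""")

# root is the bookstore element, so paths are written relative to it.
children = root.findall("book")               # like bookstore/book
anywhere = root.findall(".//book")            # like bookstore//book
with_lang = root.findall(".//title[@lang]")   # titles that carry a lang attribute
everything = root.findall(".//*")             # all elements below the root

# price > 35.00 is filtered in Python, not in the path expression.
expensive = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 35.00
]
print(len(children), len(anywhere), len(with_lang), len(everything), expensive)
```

A full XPath engine would evaluate all of the expressions in the table directly; this sketch only mirrors the ones that the standard library can express.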
31 extensible Markup Language (XML) 17 An XPath expression consists of a sequence of one or more location steps. Example: //descendant::book[position()=1]/child::title[contains(text(), "XML")]. Each location step has an axis name (descendant, child), a node test (book, title), and optional predicate expressions in square brackets. Predicates may call functions such as position(), contains(), and text().
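To make the anatomy of a location step concrete, the following sketch splits a single step into axis name, node test, and predicates with a regular expression. It is a simplification under stated assumptions: it handles only name or `*` node tests and predicates without nested square brackets, and `parse_step` is a hypothetical helper, not an XPath parser.

```python
import re

STEP = re.compile(
    r"(?:(?P<axis>[\w-]+)::)?"          # optional axis name, e.g. descendant::
    r"(?P<node_test>[\w*]+)"            # node test, e.g. book
    r"(?P<predicates>(?:\[[^\]]*\])*)"  # zero or more [predicate] blocks
)


def parse_step(step):
    """Split one location step into (axis, node test, list of predicates)."""
    match = STEP.fullmatch(step.strip())
    if match is None:
        raise ValueError(f"not a simple location step: {step!r}")
    predicates = re.findall(r"\[([^\]]*)\]", match["predicates"])
    # In XPath, a step without an explicit axis uses the child axis.
    return match["axis"] or "child", match["node_test"], predicates


steps = ['descendant::book[position()=1]', 'child::title[contains(text(), "XML")]']
for step in steps:
    print(parse_step(step))
```

The two steps of the example expression yield the axes `descendant` and `child`, the node tests `book` and `title`, and one predicate each.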
38 extensible Markup Language (XML) 18 What is XQuery? XQuery is the language for querying XML data. XQuery is to XML what SQL is to databases. XQuery is built on XPath expressions. XQuery is a W3C Recommendation.
Example XQuery:
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
Result: <title lang="eng">Learning XML</title>
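Python's standard library has no XQuery engine, so the sketch below is only an analogue: it mirrors the for / where / order by / return clauses of the query with `xml.etree.ElementTree` and a sorted generator expression. The document is a hypothetical stand-in for books.xml, and the result is a list of title strings rather than title elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml; titles and prices are invented.
bookstore = ET.fromstring("""
<bookstore>
  <book><title>XQuery Basics</title><price>49.99</price></book>
  <book><title>Cooking at Home</title><price>30.00</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>
""")

titles = sorted(                               # order by $x/title
    book.findtext("title")                     # return $x/title (text only)
    for book in bookstore.findall("book")      # for $x in .../bookstore/book
    if float(book.findtext("price")) > 30      # where $x/price > 30
)
print(titles)
```

The book priced at exactly 30.00 is excluded because the condition is a strict comparison, as in the query.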
39 Contents 19 Basics of Information Retrieval (IR) Foundations: extensible Markup Language (XML) Probability & Statistics
40 Basics from Probability Theory 20 A probability space is a triple $(\Omega, E, P)$ with: a set $\Omega$ of elementary events (sample space); a family $E$ (events) of subsets of $\Omega$ with $\Omega \in E$ which is closed under $\cup$, $\cap$, and $-$ with a countable number of operands (note: with finite $\Omega$ usually $E = 2^{\Omega}$); and a probability measure $P: E \to [0,1]$ with $P[\Omega] = 1$ and $P[\bigcup_i A_i] = \sum_i P[A_i]$ for countably many, pairwise disjoint $A_i$. Properties of $P$: $P[A] + P[\neg A] = 1$; $P[A \cup B] = P[A] + P[B] - P[A \cap B]$; $P[\emptyset] = 0$ (null / impossible event); $P[\Omega] = 1$ (true / certain event).
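41 Basics from Probability Theory Example 21 Drawing one playing card: elementary events $\Omega = \{S, H, D, C\}$. Events $E = \{\emptyset, S, H, D, C, SH, SD, SC, HD, HC, DC, SHD, SHC, HDC, SDC, SHDC\}$. $P[S] + P[\neg S] = \tfrac14 + \tfrac34 = 1$. $P[H \cup D] = P[H] + P[D] - P[H \cap D] = \tfrac14 + \tfrac14 - 0 = \tfrac12$. $P[\emptyset] = 0$ (null / impossible event). $P[SHDC] = \tfrac14 + \tfrac14 + \tfrac14 + \tfrac14 = 1$. Note: $SHDC$ means $S \cup H \cup D \cup C$.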
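The card example is small enough to enumerate. The sketch below builds the event family as the power set of the four suits and checks the listed properties with exact fractions. The uniform measure (each suit has probability 1/4) is the assumption behind the slide's numbers; the names `OMEGA`, `EVENTS`, and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

OMEGA = frozenset("SHDC")  # the four suits S, H, D, C

# With finite Omega, the event family E is the power set of Omega.
EVENTS = [
    frozenset(subset)
    for subset in chain.from_iterable(
        combinations(sorted(OMEGA), k) for k in range(len(OMEGA) + 1)
    )
]


def prob(event):
    """Uniform measure: every suit has probability 1/4 (assumed)."""
    return Fraction(len(event), len(OMEGA))


S, H, D = frozenset("S"), frozenset("H"), frozenset("D")
print(len(EVENTS))                                   # number of events in E
print(prob(S) + prob(OMEGA - S))                     # P[S] + P[not S]
print(prob(H | D), prob(H) + prob(D) - prob(H & D))  # both sides of the union rule
print(prob(frozenset()), prob(OMEGA))                # impossible and certain event
```

Since H and D are disjoint, the intersection term is zero and the union rule reduces to simple addition.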
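42 Independence and Conditional Probabilities 22 Two events $A$, $B$ of a probability space are independent if $P[A \cap B] = P[A]\,P[B]$. A finite set of events $\mathcal{A} = \{A_1, \ldots, A_n\}$ is independent if for every subset $S \subseteq \mathcal{A}$ the equation $P[\bigcap_{A_i \in S} A_i] = \prod_{A_i \in S} P[A_i]$ holds. The conditional probability $P[A \mid B]$ of $A$ under the condition (hypothesis) $B$ is defined as $P[A \mid B] = \frac{P[A \cap B]}{P[B]}$, for $P[B] > 0$.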
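43 Total Probability and Bayes Theorem 23 Total probability theorem: for a partitioning of $\Omega$ into events $B_1, \ldots, B_n$: $P[A] = \sum_{i=1}^{n} P[A \mid B_i]\,P[B_i]$. Bayes' theorem: $P[A \mid B] = \frac{P[B \mid A]\,P[A]}{P[B]}$. $P[A \mid B]$ is called the posterior probability; $P[A]$ is called the prior probability.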
44 Total Probability and Bayes Theorem Example 24 M: a man is chosen. E: the one chosen is employed.
 | Male | Female | Total
Employed | 460 | 140 | 600
Unemployed | 40 | 260 | 300
Total | 500 | 400 | 900
$P[M] = 500/900 = 5/9$. $P[E] = 600/900 = 2/3$. $P[M \cap E] = 460/900 = 46/90$. $P[M \mid E] = P[M \cap E] / P[E] = 460/600 = 23/30$. $P[E \mid M] = P[M \mid E] \cdot P[E] / P[M] = (23/30 \cdot 2/3) / (5/9) = 23/25$.
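The arithmetic of this example can be checked with exact fractions. In the sketch below, the four cell counts are the ones implied by the stated probabilities (900 people, 500 men, 600 employed, 460 employed men); the variable names are hypothetical.

```python
from fractions import Fraction

# Cell counts implied by P[M] = 500/900, P[E] = 600/900, P[M and E] = 460/900.
counts = {
    ("male", "employed"): 460,
    ("male", "unemployed"): 40,
    ("female", "employed"): 140,
    ("female", "unemployed"): 260,
}
total = sum(counts.values())

p_m = Fraction(sum(n for (sex, _), n in counts.items() if sex == "male"), total)
p_e = Fraction(sum(n for (_, status), n in counts.items() if status == "employed"), total)
p_m_and_e = Fraction(counts[("male", "employed")], total)

p_m_given_e = p_m_and_e / p_e          # definition of conditional probability
p_e_given_m = p_m_given_e * p_e / p_m  # Bayes' theorem
print(p_m, p_e, p_m_given_e, p_e_given_m)
```

The Bayes result agrees with reading the table directly: 460 of the 500 men are employed, which is 23/25.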
46 The End 26 Questions?
More informationEfficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488)
Efficiency Efficiency: Indexing (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Difficult to analyze sequential IR algorithms: data and query dependency (query selectivity). O(q(cf max )) -- high estimate-
More informationModels for Document & Query Representation. Ziawasch Abedjan
Models for Document & Query Representation Ziawasch Abedjan Overview Introduction & Definition Boolean retrieval Vector Space Model Probabilistic Information Retrieval Language Model Approach Summary Overview
More informationInformatics 1: Data & Analysis
Informatics 1: Data & Analysis Lecture 9: Trees and XML Ian Stark School of Informatics The University of Edinburgh Tuesday 11 February 2014 Semester 2 Week 5 http://www.inf.ed.ac.uk/teaching/courses/inf1/da
More informationImprovement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation
Volume 3, No.5, May 24 International Journal of Advances in Computer Science and Technology Pooja Bassin et al., International Journal of Advances in Computer Science and Technology, 3(5), May 24, 33-336
More informationIn the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,
1 1.1 Introduction In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to
More informationBeyond Ten Blue Links Seven Challenges
Beyond Ten Blue Links Seven Challenges Ricardo Baeza-Yates VP of Yahoo! Research for EMEA & LatAm Barcelona, Spain Thanks to Andrei Broder, Yoelle Maarek & Prabhakar Raghavan Agenda Past and Present Wisdom
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationXML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW
XML and Databases Lecture 10 XPath Evaluation using RDBMS Sebastian Maneth NICTA and UNSW CSE@UNSW -- Semester 1, 2009 Outline 1. Recall pre / post encoding 2. XPath with //, ancestor, @, and text() 3.
More informationCSE 544 Data Models. Lecture #3. CSE544 - Spring,
CSE 544 Data Models Lecture #3 1 Announcements Project Form groups by Friday Start thinking about a topic (see new additions to the topic list) Next paper review: due on Monday Homework 1: due the following
More informationDiversification of Query Interpretations and Search Results
Diversification of Query Interpretations and Search Results Advanced Methods of IR Elena Demidova Materials used in the slides: Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova,
More informationCS145 Introduction. About CS145 Relational Model, Schemas, SQL Semistructured Model, XML
CS145 Introduction About CS145 Relational Model, Schemas, SQL Semistructured Model, XML 1 Content of CS145 Design of databases. E/R model, relational model, semistructured model, XML, UML, ODL. Database
More informationElementary IR: Scalable Boolean Text Search. (Compare with R & G )
Elementary IR: Scalable Boolean Text Search (Compare with R & G 27.1-3) Information Retrieval: History A research field traditionally separate from Databases Hans P. Luhn, IBM, 1959: Keyword in Context
More informationTree. A path is a connected sequence of edges. A tree topology is acyclic there is no loop.
Tree A tree consists of a set of nodes and a set of edges connecting pairs of nodes. A tree has the property that there is exactly one path (no more, no less) between any pair of nodes. A path is a connected
More informationModern Information Retrieval
Modern Information Retrieval Ricardo Baeza-Yates Berthier Ribeiro-Neto ACM Press NewYork Harlow, England London New York Boston. San Francisco. Toronto. Sydney Singapore Hong Kong Tokyo Seoul Taipei. New
More information8/1/2016. XSL stands for EXtensible Stylesheet Language. CSS = Style Sheets for HTML XSL = Style Sheets for XML. XSL consists of four parts:
XSL stands for EXtensible Stylesheet Language. CSS = Style Sheets for HTML XSL = Style Sheets for XML http://www.w3schools.com/xsl/ kasunkosala@yahoo.com 1 2 XSL consists of four parts: XSLT - a language
More informationEMERGING TECHNOLOGIES
EMERGING TECHNOLOGIES XML (Part 2): Data Model for XML documents and XPath Outline 1. Introduction 2. Structure of XML data 3. XML Document Schema 3.1. Document Type Definition (DTD) 3.2. XMLSchema 4.
More informationQuerying XML. COSC 304 Introduction to Database Systems. XML Querying. Example DTD. Example XML Document. Path Descriptions in XPath
COSC 304 Introduction to Database Systems XML Querying Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca Querying XML We will look at two standard query languages: XPath
More informationA Universal Model for XML Information Retrieval
A Universal Model for XML Information Retrieval Maria Izabel M. Azevedo 1, Lucas Pantuza Amorim 2, and Nívio Ziviani 3 1 Department of Computer Science, State University of Montes Claros, Montes Claros,
More informationQuerying XML Data. Querying XML has two components. Selecting data. Construct output, or transform data
Querying XML Data Querying XML has two components Selecting data pattern matching on structural & path properties typical selection conditions Construct output, or transform data construct new elements
More informationCS 572: Information Retrieval. Lecture 1: Course Overview and Introduction 11 January 2016
CS 572: Information Retrieval Lecture 1: Course Overview and Introduction 11 January 2016 1/11/2016 CS 572: Information Retrieval. Spring 2016 1 Lecture Plan What is IR? (the big questions) Course overview
More informationF453 Module 7: Programming Techniques. 7.2: Methods for defining syntax
7.2: Methods for defining syntax 2 What this module is about In this module we discuss: explain how functions, procedures and their related variables may be used to develop a program in a structured way,
More informationChapter 2 XML, XML Schema, XSLT, and XPath
Summary Chapter 2 XML, XML Schema, XSLT, and XPath Ryan McAlister XML stands for Extensible Markup Language, meaning it uses tags to denote data much like HTML. Unlike HTML though it was designed to carry
More informationKikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML
Kikori-KS An Effective and Efficient Keyword Search System for Digital Libraries in XML Toshiyuki Shimizu 1, Norimasa Terada 2, and Masatoshi Yoshikawa 1 1 Graduate School of Informatics, Kyoto University
More informationAn Effective and Efficient Approach for Keyword-Based XML Retrieval. Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova
An Effective and Efficient Approach for Keyword-Based XML Retrieval Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova Search on XML documents 2 Why not use google? Why are traditional
More informationDon t just read it; fight it! Ask your own questions, look for your own examples, discover your own proofs. Is the hypothesis necessary?
Don t just read it; fight it! Ask your own questions, look for your own examples, discover your own proofs. Is the hypothesis necessary? Is the converse true? What happens in the classical special case?
More information