Relational Approach. Problem Definition
|
|
- Horace Ramsey
- 5 years ago
- Views:
Transcription
1 Relational Approach (COSC 416) Nazli Goharian Slides are mostly based on Information Retrieval Algorithms and Heuristics, Grossman, Frieder Grossman, Frieder 2002, Problem Definition Three conceptual data types: Structured: Objective (absolute) correctness; perfect accuracy; efficiency only issue (e.g.; relational database) Unstructured: Subjective (relative) correctness; accuracy & efficiency trade-off (e.g.; documents, images, video, sound). Semi-structured: Combination of structured and unstructured (e.g.; XML) Problem: Until recently, state of the art individually queried each type specific system and merged answers. Since the early 1990 s our IR Lab and its predecessors work has been to seamlessly integrate the searching of multiple data types. Grossman, Frieder 2002,
2 IR as application of RDBMS Typical IR functionality may be accommodated: Boolean query Relevance Ranking multiple similarity measures Proximity Search Advantages are: Integration Free parallel processing of IR Leverages work done to optimize DBMS Portability Grossman, Frieder 2002, Relational Inverted Index All inverted index entries <term> <list of documents> e.g., vehicle D1, D3, D4 results in: term vehicle vehicle vehicle docid D1 D3 D4 Grossman, Frieder 2002,
3 Text Retrieval Conference (TREC) Sample Document <DOC> <DOCNO> AP </DOCNO> <FILEID>AP-NR EST</FILEID> <FIRST>u i BC-Japan-Stocks </FIRST> <SECOND>BC-Japan-Stocks,0026</SECOND> <HEAD>Stocks Up In Tokyo</HEAD> <DATELINE>TOKYO (AP) </DATELINE> <TEXT> The Nikkei Stock Average closed at 29, points up points on the Tokyo Stock Exchange Wednesday. </TEXT> </DOC> Grossman, Frieder 2002, Relational Document Representation DOCUMENT docid docname headline dateline 28 AP Stocks Up In Tokyo TOKYO (AP) INDEX docid termcnt term 28 1 nikkei 28 2 stock 28 1 average 28 1 closed 28 2 points 28 1 up 28 1 tokyo 28 1 exchange 28 1 wednesday TERM term idf average 1.08 closed 1.08 exchange 1.00 nikkei 2.07 points 1.23 stock 1.00 tokyo 1.58 up 0.30 wednesday 0.60 Grossman, Frieder 2002,
4 Simplistic Models: Keyword and Boolean Searches Grossman, Frieder 2002, Keyword search Relational Approach: Keyword Search select i.docid from INDEX i, QUERY q where i.term = q.term Keyword search with stop word list select i.docid from INDEX i, QUERY q, STOPLIST s where (i.term = q.term) and (i.term <> s.term) Grossman, Frieder 2002,
5 Relational Approach: Boolean Search OR query select docid select docid from INDEX from INDEX where term = term1 where term = term1 OR union term = term2 OR select docid term = term3 OR from INDEX... where term = term2 term = termn union select docid from INDEX where term = term3... union select docid from INDEX where term = termn Grossman, Frieder 2002, Relational Approach: Boolean Search AND query select docid select docid from INDEX from INDEX a, INDEX b, INDEX c,... INDEX N where term = term1 where a.term = term1 AND intersect b.term = term2 AND select docid c.term = term3 AND from INDEX... where term = term2 n.term = termn AND intersect a.docid = b.docid AND select docid b.docid = c.docid AND from INDEX... where term = term3 N-1.docID = N.docID... intersect select docid from INDEX where term = termn Grossman, Frieder 2002,
6 Fixed Join-Count AND Queries Find all documents that contain all of the terms found in the QUERY relation: select i.docid from INDEX i, QUERY q where i.term = q.term group by i.docid having count (distinct (i.term)) = select count(distinct (term)) from QUERY Grossman, Frieder 2002, TAND Queries Find all documents that contain at least X of the terms found in the QUERY relation: select i.docid from INDEX i, QUERY q where i.term = q.term group by i.docid having count (distinct (i.term)) >= X Grossman, Frieder 2002,
7 Relevance Ranking: Vector Space and Probabilistic Models Grossman, Frieder 2002, Relational Document Representation (Single Term Processing) DOCUMENT docid docname headline dateline 28 AP Stocks Up In Tokyo TOKYO (AP) INDEX docid termcnt term 28 1 nikkei 28 2 stock 28 1 average 28 1 closed 28 2 points 28 1 up 28 1 tokyo 28 1 exchange 28 1 wednesday TERM term idf average 1.08 closed 1.08 exchange 1.00 nikkei 2.07 points 1.23 stock 1.00 tokyo 1.58 up 0.30 wednesday 0.60 Grossman, Frieder 2002,
8 Relational Query Representation ORIGINAL QUERY: nikkei stock exchange american stock exchange QUERY Term tf nikkei 1 Stock 2 exchange 2 american 1 Grossman, Frieder 2002, Vector Space Model Term Frequency (tf ik ): number of occurrences of term t k in document i Document Frequency (df j ): number of documents which contain t j Inverse Document Frequency (idf j ): log(d/df j ) where d is the total number of documents Notes: idf is a measure of uniqueness of a term across the collection tf is the frequency of a term in a given document Grossman, Frieder 2002,
9 Similarity Coefficients Several similarity coefficients based on the query vector X and the document vector Y are defined: Inner Prod uct x y Cosine Coefficient t i= 1 i Grossman, Frieder 2002, i t xiyi i= 1 t t 2 xi 2 yi i= 1 i= 1 Relevance Ranking: SQL for VSM dot product List all documents in the order of their similarity coefficient where the coefficient is computed using the dot product. SQL: (Query Weight * Document Weight) SELECT d.docid, d.docname, SUM(q.termcnt * t.idf * i.termcnt * t.idf) FROM QUERY q, INDEX i, TERM t, DOCUMENT d WHERE q.term = i.term AND q.term = t.term AND i.docid = d.docid GROUP BY d.docid, docname ORDER BY 3 DESC Grossman, Frieder 2002,
10 Relevance Ranking: SQL for Probabilistic Retrieval num_ terms log ( numdocs dfi ) tfid ( df ) (.75 doclength avgdoclength) i= 1 i / + tf id qtf SELECT d.docid, d.docname, SUM( LOG(((NumDocs - t.df) + 0.5) / (t.df + 0.5)) * ((2.2*i.tf) / (.3 + ((.75 * d.doclen)/avgdoclen) + i.tf)) * q.termcnt ) FROM INDEX i, TERM t, DOCUMENT d, QUERY q WHERE i.term = t.term AND i.docid = d.docid AND t.term = q.term GROUP BY d.docid, d.docname ORDER BY 3; Grossman, Frieder 2002, Relational Document Representation (Phrase Processing) DOCUMENT docid docname headline dateline 28 AP Stocks Up In Tokyo TOKYO (AP) INDEX docid termcnt phrase 28 1 nikkei stock 28 1 stock average 28 1 average closed 28 1 points up 28 1 tokyo stock 28 1 stock exchange 28 1 exchange wednesday PHRASE phrase idf average closed 2.49 exchange Wednesday 3.33 nikkei stock 2.14 points up 2.61 stock average 2.14 stock exchange 1.34 tokyo stock 2.10 Grossman, Frieder 2002,
11 Simple Phrase Parsing Simple phrase parser with the following rules Phrases do not include stop terms Phrases do not span across symbols Example: The Nikkei Stock Average closed at 29, points up points, on the Tokyo Stock Exchange Wednesday. Phrases: nikkei stock stock average average closed points up tokyo stock stock exchange exchange wednesday Grossman, Frieder 2002, Relational Document Representation (Proximity Search) DOCUMENT docid docname headline dateline 28 AP Stocks Up In Tokyo TOKYO (AP) INDEX docid term offset 28 nikkei stock average closed points up points tokyo stock exchange wednesday 60 Grossman, Frieder 2002,
12 Caveat: Logical Design versus Physical Implementation While the design shown represents the replication of the document identifier, in the physical implementation, clustered tables are actually used. That is, attribute values that are logically repeated many times are physically clustered by the attribute value to eliminate the replication storing only one copy for each unique attribute value. (Note clustered tables in Oracle 8) The I/O to retrieve a posting list is achieved via a grouped block read as opposed to retrieval across distributed storage. Grossman, Frieder 2002, 2010 Clustered Tables: Posting List Processing term docid tf stock { ( 1 1 ), ( 28 2 ), ( ), ( ), ( ), ( ), ( ), ( ), ( ) } Grossman, Frieder 2002,
13 Clustered Tables: Relevance Feedback Processing TRADITIONAL docid termcnt phrase 28 1 nikkei 28 2 stock 28 1 average 28 1 closed 28 2 points 28 1 up 28 1 tokyo 28 1 exchange 28 1 wednesday Grossman, Frieder 2002, 2010 CLUSTERED docid termcnt phrase 28 { ( 1 nikkei ), ( 2 stock ), ( 1 average ), ( 1 closed ), ( 2 points ), ( 1 up ), ( 1 tokyo ), ( 1 exchange ), ( 1 wednesday ) } Relational XML Approach: Architecture XML Collection Semistructured Query SQL Query DB Database Results Grossman, Frieder 2002,
14 XML Search XML provides tags that allow both structured and unstructured data to be represented in the same XML document. Frequently used as a common representation for a variety of document formats. Grossman, Frieder 2002, Introduction XML: Extensible Markup Language Defined by the WWW Consortium (W3C) Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML Documents have tags giving extra information about sections of the document E.g. <title> XML </title> <slide> Introduction </slide> Extensible, unlike HTML Users can add new tags, and separately specify how the tag should be handled for display A wide variety of tools is available for parsing, browsing and querying XML documents/data Silberschatz, Korth, Sudarshan 14
15 XML Introduction (Cont.) Tags make data (relatively) self-documenting E.g. <university> <department> <dept_name> Comp. Sci. </dept_name> <building> Taylor </building> <budget> </budget> </department> <course> <course_id> CS-101 </course_id> <title> Intro. to Computer Science </title> <dept_name> Comp. Sci </dept_name> <credits> 4 </credits> </course> </university> Silberschatz, Korth, Sudarshan Structure of XML Data Tag: label for a section of data Element: section of data beginning with <tagname> and ending with matching </tagname> Elements must be properly nested Proper nesting <course> <title>. </title> </course> Improper nesting <course> <title>. </course> </title> Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element. Every document must have a single top-level element Silberschatz, Korth, Sudarshan 15
16 Example of Nested Elements <purchase_order> <identifier> P-101 </identifier> <purchaser>. </purchaser> <itemlist> <item> <identifier> RS1 </identifier> <description> Atom powered rocket sled </description> <quantity> 2 </quantity> <price> </price> </item> <item> <identifier> SG2 </identifier> <description> Superb glue </description> <quantity> 1 </quantity> <unit-of-measure> liter </unit-of-measure> <price> </price> </item> </itemlist> </purchase_order> Silberschatz, Korth, Sudarshan Tree Model of XML Data An XML document is modeled as a tree, with nodes corresponding to elements and attributes Element nodes have child nodes, which can be attributes or subelements Text in an element is modeled as a text node child of the element Children of a node are ordered according to their order in the XML document Element and attribute nodes (except for the root node) have a single parent, which is an element node The root node has a single child, which is the root element of the document Silberschatz, Korth, Sudarshan 16
17 Relational XML Approach: Storage <book> <author> John Smith </author> <title> Colonial America </title> <publisher state= NY > NY Publishing </publisher> <book> book author title publisher (E) (E) (E) State= NY (A) Grossman, Frieder 2002, Relational XML Approach: Storage <book> <author> John Smith </author> <title> Colonial America </title> <publisher state= NY > NY Publishing </publisher> <book> pinndx pinndxnum parent tagtype tagpath atrname value 1 0 E 1 1 NULL 2 1 E 2 1 John Smith 3 1 E 3 1 Colonial America 4 1 E 4 1 NY Publishing 5 4 A 4 2 NY tagpath value atrname value 1 [book] 1 NULL 2 [book,author] 2 state 3 [book,title] 4 [book,publisher] atrnametbl tagpathtbl Grossman, Frieder 2002,
18 XQuery XQuery is a general purpose query language for XML data Currently being standardized by the World Wide Web Consortium (W3C) The textbook description is based on a January 2005 draft of the standard. The final version may differ, but major features likely to stay unchanged. XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL XQuery uses a for let where order by result syntax for SQL from where SQL where order by SQL order by result SQL select let allows temporary variables, and has no equivalent in SQL Silberschatz, Korth, Sudarshan Relational XML Approach: Translation XQuery let $item := /order/item where $item[customer_id=3 and credit_card/type= VISA ] order by $item/id Return <item> <id>{$item/id}</id> <price>{$item/price}</price> </item> SQL SELECT DISTINCT q3.value, q3.pinndxnum, q4.value, q4.pinndxnum FROM pinndx q0, pinndx q1, pinndx q2, pinndx q2_1, pinndx q3, pinndx q4 WHERE q0.tagpath=1 AND q1.nvalue=3 AND q1.tagpath=4 AND q1.parent=q0.pinndxnum AND q2.tagpath=5 AND q2.value="visa" AND q2.parent=q2_1.pinndxnum AND q2_1.parent=q0.pinndxnum AND q3.tagpath=7 AND q3.parent=q0.pinndxnum AND q4.tagpath=8 AND q4.parent=q0.pinndxnum ORDER BY q4.nvalue Grossman, Frieder 2002,
Relational Approach. Problem Definition
Relational Approach (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Slides are mostly based on Information Retrieval Algorithms and Heuristics, Grossman & Frieder 1 Problem Definition Three conceptual
More informationOn Scalable Information Retrieval Systems
On Scalable Information Retrieval Systems Ophir Frieder ophir@ir.iit.edu www.ir.iit.edu 1 Scalable Search Structured Semi-structured Text, video, etc. Answer Engine 2 Scalable Information Systems: Characteristics
More informationCS34800, Fall 2016, Assignment 4
1 CS34800, Fall 2016, Assignment 4 Due 11:59pm 07 December (Wed.), 2016 * if you submit it by Tuesday, Dec. 06 then it will be graded and returned back to you after lecture on Friday, Dec. 09. If you submit
More informationEfficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488)
Efficiency Efficiency: Indexing (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Difficult to analyze sequential IR algorithms: data and query dependency (query selectivity). O(q(cf max )) -- high estimate-
More informationChapter 13 XML: Extensible Markup Language
Chapter 13 XML: Extensible Markup Language - Internet applications provide Web interfaces to databases (data sources) - Three-tier architecture Client V Application Programs Webserver V Database Server
More informationRelevance Feedback & Other Query Expansion Techniques
Relevance Feedback & Other Query Expansion Techniques (Thesaurus, Semantic Network) (COSC 416) Nazli Goharian nazli@cs.georgetown.edu Slides are mostly based on Informion Retrieval Algorithms and Heuristics,
More informationChapter 6: Information Retrieval and Web Search. An introduction
Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods
More informationInformation Retrieval. (M&S Ch 15)
Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion
More informationText Analytics. Index-Structures for Information Retrieval. Ulf Leser
Text Analytics Index-Structures for Information Retrieval Ulf Leser Content of this Lecture Inverted files Storage structures Phrase and proximity search Building and updating the index Using a RDBMS Ulf
More informationA Parallel Relational Database Management System Approach to Relevance Feedback in Information Retrieval
A Parallel Relational Database Management System Approach to Relevance Feedback in Information Retrieval Carol Lundquist 1, Ophir Frieder 2, David Grossman 3, and David O. Holmes 4 Abstract. A scalable,
More informationCopyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1
Slide 27-1 Chapter 27 XML: Extensible Markup Language Chapter Outline Introduction Structured, Semi structured, and Unstructured Data. XML Hierarchical (Tree) Data Model. XML Documents, DTD, and XML Schema.
More informationUsing a Relational Database for Scalable XML Search
Using a Relational Database for Scalable XML Search Rebecca J. Cathey, Steven M. Beitzel, Eric C. Jensen, David Grossman, Ophir Frieder Information Retrieval Laboratory Department of Computer Science Illinois
More informationDatabase System Concepts
s Design Chapter 1: Introduction Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2009/2010 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
More informationXML Query Languages. Content. Slide 1 Norbert Gövert. January 11, XML documents as trees. Slide 2. Overview on XML query languages XQL
XML Query Languages Slide 1 Norbert Gövert January 11, 2001 Content Slide 2 XML documents as trees Overview on XML query languages XQL XIRQL: IR extension for XQL 1 XML documents as trees Slide 3
More informationADT 2005 Lecture 7 Chapter 10: XML
ADT 2005 Lecture 7 Chapter 10: XML Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/ Database System Concepts Silberschatz, Korth and Sudarshan The Challenge: Comic Strip Finder The Challenge:
More informationChapter 2. Architecture of a Search Engine
Chapter 2 Architecture of a Search Engine Search Engine Architecture A software architecture consists of software components, the interfaces provided by those components and the relationships between them
More informationIn = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most
In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100
More informationText Analytics. Index-Structures for Information Retrieval. Ulf Leser
Text Analytics Index-Structures for Information Retrieval Ulf Leser Content of this Lecture Inverted files Storage structures Phrase and proximity search Building and updating the index Using a RDBMS Ulf
More informationCS 582 Database Management Systems II
Review of SQL Basics SQL overview Several parts Data-definition language (DDL): insert, delete, modify schemas Data-manipulation language (DML): insert, delete, modify tuples Integrity View definition
More informationApproaches. XML Storage. Storing arbitrary XML. Mapping XML to relational. Mapping the link structure. Mapping leaf values
XML Storage CPS 296.1 Topics in Database Systems Approaches Text files Use DOM/XSLT to parse and access XML data Specialized DBMS Lore, Strudel, exist, etc. Still a long way to go Object-oriented DBMS
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval WS 2008/2009 25.11.2008 Information Systems Group Mohammed AbuJarour Contents 2 Basics of Information Retrieval (IR) Foundations: extensible Markup Language (XML)
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval (Supplementary Material) Zhou Shuigeng March 23, 2007 Advanced Distributed Computing 1 Text Databases and IR Text databases (document databases) Large collections
More informationQuery Processing Strategies and Optimization
Query Processing Strategies and Optimization CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/25/12 Agenda Check-in Design Project Presentations Query Processing Programming Project
More informationOne of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while
1 One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while leaving the engine to choose the best way of fulfilling
More informationCOMP6237 Data Mining Searching and Ranking
COMP6237 Data Mining Searching and Ranking Jonathon Hare jsh2@ecs.soton.ac.uk Note: portions of these slides are from those by ChengXiang Cheng Zhai at UIUC https://class.coursera.org/textretrieval-001
More informationMultimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency
Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency Ralf Moeller Hamburg Univ. of Technology Acknowledgement Slides taken from presentation material for the following
More informationLecture 5: Information Retrieval using the Vector Space Model
Lecture 5: Information Retrieval using the Vector Space Model Trevor Cohn (tcohn@unimelb.edu.au) Slide credits: William Webber COMP90042, 2015, Semester 1 What we ll learn today How to take a user query
More informationA DTD-Syntax-Tree Based XML file Modularization Browsing Technique
IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.2A, February 2006 127 A DTD-Syntax-Tree Based XML file Modularization Browsing Technique Zhu Zhengyu 1, Changzhi Li, Yuan
More informationIntroduction. " Documents have tags giving extra information about sections of the document
Chapter 10: XML Introduction! XML: Extensible Markup Language! Defined by the WWW Consortium (W3C)! Originally intended as a document markup language not a database language " Documents have tags giving
More informationCS425 Fall 2016 Boris Glavic Chapter 1: Introduction
CS425 Fall 2016 Boris Glavic Chapter 1: Introduction Modified from: Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Textbook: Chapter 1 1.2 Database Management System (DBMS)
More informationChapter 12: Query Processing
Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation
More informationCS54701: Information Retrieval
CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationIntro to Peer-to-Peer Search
Intro to Peer-to-Peer Search (COSC 416) Nazli Goharian nazli@cs.georgetown.edu 1 Outline Peer-to-peer historical perspective Problem definition Local client data processing Ranking functions Metadata copying
More informationEMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents
EMERGING TECHNOLOGIES XML Documents and Schemas for XML documents Outline 1. Introduction 2. Structure of XML data 3. XML Document Schema 3.1. Document Type Definition (DTD) 3.2. XMLSchema 4. Data Model
More informationXML RETRIEVAL. Introduction to Information Retrieval CS 150 Donald J. Patterson
Introduction to Information Retrieval CS 150 Donald J. Patterson Content adapted from Manning, Raghavan, and Schütze http://www.informationretrieval.org OVERVIEW Introduction Basic XML Concepts Challenges
More informationXML: extensible Markup Language
Datamodels XML: extensible Markup Language Slides are based on slides from Database System Concepts Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Many examples are from
More informationIntroduction. " Documents have tags giving extra information about sections of the document
Chapter 10: XML Introduction! XML: Extensible Markup Language! Defined by the WWW Consortium (W3C)! Originally intended as a document markup language not a database language " Documents have tags giving
More informationmodern database systems lecture 4 : information retrieval
modern database systems lecture 4 : information retrieval Aristides Gionis Michael Mathioudakis spring 2016 in perspective structured data relational data RDBMS MySQL semi-structured data data-graph representation
More informationSemistructured Content
On our first day Semistructured Content 1 Structured data : database system tagged, typed well-defined semantic interpretation Semi-structured data: tagged - (HTML?) some help with semantic interpretation
More informationInformation Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.
Introduction to Information Retrieval Ethan Phelps-Goodman Some slides taken from http://www.cs.utexas.edu/users/mooney/ir-course/ Information Retrieval (IR) The indexing and retrieval of textual documents.
More informationM359 Block5 - Lecture12 Eng/ Waleed Omar
Documents and markup languages The term XML stands for extensible Markup Language. Used to label the different parts of documents. Labeling helps in: Displaying the documents in a formatted way Querying
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationSemistructured Content
On our first day Semistructured Content 1 Structured data : database system tagged, typed well-defined semantic interpretation Semi-structured data: tagged - XML (HTML?) some help with semantic interpretation
More informationDatabase System Concepts
Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
More informationCS34800 Information Systems. Course Overview Prof. Walid Aref January 8, 2018
CS34800 Information Systems Course Overview Prof. Walid Aref January 8, 2018 1 Why this Course? Managing Data is one of the primary uses of computers This course covers the foundations of organized data
More informationSemistructured Content
On our first day Semistructured Content 1 Structured data : database system tagged, typed well-defined semantic interpretation Semi-structured data: tagged - XML (HTML?) some help with semantic interpretation
More informationSemistructured Data and XML
Semistructured Data and XML Computer Science E-66 Harvard University David G. Sullivan, Ph.D. Structured Data The logical models we've covered thus far all use some type of schema to define the structure
More informationΕΠΛ660. Ανάκτηση µε το µοντέλο διανυσµατικού χώρου
Ανάκτηση µε το µοντέλο διανυσµατικού χώρου Σηµερινό ερώτηµα Typically we want to retrieve the top K docs (in the cosine ranking for the query) not totally order all docs in the corpus can we pick off docs
More informationSQL Data Query Language
SQL Data Query Language André Restivo 1 / 68 Index Introduction Selecting Data Choosing Columns Filtering Rows Set Operators Joining Tables Aggregating Data Sorting Rows Limiting Data Text Operators Nested
More informationRepresentation/Indexing (fig 1.2) IR models - overview (fig 2.1) IR models - vector space. Weighting TF*IDF. U s e r. T a s k s
Summary agenda Summary: EITN01 Web Intelligence and Information Retrieval Anders Ardö EIT Electrical and Information Technology, Lund University March 13, 2013 A Ardö, EIT Summary: EITN01 Web Intelligence
More informationInformation Retrieval
Information Retrieval Natural Language Processing: Lecture 12 30.11.2017 Kairit Sirts Homework 4 things that seemed to work Bidirectional LSTM instead of unidirectional Change LSTM activation to sigmoid
More informationBasic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval
Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval 1 Naïve Implementation Convert all documents in collection D to tf-idf weighted vectors, d j, for keyword vocabulary V. Convert
More informationCS 6320 Natural Language Processing
CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic
More informationMore on indexing CE-324: Modern Information Retrieval Sharif University of Technology
More on indexing CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford) Plan
More informationQuery Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016
Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,
More informationXML and information exchange. XML extensible Markup Language XML
COS 425: Database and Information Management Systems XML and information exchange 1 XML extensible Markup Language History 1988 SGML: Standard Generalized Markup Language Annotate text with structure 1992
More informationInformation Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Information Retrieval CS 6900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Information Retrieval Information Retrieval (IR) is finding material of an unstructured
More informationWeb Search Ranking. (COSC 488) Nazli Goharian Evaluation of Web Search Engines: High Precision Search
Web Search Ranking (COSC 488) Nazli Goharian nazli@cs.georgetown.edu 1 Evaluation of Web Search Engines: High Precision Search Traditional IR systems are evaluated based on precision and recall. Web search
More informationCS490W XML data and Retrieval XML and Retrieval: Outline
CS490W XML data and Retrieval Luo Si Department of Computer Science Purdue University XML and Retrieval: Outline Outline: Semi-Structure Data XML, Examples, Application XML Search XQuery XIRQL Text-Based
More informationXML-QE: A Query Engine for XML Data Soures
XML-QE: A Query Engine for XML Data Soures Bruce Jackson, Adiel Yoaz {brucej, adiel}@cs.wisc.edu 1 1. Introduction XML, short for extensible Markup Language, may soon be used extensively for exchanging
More informationComparative Analysis of Sparse Matrix Algorithms For Information Retrieval
Comparative Analysis of Sparse Matrix Algorithms For Information Retrieval Nazli Goharian, Ankit Jain, Qian Sun Information Retrieval Laboratory Illinois Institute of Technology Chicago, Illinois {goharian,ajain,qian@ir.iit.edu}
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationCS3DB3/SE4DB3/SE6M03 TUTORIAL
CS3DB3/SE4DB3/SE6M03 TUTORIAL Mei Jiang Jan 30/Feb 1, 013 Outline Connecting db server ANY, ALL, IN, EXISTS Operators Set Operations Connecting db server On Campus MacSecure Connect directly MacConnect
More informationChapter 1: Introduction
Chapter 1: Introduction Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query
More informationData Formats and APIs
Data Formats and APIs Mike Carey mjcarey@ics.uci.edu 0 Announcements Keep watching the course wiki page (especially its attachments): https://grape.ics.uci.edu/wiki/asterix/wiki/stats170ab-2018 Ditto for
More informationUNIT 3 XML DATABASES
UNIT 3 XML DATABASES XML Databases: XML Data Model DTD - XML Schema - XML Querying Web Databases JDBC Information Retrieval Data Warehousing Data Mining. 3.1. XML Databases: XML Data Model The common method
More informationQuery Languages. Berlin Chen Reference: 1. Modern Information Retrieval, chapter 4
Query Languages Berlin Chen 2005 Reference: 1. Modern Information Retrieval, chapter 4 Data retrieval Pattern-based querying The Kinds of Queries Retrieve docs that contains (or exactly match) the objects
More informationThe SQL data-definition language (DDL) allows defining :
Introduction to SQL Introduction to SQL Overview of the SQL Query Language Data Definition Basic Query Structure Additional Basic Operations Set Operations Null Values Aggregate Functions Nested Subqueries
More informationQuery Evaluation Strategies
Introduction to Search Engine Technology Term-at-a-Time and Document-at-a-Time Evaluation Ronny Lempel Yahoo! Research (Many of the following slides are courtesy of Aya Soffer and David Carmel, IBM Haifa
More informationWriting Queries Using Microsoft SQL Server 2008 Transact-SQL. Overview
Writing Queries Using Microsoft SQL Server 2008 Transact-SQL Overview The course has been extended by one day in response to delegate feedback. This extra day will allow for timely completion of all the
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 10: XML Retrieval Hinrich Schütze, Christina Lioma Center for Information and Language Processing, University of Munich 2010-07-12
More informationChapter 3: Introduction to SQL
Chapter 3: Introduction to SQL Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 3: Introduction to SQL Overview of the SQL Query Language Data Definition Basic Query
More informationQ: Given a set of keywords how can we return relevant documents quickly?
Keyword Search Traditional B+index is good for answering 1-dimensional range or point query Q: What about keyword search? Geo-spatial queries? Q: Documents on Computer Science? Q: Nearby coffee shops?
More informationIntroduction to Information Retrieval. Lecture Outline
Introduction to Information Retrieval Lecture 1 CS 410/510 Information Retrieval on the Internet Lecture Outline IR systems Overview IR systems vs. DBMS Types, facets of interest User tasks Document representations
More informationCIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.
CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Text To Knowledge IR and Boolean Search Text to Knowledge (IE)
More informationOverview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54
Overview Lecture 16 Introduction to XML Boriana Koleva Room: C54 Email: bnk@cs.nott.ac.uk Introduction The Syntax of XML XML Document Structure Document Type Definitions Introduction Introduction SGML
More informationOverview. Structured Data. The Structure of Data. Semi-Structured Data Introduction to XML Querying XML Documents. CMPUT 391: XML and Querying XML
Database Management Systems Winter 2004 CMPUT 391: XML and Querying XML Lecture 12 Overview Semi-Structured Data Introduction to XML Querying XML Documents Dr. Osmar R. Zaïane University of Alberta Chapter
More informationChapter 11: Indexing and Hashing" Chapter 11: Indexing and Hashing"
Chapter 11: Indexing and Hashing" Database System Concepts, 6 th Ed.! Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use " Chapter 11: Indexing and Hashing" Basic Concepts!
More informationXML: Extensible Markup Language
XML: Extensible Markup Language CSC 375, Fall 2015 XML is a classic political compromise: it balances the needs of man and machine by being equally unreadable to both. Matthew Might Slides slightly modified
More informationCSCB20 Week 4. Introduction to Database and Web Application Programming. Anna Bretscher Winter 2017
CSCB20 Week 4 Introduction to Database and Web Application Programming Anna Bretscher Winter 2017 Last Week Intro to SQL and MySQL Mapping Relational Algebra to SQL queries Focused on queries to start
More informationCS3DB3/SE4DB3/SE6M03 TUTORIAL
CS3DB3/SE4DB3/SE6M03 TUTORIAL Mei Jiang Feb 13/15, 2013 Outline Relational Algebra SQL and Relational Algebra Examples Relational Algebra Basic Operators Select: C (R) where C is a list of conditions Project:
More informationInformation Retrieval
Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have
More informationChapter 3: Introduction to SQL
Chapter 3: Introduction to SQL Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 3: Introduction to SQL Overview of the SQL Query Language Data Definition Basic Query
More informationFull-Text Indexing For Heritrix
Full-Text Indexing For Heritrix Project Advisor: Dr. Chris Pollett Committee Members: Dr. Mark Stamp Dr. Jeffrey Smith Darshan Karia CS298 Master s Project Writing 1 2 Agenda Introduction Heritrix Design
More informationCOMP9321 Web Application Engineering. Extensible Markup Language (XML)
COMP9321 Web Application Engineering Extensible Markup Language (XML) Dr. Basem Suleiman Service Oriented Computing Group, CSE, UNSW Australia Semester 1, 2016, Week 4 http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2442
More informationInformation Retrieval CSCI
Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1
More informationBoolean Model. Hongning Wang
Boolean Model Hongning Wang CS@UVa Abstraction of search engine architecture Indexed corpus Crawler Ranking procedure Doc Analyzer Doc Representation Query Rep Feedback (Query) Evaluation User Indexer
More informationModels for Document & Query Representation. Ziawasch Abedjan
Models for Document & Query Representation Ziawasch Abedjan Overview Introduction & Definition Boolean retrieval Vector Space Model Probabilistic Information Retrieval Language Model Approach Summary Overview
More informationPre-Discussion. XQuery: An XML Query Language. Outline. 1. The story, in brief is. Other query languages. XML vs. Relational Data
Pre-Discussion XQuery: An XML Query Language D. Chamberlin After the presentation, we will evaluate XQuery. During the presentation, think about consequences of the design decisions on the usability of
More informationInformation Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes
CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten
More informationRelational Algebra. ICOM 5016 Database Systems. Roadmap. R.A. Operators. Selection. Example: Selection. Relational Algebra. Fundamental Property
Relational Algebra ICOM 06 Database Systems Relational Algebra Dr. Amir H. Chinaei Department of Electrical and Computer Engineering University of Puerto Rico, Mayagüez Slides are adapted from: Introduction.
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable References Bigtable: A Distributed Storage System for Structured Data. Fay Chang et. al. OSDI
More informationDocument indexing, similarities and retrieval in large scale text collections
Document indexing, similarities and retrieval in large scale text collections Eric Gaussier Univ. Grenoble Alpes - LIG Eric.Gaussier@imag.fr Eric Gaussier Document indexing, similarities & retrieval 1
More informationClustering (COSC 416) Nazli Goharian. Document Clustering.
Clustering (COSC 416) Nazli Goharian nazli@cs.georgetown.edu 1 Document Clustering. Cluster Hypothesis : By clustering, documents relevant to the same topics tend to be grouped together. C. J. van Rijsbergen,
More informationHome Page. Title Page. Page 1 of 14. Go Back. Full Screen. Close. Quit
Page 1 of 14 Retrieving Information from the Web Database and Information Retrieval (IR) Systems both manage data! The data of an IR system is a collection of documents (or pages) User tasks: Browsing
More informationCOMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411 1 Extensible
More informationPrinciples of Data Management
Principles of Data Management Alvin Lin August 2018 - December 2018 Structured Query Language Structured Query Language (SQL) was created at IBM in the 80s: SQL-86 (first standard) SQL-89 SQL-92 (what
More informationXQuery. Leonidas Fegaras University of Texas at Arlington. Web Databases and XML L7: XQuery 1
XQuery Leonidas Fegaras University of Texas at Arlington Web Databases and XML L7: XQuery 1 XQuery Influenced by SQL Based on XPath Purely functional language may access elements from documents, may construct
More informationData about data is database Select correct option: True False Partially True None of the Above
Within a table, each primary key value. is a minimal super key is always the first field in each table must be numeric must be unique Foreign Key is A field in a table that matches a key field in another
More information