Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan

Introduction
- Relational databases are ubiquitous
- Users need to know SQL and the database structure
- Hard to define an object

How can we apply keyword search on relational databases?
- Query representation
- Data representation
- Query processing
- Result ranking
- Result representation

Query representation: the first step
- What is a query?
- Pre-processing operations

Query representation
- Query = a finite list of keywords
- The query is pre-processed to better understand the user's need; the result is then used for internal queries.
- Possible operations:
  - Logical conjunction (AND) vs disjunction (OR)
  - Condition/filtering (e.g. year > 3000)
  - Categorizing keywords into types (NUITS)
  - And more...

Logical conjunction (AND) vs disjunction (OR)
- AND = all keywords must match
- OR = some keywords must match
- OR is less common (in top-k query processing)
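As a small illustration of the two semantics, the Python sketch below checks a row's text against a keyword list under AND or OR. The function name `matches` and the sample rows are hypothetical and are not taken from any of the surveyed systems.

```python
def matches(text, keywords, mode="AND"):
    """Return True if `text` satisfies the keyword query under the given semantics."""
    tokens = set(text.lower().split())
    hits = [kw.lower() in tokens for kw in keywords]
    # AND: every keyword must occur; OR: at least one keyword must occur.
    return all(hits) if mode == "AND" else any(hits)


# Hypothetical sample data.
rows = ["keyword search in databases", "query processing", "graph search"]
and_hits = [r for r in rows if matches(r, ["keyword", "search"], "AND")]
or_hits = [r for r in rows if matches(r, ["keyword", "search"], "OR")]
```

Under AND only the first row qualifies, while OR also admits the row that contains just "search".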

Filtering/condition
- e.g. year > 3000
- Limits the candidate data
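One possible way to treat a condition term such as `year > 3000` is to parse it into a column, an operator and a value, and use it to narrow the candidate rows before keyword matching. The sketch below assumes integer-valued conditions and rows stored as dictionaries; `parse_condition`, `apply_condition` and the sample data are hypothetical names chosen for illustration.

```python
import operator
import re

# Supported comparison operators (an assumption for this sketch).
OPS = {">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}


def parse_condition(term):
    """Parse 'column op integer'; return None if the term is a plain keyword."""
    m = re.fullmatch(r"\s*(\w+)\s*(>=|<=|>|<|=)\s*(-?\d+)\s*", term)
    if m is None:
        return None
    column, op, value = m.groups()
    return column, OPS[op], int(value)


def apply_condition(rows, term):
    """Keep only the rows that satisfy the condition; plain keywords filter nothing."""
    parsed = parse_condition(term)
    if parsed is None:
        return list(rows)
    column, op, value = parsed
    return [row for row in rows if column in row and op(row[column], value)]


# Hypothetical sample data.
movies = [{"title": "A", "year": 1999}, {"title": "B", "year": 2010}]
recent = apply_condition(movies, "year > 2000")
```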

Data representation: how a database is modeled
- Graph-based
  - Data graph
  - Schema graph
- Comparison

Finding top-k min-cost connected trees [2]
- Node = tuple
- Edge = relationship between 2 tuples
- Edge/node weight = function defined by the authors

Finding top-k min-cost connected trees [2]
- Query = {Keyword, Query, DB, Jim}
- 2 Steiner trees (candidates)
- Steiner tree = tree connecting a given subset of vertices (here, the tuples that contain the keywords)
- Tree cost = sum of edge weights
- Tree-1 is ranked higher (lower cost)
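To make the ranking rule concrete, the sketch below scores two already-given candidate trees on a tiny data graph and orders them by cost. It does not implement the search algorithm of [2]; the node names, edge weights and keyword placement are invented for illustration only.

```python
# Hypothetical data graph: nodes are tuples, weighted edges are relationships.
weight = {
    frozenset({"p1", "a1"}): 1.0,
    frozenset({"p1", "a2"}): 1.0,
    frozenset({"p2", "a1"}): 2.0,
    frozenset({"p2", "c1"}): 1.5,
}
# Hypothetical keyword occurrences per tuple.
keywords_at = {
    "p1": {"keyword", "query"},
    "p2": {"keyword"},
    "a1": {"jim"},
    "a2": {"db"},
    "c1": {"db", "query"},
}


def tree_cost(edges):
    """Cost of a tree = sum of its edge weights."""
    return sum(weight[frozenset(e)] for e in edges)


def covers(edges, query):
    """True if the nodes of the tree together contain every query keyword."""
    nodes = {n for e in edges for n in e}
    found = set().union(*(keywords_at.get(n, set()) for n in nodes))
    return query <= found


query = {"keyword", "query", "db", "jim"}
candidates = {
    "tree-1": [("p1", "a1"), ("p1", "a2")],
    "tree-2": [("p2", "a1"), ("p2", "c1")],
}
# Keep only trees that cover the query, cheapest first.
ranked = sorted((name for name, t in candidates.items() if covers(t, query)),
                key=lambda name: tree_cost(candidates[name]))
```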

IR-Style Keyword Search [3]
- Node = relation
- Edge = foreign key relationship from one relation to another

IR-Style Keyword Search [3]
1. Construct a schema graph
2. Use the schema graph to compute joining trees of tuples
   a. Joining tree = tuples as nodes, connected by edges of foreign key relationships
3. Return the trees with the highest scores
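A much simplified view of step 2 is sketched below: given a schema graph, a breadth-first search finds the shortest chain of relations to join between two relations that contain query keywords. This is not the candidate-network enumeration used in [3], only an illustration of how a schema graph guides joins; the schema and the name `join_path` are hypothetical.

```python
from collections import deque

# Hypothetical schema graph: relation -> relations linked by a foreign key.
schema = {
    "Author": ["Writes"],
    "Writes": ["Author", "Paper"],
    "Paper": ["Writes", "Conference"],
    "Conference": ["Paper"],
}


def join_path(schema, start, goal):
    """Breadth-first search for the shortest chain of relations to join."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in schema[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two relations are not connected


path = join_path(schema, "Author", "Conference")
join_expression = " JOIN ".join(path)
```

Each relation on the returned path would then be joined along its foreign keys to produce joining trees of tuples.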

Data graphs vs schema graphs

   Data graphs                        Schema graphs
1. Larger (nodes = records)           Smaller (nodes = relations)
2. Don't need access to database      Need access to database
3. Harder to maintain                 Easier to maintain

Query processing
- Constructing an index
- Top-k query processing
- Effectiveness: a crucial requirement

Indexing Structure - Inverted Index
- Motivation: avoid the need to linearly scan all of the tables in the database for every query.
- Traditional way of finding the location of a keyword: inverted index
- Figure: An inverted index that supports phrase searches
Balmin A, Hristidis V, Papakonstantinou Y (2004) ObjectRank: authority-based keyword search in databases. In: Proceedings of the 30th international conference on very large data bases, pp 564-575, August 31-September 03, 2004, Toronto, Canada
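A minimal sketch of such an index is given below, assuming tables are held in memory as lists of dictionaries. Each keyword maps to the set of (table, row, column, position) locations where it occurs, and the stored positions are what make phrase search possible. The names `build_index` and `phrase_search` and the sample tables are hypothetical.

```python
from collections import defaultdict


def build_index(tables):
    """Map each token to the set of (table, row_id, column, position) where it occurs."""
    index = defaultdict(set)
    for table, rows in tables.items():
        for row_id, row in enumerate(rows):
            for column, value in row.items():
                for pos, token in enumerate(str(value).lower().split()):
                    index[token].add((table, row_id, column, pos))
    return index


def phrase_search(index, phrase):
    """Return the cells in which the tokens of `phrase` appear consecutively."""
    tokens = phrase.lower().split()
    if not tokens:
        return set()
    hits = set()
    for table, row_id, column, pos in index.get(tokens[0], set()):
        # Every following token must sit at the next position of the same cell.
        if all((table, row_id, column, pos + i) in index.get(tok, set())
               for i, tok in enumerate(tokens)):
            hits.add((table, row_id, column))
    return hits


# Hypothetical sample data.
tables = {
    "paper": [{"title": "Keyword search in databases"},
              {"title": "Search keyword lists"}],
    "author": [{"name": "Jim Gray"}],
}
index = build_index(tables)
result = phrase_search(index, "keyword search")
```

A lookup touches only the postings of the query tokens, so no table is scanned at query time.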

Indexing Structure - 2 Main Challenges
1. How to control the granularity of the indexed content
2. How to efficiently find the exact results from the indexed content

Indexing Structure - Symbol table
- A symbol table maintains the list of columns or cells that contain the keywords.
Agrawal S, Chaudhuri S, Das G (2002) DBXplorer: a system for keyword-based search over relational databases. In: Proceedings of the 18th international conference on data engineering, pp 5-17, February 26-March 01, 2002, San Jose, California, USA

Indexing Structure - Symbol table (Compression)
- A larger symbol table increases the I/O cost during the search step.
- The space needed for this auxiliary data has to be reduced: compression.
Goldman R, Shivakumar N, Venkatasubramanian S, Garcia-Molina H (1998) Proximity search in databases. In: Proceedings of the 24th international conference on very large data bases, pp 26-37, August 24-27, 1998, San Francisco, California, USA

Indexing Structure - Symbol table (Granularity levels)
- To reduce scan time and storage space costs, the symbol table is designed at several granularity levels of schema elements: column level and record level.
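The trade-off between the two granularity levels can be sketched as follows, assuming in-memory tables represented as lists of dictionaries. A column-level table is smaller but only says which column to probe, while a record-level table points at the matching rows directly. `build_symbol_table` and the sample data are hypothetical and do not reproduce the DBXplorer implementation.

```python
def build_symbol_table(tables, granularity="column"):
    """Map each keyword to (table, column) or (table, row_id) locations."""
    symbols = {}
    for table, rows in tables.items():
        for row_id, row in enumerate(rows):
            for column, value in row.items():
                # Column level records where to look; record level records the exact row.
                location = (table, column) if granularity == "column" else (table, row_id)
                for token in str(value).lower().split():
                    symbols.setdefault(token, set()).add(location)
    return symbols


# Hypothetical sample data.
tables = {
    "movie": [
        {"title": "Star Wars", "genre": "space opera"},
        {"title": "Star Trek", "genre": "science fiction"},
    ]
}
by_column = build_symbol_table(tables, "column")
by_record = build_symbol_table(tables, "record")
```

In this example the keyword "star" needs one column-level entry but two record-level entries, which shows why the coarser level saves space at the price of a later lookup in the column.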

Why do we need top-k processing techniques?
- Keyword search retrieves information scattered across several tables, which requires multiple JOIN operations.
- Joining ALL of the tuples with ALL of the query keywords is extremely inefficient.
- Only a few matches for the query keywords are of interest.
- This requires efficient top-k processing techniques.

Top-k query processing
- Users are only interested in a small number of results, k, that best match the given query keywords.

Top-k query processing - Candidate Network (CN)
- CN: JOIN expressions used to create joining trees of tuples that will be considered as potential answers to the query.
- DISCOVER executes top-k queries while avoiding the creation of ALL query results.
- It shares intermediate results that are used for evaluating CNs.
- The top-k results are distributed over only a few CNs, so the search system has to decide which CNs will produce the top-k results.
- Figure: Architecture of DISCOVER
Hristidis V, Papakonstantinou Y (2002) DISCOVER: keyword search in relational databases. In: Proceedings of the 28th international conference on very large data bases, pp 670-681, August 20-23, 2002, Hong Kong, China
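The idea of not evaluating every CN can be illustrated with a generic bound-based pruning loop. It assumes each CN comes with an upper bound on the score of any answer it can produce, visits CNs in descending bound order, and stops once the current k-th best score cannot be beaten. This is a sketch of the general technique rather than the DISCOVER algorithm itself; `top_k` and the sample candidate networks are hypothetical.

```python
import heapq


def top_k(candidate_networks, k):
    """candidate_networks: list of (upper_bound, evaluate) pairs, where evaluate()
    returns (score, answer) pairs whose scores never exceed upper_bound."""
    if k <= 0:
        return [], 0
    best = []       # min-heap holding the k best (score, answer) pairs seen so far
    evaluated = 0   # number of CNs actually executed
    ordered = sorted(candidate_networks, key=lambda cn: cn[0], reverse=True)
    for bound, evaluate in ordered:
        # Stop early: no remaining CN can beat the current k-th best score.
        if len(best) == k and best[0][0] >= bound:
            break
        evaluated += 1
        for score, answer in evaluate():
            if len(best) < k:
                heapq.heappush(best, (score, answer))
            elif score > best[0][0]:
                heapq.heapreplace(best, (score, answer))
    return sorted(best, reverse=True), evaluated


# Hypothetical candidate networks with precomputed answers.
cns = [
    (0.9, lambda: [(0.9, "a1"), (0.7, "a2")]),
    (0.6, lambda: [(0.6, "b1")]),
    (0.3, lambda: [(0.3, "c1"), (0.2, "c2")]),
]
results, evaluated = top_k(cns, 2)
```

With k = 2 only the first CN is executed, because its two answers already score at least as high as the bound of every remaining CN.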

Result ranking
1. Relevance (R)
   - Size of an answer
   - Graph representation
   - IR weighting methods
2. Importance (I)
   - Authority transferring methods

Relevance - Size of an answer
- To measure relevance, many approaches consider the size of an answer as a ranking factor.
- Answers with a smaller number of joins are generally more meaningful/helpful.
Luo Y, Lin X, Wang W, Zhou X (2007) SPARK: Top-k keyword query in relational databases. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data, pp 115-126, June 11-14, 2007, Beijing, China

Relevance - Graph Representation
- Answers are represented as a minimal subgraph that includes ALL of the query keywords.
- The subgraph includes nodes that do not match the query keywords but only connect the matched nodes, e.g. T2 and T5.
- Should minimize non-matched nodes and find a complete transitive closure: the STEINER TREE PROBLEM
- Figure: Join trees
Bhalotia G, Hulgeri A, Nakhe C, et al. (2002) Keyword searching and browsing in databases using BANKS. In: Proceedings of the 18th international conference on data engineering, pp 431-441, February 26-March 01, 2002, San Jose, California, USA

Relevance - Number of edges
- DataSpot ranks candidate answers by the number of edges in the subgraph.
- Figure: DataSpot sample database (left) and hyperbase (right), drawn as nodes and edges
Dar S, Entin G, Geva S, Palmon E (1998) DTL's DataSpot: database exploration using plain language. In: Proceedings of the 24th international conference on very large data bases, pp 645-649, August 24-27, 1998, San Francisco, California, USA

Relevance - Semantic Closeness
- Proximity search differentiates the distance between different kinds of schema elements:
  - between a table and its attributes
  - between tuples in the same table
  - between tuples related through primary and foreign keys
- It regards the distance as the semantic closeness between objects.
- It uses the shortest path between schema elements to measure the size of an answer.
- Figure: A fragment of the movie database relational schema and a database instance as a graph
Goldman R, Shivakumar N, Venkatasubramanian S, Garcia-Molina H (1998) Proximity search in databases. In: Proceedings of the 24th international conference on very large data bases, pp 26-37, August 24-27, 1998, San Francisco, California, USA
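The shortest-path measure can be sketched with Dijkstra's algorithm over a small weighted graph in which different kinds of links carry different distances. The edge weights and node names below are invented for illustration and are not the values used by Goldman et al.

```python
import heapq


def distance(graph, source, target):
    """Dijkstra's algorithm: length of the shortest path, or infinity if unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")


# Hypothetical weights: tuple-to-attribute links are short (1.0), foreign key
# links are longer (2.0), tuples of the same table are farthest apart (4.0).
edges = [
    ("title:Star Wars", "movie:1", 1.0),
    ("movie:1", "actor:7", 2.0),
    ("actor:7", "name:Harrison Ford", 1.0),
    ("movie:1", "movie:2", 4.0),
]
graph = {}
for a, b, w in edges:
    graph.setdefault(a, []).append((b, w))
    graph.setdefault(b, []).append((a, w))

closeness = distance(graph, "title:Star Wars", "name:Harrison Ford")
```

A smaller distance is read as a closer semantic relationship between the two objects.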

Relevance - IR weighting methods
- The ranking function considers each text column as a collection and uses standard IR weighting methods, e.g. tf-idf, to compute a weight for each term in the field.
- [Focus on improving the quality of relevance ranking for text documents]
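As an illustration, the sketch below treats the values of one text column as a document collection and computes one common tf-idf variant, raw term frequency times log(N / df). Individual systems use their own weighting formulas, so the exact formula, the function name `tfidf` and the sample column are assumptions.

```python
import math
from collections import Counter


def tfidf(column_values):
    """Return, for each cell of a text column, a dict of term -> tf-idf weight."""
    docs = [Counter(str(v).lower().split()) for v in column_values]
    n = len(docs)
    # Document frequency: number of cells in which each term occurs.
    df = Counter(term for doc in docs for term in doc)
    return [{term: tf * math.log(n / df[term]) for term, tf in doc.items()}
            for doc in docs]


# Hypothetical "title" column of a paper table.
titles = ["keyword search", "keyword query processing", "graph search"]
weights = tfidf(titles)
```

Terms that occur in fewer cells, such as "query" here, receive a higher weight than terms shared by several cells, such as "keyword".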

Importance - Authority transferring methods
- Compute the importance of a node based on the link structure in the graph model.
- Nodes with an incoming link from a node with high authority are assumed to have higher importance.
- Figures: The DBLP schema graph; the DBLP authority transfer schema graph
Hristidis V, Hwang H, Papakonstantinou Y (2008) Authority-based keyword search in databases. ACM Trans Database Syst 33(1):1-40

Importance - Authority transferring methods
- A node that is referenced by other authoritative nodes obtains authority.
- Edge weights are assigned as the authority transfer rate.
- An edge is omitted only if the transfer rate is 0 in that direction.
- Sum of authority transfer rates of outgoing edges determines authority of the node within the same domain.
- Figure: Authority transfer data graph (a subset of the DBLP graph)
Hristidis V, Hwang H, Papakonstantinou Y (2008) Authority-based keyword search in databases. ACM Trans Database Syst 33(1):1-40
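The flow of authority along weighted edges can be sketched as a PageRank-style iteration in which each edge passes on a fraction of its source node's score. This is a simplified sketch, not the exact ObjectRank computation: the transfer rates are given directly on the data-graph edges, and the damping factor, the base set and the three-paper citation graph are assumptions.

```python
def authority(nodes, edges, base, damping=0.85, iterations=50):
    """edges: {(src, dst): transfer_rate}; base: node -> share of the random jump."""
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Random-jump component, restricted to the base set.
        nxt = {n: (1 - damping) * base.get(n, 0.0) for n in nodes}
        # Each edge transfers a fraction of the source's authority to the target.
        for (src, dst), rate in edges.items():
            nxt[dst] += damping * rate * score[src]
        score = nxt
    return score


# Hypothetical citation graph: p2 and p3 cite p1, and p3 also cites p2.
nodes = ["p1", "p2", "p3"]
edges = {("p2", "p1"): 0.7, ("p3", "p1"): 0.7, ("p3", "p2"): 0.7}
base = {n: 1.0 / len(nodes) for n in nodes}
score = authority(nodes, edges, base)
```

The paper cited by both others ends up with the highest score, and the paper nobody cites keeps only its random-jump share.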

Result representation
- Examples
- Little but essential

BANKS [4]
- Query = {soumen, sunita}

Finding top-k min-cost connected trees [2]

- Query representation
- Data representation
- Query processing
- Result ranking
- Result representation

References
1. Park, Jaehui, and Sang-goo Lee. "Keyword search in relational databases." Knowledge and Information Systems 26.2 (2011): 175-193.
2. Ding, Bolin, et al. "Finding top-k min-cost connected trees in databases." Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on. IEEE, 2007.
3. Hristidis, Vagelis, Luis Gravano, and Yannis Papakonstantinou. "Efficient IR-style keyword search over relational databases." Proceedings of the 29th international conference on Very large data bases-volume 29. VLDB Endowment, 2003.
4. Bhalotia, Gaurav, et al. "Keyword searching and browsing in databases using BANKS." Data Engineering, 2002. Proceedings. 18th International Conference on. IEEE, 2002.