SPARK: Top-k Keyword Query in Relational Database


1 SPARK: Top-k Keyword Query in Relational Database. Wei Wang, University of New South Wales, Australia. 20/03/2007

2 Outline: Demo & Introduction; Ranking; Query Evaluation; Conclusions

3 Demo

4 Demo

5 SPARK I: Searching, Probing & Ranking Top-k Results. Thesis project ( ); Taste of Research Summary Scholarship (2005); finally, CISRA prize winner ering.php

6 SPARK II: Continued as a research project with PhD student Yi Luo. SIGMOD paper; trying VLDB 2007 Demo now!

7 A Motivating Example

8 A Motivating Example: Top-3 results in our system. Movies: Primetime Glick (2001) Tom Hanks/Ben Stiller (#2.1); Movies: Primetime Glick (2001) Tom Hanks/Ben Stiller (#2.1), ActorPlay: Character = Himself, Actors: Hanks, Tom; Actors: John Hanks, ActorPlay: Character = Alexander Kerst, Movies: Rosamunde Pilcher - Wind über dem Fluss (2001)

9 Improving the Effectiveness: Three factors contribute to the final score of a search result (a joined tuple tree): (1) the (modified) IR ranking score; (2) the completeness factor; (3) the size normalization factor.

10 Preliminaries. Data model: relation-based. Query model: joined tuple trees (JTTs). Sophisticated ranking: addresses one flaw in previous approaches; unifies AND and OR semantics; offers an alternative size normalization.

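11 Problems with DISCOVER2
Scoring function with document length normalization:
$$\sum_{t \in Q \cap D} \frac{1 + \ln(1 + \ln(tf))}{(1 - s) + s \cdot \frac{dl}{avdl}} \cdot qtf \cdot \ln\frac{N + 1}{df}$$
The same sum without the length normalization and qtf terms:
$$\sum_{t \in Q \cap D} \bigl(1 + \ln(1 + \ln(tf))\bigr) \cdot \ln\frac{N + 1}{df}$$
Example table (columns: result, score(c_i), score(p_j), score, signature, SPARK). Legible entries: c1 ⋈ p1 has signature (1, 1) and SPARK score 0.98; c2 ⋈ p2 has signature (0, 2).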
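The scoring function on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary-based inputs, and the default s = 0.2 are hypothetical choices, not taken from the slides. Per-tuple scores are summed, following the Score(Pi ⋈ Cj) = Score(Pi) + Score(Cj) convention stated later in the deck.

```python
import math

def tuple_score(tf, df, n_tuples, dl, avdl, qtf=None, s=0.2):
    """Pivoted-normalization tf-idf score of a single tuple.

    tf: keyword -> term frequency in this tuple (hypothetical input format)
    df: keyword -> number of tuples in the relation containing the keyword
    s = 0.2 is an assumed default, not a value given on the slides.
    """
    qtf = qtf or {}
    norm = (1 - s) + s * dl / avdl            # length normalization
    total = 0.0
    for term, freq in tf.items():
        if freq <= 0 or term not in df:
            continue
        ntf = 1 + math.log(1 + math.log(freq))  # attenuated term frequency
        idf = math.log((n_tuples + 1) / df[term])
        total += ntf / norm * qtf.get(term, 1) * idf
    return total

def summed_score(tuples, df, n_tuples, avdl):
    """Score of a joined result as the sum of its per-tuple scores."""
    return sum(tuple_score(t["tf"], df, n_tuples, t["dl"], avdl) for t in tuples)

# Two tuples, each matching "maxtor" once; every number here is made up.
df = {"maxtor": 5, "netvista": 5}
result = [{"tf": {"maxtor": 1}, "dl": 10}, {"tf": {"maxtor": 1}, "dl": 10}]
total = summed_score(result, df, n_tuples=9, avdl=10)
```

Because attenuation is applied per tuple, two tuples that each mention the same keyword once contribute two full idf weights, which is the behaviour the slide title flags as a problem.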

12 Virtual Document
Combine tf contributions before tf normalization / attenuation.
$$\sum_{t \in Q \cap D} \bigl(1 + \ln(1 + \ln(tf))\bigr) \cdot \ln\frac{N + 1}{df}$$
Example table (columns: c_i ⋈ p_j, score(maxtor), score(netvista), score_a *), with rows c1 ⋈ p1 and c2 ⋈ p2.
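A minimal sketch of the virtual-document idea, assuming hypothetical idf weights and a hypothetical function name: term frequencies of all tuples in a result are added first, and the attenuation 1 + ln(1 + ln(tf)) is applied once per keyword afterwards.

```python
import math
from collections import Counter

def virtual_document_score(tuple_tfs, idf):
    """Score a joined result as one virtual document.

    tuple_tfs: list of keyword -> tf dicts, one per tuple in the result
    idf: keyword -> idf weight (values below are illustrative only)
    """
    combined = Counter()
    for tf in tuple_tfs:
        combined.update(tf)                 # add tf contributions first
    return sum(
        (1 + math.log(1 + math.log(freq))) * idf[term]
        for term, freq in combined.items()
        if freq > 0 and term in idf
    )

idf = {"netvista": 1.0, "maxtor": 1.0}      # assumed equal weights
both_keywords = virtual_document_score([{"netvista": 1}, {"maxtor": 1}], idf)
one_keyword_twice = virtual_document_score([{"maxtor": 1}, {"maxtor": 1}], idf)
```

With these made-up weights, the result covering both keywords scores 2.0, while the result that repeats one keyword scores about 1.53, because the second occurrence is attenuated instead of counted in full.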

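13 Virtual Document Collection
Collection: 3 results. idf_netvista = ln(4/3), idf_maxtor = ln(4/2).
Estimate idf for netvista and maxtor; the worked value on the slide has the form $\ln \frac{1}{1 - (1 - \frac{1}{\cdot})(1 - \frac{1}{\cdot})} = \ln\frac{9}{5}$.
Scoring function:
$$\sum_{t \in Q \cap D} \frac{1 + \ln(1 + \ln(tf))}{(1 - s) + s \cdot \frac{dl}{avdl}} \cdot qtf \cdot \ln\frac{N + 1}{df}$$
Estimate avdl = avdl_C + avdl_P.
Example table: score_a for c1 ⋈ p1 and c2 ⋈ p2.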

14 Completeness Factor. For short queries, users prefer results matching more keywords. Derive the completeness factor from the extended Boolean model: measure the L_p distance to the ideal position. Figure (axes: netvista and maxtor; L_2 distance): ideal position (1, 1); d = 0.5 for c1 ⋈ p1; d = 1 for c2 ⋈ p2; largest distance d = 1.41. score_b: (1.41 - 0.5)/1.41 = 0.65 for c1 ⋈ p1; (1.41 - 1)/1.41 = 0.29 for c2 ⋈ p2.
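The distance-based factor above can be reproduced with a short sketch. The coordinates passed in are hypothetical normalized per-keyword match values in [0, 1]; the slide only gives the resulting distances (0.5 and 1), so the points below are chosen merely to produce those distances.

```python
def completeness(match, p=2.0):
    """Completeness factor from the L_p distance to the ideal position (1, ..., 1).

    match: one normalized value in [0, 1] per query keyword (assumed input).
    """
    m = len(match)
    dist = sum((1.0 - x) ** p for x in match) ** (1.0 / p)
    max_dist = m ** (1.0 / p)               # distance from the origin to the ideal
    return (max_dist - dist) / max_dist

score_b_c1p1 = completeness([1.0, 0.5])     # d = 0.5
score_b_c2p2 = completeness([0.0, 1.0])     # d = 1, one keyword missing
```

Rounded to two decimals these give 0.65 and 0.29, matching the slide's figures. Larger p penalizes a missing keyword more heavily, which is how one formula can move between OR-like and AND-like behaviour.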

15 Size Normalization. Results in large CNs tend to have more matches to the keywords.
$$score_c = (1 + s_1 - s_1 \cdot |CN|) \cdot (1 + s_2 - s_2 \cdot |CN^{nf}|)$$
Empirically, $s_1 = 0.15$ and $s_2 = 1 / (|Q| + 1)$ work well.
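As a quick sketch of the formula, the function below plugs in the empirical parameter values from the slide. The parameter names are hypothetical, and reading |CN^nf| as the number of non-free tuple sets in the CN is an assumption about the notation.

```python
def size_normalization(cn_size, cn_nonfree_size, query_len, s1=0.15, s2=None):
    """score_c = (1 + s1 - s1*|CN|) * (1 + s2 - s2*|CN^nf|).

    cn_nonfree_size is assumed to count the non-free tuple sets in the CN.
    """
    if s2 is None:
        s2 = 1.0 / (query_len + 1)          # empirical setting from the slide
    return (1 + s1 - s1 * cn_size) * (1 + s2 - s2 * cn_nonfree_size)

single_tuple = size_normalization(cn_size=1, cn_nonfree_size=1, query_len=2)
three_way = size_normalization(cn_size=3, cn_nonfree_size=2, query_len=2)
```

A single-tuple result is not penalized (factor 1.0), while a three-relation CN with two non-free tuple sets is scaled down to roughly 0.47 for a two-keyword query.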

16 Putting 'em Together. score(jtt) = score_a * score_b * score_c, where a is the IR score of the virtual document, b the completeness factor, and c the size normalization factor. Example (score_a * score_b): c1 ⋈ p1: 0.98 * 0.65; c2 ⋈ p2: score_a * 0.29.

17 Comparing Top-1 Results. DBLP; Query = nikos clique

18 #Rel and R-Rank Results. DBLP: 18 queries, union of top-20 results. Mondial: 35 queries, union of top-20 results. [Tables: #Rel and R-Rank for DISCOVER, [Liu et al, SIGMOD06], and several settings of p]

19 Query Processing: 3 Steps. (1) Generate candidate tuples in every relation in the schema (using full-text indexes).

20 Query Processing: 3 Steps. (1) Generate candidate tuples in every relation in the schema (using full-text indexes). (2) Enumerate all possible Candidate Networks (CNs).

21 Query Processing: 3 Steps. (1) Generate candidate tuples in every relation in the schema (using full-text indexes). (2) Enumerate all possible Candidate Networks (CNs). (3) Execute the CNs. Most algorithms differ here; the key is how to optimize for top-k retrieval.

22 Monotonic Scoring Function (DISCOVER2): executing a CN. Assume idf_netvista > idf_maxtor and k = 1. CN: P^Q ⋈ C^Q. Example table: c1 ⋈ p1 has score(c_i) = 1.06, score(p_j) = 0.97, score = 2.03; c2 ⋈ p2 is the second row. Figure: grid over P (P1, P2) and C (C1, C2); c1 ⋈ p1 < c2 ⋈ p2.

23 Non-Monotonic Scoring Function (SPARK): executing a CN. Assume idf_netvista > idf_maxtor and k = 1. CN: P^Q ⋈ C^Q. Example table: c1 ⋈ p1 has score(c_i) = 1.06, score(p_j) = 0.97, score_a = 0.98; c2 ⋈ p2: ??. Figure: grid over P (P1, P2) and C (C1, C2). Challenges: 1) Re-establish the early stopping criterion; 2) Check candidates in an optimal order.

24 Upper Bounding Function
Idea: use a monotonic and tight upper bounding function for SPARK's non-monotonic scoring function.
Details:
$$sumidf = \sum_{w} idf_w$$
$$watf(t) = \frac{1}{sumidf} \sum_{w} tf_w(t) \cdot idf_w$$
$$A = sumidf \cdot \Bigl(1 + \ln\bigl(1 + \ln\bigl(\textstyle\sum_{t} watf(t)\bigr)\bigr)\Bigr)$$
$$B = sumidf \cdot \sum_{t} watf(t)$$
Then $score_a \le uscore_a = \frac{1}{1 - s} \cdot \min(A, B)$.
score_b: monotonic w.r.t. watf(t). score_c is a constant given the CN. Hence score ≤ uscore.
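The bound can be checked numerically with a small sketch. Everything below is illustrative: the function names, the idf weights, and s = 0.2 are assumptions. One deliberate deviation is labelled in the code: when the summed watf is below 1, the logarithmic term A is negative or undefined, so this sketch falls back to B alone in that range to keep the value a valid upper bound.

```python
import math

def attenuate(tf):
    return 1 + math.log(1 + math.log(tf))

def score_a(tuple_tfs, idf, dl, avdl, s=0.2):
    """IR score of the virtual document built from all tuples (qtf = 1)."""
    totals = {}
    for tf in tuple_tfs:
        for w, f in tf.items():
            totals[w] = totals.get(w, 0) + f
    norm = (1 - s) + s * dl / avdl
    return sum(attenuate(f) / norm * idf[w]
               for w, f in totals.items() if f > 0 and w in idf)

def watf(tf, idf):
    """Weighted average term frequency of one tuple."""
    sumidf = sum(idf.values())
    return sum(tf.get(w, 0) * idf_w for w, idf_w in idf.items()) / sumidf

def uscore_a(tuple_tfs, idf, s=0.2):
    sumidf = sum(idf.values())
    total = sum(watf(tf, idf) for tf in tuple_tfs)
    b = sumidf * total
    if total >= 1:
        a = sumidf * attenuate(total)
        bound = min(a, b)
    else:
        bound = b    # assumption of this sketch: A is not usable below 1
    return bound / (1 - s)

idf = {"netvista": 2.0, "maxtor": 1.0}               # made-up weights
full = [{"netvista": 1, "maxtor": 1}, {"maxtor": 2}]
partial = [{"netvista": 1}]
```

Since uscore_a depends on the tuples only through the sum of their watf values, it grows whenever any tuple's watf grows, which is the monotonicity the early stopping test relies on.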

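25 Early Stopping Criterion (SPARK): executing a CN. Assume idf_netvista > idf_maxtor and k = 1. CN: P^Q ⋈ C^Q. Example table: c1 ⋈ p1 has uscore = 1.13 and score_a = 0.98; c2 ⋈ p2 is the second row. Figure: grid over P (P1, P2) and C (C1, C2); score( ) is compared with uscore( ), and processing stops once the best real score is at least the uscore of every unchecked candidate. Goals: 1) Re-establish the early stopping criterion; 2) Check candidates in an optimal order.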

26 Query Processing: Execute the CNs [VLDB 03]. {P1, P2, ...} and {C1, C2, ...} have been sorted by their IR relevance scores. Score(Pi ⋈ Cj) = Score(Pi) + Score(Cj). CN: P^Q ⋈ C^Q. Operations: [P1, P1] ⋈ [C1, C1]; C.get_next() // a parametric SQL query is sent to the DBMS; [P1, P1] ⋈ C2; P.get_next(); P2 ⋈ [C1, C2]; P.get_next(); P3 ⋈ [C1, C2]. Figure: grid over P (P1, P2, P3) and C (C1, C2, C3).

27 Skyline Sweeping Algorithm: Execute the CNs. CN: P^Q ⋈ C^Q.
Dominance: uscore(<P_i, C_j>) > uscore(<P_{i+1}, C_j>) and uscore(<P_i, C_j>) > uscore(<P_i, C_{j+1}>).
Operations: P1 ⋈ C1; P2 ⋈ C1; P3 ⋈ C1.
Priority queue after each step: <P1, C1>; then <P2, C1>, <P1, C2>; then <P3, C1>, <P1, C2>, <P2, C2>; then <P1, C2>, <P2, C2>, <P4, C1>, <P3, C2>.
Figure: grid over P (P1, P2, P3) and C (C1, C2, C3).
Goals: 1) Re-establish the early stopping criterion; 2) Check candidates in an optimal order (sort of).
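The queue trace above suggests the following sketch of a skyline sweep over a two-relation CN. It is an approximation written for illustration: the function signature, the callback names `uscore` and `score`, and the toy data are hypothetical, and `score` stands in for the SQL query that checks whether two tuples actually join (returning None if they do not).

```python
import heapq

def skyline_sweep(p_list, c_list, uscore, score, k=1):
    """Check candidates <P_i, C_j> in decreasing uscore order with early stopping.

    p_list and c_list must be sorted so that uscore never increases
    when moving to the next element of either list (the dominance property).
    """
    heap = [(-uscore(p_list[0], c_list[0]), 0, 0)]   # max-heap via negation
    seen = {(0, 0)}
    top = []                                         # min-heap of (score, i, j)
    checked = 0
    while heap:
        neg_u, i, j = heapq.heappop(heap)
        if len(top) == k and top[0][0] >= -neg_u:
            break                                    # no remaining candidate can win
        s = score(p_list[i], c_list[j])              # the expensive check
        checked += 1
        if s is not None:
            if len(top) < k:
                heapq.heappush(top, (s, i, j))
            elif s > top[0][0]:
                heapq.heapreplace(top, (s, i, j))
        for ni, nj in ((i + 1, j), (i, j + 1)):      # push dominated neighbours
            if ni < len(p_list) and nj < len(c_list) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-uscore(p_list[ni], c_list[nj]), ni, nj))
    return sorted(top, reverse=True), checked

# Toy data: per-tuple bounds and made-up real scores for joined pairs.
P = [("P1", 1.0), ("P2", 0.9), ("P3", 0.5)]
C = [("C1", 1.0), ("C2", 0.8), ("C3", 0.2)]
real = {("P1", "C1"): 1.2, ("P2", "C1"): 1.85}
ub = lambda p, c: p[1] + c[1]
sc = lambda p, c: real.get((p[0], c[0]), p[1] + c[1] - 0.5)
results, checked = skyline_sweep(P, C, ub, sc, k=1)
```

In this toy run only <P1, C1> and <P2, C1> are checked: once <P2, C1> yields 1.85, the next-best bound in the queue is 1.8, so the sweep stops without touching the other seven candidates.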

28 Block Pipeline Algorithm. Inherent deficiency of bounding a non-monotonic function with (a few) monotonic upper bounding functions (draw an example): lots of candidates with high uscores return much lower (real) scores, which causes unnecessary (expensive) checking and prevents stopping earlier. Idea: partition the space into blocks and derive a tighter upper bound for each partition; be unwilling to check a candidate until we are quite sure about its prospect (bscore).

29 Block Pipeline Algorithm: executing a CN. Assume idf_n > idf_m and k = 1. CN: P^Q ⋈ C^Q. Figure: the P and C axes are each partitioned into blocks by signature, (n:1, m:0) and (n:0, m:1); the table lists uscore, bscore and score_a per block; processing ends at "stop!". Goals: 1) Re-establish the early stopping criterion; 2) Check candidates in an optimal order.

30 Efficiency. DBLP: ~0.9M tuples in total; k = 10; PC: 1.8 GHz CPU, 512 MB RAM. [Chart: time (ms) of Sparse, GP, SS and BP on queries DQ1 to DQ18]

31 Efficiency. DBLP, DQ. [Chart comparing Sparse, GP, SS and BP]

32 Conclusions. A system that can perform effective and efficient keyword search on relational databases: meaningful query results with appropriate rankings; response times on the order of seconds for a ~10M-tuple database (IMDb data) on a commodity PC.

33 Q&A. Thank you.

34 Backup Slides. BANKS demo: -shashank//servlet/searchform

SPARK2: Top-k Keyword Query in Relational Databases

SPARK2: Top-k Keyword Query in Relational Databases TKDE SPECIAL IUE: KEYWORD SEARCH ON STRUCTURED DATA, 20 SPARK2: Top-k Keyword Query in Relational Databases Yi Luo, Wei Wang, Member, IEEE, Xuemin Lin, Xiaofang Zhou, Senior Member, IEEE Jianmin Wang,

More information

Keyword Search in Databases

Keyword Search in Databases Keyword Search in Databases Wei Wang University of New South Wales, Australia Outline Based on the tutorial given at APWeb 2006 Introduction IR Preliminaries Systems Open Issues Dr. Wei Wang @ CSE, UNSW

More information

Implementation of Skyline Sweeping Algorithm

Implementation of Skyline Sweeping Algorithm Implementation of Skyline Sweeping Algorithm BETHINEEDI VEERENDRA M.TECH (CSE) K.I.T.S. DIVILI Mail id:veeru506@gmail.com B.VENKATESWARA REDDY Assistant Professor K.I.T.S. DIVILI Mail id: bvr001@gmail.com

More information

Extending Keyword Search to Metadata in Relational Database

Extending Keyword Search to Metadata in Relational Database DEWS2008 C6-1 Extending Keyword Search to Metadata in Relational Database Jiajun GU Hiroyuki KITAGAWA Graduate School of Systems and Information Engineering Center for Computational Sciences University

More information

Effective Top-k Keyword Search in Relational Databases Considering Query Semantics

Effective Top-k Keyword Search in Relational Databases Considering Query Semantics Effective Top-k Keyword Search in Relational Databases Considering Query Semantics Yanwei Xu 1,2, Yoshiharu Ishikawa 1, and Jihong Guan 2 1 Graduate School of Information Science, Nagoya University, Japan

More information

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan Keyword search in relational databases By SO Tsz Yan Amanda & HON Ka Lam Ethan 1 Introduction Ubiquitous relational databases Need to know SQL and database structure Hard to define an object 2 Query representation

More information

Keyword Search in Databases

Keyword Search in Databases + Databases and Information Retrieval Integration TIETS42 Keyword Search in Databases Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

More information

Information Retrieval Using Keyword Search Technique

Information Retrieval Using Keyword Search Technique Information Retrieval Using Keyword Search Technique Dhananjay A. Gholap, Dr.Gumaste S. V Department of Computer Engineering, Sharadchandra Pawar College of Engineering, Dumbarwadi, Otur, Pune, India ABSTRACT:

More information

PAPER SRT-Rank: Ranking Keyword Query Results in Relational Databases Using the Strongly Related Tree

PAPER SRT-Rank: Ranking Keyword Query Results in Relational Databases Using the Strongly Related Tree 2398 PAPER SRT-Rank: Ranking Keyword Query Results in Relational Databases Using the Strongly Related Tree In-Joong KIM, Student Member, Kyu-Young WHANG a), and Hyuk-Yoon KWON, Nonmembers SUMMARY A top-k

More information

Evaluation of Keyword Search System with Ranking

Evaluation of Keyword Search System with Ranking Evaluation of Keyword Search System with Ranking P.Saranya, Dr.S.Babu UG Scholar, Department of CSE, Final Year, IFET College of Engineering, Villupuram, Tamil nadu, India Associate Professor, Department

More information

MAINTAIN TOP-K RESULTS USING SIMILARITY CLUSTERING IN RELATIONAL DATABASE

MAINTAIN TOP-K RESULTS USING SIMILARITY CLUSTERING IN RELATIONAL DATABASE INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 MAINTAIN TOP-K RESULTS USING SIMILARITY CLUSTERING IN RELATIONAL DATABASE Syamily K.R 1, Belfin R.V 2 1 PG student,

More information

A System for Query-Specific Document Summarization

A System for Query-Specific Document Summarization A System for Query-Specific Document Summarization Ramakrishna Varadarajan, Vagelis Hristidis. FLORIDA INTERNATIONAL UNIVERSITY, School of Computing and Information Sciences, Miami. Roadmap Need for query-specific

More information

Keyword query interpretation over structured data

Keyword query interpretation over structured data Keyword query interpretation over structured data Advanced Methods of IR Elena Demidova Materials used in the slides: Jeffrey Xu Yu, Lu Qin, Lijun Chang. Keyword Search in Databases. Synthesis Lectures

More information

Relational Keyword Search System

Relational Keyword Search System Relational Keyword Search System Pradeep M. Ghige #1, Prof. Ruhi R. Kabra *2 # Student, Department Of Computer Engineering, University of Pune, GHRCOEM, Ahmednagar, Maharashtra, India. * Asst. Professor,

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Information Retrieval

Information Retrieval Information Retrieval WS 2016 / 2017 Lecture 2, Tuesday October 25 th, 2016 (Ranking, Evaluation) Prof. Dr. Hannah Bast Chair of Algorithms and Data Structures Department of Computer Science University

More information

Rank-aware XML Data Model and Algebra: Towards Unifying Exact Match and Similar Match in XML

Rank-aware XML Data Model and Algebra: Towards Unifying Exact Match and Similar Match in XML Proceedings of the 7th WSEAS International Conference on Multimedia, Internet & Video Technologies, Beijing, China, September 15-17, 2007 253 Rank-aware XML Data Model and Algebra: Towards Unifying Exact

More information

Keyword query interpretation over structured data

Keyword query interpretation over structured data Keyword query interpretation over structured data Advanced Methods of Information Retrieval Elena Demidova SS 2018 Elena Demidova: Advanced Methods of Information Retrieval SS 2018 1 Recap Elena Demidova:

More information

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval 1 Naïve Implementation Convert all documents in collection D to tf-idf weighted vectors, d j, for keyword vocabulary V. Convert

More information

Query Segmentation Using Conditional Random Fields

Query Segmentation Using Conditional Random Fields Query Segmentation Using Conditional Random Fields Xiaohui Yu and Huxia Shi York University Toronto, ON, Canada, M3J 1P3 xhyu@yorku.ca,huxiashi@cse.yorku.ca ABSTRACT A growing mount of available text data

More information

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 2.114

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 2.114 [Saranya, 4(3): March, 2015] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A SURVEY ON KEYWORD QUERY ROUTING IN DATABASES N.Saranya*, R.Rajeshkumar, S.Saranya

More information

Keyword Search over RDF Graphs. Elisa Menendez

Keyword Search over RDF Graphs. Elisa Menendez Elisa Menendez emenendez@inf.puc-rio.br Summary Motivation Keyword Search over RDF Process Challenges Example QUIOW System Next Steps Motivation Motivation Keyword search is an easy way to retrieve information

More information

EFFICIENT AND EFFECTIVE AGGREGATE KEYWORD SEARCH ON RELATIONAL DATABASES

EFFICIENT AND EFFECTIVE AGGREGATE KEYWORD SEARCH ON RELATIONAL DATABASES EFFICIENT AND EFFECTIVE AGGREGATE KEYWORD SEARCH ON RELATIONAL DATABASES by Luping Li B.Eng., Renmin University, 2009 a Thesis submitted in partial fulfillment of the requirements for the degree of MASTER

More information

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach +

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Abdullah Al-Hamdani, Gultekin Ozsoyoglu Electrical Engineering and Computer Science Dept, Case Western Reserve University,

More information

Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar

Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar School of Computer Science Faculty of Science University of Windsor How Big is this Big Data? 40 Billion Instagram Photos 300 Hours

More information

Volume 2, Issue 11, November 2014 International Journal of Advance Research in Computer Science and Management Studies

Volume 2, Issue 11, November 2014 International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 11, November 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com

More information

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System Scientific Journal of Impact Factor(SJIF): 3.134 International Journal of Advance Engineering and Research Development Volume 2,Issue 7, July -2015 Performance Enhancement of Search System Ms. Uma P Nalawade

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

Keyword search in databases: the power of RDBMS

Keyword search in databases: the power of RDBMS Keyword search in databases: the power of RDBMS 1 Introduc

More information

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15 Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating

More information

Database System Concepts

Database System Concepts Chapter 14: Optimization Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2007/2008 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth and Sudarshan.

More information

A Keyword-based Structured Query Language

A Keyword-based Structured Query Language Expressive and Flexible Access to Web-Extracted Data : A Keyword-based Structured Query Language Department of Computer Science and Engineering Indian Institute of Technology Delhi 22th September 2011

More information

Interactive keyword-based access to large-scale structured datasets

Interactive keyword-based access to large-scale structured datasets Interactive keyword-based access to large-scale structured datasets 2 nd Keystone Summer School 20 July 2016 Dr. Elena Demidova University of Southampton 1 Overview Keyword-based access to structured data

More information

Department of Computer Engineering, Sharadchandra Pawar College of Engineering, Dumbarwadi, Otur, Pune, Maharashtra, India

Department of Computer Engineering, Sharadchandra Pawar College of Engineering, Dumbarwadi, Otur, Pune, Maharashtra, India Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Information Retrieval

More information

Keywords Machine learning, Pattern matching, Query processing, NLP

Keywords Machine learning, Pattern matching, Query processing, NLP Volume 7, Issue 3, March 2017 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Ratatta: Chatbot

More information

Refinement of keyword queries over structured data with ontologies and users

Refinement of keyword queries over structured data with ontologies and users Refinement of keyword queries over structured data with ontologies and users Advanced Methods of IR Elena Demidova SS 2014 Materials used in the slides: Sandeep Tata and Guy M. Lohman. SQAK: doing more

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1 Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1 Chapter 25 Distributed Databases and Client-Server Architectures Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 25 Outline

More information

Relational Query Optimization

Relational Query Optimization Relational Query Optimization Module 4, Lectures 3 and 4 Database Management Systems, R. Ramakrishnan 1 Overview of Query Optimization Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 15-16: Basics of Data Storage and Indexes (Ch. 8.3-4, 14.1-1.7, & skim 14.2-3) 1 Announcements Midterm on Monday, November 6th, in class Allow 1 page of notes (both sides,

More information

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia WWW 2010

More information

Chapter 13: Query Optimization

Chapter 13: Query Optimization Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Transformation of Relational Expressions Catalog

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

Outline. Eg. 1: DBLP. Motivation. Eg. 2: ACM DL Portal. Eg. 2: DBLP. Digital Libraries (DL) often have many errors that negatively affect:

Outline. Eg. 1: DBLP. Motivation. Eg. 2: ACM DL Portal. Eg. 2: DBLP. Digital Libraries (DL) often have many errors that negatively affect: Outline Effective and Scalable Solutions for Mixed and Split Citation Problems in Digital Libraries Dongwon Lee, Byung-Won On Penn State University, USA Jaewoo Kang North Carolina State University, USA

More information

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML Kikori-KS An Effective and Efficient Keyword Search System for Digital Libraries in XML Toshiyuki Shimizu 1, Norimasa Terada 2, and Masatoshi Yoshikawa 1 1 Graduate School of Informatics, Kyoto University

More information

CMSC 424 Database design Lecture 18 Query optimization. Mihai Pop

CMSC 424 Database design Lecture 18 Query optimization. Mihai Pop CMSC 424 Database design Lecture 18 Query optimization Mihai Pop More midterm solutions Projects do not be late! Admin Introduction Alternative ways of evaluating a given query Equivalent expressions Different

More information

Optimization of Queries in Distributed Database Management System

Optimization of Queries in Distributed Database Management System Optimization of Queries in Distributed Database Management System Bhagvant Institute of Technology, Muzaffarnagar Abstract The query optimizer is widely considered to be the most important component of

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

An Empirical Performance Evaluation of Relational Keyword Search Systems

An Empirical Performance Evaluation of Relational Keyword Search Systems An Empirical Performance Evaluation of Relational Keyword Search Systems University of Virginia Department of Computer Science Technical Report CS-2011-07 Joel Coffman, Alfred C. Weaver Department of Computer

More information

Processing Recommender Top-N Queries in Relational Databases

Processing Recommender Top-N Queries in Relational Databases Processing Recommender Top-N Queries in Relational Databases Liang Zhu1*, Quanlong Lei1, Guang Liu2, Feifei Liu1 1 Key Lab of Machine Learning and Computational Intelligence, School of Mathematics and

More information

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: Relational Query Optimization R & G Chapter 13 Review Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory

More information

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML)

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML) Since in the result relation each group is represented by exactly one tuple, in the select clause only aggregate functions can appear, or attributes that are used for grouping, i.e., that are also used

More information

Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs

Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs Authors: Andreas Wagner, Veli Bicer, Thanh Tran, and Rudi Studer Presenter: Freddy Lecue IBM Research Ireland 2014 International

More information

Semantic Optimization of Preference Queries

Semantic Optimization of Preference Queries Semantic Optimization of Preference Queries Jan Chomicki University at Buffalo http://www.cse.buffalo.edu/ chomicki 1 Querying with Preferences Find the best answers to a query, instead of all the answers.

More information

Information Retrieval Overview

Information Retrieval Overview Roadmap Information Retrieval Overview Vagelis Hristidis School of Computer Science Florida International University COP 6727 What is IR? Matching Models Evaluation of Results Digital Libraries vs. IR

More information

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016 + Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

More information

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

An Overview of various methodologies used in Data set Preparation for Data mining Analysis An Overview of various methodologies used in Data set Preparation for Data mining Analysis Arun P Kuttappan 1, P Saranya 2 1 M. E Student, Dept. of Computer Science and Engineering, Gnanamani College of

More information

CGS 3066: Spring 2017 SQL Reference

CGS 3066: Spring 2017 SQL Reference CGS 3066: Spring 2017 SQL Reference Can also be used as a study guide. Only covers topics discussed in class. This is by no means a complete guide to SQL. Database accounts are being set up for all students

More information

Efficient Subgraph Matching by Postponing Cartesian Products

Efficient Subgraph Matching by Postponing Cartesian Products Efficient Subgraph Matching by Postponing Cartesian Products Computer Science and Engineering Lijun Chang Lijun.Chang@unsw.edu.au The University of New South Wales, Australia Joint work with Fei Bi, Xuemin

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week. Database Systems ( 料 ) December 13/14, 2006 Lecture #10 1 Announcement Assignment #4 is due next week. 2 1 Overview of Query Evaluation Chapter 12 3 Outline Query evaluation (Overview) Relational Operator

More information

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005.

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005. Improving Query Plans CS157B Chris Pollett Mar. 21, 2005. Outline Parse Trees and Grammars Algebraic Laws for Improving Query Plans From Parse Trees To Logical Query Plans Syntax Analysis and Parse Trees

More information

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example Student Introduction to Database Systems CSE 414 Hash table example Index Student_ID on Student.ID Data File Student 10 Tom Hanks 10 20 20 Amy Hanks ID fname lname 10 Tom Hanks 20 Amy Hanks Lecture 26:

More information

PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 INTRODUCTION In centralized database: Data is located in one place (one server) All DBMS functionalities are done by that server

More information

Open Data Integration. Renée J. Miller

Open Data Integration. Renée J. Miller Open Data Integration Renée J. Miller miller@northeastern.edu !2 Open Data Principles Timely & Comprehensive Accessible and Usable Complete - All public data is made available. Public data is data that

More information

Multi-dimensional Skyline to find shopping malls. Md Amir Amiruzzaman Suphanut Parn Jamonnak Zhengyong Ren

Multi-dimensional Skyline to find shopping malls. Md Amir Amiruzzaman Suphanut Parn Jamonnak Zhengyong Ren Multi-dimensional Skyline to find shopping malls Md Amir Amiruzzaman Suphanut Parn Jamonnak Zhengyong Ren Introduction In market research predicting customer movement is very important. While customers

More information

CS377: Database Systems Text data and information. Li Xiong Department of Mathematics and Computer Science Emory University

CS377: Database Systems Text data and information. Li Xiong Department of Mathematics and Computer Science Emory University CS377: Database Systems Text data and information retrieval Li Xiong Department of Mathematics and Computer Science Emory University Outline Information Retrieval (IR) Concepts Text Preprocessing Inverted

More information

Overview of Implementing Relational Operators and Query Evaluation

Overview of Implementing Relational Operators and Query Evaluation Overview of Implementing Relational Operators and Query Evaluation Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders

More information

This lecture: IIR Sections Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring

This lecture: IIR Sections Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring This lecture: IIR Sections 6.2 6.4.3 Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring 1 Ch. 6 Ranked retrieval Thus far, our queries have all

More information

Outline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs.

Outline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs. Parallel Database Systems STAVROS HARIZOPOULOS stavros@cs.cmu.edu Outline Background Hardware architectures and performance metrics Parallel database techniques Gamma Bonus: NCR / Teradata Conclusions


Diversification of Query Interpretations and Search Results

Diversification of Query Interpretations and Search Results Diversification of Query Interpretations and Search Results Advanced Methods of IR Elena Demidova Materials used in the slides: Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova,


Introduction to Information Retrieval

Introduction to Information Retrieval Introduction Inverted index Processing Boolean queries Course overview Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural


CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs


Virtual views. Incremental View Maintenance. View maintenance. Materialized views. Review of bag algebra. Bag algebra operators (slide 1)

Virtual views. Incremental View Maintenance. View maintenance. Materialized views. Review of bag algebra. Bag algebra operators (slide 1) Virtual views Incremental View Maintenance CPS 296.1 Topics in Database Systems A view is defined by a query over base tables Example: CREATE VIEW V AS SELECT FROM R, S WHERE ; A view can be queried just


Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs Introduction to Database Systems CSE 414 Lecture 26: More Indexes and Operator Costs CSE 414 - Spring 2018 1 Student ID fname lname Hash table example 10 Tom Hanks Index Student_ID on Student.ID Data File


Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval (Supplementary Material) Zhou Shuigeng March 23, 2007 Advanced Distributed Computing 1 Text Databases and IR Text databases (document databases) Large collections


Mobile and Heterogeneous databases Distributed Database System Query Processing. A.R. Hurson Computer Science Missouri Science & Technology

Mobile and Heterogeneous databases Distributed Database System Query Processing. A.R. Hurson Computer Science Missouri Science & Technology Mobile and Heterogeneous databases Distributed Database System Query Processing A.R. Hurson Computer Science Missouri Science & Technology 1 Note, this unit will be covered in four lectures. In case you


Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs


Overview of Query Evaluation. Chapter 12

Overview of Query Evaluation. Chapter 12 Overview of Query Evaluation Chapter 12 1 Outline Query Optimization Overview Algorithm for Relational Operations 2 Overview of Query Evaluation DBMS keeps descriptive data in system catalogs. SQL queries


Effective Keyword Search in Relational Databases

Effective Keyword Search in Relational Databases Effective Keyword Search in Relational Databases Fang Liu, Clement Yu Computer Science Department University of Illinois at Chicago {fliu1,yu}@cs.uic.edu Weiyi Meng Computer Science Department Binghamton


Welcome to the topic of SAP HANA modeling views.

Welcome to the topic of SAP HANA modeling views. Welcome to the topic of SAP HANA modeling views. 1 At the end of this topic, you will be able to describe the three types of SAP HANA modeling views and use the SAP HANA Studio to work with views in the


Top-k Keyword Search Over Graphs Based On Backward Search

Top-k Keyword Search Over Graphs Based On Backward Search Top-k Keyword Search Over Graphs Based On Backward Search Jia-Hui Zeng, Jiu-Ming Huang, Shu-Qiang Yang College of Computer, National University of Defense Technology, Changsha, China College of Computer


Clustering Analysis for Malicious Network Traffic

Clustering Analysis for Malicious Network Traffic Clustering Analysis for Malicious Network Traffic Jie Wang, Lili Yang, Jie Wu and Jemal H. Abawajy School of Information Science and Engineering, Central South University, Changsha, China Email: jwang,liliyang@csu.edu.cn


Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction Text mining refers to data mining using text documents as data. Most text mining tasks use Information Retrieval (IR) methods


Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1) Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two


Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation

Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation Kaushik Chakrabarti Venkatesh Ganti Jiawei Han Dong Xin* Microsoft Research Microsoft Research University of Illinois University


James Mayfield, The Johns Hopkins University Applied Physics Laboratory, The Human Language Technology Center of Excellence

James Mayfield, The Johns Hopkins University Applied Physics Laboratory, The Human Language Technology Center of Excellence. James Mayfield, The Johns Hopkins University Applied Physics Laboratory, The Human Language Technology Center of Excellence (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation


Introduction to Database Systems. Motivation. Werner Nutt

Introduction to Database Systems. Motivation. Werner Nutt Introduction to Database Systems Motivation Werner Nutt 1 Databases Are Everywhere Database = a large (?) collection of related data Classically, a DB models a real-world organisation (e.g., enterprise,


Query Optimization in Distributed Databases. Dilşat ABDULLAH

Query Optimization in Distributed Databases. Dilşat ABDULLAH Query Optimization in Distributed Databases Dilşat ABDULLAH 1302108 Department of Computer Engineering Middle East Technical University December 2003 ABSTRACT Query optimization refers to the process of


What's a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

What's a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence What's a database system? Review of Basic Database Concepts CPS 296.1 Topics in Database Systems According to Oxford Dictionary Database: an organized body of related information Database system, DataBase


XML RETRIEVAL. Introduction to Information Retrieval CS 150 Donald J. Patterson

XML RETRIEVAL. Introduction to Information Retrieval CS 150 Donald J. Patterson Introduction to Information Retrieval CS 150 Donald J. Patterson Content adapted from Manning, Raghavan, and Schütze http://www.informationretrieval.org OVERVIEW Introduction Basic XML Concepts Challenges


Incremental Keyword Search in Relational Databases

Incremental Keyword Search in Relational Databases Dipartimento di Informatica e Automazione Via della Vasca Navale, 79 00146 Roma, Italy Incremental Keyword Search in Relational Databases Roberto De Virgilio, Antonio Maccioni, Riccardo Torlone RT-DIA-204-2013


Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa ICS 624 Spring 2011 Overview of DB & IR Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 1/12/2011 Lipyeow Lim -- University of Hawaii at Manoa 1 Example


Databases & Information Retrieval

Databases & Information Retrieval Databases & Information Retrieval Maya Ramanath (Further Reading: Combining Database and Information-Retrieval Techniques for Knowledge Discovery. G. Weikum, G. Kasneci, M. Ramanath and F.M. Suchanek,


Towards Efficient and Effective Semantic Table Interpretation Ziqi Zhang

Towards Efficient and Effective Semantic Table Interpretation Ziqi Zhang Towards Efficient and Effective Semantic Table Interpretation Ziqi Zhang Department of Computer Science, University of Sheffield Outline Define semantic table interpretation State-of-the-art and motivation


CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representation of the data Data Retrieval How to ask questions of the database How to answer those questions


Data about data is database Select correct option: True False Partially True None of the Above

Data about data is database Select correct option: True False Partially True None of the Above Within a table, each primary key value. is a minimal super key is always the first field in each table must be numeric must be unique Foreign Key is A field in a table that matches a key field in another


Query Relaxation Using Malleable Schemas. Dipl.-Inf.(FH) Michael Knoppik

Query Relaxation Using Malleable Schemas. Dipl.-Inf.(FH) Michael Knoppik Query Relaxation Using Malleable Schemas Dipl.-Inf.(FH) Michael Knoppik Table Of Contents 1.Introduction 2.Limitations 3.Query Relaxation 4.Implementation Issues 5.Experiments 6.Conclusion Slide 2 1.Introduction


Effective Semantic Search over Huge RDF Data

Effective Semantic Search over Huge RDF Data Effective Semantic Search over Huge RDF Data 1 Dinesh A. Zende, 2 Chavan Ganesh Baban 1 Assistant Professor, 2 Post Graduate Student Vidya Pratisthan's Kamanayan Bajaj Institute of Engineering & Technology,


Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten


1. Data Model, Categories, Schemas and Instances. Outline

1. Data Model, Categories, Schemas and Instances. Outline Chapter 2: Database System Concepts and Architecture Outline Ramez Elmasri, Shamkant B. Navathe(2016) Fundamentals of Database Systems (7th Edition),pearson, isbn 10: 0-13-397077-9;isbn-13:978-0-13-397077-7.
