Answering Queries Using Views: A Survey. Alon Y. Halevy. Stanislav Angelov, CIS650 Data Sharing and the Web U. of Pennsylvania

Similar documents
MiniCon: A scalable algorithm for answering queries using views

Scalable Query Rewriting: A Graph-Based Approach

Query Rewriting Using Views in the Presence of Inclusion Dependencies

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

A Scalable Algorithm for Answering Queries Using Views

Finding Equivalent Rewritings in the Presence of Arithmetic Comparisons

Keyword Search on Form Results

Outline. q Database integration & querying. q Peer-to-Peer data management q Stream data management q MapReduce-based distributed data management

COMP718: Ontologies and Knowledge Bases

CSE-6490B Assignment #1

Philip A. Bernstein Microsoft Research One Microsoft Way Redmond, WA , USA.

Virtual views. Incremental View Maintenance. View maintenance. Materialized views. Review of bag algebra. Bag algebra operators (slide 1)

Distributed Systems

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Client-server architectures

Outline. Introduction. Introduction Definition -- Selection and Join Semantics The Cost Model Load Shedding Experiments Conclusion Discussion Points

Model Checking Dynamic Datapaths

Overview of Query Evaluation

Distributed Data Management

Data Integration Systems

Database Management Systems

Graph Databases. Guilherme Fetter Damasio. University of Ontario Institute of Technology and IBM Centre for Advanced Studies IBM Corporation

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

In this class, we addressed problem 14 from Chapter 2. So first step, we expressed the problem in STANDARD FORM:

Homework Set #2 Math 440 Topology Topology by J. Munkres

36 Modular Arithmetic

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Distributed Data Management

UpBit: Scalable In-Memory Updatable Bitmap Indexing. Guanting Chen, Yuan Zhang 2/21/2019

Lecture Notes on Binary Decision Diagrams

Datalog Evaluation. Linh Anh Nguyen. Institute of Informatics University of Warsaw

q1(name) :- studs(stno,name,_,_,_,_), grads(stno,_,cno,_,_,_),cours(cno,_,4).

15-415/615 Faloutsos 1

Principles of Data Management. Lecture #9 (Query Processing Overview)

Optimizing Recursive Information Gathering Plans

5.6 Rational Equations

Combinational Equivalence Checking Using Incremental SAT Solving, Output Ordering, and Resets

CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS

Threads (light weight processes) Chester Rebeiro IIT Madras

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

CSE-6490B Final Exam

Query Processing & Optimization

Topic I (d): Static Single Assignment Form (SSA)

ABC basics (compilation from different articles)

DATABASE THEORY. Lecture 15: Datalog Evaluation (2) TU Dresden, 26th June Markus Krötzsch Knowledge-Based Systems

Regions Review. A region is a (typed) collection. Regent: More on Regions. Regions are the cross product of. CS315B Lecture 5

Working with SQL SERVER EXPRESS

Automatic Scaling Iterative Computations. Aug. 7 th, 2012

Query Evaluation Strategies

Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery

SOURCE IDENTIFICATION AND QUERY REWRITING IN OPEN XML DATA INTEGRATION SYSTEMS

CSE Midterm - Spring 2017 Solutions

Introduction to Operating Systems

Relational Model 2: Relational Algebra

Computing Query Answers with Consistent Support

Michalis Petropoulos Alin Deutsch Yannis Papakonstantinou Athens University of Economics and Business, May 2006

Local Two-Level And-Inverter Graph Minimization without Blowup

You should provide the best solutions you can think of.

Chapter 3. The Relational Model. Database Systems p. 61/569

Database Architectures

Distributed File Systems. CS 537 Lecture 15. Distributed File Systems. Transfer Model. Naming transparency 3/27/09

Security for Structured Peer-to-peer Overlay Networks. Acknowledgement. Outline. By Miguel Castro et al. OSDI 02 Presented by Shiping Chen in IT818

Dependability tree 1

The Volcano Optimizer Generator: Extensibility and Efficient Search

Notes in Computational Geometry Voronoi Diagrams

CSE 544, Winter 2009, Final Examination 11 March 2009

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

Robot Motion Planning

An Algorithm for Answering Queries Efficiently Using Views Prasenjit Mitra Infolab, Stanford University Stanford, CA, 94305, U.S.A.

Distributed Sampling in a Big Data Management System

CS330. Query Processing

Database Architectures

Interprocedural Analysis with Data-Dependent Calls. Circularity dilemma. A solution: optimistic iterative analysis. Example

Lecture 20: Query Optimization (2)

Data Integration: Logic Query Languages

Using Views to Generate Efficient Evaluation Plans for Queries

Topics : Analysis of Software Systems. Side channel analysis. Remote Timing Attacks are Practical

Computational Biology Lecture 6: Affine gap penalty function, multiple sequence alignment Saad Mneimneh

Expressing Fault Tolerant Algorithms with MPI-2. William D. Gropp Ewing Lusk

Chapter 2 & 3: Representations & Reasoning Systems (2.2)

Access Control Lists. Don Porter CSE 506

Answering Queries Using Views

Consistency in Distributed Systems

Query Execution [15]

Update Exchange with Mappings and Provenance

Overview of Query Evaluation. Overview of Query Evaluation

I. Khalil Ibrahim, V. Dignum, W. Winiwarter, E. Weippl, Logic Based Approach to Semantic Query Transformation for Knowledge Management Applications,

Interactive Query Formulation for Systems that Rewrite Queries Using Views

PEER-TO-PEER (P2P) systems are now one of the most

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

CSC 5930/9010 Cloud S & P: Cloud Primitives

15/16 CSY2041 Quality and User-Centred Systems

Principles of Data Management. Lecture #13 (Query Optimization II)

Performance analysis, development and improvement of programs, commands and BASH scripts in GNU/Linux systems

8. Secondary and Hierarchical Access Paths

Solving Equations with Inverse Operations

PROFESSOR: Last time, we took a look at an explicit control evaluator for Lisp, and that bridged the gap between

Advanced Database Systems

International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA

Transcription:

MiniCon Answering Queries Using Views: A Survey. Alon Y. Halevy Stanislav Angelov, CIS650 Data Sharing and the Web U. of Pennsylvania Additional references: MiniCon: A Scalable algorithm for answering queries using views. R. Pottinger, A. Halevy

Inverse-rules rules Algorithm Construct set of rules that invert the view definition. Pros Simplicity and modularity Returns maximally contained rewriting even w/ arbitrary recursive Datalog programs Needs additional constant propagation to trim redundant computations Cons May invert some of the useful computation done to produce view Needs additional constant propagation to trim redundant computations

Bucket Algorithm 1) Create a bucket for each sub-goal in the query containing the views that have the same sub-goal and there is a mapping. 2) Consider conjunctive rewriting for f each element of the Cartesian product of the buckets, and check whether it is contained or can be made to be contained in the query. Pros Considers each sub-goal in isolation To some degree takes into account context to prune search space Would possibly take advantage of materialized views Cons Considers each sub-goal in isolation Considers Cartesian product of buckets It is hard to recover projected away attributes w/o additional knowledge

MiniCon Algorithm Inverse-rules algorithm (extended version) is very similar to the Bucket algorithm and performs better. Main objective of the MiniCon Algorithm is to scale better with the number of available views Key difference between MiniCon and the above algorithms is the MiniCon Descriptors computed for each goal mapping - more preprocessing to build Descriptors (scale nicely to number of goals/views) - less work on combining phase (potentially exponential).

MiniCon Algorithm Outline 1a) Begin like the Bucket Algorithm 1b) Form the MiniCon Descriptors For sub-goal g in the query Q mapped to sub-goal g in view V (bucket), look at the variables Q and consider the join predicates to find the minimal additional set of sub-goals in Q that must be mapped to sub-goal in V in order V be usable. 2) Combine MCD-s - proceed as in the bucket algorithm but consider rewritings involving only disjoin MCD-s - no need of containment check (additional speedup) for each rewriting

Example: Setting and 1 st phase q(d) :- Major(S, D), Registered(S, 444, Q), Advises(P, S) V1(dept) :- Major(student,dept), Registered(student,444,quarter) V2(prof,student,area) :- Advises(prof,student), Prof(prof,area) V3(dept,c-number) :- Major(student,dept), Registered(student,c-number,quarter), Advises(prof,student) Bucket Algorithm (phase 1): 1. Major(S,Q) 2. Registered(S,C,Q) V3(D,C) 3. Advises(P,S) V2(P,S,A )

Example: MCD Construction 1/3 q(d) :- Major(S, D), Registered(S, 444, Q), Advises(P, S) V1(dept) :- Major(student,dept), Registered(student,444,quarter) 1. Major(S,Q) 2. Registered(S,C,Q) V3(D,C) 3. Advises(P,S) V2(P,S,A ) In order V1 (q:major -> V1:Major) to be usable we need to be able to join Major with Registered and Advises on Student. Since Student is not in the head of V1, V1 should include those two joins but is include only of them. So join with Advises on S cannot be done unless additional functional dependencies exist and are known. We apply the same argument to determine that the mapping (q:registered -> V1:Registered) is not possible.

Example: MCD Construction 2/3 q(d) :- Major(S, D), Registered(S, 444, Q), Advises(P, S) V2(prof,student,area) :- Advises(prof,student), Prof(prof,area) 1. Major(S,Q) 2. Registered(S,C,Q) V3(D,C) 3. Advises(P,S) V2(P,S,A ) In order V2 (q:advises->v2:advises) to be usable we need to be able to join it with Major and Registered on Student. Since (S-> Student) is in the head of V2 we can apply the join predicates later and we don t need to do additional mappings.

Example: MCD Construction 3/3 q(d) :- Major(S, D), Registered(S, 444, Q), Advises(P, S) V3(dept,c-number) :- Major(student,dept), Registered(student,c-number,quarter), Advises(prof,student) 1. Major(S,Q) 2. Registered(S,C,Q) V3(D,C) 3. Advises(P,S) V2(P,S,A ) In order V3 (q:major->v3:major) to be usable we need to be able to join it with Registered and Advises on Student. Since S is not in the head of V3 we have to map Registered and Advises also.

Example: Combining MCD-s 1 2 3 q(d) :- Major(S, D), Registered(S, 444, Q), Advises(P, S) MCD-s V(Y) Head homomorphism* h ϕ** Sub-goals V1(dept) 1,2,3 V2 (prof,student,area) identity P profs, 3 S students V3 (dept,c-number) identity D->dept 1,2,3 c-number->444 * Can equate distinguished variables h(x)=h(h(x)) ** Partial mapping of Vars(Q) to head h(vars(v)) Now we have to consider only disjoined MCD-s when combining. V2 s and V3 s are not disjoined so we would consider only rewriting involving V3

Rules and Properties of MiniCon For a query Q sub-goal g, and a view V sub-goal g, we map g to g, with the following properties for every query variable X that is mapped to view variable A: Case I: X is head variable, A is head variable OK Case II: X is not head variable, A is head variable OK Case III: X is head variable, A is not head variable NOT OK (x need to be in the answer but a is not exported) Case IV: X is not head variable, A is not head variable??? All the query sub-goals using X must be able to be mapped to other sub-goals in V in order to be able to reconstruct the join Given a query Q, a set of views V, and the set of MCD-s C for Q over the views in V, the only combinations of MCD-s that result in non-redundant rewritings of Q are of the from C 1, C 2,, C l, where Sub-goals(Q) = Goals(C 1 ) Goals(C 2 ) For every i j, Goals(C i ) Goals(C j ) =

Experimental Results Chain queries with only few rewritings Chain queries with many rewritings

Experimental Results Unique sub-goal joins with everything. Few rewritings. Every sub-goal is joined with every other sub-goal. Prune earlier.

Experimental Results Comparison predicates don t slow MiniCon too much. Only few rewritings so overhead not large. Prune rewritings because of comparison predicates

More and Conclusions Completeness Certain answers and maximally-contained rewritings (closedworld, open world assumptions) Use of MiniCon in the context of query optimization MiniCon algorithm scales better with the number of views. Though it requires more preprocessing, it reduces the work at the more expensive rewriting phase