Reformulating Aggregate Queries using Views. Abhijeet Mohapatra, Michael R. Genesereth

Similar documents
Reformulating Aggregate Queries Using Views

Dexter respects users privacy by storing users local data and evaluating queries client sided inside their browsers.

Data integration lecture 3

Chapter 6 - Part II The Relational Algebra and Calculus

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

Query Rewriting Using Views in the Presence of Inclusion Dependencies

Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems

Virtual Data Integration

DATABASE THEORY. Lecture 12: Evaluation of Datalog (2) TU Dresden, 30 June Markus Krötzsch

Data integration supports seamless access to autonomous, heterogeneous information

DATABASE THEORY. Lecture 15: Datalog Evaluation (2) TU Dresden, 26th June Markus Krötzsch Knowledge-Based Systems

Access Patterns (Extended Version) Chen Li. Department of Computer Science, Stanford University, CA Abstract

Data Integration: A Theoretical Perspective

Data integration lecture 2

Relational Model, Relational Algebra, and SQL

With Great Power... Inverses of Power Functions. Lesson 9.1 Assignment. 1. Consider the power function, f(x) 5 x 7. a. Complete the table for f(x).

Spring 2018 CSE191 Midterm Exam I

SQL (Structured Query Language)

Database Learning: Toward a Database that Becomes Smarter Over Time

Logic Programs for Consistently Querying Data Integration Systems

Overview of Prometheus Mediator. Snehal Thakkar CSCI 548 Information Integration On the Web

Introduction Data Integration Summary. Data Integration. COCS 6421 Advanced Database Systems. Przemyslaw Pawluk. CSE, York University.

CHAPTER 7. Copyright Cengage Learning. All rights reserved.

Introduction to Prometheus Mediator

Integer Arithmetic. CS 347 Lecture 03. January 20, 1998

Database Systems CSE 414

3. Relational Data Model 3.5 The Tuple Relational Calculus

Data Integration 1. Giuseppe De Giacomo. Dipartimento di Informatica e Sistemistica Antonio Ruberti Università di Roma La Sapienza

CSE-6490B Final Exam

Complexity of Answering Queries Using Materialized Views

Range Queries. Kuba Karpierz, Bruno Vacherot. March 4, 2016

Foundations of Schema Mapping Management

CSE 344 Final Review. August 16 th

CSE Midterm - Spring 2017 Solutions

Query Containment in the Presence of Limited Access Patterns. Abstract

CS 347 Parallel and Distributed Data Processing

OPTIMIZING RECURSIVE INFORMATION GATHERING PLANS. Eric M. Lambrecht

Computing Query Answers with Consistent Support

Alon Levy University of Washington

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

CSE414 Midterm Exam Spring 2018

Graph Data Management

VALUE RECONCILIATION IN MEDIATORS OF HETEROGENEOUS INFORMATION COLLECTIONS APPLYING WELL-STRUCTURED CONTEXT SPECIFICATIONS

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

Relational Algebra and SQL

(i.e., produced only a subset of the possible answers). We describe the novel class

CS1110. Lecture 6: Function calls

Outline. q Database integration & querying. q Peer-to-Peer data management q Stream data management q MapReduce-based distributed data management

Apache HAWQ (incubating)

Joins, NULL, and Aggregation

INTRODUCTION TO PROLOG

Supplemental 1.5. Objectives Interval Notation Increasing & Decreasing Functions Average Rate of Change Difference Quotient

Data Blocks: Hybrid OLTP and OLAP on compressed storage

Relational Algebra. B term 2004: lecture 10, 11

Announcements. Database Systems CSE 414. Datalog. What is Datalog? Why Do We Learn Datalog? HW2 is due today 11pm. WQ2 is due tomorrow 11pm

Functions. Def. Let A and B be sets. A function f from A to B is an assignment of exactly one element of B to each element of A.

Motivation and basic concepts Storage Principle Query Principle Index Principle Implementation and Results Conclusion

Robust and Efficient Fuzzy Match for Online Data Cleaning. Motivation. Methodology

CS 232A: Database System Principles. Introduction. Introduction. Applications View of a Relational Database Management (RDBMS) System

I. Khalil Ibrahim, V. Dignum, W. Winiwarter, E. Weippl, Logic Based Approach to Semantic Query Transformation for Knowledge Management Applications,

Incrementally Maintaining Run-length Encoded Attributes in Column Stores. Abhijeet Mohapatra, Michael Genesereth

Simple SQL Queries (2)

CSE 190D Spring 2017 Final Exam Answers

A Comprehensive Semantic Framework for Data Integration Systems

The Inverse of a Schema Mapping

5.2 E xtended Operators of R elational Algebra

Introduction to Data Management CSE 344. Lecture 14: Datalog

Figure 5.1: Query processing in a data integration system. This chapter focuses on the reformulation step, highlighted in the dashed box.

Computer Science 336 Fall 2017 Homework 2

Consistent Answers from Integrated Data Sources

CSE 344 Midterm Exam

. Relation and attribute names: The relation and attribute names in the mediated. Describing Data Sources. 3.1 Overview and Desiderata

Processing Regular Path Queries Using Views or What Do We Need for Integrating Semistructured Data?

Answering Queries Using Views: A Survey. Alon Y. Halevy. Stanislav Angelov, CIS650 Data Sharing and the Web U. of Pennsylvania

Update Exchange with Mappings and Provenance

Rec 8 Notes. 1 Divide-and-Conquer Convex Hull. 1.1 Algorithm Idea. 1.2 Algorithm. Here s another random Divide-And-Conquer algorithm.

Calculus Chapter 1 Limits. Section 1.2 Limits

Database Systems CSE 414. Lecture 9-10: Datalog (Ch )

Data Integration. Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart

I Relational Database Modeling how to define

Improving the Performance of OLAP Queries Using Families of Statistics Trees

Leveraging the Expressivity of Grounded Conjunctive Query Languages

The Extreme Value Theorem (IVT)

Midterm 1. CS Intermediate Data Structures and Algorithms. October 23, 2013

Section 1.6. Inverse Functions

How to Deploy Enterprise Analytics Applications With SAP BW and SAP HANA

Maurizio Lenzerini. Dipartimento di Ingegneria Informatica Automatica e Gestionale Antonio Ruberti

Virtual views. Incremental View Maintenance. View maintenance. Materialized views. Review of bag algebra. Bag algebra operators (slide 1)

Optimizing Recursive Information Gathering Plans

Chapter 5 Relational Algebra. Nguyen Thi Ai Thao

Structural Characterizations of Schema-Mapping Languages

Functions. How is this definition written in symbolic logic notation?

LOGIC AND DISCRETE MATHEMATICS

Functions 2/1/2017. Exercises. Exercises. Exercises. and the following mathematical appetizer is about. Functions. Functions

Theorem proving. PVS theorem prover. Hoare style verification PVS. More on embeddings. What if. Abhik Roychoudhury CS 6214

RELATIONAL DATA MODEL

Data Integration Services

Parallel SQL and Streaming Expressions in Apache Solr 6. Shalin Shekhar Lucidworks Inc.

XML publishing. Querying and storing XML. From relations to XML Views. From relations to XML Views

Resource Sharing in Continuous Sliding-Window Aggregates

Transcription:

Reformulating Aggregate Queries using Views Abhijeet Mohapatra, Michael R. Genesereth

LAV Integration Aggregate Queries Reformulation Big Picture What is the average rating of Man of Steel? Common Schema IMDB Movie Rating Man of Steel 8.1 Twilight 5.4 Netflix Movie Views Man of Steel 100 Twilight (ratings out of 10.0) 10 Movie Rating Reviews Man of Steel 4.3 200 Twilight 1.0 550 (ratings out of 5.0)

Global as View (GAV) Local as View (LAV) Master Schema m(x, Y, Z) m(x, Y, Z) Master Schema s1(x, Y) s2(x, Y) s1(x, Y) s2(x, Y) Sources Master Schema mapped to Sources m(x, Y, s1) :- s1(x, Y) m(x, Y, s2) :- s2(x, Y) Hard to add sources Easy to query Sources Sources mapped to Master Schema s1(x, Y) :- m(x, Y, s1) s2(x, Z) :- m(x, Y, s2) Easy to add sources Hard to query

Query Answering in LAV Answering a Query using Views Query m(x, Y, Z) Master Schema Sources are views over Master Schema predicates s1(x, Y, s1) :- m(x, Y, Z) s1(x, Y) Source

Query Answering in LAV Answering a Query using Views Query m(x, Y, Z) Query Master Schema Master Schema s1(x, Y) Source Source Inverse Method (Duschka and Genesereth 97)

Inverse Method [Duschka and Genesereth 97] Sources are mapped to Master Schema s1(x, Y) :- m(x, Y, s1) a b s1(x, Y) b c Inverse Rules m(x, Y, s1) :- s1(x, Y) m(x, Y, Z) a b s1 b c s1 We can reformulate m(x, Y, Z) using the source s1

Inverse Method [Duschka and Genesereth 97] Equivalent reformulations cannot always be generated!! s1(x, Y) :- m(x, Y, Z) a b s1(x, Y) b c Inverse Rules m(x, Y, f(x, Y)) :- s1(x, Y) m(x, Y, Z) a b f(a, b) b c f(b, c) Generates equivalent reformulations when they exist

Leverage Inverse Method to reformulate non-aggregate queries using views What if the query and / or views contain aggregates? Finding equivalent reformulations is undecidable!

Prior Work Small fraction of prior work on query reformulation addresses aggregate queries (references in paper) Reformulations are considered in restricted settings: Same aggregates in the query and the views Built-in aggregates only (min, max, sum and count) Central Rewritings [Afrati 01]

Representation Extend Datalog using sets and tuples Generate sets using setof operator t(x, W) :- setof(y, m(x, Y), W) For every X, the set W = {Y m(x, Y)} m(x, Y) a 1 a 2 b 1 t(x, W) a {1, 2} b {1}

Representation Aggregates are predicates over sets sum({ }, 0) sum({x Y}, S) :- sum(y, T), S = T + X avg(w, A) :- sum(w, S), count(w, C), A = S / C Modular definition t(x, W) :- setof(y, m(x, Y), W) s1(x, S) :- t(x, W), sum(w, S) m(x, Y) t(x, W) t(x, W) s1(x, S) a 1 a {1, 2} a {1, 2} a 3 a 2 b {1} b {1} b 1 b 1

Inverting aggregate views s1(x, S) :- setof(y, m(x, Y), W), sum(w, S) Rewrite using an aggregate view :- auxiliary view t(x, W) auxiliary view, aggregate predicate s1(x, S) :- t(x, W), sum(w, S) t(x, W) :- setof(y, m(x, Y), W) Inverting aggregate views Inverting views with sets

Inverting views with sets t(x, W) :- setof(y, m(x, Y), W) Inverse Rule (t - 1 ) m(x, Y) :- t(x, W), Y W t(x, S) a {1, 2} b {1} m(x, Y) a 1 a 2 b 1

Inverting aggregate views s1(x, S) :- setof(y, m(x, Y), W), sum(w, S) Rewrite using auxiliary views s1(x, S) :- t(x, W), sum(w, S) t(x, W) :- setof(y, m(x, Y), W) Invert modified view definition m(x, Y) :- t(x, W), Y W t(x, f(x, S)) :- s1(x, S) sum(f(x, S), S) :- s1(x, S)

Algorithm Invertagg Input : Query q and set of views V Expand the non-recursive aggregates in q as q Rewrite the views V using auxiliary views Invert the modified view definitions (V -1 ) Generate functional dependencies Γ Output : Query plan {q } V -1 Γ

Example movie(m, R) score(m, S) views(m, C) Query q(m, A) :- setof(r, movie(m, R), W), avg(w, A) Views score(m, S) :- setof(r, movie(m, R), W), sum(w, S) views(m, C) :- setof(r, movie(m, R), W), count(w, C)

Modified Query q (M, A) :- setof(y, movie(m, R), W), sum(w, S) count(w, C), A = S / C Inverse Rules (V - 1 ) movie(m, R) :- t(m, W), R W t(m, f(m, S)) :- score(m, S) t(m, g(m, C)) :- views(m, C) sum(f(m,s), S) :- score(m, S) count(g(m,c), C) :- views(m, C) Γ: t(m, X) & t(m, Y) X = Y

LAV Integration Aggregate Queries score(m, S) Man of Steel 16600 Twilight 2550 Reformulation Big Picture views(m, S) Man of Steel 2000 Twilight 100 What is the average rating of Man of Steel? q( Man of Steel, A) t( Man of Steel, W), sum( Man of Steel, S), count( Man of Steel, C), A = S / C Man of Steel Man of Steel t(m, W) f( Man of Steel, 16600) g( Man of Steel, 2000) sum(w, S) f( Man of Steel, 16600) 16600 q( Man of Steel, 8.3) count(w, C) g( Man of Steel, 2000) 2000

Properties of Invertagg If a query can be equivalently reformulated using a supplied set of views, Invertagg computes such a reformulation. Size of the query plan is linear in the size of the input.

LAV Integration Aggregate Queries Reformulation Big Picture optimize Our technique compute using available resources Examples: 1. BOOM Project (UCB) pathcount(x,y, count<c>) :- path(x, Z, C) stableroute(x,y) :- pathcount(x,y, Z), router(y), Z > 10 2. A deductive approach to AI planning [Brogi 03] addj(cond) :- firedj(α), postcond(α, Cond, pos)

Thank you!