Perm Integrating Data Provenance Support in Database Systems
1 Perm: Integrating Data Provenance Support in Database Systems. Boris Glavic, Database Technology Group, Department of Informatics, University of Zurich; Gustavo Alonso, Systems Group, Department of Computer Science, ETH Zurich
2 Overview: 1. Introduction 2. The Perm Provenance Representation 3. Query Rewriting for Provenance Computation 4. Advanced Topics and New Stuff 5. Results 6. Conclusion
3 1. Introduction. Query: which input data item(s) contributed to which output data item(s)? Granularity: tuple, attribute value, ... Contribution semantics: influence (Why), copy (Where), ...
4 1. Introduction. Application domains: data warehousing; scientific data and curated databases; data integration / data exchange; workflow management systems; accountability; ...
5 1. Introduction. The problem of computing this type of provenance has been solved before (see e.g. [Cui, Widom ICDE 00]), but: non-relational representation of provenance data; separation of provenance and normal data; non-relational computation of provenance data; important SQL features were not addressed, e.g. nested subqueries.
6 1. Introduction. A lot of work has been done in data provenance: a solid theoretical foundation; several workflow systems support provenance; some research prototypes.
7 1. Introduction. Lack of real provenance management systems. Manage provenance information: generate on the fly, store for later use, handle external provenance. Make provenance queryable: expressive provenance representation, expressive query language, in connection with normal data. Contribution semantics: no one fits all.
8 1. Introduction. Perm: Provenance Extension of the Relational Model. A provenance management system with a purely relational representation of provenance: query result tuples and provenance tuples are represented as a single relation.
9 1. Introduction. Benefits: provenance can be stored in a standard DBMS, queried using SQL, and directly interpreted by a user; direct association between provenance and normal data.
10 1. Introduction. Provenance computation on the fly -> use query rewrite: given a query q, generate a query q+ that computes the provenance of all result tuples of q by propagation.
11 1. Introduction. Benefits: the rewritten query is expressed in relational algebra, so it can be optimized and executed by an RDBMS; e.g. it can be stored as a view or used as a subquery.
12 1. Introduction. Extension of the PostgreSQL DBMS: implemented inside PostgreSQL; extended SQL language; the Perm module implements the algebraic rewrite on the internal representation of a query.
13 1. Introduction. SQL-PLE: SQL extension SELECT PROVENANCE ... Nice benefits: CREATE VIEW x AS SELECT PROVENANCE ...; SELECT PROVENANCE ... INTO x ...; SELECT ... FROM (SELECT PROVENANCE ...
14 1. Introduction. External provenance: SELECT PROVENANCE ... FROM view PROVENANCE (attr, ...). Trace provenance of intermediate results: SELECT PROVENANCE ... FROM view BASERELATION
15 1. Introduction. Perm Browser [Diagram labels: Perm Browser, JDBC, psql, Perm; example queries SELECT PROVENANCE ... and SELECT a,b ...]
17 1. Introduction. Perm Architecture [Diagram labels: Parser & Analyser, Rewriter, Perm Module, Planner, Executor; annotations: SELECT PROVENANCE ..., Q = ..., Q+ = ..., MergeJoin (...]
18 Overview: 1. Introduction 2. The Perm Provenance Representation 3. Query Rewriting for Provenance Computation 4. Results 5. Conclusion 6. Demo
19 2. The Perm Provenance Representation. What information are we representing? Tuple-level provenance: contributing tuples from base relations for a query. Contribution semantics: influence (lineage).
20 2. The Perm Provenance Representation. Definition of contribution semantics: Why/influence provenance, introduced in [Cui, Widom ICDE 00]. Provenance is defined as a list of subsets of the input relations, for a single algebra operator and a single result tuple.
21 2. The Perm Provenance Representation. Definition 1: For a single algebra operator $op$ with input relations $T_1, \ldots, T_n$, a list $(T_1^*, \ldots, T_n^*)$ of maximal subsets of the input relations is the provenance of a tuple $t$ from the result of $op$ iff: (1) $op(T_1^*, \ldots, T_n^*) = \{t\}$; (2) for all $i$ and all $t^* \in T_i^*$: $op(T_1^*, \ldots, T_{i-1}^*, \{t^*\}, T_{i+1}^*, \ldots, T_n^*) \neq \emptyset$.
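The two conditions of the definition can be checked mechanically on tiny inputs. The following is a minimal brute-force sketch, not how Perm computes provenance: it enumerates all subsets of the inputs and keeps the largest list that satisfies both conditions. The operator `sum_per_shop`, the helper names, and all data values are assumptions made for illustration (the values on the original slides did not survive).

```python
from itertools import combinations, product


def subsets(relation):
    """Return every subset of a relation (a set of tuples) as a frozenset."""
    rows = sorted(relation)
    return [frozenset(c) for k in range(len(rows) + 1) for c in combinations(rows, k)]


def provenance(op, inputs, t):
    """Brute-force search for the largest list (T1*, ..., Tn*) meeting both conditions."""
    best = None
    for cand in product(*(subsets(rel) for rel in inputs)):
        # Condition 1: the candidate subsets produce exactly the tuple t.
        if op(*cand) != {t}:
            continue
        # Condition 2: each single tuple, combined with the other subsets,
        # still yields a non-empty result.
        if not all(
            op(*cand[:i], frozenset([row]), *cand[i + 1:])
            for i, subset in enumerate(cand)
            for row in subset
        ):
            continue
        if best is None or sum(map(len, cand)) > sum(map(len, best)):
            best = cand
    return best


def sum_per_shop(sales, items):
    """Hypothetical operator: join sales and items on itemid = id, sum price per shop."""
    totals = {}
    for sname, itemid in sales:
        for item_id, price in items:
            if itemid == item_id:
                totals[sname] = totals.get(sname, 0) + price
    return set(totals.items())


# Hypothetical data, chosen only to exercise the definition.
sales = {("Spar", 1), ("Spar", 2), ("Coop", 3)}
items = {(1, 100), (2, 10), (3, 25)}

coop = provenance(sum_per_shop, [sales, items], ("Coop", 25))
spar = provenance(sum_per_shop, [sales, items], ("Spar", 110))
print(coop)
print(spar)
```

The search is exponential in the input size, so it only serves to make the definition concrete; the rewrite approach of the talk avoids any such enumeration.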
22 2. The Perm Provenance Representation. [Example tables: sales(sname, itemid) and items(id, price); cell values lost in extraction]
23 2. The Perm Provenance Representation. Compute the sum of sales for each shop: SELECT sname, sum(price) FROM sales, items WHERE itemid = id GROUP BY sname;
24 2. The Perm Provenance Representation. [Example tables: sales(sname, itemid), items(id, price), and result(name, sum(price)); cell values lost in extraction]
27 2. The Perm Provenance Representation. Desired result format: original attributes, followed by the attributes of relation 1, ..., followed by the attributes of relation n.
28 2. The Perm Provenance Representation. [Result table with column groups: original result (name, sum(price)); sales (P(sName), P(itemId)); items (P(id), P(price)); cell values partly lost in extraction]
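The single-relation representation can be reproduced in any SQL engine by writing the provenance query by hand. The sketch below uses Python's built-in `sqlite3` module rather than Perm, with hypothetical rows (the slide values were lost) and assumed column names such as `p_sname`; it shows each result tuple repeated once per contributing pair of sales and items tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sname TEXT, itemid INTEGER);
    CREATE TABLE items (id INTEGER, price INTEGER);
    -- Hypothetical rows; the values on the original slides did not survive.
    INSERT INTO sales VALUES ('Spar', 1), ('Spar', 2), ('Coop', 3);
    INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# Original attributes first, then the provenance attributes of each base relation.
rows = conn.execute("""
    SELECT agg.name, agg.total,
           prov.p_sname, prov.p_itemid, prov.p_id, prov.p_price
    FROM (SELECT sname AS name, sum(price) AS total
          FROM sales, items WHERE itemid = id GROUP BY sname) AS agg
    LEFT OUTER JOIN
         (SELECT sname AS p_sname, itemid AS p_itemid,
                 id AS p_id, price AS p_price
          FROM sales, items WHERE itemid = id) AS prov
    ON agg.name = prov.p_sname
    ORDER BY agg.name, prov.p_itemid
""").fetchall()

for row in rows:
    print(row)
```

Because the output is an ordinary relation, it can be filtered, joined, or stored like any other query result, which is the benefit the slides claim for this representation.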
30 3. Query Rewriting for Provenance Computation. Rewrite method basics: use the algebra representation of the query; replace every algebra operator with an algebra statement that propagates provenance alongside the original results -> need a rewrite rule for each relational algebra operator.
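To illustrate the idea of one rule per operator, here is a minimal sketch of a toy algebra whose evaluator carries provenance attributes alongside the normal ones. It is an interpreter that mimics the effect of the rules, not Perm's algebraic rewrite; the class names, the `prov_<relation>_<attribute>` naming scheme, and the data are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Base:
    name: str
    rows: list


@dataclass
class Select:
    child: object
    pred: Callable


@dataclass
class Project:
    child: object
    attrs: list


@dataclass
class Join:
    left: object
    right: object
    pred: Callable


def eval_plus(node):
    """Evaluate a query tree while propagating provenance attributes."""
    if isinstance(node, Base):
        # Rule for base relations: duplicate every attribute as a provenance attribute.
        return [
            {**r, **{f"prov_{node.name}_{k}": v for k, v in r.items()}}
            for r in node.rows
        ]
    if isinstance(node, Select):
        # Rule for selection: filter as usual; provenance attributes pass through.
        return [r for r in eval_plus(node.child) if node.pred(r)]
    if isinstance(node, Project):
        # Rule for projection: keep the requested attributes plus all provenance attributes.
        return [
            {k: v for k, v in r.items() if k in node.attrs or k.startswith("prov_")}
            for r in eval_plus(node.child)
        ]
    if isinstance(node, Join):
        # Rule for join: join as usual; both sides contribute their provenance attributes.
        return [
            {**l, **r}
            for l in eval_plus(node.left)
            for r in eval_plus(node.right)
            if node.pred({**l, **r})
        ]
    raise TypeError(f"unsupported operator: {node!r}")


# Hypothetical data and query: shops that sold an item priced above 50.
sales = Base("sales", [{"sname": "Spar", "itemid": 1}, {"sname": "Coop", "itemid": 3}])
items = Base("items", [{"id": 1, "price": 100}, {"id": 3, "price": 25}])
query = Project(
    Select(Join(sales, items, lambda r: r["itemid"] == r["id"]),
           lambda r: r["price"] > 50),
    ["sname"],
)
result = eval_plus(query)
print(result)
```

Each branch of `eval_plus` plays the role of one rewrite rule, so adding an operator means adding one branch.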
31 3. Query Rewriting for Provenance Computation. Rewrite process [Diagram: algebra tree with operators op1, op2, op3]
32 3. Query Rewriting for Provenance Computation. Rewrite process: apply rewrite rule [Diagram: op1 is replaced by opa, opb, opc; op2 and op3 are unchanged]
33 3. Query Rewriting for Provenance Computation. Rewrite process: apply rewrite rules [Diagram labels: opa, opb, opc, op3, op2]
34 3. Query Rewriting for Provenance Computation. Rewrite rules notation: T+ is the rewritten statement (query); P(T+) are its provenance attributes.
35 3. Query Rewriting for Provenance Computation. Rewrite rules example: SELECT agg, G FROM T GROUP BY G is rewritten into SELECT agg, G, P(T+) FROM (SELECT agg, G FROM T GROUP BY G) AS agg LEFT OUTER JOIN (SELECT G AS G', P(T+) FROM T+) AS prov ON (G = G')
36 3. Query Rewriting for Provenance Computation. Rewrite rules example: SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop [Tables: sales(shop, month, revenue) and result(sum, shop); surviving rows: sales (Coop, Jan, 25), (Coop, Feb, 25); result (50, Coop); other cell values lost in extraction]
37 3. Query Rewriting for Provenance Computation. SELECT sum, shop, pshop, pmonth, prevenue FROM (SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop) AS agg LEFT OUTER JOIN (SELECT shop AS shop', pshop, pmonth, prevenue FROM sales+) AS prov ON (shop = shop') [Result table with columns sum, shop, pshop, pmonth, prevenue; cell values partly lost in extraction]
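The aggregation rule can be sketched as a small SQL generator. The function below is an assumption-laden illustration, not Perm's implementation: `rewrite_aggregation` and the alias `g_prime` (standing in for G') are hypothetical, the input is restricted to a base table (so T+ is just the table with each column duplicated under a `p` prefix), and the aggregate alias is `total` instead of `sum`. Identifiers are interpolated directly and must be trusted. The data is hypothetical except for the two Coop rows.

```python
import sqlite3


def rewrite_aggregation(table, columns, group_col, agg_expr, agg_alias):
    """Build q+ for 'SELECT agg_expr, group_col FROM table GROUP BY group_col'."""
    prov_defs = ", ".join(f"{c} AS p{c}" for c in columns)
    prov_names = ", ".join(f"p{c}" for c in columns)
    return (
        f"SELECT {agg_alias}, {group_col}, {prov_names} "
        f"FROM (SELECT {agg_expr} AS {agg_alias}, {group_col} "
        f"FROM {table} GROUP BY {group_col}) AS agg "
        f"LEFT OUTER JOIN (SELECT {group_col} AS g_prime, {prov_defs} "
        f"FROM {table}) AS prov ON ({group_col} = g_prime)"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    -- The Spar rows are hypothetical; only the Coop rows survive on the slides.
    INSERT INTO sales VALUES
        ('Spar', 'Jan', 100), ('Spar', 'Feb', 10), ('Spar', 'Mar', 10),
        ('Coop', 'Jan', 25), ('Coop', 'Feb', 25);
""")

q_plus = rewrite_aggregation(
    "sales", ["shop", "month", "revenue"], "shop", "sum(revenue)", "total"
)
rows = conn.execute(q_plus + " ORDER BY shop, pmonth").fetchall()
for row in rows:
    print(row)
```

Every aggregated tuple appears once per tuple of its group, so the group's full provenance stays attached to the aggregate value.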
41 4. Copy Contribution Semantics. Influence contribution semantics (I-CS): also includes tuples with conditional influence. Copy contribution semantics (C-CS): tuples that have been (partially) copied to the result. Subsumption: C-CS is a subset of I-CS.
42 4. Copy Contribution Semantics. Computation: 1. Statically analyze the query: which attributes are copied to which attributes? 2. Use query rewrite again. Could use 1) to filter the I-CS output ... but 1) can be used to heavily prune.
43 4. Transformation Provenance. Until now: data-data provenance (which input data influenced which output data?). Why not transformation provenance (which parts of a query influenced which output data?)
44 4. Transformation Provenance. SELECT * FROM R UNION SELECT * FROM S; SELECT * FROM R LEFT JOIN S ON (a = b);
45 4. Transformation Provenance. Representation: SELECT * FROM R LEFT JOIN <Not>S</Not> ON ... <?xml version ... Computation: still query rewrite; bit-sets represent all operators of the query; a UDF generates the representation.
46 4. Nested Subqueries. Provenance for nested subqueries: important for typical provenance application domains; not addressed by other approaches.
47 4. Nested Subqueries. Sublinks: subqueries in e.g. the SELECT clause, such as $\sigma_{a \text{ IN } \sigma_{b=3}(S)}(R)$. Correlated: references outside attributes, such as $\sigma_{a \text{ IN } \sigma_{b=a}(S)}(R)$. Nested: a sublink that contains sublinks, such as $\sigma_{a \text{ IN } \sigma_{b = \text{ANY}(T)}(S)}(R)$.
48 4. Nested Subqueries. Sublinks play different roles in an expression; the role can differ per tuple, which changes the provenance. Computation: I think you know ;-) Problems (example: $q = \sigma_{a = \text{ANY } \Pi_b(S) \,\lor\, a = 3}(R)$): the definition breaks -> extend it. How to determine the role? Cannot access the result of the sublink query -> produce all possible provenance tuples and simulate a join by adding correlation.
49 4. Optimizations: Selection Pushdown. Use case: the user is interested in the provenance of only a part of the query result, e.g. SELECT * FROM (SELECT PROVENANCE ...) WHERE ... Rationale: reducing the size of intermediate results is even more important for provenance computation. Optimization: more aggressive selection pushdown than provided by Postgres; selections sometimes allow us to transform outer joins into inner joins.
50 4. Optimizations: Un-nesting and de-correlation of sublinks. Use case: queries with sublinks. Rationale: the generic strategy is very expensive, whereas provenance computation for a join is relatively cheap and straightforward. Optimization: apply standard un-nesting and de-correlation strategies (maybe adapted for provenance computation).
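The selection pushdown can be illustrated on a hand-written provenance query for an aggregation. This is a sketch in `sqlite3` under stated assumptions: the table, its rows, and the column names are hypothetical, and the pushed-down form was derived by hand rather than by Perm's optimizer. It checks that filtering on the grouping attribute outside the provenance query gives the same rows as filtering inside both join inputs, where the outer join can then become an inner join because every group has at least one contributing tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    -- Hypothetical rows for illustration.
    INSERT INTO sales VALUES
        ('Spar', 'Jan', 100), ('Spar', 'Feb', 10),
        ('Coop', 'Jan', 25), ('Coop', 'Feb', 25);
""")

# Selection applied on top of the full provenance computation.
unoptimized = """
    SELECT * FROM (
        SELECT total, shop, pshop, pmonth, prevenue
        FROM (SELECT sum(revenue) AS total, shop
              FROM sales GROUP BY shop) AS agg
        LEFT OUTER JOIN
             (SELECT shop AS pshop, month AS pmonth, revenue AS prevenue
              FROM sales) AS prov
        ON (shop = pshop)
    ) AS q
    WHERE shop = 'Coop'
"""

# Selection pushed into both join inputs; the outer join becomes an inner join.
pushed_down = """
    SELECT total, shop, pshop, pmonth, prevenue
    FROM (SELECT sum(revenue) AS total, shop
          FROM sales WHERE shop = 'Coop' GROUP BY shop) AS agg
    JOIN (SELECT shop AS pshop, month AS pmonth, revenue AS prevenue
          FROM sales WHERE shop = 'Coop') AS prov
    ON (shop = pshop)
"""

before = sorted(conn.execute(unoptimized).fetchall())
after = sorted(conn.execute(pushed_down).fetchall())
print(before == after, after)
```

The pushed-down form never materializes the provenance of the groups the user filtered away, which is where the saving comes from.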
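A small example of why un-nesting helps: once an uncorrelated IN sublink is turned into a join, the provenance of the sublink relation falls out of the join itself. The sketch below uses `sqlite3` with hypothetical tables R(a) and S(b, c) and assumed `prov_` column names; it is not Perm's rewrite, only a hand-written equivalent for this one query shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (a INTEGER);
    CREATE TABLE S (b INTEGER, c TEXT);
    -- Hypothetical rows for illustration.
    INSERT INTO R VALUES (1), (2), (3);
    INSERT INTO S VALUES (1, 'x'), (2, 'y'), (4, 'z');
""")

# Original query with an uncorrelated IN sublink.
nested = conn.execute(
    "SELECT a FROM R WHERE a IN (SELECT b FROM S) ORDER BY a"
).fetchall()

# Un-nested form: the sublink becomes a join, and the matching S tuples
# are carried along as provenance attributes.
unnested_with_provenance = conn.execute("""
    SELECT R.a, R.a AS prov_r_a, S.b AS prov_s_b, S.c AS prov_s_c
    FROM R JOIN S ON R.a = S.b
    ORDER BY R.a
""").fetchall()

print(nested)
print(unnested_with_provenance)
```

If S contained duplicate values of b, the join would return one row per matching S tuple, which is the intended behavior here since each of them belongs to the provenance.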
52 5. Experimental Results. TPC-H benchmark (normal) [Chart with series labels normal and provenance; plot data lost in extraction]
54 6. Conclusion. Not covered in this talk: complete set of rewrite rules; nested subqueries; other contribution semantics; optimizations; gory implementation details; theoretical foundation.
55 6. Conclusion. Benefits: compute provenance for SQL; full SQL query power for provenance data; lazy or eager computation; reuse of existing database technology; support for external provenance.
56 6. Conclusion. Future work: physical operators for more efficient provenance computation; storage compression; include transformation provenance; support different contribution semantics; support various granularities. Applications: data integration / data exchange; view update; ...
57 Questions
58 4. Handling Sublinks. Provenance for nested subqueries: important for typical provenance application domains; not addressed by other approaches.
59 4. Handling Sublinks. Sublinks: subqueries in e.g. the SELECT clause, such as $\sigma_{a \text{ IN } \sigma_{b=3}(S)}(R)$. Correlated: references outside attributes, such as $\sigma_{a \text{ IN } \sigma_{b=a}(S)}(R)$. Nested: a sublink that contains sublinks, such as $\sigma_{a \text{ IN } \sigma_{b = \text{ANY}(T)}(S)}(R)$.
60 4. Handling Sublinks. What is the provenance of a sublink according to Definition 1? Sublinks can be used in different contexts (selection, projection, ...). A sublink is used in an expression -> the role of the sublink in the expression influences the provenance.
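61 4. Handling Sublinks. Example: $q = \sigma_{a = \text{ANY } \Pi_b(S) \,\lor\, a = 3}(R)$ [Tables: R(a), S(b, c), Result(a); cell values partly lost in extraction]
62 4. Handling Sublinks. Example: $q = \sigma_{a = \text{ANY } \Pi_b(S) \,\lor\, a = 3}(R)$. Compute provenance for t = (1) [Tables: R(a), S(b, c), Result(a); cell values partly lost in extraction]
63 4. Handling Sublinks. Example: $q = \sigma_{a = \text{ANY } \Pi_b(S) \,\lor\, a = 3}(R)$. Compute provenance for t = (3) [Tables: R(a), S(b, c), Result(a); cell values partly lost in extraction]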
64 4. Handling Sublinks. How to compute the provenance according to the extended definition? Use query rewrite: a generic strategy (Gen) and specialized strategies.
65 4. Handling Sublinks. Gen strategy: correlated sublinks are problematic. For queries we cannot un-nest and de-correlate: 1. Join the original query with all possible provenance tuples (base relations). 2. Rewrite the sublink query. 3. Introduce additional correlation to simulate a join between 1) and 2).
66 Overview: 1. Introduction 2. The Perm Provenance Representation 3. Query Rewriting for Provenance Computation 4. Handling Sublinks 5. Optimizations 6. Results 7. Conclusion
67 4. Handling Sublinks. What is the provenance of a sublink according to Definition 1? Sublinks can be used in different contexts (selection, projection, ...). A sublink either produces exactly one value or produces a boolean value.
68 4. Handling Sublinks. Single uncorrelated ANY-sublinks in selection conditions. For other types of sublinks: correlated sublinks; nested sublinks.
69 4. Handling Sublinks. Single uncorrelated ANY-sublinks in selection conditions: the result of the sublink query is fixed; for a given input tuple t the sublink condition is either true or false. Example: $\sigma_{a = \text{ANY } \sigma_{b=3}(S)}(R)$
70 4. Handling Sublinks. Some terminology: $T_{sub}$ is the query of a sublink; $C_{sub}$ is the conditional expression of a sublink. For $q = \sigma_{a = \text{ANY } \Pi_b(S)}(R)$: $T_{sub} = \Pi_b(S)$ and $C_{sub} = (a = \text{ANY } \Pi_b(S))$.
71 4. Handling Sublinks. A sublink condition can play different roles in a condition C of a selection (for one input tuple t). Reqtrue: the selection condition is true iff $C_{sub}$ is true. Reqfalse: the selection condition is true iff $C_{sub}$ is false. Ind: the selection condition is true independent of the result of $C_{sub}$.
72 4. Handling Sublinks. Some more terminology: $T_{sub}^{true}(t)$ denotes all tuples from the sublink query that fulfill the unquantified sublink condition; $T_{sub}^{false}(t)$ denotes all tuples from the sublink query that do not fulfill it. Example: for $C_{sub} = (a = \text{ANY } \sigma_{b=3}(S))$ the unquantified condition is $(a = b)$.
73 4. Handling Sublinks. Back to ANY-sublinks in selections. Proposition: $T_{sub}^*(t) = \begin{cases} T_{sub}^{true}(t) & \text{if } C_{sub} \text{ is reqtrue} \\ T_{sub} & \text{if } C_{sub} \text{ is reqfalse or ind} \end{cases}$
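The roles and the proposition can be turned into a few lines of code. The sketch below is an illustration under assumptions, not Perm's rewrite: the function names are hypothetical, tuples are reduced to single integers, the role is found by evaluating the selection condition with the sublink result forced to true and to false, and the data values are made up.

```python
def sublink_role(condition, t):
    """Classify the sublink for tuple t by forcing C_sub to true and to false."""
    with_true = condition(t, True)
    with_false = condition(t, False)
    if with_true and with_false:
        return "ind"        # selection holds regardless of C_sub
    if with_true:
        return "reqtrue"    # selection holds only if C_sub is true
    if with_false:
        return "reqfalse"   # selection holds only if C_sub is false
    return None             # the selection condition cannot be satisfied for t


def any_sublink_provenance(t, t_sub, unquantified, role):
    """Apply the proposition: T_sub^true(t) for reqtrue, all of T_sub otherwise."""
    if role == "reqtrue":
        return [s for s in t_sub if unquantified(t, s)]
    return list(t_sub)


# Hypothetical instance of a selection "a = ANY (sublink) OR a = 3".
def condition(a, c_sub):
    return c_sub or a == 3


def unquantified(a, b):
    return a == b


t_sub = [1, 2, 4]  # assumed result of the sublink query

role_1 = sublink_role(condition, 1)
role_3 = sublink_role(condition, 3)
prov_1 = any_sublink_provenance(1, t_sub, unquantified, role_1)
prov_3 = any_sublink_provenance(3, t_sub, unquantified, role_3)
print(role_1, prov_1)
print(role_3, prov_3)
```

For the tuple 1 only the matching sublink tuple is kept, while for the tuple 3 the sublink result does not matter, so every sublink tuple is reported.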
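74 4. Handling Sublinks. Example: $q = \sigma_{a = \text{ANY } \Pi_b(S)}(R)$. Compute provenance for t = (1) [Tables: R(a), S(b, c), Result(a); cell values partly lost in extraction]
75 4. Handling Sublinks. $q = \sigma_{a = \text{ANY } \Pi_b(S)}(R)$; $T_{sub} = \Pi_b(S)$; $C_{sub}$ is reqtrue; unquantified condition: $(a = b)$; $T_{sub}^* = T_{sub}^{true}$; $T_{sub}^{true}(t) = \{(1)\}$
76 4. Handling Sublinks. $q = \sigma_{a = \text{ANY } \Pi_b(S)}(R)$. Compute provenance for t = (1); unquantified condition: $(a = b)$; $T_{sub}^{true}(t) = \{(1)\}$ [Tables: R(a), $T_{sub}$(b); cell values partly lost in extraction]
77 4. Handling Sublinks. $q = \sigma_{a = \text{ANY } \Pi_b(S)}(R)$. Compute provenance for t = (1) [Tables: R(a), S(b, c), $T_{sub}$(b), Result(a), R*(a), $T_{sub}^*$(b); cell values partly lost in extraction]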
78 4. Handling Sublinks. The definition is ambiguous for queries with more than one sublink! $q = \sigma_{C_1 \lor C_2}(U)$, $C_1 = (a = \text{ANY } R)$, $C_2 = (a > \text{ALL } S)$, t = (5) [Tables: R(b), S(c), U(a), Result(a); cell values partly lost in extraction]
79 4. Handling Sublinks. The definition is ambiguous for queries with more than one sublink! $q = \sigma_{C_1 \lor C_2}(U)$, $C_1 = (a = \text{ANY } R)$ is true, $C_2 = (a > \text{ALL } S)$ is false, t = (5) [Tables: R(b), S(c), U(a), Result(a); cell values partly lost in extraction]
80 4. Handling Sublinks. $q = \sigma_{C_1 \lor C_2}(U)$, $C_1 = (a = \text{ANY } R)$, $C_2 = (a > \text{ALL } S)$, t = (5) [Tables: Solution 1 with R*(b) = {5}, S*(c) = {5}, U*(a) = {5}; Solution 2 with R*, S*, and U*(a) = {5}; some cell values lost in extraction]
81 4. Handling Sublinks. Same query and t = (5); for Solution 1, $C_1$ is true and $C_2$ is false [Tables as on the previous slide]
82 4. Handling Sublinks. Same query and t = (5); for Solution 2, $C_1$ is false and $C_2$ is true [Tables as on the previous slide]
83 4. Handling Sublinks. Reasons for this ambiguity: the definition requires the provenance to produce the same result, but not to produce the same results for the sublinks -> the definition produces false positives.
84 4. Handling Sublinks. Solution: extend the definition with a third condition: each sublink, if computed for one result tuple t and one tuple from the provenance of the sublink, produces the same sublink result as in the original query.
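The third condition can be checked directly for the two-sublink example. The sketch below is a hypothetical illustration: the relation contents are assumed (the slide values are only partly recoverable), tuples are plain integers, and `satisfies_third_condition` is a made-up helper name. It shows that a candidate which flips the sublink results is rejected even though it still produces the result tuple.

```python
def any_eq(a, relation):
    """C1: a = ANY relation."""
    return any(a == x for x in relation)


def gt_all(a, relation):
    """C2: a > ALL relation."""
    return all(a > x for x in relation)


def satisfies_third_condition(t, sublinks, originals, candidate):
    """Each single provenance tuple must reproduce the original sublink result."""
    for name, cond in sublinks.items():
        expected = cond(t, originals[name])
        for row in candidate[name]:
            if cond(t, [row]) != expected:
                return False
    return True


sublinks = {"R": any_eq, "S": gt_all}

# Assumed original relations: C1 is true and C2 is false for t = 5.
originals = {"R": [5, 100], "S": [1, 5]}
t = 5

solution_1 = {"R": [5], "S": [5]}    # keeps C1 true and C2 false
solution_2 = {"R": [100], "S": [1]}  # flips both sublink results

ok_1 = satisfies_third_condition(t, sublinks, originals, solution_1)
ok_2 = satisfies_third_condition(t, sublinks, originals, solution_2)
print(ok_1, ok_2)
```

Both candidates satisfy the disjunction for t = 5, so only the added condition separates them.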
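85 4. Handling Sublinks. $q = \sigma_{C_1 \lor C_2}(U)$, $C_1 = (a = \text{ANY } R)$, $C_2 = (a > \text{ALL } S)$, t = (5) [Tables: Solution 1 with R*(b) = {5}, S*(c) = {5}, U*(a) = {5}; Solution 2 with R*, S*, U*; some cell values lost in extraction]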
Advanced Oracle Performance Troubleshooting Query Transformations Randolf Geist http://oracle-randolf.blogspot.com/ http://www.sqltools-plusplus.org:7676/ info@sqltools-plusplus.org Independent Consultant
More informationChapter 4 SQL. Database Systems p. 121/567
Chapter 4 SQL Database Systems p. 121/567 General Remarks SQL stands for Structured Query Language Formerly known as SEQUEL: Structured English Query Language Standardized query language for relational
More informationQUERY OPTIMIZATION FOR DATABASE MANAGEMENT SYSTEM BY APPLYING DYNAMIC PROGRAMMING ALGORITHM
QUERY OPTIMIZATION FOR DATABASE MANAGEMENT SYSTEM BY APPLYING DYNAMIC PROGRAMMING ALGORITHM Wisnu Adityo NIM 13506029 Information Technology Department Institut Teknologi Bandung Jalan Ganesha 10 e-mail:
More informationQuerying Data with Transact SQL
Course 20761A: Querying Data with Transact SQL Course details Course Outline Module 1: Introduction to Microsoft SQL Server 2016 This module introduces SQL Server, the versions of SQL Server, including
More informationMidterm Exam #2 (Version B) CS 122A Spring 2018
NAME: SEAT NO.: STUDENT ID: Midterm Exam #2 (Version B) CS 122A Spring 2018 Max. Points: 100 (Please read the instructions carefully) Instructions: - The total time for the exam is 50 minutes; be sure
More informationCSE 344 JANUARY 26 TH DATALOG
CSE 344 JANUARY 26 TH DATALOG ADMINISTRATIVE MINUTIAE HW3 and OQ3 out HW3 due next Friday OQ3 due next Wednesday HW4 out next week: on Datalog Midterm reminder: Feb 9 th RELATIONAL ALGEBRA Set-at-a-time
More informationCSE 344 MAY 7 TH EXAM REVIEW
CSE 344 MAY 7 TH EXAM REVIEW EXAMINATION STATIONS Exam Wednesday 9:30-10:20 One sheet of notes, front and back Practice solutions out after class Good luck! EXAM LENGTH Production v. Verification Practice
More informationDatabasesystemer, forår 2006 IT Universitetet i København. Forelæsning 9: Mere om SQL. 30. marts Forelæser: Esben Rune Hansen
Databasesystemer, forår 2006 IT Universitetet i København Forelæsning 9: Mere om SQL 30. marts 2006 Forelæser: Esben Rune Hansen Today s lecture Subqueries in SQL. Set operators in SQL. Security and authorization
More informationQuery Optimization. Introduction to Databases CompSci 316 Fall 2018
Query Optimization Introduction to Databases CompSci 316 Fall 2018 2 Announcements (Tue., Nov. 20) Homework #4 due next in 2½ weeks No class this Thu. (Thanksgiving break) No weekly progress update due
More informationNext-Generation Parallel Query
Next-Generation Parallel Query Robert Haas & Rafia Sabih 2013 EDB All rights reserved. 1 Overview v10 Improvements TPC-H Results TPC-H Analysis Thoughts for the Future 2017 EDB All rights reserved. 2 Parallel
More informationCan We Trust SQL as a Data Analytics Tool?
Can We Trust SQL as a Data nalytics Tool? SQL The query language for relational databases International Standard since 1987 Implemented in all systems (free and commercial) $30B/year business Most common
More informationPrinciples of Database Systems CSE 544. Lecture #2 SQL The Complete Story
Principles of Database Systems CSE 544 Lecture #2 SQL The Complete Story CSE544 - Spring, 2013 1 Announcements Paper assignment Review was due last night Discussion on Thursday We need to schedule a makeup
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationHW1 is due tonight HW2 groups are assigned. Outline today: - nested queries and witnesses - We start with a detailed example! - outer joins, nulls?
L05: SQL 183 Announcements! HW1 is due tonight HW2 groups are assigned Outline today: - nested queries and witnesses - We start with a detailed example! - outer joins, nulls? 184 Small IMDB schema (SQLite)
More informationCS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen
CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen LECTURE 10: INTRODUCTION TO SQL FULL RELATIONAL OPERATIONS MODIFICATION LANGUAGE Union, Intersection, Differences (select
More informationHandout 9 CS-605 Spring 18 Page 1 of 8. Handout 9. SQL Select -- Multi Table Queries. Joins and Nested Subqueries.
Handout 9 CS-605 Spring 18 Page 1 of 8 Handout 9 SQL Select -- Multi Table Queries. Joins and Nested Subqueries. Joins In Oracle https://docs.oracle.com/cd/b19306_01/server.102/b14200/queries006.htm Many
More informationReview. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System
Review Relational Query Optimization R & G Chapter 12/15 Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory
More informationCompiler. Runtime System
Query Execution SQL Statement Compiler Runtime System Query Execution Plan Result (Relation) 1 Compiler SQL is declarative, for runtime system it has to be translated into something procedural DBMS first
More informationImproving the Performance of OLAP Queries Using Families of Statistics Trees
Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University
More informationSELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName
Announcements Introduction to Data Management CSE 344 Lectures 5: More SQL aggregates Homework 2 has been released Web quiz 2 is also open Both due next week 1 2 Outline Outer joins (6.3.8, review) More
More informationNULLs & Outer Joins. Objectives of the Lecture :
Slide 1 NULLs & Outer Joins Objectives of the Lecture : To consider the use of NULLs in SQL. To consider Outer Join Operations, and their implementation in SQL. Slide 2 Missing Values : Possible Strategies
More informationCOMP9311 Week 10 Lecture. DBMS Architecture. DBMS Architecture and Implementation. Database Application Performance
COMP9311 Week 10 Lecture DBMS Architecture DBMS Architecture and Implementation 2/51 Aims: examine techniques used in implementation of DBMSs: query processing (QP), transaction processing (TxP) use QP
More informationProvenance Management for Frequent Itemsets
Provenance Management for Frequent Itemsets Javed Siddique University of Toronto jsiddique@cs.toronto.edu Boris Glavic Illinois Institute of Technology bglavic@iit.edu Renée J. Miller University of Toronto
More informationCS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing
CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2
More informationSubquery: There are basically three types of subqueries are:
Subquery: It is also known as Nested query. Sub queries are queries nested inside other queries, marked off with parentheses, and sometimes referred to as "inner" queries within "outer" queries. Subquery
More informationyqgm_std_rules documentation (Version 1)
yqgm_std_rules documentation (Version 1) Feng Shao Warren Wong Tony Novak Computer Science Department Cornell University Copyright (C) 2003-2005 Cornell University. All Rights Reserved. 1. Introduction
More informationModule 4. Implementation of XQuery. Part 0: Background on relational query processing
Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:
More informationSQL and Incomp?ete Data
SQL and Incomp?ete Data A not so happy marriage Dr Paolo Guagliardo Applied Databases, Guest Lecture 31 March 2016 SQL is efficient, correct and reliable 1 / 25 SQL is efficient, correct and reliable...
More informationCS425 Fall 2016 Boris Glavic Chapter 1: Introduction
CS425 Fall 2016 Boris Glavic Chapter 1: Introduction Modified from: Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Textbook: Chapter 1 1.2 Database Management System (DBMS)
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I
Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.830 Database Systems: Fall 2015 Quiz I There are 12 questions and 13 pages in this quiz booklet. To receive
More informationIntroduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)
Introduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford) Lecture 3 Lecture Overview 1. Aggregation & GROUP BY 2. Set operators & nested queries 3. Advanced
More informationMidterm Exam #2 (Version B) CS 122A Spring 2018
NAME: SEAT NO.: STUDENT ID: Midterm Exam #2 (Version B) CS 122A Spring 2018 Max. Points: 100 (Please read the instructions carefully) Instructions: - The total time for the exam is 50 minutes; be sure
More informationIntroduction to database design
Introduction to database design First lecture: RG 3.6, 3.7, [4], most of 5 Second lecture: Rest of RG 5 Rasmus Pagh Some figures are taken from the ppt slides from the book Database systems by Kiefer,
More informationAnswering Queries Using Cooperative Semantic Caching
Answering Queries Using Cooperative Caching Andrei Vancea 1, Prof. Dr. Burkhard Stiller 1,2 1 Department of Informatics IFI, Communication Systems Group CSG, University of Zürich 2 associated with the
More informationOptimizing Provenance Computations
arxiv:70.0553v [cs.db] 9 Jan 207 Optimizing Provenance Computations Xing Niu and Boris Glavic IIT DB Group Technical Report IIT/CS-DB-206-02 206-0 http://www.cs.iit.edu/ dbgroup/ LIMITED DISTRIBUTION NOTICE:
More informationIntroduction to Data Management. Lecture #11 (Relational Algebra)
Introduction to Data Management Lecture #11 (Relational Algebra) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW and exams:
More informationIan Kenny. November 28, 2017
Ian Kenny November 28, 2017 Introductory Databases Relational Algebra Introduction In this lecture we will cover Relational Algebra. Relational Algebra is the foundation upon which SQL is built and is
More informationIntroduction SQL DRL. Parts of SQL. SQL: Structured Query Language Previous name was SEQUEL Standardized query language for relational DBMS:
Introduction SQL: Structured Query Language Previous name was SEQUEL Standardized query language for relational DBMS: SQL The standard is evolving over time SQL-89 SQL-9 SQL-99 SQL-0 SQL is a declarative
More information