FDB: A Query Engine for Factorised Relational Databases

Size: px
Start display at page:

Download "FDB: A Query Engine for Factorised Relational Databases"

Transcription

1 FDB: A Query Engine for Factorised Relational Databases Nurzhan Bakibayev, Dan Olteanu, and Jakub Zavodny Oxford CS Christan Grant cgrant@cise.ufl.edu University of Florida November 1, 2013 cgrant (UF) FDB: Factorised RDBMS November 1, / 39

2 Contents 1 Intro 2 F-Representation 3 F-Trees 4 Query Evaluation Normalisation Operator Swap Operator Cartesian Product Operator Selection Operator Merge Selection Operator Absorb Selection Operator Selection with Constant Operator Projection Operator 5 Query Optimization Cost of an F-Plan Exhaustive Search Greedy Heuristic 6 Experiments Experiment #1 Experiment #2 Experiment #2 Experiment #3 Experiment #4 7 Conclusion 8 Thanks cgrant (UF) FDB: Factorised RDBMS November 1, / 39

3 Intro Intro A succinct representation for large relations. It is used in Google s F1 data management system for ML. Used for large incomplete/uncertain relations. FDB can speed up expected queries. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

4 Intro Intro A succinct representation for large relations. It is used in Google s F1 data management system for ML. Used for large incomplete/uncertain relations. FDB can speed up expected queries. This paper discusses basics FDB system. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

5 TL;DR Intro Given a query: Q 1 = Order item Store location Disp cgrant (UF) FDB: Factorised RDBMS November 1, / 39

6 TL;DR Intro Given a query: Q 1 = Order item Store location Disp And a result: cgrant (UF) FDB: Factorised RDBMS November 1, / 39

7 TL;DR Intro Given a query: Q 1 = Order item Store location Disp And a result: We can represent it as this: cgrant (UF) FDB: Factorised RDBMS November 1, / 39

8 TL;DR Intro Given a query: Q 1 = Order item Store location Disp And a result: We can represent it as this: Or a tree: cgrant (UF) FDB: Factorised RDBMS November 1, / 39

9 Intro TL;DR Given a query: Q 1 = Order item Store location Disp And a result: We can represent it as this: Or a tree: In exponentially -space and +speed for SPJ queries. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

10 Intro Contents 1 Intro 2 F-Representation 3 F-Trees 4 Query Evaluation Normalisation Operator Swap Operator Cartesian Product Operator Selection Operator Projection Operator 5 Query Optimization Cost of an F-Plan Exhaustive Search Greedy Heuristic 6 Experiments Experiment #1 Experiment #2 Experiment #2 Experiment #3 Experiment #4 7 Conclusion 8 Thanks cgrant (UF) FDB: Factorised RDBMS November 1, / 39

11 Example Schema Intro cgrant (UF) FDB: Factorised RDBMS November 1, / 39

12 F-Representation Factorized Representations Relational algebra expressions. Constructed with singleton relations v, the union and product operators. exponentially more succinct than relations they encode. fast (constant-delay) enumeration of tuples cgrant (UF) FDB: Factorised RDBMS November 1, / 39

13 F-Representation Factorized Representations Definition 1 A factorised representation E, or f-representation for short, over a set S of attributes and domain D is a relational algebra expression of the form, empty relation over S;, the relation consisting of the nullary tuplem if S = ; A : a, the unary relation with a single tuple with value a, if S = {A} and a is a value in the domain D; (E), where E is an f-representation over S; E 1 E n, where each E i is an f-representation over S; E 1 E n, where each E i is an f-representation over S i and S is the disjoint union over all S i. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

14 F-Representation Factorized Representations Singleton A : a is a singleton E the size is the number of singletons A single system may have many different f-representations. The tuples of a given f-representation can be enumerated with O( S ) additional space and delay. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

15 F-Representation Factorized Representations Singleton A : a is a singleton E the size is the number of singletons A single system may have many different f-representations. The tuples of a given f-representation can be enumerated with O( S ) additional space and delay. Factorized Representation Compact Factorized Representation cgrant (UF) FDB: Factorised RDBMS November 1, / 39

16 F-Trees F-Trees Definition An f-tree for short, over a schema S of attributes is an unordered rooted forest with each node labelled by a non-empty subset of S such that each Q1 F-Tree Q1 Alternate F-Tree attribute of S labels exactly one node. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

17 F-Trees F-Trees Definition An f-tree for short, over a schema S of attributes is an unordered rooted forest with each node labelled by a non-empty subset of S such that each Q1 F-Tree Q1 Alternate F-Tree attribute of S labels exactly one node. Tree created in O( S R log R ). We can go back to R in O( S R ). cgrant (UF) FDB: Factorised RDBMS November 1, / 39

18 F-Trees F-Trees Queries Given a query Q = π P σ ϕ (R 1 R n ) Given a query Q, an f-tree T is an f-tree of Q if and only if it satisfies the path constraint. all dependent attributes can only label nodes along a same root-to-leaf path. Nodes of a join are only dependent if they are in the P list or the same R i. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

19 F-Trees F-Trees Size Bounds Let st be the maximal root-to-leaf path edge cover T. For and database D and f-tree T the result size is at most O( P D s(t ) ) cgrant (UF) FDB: Factorised RDBMS November 1, / 39

20 F-Trees F-Trees Size Bounds Let st be the maximal root-to-leaf path edge cover T. For and database D and f-tree T the result size is at most O( P D s(t ) ) If we let s(q) be the minimal s(t ) over all f-trees. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

21 F-Trees F-Trees Size Bounds Let st be the maximal root-to-leaf path edge cover T. For and database D and f-tree T the result size is at most O( P D s(t ) ) If we let s(q) be the minimal s(t ) over all f-trees. P D s(q) Q(D) This can mean asymptotically smaller, exponential size and time savings cgrant (UF) FDB: Factorised RDBMS November 1, / 39

22 Query Evaluation Contents 1 Intro 2 F-Representation 3 F-Trees 4 Query Evaluation Normalisation Operator Swap Operator Cartesian Product Operator Selection Operator Projection Operator 5 Query Optimization Cost of an F-Plan Exhaustive Search Greedy Heuristic 6 Experiments Experiment #1 Experiment #2 Experiment #2 Experiment #3 Experiment #4 7 Conclusion 8 Thanks cgrant (UF) FDB: Factorised RDBMS November 1, / 39

23 Query Evaluation F-plans Any select-project-join query can be evaluated by a sequential composition of operators called an f-plan. f-trees uniquely identify f-representations operators are in tree form. The time complexity of each f-plan operator is O( T 2 N log N), where N is the sum of input and output f-representations and T is the input f-tree. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

24 Query Evaluation Normalisation Operator Normalisation Operator 1 push-up operator ψ B normalization η(t ) A tree is normalized if the push up operator can not be applied. The push-up operator is applied bottom up. ψ B η(t ) Normalization 1 All other operators expect normalized inputs cgrant (UF) FDB: Factorised RDBMS November 1, / 39

25 Query Evaluation Swap Operator Swap Operator X A,B Exchanges B with its parent A and promotes children. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

26 Query Evaluation Cartesian Product Operator Cartesian Product Operator T = T T cgrant (UF) FDB: Factorised RDBMS November 1, / 39

27 Query Evaluation Selection Operator Merge Selection Operator µ A,B If A and B are siblings cgrant (UF) FDB: Factorised RDBMS November 1, / 39

28 Query Evaluation Selection Operator Merge Selection Operator µ A,B If A and B are siblings µ(, ) = cgrant (UF) FDB: Factorised RDBMS November 1, / 39

29 Query Evaluation Selection Operator Absorb Selection Operator α A,B If A is an ancestor of B Example: cgrant (UF) FDB: Factorised RDBMS November 1, / 39

30 Query Evaluation Selection Operator Selection with Constant Operator σ Aθc Filters out all the values in A : a where a θc cgrant (UF) FDB: Factorised RDBMS November 1, / 39

31 Query Evaluation Projection Operator Projection Operator π A Removes a node from the tree. Only permitted if A is a leaf node or with other attributes. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

32 Query Optimization Contents 1 Intro 2 F-Representation 3 F-Trees 4 Query Evaluation Normalisation Operator Swap Operator Cartesian Product Operator Selection Operator Projection Operator 5 Query Optimization Cost of an F-Plan Exhaustive Search Greedy Heuristic 6 Experiments Experiment #1 Experiment #2 Experiment #2 Experiment #3 Experiment #4 7 Conclusion 8 Thanks cgrant (UF) FDB: Factorised RDBMS November 1, / 39

33 Query Optimization Query Optimization Query optimization has two objectives: minimizing the cost of computing a factorised query result; minimizing the size of this output representation. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

34 Query Optimization Query Optimization Optimize the f-tree then the f-representation. Products and Selections are always evaluated first (cheapest). Selection (A = B) can only be done if they are on the same path or siblings cgrant (UF) FDB: Factorised RDBMS November 1, / 39

35 Cost of an F-Plan Query Optimization Cost of an F-Plan use asymptotic bounds of s(t ). with an evaluation time of O( D s(f ) log D ), where s(f ) = max(s(t 0,... s(t k )). cgrant (UF) FDB: Factorised RDBMS November 1, / 39

36 Query Optimization Cost of an F-Plan Cost of an F-Plan use asymptotic bounds of s(t ). with an evaluation time of O( D s(f ) log D ), where s(f ) = max(s(t 0,... s(t k ))., or, use cardinality of estimate from intermediate f-trees using RDBMS style techniques cgrant (UF) FDB: Factorised RDBMS November 1, / 39

37 Query Optimization Exhaustive Search Exhaustive Search Enumerate plans and the one with the shortest path (Dijkstra) between T init to T final is least cost. Search space is O(n 4n ) with n p attributes in the projection list cgrant (UF) FDB: Factorised RDBMS November 1, / 39

38 Query Optimization Greedy Heuristic Greedy Heuristic Only restructures nodes that appear in selections and projections. In projection ordering chooses the least cost at each step. Computing the tree take O( T 2 ). cgrant (UF) FDB: Factorised RDBMS November 1, / 39

39 Experiments Contents 1 Intro 2 F-Representation 3 F-Trees 4 Query Evaluation Normalisation Operator Swap Operator Cartesian Product Operator Selection Operator Projection Operator 5 Query Optimization Cost of an F-Plan Exhaustive Search Greedy Heuristic 6 Experiments Experiment #1 Experiment #2 Experiment #2 Experiment #3 Experiment #4 7 Conclusion 8 Thanks cgrant (UF) FDB: Factorised RDBMS November 1, / 39

40 Experiments Implementation Compare SQLite and PostgreSQL with FDB and RDB for SQLite and PostgreSQL tried to disable expensive writing (try to make it in-memory) Implemented in C++ Optimal f-representation computed with multi-way merge-sort join algorithm. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

41 Experiments Experiment Design Generate random data for queries Repeat a number of times for optimizations time, execution time, representation sizes and f-plan quality Tuples generated independently with a uniform or Zipf distribution. Queries are equality joins cgrant (UF) FDB: Factorised RDBMS November 1, / 39

42 Experiments Experiment #1 Experiment #1 Average time for optimising a query on flat data. A = 40 Attributes, R = 1,..., 8 relations, K = 1,..., 9 selections cgrant (UF) FDB: Factorised RDBMS November 1, / 39

43 Experiments Experiment #1 Experiment #1 Average time for optimising a query on flat data. A = 40 Attributes, R = 1,..., 8 relations, K = 1,..., 9 selections cgrant (UF) FDB: Factorised RDBMS November 1, / 39

44 Experiments Experiment #1 Experiment #1 Average time for optimising a query on flat data. A = 40 Attributes, R = 1,..., 8 relations, K = 1,..., 9 selections cgrant (UF) FDB: Factorised RDBMS November 1, / 39

45 Experiment #2 Experiments Experiment #2 cgrant (UF) FDB: Factorised RDBMS November 1, / 39

46 Experiment #2 Experiments Experiment #2 cgrant (UF) FDB: Factorised RDBMS November 1, / 39

47 Experiment #3 Experiments Experiment #3 Compare performance of FDB, RDB, SQLite and PostgreSQL Results size Evaluation Time cgrant (UF) FDB: Factorised RDBMS November 1, / 39

48 Experiments Experiment #3 Experiment #3 Compare performance of FDB, RDB, SQLite and PostgreSQL. Each tuple size decreases results size by 20 Results size Evaluation time cgrant (UF) FDB: Factorised RDBMS November 1, / 39

49 Experiments Experiment #3 Experiment #3 Compare performance of FDB, RDB, SQLite and PostgreSQL. In Q A is a many-many query. This is highly compressible so the factorized result is small. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

50 Experiment #4 Experiment over Factorized data Experiments Experiment #4 Results size Evaluation time cgrant (UF) FDB: Factorised RDBMS November 1, / 39

51 Conclusion Conclusion Factorizing Relational Data prevents the explosion of data size for large joins. This paper described techniques and experiments to decrease this size and improving query performance. For MLNs and other graph DBs that require many joins this can significantly improve query performance. cgrant (UF) FDB: Factorised RDBMS November 1, / 39

52 Thank you Thanks Questions? cgrant (UF) FDB: Factorised RDBMS November 1, / 39

FDB: A Query Engine for Factorised Relational Databases

FDB: A Query Engine for Factorised Relational Databases FDB: A Query Engine for Factorised Relational Databases Nurzhan Bakibayev, Dan Olteanu, and Jakub Závodný Department of Computer Science, University of Oxford, OX1 3QD, UK {nurzhan.bakibayev, dan.olteanu,

More information

Queries with Order-by Clauses and Aggregates on Factorised Relational Data

Queries with Order-by Clauses and Aggregates on Factorised Relational Data Queries with Order-by Clauses and Aggregates on Factorised Relational Data Tomáš Kočiský Magdalen College University of Oxford A fourth year project report submitted for the degree of Masters of Mathematics

More information

arxiv: v1 [cs.db] 1 Jul 2013

arxiv: v1 [cs.db] 1 Jul 2013 Aggregation and Ordering in Factorised Databases Nurzhan Bakibayev, Tomáš Kočiský, Dan Olteanu, and Jakub Závodný Department of Computer Science, University of Oxford, OX1 3QD, UK {nurzhan.bakibayev, tomas.kocisky,

More information

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement. COS 597: Principles of Database and Information Systems Query Optimization Query Optimization Query as expression over relational algebraic operations Get evaluation (parse) tree Leaves: base relations

More information

Query Solvers on Database Covers

Query Solvers on Database Covers Query Solvers on Database Covers Wouter Verlaek Kellogg College University of Oxford Supervised by Prof. Dan Olteanu A thesis submitted for the degree of Master of Science in Computer Science Trinity 2018

More information

CSE 530A. B+ Trees. Washington University Fall 2013

CSE 530A. B+ Trees. Washington University Fall 2013 CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key

More information

Greedy Approach: Intro

Greedy Approach: Intro Greedy Approach: Intro Applies to optimization problems only Problem solving consists of a series of actions/steps Each action must be 1. Feasible 2. Locally optimal 3. Irrevocable Motivation: If always

More information

Union/Find Aka: Disjoint-set forest. Problem definition. Naïve attempts CS 445

Union/Find Aka: Disjoint-set forest. Problem definition. Naïve attempts CS 445 CS 5 Union/Find Aka: Disjoint-set forest Alon Efrat Problem definition Given: A set of atoms S={1, n E.g. each represents a commercial name of a drugs. This set consists of different disjoint subsets.

More information

Chapter 2: Intro to Relational Model

Chapter 2: Intro to Relational Model Non è possibile visualizzare l'immagine. Chapter 2: Intro to Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Example of a Relation attributes (or columns)

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Chapter 2: Intro to Relational Model

Chapter 2: Intro to Relational Model Chapter 2: Intro to Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Example of a Relation attributes (or columns) tuples (or rows) 2.2 Attribute Types The

More information

Parallel Query Optimisation

Parallel Query Optimisation Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation

More information

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building External Sorting and Query Optimization A.R. Hurson 323 CS Building External sorting When data to be sorted cannot fit into available main memory, external sorting algorithm must be applied. Naturally,

More information

looking ahead to see the optimum

looking ahead to see the optimum ! Make choice based on immediate rewards rather than looking ahead to see the optimum! In many cases this is effective as the look ahead variation can require exponential time as the number of possible

More information

Introduction to Optimization

Introduction to Optimization Introduction to Optimization Greedy Algorithms October 28, 2016 École Centrale Paris, Châtenay-Malabry, France Dimo Brockhoff Inria Saclay Ile-de-France 2 Course Overview Date Fri, 7.10.2016 Fri, 28.10.2016

More information

QUERY OPTIMIZATION FOR DATABASE MANAGEMENT SYSTEM BY APPLYING DYNAMIC PROGRAMMING ALGORITHM

QUERY OPTIMIZATION FOR DATABASE MANAGEMENT SYSTEM BY APPLYING DYNAMIC PROGRAMMING ALGORITHM QUERY OPTIMIZATION FOR DATABASE MANAGEMENT SYSTEM BY APPLYING DYNAMIC PROGRAMMING ALGORITHM Wisnu Adityo NIM 13506029 Information Technology Department Institut Teknologi Bandung Jalan Ganesha 10 e-mail:

More information

Query Optimization. Shuigeng Zhou. December 9, 2009 School of Computer Science Fudan University

Query Optimization. Shuigeng Zhou. December 9, 2009 School of Computer Science Fudan University Query Optimization Shuigeng Zhou December 9, 2009 School of Computer Science Fudan University Outline Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational

More information

Factorized Databases

Factorized Databases Factorized Databases http://www.cs.ox.ac.uk/projects/fdb/ Dan Olteanu Maximilian Schleich Department of Computer Science, University of Oxford ABSTRACT This paper overviews factorized databases and their

More information

Optimization Overview

Optimization Overview Lecture 17 Optimization Overview Lecture 17 Lecture 17 Today s Lecture 1. Logical Optimization 2. Physical Optimization 3. Course Summary 2 Lecture 17 Logical vs. Physical Optimization Logical optimization:

More information

Course Review for Finals. Cpt S 223 Fall 2008

Course Review for Finals. Cpt S 223 Fall 2008 Course Review for Finals Cpt S 223 Fall 2008 1 Course Overview Introduction to advanced data structures Algorithmic asymptotic analysis Programming data structures Program design based on performance i.e.,

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

Question 1 (a) 10 marks

Question 1 (a) 10 marks Question 1 (a) Consider the tree in the next slide. Find any/ all violations of a B+tree structure. Identify each bad node and give a brief explanation of each error. Assume the order of the tree is 4

More information

managing an evolving set of connected components implementing a Union-Find data structure implementing Kruskal s algorithm

managing an evolving set of connected components implementing a Union-Find data structure implementing Kruskal s algorithm Spanning Trees 1 Spanning Trees the minimum spanning tree problem three greedy algorithms analysis of the algorithms 2 The Union-Find Data Structure managing an evolving set of connected components implementing

More information

Relational Model: History

Relational Model: History Relational Model: History Objectives of Relational Model: 1. Promote high degree of data independence 2. Eliminate redundancy, consistency, etc. problems 3. Enable proliferation of non-procedural DML s

More information

Reference Sheet for CO142.2 Discrete Mathematics II

Reference Sheet for CO142.2 Discrete Mathematics II Reference Sheet for CO14. Discrete Mathematics II Spring 017 1 Graphs Defintions 1. Graph: set of N nodes and A arcs such that each a A is associated with an unordered pair of nodes.. Simple graph: no

More information

Introduction Alternative ways of evaluating a given query using

Introduction Alternative ways of evaluating a given query using Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming for Choosing Evaluation Plans Introduction

More information

Query Evaluation and Optimization

Query Evaluation and Optimization Query Evaluation and Optimization Jan Chomicki University at Buffalo Jan Chomicki () Query Evaluation and Optimization 1 / 21 Evaluating σ E (R) Jan Chomicki () Query Evaluation and Optimization 2 / 21

More information

CMSC 424 Database design Lecture 18 Query optimization. Mihai Pop

CMSC 424 Database design Lecture 18 Query optimization. Mihai Pop CMSC 424 Database design Lecture 18 Query optimization Mihai Pop More midterm solutions Projects do not be late! Admin Introduction Alternative ways of evaluating a given query Equivalent expressions Different

More information

Principles of Data Management. Lecture #12 (Query Optimization I)

Principles of Data Management. Lecture #12 (Query Optimization I) Principles of Data Management Lecture #12 (Query Optimization I) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v B+ tree

More information

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch Advanced Databases Lecture 4 - Query Optimization Masood Niazi Torshiz Islamic Azad university- Mashhad Branch www.mniazi.ir Query Optimization Introduction Transformation of Relational Expressions Catalog

More information

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE COMP 62421 Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Querying Data on the Web Date: Wednesday 24th January 2018 Time: 14:00-16:00 Please answer all FIVE Questions provided. They amount

More information

Minimum-Spanning-Tree problem. Minimum Spanning Trees (Forests) Minimum-Spanning-Tree problem

Minimum-Spanning-Tree problem. Minimum Spanning Trees (Forests) Minimum-Spanning-Tree problem Minimum Spanning Trees (Forests) Given an undirected graph G=(V,E) with each edge e having a weight w(e) : Find a subgraph T of G of minimum total weight s.t. every pair of vertices connected in G are

More information

Technical University of Denmark

Technical University of Denmark Technical University of Denmark Written examination, May 7, 27. Course name: Algorithms and Data Structures Course number: 2326 Aids: Written aids. It is not permitted to bring a calculator. Duration:

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18 CS 124 Quiz 2 Review 3/25/18 1 Format You will have 83 minutes to complete the exam. The exam may have true/false questions, multiple choice, example/counterexample problems, run-this-algorithm problems,

More information

Lecture 13. Reading: Weiss, Ch. 9, Ch 8 CSE 100, UCSD: LEC 13. Page 1 of 29

Lecture 13. Reading: Weiss, Ch. 9, Ch 8 CSE 100, UCSD: LEC 13. Page 1 of 29 Lecture 13 Connectedness in graphs Spanning trees in graphs Finding a minimal spanning tree Time costs of graph problems and NP-completeness Finding a minimal spanning tree: Prim s and Kruskal s algorithms

More information

CS521 \ Notes for the Final Exam

CS521 \ Notes for the Final Exam CS521 \ Notes for final exam 1 Ariel Stolerman Asymptotic Notations: CS521 \ Notes for the Final Exam Notation Definition Limit Big-O ( ) Small-o ( ) Big- ( ) Small- ( ) Big- ( ) Notes: ( ) ( ) ( ) ( )

More information

CSE 100: GRAPH ALGORITHMS

CSE 100: GRAPH ALGORITHMS CSE 100: GRAPH ALGORITHMS Dijkstra s Algorithm: Questions Initialize the graph: Give all vertices a dist of INFINITY, set all done flags to false Start at s; give s dist = 0 and set prev field to -1 Enqueue

More information

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments Administrivia Midterm on Thursday 10/18 CS 133: Databases Fall 2018 Lec 12 10/16 Prof. Beth Trushkowsky Assignments Lab 3 starts after fall break No problem set out this week Goals for Today Cost-based

More information

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1) Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two

More information

σ (R.B = 1 v R.C > 3) (S.D = 2) Conjunctive normal form Topics for the Day Distributed Databases Query Processing Steps Decomposition

σ (R.B = 1 v R.C > 3) (S.D = 2) Conjunctive normal form Topics for the Day Distributed Databases Query Processing Steps Decomposition Topics for the Day Distributed Databases Query processing in distributed databases Localization Distributed query operators Cost-based optimization C37 Lecture 1 May 30, 2001 1 2 Query Processing teps

More information

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions... Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing

More information

( ) D. Θ ( ) ( ) Ο f ( n) ( ) Ω. C. T n C. Θ. B. n logn Ο

( ) D. Θ ( ) ( ) Ο f ( n) ( ) Ω. C. T n C. Θ. B. n logn Ο CSE 0 Name Test Fall 0 Multiple Choice. Write your answer to the LEFT of each problem. points each. The expected time for insertion sort for n keys is in which set? (All n! input permutations are equally

More information

CS 6783 (Applied Algorithms) Lecture 5

CS 6783 (Applied Algorithms) Lecture 5 CS 6783 (Applied Algorithms) Lecture 5 Antonina Kolokolova January 19, 2012 1 Minimum Spanning Trees An undirected graph G is a pair (V, E); V is a set (of vertices or nodes); E is a set of (undirected)

More information

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes Schema for Examples Query Optimization (sid: integer, : string, rating: integer, age: real) (sid: integer, bid: integer, day: dates, rname: string) Similar to old schema; rname added for variations. :

More information

Sets MAT231. Fall Transition to Higher Mathematics. MAT231 (Transition to Higher Math) Sets Fall / 31

Sets MAT231. Fall Transition to Higher Mathematics. MAT231 (Transition to Higher Math) Sets Fall / 31 Sets MAT231 Transition to Higher Mathematics Fall 2014 MAT231 (Transition to Higher Math) Sets Fall 2014 1 / 31 Outline 1 Sets Introduction Cartesian Products Subsets Power Sets Union, Intersection, Difference

More information

( ) n 3. n 2 ( ) D. Ο

( ) n 3. n 2 ( ) D. Ο CSE 0 Name Test Summer 0 Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to multiply two n n matrices is: A. Θ( n) B. Θ( max( m,n, p) ) C.

More information

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors: Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,

More information

Evaluating XPath Queries

Evaluating XPath Queries Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But

More information

Chapter 11: Query Optimization

Chapter 11: Query Optimization Chapter 11: Query Optimization Chapter 11: Query Optimization Introduction Transformation of Relational Expressions Statistical Information for Cost Estimation Cost-based optimization Dynamic Programming

More information

CS301 - Data Structures Glossary By

CS301 - Data Structures Glossary By CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm

More information

Brute Force: Selection Sort

Brute Force: Selection Sort Brute Force: Intro Brute force means straightforward approach Usually based directly on problem s specs Force refers to computational power Usually not as efficient as elegant solutions Advantages: Applicable

More information

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors: Query Optimization atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,

More information

Chapter 13: Query Optimization. Chapter 13: Query Optimization

Chapter 13: Query Optimization. Chapter 13: Query Optimization Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Equivalent Relational Algebra Expressions Statistical

More information

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005.

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005. Improving Query Plans CS157B Chris Pollett Mar. 21, 2005. Outline Parse Trees and Grammars Algebraic Laws for Improving Query Plans From Parse Trees To Logical Query Plans Syntax Analysis and Parse Trees

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Exam 1. March 20th, CS525 - Midterm Exam Solutions

Exam 1. March 20th, CS525 - Midterm Exam Solutions Name CWID Exam 1 March 20th, 2017 CS525 - Midterm Exam s Please leave this empty! 1 2 3 4 Sum Things that you are not allowed to use Personal notes Textbook Printed lecture notes Phone The exam is 75 minutes

More information

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and Chapter 6 The Relational Algebra and Relational Calculus Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 6 Outline Unary Relational Operations: SELECT and PROJECT Relational

More information

Uncertain Data Models

Uncertain Data Models Uncertain Data Models Christoph Koch EPFL Dan Olteanu University of Oxford SYNOMYMS data models for incomplete information, probabilistic data models, representation systems DEFINITION An uncertain data

More information

Relational Query Optimization

Relational Query Optimization Relational Query Optimization Module 4, Lectures 3 and 4 Database Management Systems, R. Ramakrishnan 1 Overview of Query Optimization Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

ATYPICAL RELATIONAL QUERY OPTIMIZER

ATYPICAL RELATIONAL QUERY OPTIMIZER 14 ATYPICAL RELATIONAL QUERY OPTIMIZER Life is what happens while you re busy making other plans. John Lennon In this chapter, we present a typical relational query optimizer in detail. We begin by discussing

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Relational Model 2: Relational Algebra

Relational Model 2: Relational Algebra Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong The relational model defines: 1 the format by which data should be stored; 2 the operations for querying the data.

More information

Relational Query Optimization. Highlights of System R Optimizer

Relational Query Optimization. Highlights of System R Optimizer Relational Query Optimization Chapter 15 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Highlights of System R Optimizer v Impact: Most widely used currently; works well for < 10 joins.

More information

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007 Relational Query Optimization Yanlei Diao UMass Amherst October 23 & 25, 2007 Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein 1 Overview of Query Evaluation Query Evaluation Plan:

More information

CSE 100 Disjoint Set, Union Find

CSE 100 Disjoint Set, Union Find CSE 100 Disjoint Set, Union Find Union-find using up-trees The union-find data structure 1 0 2 6 7 4 3 5 8 Perform these operations: Find(4) =! Find(3) =! Union(1,0)=! Find(4)=! 2 Array representation

More information

Optimization I : Brute force and Greedy strategy

Optimization I : Brute force and Greedy strategy Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean

More information

Lecture Query evaluation. Cost-based transformations. By Marina Barsky Winter 2016, University of Toronto

Lecture Query evaluation. Cost-based transformations. By Marina Barsky Winter 2016, University of Toronto Lecture 02.04. Query evaluation Cost-based transformations By Marina Barsky Winter 2016, University of Toronto RDBMS query evaluation How does a RDBMS answer your query? SQL Query Relational Algebra (RA)

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25 Multi-way Search Trees (Multi-way Search Trees) Data Structures and Programming Spring 2017 1 / 25 Multi-way Search Trees Each internal node of a multi-way search tree T: has at least two children contains

More information

Relational Model, Relational Algebra, and SQL

Relational Model, Relational Algebra, and SQL Relational Model, Relational Algebra, and SQL August 29, 2007 1 Relational Model Data model. constraints. Set of conceptual tools for describing of data, data semantics, data relationships, and data integrity

More information

Monotone Constraints in Frequent Tree Mining

Monotone Constraints in Frequent Tree Mining Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection Outline Query Processing Overview Algorithms for basic operations Sorting Selection Join Projection Query optimization Heuristics Cost-based optimization 19 Estimate I/O Cost for Implementations Count

More information

CS 564 Final Exam Fall 2015 Answers

CS 564 Final Exam Fall 2015 Answers CS 564 Final Exam Fall 015 Answers A: STORAGE AND INDEXING [0pts] I. [10pts] For the following questions, clearly circle True or False. 1. The cost of a file scan is essentially the same for a heap file

More information

Database Tuning and Physical Design: Basics of Query Execution

Database Tuning and Physical Design: Basics of Query Execution Database Tuning and Physical Design: Basics of Query Execution Spring 2018 School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) Query Execution 1 / 43 The Client/Server

More information

Chapter 3. Algorithms for Query Processing and Optimization

Chapter 3. Algorithms for Query Processing and Optimization Chapter 3 Algorithms for Query Processing and Optimization Chapter Outline 1. Introduction to Query Processing 2. Translating SQL Queries into Relational Algebra 3. Algorithms for External Sorting 4. Algorithms

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

CSE 190D Spring 2017 Final Exam Answers

CSE 190D Spring 2017 Final Exam Answers CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join

More information

Chapter 14 Query Optimization

Chapter 14 Query Optimization Chapter 14 Query Optimization Chapter 14: Query Optimization! Introduction! Catalog Information for Cost Estimation! Estimation of Statistics! Transformation of Relational Expressions! Dynamic Programming

More information

Chapter 4 Distributed Query Processing

Chapter 4 Distributed Query Processing Chapter 4 Distributed Query Processing Table of Contents Overview of Query Processing Query Decomposition and Data Localization Optimization of Distributed Queries Chapter4-1 1 1. Overview of Query Processing

More information

Chapter 14 Query Optimization

Chapter 14 Query Optimization Chapter 14 Query Optimization Chapter 14: Query Optimization! Introduction! Catalog Information for Cost Estimation! Estimation of Statistics! Transformation of Relational Expressions! Dynamic Programming

More information

Chapter 14 Query Optimization

Chapter 14 Query Optimization Chapter 14: Query Optimization Chapter 14 Query Optimization! Introduction! Catalog Information for Cost Estimation! Estimation of Statistics! Transformation of Relational Expressions! Dynamic Programming

More information

DBMS Query evaluation

DBMS Query evaluation Data Management for Data Science DBMS Maurizio Lenzerini, Riccardo Rosati Corso di laurea magistrale in Data Science Sapienza Università di Roma Academic Year 2016/2017 http://www.dis.uniroma1.it/~rosati/dmds/

More information

Introduction to Optimization

Introduction to Optimization Introduction to Optimization Greedy Algorithms October 5, 2015 École Centrale Paris, Châtenay-Malabry, France Dimo Brockhoff INRIA Lille Nord Europe Course Overview 2 Date Topic Mon, 21.9.2015 Introduction

More information

Lecture 6 Basic Graph Algorithms

Lecture 6 Basic Graph Algorithms CS 491 CAP Intro to Competitive Algorithmic Programming Lecture 6 Basic Graph Algorithms Uttam Thakore University of Illinois at Urbana-Champaign September 30, 2015 Updates ICPC Regionals teams will be

More information

What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence What s a database system? Review of Basic Database Concepts CPS 296.1 Topics in Database Systems According to Oxford Dictionary Database: an organized body of related information Database system, DataBase

More information

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine QUERY OPTIMIZATION 1 QUERY OPTIMIZATION QUERY SUB-SYSTEM 2 ROADMAP 3. 12 QUERY BLOCKS: UNITS OF OPTIMIZATION An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested

More information

logn D. Θ C. Θ n 2 ( ) ( ) f n B. nlogn Ο n2 n 2 D. Ο & % ( C. Θ # ( D. Θ n ( ) Ω f ( n)

logn D. Θ C. Θ n 2 ( ) ( ) f n B. nlogn Ο n2 n 2 D. Ο & % ( C. Θ # ( D. Θ n ( ) Ω f ( n) CSE 0 Test Your name as it appears on your UTA ID Card Fall 0 Multiple Choice:. Write the letter of your answer on the line ) to the LEFT of each problem.. CIRCLED ANSWERS DO NOT COUNT.. points each. The

More information

Data and File Structures Laboratory

Data and File Structures Laboratory Binary Trees Assistant Professor Machine Intelligence Unit Indian Statistical Institute, Kolkata September, 2018 1 Basics 2 Implementation 3 Traversal Basics of a tree A tree is recursively defined as

More information

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list

More information

Query Optimization. Vishy Poosala Bell Labs

Query Optimization. Vishy Poosala Bell Labs Query Optimization Vishy Poosala Bell Labs 1 Outline Introduction Necessary Details Cost Estimation Result Size Estimation Standard approach for query optimization Other ways Related Concepts 2 Given a

More information

Introduction to Combinatorial Algorithms

Introduction to Combinatorial Algorithms Fall 2009 Intro Introduction to the course What are : Combinatorial Structures? Combinatorial Algorithms? Combinatorial Problems? Combinatorial Structures Combinatorial Structures Combinatorial structures

More information

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan Plan for today Query Processing/Optimization CPS 216 Advanced Database Systems Overview of query processing Query execution Query plan enumeration Query rewrite heuristics Query rewrite in DB2 2 A query

More information

Storage hierarchy. Textbook: chapters 11, 12, and 13

Storage hierarchy. Textbook: chapters 11, 12, and 13 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow Very small Small Bigger Very big (KB) (MB) (GB) (TB) Built-in Expensive Cheap Dirt cheap Disks: data is stored on concentric circular

More information

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan THE B+ TREE INDEX CS 564- Spring 2018 ACKs: Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? The B+ tree index Basics Search/Insertion/Deletion Design & Cost 2 INDEX RECAP We have the following query:

More information

( D. Θ n. ( ) f n ( ) D. Ο%

( D. Θ n. ( ) f n ( ) D. Ο% CSE 0 Name Test Spring 0 Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to run the code below is in: for i=n; i>=; i--) for j=; j

More information

Ian Kenny. December 1, 2017

Ian Kenny. December 1, 2017 Ian Kenny December 1, 2017 Introductory Databases Query Optimisation Introduction Any given natural language query can be formulated as many possible SQL queries. There are also many possible relational

More information

Graph and Digraph Glossary

Graph and Digraph Glossary 1 of 15 31.1.2004 14:45 Graph and Digraph Glossary A B C D E F G H I-J K L M N O P-Q R S T U V W-Z Acyclic Graph A graph is acyclic if it contains no cycles. Adjacency Matrix A 0-1 square matrix whose

More information