An Adaptive Query Execution Engine for Data Integration

Size: px
Start display at page:

Download "An Adaptive Query Execution Engine for Data Integration"

Transcription

1 An Adaptive Query Execution Engine for Data Integration Zachary Ives, Daniela Florescu, Marc Friedman, Alon Levy, Daniel S. Weld University of Washington Presented by Peng 1

2 Outline The Background of Data Integration Systems Tukwila Architecture Interleaving of planning and execution Adaptive Query Operators Collector & Double Pipelined Join Performance 2

3 Background (Data Integration Systems) Data Integration System Multiple autonomous (can t affect behavior of sources) heterogeneous (different models and schemas) data sources 3

4 The key of DISs: Free users from having to locate the sources relevant to their query interact with each source independently manually combine the data from the different sources 4

5 The main challenges of the design of DISs: Query Reformulation The construction of wrapper programs Query optimizers and efficient query execution engines 5

6 Motivations: Little information for cost estimates Unpredictable data transfer rates Adaptive Unreliable, overlapping sources Want initial results quickly Network bandwidth generally constrains the data sources to be smaller than in traditional db applications. 6

7 Tukwila Architecture 1. a semantic description of the contents of the data sources; 2. overlap information about pairs of data sources ; 3. key statistics about the data, such as the cost of accessing each source and so on 7

8 Novel Features of Tukwila Interleaving of planning and execution Compensates for lack of information Handle event-condition-action rules When and how to modify the implementation of certain operators at runtime if needed. Detect opportunities for re-optimization. Manages overlapping data sources (collectors) Tolerant of latency (double-pipelined join) Returns initial results quickly 8

9 RAP1 Interleaving of planning and execution the non-traditional characteristics of Tukwila are as following, The optimizer can only create a partial plan if essential statistics are missing or uncertain The optimizer generates not only operator trees but also the appropriate event-conditionaction rules. The optimizer conserves the state of its search space when it calls the execution engine. 9

10 Slide 9 RAP1 1. too wordy 2. you mean "characteristics" rather than "characters" Rachel Pottinger, 2/20/2006

11 Overview of the query plan structure The fragment structure is the key mechanism for implementing the A plan adaptive Why includes does the property: system a partially-ordered need at the fragment end of structure? each set of fragments fragment, and the a set rest of the global plan can rules be reoptimized or rescheduled A fragment consists of a fully pipelined tree of physical operators and a set of local rules. 10

12 Rules Re-optimization The optimizer s cardinality estimate for the fragment s result is significantly different from the actual size ->reinvoke optimizer Contingent planning The execution engine checks properties of the result to select the next fragment Rescheduling Reschedule if a source times out Adaptive operators 11

13 RAP2 Rule format When event if condition then actions When closed(frag1) if card(join1)>2*est_card(join1) then replan An event triggers a rule, coursing it to check its condition. If the condition is true, the rule fires, executing the action(s). 12

14 Slide 12 RAP2 Given time constraints, I'd cut slides 12 & 13 Rachel Pottinger, 2/20/2006

15 Events Actions open, closed: fragment/operator starts or completes error: set the operator overflow failure, method e.g., for unable a double to contact pipelined source join timeout(n): alter a memory data allotment source has not responded in n msec. out-of deactivate memory: an operator join has or insufficient fragment, memory which stops its execution and deactivates its associated rules reschedule the query operator tree re-optimize the plan Conditions return an error to the user state(operutor): the operator s current state card(operator): the number of tuples produced so far time(operator): the time waiting since last tuple memory(operator): the memory used so far 13

16 Group Discussion For one of the following motivating situations of Tukwila Absence of statistics Unpredictable data arrival characteristics Overlap and redundancy among sources Optimizing the time to initial answers Q1: Can you give some examples where the chosen topic matters? Q2: If you are a member of Tukwila team, what rules or policy would you have to deal with the problem? To help discussion, more specific situations will be given But you may assume any problem or situation Discussion Form 8 groups (3~4 person per group, two teams per topic) Discuss Q1 and Q2 for one topic (5 ~ 7 minutes) 14

17 Examples Orders OrderNo TrackNo Join Orders.TrackNo = UPS.TrackNo (Orders, UPS) UPS TrackNo Status In Transit Delivered Delivered Undeliverable OrderNo TrackNo Status In Transit Delivered Delivered Delivered 15

18 Query Plan Execution Query plan represented as data-flow tree: Control flow Show which orders have been delivered Join Orders.TrackNo = UPS.TrackNo Read Orders Select Status = Delivered Read UPS Iterator (top-down) Most common database model Easier to implement Data-driven (bottom-up) Threads or external scheduling Better concurrency 16

19 Tukwila Plans & Execution Multiple fragments ending at materialization points Rules triggered by events Re-optimize remainder if necessary Return statistics Join Orders.TrackNo = UPS.TrackNo Read Orders Select Status = Delivered Read UPS (3) (1) (2) When(closed(1)): if size_of(orders) > 1000 then reoptimize {2, 3} 17

20 RAP3 Performance evaluation Interleaving Planning and Execution We can find that Tukwila s strategy of interleaving planning and execution can slash the total time spent processing a query. With a total speedup of 1.42 over pipeline and 1.69 over the naïve strategy of materializing. 18

21 Slide 18 RAP3 Given time constraints, I'd cut this slide Rachel Pottinger, 2/20/2006

22 Adaptive Query Operators Collectors Overlap issues Data Integration Systems needs to perform a union over a large number of overlapping sources. However, a standard union operator has no mechanism for handling errors or for deciding to ignore slow mirror data sources once it has obtained the full data set. Q A Q B Q C A B C Mirror_C 19

23 RAP4 Collectors (cont.) Collectors can deal with the problems by using policies! A collector operator = a set of children (wrapper calls, local data and so on) + a policy for contacting them 20

24 Slide 20 RAP4 Again, considering time constraints, consider cutting slides 19 and 20 Rachel Pottinger, 2/20/2006

25 Collectors (cont.) A complex policy example, Tukwila A B 21

26 Adaptive Query Operators Double Pipelined Join Conventional Joins Sort merge joins &indexed joins ---can not be pipelined Nested loops joins and hash joins ---Follow an asymmetric execution model For Nested loops joins, we must wait for the entire inner table to be transmitted initially before pipelining begins For hash joins, we must load the entire inner relation into a hash table before we can pipeline. 22

27 Double Pipelined Hash Join Proposed for parallel main-memory databases (Wilschut 1990) Hash table per source As a tuple comes in, add to hash table and probe opposite table Evaluation: Results as soon as tuples received Symmetric Requires memory for two hash tables But data-driven! 23

28 Orders OrderNo TrackNo Hash Table (Orders) Join Orders.TrackNo = UPS.TrackNo (Orders, UPS) UPS TrackNo Status In Transit Delivered Delivered Hash Table (UPS)

29 Double-Pipelined Join Adapted to Iterator Model Use multiple threads with queues Each child (A or B) reads tuples until full, then sleeps & awakens parent Join sleeps until awakened, then: Joins tuples from QA or QB, returning all matches as output Wakes owner of queue QA A Join QB B 25

30 RAP5 Performance Evaluation: Double Pipelined Hash Join 26

31 Slide 26 RAP5 Again, consider cutting due to time constraints. Rachel Pottinger, 2/20/2006

32 Insufficient Memory? May not be able to fit hash tables in RAM Strategy for standard hash join Swap some buckets to overflow files As new tuples arrive for those buckets, write to files After current phase, clear memory, repeat join on overflow files 27

33 Conclusions General Tukwila architecture Non-conventional characters of Tukwila Interleaving of optimization and execution main idea of the collector operator Double pipelined hash join 28

An Adaptive Query Execution System for Data Integration*

An Adaptive Query Execution System for Data Integration* An Adaptive Query Execution System for Data Integration* Zachary G. Ives Daniela Florescu University of Washington INRIA Roquencourt zives@cs.washington.edu Daniela.Florescu@inria.fr Alon Levy University

More information

AN OVERVIEW OF ADAPTIVE QUERY PROCESSING SYSTEMS. Mengmeng Liu. Computer and Information Science. University of Pennsylvania.

AN OVERVIEW OF ADAPTIVE QUERY PROCESSING SYSTEMS. Mengmeng Liu. Computer and Information Science. University of Pennsylvania. AN OVERVIEW OF ADAPTIVE QUERY PROCESSING SYSTEMS Mengmeng Liu Computer and Information Science University of Pennsylvania WPE-II exam Janurary 28, 2 ASTRACT Traditional database query processors separate

More information

Advanced Databases. Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch

Advanced Databases. Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch Advanced Databases Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch www.mniazi.ir Parallel Join The join operation requires pairs of tuples to be

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012 Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 6 April 12, 2012 1 Acknowledgements: The slides are provided by Nikolaus Augsten

More information

Eddies: Continuously Adaptive Query Processing. Jae Kyu Chun Feb. 17, 2003

Eddies: Continuously Adaptive Query Processing. Jae Kyu Chun Feb. 17, 2003 Eddies: Continuously Adaptive Query Processing Jae Kyu Chun Feb. 17, 2003 Query in Large Scale System Hardware and Workload Complexity heterogeneous hardware mix unpredictable hardware performance Data

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L10: Query Processing Other Operations, Pipelining and Materialization Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science

More information

Outline. Eddies: Continuously Adaptive Query Processing. What s the Problem? What s the Problem? Outline. Discussion 1

Outline. Eddies: Continuously Adaptive Query Processing. What s the Problem? What s the Problem? Outline. Discussion 1 : Continuously Adaptive Query Processing CPSC 504 Presentation Avnur, R. and Hellerstein, J. M. 2000. : continuously adaptive query processing. In Proceedings of the 2000 ACM SIGMOD international Conference

More information

Mid-Query Re-Optimization

Mid-Query Re-Optimization Mid-Query Re-Optimization Navin Kabra David J. DeWitt Ivan Terziev University of Pennsylvania Query Optimization Ideally an optimal plan should be found. However, this is not the case especially for complex

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!

More information

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Chapter 20: Parallel Databases. Introduction

Chapter 20: Parallel Databases. Introduction Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery

More information

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of

More information

Chapter 17: Parallel Databases

Chapter 17: Parallel Databases Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

XML Data Stream Processing: Extensions to YFilter

XML Data Stream Processing: Extensions to YFilter XML Data Stream Processing: Extensions to YFilter Shaolei Feng and Giridhar Kumaran January 31, 2007 Abstract Running XPath queries on XML data steams is a challenge. Current approaches that store the

More information

Outline. q Database integration & querying. q Peer-to-Peer data management q Stream data management q MapReduce-based distributed data management

Outline. q Database integration & querying. q Peer-to-Peer data management q Stream data management q MapReduce-based distributed data management Outline n Introduction & architectural issues n Data distribution n Distributed query processing n Distributed query optimization n Distributed transactions & concurrency control n Distributed reliability

More information

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact: Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base

More information

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement. COS 597: Principles of Database and Information Systems Query Optimization Query Optimization Query as expression over relational algebraic operations Get evaluation (parse) tree Leaves: base relations

More information

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments Administrivia Midterm on Thursday 10/18 CS 133: Databases Fall 2018 Lec 12 10/16 Prof. Beth Trushkowsky Assignments Lab 3 starts after fall break No problem set out this week Goals for Today Cost-based

More information

Sorting Pearson Education, Inc. All rights reserved.

Sorting Pearson Education, Inc. All rights reserved. 1 19 Sorting 2 19.1 Introduction (Cont.) Sorting data Place data in order Typically ascending or descending Based on one or more sort keys Algorithms Insertion sort Selection sort Merge sort More efficient,

More information

Request Window: an Approach to Improve Throughput of RDBMS-based Data Integration System by Utilizing Data Sharing Across Concurrent Distributed Queries Rubao Lee, Minghong Zhou, Huaming Liao Research

More information

The functionality. Managing more than Operating

The functionality. Managing more than Operating The functionality Managing more than Operating Remember This? What to Manage Processing CPU and Memory Storage Input and Output Devices Functions CPU - Process management RAM - Memory management Storage

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VII Lecture 15, March 17, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part VI Algorithms for Relational Operations Today s Session: DBMS

More information

Ch 5 : Query Processing & Optimization

Ch 5 : Query Processing & Optimization Ch 5 : Query Processing & Optimization Basic Steps in Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation Basic Steps in Query Processing (Cont.) Parsing and translation translate

More information

6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing:

6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing: Lab 2 -- Due Weds. Project meeting sign ups 6.830 Lecture 8 10/2/2017 Recap column stores & paper Join Processing: S = {S} tuples, S pages R = {R} tuples, R pages R < S M pages of memory Types of joins

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Outline. Database Tuning. Join Strategies Running Example. Outline. Index Tuning. Nikolaus Augsten. Unit 6 WS 2014/2015

Outline. Database Tuning. Join Strategies Running Example. Outline. Index Tuning. Nikolaus Augsten. Unit 6 WS 2014/2015 Outline Database Tuning Nikolaus Augsten University of Salzburg Department of Computer Science Database Group 1 Examples Unit 6 WS 2014/2015 Adapted from Database Tuning by Dennis Shasha and Philippe Bonnet.

More information

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1) Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two

More information

CSE 544, Winter 2009, Final Examination 11 March 2009

CSE 544, Winter 2009, Final Examination 11 March 2009 CSE 544, Winter 2009, Final Examination 11 March 2009 Rules: Open books and open notes. No laptops or other mobile devices. Calculators allowed. Please write clearly. Relax! You are here to learn. Question

More information

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems Last Name: First Name: Student ID: 1. Exam is 2 hours long 2. Closed books/notes Problem 1 (6 points) Consider

More information

University of Waterloo Midterm Examination Solution

University of Waterloo Midterm Examination Solution University of Waterloo Midterm Examination Solution Winter, 2011 1. (6 total marks) The diagram below shows an extensible hash table with four hash buckets. Each number x in the buckets represents an entry

More information

Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin

Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age Presented by Dennis Grishin What is the problem? Efficient computation requires distribution of processing between

More information

Advanced Databases: Parallel Databases A.Poulovassilis

Advanced Databases: Parallel Databases A.Poulovassilis 1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger

More information

Fundamentals of Database Systems

Fundamentals of Database Systems Fundamentals of Database Systems Assignment: 4 September 21, 2015 Instructions 1. This question paper contains 10 questions in 5 pages. Q1: Calculate branching factor in case for B- tree index structure,

More information

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism Parallel DBMS Parallel Database Systems CS5225 Parallel DB 1 Uniprocessor technology has reached its limit Difficult to build machines powerful enough to meet the CPU and I/O demands of DBMS serving large

More information

Incremental Evaluation of Sliding-Window Queries over Data Streams

Incremental Evaluation of Sliding-Window Queries over Data Streams Incremental Evaluation of Sliding-Window Queries over Data Streams Thanaa M. Ghanem 1 Moustafa A. Hammad 2 Mohamed F. Mokbel 3 Walid G. Aref 1 Ahmed K. Elmagarmid 1 1 Department of Computer Science, Purdue

More information

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: Relational Query Optimization R & G Chapter 13 Review Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:

More information

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lectures 16 17: Basics of Query Optimization and Cost Estimation (Ch. 15.{1,3,4.6,6} & 16.4-5) 1 Announcements WQ4 is due Friday 11pm HW3 is due next Tuesday 11pm Midterm is next

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Performance Optimization for Informatica Data Services ( Hotfix 3)

Performance Optimization for Informatica Data Services ( Hotfix 3) Performance Optimization for Informatica Data Services (9.5.0-9.6.1 Hotfix 3) 1993-2015 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,

More information

V Conclusions. V.1 Related work

V Conclusions. V.1 Related work V Conclusions V.1 Related work Even though MapReduce appears to be constructed specifically for performing group-by aggregations, there are also many interesting research work being done on studying critical

More information

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection

More information

Outline. Introduction. Introduction Definition -- Selection and Join Semantics The Cost Model Load Shedding Experiments Conclusion Discussion Points

Outline. Introduction. Introduction Definition -- Selection and Join Semantics The Cost Model Load Shedding Experiments Conclusion Discussion Points Static Optimization of Conjunctive Queries with Sliding Windows Over Infinite Streams Ahmed M. Ayad and Jeffrey F.Naughton Presenter: Maryam Karimzadehgan mkarimzadehgan@cs.uwaterloo.ca Outline Introduction

More information

Notes. Some of these slides are based on a slide set provided by Ulf Leser. CS 640 Query Processing Winter / 30. Notes

Notes. Some of these slides are based on a slide set provided by Ulf Leser. CS 640 Query Processing Winter / 30. Notes uery Processing Olaf Hartig David R. Cheriton School of Computer Science University of Waterloo CS 640 Principles of Database Management and Use Winter 2013 Some of these slides are based on a slide set

More information

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. Chapter 18 Strategies for Query Processing We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. 1 1. Translating SQL Queries into Relational Algebra and Other Operators - SQL is

More information

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2

More information

University of Waterloo Midterm Examination Sample Solution

University of Waterloo Midterm Examination Sample Solution 1. (4 total marks) University of Waterloo Midterm Examination Sample Solution Winter, 2012 Suppose that a relational database contains the following large relation: Track(ReleaseID, TrackNum, Title, Length,

More information

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10 Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10 RAJIV GANDHI COLLEGE OF ENGINEERING & TECHNOLOGY, KIRUMAMPAKKAM-607 402 DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING QUESTION BANK

More information

Dataflow Architectures. Karin Strauss

Dataflow Architectures. Karin Strauss Dataflow Architectures Karin Strauss Introduction Dataflow machines: programmable computers with hardware optimized for fine grain data-driven parallel computation fine grain: at the instruction granularity

More information

Column Stores vs. Row Stores How Different Are They Really?

Column Stores vs. Row Stores How Different Are They Really? Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background

More information

Avoiding Sorting and Grouping In Processing Queries

Avoiding Sorting and Grouping In Processing Queries Avoiding Sorting and Grouping In Processing Queries Outline Motivation Simple Example Order Properties Grouping followed by ordering Order Property Optimization Performance Results Conclusion Motivation

More information

Last Time. Think carefully about whether you use a heap Look carefully for stack overflow Especially when you have multiple threads

Last Time. Think carefully about whether you use a heap Look carefully for stack overflow Especially when you have multiple threads Last Time Cost of nearly full resources RAM is limited Think carefully about whether you use a heap Look carefully for stack overflow Especially when you have multiple threads Embedded C Extensions for

More information

A Distributed Query Engine for XML-QL

A Distributed Query Engine for XML-QL A Distributed Query Engine for XML-QL Paramjit Oberoi and Vishal Kathuria University of Wisconsin-Madison {param,vishal}@cs.wisc.edu Abstract: This paper describes a distributed Query Engine for executing

More information

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi Evaluation of Relational Operations: Other Techniques Chapter 14 Sayyed Nezhadi Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer,

More information

CSE 344 MAY 7 TH EXAM REVIEW

CSE 344 MAY 7 TH EXAM REVIEW CSE 344 MAY 7 TH EXAM REVIEW EXAMINATION STATIONS Exam Wednesday 9:30-10:20 One sheet of notes, front and back Practice solutions out after class Good luck! EXAM LENGTH Production v. Verification Practice

More information

Question 1. Part (a) [2 marks] what are different types of data independence? explain each in one line. Part (b) CSC 443 Term Test Solutions Fall 2017

Question 1. Part (a) [2 marks] what are different types of data independence? explain each in one line. Part (b) CSC 443 Term Test Solutions Fall 2017 Question 1. [12 marks] Short Answer Question Part (a) [2 marks] what are different types of data independence? explain each in one line Logical data independence: Protection from changes in logical structure

More information

Basant Group of Institution

Basant Group of Institution Basant Group of Institution Visual Basic 6.0 Objective Question Q.1 In the relational modes, cardinality is termed as: (A) Number of tuples. (B) Number of attributes. (C) Number of tables. (D) Number of

More information

Chapter 11 - Data Replication Middleware

Chapter 11 - Data Replication Middleware Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 11 - Data Replication Middleware Motivation Replication: controlled

More information

Mahathma Gandhi University

Mahathma Gandhi University Mahathma Gandhi University BSc Computer science III Semester BCS 303 OBJECTIVE TYPE QUESTIONS Choose the correct or best alternative in the following: Q.1 In the relational modes, cardinality is termed

More information

Architecture and Implementation of Database Systems Query Processing

Architecture and Implementation of Database Systems Query Processing Architecture and Implementation of Database Systems Query Processing Ralf Möller Hamburg University of Technology Acknowledgements The course is partially based on http://www.systems.ethz.ch/ education/past-courses/hs08/archdbms

More information

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1 CMPUT 391 Database Management Systems Query Processing: The Basics Textbook: Chapter 10 (first edition: Chapter 13) Based on slides by Lewis, Bernstein and Kifer University of Alberta 1 External Sorting

More information

Distributed DBMS. Concepts. Concepts. Distributed DBMS. Concepts. Concepts 9/8/2014

Distributed DBMS. Concepts. Concepts. Distributed DBMS. Concepts. Concepts 9/8/2014 Distributed DBMS Advantages and disadvantages of distributed databases. Functions of DDBMS. Distributed database design. Distributed Database A logically interrelated collection of shared data (and a description

More information

CSE 530A. Query Planning. Washington University Fall 2013

CSE 530A. Query Planning. Washington University Fall 2013 CSE 530A Query Planning Washington University Fall 2013 Scanning When finding data in a relation, we've seen two types of scans Table scan Index scan There is a third common way Bitmap scan Bitmap Scans

More information

Optimizer Challenges in a Multi-Tenant World

Optimizer Challenges in a Multi-Tenant World Optimizer Challenges in a Multi-Tenant World Pat Selinger pselinger@salesforce.come Classic Query Optimizer Concepts & Assumptions Relational Model Cost = X * CPU + Y * I/O Cardinality Selectivity Clustering

More information

Distributed Data Management

Distributed Data Management Distributed Data Management Profr. Dr. Wolf-Tilo Balke Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 3.0 Introduction 3.0 Query Processing 3.1 Basic Distributed

More information

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka What problem does Kafka solve? Provides a way to deliver updates about changes in state from one service to another

More information

A Case for Merge Joins in Mediator Systems

A Case for Merge Joins in Mediator Systems A Case for Merge Joins in Mediator Systems Ramon Lawrence Kirk Hackert IDEA Lab, Department of Computer Science, University of Iowa Iowa City, IA, USA {ramon-lawrence, kirk-hackert}@uiowa.edu Abstract

More information

CompSci 516 Data Intensive Computing Systems

CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 9 Join Algorithms and Query Optimizations Instructor: Sudeepa Roy CompSci 516: Data Intensive Computing Systems 1 Announcements Takeaway from Homework

More information

Query Processing and Query Optimization. Prof Monika Shah

Query Processing and Query Optimization. Prof Monika Shah Query Processing and Query Optimization Query Processing SQL Query Is in Library Cache? System catalog (Dict / Dict cache) Scan and verify relations Parse into parse tree (relational Calculus) View definitions

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 7 - Query execution References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton

More information

Evaluation of Relational Operations: Other Techniques

Evaluation of Relational Operations: Other Techniques Evaluation of Relational Operations: Other Techniques [R&G] Chapter 14, Part B CS4320 1 Using an Index for Selections Cost depends on #qualifying tuples, and clustering. Cost of finding qualifying data

More information

Query Execution [15]

Query Execution [15] CSC 661, Principles of Database Systems Query Execution [15] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Query processing involves Query processing compilation parsing to construct parse

More information

15-415/615 Faloutsos 1

15-415/615 Faloutsos 1 Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415/615 Faloutsos 1 Outline introduction selection

More information

Query Processing: The Basics. External Sorting

Query Processing: The Basics. External Sorting Query Processing: The Basics Chapter 10 1 External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot use traditional

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

Architecting a Network Query Engine for Producing Partial Results

Architecting a Network Query Engine for Producing Partial Results Architecting a Network Query Engine for Producing Partial Results Jayavel Shanmugasundaram 1,2 Kristin Tufte 3 David DeWitt 1 Jeffrey Naughton 1 David Maier 3 jai@cs.wisc.edu, tufte@cse.ogi.edu, dewitt@cs.wisc.edu,

More information

Distributed Data Management

Distributed Data Management Distributed Data Management Wolf-Tilo Balke Christoph Lofi Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 2.0 Introduction 3.0 Query Processing 3.1 Basic

More information

8) A top-to-bottom relationship among the items in a database is established by a

8) A top-to-bottom relationship among the items in a database is established by a MULTIPLE CHOICE QUESTIONS IN DBMS (unit-1 to unit-4) 1) ER model is used in phase a) conceptual database b) schema refinement c) physical refinement d) applications and security 2) The ER model is relevant

More information

Parallel DBMS. Prof. Yanlei Diao. University of Massachusetts Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Parallel DBMS. Prof. Yanlei Diao. University of Massachusetts Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke Parallel DBMS Prof. Yanlei Diao University of Massachusetts Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke I. Parallel Databases 101 Rise of parallel databases: late 80 s Architecture: shared-nothing

More information

Lecture 17: Query execution. Wednesday, May 12, 2010

Lecture 17: Query execution. Wednesday, May 12, 2010 Lecture 17: Query execution Wednesday, May 12, 2010 1 Outline of Next Few Lectures Query execution Query optimization 2 Steps of the Query Processor SQL query Parse & Rewrite Query Query optimization Select

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture X: Parallel Databases Topics Motivation and Goals Architectures Data placement Query processing Load balancing

More information

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 6 Lifecycle of a Query Plan 1 Announcements HW1 is due Thursday Projects proposals are due on Wednesday Office hour canceled

More information

Parser: SQL parse tree

Parser: SQL parse tree Jinze Liu Parser: SQL parse tree Good old lex & yacc Detect and reject syntax errors Validator: parse tree logical plan Detect and reject semantic errors Nonexistent tables/views/columns? Insufficient

More information

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Query Processing. Introduction to Databases CompSci 316 Fall 2017 Query Processing Introduction to Databases CompSci 316 Fall 2017 2 Announcements (Tue., Nov. 14) Homework #3 sample solution posted in Sakai Homework #4 assigned today; due on 12/05 Project milestone #2

More information

CSC 261/461 Database Systems Lecture 19

CSC 261/461 Database Systems Lecture 19 CSC 261/461 Database Systems Lecture 19 Fall 2017 Announcements CIRC: CIRC is down!!! MongoDB and Spark (mini) projects are at stake. L Project 1 Milestone 4 is out Due date: Last date of class We will

More information