CS 347 Parallel and Distributed Data Processing
|
|
- Godfrey Franklin
- 5 years ago
- Views:
Transcription
1 C 347 Parallel and Distributed Data Processing pring 2016 Query Processing Decomposition Localization Optimization Notes 3: Query Processing C 347 Notes 3 2 Decomposition ame as in centralized system 1. Normalization 2. Eliminating redundancy 3. lgebraic rewriting Normalization Convert from general language to a standard form E.g., relational algebra C 347 Notes 3 3 C 347 Notes 3 4
2 Normalization Example: Normalization lso detect invalid expressions.,.c select.,.c from, where. =. and σ (.B=1.C>3).D=2 select * from where.d = 3 does not have D attribute ((.B = 1 and.d = 2) or (.C > 3 and.d = 2)) conjunctive normal form C 347 Notes 3 5 C 347 Notes 3 6 Eliminating edundancy E.g., in conditions Eliminating edundancy E.g., sharing common sub-expressions (. = 1) (. > 5) false (. < 10) (. < 5). < 5 σ cond σ cond T σ cond T C 347 Notes 3 7 C 347 Notes 3 8
3 lgebraic ewriting E.g., pushing conditions down σ cond σ cond3 σ cond1 σ cond2 Query Processing Decomposition One or more algebraic query trees on relations Localization eplacing relations by corresponding fragments C 347 Notes 3 9 C 347 Notes 3 10 Localization teps (1) tart with query (2) eplace relations by fragments (3) Push up and, σ down (apply C 245 rules) Localization Notation for fragment [ : cond] conditions its tuples satisfy fragment (4) implify ~ eliminate unnecessary operations C 347 Notes 3 11 C 347 Notes 3 12
4 Example (1) Example (2) σ E=3 σ E=3 [ 1: E<10] [ 2: E 10] C 347 Notes 3 13 C 347 Notes 3 14 Example Example (3) (3) σ E=3 σ E=3 [ 1: E<10] [ 2: E 10] σ E=3 σ E=3 [ 1: E<10] [ 2: E 10] Ø C 347 Notes 3 15 C 347 Notes 3 16
5 Example (4) Localization ule 1 σ E=3 [ 1: E<10] σ c1 [: c2] [: false] σ c1 [: c1 c2] Ø C 347 Notes 3 17 C 347 Notes 3 18 Example Example B σ E=3 [ 2: E 10] σ E=3 [ 2: E=3 E 10] σ E=3 [ 2: false] Ø (1) = common attribute C 347 Notes 3 19 C 347 Notes 3 20
6 Example B Example B (2) (3) [1: <5] [2: 5 10] [3: >10] [1: <5] [2: 5] [1: <5] [1: <5] [1: <5] [2: 5] [3: >10] [1: <5] [3: >10] [2: 5] [2: 5 10] [1: <5] [2: 5 10] [2: 5] C 347 Notes 3 21 C 347 Notes 3 22 Example B Example B (3) (4) [1: <5] [1: <5] [1: <5] [2: 5] [3: >10] [1: <5] [3: >10] [2: 5] [1: <5] [1: <5] [2: 5 10] [2: 5] [3: >10] [2: 5] [2: 5 10] [1: <5] [2: 5 10] [2: 5] C 347 Notes 3 23 C 347 Notes 3 24
7 Localization ule 2 Example B [ 1: < 5] [ 2: 5] [: c 1] [: c 2] [ : c 1 c 2.=.] [ 1 2: 1. < = 2.] [: false] Ø [ 1 2: false] Ø C 347 Notes 3 25 C 347 Notes 3 26 Example C Example C (2) (3) K K K K K [1: <10] [2: 10] [1: K=.K.<10] [2: K=.K. 10] derived fragmentation [1] [1] [1] [2] [2] [1] [2] [2] C 347 Notes 3 27 C 347 Notes 3 28
8 Example C (3) Example C (4) K K K K K K [1] [1] [1] [2] [2] [1] [2] [2] [1: <10] [1: K=.K.<10] [2: 10] [2: K=.K. 10] C 347 Notes 3 29 C 347 Notes 3 30 Example C Example D K [ 1: < 10] [ 2: K =.K. 10] (1) [ 1 K 2: 1. < 10 2.K =.K K = 2.K] 1 (K,, B) 2 (K, C, D) [ 1 Ø K 2: false] vertical fragmentation C 347 Notes 3 31 C 347 Notes 3 32
9 Example D Example D (2) (2) K K 1 (K,,B) 2 (K,C,D) K, K, 1 (K,,B) 2 (K,C,D) C 347 Notes 3 33 C 347 Notes 3 34 Example D ule 3 (4) Consider the vertical fragmentation 1 (K,, B) i = (), i i Then for any B i B() = B( i B i Ø ) C 347 Notes 3 35 C 347 Notes 3 36
10 Query Processing Decomposition Localization Overview Optimization process 1. Generate query plans P 1 P 2 P 3 P n Optimization Overview Joins and other operations Inter-operation parallelism Optimization strategies 2. Estimate size of intermediate results 3. Estimate cost of plan C 1 C 2 C 3 C n C 347 Notes Pick minimum C 347 Notes 3 38 Overview Differences from centralized optimization New strategies for some operations Parallel/distributed sort Parallel/distributed join emi-join Privacy preserving join Duplicate elimination ggregation election Many ways to assign and schedule processors C 347 Notes 3 39 Parallel/Distributed ort Input a) elation on single site/disk b) elation fragmented/partitioned by sort attribute c) elation fragmented/partitioned by another attribute C 347 Notes 3 40
11 Parallel/Distributed ort Output a) orted on single site/disk b) orted fragments/partitions Basic ort (K, ) to be sorted on K Horizontally fragmented on K using vector (k 0, k 1,..., k n) k0 k F 1 F 2 F 3 C 347 Notes 3 41 C 347 Notes 3 42 Basic ort Basic ort lgorithm 1. ort each fragment independently 2. hip results if necessary ame idea on different architectures hared nothing P 1 P 2 M F 1 M F 2 hared memory P 1 P 2 F 1 M F 2 C 347 Notes 3 43 C 347 Notes 3 44
12 ange-partition ort (K, ) to be sorted on K located at one or more site/disk, not fragmented on K ange-partition ort lgorithm 1. ange partition on K 2. Basic sort a 1 local sort 1 k0 b 2 local sort 2 result k1 3 local sort 3 C 347 Notes 3 45 C 347 Notes 3 46 Partition Vectors Problem elect a good partition vector given fragments a b c 4 Partition Vectors Example approach Each site sends to coordinator MIN sort key MX sort key Number of tuples Coordinator Computes vector and distributes to sites Decides the number of sites to perform local sorts C 347 Notes 3 47 C 347 Notes 3 48
13 Partition Vectors ample scenario Coordinator receives : MIN = 5 MX = 9 # of tuples = 10 B: MIN = 7 MX = 16 # of tuples = 10 Partition Vectors ample scenario Coordinator receives : MIN = 5 MX = 9 # of tuples = 10 B: MIN = 7 MX = 16 # of tuples = 10 Expected tuples 2 = 10 / ( ) 1 = 10 / ( ) k0? assuming we want to sort at 2 sites C 347 Notes 3 49 C 347 Notes 3 50 Partition Vectors Partition Vectors ample scenario Expected tuples 2 Variations end more info to coordinator: a) Partition vector for local site Expected tuples with key < k 0 = half of total tuples 2(k 0 5) + (k 0 7) = 10 k 0 = ( ) / 3 = k0? 1 b) Histogram # of tuples local vector C 347 Notes 3 51 C 347 Notes 3 52
14 Partition Vectors Variations Multiple rounds 1. ites send range and # of tuples to coordinator 2. Coordinator distributes preliminary vector V 0 3. ites send coordinator # of tuples in each V 0 range Partition Vectors Variations Distributed algorithm that does not require a coordinator? 4. Coordinator computes and distributes final vector V f C 347 Notes 3 53 C 347 Notes 3 54 Parallel External ort-merge ame as range-partition sort except does sorting first a local sort a 1 Parallel/Distributed Join Input elations, May or may not be fragmented b local sort b k0 2 k1 result Output esult at one or more sites 3 in order merge C 347 Notes 3 55 C 347 Notes 3 56
15 Partitioned Join (Equijoin) local join Partitioned Join (Equijoin) ame partition function f is used for both and 1 1 f can be range or hash partitioning B 2 2 B Local join can be of any type (use any C 245 optimization) f() 3 result 3 f() C Various scheduling options, e.g., a) Partition ; partition ; join b) Partition ; build local hash table for ; partition and join C 347 Notes 3 57 C 347 Notes 3 58 Partitioned Join (Equijoin) We already know why it works Partitioned Join (Equijoin) electing a good partition function f is very important Goal is to make all i + i the same size [1] [2] [3] [1] [2] [3] [1] [1] [2] [2] [3] [3] ometimes we partition just to make this join possible C 347 Notes 3 59 C 347 Notes 3 60
16 symmetric Fragment + eplicate Join symmetric Fragment + eplicate Join 1 local join Can use any partition function f for (even round robin) Can do any join, not just equijoin (e.g., ).<.B B 2 B partition 3 result union C 347 Notes 3 61 C 347 Notes 3 62 General Fragment + eplicate Join General Fragment + eplicate Join B 2 2 B 2 2 f partition 3 fragments 3 3 n copies of each fragment f partition 3 fragments 3 3 n copies of each fragment is partitioned in similar fashion C 347 Notes 3 63 C 347 Notes 3 64
17 General Fragment + eplicate Join General Fragment + eplicate Join The asymmetric F+ join is special case of the general F symmetric F+ may be good if is small n m pairings of, fragments lso works on non-equijoins result C 347 Notes 3 65 C 347 Notes 3 66 emi-join emi-join Goal: reduce communication traffic ( ) or ( ) or ( ) ( ) B 2 a 10 b 25 c 30 d C 3 x 10 y 15 z 25 w 32 x C 347 Notes 3 67 C 347 Notes 3 68
18 emi-join emi-join B 2 a 10 b 25 c 30 d C 3 x 10 y 15 z 25 w 32 x B 2 a 10 b 25 c 30 d C 3 x 10 y 15 z 25 w 32 x = [2, 10, 25, 30] = [2, 10, 25, 30] 10 y = 25 w C 347 Notes 3 69 C 347 Notes 3 70 emi-join emi-join B 2 a 10 b 25 c 30 d C 3 x 10 y 15 z 25 w 32 x ssume is the smaller relation Then, in general, ( ) is better than ( ) if size( ) + size ( ) < size ( ) Transmitted data With semi-join ( ): T = C + result With join : T = 4 + B + result better if B is large Can do similar comparison for other joins Only taking into account transmission cost C 347 Notes 3 71 C 347 Notes 3 72
19 emi-join Trick: encode (or ) as a bit vector emi-join Useful for three way joins T keys in one bit/possible key C 347 Notes 3 73 C 347 Notes 3 74 emi-join Useful for three way joins T emi-join Useful for three way joins T Option 1: T where = = T Option 1: T where = = T Option 2: T where = = T C 347 Notes 3 75 C 347 Notes 3 76
20 emi-join Useful for three way joins T Option 1: T where = = T Option 2: T where = = T Many options ~ exponential in # of relations Privacy Preserving Join ite 1 has (, B) ite 2 has (, C) Want to compute ite 1 should not discover any info not in the join ite 2 should not discover any info not in the join C 347 Notes 3 77 C 347 Notes 3 78 Privacy Preserving Join emi-join won t work If site 1 sends to site 2, site 2 learns all keys of Privacy Preserving Join Fix: send hash keys only ite 1 hashes each value of before sending ite 2 hashes its own values (same h) to see what tuples match a1 a2 a3 B b1 b2 b3 = (a1, a2, a3, a4) a1 a3 a5 C c1 c2 c3 a1 a2 a3 B b1 b2 b3 = (h(a1), h(a2), h(a3), h(a4)) site 2 sees it has h(a1), h(a3) a1 a3 a5 C c1 c2 c3 a4 b4 a7 c4 site 1 site 2 a4 b4 a7 c4 (a1, c1), (a3, c3) site 1 site 2 C 347 Notes 3 79 C 347 Notes 3 80
21 Privacy Preserving Join Privacy Preserving Join What is the problem? What is the problem? a1 a2 a3 B b1 b2 b3 = (h(a1), h(a2), h(a3), h(a4)) site 2 sees it has h(a1), h(a3) a1 a3 a5 C c1 c2 c3 a1 a2 a3 B b1 b2 b3 = (h(a1), h(a2), h(a3), h(a4)) site 2 sees it has h(a1), h(a3) a1 a3 a5 C c1 c2 c3 a4 b4 a7 c4 (a1, c1), (a3, c3) site 1 site 2 a4 b4 a7 c4 (a1, c1), (a3, c3) site 1 site 2 Dictionary attack ite 2 can take all keys a 1, a 2, a 3, and checks if h(a 1), h(a 2), h(a 3), matches what site 1 sent C 347 Notes 3 81 C 347 Notes 3 82 Privacy Preserving Join Our adversary model: honest but curious Dictionary attack is possible (cheating is internal and can t be caught) ending incorrect keys not possible (cheater could be caught) Privacy Preserving Join One solution Information haring cross Private Databases. grawal et al., IGMOD 2003 Use commutative encryption functions E i(x) = x encrypted using a key private to site i E 1(E 2(x)) = E 2(E 1(x)) horthand: E 1(x) is x, E 2(x) is x, E 1(E 2(x)) is x C 347 Notes 3 83 C 347 Notes 3 84
22 Privacy Preserving Join Other Privacy Preserving Operations One solution Inequality join.>. a1 a2 B b1 b2 ( a1, a2, a3, a4 ) a1 a3 C c1 c2 imilarity join sim(.,.)<e a3 b3 ( a1, a2, a3, a4 ) a5 c3 a4 b4 a7 c4 ( a1, a3, a5, a7 ) site 1 site 2 Computes ( a1, a3, a5, a7 ), intersects with ( a1, a2, a3, a4 ) (a1, b1), (a3, b3) C 347 Notes 3 85 C 347 Notes 3 86 Other Parallel Operations Parallel/Distributed ggregates Duplicate elimination ort first (in parallel) then eliminate duplicates in result Partition tuples (range or hash) and eliminate locally a 1 toy 10 2 toy 20 3 sales 15 ggregates Partition by grouping attributes; compute aggregate locally b 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 sum (sal) group by dept C 347 Notes 3 87 C 347 Notes 3 88
23 Parallel/Distributed ggregates Parallel/Distributed ggregates a b 1 toy 10 2 toy 20 3 sales 15 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 1 toy 10 2 toy 20 5 toy 20 6 mgmt 15 8 mgmt 30 3 sales 15 4 sales 5 7 sales 10 a b 1 toy 10 2 toy 20 3 sales 15 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 1 toy 10 2 toy 20 5 toy 20 6 mgmt 15 8 mgmt 30 3 sales 15 4 sales 5 7 sales 10 sum sum dept sum toy 50 mgmt 45 dept sum sales 30 sum (sal) group by dept sum (sal) group by dept C 347 Notes 3 89 C 347 Notes 3 90 Parallel/Distributed ggregates Parallel/Distributed ggregates a 1 toy 10 2 toy 20 3 sales 15 a 1 toy 10 2 toy 20 3 sales 15 sum dept salary toy 30 toy 20 mgmt 45 b 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 b 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 sum dept salary sales 15 sales 15 sum (sal) group by dept with less data sum (sal) group by dept with less data C 347 Notes 3 91 C 347 Notes 3 92
24 Parallel/Distributed ggregates Parallel/Distributed ggregates a 1 toy 10 2 toy 20 3 sales 15 sum dept salary toy 30 toy 20 mgmt 45 sum dept sum toy 50 mgmt 45 Enhancement Perform aggregate during partition to reduce data transmitted Does not work for all aggregate functions which ones? b 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 sum dept salary sales 15 sales 15 sum dept sum sales 30 sum (sal) group by dept with less data C 347 Notes 3 93 C 347 Notes 3 94 Preview: Mapeduce Parallel/Distributed election traightforward if one can leverage range or hash partitioning data 1 map data B 1 reduce data C 1 How do indexes work? data 2 data B 2 data C 1 data 3 C 347 Notes 3 95 C 347 Notes 3 96
25 Parallel/Distributed election Partition vector can act as the root of a distributed index Parallel/Distributed election Distributed indexes on a non-partition attributes get complicated k0 k1 k0 k1 local indexes index sites site 1 site 2 site 3 tuple sites C 347 Notes 3 97 C 347 Notes 3 98 Parallel/Distributed election Indexing schemes Which one is better in a distributed environment? How to make updates and expansion efficient? Where to store the directory and set of participants? Is global knowledge necessary? If the index is not too big, it may be better to keep it whole and replicate it Query Processing Decomposition Localization Optimization Overview Joins and other operations Inter-operation parallelism Optimization strategies C 347 Notes 3 99 C 347 Notes 3 100
26 Inter-Operation Parallelism Pipelined Parallelism Pipelined Independent site 2 σ c site 1 site 2 site 1 tuples matching σc join probe result C 347 Notes C 347 Notes Pipelined Parallelism Pipelining cannot be used in all cases Independent Parallelism site 1 site 2 T V stream of tuples stream of tuples 1. temp 1 temp 2 T V 2. result temp 1 temp 2 C 347 Notes C 347 Notes 3 104
27 ummary s we consider query plans for optimization, we must consider various new strategies for individual operations scheduling operations C 347 Notes 3 105
σ (R.B = 1 v R.C > 3) (S.D = 2) Conjunctive normal form Topics for the Day Distributed Databases Query Processing Steps Decomposition
Topics for the Day Distributed Databases Query processing in distributed databases Localization Distributed query operators Cost-based optimization C37 Lecture 1 May 30, 2001 1 2 Query Processing teps
More informationParallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism
Parallel DBMS Parallel Database Systems CS5225 Parallel DB 1 Uniprocessor technology has reached its limit Difficult to build machines powerful enough to meet the CPU and I/O demands of DBMS serving large
More information! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large
Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!
More informationChapter 20: Parallel Databases
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 20: Parallel Databases. Introduction
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationWhat happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques
376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list
More informationChapter 17: Parallel Databases
Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems
More informationChapter 18: Parallel Databases
Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery
More informationChapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction
Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of
More informationQuery Processing. Introduction to Databases CompSci 316 Fall 2017
Query Processing Introduction to Databases CompSci 316 Fall 2017 2 Announcements (Tue., Nov. 14) Homework #3 sample solution posted in Sakai Homework #4 assigned today; due on 12/05 Project milestone #2
More informationCS 347 Parallel and Distributed Data Processing
C 37 Parallel and Distributed Data Processing pring 2016 Notes : Query Optimization Query Optimization Cost estimation trategies for exploring plans Q min C 37 Notes 2 Based on estimating result sizes
More informationQuery Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.
COS 597: Principles of Database and Information Systems Query Optimization Query Optimization Query as expression over relational algebraic operations Get evaluation (parse) tree Leaves: base relations
More informationRelational Algebra Equivalencies. Database Systems: The Complete Book Ch
elational Algebra Equivalencies Database ystems: he Complete Book Ch. 16.2-16.3 Implementing: Joins olution 1 (Nested-Loop) For Each (a in A) { For Each (b in B) { emit (a, b); }} A B 2 Implementing: Joins
More informationAdvanced Databases. Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Advanced Databases Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch www.mniazi.ir Parallel Join The join operation requires pairs of tuples to be
More informationQuery Evaluation and Optimization
Query Evaluation and Optimization Jan Chomicki University at Buffalo Jan Chomicki () Query Evaluation and Optimization 1 / 21 Evaluating σ E (R) Jan Chomicki () Query Evaluation and Optimization 2 / 21
More informationIntroduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe
Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms
More informationQuery optimization. Query Optimization. Query Optimization. Cost estimation. Another reason why plain IOs not enough: Parallelism
Query optimization Query Optimization It is safer to accept any chance that offers itself, and extemporize a procedure to fit it, than to get a good plan matured, and wait for a chance of using it. Thomas
More informationRelational Algebra. Relational Algebra Overview. Relational Algebra Overview. Unary Relational Operations 8/19/2014. Relational Algebra Overview
The Relational Algebra Relational Algebra Relational algebra is the basic set of operations for the relational model These operations enable a user to specify basic retrieval requests (or queries) Relational
More informationMagda Balazinska - CSE 444, Spring
Query Optimization Algorithm CE 444: Database Internals Lectures 11-12 Query Optimization (part 2) Enumerate alternative plans (logical & physical) Compute estimated cost of each plan Compute number of
More informationParallel DBMS. Prof. Yanlei Diao. University of Massachusetts Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke
Parallel DBMS Prof. Yanlei Diao University of Massachusetts Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke I. Parallel Databases 101 Rise of parallel databases: late 80 s Architecture: shared-nothing
More informationChapter 12: Query Processing. Chapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join
More informationAdvanced Databases: Parallel Databases A.Poulovassilis
1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger
More informationChapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join
More informationQuery Execution [15]
CSC 661, Principles of Database Systems Query Execution [15] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Query processing involves Query processing compilation parsing to construct parse
More informationQuery Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016
Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,
More informationCSIT5300: Advanced Database Systems
CSIT5300: Advanced Database Systems L10: Query Processing Other Operations, Pipelining and Materialization Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science
More informationChapter 13: Query Optimization. Chapter 13: Query Optimization
Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Equivalent Relational Algebra Expressions Statistical
More informationDatabase Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.
Database Systems ( 料 ) December 13/14, 2006 Lecture #10 1 Announcement Assignment #4 is due next week. 2 1 Overview of Query Evaluation Chapter 12 3 Outline Query evaluation (Overview) Relational Operator
More informationEECS 647: Introduction to Database Systems
EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2009 External Sorting Today s Topic Implementing the join operation 4/8/2009 Luke Huan Univ. of Kansas 2 Review DBMS Architecture
More informationIntroduction to Wireless Sensor Network. Peter Scheuermann and Goce Trajcevski Dept. of EECS Northwestern University
Introduction to Wireless Sensor Network Peter Scheuermann and Goce Trajcevski Dept. of EECS Northwestern University 1 A Database Primer 2 A leap in history A brief overview/review of databases (DBMS s)
More informationAlgorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)
Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two
More informationCMP-3440 Database Systems
CMP-3440 Database Systems Relational DB Languages Relational Algebra, Calculus, SQL Lecture 05 zain 1 Introduction Relational algebra & relational calculus are formal languages associated with the relational
More informationChapter 11: Query Optimization
Chapter 11: Query Optimization Chapter 11: Query Optimization Introduction Transformation of Relational Expressions Statistical Information for Cost Estimation Cost-based optimization Dynamic Programming
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationCopyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and
Chapter 6 The Relational Algebra and Relational Calculus Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 6 Outline Unary Relational Operations: SELECT and PROJECT Relational
More informationTrack Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross
Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk
More informationDatabase System Concepts
Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationOperator Implementation Wrap-Up Query Optimization
Operator Implementation Wrap-Up Query Optimization 1 Last time: Nested loop join algorithms: TNLJ PNLJ BNLJ INLJ Sort Merge Join Hash Join 2 General Join Conditions Equalities over several attributes (e.g.,
More informationChapter 13: Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationSecuring Distributed Computation via Trusted Quorums. Yan Michalevsky, Valeria Nikolaenko, Dan Boneh
Securing Distributed Computation via Trusted Quorums Yan Michalevsky, Valeria Nikolaenko, Dan Boneh Setting Distributed computation over data contributed by users Communication through a central party
More informationChapter 18: Parallel Databases
Chapter 18: Parallel Databases Introduction Parallel machines are becoming quite common and affordable Prices of microprocessors, memory and disks have dropped sharply Recent desktop computers feature
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationPARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH
PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 INTRODUCTION In centralized database: Data is located in one place (one server) All DBMS functionalities are done by that server
More informationOutline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection
Outline Query Processing Overview Algorithms for basic operations Sorting Selection Join Projection Query optimization Heuristics Cost-based optimization 19 Estimate I/O Cost for Implementations Count
More information! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationChapter 13: Query Processing Basic Steps in Query Processing
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationParallel Databases C H A P T E R18. Practice Exercises
C H A P T E R18 Parallel Databases Practice Exercises 181 In a range selection on a range-partitioned attribute, it is possible that only one disk may need to be accessed Describe the benefits and drawbacks
More informationDatabase System Concepts
Chapter 14: Optimization Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2007/2008 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth and Sudarshan.
More informationChapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.
Chapter 18 Strategies for Query Processing We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. 1 1. Translating SQL Queries into Relational Algebra and Other Operators - SQL is
More informationAdvanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Advanced Databases Lecture 4 - Query Optimization Masood Niazi Torshiz Islamic Azad university- Mashhad Branch www.mniazi.ir Query Optimization Introduction Transformation of Relational Expressions Catalog
More informationdata parallelism Chris Olston Yahoo! Research
data parallelism Chris Olston Yahoo! Research set-oriented computation data management operations tend to be set-oriented, e.g.: apply f() to each member of a set compute intersection of two sets easy
More informationChapter 3. Algorithms for Query Processing and Optimization
Chapter 3 Algorithms for Query Processing and Optimization Chapter Outline 1. Introduction to Query Processing 2. Translating SQL Queries into Relational Algebra 3. Algorithms for External Sorting 4. Algorithms
More informationChapter 6 Part I The Relational Algebra and Calculus
Chapter 6 Part I The Relational Algebra and Calculus Copyright 2004 Ramez Elmasri and Shamkant Navathe Database State for COMPANY All examples discussed below refer to the COMPANY database shown here.
More informationDBMS Y3/S5. 1. OVERVIEW The steps involved in processing a query are: 1. Parsing and translation. 2. Optimization. 3. Evaluation.
Query Processing QUERY PROCESSING refers to the range of activities involved in extracting data from a database. The activities include translation of queries in high-level database languages into expressions
More informationIncremental Evaluation of Sliding-Window Queries over Data Streams
Incremental Evaluation of Sliding-Window Queries over Data Streams Thanaa M. Ghanem 1 Moustafa A. Hammad 2 Mohamed F. Mokbel 3 Walid G. Aref 1 Ahmed K. Elmagarmid 1 1 Department of Computer Science, Purdue
More informationQuery Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems
Query Processing with Indexes CPS 216 Advanced Database Systems Announcements (February 24) 2 More reading assignment for next week Buffer management (due next Wednesday) Homework #2 due next Thursday
More informationPhysical Database Design and Tuning
Physical Database Design and Tuning CS 186, Fall 2001, Lecture 22 R&G - Chapter 16 Although the whole of this life were said to be nothing but a dream and the physical world nothing but a phantasm, I should
More informationQUERY OPTIMIZATION [CH 15]
Spring 2017 QUERY OPTIMIZATION [CH 15] 4/12/17 CS 564: Database Management Systems; (c) Jignesh M. Patel, 2013 1 Example SELECT distinct ename FROM Emp E, Dept D WHERE E.did = D.did and D.dname = Toy EMP
More informationContents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...
Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing
More informationThe Relational Algebra
The Relational Algebra Relational Algebra Relational algebra is the basic set of operations for the relational model These operations enable a user to specify basic retrieval requests (or queries) 27-Jan-14
More informationFaloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection
More informationCarnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#14(b): Implementation of Relational Operations Administrivia HW4 is due today. HW5 is out. Faloutsos/Pavlo
More informationChapter 14: Query Optimization
Chapter 14: Query Optimization Database System Concepts 5 th Ed. See www.db-book.com for conditions on re-use Chapter 14: Query Optimization Introduction Transformation of Relational Expressions Catalog
More informationQuery optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.
Database Management Systems DBMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHODS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files Data Files System Catalog DATABASE
More informationDBMS Query evaluation
Data Management for Data Science DBMS Maurizio Lenzerini, Riccardo Rosati Corso di laurea magistrale in Data Science Sapienza Università di Roma Academic Year 2016/2017 http://www.dis.uniroma1.it/~rosati/dmds/
More informationStorage hierarchy. Textbook: chapters 11, 12, and 13
Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow Very small Small Bigger Very big (KB) (MB) (GB) (TB) Built-in Expensive Cheap Dirt cheap Disks: data is stored on concentric circular
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from
More informationQuery Processing and Query Optimization. Prof Monika Shah
Query Processing and Query Optimization Query Processing SQL Query Is in Library Cache? System catalog (Dict / Dict cache) Scan and verify relations Parse into parse tree (relational Calculus) View definitions
More informationChapter 12: Query Processing
Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation
More informationData Storage and Query Answering. Indexing and Hashing (5)
Data Storage and Query Answering Indexing and Hashing (5) Linear Hash Tables No directory. Introduction Hash function computes sequences of k bits. Take only the i last of these bits and interpret them
More informationPhysical Database Design and Tuning. Chapter 20
Physical Database Design and Tuning Chapter 20 Introduction We will be talking at length about database design Conceptual Schema: info to capture, tables, columns, views, etc. Physical Schema: indexes,
More informationParallel DBMS. Chapter 22, Part A
Parallel DBMS Chapter 22, Part A Slides by Joe Hellerstein, UCB, with some material from Jim Gray, Microsoft Research. See also: http://www.research.microsoft.com/research/barc/gray/pdb95.ppt Database
More informationCSE 190D Spring 2017 Final Exam Answers
CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join
More informationAdvances in Data Management Query Processing and Query Optimisation A.Poulovassilis
1 Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis 1 General approach to the implementation of Query Processing and Query Optimisation functionalities in DBMSs 1. Parse
More informationNew Requirements. Advanced Query Processing. Top-N/Bottom-N queries Interactive queries. Skyline queries, Fast initial response time!
Lecture 13 Advanced Query Processing CS5208 Advanced QP 1 New Requirements Top-N/Bottom-N queries Interactive queries Decision making queries Tolerant of errors approximate answers acceptable Control over
More informationParallel DBs. April 23, 2018
Parallel DBs April 23, 2018 1 Why Scale? Scan of 1 PB at 300MB/s (SATA r2 Limit) Why Scale Up? Scan of 1 PB at 300MB/s (SATA r2 Limit) ~1 Hour Why Scale Up? Scan of 1 PB at 300MB/s (SATA r2 Limit) (x1000)
More informationRelational Databases
Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4
More informationPhysical Database Design and Tuning. Review - Normal Forms. Review: Normal Forms. Introduction. Understanding the Workload. Creating an ISUD Chart
Physical Database Design and Tuning R&G - Chapter 20 Although the whole of this life were said to be nothing but a dream and the physical world nothing but a phantasm, I should call this dream or phantasm
More informationQuery Decomposition and Data Localization
Query Decomposition and Data Localization Query Decomposition and Data Localization Query decomposition and data localization consists of two steps: Mapping of calculus query (SQL) to algebra operations
More informationName: Problem 1 Consider the following two transactions: T0: read(a); read(b); if (A = 0) then B = B + 1; write(b);
Name: Problem 1 Consider the following two transactions: T0: read(a); read(b); if (A = 0) then B = B + 1; write(b); T1: read(b); read(a); if (B = 0) then A = A + 1; write(a); Let the consistency requirement
More informationChapter 19 Query Optimization
Chapter 19 Query Optimization It is an activity conducted by the query optimizer to select the best available strategy for executing the query. 1. Query Trees and Heuristics for Query Optimization - Apply
More informationImplementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12
Implementation of Relational Operations CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 First comes thought; then organization of that thought, into ideas and plans; then transformation of those plans into
More informationChapter 6 The Relational Algebra and Calculus
Chapter 6 The Relational Algebra and Calculus 1 Chapter Outline Example Database Application (COMPANY) Relational Algebra Unary Relational Operations Relational Algebra Operations From Set Theory Binary
More informationAnnouncement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17
Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa
More informationLassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems
Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems Last Name: First Name: Student ID: 1. Exam is 2 hours long 2. Closed books/notes Problem 1 (6 points) Consider
More informationIntroduction to Data Management. Lecture #17 (Physical DB Design!)
Introduction to Data Management Lecture #17 (Physical DB Design!) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Homework info:
More informationChapter 5 Relational Algebra. Nguyen Thi Ai Thao
Chapter 5 Nguyen Thi Ai Thao thaonguyen@cse.hcmut.edu.vn Spring- 2016 Contents 1 Unary Relational Operations 2 Operations from Set Theory 3 Binary Relational Operations 4 Additional Relational Operations
More informationFrom SQL-query to result Have a look under the hood
From SQL-query to result Have a look under the hood Classical view on RA: sets Theory of relational databases: table is a set Practice (SQL): a relation is a bag of tuples R π B (R) π B (R) A B 1 1 2
More informationExamples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15
Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating
More informationThe Extended Algebra. Duplicate Elimination. Sorting. Example: Duplicate Elimination
The Extended Algebra Duplicate Elimination 2 δ = eliminate duplicates from bags. τ = sort tuples. γ = grouping and aggregation. Outerjoin : avoids dangling tuples = tuples that do not join with anything.
More informationCS 582 Database Management Systems II
Review of SQL Basics SQL overview Several parts Data-definition language (DDL): insert, delete, modify schemas Data-manipulation language (DML): insert, delete, modify tuples Integrity View definition
More informationRajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10
Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10 RAJIV GANDHI COLLEGE OF ENGINEERING & TECHNOLOGY, KIRUMAMPAKKAM-607 402 DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING QUESTION BANK
More informationRaunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati
Analytical Representation on Secure Mining in Horizontally Distributed Database Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering
More informationAdministrivia. Physical Database Design. Review: Optimization Strategies. Review: Query Optimization. Review: Database Design
Administrivia Physical Database Design R&G Chapter 16 Lecture 26 Homework 5 available Due Monday, December 8 Assignment has more details since first release Large data files now available No class Thursday,
More informationQuery Optimization. Introduction to Databases CompSci 316 Fall 2018
Query Optimization Introduction to Databases CompSci 316 Fall 2018 2 Announcements (Tue., Nov. 20) Homework #4 due next in 2½ weeks No class this Thu. (Thanksgiving break) No weekly progress update due
More informationRELATIONAL DATA MODEL: Relational Algebra
RELATIONAL DATA MODEL: Relational Algebra Outline 1. Relational Algebra 2. Relational Algebra Example Queries 1. Relational Algebra A basic set of relational model operations constitute the relational
More informationDatabase Design and Tuning
Database Design and Tuning Chapter 20 Comp 521 Files and Databases Spring 2010 1 Overview After ER design, schema refinement, and the definition of views, we have the conceptual and external schemas for
More informationWhat s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence
What s a database system? Review of Basic Database Concepts CPS 296.1 Topics in Database Systems According to Oxford Dictionary Database: an organized body of related information Database system, DataBase
More informationParallel DBMS. Lecture 20. Reading Material. Instructor: Sudeepa Roy. Reading Material. Parallel vs. Distributed DBMS. Parallel DBMS 11/15/18
Reading aterial CompSci 516 atabase Systems Lecture 20 Parallel BS Instructor: Sudeepa Roy [RG] Parallel BS: Chapter 22.1-22.5 [GUW] Parallel BS and map-reduce: Chapter 20.1-20.2 Acknowledgement: The following
More information