CS 347 Parallel and Distributed Data Processing

Similar documents
σ (R.B = 1 v R.C > 3) (S.D = 2) Conjunctive normal form Topics for the Day Distributed Databases Query Processing Steps Decomposition

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases. Introduction

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

Chapter 17: Parallel Databases

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction

Query Processing. Introduction to Databases CompSci 316 Fall 2017

CS 347 Parallel and Distributed Data Processing

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Relational Algebra Equivalencies. Database Systems: The Complete Book Ch

Advanced Databases. Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch

Query Evaluation and Optimization

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Query optimization. Query Optimization. Query Optimization. Cost estimation. Another reason why plain IOs not enough: Parallelism

Relational Algebra. Relational Algebra Overview. Relational Algebra Overview. Unary Relational Operations 8/19/2014. Relational Algebra Overview

Magda Balazinska - CSE 444, Spring

Parallel DBMS. Prof. Yanlei Diao. University of Massachusetts Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Chapter 12: Query Processing. Chapter 12: Query Processing

Advanced Databases: Parallel Databases A.Poulovassilis

Chapter 12: Query Processing

Query Execution [15]

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

CSIT5300: Advanced Database Systems

Chapter 13: Query Optimization. Chapter 13: Query Optimization

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

EECS 647: Introduction to Database Systems

Introduction to Wireless Sensor Network. Peter Scheuermann and Goce Trajcevski Dept. of EECS Northwestern University

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

CMP-3440 Database Systems

Chapter 11: Query Optimization

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Track Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross

Database System Concepts

Advanced Database Systems

Operator Implementation Wrap-Up Query Optimization

Chapter 13: Query Processing

Securing Distributed Computation via Trusted Quorums. Yan Michalevsky, Valeria Nikolaenko, Dan Boneh

Chapter 18: Parallel Databases

Query Processing & Optimization

PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Parallel Databases C H A P T E R18. Practice Exercises

Database System Concepts

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch

data parallelism Chris Olston Yahoo! Research

Chapter 3. Algorithms for Query Processing and Optimization

Chapter 6 Part I The Relational Algebra and Calculus

DBMS Y3/S5. 1. OVERVIEW The steps involved in processing a query are: 1. Parsing and translation. 2. Optimization. 3. Evaluation.

Incremental Evaluation of Sliding-Window Queries over Data Streams

Query Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems

Physical Database Design and Tuning

QUERY OPTIMIZATION [CH 15]

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

The Relational Algebra

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615

Chapter 14: Query Optimization

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.

DBMS Query evaluation

Storage hierarchy. Textbook: chapters 11, 12, and 13

Evaluation of Relational Operations

Query Processing and Query Optimization. Prof Monika Shah

Chapter 12: Query Processing

Data Storage and Query Answering. Indexing and Hashing (5)

Physical Database Design and Tuning. Chapter 20

Parallel DBMS. Chapter 22, Part A

CSE 190D Spring 2017 Final Exam Answers

Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis

New Requirements. Advanced Query Processing. Top-N/Bottom-N queries Interactive queries. Skyline queries, Fast initial response time!

Parallel DBs. April 23, 2018

Relational Databases

Physical Database Design and Tuning. Review - Normal Forms. Review: Normal Forms. Introduction. Understanding the Workload. Creating an ISUD Chart

Query Decomposition and Data Localization

Name: Problem 1 Consider the following two transactions: T0: read(a); read(b); if (A = 0) then B = B + 1; write(b);

Chapter 19 Query Optimization

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Chapter 6 The Relational Algebra and Calculus

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Introduction to Data Management. Lecture #17 (Physical DB Design!)

Chapter 5 Relational Algebra. Nguyen Thi Ai Thao

From SQL-query to result Have a look under the hood

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

The Extended Algebra. Duplicate Elimination. Sorting. Example: Duplicate Elimination

CS 582 Database Management Systems II

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati

Administrivia. Physical Database Design. Review: Optimization Strategies. Review: Query Optimization. Review: Database Design

Query Optimization. Introduction to Databases CompSci 316 Fall 2018

RELATIONAL DATA MODEL: Relational Algebra

Database Design and Tuning

What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

Parallel DBMS. Lecture 20. Reading Material. Instructor: Sudeepa Roy. Reading Material. Parallel vs. Distributed DBMS. Parallel DBMS 11/15/18

Transcription:

C 347 Parallel and Distributed Data Processing pring 2016 Query Processing Decomposition Localization Optimization Notes 3: Query Processing C 347 Notes 3 2 Decomposition ame as in centralized system 1. Normalization 2. Eliminating redundancy 3. lgebraic rewriting Normalization Convert from general language to a standard form E.g., relational algebra C 347 Notes 3 3 C 347 Notes 3 4

Normalization Example: Normalization lso detect invalid expressions.,.c select.,.c from, where. =. and σ (.B=1.C>3).D=2 select * from where.d = 3 does not have D attribute ((.B = 1 and.d = 2) or (.C > 3 and.d = 2)) conjunctive normal form C 347 Notes 3 5 C 347 Notes 3 6 Eliminating edundancy E.g., in conditions Eliminating edundancy E.g., sharing common sub-expressions (. = 1) (. > 5) false (. < 10) (. < 5). < 5 σ cond σ cond T σ cond T C 347 Notes 3 7 C 347 Notes 3 8

lgebraic ewriting E.g., pushing conditions down σ cond σ cond3 σ cond1 σ cond2 Query Processing Decomposition One or more algebraic query trees on relations Localization eplacing relations by corresponding fragments C 347 Notes 3 9 C 347 Notes 3 10 Localization teps (1) tart with query (2) eplace relations by fragments (3) Push up and, σ down (apply C 245 rules) Localization Notation for fragment [ : cond] conditions its tuples satisfy fragment (4) implify ~ eliminate unnecessary operations C 347 Notes 3 11 C 347 Notes 3 12

Example (1) Example (2) σ E=3 σ E=3 [ 1: E<10] [ 2: E 10] C 347 Notes 3 13 C 347 Notes 3 14 Example Example (3) (3) σ E=3 σ E=3 [ 1: E<10] [ 2: E 10] σ E=3 σ E=3 [ 1: E<10] [ 2: E 10] Ø C 347 Notes 3 15 C 347 Notes 3 16

Example (4) Localization ule 1 σ E=3 [ 1: E<10] σ c1 [: c2] [: false] σ c1 [: c1 c2] Ø C 347 Notes 3 17 C 347 Notes 3 18 Example Example B σ E=3 [ 2: E 10] σ E=3 [ 2: E=3 E 10] σ E=3 [ 2: false] Ø (1) = common attribute C 347 Notes 3 19 C 347 Notes 3 20

Example B Example B (2) (3) [1: <5] [2: 5 10] [3: >10] [1: <5] [2: 5] [1: <5] [1: <5] [1: <5] [2: 5] [3: >10] [1: <5] [3: >10] [2: 5] [2: 5 10] [1: <5] [2: 5 10] [2: 5] C 347 Notes 3 21 C 347 Notes 3 22 Example B Example B (3) (4) [1: <5] [1: <5] [1: <5] [2: 5] [3: >10] [1: <5] [3: >10] [2: 5] [1: <5] [1: <5] [2: 5 10] [2: 5] [3: >10] [2: 5] [2: 5 10] [1: <5] [2: 5 10] [2: 5] C 347 Notes 3 23 C 347 Notes 3 24

Localization ule 2 Example B [ 1: < 5] [ 2: 5] [: c 1] [: c 2] [ : c 1 c 2.=.] [ 1 2: 1. < 5 2. 5 1.= 2.] [: false] Ø [ 1 2: false] Ø C 347 Notes 3 25 C 347 Notes 3 26 Example C Example C (2) (3) K K K K K [1: <10] [2: 10] [1: K=.K.<10] [2: K=.K. 10] derived fragmentation [1] [1] [1] [2] [2] [1] [2] [2] C 347 Notes 3 27 C 347 Notes 3 28

Example C (3) Example C (4) K K K K K K [1] [1] [1] [2] [2] [1] [2] [2] [1: <10] [1: K=.K.<10] [2: 10] [2: K=.K. 10] C 347 Notes 3 29 C 347 Notes 3 30 Example C Example D K [ 1: < 10] [ 2: K =.K. 10] (1) [ 1 K 2: 1. < 10 2.K =.K. 10 1.K = 2.K] 1 (K,, B) 2 (K, C, D) [ 1 Ø K 2: false] vertical fragmentation C 347 Notes 3 31 C 347 Notes 3 32

Example D Example D (2) (2) K K 1 (K,,B) 2 (K,C,D) K, K, 1 (K,,B) 2 (K,C,D) C 347 Notes 3 33 C 347 Notes 3 34 Example D ule 3 (4) Consider the vertical fragmentation 1 (K,, B) i = (), i i Then for any B i B() = B( i B i Ø ) C 347 Notes 3 35 C 347 Notes 3 36

Query Processing Decomposition Localization Overview Optimization process 1. Generate query plans P 1 P 2 P 3 P n Optimization Overview Joins and other operations Inter-operation parallelism Optimization strategies 2. Estimate size of intermediate results 3. Estimate cost of plan C 1 C 2 C 3 C n C 347 Notes 3 37 4. Pick minimum C 347 Notes 3 38 Overview Differences from centralized optimization New strategies for some operations Parallel/distributed sort Parallel/distributed join emi-join Privacy preserving join Duplicate elimination ggregation election Many ways to assign and schedule processors C 347 Notes 3 39 Parallel/Distributed ort Input a) elation on single site/disk b) elation fragmented/partitioned by sort attribute c) elation fragmented/partitioned by another attribute C 347 Notes 3 40

Parallel/Distributed ort Output a) orted on single site/disk b) orted fragments/partitions Basic ort (K, ) to be sorted on K Horizontally fragmented on K using vector (k 0, k 1,..., k n) 5 6 10 12 15 19 20 21 50 7 3 k0 k1 10 11 17 14 20 27 22 F 1 F 2 F 3 C 347 Notes 3 41 C 347 Notes 3 42 Basic ort Basic ort lgorithm 1. ort each fragment independently 2. hip results if necessary ame idea on different architectures hared nothing P 1 P 2 M F 1 M F 2 hared memory P 1 P 2 F 1 M F 2 C 347 Notes 3 43 C 347 Notes 3 44

ange-partition ort (K, ) to be sorted on K located at one or more site/disk, not fragmented on K ange-partition ort lgorithm 1. ange partition on K 2. Basic sort a 1 local sort 1 k0 b 2 local sort 2 result k1 3 local sort 3 C 347 Notes 3 45 C 347 Notes 3 46 Partition Vectors Problem elect a good partition vector given fragments 7 52 11 14 31 8 15 11 32 17 10 12 a b c 4 Partition Vectors Example approach Each site sends to coordinator MIN sort key MX sort key Number of tuples Coordinator Computes vector and distributes to sites Decides the number of sites to perform local sorts C 347 Notes 3 47 C 347 Notes 3 48

Partition Vectors ample scenario Coordinator receives : MIN = 5 MX = 9 # of tuples = 10 B: MIN = 7 MX = 16 # of tuples = 10 Partition Vectors ample scenario Coordinator receives : MIN = 5 MX = 9 # of tuples = 10 B: MIN = 7 MX = 16 # of tuples = 10 Expected tuples 2 = 10 / (9 5 + 1) 1 = 10 / (16 7 + 1) 5 10 15 20 k0? assuming we want to sort at 2 sites C 347 Notes 3 49 C 347 Notes 3 50 Partition Vectors Partition Vectors ample scenario Expected tuples 2 Variations end more info to coordinator: a) Partition vector for local site Expected tuples with key < k 0 = half of total tuples 2(k 0 5) + (k 0 7) = 10 k 0 = (10 + 10 + 7) / 3 = 9 5 10 15 20 k0? 1 b) Histogram 3 3 3 5 6 8 10 5 6 7 8 9 10 # of tuples local vector C 347 Notes 3 51 C 347 Notes 3 52

Partition Vectors Variations Multiple rounds 1. ites send range and # of tuples to coordinator 2. Coordinator distributes preliminary vector V 0 3. ites send coordinator # of tuples in each V 0 range Partition Vectors Variations Distributed algorithm that does not require a coordinator? 4. Coordinator computes and distributes final vector V f C 347 Notes 3 53 C 347 Notes 3 54 Parallel External ort-merge ame as range-partition sort except does sorting first a local sort a 1 Parallel/Distributed Join Input elations, May or may not be fragmented b local sort b k0 2 k1 result Output esult at one or more sites 3 in order merge C 347 Notes 3 55 C 347 Notes 3 56

Partitioned Join (Equijoin) local join Partitioned Join (Equijoin) ame partition function f is used for both and 1 1 f can be range or hash partitioning B 2 2 B Local join can be of any type (use any C 245 optimization) f() 3 result 3 f() C Various scheduling options, e.g., a) Partition ; partition ; join b) Partition ; build local hash table for ; partition and join C 347 Notes 3 57 C 347 Notes 3 58 Partitioned Join (Equijoin) We already know why it works Partitioned Join (Equijoin) electing a good partition function f is very important Goal is to make all i + i the same size [1] [2] [3] [1] [2] [3] [1] [1] [2] [2] [3] [3] ometimes we partition just to make this join possible C 347 Notes 3 59 C 347 Notes 3 60

symmetric Fragment + eplicate Join symmetric Fragment + eplicate Join 1 local join Can use any partition function f for (even round robin) Can do any join, not just equijoin (e.g., ).<.B B 2 B partition 3 result union C 347 Notes 3 61 C 347 Notes 3 62 General Fragment + eplicate Join General Fragment + eplicate Join 1 1 1 1 B 2 2 B 2 2 f partition 3 fragments 3 3 n copies of each fragment f partition 3 fragments 3 3 n copies of each fragment is partitioned in similar fashion C 347 Notes 3 63 C 347 Notes 3 64

General Fragment + eplicate Join General Fragment + eplicate Join The asymmetric F+ join is special case of the general F+ 1 1 1 2 symmetric F+ may be good if is small 2 1 2 2 n m pairings of, fragments lso works on non-equijoins 3 1 3 2 result C 347 Notes 3 65 C 347 Notes 3 66 emi-join emi-join Goal: reduce communication traffic ( ) or ( ) or ( ) ( ) B 2 a 10 b 25 c 30 d C 3 x 10 y 15 z 25 w 32 x C 347 Notes 3 67 C 347 Notes 3 68

emi-join emi-join B 2 a 10 b 25 c 30 d C 3 x 10 y 15 z 25 w 32 x B 2 a 10 b 25 c 30 d C 3 x 10 y 15 z 25 w 32 x = [2, 10, 25, 30] = [2, 10, 25, 30] 10 y = 25 w C 347 Notes 3 69 C 347 Notes 3 70 emi-join emi-join B 2 a 10 b 25 c 30 d C 3 x 10 y 15 z 25 w 32 x ssume is the smaller relation Then, in general, ( ) is better than ( ) if size( ) + size ( ) < size ( ) Transmitted data With semi-join ( ): T = 4 + 2 + C + result With join : T = 4 + B + result better if B is large Can do similar comparison for other joins Only taking into account transmission cost C 347 Notes 3 71 C 347 Notes 3 72

emi-join Trick: encode (or ) as a bit vector emi-join Useful for three way joins T keys in 0 0 1 1 0 1 0 0 0 0 1 0 1 0 0 one bit/possible key C 347 Notes 3 73 C 347 Notes 3 74 emi-join Useful for three way joins T emi-join Useful for three way joins T Option 1: T where = = T Option 1: T where = = T Option 2: T where = = T C 347 Notes 3 75 C 347 Notes 3 76

emi-join Useful for three way joins T Option 1: T where = = T Option 2: T where = = T Many options ~ exponential in # of relations Privacy Preserving Join ite 1 has (, B) ite 2 has (, C) Want to compute ite 1 should not discover any info not in the join ite 2 should not discover any info not in the join C 347 Notes 3 77 C 347 Notes 3 78 Privacy Preserving Join emi-join won t work If site 1 sends to site 2, site 2 learns all keys of Privacy Preserving Join Fix: send hash keys only ite 1 hashes each value of before sending ite 2 hashes its own values (same h) to see what tuples match a1 a2 a3 B b1 b2 b3 = (a1, a2, a3, a4) a1 a3 a5 C c1 c2 c3 a1 a2 a3 B b1 b2 b3 = (h(a1), h(a2), h(a3), h(a4)) site 2 sees it has h(a1), h(a3) a1 a3 a5 C c1 c2 c3 a4 b4 a7 c4 site 1 site 2 a4 b4 a7 c4 (a1, c1), (a3, c3) site 1 site 2 C 347 Notes 3 79 C 347 Notes 3 80

Privacy Preserving Join Privacy Preserving Join What is the problem? What is the problem? a1 a2 a3 B b1 b2 b3 = (h(a1), h(a2), h(a3), h(a4)) site 2 sees it has h(a1), h(a3) a1 a3 a5 C c1 c2 c3 a1 a2 a3 B b1 b2 b3 = (h(a1), h(a2), h(a3), h(a4)) site 2 sees it has h(a1), h(a3) a1 a3 a5 C c1 c2 c3 a4 b4 a7 c4 (a1, c1), (a3, c3) site 1 site 2 a4 b4 a7 c4 (a1, c1), (a3, c3) site 1 site 2 Dictionary attack ite 2 can take all keys a 1, a 2, a 3, and checks if h(a 1), h(a 2), h(a 3), matches what site 1 sent C 347 Notes 3 81 C 347 Notes 3 82 Privacy Preserving Join Our adversary model: honest but curious Dictionary attack is possible (cheating is internal and can t be caught) ending incorrect keys not possible (cheater could be caught) Privacy Preserving Join One solution Information haring cross Private Databases. grawal et al., IGMOD 2003 Use commutative encryption functions E i(x) = x encrypted using a key private to site i E 1(E 2(x)) = E 2(E 1(x)) horthand: E 1(x) is x, E 2(x) is x, E 1(E 2(x)) is x C 347 Notes 3 83 C 347 Notes 3 84

Privacy Preserving Join Other Privacy Preserving Operations One solution Inequality join.>. a1 a2 B b1 b2 ( a1, a2, a3, a4 ) a1 a3 C c1 c2 imilarity join sim(.,.)<e a3 b3 ( a1, a2, a3, a4 ) a5 c3 a4 b4 a7 c4 ( a1, a3, a5, a7 ) site 1 site 2 Computes ( a1, a3, a5, a7 ), intersects with ( a1, a2, a3, a4 ) (a1, b1), (a3, b3) C 347 Notes 3 85 C 347 Notes 3 86 Other Parallel Operations Parallel/Distributed ggregates Duplicate elimination ort first (in parallel) then eliminate duplicates in result Partition tuples (range or hash) and eliminate locally a 1 toy 10 2 toy 20 3 sales 15 ggregates Partition by grouping attributes; compute aggregate locally b 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 sum (sal) group by dept C 347 Notes 3 87 C 347 Notes 3 88

Parallel/Distributed ggregates Parallel/Distributed ggregates a b 1 toy 10 2 toy 20 3 sales 15 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 1 toy 10 2 toy 20 5 toy 20 6 mgmt 15 8 mgmt 30 3 sales 15 4 sales 5 7 sales 10 a b 1 toy 10 2 toy 20 3 sales 15 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 1 toy 10 2 toy 20 5 toy 20 6 mgmt 15 8 mgmt 30 3 sales 15 4 sales 5 7 sales 10 sum sum dept sum toy 50 mgmt 45 dept sum sales 30 sum (sal) group by dept sum (sal) group by dept C 347 Notes 3 89 C 347 Notes 3 90 Parallel/Distributed ggregates Parallel/Distributed ggregates a 1 toy 10 2 toy 20 3 sales 15 a 1 toy 10 2 toy 20 3 sales 15 sum dept salary toy 30 toy 20 mgmt 45 b 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 b 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 sum dept salary sales 15 sales 15 sum (sal) group by dept with less data sum (sal) group by dept with less data C 347 Notes 3 91 C 347 Notes 3 92

Parallel/Distributed ggregates Parallel/Distributed ggregates a 1 toy 10 2 toy 20 3 sales 15 sum dept salary toy 30 toy 20 mgmt 45 sum dept sum toy 50 mgmt 45 Enhancement Perform aggregate during partition to reduce data transmitted Does not work for all aggregate functions which ones? b 4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30 sum dept salary sales 15 sales 15 sum dept sum sales 30 sum (sal) group by dept with less data C 347 Notes 3 93 C 347 Notes 3 94 Preview: Mapeduce Parallel/Distributed election traightforward if one can leverage range or hash partitioning data 1 map data B 1 reduce data C 1 How do indexes work? data 2 data B 2 data C 1 data 3 C 347 Notes 3 95 C 347 Notes 3 96

Parallel/Distributed election Partition vector can act as the root of a distributed index Parallel/Distributed election Distributed indexes on a non-partition attributes get complicated k0 k1 k0 k1 local indexes index sites site 1 site 2 site 3 tuple sites C 347 Notes 3 97 C 347 Notes 3 98 Parallel/Distributed election Indexing schemes Which one is better in a distributed environment? How to make updates and expansion efficient? Where to store the directory and set of participants? Is global knowledge necessary? If the index is not too big, it may be better to keep it whole and replicate it Query Processing Decomposition Localization Optimization Overview Joins and other operations Inter-operation parallelism Optimization strategies C 347 Notes 3 99 C 347 Notes 3 100

Inter-Operation Parallelism Pipelined Parallelism Pipelined Independent site 2 σ c site 1 site 2 site 1 tuples matching σc join probe result C 347 Notes 3 101 C 347 Notes 3 102 Pipelined Parallelism Pipelining cannot be used in all cases Independent Parallelism site 1 site 2 T V stream of tuples stream of tuples 1. temp 1 temp 2 T V 2. result temp 1 temp 2 C 347 Notes 3 103 C 347 Notes 3 104

ummary s we consider query plans for optimization, we must consider various new strategies for individual operations scheduling operations C 347 Notes 3 105